We asked our partner, the translation and localization agency Workogram, to translate this article.
When it comes to the news feed algorithm, many theories and myths come to mind. Most people understand that some kind of algorithm is at work, and many know a few of the factors that affect it (whether you like a post, whether you are interested in it, and so on). But many questions remain.
We have publicly shared details about how the news feed works. But behind the scenes lies an incredibly complex, multi-level machine learning ranking system that powers it. Here we share new insights into how that ranking system works, and into the challenges of building a personalization system for over 2 billion people that shows them relevant, meaningful content every time they visit Facebook.
What’s so hard about that?
First, the sheer scale. More than 2 billion people worldwide use Facebook, and each of them has over a thousand potential posts (posts that could, in theory, appear in their feed). Across everyone on Facebook, that adds up to trillions of posts.
On top of that, thousands of signals must be assessed for every user to determine what is likely to be most relevant to that person. With trillions of posts and thousands of signals, we have to predict instantly what each person will want to see in their feed. When you open Facebook, all of this happens in the background, in the second or so it takes your news feed to load.
And once all of this works, the situation can change: new problems emerge, such as clickbait and the spread of misinformation, and we have to find new solutions for them.
In fact, the ranking system is not a single algorithm. We use multiple layers of machine learning and ranking models to predict the most relevant and meaningful content for each user. At each stage, the ranking system narrows those thousands of candidate posts down to the few hundred that will appear in a person's news feed at some point.
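To get a feel for these numbers, here is a back-of-the-envelope estimate using the round figures above. The 1,000 signals per candidate post is a hypothetical stand-in for "thousands of signals", and the constant and function names are illustrative only.

```python
# Back-of-the-envelope scale estimate from the article's round numbers.
USERS = 2_000_000_000        # "more than 2 billion people"
POSTS_PER_USER = 1_000       # "over a thousand potential posts" each
SIGNALS_PER_POST = 1_000     # hypothetical stand-in for "thousands of signals"


def candidate_pairs(users, posts_per_user):
    """Number of (user, candidate post) pairs that could need a prediction."""
    return users * posts_per_user


def signal_evaluations(users, posts_per_user, signals_per_post):
    """Signal evaluations if every candidate post were scored on every signal."""
    return candidate_pairs(users, posts_per_user) * signals_per_post


print(f"{candidate_pairs(USERS, POSTS_PER_USER):.1e} user-post pairs")  # 2.0e+12 user-post pairs
print(f"{signal_evaluations(USERS, POSTS_PER_USER, SIGNALS_PER_POST):.1e} signal evaluations")  # 2.0e+15 signal evaluations
```

Even under these simplified assumptions, the work lands in the trillions to quadrillions of operations, which is why the ranking is split into stages.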
How does it work?
The system determines which posts appear in your news feed, and in what order, by predicting what will interest you. These predictions are based on many factors: what and whom you follow, what you have liked, whom you have interacted with recently, and so on.
To understand how this works in practice, let’s take an example of what happens to a user who logs into Facebook. Let’s call him Juan.
Since Juan last logged in yesterday, his friend Wei has posted a photo of his cocker spaniel, and his friend Saanvi has posted a video of her morning run. His favorite page has published an interesting article on the best ways to see the Milky Way at night, and his favorite cooking group has posted 4 new sourdough recipes.
All of this content may interest Juan, because he chose to follow these people, pages, and groups. To decide which of these posts should appear higher in his news feed, we need to predict what matters most to him and which content is most valuable to him. Mathematically, we have to define an objective function for Juan and perform a single-objective optimization.
We can use a post's characteristics (such as who was tagged in a photo and when it was posted) to predict whether Juan is likely to like it. For example, if Juan often interacts with Saanvi's posts (by sharing or commenting on them) and her running video was posted very recently, there is a high probability that he will like it. If Juan has historically engaged more with videos than with photos, the predicted likelihood that he will like Wei's cocker spaniel photo may be fairly low. In that case, our ranking algorithm would place Saanvi's running video above Wei's dog photo, because it predicts a higher probability that Juan will like it.
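The single-objective like prediction described above can be sketched as a toy logistic model. Everything here is hypothetical: the `Post` fields, the affinity numbers, and the coefficients are invented to mirror the example, and a production model would use far more signals. The sketch only shows how one objective, the probability of a like, turns post features into a ranking.

```python
import math
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    kind: str          # "photo", "video", "link", ...
    hours_old: float   # time since the post was created


# Hypothetical viewer history for Juan (real systems use thousands of signals).
AUTHOR_AFFINITY = {"Saanvi": 0.8, "Wei": 0.2}   # how often Juan interacts with each author
FORMAT_AFFINITY = {"video": 0.7, "photo": 0.3}  # how often Juan engages with each format


def p_like(post: Post) -> float:
    """Toy logistic model of P(Juan likes post); the coefficients are invented."""
    z = (
        -2.0
        + 3.0 * AUTHOR_AFFINITY.get(post.author, 0.0)
        + 1.5 * FORMAT_AFFINITY.get(post.kind, 0.0)
        - 0.1 * post.hours_old  # fresher posts get a higher score
    )
    return 1.0 / (1.0 + math.exp(-z))


posts = [Post("Wei", "photo", hours_old=10.0), Post("Saanvi", "video", hours_old=1.0)]
ranked = sorted(posts, key=p_like, reverse=True)
print([p.author for p in ranked])  # ['Saanvi', 'Wei']
```

A logistic function is used here only because it maps any weighted sum of features to a probability between 0 and 1; the article does not say which model family is used.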
But likes are not the only way people express their preferences on Facebook. Every day, people share articles they find interesting, watch videos of ordinary people or celebrities they follow, or leave comments on their friends’ posts.
Mathematically, things get more complicated when we need to optimize for several objectives that together make up our main goal: creating the most long-term value for people by showing them content that is meaningful and relevant to them.
We make multiple predictions for Juan with different machine learning models: how likely he is to engage with Wei's photo, Saanvi's video, the Milky Way article, or the sourdough recipes. Each model tries to rank these pieces of content for Juan, and sometimes they disagree: Juan may be more likely to like Saanvi's running video than the Milky Way article, yet more likely to comment on the article than on the video.
So we need a way to combine these different predictions into a single score, optimized for our ultimate goal of long-term value.
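A simple way to picture combining predictions is a weighted sum of per-action probabilities. This sketch assumes that form; the probabilities and the `WEIGHTS` values are invented for illustration and are not Facebook's actual numbers. It reproduces the disagreement described above: ranked by likes alone, the video wins, but once comments count for more, the article comes first.

```python
# Hypothetical per-action predictions for Juan (all probabilities invented).
predictions = {
    "saanvi_run_video": {"like": 0.60, "comment": 0.05, "share": 0.02},
    "milky_way_article": {"like": 0.30, "comment": 0.20, "share": 0.10},
}

# Hypothetical weights: how much each action is assumed to contribute to long-term value.
WEIGHTS = {"like": 1.0, "comment": 4.0, "share": 3.0}


def combined_score(action_probs, weights):
    """Collapse per-action probabilities into one score with a weighted sum."""
    return sum(weights[action] * p for action, p in action_probs.items())


by_like = sorted(predictions, key=lambda pid: predictions[pid]["like"], reverse=True)
scores = {pid: combined_score(probs, WEIGHTS) for pid, probs in predictions.items()}
by_combined = sorted(scores, key=scores.get, reverse=True)

print(by_like)      # ['saanvi_run_video', 'milky_way_article']
print(by_combined)  # ['milky_way_article', 'saanvi_run_video']
```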
How can you tell if something is of long-term value to a person?
We simply ask them. For example, we survey people about how meaningful an interaction with their friends was for them, or whether a post was worth their time, so that our system reflects what people say they enjoy and find meaningful. We can then weight each of Juan's predictions according to the actions people say (in surveys) are more meaningful and worth their time.
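One hypothetical way to turn survey answers into weights is to compare how highly people rate posts on which they took each action. The sketch below assumes a 1 to 5 "worth your time" rating and normalizes each action's average rating against likes; both the survey data and this normalization scheme are assumptions for illustration, not the method the article describes.

```python
from statistics import mean

# Hypothetical survey data: for posts where a respondent took a given action,
# how "worth your time" they rated the post on a 1-5 scale (numbers invented).
survey = {
    "like":    [3, 3, 4, 2, 3],
    "comment": [5, 4, 5, 4, 4],
    "share":   [4, 4, 5, 3, 4],
}


def weights_from_survey(responses, baseline="like"):
    """Scale each action's mean rating relative to the baseline action."""
    means = {action: mean(ratings) for action, ratings in responses.items()}
    return {action: m / means[baseline] for action, m in means.items()}


weights = weights_from_survey(survey)
print({action: round(w, 2) for action, w in weights.items()})
# {'like': 1.0, 'comment': 1.47, 'share': 1.33}
```

Weights like these could then feed a weighted sum of per-action predictions for each post.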
Step by step
To rank over a thousand posts per user, per day, for more than 2 billion people in real time, we need to make the process efficient. We do it in stages, each designed to speed up the process and limit the computing resources required.
The system starts by collecting all the candidate posts it could rank for Juan (the cocker spaniel photo, the running video, and so on). This inventory includes every post shared with Juan by a friend, group, or page he is connected to that was made since his last login and has not been deleted.
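The inventory step amounts to a filter over candidate posts. In this sketch the `Post` fields, the connection names, and the timestamps are hypothetical; it keeps posts from Juan's connections that were created after his last login and have not been deleted.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Post:
    post_id: str
    source: str            # friend, group, or page that shared the post
    created_at: datetime
    deleted: bool = False


def collect_inventory(posts, connections, last_login):
    """Keep posts from Juan's connections, made since his last login, not deleted."""
    return [
        p for p in posts
        if p.source in connections and p.created_at > last_login and not p.deleted
    ]


last_login = datetime(2021, 1, 10, 8, 0)
connections = {"Wei", "Saanvi", "Astronomy page", "Sourdough group"}
posts = [
    Post("dog_photo", "Wei", datetime(2021, 1, 10, 9, 0)),
    Post("run_video", "Saanvi", datetime(2021, 1, 10, 9, 30)),
    Post("milky_way", "Astronomy page", datetime(2021, 1, 10, 12, 0)),
    Post("last_week_post", "Wei", datetime(2021, 1, 3, 18, 0)),               # before last login
    Post("deleted_post", "Saanvi", datetime(2021, 1, 10, 10, 0), deleted=True),
    Post("stranger_post", "Unknown page", datetime(2021, 1, 10, 11, 0)),      # not a connection
]
print([p.post_id for p in collect_inventory(posts, connections, last_login)])
# ['dog_photo', 'run_video', 'milky_way']
```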
But how should we handle posts created before Juan’s last login that he hasn’t seen yet?
To make sure unseen posts still get a chance to be seen, we apply unread bumping logic: recent posts that were ranked for Juan in previous sessions but that he never saw are added to the eligible posts for this session. We also apply action-bumping logic: any posts Juan has already seen that have since sparked an interesting conversation among his friends are added to the eligible posts as well.
Next, the system needs to score each post on various factors, such as the type of post, its similarity to other posts, and how closely it matches what Juan usually engages with. To compute this for more than 1,000 posts, for each of billions of users, all in real time, we run these models for all candidate stories in parallel on multiple machines, called predictors.
Before we combine all of these predictions into a single score, some rules must be applied. We wait until these first predictions are available before narrowing down the number of posts to rank, and we apply the rules over several passes to save computing power.
First, certain integrity processes are applied to every post. They determine which integrity measures, if any, need to be applied to the content selected for ranking. In the next pass, a lightweight model narrows the candidates down to the roughly 500 posts most relevant to Juan. Ranking fewer posts lets us use more powerful neural network models in the following passes.
Next comes the main pass, where most of the personalization happens. Here a score is calculated for each story independently, and then all 500 posts are sorted by score. For some people, likes may carry more weight in the score than comments, because some people prefer to express themselves through likes rather than comments. Any action a person rarely takes (for example, when their like prediction is close to zero) automatically plays a minimal role in the ranking, because its predicted value is very low.
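Unread bumping and action bumping can be pictured as extra sources merged into the eligible list. The sketch assumes a simple rule for "interesting conversation" (at least `min_new_comments` new comments from friends since Juan viewed the post); that threshold, the function name, and the post IDs are all hypothetical.

```python
def add_bumped_posts(new_posts, unread_from_earlier, seen_posts,
                     new_friend_comments, min_new_comments=2):
    """Build the eligible list for this session (hypothetical helper).

    new_posts: ids collected since the last login.
    unread_from_earlier: recent ids ranked in earlier sessions but never viewed.
    seen_posts: ids Juan has already viewed.
    new_friend_comments: {post_id: friend comments added since Juan viewed it}.
    """
    # Action bumping: seen posts whose conversation has grown past the threshold.
    action_bumped = [
        pid for pid in seen_posts
        if new_friend_comments.get(pid, 0) >= min_new_comments
    ]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys([*new_posts, *unread_from_earlier, *action_bumped]))


eligible = add_bumped_posts(
    new_posts=["dog_photo", "run_video"],
    unread_from_earlier=["sourdough_1", "dog_photo"],
    seen_posts=["milky_way", "old_meme"],
    new_friend_comments={"milky_way": 5, "old_meme": 1},
)
print(eligible)  # ['dog_photo', 'run_video', 'sourdough_1', 'milky_way']
```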
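Running the models for many candidates in parallel can be sketched with a thread pool, where each worker stands in for one predictor machine and scores its own shard of posts. The feature names and the formulas in `predict_actions` are placeholders, not the real models. In CPython, threads illustrate the sharding pattern but do not speed up CPU-bound work, which is one reason a real system would spread it across machines.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_actions(post):
    """Placeholder for the real models: invented formulas on hypothetical features."""
    interest = post["interest_match"]    # 0..1: fit with Juan's usual interests
    novelty = 1.0 - post["similarity"]   # 0..1: how unlike other candidates it is
    return {
        "like": 0.6 * interest + 0.2 * novelty,
        "comment": 0.3 * interest * novelty,
    }


def score_shard(shard):
    """One 'predictor' scores its shard of candidate posts independently."""
    return {post["id"]: predict_actions(post) for post in shard}


def predict_in_parallel(posts, num_predictors=4):
    # Split candidates round-robin across predictors.
    shards = [posts[i::num_predictors] for i in range(num_predictors)]
    with ThreadPoolExecutor(max_workers=num_predictors) as pool:
        partial_results = list(pool.map(score_shard, shards))
    merged = {}
    for part in partial_results:
        merged.update(part)
    return merged


candidates = [
    {"id": f"post_{i}", "interest_match": (i % 10) / 10, "similarity": (i % 7) / 7}
    for i in range(1000)
]
predictions = predict_in_parallel(candidates)
print(len(predictions))  # 1000
```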
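The lightweight pass is essentially a top-k selection under a cheap score. This sketch assumes each post already carries a single precomputed `relevance` number (a hypothetical field) and uses `heapq.nlargest` to keep the best 500 without fully sorting the list. The integrity step is not modeled here.

```python
import heapq


def light_pass(candidates, cheap_score, keep=500):
    """Keep the `keep` highest-scoring candidates under a cheap scoring function."""
    return heapq.nlargest(keep, candidates, key=cheap_score)


# Hypothetical candidates: relevance values are a shuffled 0.000..0.999 range.
candidates = [{"id": i, "relevance": (i * 37) % 1000 / 1000} for i in range(1000)]
shortlist = light_pass(candidates, cheap_score=lambda p: p["relevance"])
print(len(shortlist))  # 500
```

Only the 500 survivors then go on to the more expensive neural network models.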
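The main pass can be sketched as scoring each story on its own and then sorting. The per-user weights and probabilities below are hypothetical. They show the effect described above: even with a higher weight on likes, a person whose like predictions are near zero gets almost no contribution from that term, so comments decide the order.

```python
def story_score(action_probs, user_weights):
    """Score one story in isolation: weighted sum of its action predictions."""
    return sum(user_weights.get(action, 0.0) * p for action, p in action_probs.items())


def main_pass(candidates, user_weights):
    """candidates: {post_id: {action: probability}} -> post ids, best first."""
    scores = {pid: story_score(probs, user_weights) for pid, probs in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical user who almost never likes posts but often comments.
candidates = {
    "wei_dog_photo": {"like": 0.002, "comment": 0.05},
    "sourdough_recipes": {"like": 0.001, "comment": 0.30},
}
user_weights = {"like": 2.0, "comment": 1.0}  # likes weighted higher for this user

print(main_pass(candidates, user_weights))  # ['sourdough_recipes', 'wei_dog_photo']
```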
Finally, we reach the contextual pass, which applies rules such as content-type diversity, so that Juan's feed has a good mix of content types and he doesn't see several video posts in a row.
All of these ranking steps happen in the time it takes Juan to open the Facebook app. Within seconds, he gets a news feed that is ready to browse and enjoy.
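A content-type diversity rule can be applied as a greedy re-ranking over the main-pass order. This sketch assumes one possible rule (never place a video directly after another video while a non-video item is still available); the function name and the exact form of the rule are hypothetical.

```python
def diversify(ranked, kind_of, no_repeat=frozenset({"video"})):
    """Greedy re-rank: avoid back-to-back items of a type listed in `no_repeat`."""
    remaining = list(ranked)
    feed = []
    while remaining:
        prev = kind_of(feed[-1]) if feed else None
        idx = 0  # default: keep the main-pass order
        if prev in no_repeat:
            # Take the best-ranked item of a different type, if one exists.
            idx = next((i for i, item in enumerate(remaining) if kind_of(item) != prev), 0)
        feed.append(remaining.pop(idx))
    return feed


ranked = [("v1", "video"), ("v2", "video"), ("p1", "photo"), ("v3", "video"), ("l1", "link")]
feed = diversify(ranked, kind_of=lambda item: item[1])
print([post_id for post_id, _ in feed])  # ['v1', 'p1', 'v2', 'l1', 'v3']
```

Falling back to the top item when no other type remains keeps the rule from dropping posts; it only reorders them.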
Other contributors to this article: Meihong Wang, CTO, and Tak Yang, Production Director.