The terms newsfeed and timeline are used interchangeably. Services that offer a feed similar to a social media newsfeed include the following:
- Facebook newsfeed
- Twitter Timeline
- Instagram feed
- Google Podcasts feed
- Google News timeline
- Etsy feed
- Feedly
- Reddit feed
- Medium feed
- Quora feed
Requirements
- The user newsfeed must be generated in near real-time based on the feed activity from the people that a user follows
- The feed items contain text and media files (images, videos)
Data storage
Database schema
- The primary entities of the database are the Users table, the FeedItems table, and the Follows table
- The relationship between the Users and the FeedItems tables is 1-to-many
- The follower-followee relationship between users is many-to-many: a user can follow many users and can be followed by many users
- The Follows table is a join table that represents this relationship, with one row per follower-followee pair
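The schema above can be sketched roughly as follows. This is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for Postgres, and every column name (user_id, author_id, follower_id, followee_id, and so on) is an assumption, since only the three tables are named above.

```python
import sqlite3

# Hypothetical columns; only the table names come from the design above.
SCHEMA = """
CREATE TABLE users (
    user_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE feed_items (
    item_id    INTEGER PRIMARY KEY,
    author_id  INTEGER NOT NULL REFERENCES users(user_id),  -- 1-to-many
    body       TEXT,
    media_url  TEXT,              -- pointer to the object store, not the file
    created_at INTEGER NOT NULL
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES users(user_id),
    followee_id INTEGER NOT NULL REFERENCES users(user_id),
    PRIMARY KEY (follower_id, followee_id)  -- one row per pair
);
"""


def create_schema(conn):
    """Create the three tables on the given connection."""
    conn.executescript(SCHEMA)


def followers_of(conn, user_id):
    """Return the IDs of the users who follow user_id, in ascending order."""
    rows = conn.execute(
        "SELECT follower_id FROM follows WHERE followee_id = ? "
        "ORDER BY follower_id",
        (user_id,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO users (user_id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
# bob and carol follow alice; alice follows carol.
conn.executemany(
    "INSERT INTO follows (follower_id, followee_id) VALUES (?, ?)",
    [(2, 1), (3, 1), (1, 3)],
)
print(followers_of(conn, 1))
```

The composite primary key on the Follows table prevents the same follower-followee pair from being stored twice.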
Type of data store
- The media files (images, videos) are stored in a managed object storage such as AWS S3
- A SQL database such as Postgres stores the metadata of the user (followers, personal data)
- A NoSQL data store such as Cassandra stores the user timeline
- A cache server such as Redis stores the pre-generated timeline of a user
High-level design
- The server stores the feed items in cache servers and the NoSQL store
- The newsfeed generated is stored on the cache server
- There is no feed publishing for inactive users; their newsfeed is generated on demand using a pull model (fanout-on-load)
- The feed publishing for active non-celebrity users is based on a push model (fanout-on-write)
- The feed publishing for celebrity users is based on a hybrid push-pull model
- The client fetches the newsfeed from the cache servers
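The choice of publishing model per user category can be sketched as below. The thresholds (CELEBRITY_FOLLOWER_THRESHOLD, INACTIVE_AFTER_DAYS) and the rule that the celebrity check runs first are assumptions made for illustration; the design above does not specify how a user is placed in a category.

```python
from enum import Enum


class Category(Enum):
    INACTIVE = "inactive"
    ACTIVE = "active non-celebrity"
    CELEBRITY = "celebrity"


class Model(Enum):
    PULL = "fanout-on-load"
    PUSH = "fanout-on-write"
    HYBRID = "hybrid push-pull"


# Assumed thresholds; real values would be tuned per service.
CELEBRITY_FOLLOWER_THRESHOLD = 10_000
INACTIVE_AFTER_DAYS = 30

# Mapping taken from the high-level design above.
MODEL_BY_CATEGORY = {
    Category.INACTIVE: Model.PULL,
    Category.ACTIVE: Model.PUSH,
    Category.CELEBRITY: Model.HYBRID,
}


def categorize(follower_count, days_since_last_login):
    """Place a user in one of the three categories (assumed rules)."""
    if follower_count >= CELEBRITY_FOLLOWER_THRESHOLD:
        return Category.CELEBRITY
    if days_since_last_login > INACTIVE_AFTER_DAYS:
        return Category.INACTIVE
    return Category.ACTIVE


def publishing_model(follower_count, days_since_last_login):
    """Return the feed publishing model for a user."""
    return MODEL_BY_CATEGORY[categorize(follower_count, days_since_last_login)]


print(publishing_model(follower_count=120, days_since_last_login=2).value)
```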
Write path
- The client creates an HTTP connection to the load balancer to create a feed item
- The load balancer delegates the client connection to a web server with free capacity
- The write requests to create feed items are rate limited
- The feed item is stored on the message queue for asynchronous processing and the client receives an immediate response
- The fanout service distributes the feed item to multiple services to generate the newsfeed for followers of the client
- The object store persists the video or image files embedded in the feed item
- The NoSQL store persists the timeline of users (feed items in reverse chronological order)
- The SQL database stores the metadata of the users (user relationships) and the feed items
- A limited number of feed items for users with a certain threshold of followers are stored on the cache server
- The IDs of feed items are stored on the user timeline cache server for deduplication
- The feed generation service subscribes to the fanout service for any updates
- The feed generation service queries the in-memory user info service to identify the followers of a user and the category of a user (active non-celebrity users, inactive, celebrity users)
- The feed generation service creates the home timeline for active non-celebrity users using a push model (fanout-on-write) in linear time O(n), where n is the number of followers
- The feed items are ranked, sorted, and merged to generate the home timeline for a user
- The home timeline for active users is stored on the cache server for quick lookups
- There is no feed publishing for inactive users; their home timeline is generated on demand using a pull model (fanout-on-load)
- The feed publishing for celebrity users is based on a hybrid push-pull model (merge celebrity feed items to the home timeline of a user on demand)
- As an alternative, the feed publishing for celebrity users can use a push model in batches, only for the followers who are online (not an optimal solution)
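The push step of the write path (fanout-on-write) can be sketched as follows. Plain dictionaries stand in for the user info service and the home timeline cache, and the FanoutService class, its method names, and the TIMELINE_LIMIT value are hypothetical. The sketch covers only the O(n) push, the bounded timeline, and the deduplication by feed item ID; ranking, the message queue, and rate limiting are left out.

```python
from collections import defaultdict

TIMELINE_LIMIT = 800  # assumed cap on cached item IDs per home timeline


class FanoutService:
    """In-memory stand-in for the fanout and feed generation services."""

    def __init__(self, followers, timeline_limit=TIMELINE_LIMIT):
        # followers: author ID -> list of follower IDs (user info service).
        self.followers = followers
        self.timeline_limit = timeline_limit
        # follower ID -> feed item IDs, newest first (home timeline cache).
        self.home_timelines = defaultdict(list)

    def publish(self, author_id, item_id):
        """Push item_id to every follower of author_id.

        One pass over the followers, so the work grows linearly with the
        follower count. Returns the number of timelines that were updated.
        """
        delivered = 0
        for follower_id in self.followers.get(author_id, []):
            timeline = self.home_timelines[follower_id]
            if item_id in timeline:  # deduplicate by feed item ID
                continue
            timeline.insert(0, item_id)          # newest item goes first
            del timeline[self.timeline_limit:]   # keep the timeline bounded
            delivered += 1
        return delivered


service = FanoutService({1: [2, 3]}, timeline_limit=2)
service.publish(1, 100)
service.publish(1, 101)
service.publish(1, 102)
print(service.home_timelines[2])
```

Only feed item IDs are copied per follower in this sketch, so the feed item itself is stored once and looked up on read.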
Read path
- The client executes a DNS query to resolve the domain name
- The client queries the CDN to check whether the feed items for the home timeline are cached there
- The client creates an HTTP connection to the load balancer
- The load balancer delegates the client connection to a web server with free capacity
- The read requests to fetch the newsfeed are rate limited
- The web server queries the timeline service to fetch the newsfeed
- The timeline service queries the user info service to get the list of followees and identify the category of the user (active, inactive, following celebrities)
- The home timeline cache is queried to fetch the list of feed item IDs
- The feed items are fetched from the feed items cache server by executing an MGET operation on Redis
- When the client executes a request to fetch the timeline of another user, the timeline service queries the user timeline cache server
- The SQL database follower (replica) is queried on a cache miss
- The media files embedded in the feed items are fetched from the object store
- The NoSQL data store is queried to fetch the user timeline on a cache miss
- The inactive users fetch the home timeline using a pull model (fanout-on-load)
- The active users following celebrity users use a hybrid model to fetch the home timeline (the feed items from celebrities are merged on demand)
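The hybrid read described above can be sketched as follows. Dictionaries stand in for the home timeline cache, the celebrity user timelines, and the feed items cache, and the function name, parameters, and the (created_at, item_id) entry layout are assumptions. The final list comprehension plays the role of the MGET call on Redis, and cache misses are simply skipped here instead of falling back to the database.

```python
import heapq


def read_home_timeline(user_id, home_timeline_cache, celebrity_timelines,
                       followed_celebrities, feed_item_cache, limit=20):
    """Return up to `limit` feed items for user_id, newest first.

    Every timeline is a list of (created_at, item_id) tuples that is
    already sorted newest first.
    """
    # Entries that were pushed to this user at write time.
    pushed = home_timeline_cache.get(user_id, [])
    # Entries pulled on demand from each celebrity the user follows.
    pulled = [
        celebrity_timelines.get(celebrity_id, [])
        for celebrity_id in followed_celebrities.get(user_id, [])
    ]
    # Merge the sorted lists while keeping the newest-first order.
    merged = heapq.merge(pushed, *pulled, key=lambda e: e[0], reverse=True)

    item_ids, seen = [], set()
    for _, item_id in merged:
        if item_id in seen:  # drop duplicates across timelines
            continue
        seen.add(item_id)
        item_ids.append(item_id)
        if len(item_ids) == limit:
            break

    # Stand-in for a single MGET over the feed items cache.
    return [feed_item_cache[i] for i in item_ids if i in feed_item_cache]


home = {1: [(30, "c"), (10, "a")]}
celebrities = {9: [(40, "d"), (20, "b")]}
following = {1: [9]}
items = {"a": "post A", "b": "post B", "c": "post C", "d": "post D"}
print(read_home_timeline(1, home, celebrities, following, items))
```

An inactive user with no pre-generated timeline would follow the same code path with an empty pushed list, which is one way to view the pull model as a special case of the hybrid one.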
References
- Raffi Krikorian, Timelines at Scale, infoq.com
- How feed works, facebook.com