
System architecture

X’s Recommendation Algorithm is built on a shared set of data sources, machine learning models, and software frameworks that power multiple product surfaces. This architecture enables code reuse, consistent quality, and rapid iteration across different recommendation experiences.

Architecture overview

Product surfaces at X are built on three core layers:
  1. Data Layer - Real-time user actions, post metadata, and user signals
  2. Model Layer - Graph embeddings, ranking models, and content understanding
  3. Service Layer - Candidate generation, ranking, filtering, and serving
This modular architecture allows different product surfaces to leverage shared components while customizing for their specific use cases.

For You Timeline architecture

The diagram below illustrates how major services and jobs interconnect to construct a For You Timeline:

[Figure: System Architecture Diagram]

The For You Timeline represents the most complex product surface, utilizing nearly all components in the recommendation system.

Data components

The data layer provides foundational signals and storage for the recommendation system.
Component | Description | Location
Tweetypie | Core service that handles the reading and writing of post data | tweetypie/server/
Unified User Actions | Real-time stream of user actions on X | unified_user_actions/
User Signal Service | Centralized platform to retrieve explicit (likes, replies) and implicit (profile visits, tweet clicks) user signals | user-signal-service/

Tweetypie

Tweetypie is the core tweet service that manages all tweet data operations. It provides:
  • Tweet creation, reading, and mutation APIs
  • Hydration of tweet metadata and features
  • Denormalization of tweet data for efficient serving
  • Caching and storage optimization
Reference: tweetypie/server/README.md

Unified User Actions

Unified User Actions provides a real-time stream of all user actions across X, including:
  • Favorites, retweets, replies, quotes
  • Follows, unfollows, mutes, blocks
  • Clicks, video views, profile visits
  • Notification opens and tab clicks
This stream feeds into multiple downstream systems for real-time personalization.
Reference: unified_user_actions/README.md

User Signal Service

Centralizes retrieval of user signals used across recommendation systems:
  • Explicit signals - Direct user actions (likes, follows, bookmarks)
  • Implicit signals - Behavioral data (clicks, dwell time, video views)
  • Aggregated and filtered for privacy and quality
Reference: user-signal-service/README.md
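The explicit/implicit split above can be sketched as a simple grouping step. This is a minimal illustration only: the `Signal` record, the `SIGNAL_KINDS` mapping, and the action names are hypothetical and are not taken from the User Signal Service code.

```python
from dataclasses import dataclass
from enum import Enum


class SignalKind(Enum):
    EXPLICIT = "explicit"
    IMPLICIT = "implicit"


# Hypothetical mapping for illustration; the real service defines its own signal types.
SIGNAL_KINDS = {
    "like": SignalKind.EXPLICIT,
    "follow": SignalKind.EXPLICIT,
    "bookmark": SignalKind.EXPLICIT,
    "click": SignalKind.IMPLICIT,
    "dwell": SignalKind.IMPLICIT,
    "video_view": SignalKind.IMPLICIT,
}


@dataclass(frozen=True)
class Signal:
    user_id: int
    action: str
    target_id: int


def group_signals(signals):
    """Group signals by kind, dropping action types that are not in the mapping."""
    grouped = {SignalKind.EXPLICIT: [], SignalKind.IMPLICIT: []}
    for signal in signals:
        kind = SIGNAL_KINDS.get(signal.action)
        if kind is not None:
            grouped[kind].append(signal)
    return grouped


sample = [Signal(1, "like", 10), Signal(1, "click", 11), Signal(1, "unknown", 12)]
grouped = group_signals(sample)
```

Unknown action types are dropped here as a stand-in for the quality filtering mentioned above; a real filter would likely be far more involved.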

Model components

The model layer includes graph-based algorithms, embeddings, and neural networks for understanding users and content.
Component | Description | Location
SimClusters | Community detection and sparse embeddings into those communities | src/scala/com/twitter/simclusters_v2/
TwHIN | Dense knowledge graph embeddings for Users and Posts | the-algorithm-ml
Trust and Safety Models | Models for detecting NSFW or abusive content | trust_and_safety_models/
Real Graph | Model to predict the likelihood of an X User interacting with another User | src/scala/com/twitter/interaction_graph/
TweepCred | PageRank algorithm for calculating X User reputation | src/scala/com/twitter/graph/batch/job/tweepcred/
Recos Injector | Streaming event processor for building input streams for GraphJet-based services | recos-injector/
Graph Feature Service | Serves graph features for a directed pair of users | graph-feature-service/
Topic Social Proof | Identifies topics related to individual posts | topic-social-proof/
Representation Scorer | Computes scores between pairs of entities using embedding similarity | representation-scorer/
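TweepCred is described above as a PageRank-based reputation score. The sketch below is a textbook power-iteration PageRank on a toy graph, not the TweepCred batch job itself. The edge direction (follower to followed), the damping factor, and the iteration count are assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook power-iteration PageRank over (src, dst) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {node: (1.0 - damping) / len(nodes) for node in nodes}
        for src in nodes:
            # Dangling nodes (no out-links) spread their rank over all nodes.
            targets = out_links[src] or nodes
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


# Assumed convention: an edge (a, c) means "a follows c".
reputation = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
```

In this toy graph, account `b` has no followers and ends up with the lowest score, while `c`, followed by both other accounts, ends up with the highest.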

SimClusters

SimClusters is a general-purpose representation layer based on overlapping communities. It provides:
  • KnownFor - Which communities a producer (account) is known for
  • InterestedIn - Which communities a consumer (user) is interested in
  • Tweet embeddings - Community representation of tweets based on favs
  • Topic embeddings - Community representation of topics
SimClusters covers the top 20M producers and ~145K communities, enabling:
  • Consumer-based tweet recommendations
  • Producer-based tweet recommendations
  • Tweet similarity calculations
  • Topic-based content discovery
Reference: src/scala/com/twitter/simclusters_v2/README.md
SimClusters was published at KDD 2020. Read the research paper for technical details.
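Because SimClusters embeddings are sparse (an entity is active in only a few of the ~145K communities), a consumer-based recommendation can be sketched as a sparse dot product between a user's InterestedIn vector and each tweet embedding. The community IDs, weights, and function names below are made up for illustration, and the scoring rule is an assumption rather than the production logic.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {community_id: weight}."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(weight * b[cid] for cid, weight in a.items() if cid in b)


def rank_tweets(interested_in, tweet_embeddings, k=10):
    """Return up to k tweet IDs with a positive score, best first."""
    scored = [
        (sparse_dot(interested_in, embedding), tweet_id)
        for tweet_id, embedding in tweet_embeddings.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [tweet_id for score, tweet_id in scored[:k] if score > 0]


# Hypothetical data: community IDs and weights are illustrative only.
interested_in = {3: 0.8, 17: 0.2}
tweet_embeddings = {"t1": {3: 0.5}, "t2": {17: 1.0}, "t3": {42: 0.9}}
recommended = rank_tweets(interested_in, tweet_embeddings)
```

Tweet `t3` shares no community with the user, so it scores zero and is left out.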

TwHIN

Twitter Heterogeneous Information Network (TwHIN) provides dense graph embeddings learned from the full user-tweet interaction graph. Unlike SimClusters’ sparse community-based embeddings, TwHIN creates dense vector representations that capture fine-grained relationships.
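To contrast with sparse community vectors, the sketch below compares dense vectors with cosine similarity, which is one common way to use dense embeddings for retrieval. The three-dimensional vectors and the `nearest` helper are hypothetical. How TwHIN embeddings are trained and scored in production is not shown here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest(query, candidates, k=1):
    """Return the k candidate IDs whose vectors are most similar to the query."""
    ranked = sorted(candidates, key=lambda cid: cosine(query, candidates[cid]), reverse=True)
    return ranked[:k]


# Hypothetical 3-dimensional embeddings; real embeddings have many more dimensions.
user_vector = [0.9, 0.1, 0.0]
post_vectors = {"p1": [1.0, 0.0, 0.0], "p2": [0.0, 1.0, 0.0], "p3": [0.0, 0.0, 1.0]}
closest = nearest(user_vector, post_vectors, k=2)
```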

Real Graph

Real Graph predicts the probability that one user will interact with another. The prediction is used for:
  • Follow recommendations
  • Out-of-network content discovery
  • Social graph understanding
Reference: src/scala/com/twitter/interaction_graph/README.md
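The idea of scoring a directed user pair with an interaction probability can be illustrated with a toy logistic model. The source does not say which model family Real Graph uses, so the logistic form, the feature names, and the weights below are all assumptions made for the sake of the example.

```python
import math

# Hypothetical feature weights and bias; not taken from the Real Graph code.
WEIGHTS = {
    "mutual_follow": 1.2,
    "past_replies": 0.4,
    "days_since_last_interaction": -0.1,
}
BIAS = -2.0


def interaction_probability(features):
    """Map pair features to a probability in (0, 1) with a logistic function."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


close_pair = {"mutual_follow": 1, "past_replies": 5, "days_since_last_interaction": 1}
distant_pair = {"mutual_follow": 0, "past_replies": 0, "days_since_last_interaction": 30}
p_close = interaction_probability(close_pair)
p_distant = interaction_probability(distant_pair)
```

With these made-up weights, the pair with recent replies and a mutual follow scores higher than the pair with no recent contact.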

Software frameworks

The service layer provides frameworks for building, serving, and monitoring recommendation systems.
Component | Description | Location
Navi | High-performance machine learning model serving, written in Rust | navi/
Product Mixer | Software framework for building feeds of content | product-mixer/
Timelines Aggregation Framework | Framework for generating aggregate features in batch or real time | timelines/data_processing/ml_util/aggregation_framework/
Representation Manager | Service to retrieve embeddings (SimClusters and TwHIN) | representation-manager/
TWML | Legacy machine learning framework built on TensorFlow v1 | twml/

Product Mixer

Product Mixer is the core framework for building recommendation products. It provides:
  • Pipelines - Structured execution flow (Product → Mixer → Candidate → Scoring)
  • Components - Reusable building blocks for candidate sources, filters, scorers
  • Composition - Mix heterogeneous content (tweets, ads, users)
  • Monitoring - Built-in observability and debugging
All modern recommendation surfaces (For You, Following, Notifications) are built on Product Mixer.
Reference: product-mixer/README.md

Navi

High-performance model serving infrastructure written in Rust:
  • Serves TensorFlow, PyTorch, and ONNX models
  • Optimized for low latency and high throughput
  • Powers real-time ranking in the recommendation pipeline
Reference: navi/README.md

For You Timeline components

The For You Timeline uses specialized components for each stage of the recommendation pipeline.

Candidate sources

Component | Description | Contribution
Search Index (Earlybird) | Find and rank In-Network posts | ~50% of posts
Tweet Mixer | Coordination layer for fetching Out-of-Network tweet candidates | Variable
User Tweet Entity Graph (UTEG) | Maintains an in-memory User to Post interaction graph, finds candidates via graph traversals | Significant
Follow Recommendation Service (FRS) | Provides recommendations for accounts to follow and posts from those accounts | Supplementary

Search Index (Earlybird)

Earlybird is X’s real-time search engine, providing:
  • Inverted index of recent tweets
  • In-network tweet retrieval
  • Light Ranker scoring for initial ranking
  • Powers ~50% of For You Timeline content
Reference: src/java/com/twitter/search/README.md
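The two retrieval ideas listed above, an inverted index and in-network retrieval, can be sketched with a tiny in-memory index. `TinyIndex` is a hypothetical class for illustration and bears no relation to Earlybird's actual index format. It also assumes that larger tweet IDs are more recent.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> tweet IDs, plus tweet ID -> author ID."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._authors = {}

    def add(self, tweet_id, author_id, text):
        self._authors[tweet_id] = author_id
        for term in text.lower().split():
            self._postings[term].add(tweet_id)

    def search(self, term):
        """Tweet IDs containing the term, newest first (assumes IDs grow over time)."""
        return sorted(self._postings.get(term.lower(), set()), reverse=True)

    def in_network(self, followed_author_ids, limit=10):
        """Newest tweets written by authors the user follows."""
        ids = [tid for tid, author in self._authors.items() if author in followed_author_ids]
        return sorted(ids, reverse=True)[:limit]


index = TinyIndex()
index.add(101, author_id=1, text="hello world")
index.add(102, author_id=2, text="Hello again")
index.add(103, author_id=1, text="goodbye")
hello_hits = index.search("hello")
from_followed = index.in_network({1})
```

A light ranking step would then order the retrieved tweets; here recency stands in for that score.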

User Tweet Entity Graph (UTEG)

Built on the GraphJet framework, UTEG maintains an in-memory graph of user-tweet interactions:
  • Real-time updates from user actions
  • Graph traversal for candidate generation
  • Supports multiple edge types (favorite, retweet, reply)
  • Enables collaborative filtering at scale
Reference: src/scala/com/twitter/recos/user_tweet_entity_graph/README.md
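Candidate generation by graph traversal can be sketched as a two-hop walk over a bipartite user-tweet graph: from a user to the tweets they engaged with, to other users who engaged with those tweets, to the tweets those users engaged with. This is a generic collaborative-filtering sketch, not the GraphJet or UTEG traversal. The class and method names are hypothetical, and it uses a single untyped edge kind.

```python
from collections import Counter, defaultdict


class InteractionGraph:
    """Toy in-memory bipartite graph of user-tweet interactions."""

    def __init__(self):
        self._user_to_tweets = defaultdict(set)
        self._tweet_to_users = defaultdict(set)

    def add_edge(self, user_id, tweet_id):
        self._user_to_tweets[user_id].add(tweet_id)
        self._tweet_to_users[tweet_id].add(user_id)

    def recommend(self, user_id, k=5):
        """Rank unseen tweets by how many similar users engaged with them."""
        seen = self._user_to_tweets.get(user_id, set())
        counts = Counter()
        for tweet in seen:
            for neighbor in self._tweet_to_users[tweet]:
                if neighbor == user_id:
                    continue
                for candidate in self._user_to_tweets[neighbor]:
                    if candidate not in seen:
                        counts[candidate] += 1
        ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
        return [tweet_id for tweet_id, _ in ranked[:k]]


graph = InteractionGraph()
for user, tweet in [("u1", "t1"), ("u2", "t1"), ("u2", "t2"),
                    ("u3", "t1"), ("u3", "t2"), ("u3", "t3")]:
    graph.add_edge(user, tweet)
suggestions = graph.recommend("u1")
```

Because `add_edge` updates both adjacency maps in place, new interactions are visible to the next `recommend` call, which loosely mirrors the real-time update property described above.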

Ranking components

Component | Description | Location
Light Ranker | Light Ranker model used by search index (Earlybird) to rank posts | src/python/twitter/deepbird/projects/timelines/scripts/models/earlybird/
Heavy Ranker | Neural network for ranking candidate posts. Main signal for selecting timeline posts | the-algorithm-ml

Heavy Ranker

The Heavy Ranker is a deep neural network that:
  • Uses approximately 6,000 features per tweet
  • Predicts multiple engagement types (like, retweet, reply, etc.)
  • Applies multi-task learning to optimize for various objectives
  • Serves as the primary determinant of final tweet ranking
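One way a multi-task model's outputs can be turned into a single ranking score is a weighted sum of the predicted engagement probabilities. The sketch below assumes that combination rule. The weights and the hard-coded predictions are hypothetical placeholders, not the Heavy Ranker's parameters or outputs.

```python
# Hypothetical per-engagement weights, chosen only for illustration.
ENGAGEMENT_WEIGHTS = {"like": 1.0, "retweet": 2.0, "reply": 3.0}


def combined_score(predictions):
    """Weighted sum of predicted engagement probabilities for one candidate."""
    return sum(ENGAGEMENT_WEIGHTS.get(name, 0.0) * p for name, p in predictions.items())


def rank_candidates(candidates):
    """Order candidate IDs by combined score, highest first."""
    return sorted(candidates, key=lambda cid: combined_score(candidates[cid]), reverse=True)


# Stand-in for model output: one probability per engagement type per candidate.
predicted = {
    "a": {"like": 0.5, "retweet": 0.0, "reply": 0.0},
    "b": {"like": 0.1, "retweet": 0.1, "reply": 0.1},
}
ordering = rank_candidates(predicted)
```

Candidate `b` ranks first even though `a` has the higher like probability, because the made-up weights value replies and retweets more.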

Mixing and filtering

Component | Description | Location
Home Mixer | Main service to construct and serve the Home Timeline. Built on Product Mixer | home-mixer/
Visibility Filters | Filters content for legal compliance, product quality, user trust, and revenue protection | visibilitylib/
Timeline Ranker | Legacy service providing relevance-scored posts from Earlybird and UTEG | timelineranker/

Home Mixer

Home Mixer orchestrates the entire For You Timeline construction:
  1. Fetches candidates from multiple sources in parallel
  2. Hydrates features for ranking
  3. Applies Heavy Ranker scoring
  4. Filters and applies heuristics (diversity, balance, feedback)
  5. Mixes tweets with ads, who-to-follow modules, prompts
  6. Adds product features (conversation modules, social context)
Reference: home-mixer/README.md
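The first four steps above can be sketched as a small pipeline function. This is an illustration under stated assumptions, not Home Mixer or the Product Mixer API: `build_timeline` and every callable passed to it are hypothetical stand-ins, and steps 5 and 6 (mixing with ads and modules, adding product features) are omitted.

```python
from concurrent.futures import ThreadPoolExecutor


def build_timeline(sources, hydrate, score, keep, limit=10):
    """Fetch, hydrate, score, and filter candidates; all callables are hypothetical."""
    # Step 1: call every candidate source in parallel.
    with ThreadPoolExecutor() as pool:
        batches = list(pool.map(lambda fetch: fetch(), sources))
    # Deduplicate across sources while preserving first-seen order.
    candidates = list(dict.fromkeys(c for batch in batches for c in batch))
    # Step 2: hydrate features for each candidate.
    features = {c: hydrate(c) for c in candidates}
    # Step 3: score and sort, highest score first.
    ranked = sorted(candidates, key=lambda c: score(features[c]), reverse=True)
    # Step 4: apply filters, then truncate to the requested size.
    return [c for c in ranked if keep(c, features[c])][:limit]


# Toy stand-ins for real services.
def in_network():
    return ["t1", "t2"]


def out_of_network():
    return ["t2", "t3"]


timeline = build_timeline(
    sources=[in_network, out_of_network],
    hydrate=lambda tweet_id: {"quality": int(tweet_id[1:])},
    score=lambda feats: feats["quality"],
    keep=lambda tweet_id, feats: tweet_id != "t3",
)
```

Filtering after scoring keeps the sketch short. A real system may filter at several points, including before scoring, to avoid spending model inference on candidates that will be dropped.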

Visibility Filters

Ensures content safety and quality through:
  • Hard filtering (blocked, muted authors)
  • Legal compliance (DMCA, country-specific restrictions)
  • NSFW content filtering based on user settings
  • Abusive content detection
  • Coarse-grained downranking for quality
Reference: visibilitylib/README.md

Recommended Notifications

Recommended Notifications use a similar but specialized architecture:
Component | Description | Location
PushService | Main recommendation service for surfacing recommendations via notifications | pushservice/
PushService Light Ranker | Pre-selects highly-relevant candidates from initial pool | pushservice/src/main/python/models/light_ranking/
PushService Heavy Ranker | Multi-task learning model predicting open and engagement probabilities | pushservice/src/main/python/models/heavy_ranking/
Reference: pushservice/README.md

Data flow

The typical data flow through the system:
  1. User action - User performs an action (like, retweet, click) on X
  2. Unified User Actions - Action is captured in real-time stream
  3. Model updates - Streaming jobs update graph structures (UTEG) and embeddings (SimClusters tweets)
  4. Candidate generation - User requests timeline → Multiple candidate sources generate candidates in parallel
  5. Feature hydration - Candidates enriched with ~6,000 features from various services
  6. Ranking - Heavy Ranker scores all candidates using neural network
  7. Filtering & mixing - Apply filters, heuristics, and mix with ads/modules
  8. Serving - Return final timeline to client with social context and metadata
Many components operate in real time under strict latency requirements (typically under 1 second for timeline requests).

Scalability

The architecture handles massive scale:
  • ~1 billion tweets evaluated down to thousands of candidates
  • ~145K communities in SimClusters covering 20M producers
  • Real-time updates to graphs and embeddings
  • Billions of requests daily across product surfaces

Next steps

How it works

Learn how these components work together in the recommendation pipeline

Core services

Deep dive into individual services and their APIs
