
Overview

The candidate sourcing stage of the Twitter Recommendation algorithm narrows the pool of items from roughly 1 billion tweets to a few thousand candidates. It uses Twitter user behavior as its primary input.
This document enumerates the signals used during the candidate sourcing phase and how each retrieval algorithm uses them.

Signal Types

The following table describes all available signals used for candidate retrieval:
| Signal | Description |
| --- | --- |
| Author Follow | The accounts the user explicitly follows |
| Author Unfollow | The accounts the user recently unfollowed |
| Author Mute | The accounts the user has muted |
| Author Block | The accounts the user has blocked |
| Tweet Favorite | The tweets on which the user clicked the Like button |
| Tweet Unfavorite | The tweets on which the user clicked the Unlike button |
| Retweet | The tweets the user retweeted |
| Quote Tweet | The tweets the user retweeted with a comment |
| Tweet Reply | The tweets the user replied to |
| Tweet Share | The tweets on which the user clicked the Share button |
| Tweet Bookmark | The tweets on which the user clicked the Bookmark button |
| Tweet Click | The tweets the user clicked to view the tweet detail page |
| Tweet Video Watch | The video tweets the user watched for a certain number of seconds or percentage of the video |
| Tweet Don’t Like | The tweets on which the user clicked the “Not interested in this tweet” button |
| Tweet Report | The tweets on which the user clicked the “Report Tweet” button |
| Notification Open | The push-notification tweets the user opened |
| Ntab Click | The tweets the user clicked on the Notifications page |
| User AddressBook | The identifiers of author accounts in the user’s address book |
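One way to model these signals in code is an enum plus a small record type. The names below (`SignalType`, `UserSignal`, `ACCOUNT_TARGETED`) are hypothetical and not taken from the Twitter codebase; the split into account-targeted and tweet-targeted signals simply follows the descriptions in the table.

```python
from dataclasses import dataclass
from enum import Enum


class SignalType(Enum):
    """Hypothetical enumeration of the retrieval signals in the table above."""
    AUTHOR_FOLLOW = "author_follow"
    AUTHOR_UNFOLLOW = "author_unfollow"
    AUTHOR_MUTE = "author_mute"
    AUTHOR_BLOCK = "author_block"
    TWEET_FAVORITE = "tweet_favorite"
    TWEET_UNFAVORITE = "tweet_unfavorite"
    RETWEET = "retweet"
    QUOTE_TWEET = "quote_tweet"
    TWEET_REPLY = "tweet_reply"
    TWEET_SHARE = "tweet_share"
    TWEET_BOOKMARK = "tweet_bookmark"
    TWEET_CLICK = "tweet_click"
    TWEET_VIDEO_WATCH = "tweet_video_watch"
    TWEET_DONT_LIKE = "tweet_dont_like"
    TWEET_REPORT = "tweet_report"
    NOTIFICATION_OPEN = "notification_open"
    NTAB_CLICK = "ntab_click"
    USER_ADDRESS_BOOK = "user_address_book"


# Signals whose target is an account rather than a tweet, per the descriptions above.
ACCOUNT_TARGETED = frozenset({
    SignalType.AUTHOR_FOLLOW,
    SignalType.AUTHOR_UNFOLLOW,
    SignalType.AUTHOR_MUTE,
    SignalType.AUTHOR_BLOCK,
    SignalType.USER_ADDRESS_BOOK,
})


@dataclass(frozen=True)
class UserSignal:
    """One observed user action: who did what to which account or tweet, and when."""
    user_id: int
    signal_type: SignalType
    target_id: int     # an account id or a tweet id, depending on signal_type
    timestamp_ms: int

    @property
    def targets_account(self) -> bool:
        """True if target_id refers to an account rather than a tweet."""
        return self.signal_type in ACCOUNT_TARGETED
```

For example, `UserSignal(1, SignalType.AUTHOR_MUTE, 42, 0).targets_account` is `True`, while a `RETWEET` signal targets a tweet.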

Signal Usage by Component

Twitter uses these user signals as training labels and/or ML features in each candidate sourcing algorithm. The following table shows how they are used across different components:
  • Features: used as input features for the model
  • Labels: used as training objectives
  • Features / Labels: used for both purposes
| Signal | USS | SimClusters | TwHIN | UTEG | FRS | Light Ranking |
| --- | --- | --- | --- | --- | --- | --- |
| Author Follow | Features | Features / Labels | Features / Labels | Features | Features / Labels | N/A |
| Author Unfollow | Features | N/A | N/A | N/A | N/A | N/A |
| Author Mute | Features | N/A | N/A | N/A | Features | N/A |
| Author Block | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Favorite | Features | Features | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Unfavorite | Features | Features | N/A | N/A | N/A | N/A |
| Retweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Quote Tweet | Features | N/A | Features / Labels | Features | Features / Labels | Features / Labels |
| Tweet Reply | Features | N/A | Features | Features | Features / Labels | Features |
| Tweet Share | Features | N/A | N/A | N/A | Features | N/A |
| Tweet Bookmark | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Click | Features | N/A | N/A | N/A | Features | Labels |
| Tweet Video Watch | Features | Features | N/A | N/A | N/A | Labels |
| Tweet Don’t Like | Features | N/A | N/A | N/A | N/A | N/A |
| Tweet Report | Features | N/A | N/A | N/A | N/A | N/A |
| Notification Open | Features | Features | Features | N/A | Features | N/A |
| Ntab Click | Features | Features | Features | N/A | Features | N/A |
| User AddressBook | N/A | N/A | N/A | N/A | Features | N/A |
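To make the table queryable, it can be encoded as a nested mapping. This is a minimal sketch: the `USAGE` dictionary transcribes the table above, while the helper names `components_using` and `label_signals` are hypothetical.

```python
F, L, FL = "features", "labels", "features/labels"
COMPONENTS = ("USS", "SimClusters", "TwHIN", "UTEG", "FRS", "Light Ranking")

# One tuple per signal, in COMPONENTS order; None means N/A.
USAGE = {
    "Author Follow":     (F, FL, FL, F, FL, None),
    "Author Unfollow":   (F, None, None, None, None, None),
    "Author Mute":       (F, None, None, None, F, None),
    "Author Block":      (F, None, None, None, F, None),
    "Tweet Favorite":    (F, F, FL, F, FL, FL),
    "Tweet Unfavorite":  (F, F, None, None, None, None),
    "Retweet":           (F, None, FL, F, FL, FL),
    "Quote Tweet":       (F, None, FL, F, FL, FL),
    "Tweet Reply":       (F, None, F, F, FL, F),
    "Tweet Share":       (F, None, None, None, F, None),
    "Tweet Bookmark":    (F, None, None, None, None, None),
    "Tweet Click":       (F, None, None, None, F, L),
    "Tweet Video Watch": (F, F, None, None, None, L),
    "Tweet Don't Like":  (F, None, None, None, None, None),
    "Tweet Report":      (F, None, None, None, None, None),
    "Notification Open": (F, F, F, None, F, None),
    "Ntab Click":        (F, F, F, None, F, None),
    "User AddressBook":  (None, None, None, None, F, None),
}


def components_using(signal: str) -> list[str]:
    """Components that consume `signal` in any role."""
    return [c for c, use in zip(COMPONENTS, USAGE[signal]) if use is not None]


def label_signals(component: str) -> set[str]:
    """Signals that `component` uses as training labels (alone or with features)."""
    i = COMPONENTS.index(component)
    return {s for s, uses in USAGE.items() if uses[i] in (L, FL)}


print(components_using("Tweet Favorite"))
# ['USS', 'SimClusters', 'TwHIN', 'UTEG', 'FRS', 'Light Ranking']
print(sorted(label_signals("Light Ranking")))
# ['Quote Tweet', 'Retweet', 'Tweet Click', 'Tweet Favorite', 'Tweet Video Watch']
```

Keeping one tuple per signal in a fixed component order mirrors the table row for row, which makes the transcription easy to check against the source.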

Component Overview

  • USS (User Signal Service): centralizes all signals as features
  • SimClusters (Similarity Clusters): uses follow and engagement signals for clustering
  • TwHIN (Twitter Heterogeneous Information Network): graph-based candidate retrieval
  • UTEG (User-Tweet Entity Graph): real-time graph traversal for candidates
  • FRS (Follow Recommendation Service): social graph signals for recommendations
  • Light Ranking (Lightweight Ranker): fast first-stage ranking
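UTEG's "graph traversal" can be pictured as walking a bipartite user-tweet engagement graph. The sketch below is a simplified, hypothetical illustration of a three-step walk (user → engaged tweets → co-engaging users → their other tweets) that counts how often each candidate is reached. It is not UTEG's actual implementation, and the function name `co_engagement_candidates` is an assumption.

```python
from collections import Counter, defaultdict


def co_engagement_candidates(engagements, user, top_k=3):
    """Rank tweets reached by user -> tweet -> other user -> tweet walks.

    `engagements` is an iterable of (user_id, tweet_id) pairs, e.g. favorites
    or retweets. Tweets the user already engaged with are excluded.
    """
    tweets_by_user = defaultdict(set)
    users_by_tweet = defaultdict(set)
    for u, t in engagements:
        tweets_by_user[u].add(t)
        users_by_tweet[t].add(u)

    seen = tweets_by_user[user]
    scores = Counter()
    for t in seen:                       # step 1: tweets the user engaged with
        for other in users_by_tweet[t]:  # step 2: users who engaged the same tweet
            if other == user:
                continue
            for candidate in tweets_by_user[other]:  # step 3: their other tweets
                if candidate not in seen:
                    scores[candidate] += 1
    return [t for t, _ in scores.most_common(top_k)]


edges = [("alice", 1), ("bob", 1), ("bob", 2), ("carol", 1), ("carol", 2), ("carol", 3)]
print(co_engagement_candidates(edges, "alice"))  # [2, 3]
```

Tweet 2 ranks first because two users who co-engaged with Alice's tweet also engaged with it, while tweet 3 is reached only through Carol.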

Key Signal Patterns

Positive Engagement Signals

Signals that indicate strong user interest:
Tweet Favorite

The most widely used signal: it appears in all six components. Used as both features and labels in TwHIN, FRS, and Light Ranking, and as features only in USS, SimClusters, and UTEG.

Retweet

A strong engagement signal used as labels in TwHIN, FRS, and Light Ranking. It indicates that the user wants to share the content with their followers.

Quote Tweet

Similar to a retweet but with the user's commentary. Used as labels in TwHIN, FRS, and Light Ranking.

Author Follow

A social graph signal used extensively to understand user preferences. Used as both features and labels in SimClusters, TwHIN, and FRS.
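As a rough illustration of how explicit engagements can become training labels, the sketch below turns an event log into binary (user, tweet) labels, treating a favorite, retweet, or quote tweet as a positive. The event format, the `build_labels` name, and the choice of positive set are assumptions for illustration; which signals each real component uses as labels is given by the usage table.

```python
from collections.abc import Iterable

# Hypothetical choice: the three explicit engagements that TwHIN, FRS, and
# Light Ranking all use as labels in the usage table.
POSITIVE_SIGNALS = {"Tweet Favorite", "Retweet", "Quote Tweet"}


def build_labels(events: Iterable[tuple[int, int, str]],
                 impressions: Iterable[tuple[int, int]]) -> dict[tuple[int, int], int]:
    """Map each shown (user_id, tweet_id) pair to 1 if the user engaged positively, else 0.

    `events` holds (user_id, tweet_id, signal_name) records; `impressions`
    holds the (user_id, tweet_id) pairs that were shown. Later Tweet
    Unfavorite events are ignored in this sketch.
    """
    positives = {(u, t) for u, t, s in events if s in POSITIVE_SIGNALS}
    return {pair: int(pair in positives) for pair in impressions}


events = [(1, 10, "Retweet"), (1, 11, "Tweet Click"), (2, 10, "Tweet Favorite")]
print(build_labels(events, [(1, 10), (1, 11), (2, 12)]))
# {(1, 10): 1, (1, 11): 0, (2, 12): 0}
```

Only impressed pairs get a label, so negatives come from tweets that were shown but not engaged with; the click on tweet 11 stays a negative here because clicks are not in the positive set.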

Negative Signals

Signals that indicate user disinterest or spam:

Author Block/Mute

Used in USS and FRS to filter out unwanted content and authors

Tweet Don't Like

Used in USS to understand content preferences

Tweet Report

Used in USS for spam and abuse detection

Author Unfollow

Used in USS to track changing user interests
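Filtering with negative signals can be sketched as a set-membership check applied to each candidate. The `Candidate` type and `filter_negative` function below are hypothetical; they only illustrate dropping candidates whose author the user has blocked or muted, and tweets the user marked "Not interested in this tweet".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical candidate record: a tweet and its author."""
    tweet_id: int
    author_id: int


def filter_negative(candidates, blocked, muted, disliked_tweets):
    """Drop candidates from blocked or muted authors and tweets the user disliked."""
    excluded_authors = set(blocked) | set(muted)
    disliked = set(disliked_tweets)
    return [
        c for c in candidates
        if c.author_id not in excluded_authors and c.tweet_id not in disliked
    ]


cands = [Candidate(1, 100), Candidate(2, 200), Candidate(3, 300), Candidate(4, 400)]
print(filter_negative(cands, blocked={100}, muted={200}, disliked_tweets={3}))
# [Candidate(tweet_id=4, author_id=400)]
```

Converting the negative signals to sets first keeps each membership check constant-time, which matters when thousands of candidates are filtered per request.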

Implicit Signals

Weaker signals that provide contextual information:
  • Tweet Click: used as features in USS and FRS, and as labels in Light Ranking
  • Video Watch: used as features in USS and SimClusters, and as labels in Light Ranking
  • Notification Open: used as features in USS, SimClusters, TwHIN, and FRS
Implicit signals are noisier than explicit engagement signals and require careful calibration when used as training labels.
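The Tweet Video Watch signal is defined by watching a certain number of seconds or percentage of a video. A minimal sketch of turning such an implicit signal into a label, and down-weighting positives to reflect their noise, is shown below. The thresholds (10 seconds, 50%), the 0.5 weight, and the function name are hypothetical placeholders, not values from the source; down-weighting is only one simple calibration option.

```python
# Hypothetical thresholds and weight; the source does not specify them.
MIN_WATCH_SECONDS = 10.0
MIN_WATCH_FRACTION = 0.5
IMPLICIT_LABEL_WEIGHT = 0.5


def video_watch_label(watched_seconds: float, video_seconds: float) -> tuple[int, float]:
    """Return (label, sample_weight) for one video impression.

    The label is 1 if the user watched at least MIN_WATCH_SECONDS or at least
    MIN_WATCH_FRACTION of the video. Positives get a reduced sample weight
    because implicit signals are noisier than explicit engagements.
    """
    fraction = watched_seconds / video_seconds if video_seconds > 0 else 0.0
    positive = watched_seconds >= MIN_WATCH_SECONDS or fraction >= MIN_WATCH_FRACTION
    return (1, IMPLICIT_LABEL_WEIGHT) if positive else (0, 1.0)


print(video_watch_label(12, 60))  # (1, 0.5): passes the seconds threshold
print(video_watch_label(4, 6))    # (1, 0.5): passes the percentage threshold
print(video_watch_label(2, 60))   # (0, 1.0)
```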

Signal Processing Flow

Best Practices

1. Signal Selection: Choose signals based on the specific retrieval algorithm and use case. Not all signals are relevant for all components.
2. Feature vs. Label: Use strong engagement signals (favorites, retweets) as labels. Use broader signals as features.
3. Signal Freshness: Recent signals tend to be more predictive. Consider recency weighting in feature engineering.
4. Negative Signals: Don’t ignore negative signals; they are crucial for filtering and personalization.
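Recency weighting is commonly implemented with exponential decay. The sketch below computes per-signal decayed counts, where an event's weight halves every `half_life_days`; the function name, the event format, and the 7-day default are hypothetical choices for illustration.

```python
from collections import defaultdict


def decayed_counts(events, now_days, half_life_days=7.0):
    """Per-signal recency-weighted counts.

    `events` is an iterable of (signal_name, event_time_days) pairs. Each event
    contributes 0.5 ** (age / half_life_days), so an event one half-life old
    counts half as much as one that just happened.
    """
    counts = defaultdict(float)
    for signal, t in events:
        age = max(0.0, now_days - t)  # clamp future timestamps to age 0
        counts[signal] += 0.5 ** (age / half_life_days)
    return dict(counts)


events = [("Retweet", 30), ("Retweet", 23), ("Tweet Favorite", 16)]
print(decayed_counts(events, now_days=30))
# {'Retweet': 1.5, 'Tweet Favorite': 0.25}
```

A half-life is easier to reason about than a raw decay rate: the two retweets (ages 0 and 7 days) contribute 1.0 and 0.5, and the 14-day-old favorite contributes 0.25.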

Related Components

  • Unified User Actions: source of all user action signals
  • User Signal Service: centralized signal processing platform
  • Aggregation Framework: computes aggregate features from signals
