
Overview

Graph Feature Service (GFS) is a distributed system that serves graph-based features for pairs of users. It answers questions about the relationships and interactions between a source user and a candidate user, and those answers power personalized recommendations.

What It Does

Given a source user A and candidate user C, GFS can answer:

Follow Graph Features

How many of A’s followings are following C?

Engagement Features

How many of A’s followings have favorited C’s tweets?

Similarity Features

How similar is C to users that A has favorited?

Interaction Features

What is the interaction history between A and C?

How It Works

Feature Computation

GFS computes features by analyzing the graph structure and interaction patterns:
Source User A → Candidate User C

Features:
- mutual_follows: users who follow both A and C
- follower_favorited: A's followers who favorited C's tweets
- following_following: A's followings who follow C
- similarity_score: embedding similarity between A and C
- interaction_count: direct interactions between A and C
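As a rough illustration of the count-based features listed above, the following is a minimal sketch that assumes the follow graph and favorite data fit in plain Python dictionaries. The names `FOLLOWING`, `FAVORITED_AUTHORS`, `followers_of`, and `pair_features` are hypothetical and are not part of GFS; `similarity_score` and `interaction_count` are left out because they depend on embeddings and interaction logs that are not described here.

```python
from typing import Dict, Set

# Hypothetical toy data: user -> set of users they follow.
FOLLOWING: Dict[str, Set[str]] = {
    "A": {"B", "D", "E"},
    "B": {"C"},
    "D": {"C", "A"},
    "E": {"A"},
    "C": set(),
}

# Hypothetical toy data: user -> set of authors whose tweets they favorited.
FAVORITED_AUTHORS: Dict[str, Set[str]] = {
    "B": {"C"},
    "E": {"C"},
    "D": {"A"},
}


def followers_of(user: str, following: Dict[str, Set[str]]) -> Set[str]:
    """Invert the follow graph: everyone whose following set contains `user`."""
    return {u for u, targets in following.items() if user in targets}


def pair_features(
    source: str,
    candidate: str,
    following: Dict[str, Set[str]],
    favorited_authors: Dict[str, Set[str]],
) -> Dict[str, int]:
    """Compute the count-based features for one (source, candidate) pair."""
    src_following = following.get(source, set())
    src_followers = followers_of(source, following)
    cand_followers = followers_of(candidate, following)
    # Users who favorited at least one of the candidate's tweets.
    cand_fans = {u for u, authors in favorited_authors.items() if candidate in authors}
    return {
        "mutual_follows": len(src_followers & cand_followers),
        "follower_favorited": len(src_followers & cand_fans),
        "following_following": len(src_following & cand_followers),
    }
```

Calling `pair_features("A", "C", FOLLOWING, FAVORITED_AUTHORS)` on this toy data counts one mutual follower (D), one follower of A who favorited C (E), and two of A's followings who follow C (B and D). A production system would not scan the whole graph per request as `followers_of` does here; the scan only keeps the sketch short.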

Distributed Architecture

GFS is built as a distributed system to handle high query volumes:
  1. Query Reception: Receives requests for (source_user, candidate_user) pairs
  2. Graph Traversal: Traverses follow and interaction graphs to compute features
  3. Feature Aggregation: Aggregates counts, scores, and metrics across graph edges
  4. Response: Returns computed features for downstream ranking models
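The four stages above can be sketched as a small single-process pipeline. This is an illustrative outline only: the function names `traverse`, `aggregate`, and `serve`, the response shape, and the in-memory graph are all assumptions, and nothing here reflects how GFS distributes the work across machines.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]  # hypothetical: user -> set of users they follow


def traverse(source: str, candidate: str, follow_graph: Graph) -> Set[str]:
    """Graph traversal: find the source's followings that also follow the candidate."""
    return {
        u
        for u in follow_graph.get(source, set())
        if candidate in follow_graph.get(u, set())
    }


def aggregate(intermediates: Set[str]) -> Dict[str, int]:
    """Feature aggregation: reduce the traversal result to a count."""
    return {"following_following": len(intermediates)}


def serve(pairs: Iterable[Tuple[str, str]], follow_graph: Graph) -> List[dict]:
    """Query reception and response: accept pairs, return one feature record per pair."""
    return [
        {
            "source": source,
            "candidate": candidate,
            "features": aggregate(traverse(source, candidate, follow_graph)),
        }
        for source, candidate in pairs
    ]
```

For example, with `graph = {"A": {"B", "D"}, "B": {"C"}, "D": set()}`, `serve([("A", "C")], graph)` returns a single record whose `features` hold a `following_following` count of 1, because only B both is followed by A and follows C.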

Example Features

  • Mutual Follows: Count of users who follow both A and C
  • Following Overlap: |A.following ∩ C.following| / |A.following|
  • Follower Overlap: |A.followers ∩ C.followers| / |A.followers|
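The two overlap ratios above share one formula, so a single helper can cover both. The sketch below assumes the relevant user sets are already available as Python sets; `overlap_ratio` is a hypothetical name, and returning 0.0 when the source user's set is empty is an assumption made here to avoid division by zero, not documented GFS behavior.

```python
from typing import Set


def overlap_ratio(source_set: Set[str], candidate_set: Set[str]) -> float:
    """Return |source_set ∩ candidate_set| / |source_set|.

    Pass the two following sets for Following Overlap, or the two
    follower sets for Follower Overlap.
    """
    if not source_set:
        # Assumption: an empty source set yields 0.0 rather than an error.
        return 0.0
    return len(source_set & candidate_set) / len(source_set)
```

If A follows four accounts and two of them are also followed by C, `overlap_ratio` returns 0.5. Note that the ratio is normalized by A's set only, so it is not symmetric in A and C.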

Where It’s Used

Ranking Models

GFS features are critical inputs to ranking models across X:
  • Uses graph features to score tweet candidates based on social proof and user similarity
  • Ranks account recommendations using mutual follows and engagement overlap
  • Incorporates graph features to determine which notifications to send
  • Personalizes search results using graph-based relevance features

Candidate Generation

Some candidate sources use GFS features for filtering:
  • Social Proof Filtering: Only show tweets if enough of the user’s followings have engaged with them
  • Similarity Thresholding: Filter out candidates below minimum similarity score
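Both filters reduce to a threshold check on a single feature value. The sketch below assumes each candidate is a dictionary that already carries its GFS-derived features; the field names `engaged_followings` and `similarity_score`, the function names, and any threshold values are hypothetical.

```python
from typing import Dict, List


def social_proof_filter(tweets: List[Dict], min_engaged_followings: int) -> List[Dict]:
    """Keep tweets that at least `min_engaged_followings` of the user's followings engaged with."""
    return [t for t in tweets if t["engaged_followings"] >= min_engaged_followings]


def similarity_filter(candidates: List[Dict], min_similarity: float) -> List[Dict]:
    """Drop candidates whose similarity score is below `min_similarity`."""
    return [c for c in candidates if c["similarity_score"] >= min_similarity]
```

With a threshold of 2, a tweet that three followings engaged with passes the social proof filter and a tweet with no engaged followings is dropped. Both filters preserve the input order, so they can run before or after ranking.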

Performance Characteristics

GFS is optimized for low-latency, high-throughput feature serving to support real-time ranking.
Key Metrics:
  • Latency: Sub-millisecond at p50, single-digit milliseconds at p99
  • Throughput: Handles millions of requests per second
  • Feature Count: Returns dozens of features per user pair
  • Cache Hit Rate: High cache hit rate for frequently queried users

Architecture

Location: graph-feature-service/

Components

  1. Graph Storage: In-memory or distributed graph representation
  2. Feature Extractors: Specialized modules for different feature types
  3. Aggregators: Efficiently compute counts and similarities
  4. Caching Layer: Cache frequently accessed features
  5. API Server: RESTful or Thrift API for feature requests
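To show how a caching layer can sit in front of feature computation, here is a minimal single-process sketch built on Python's standard `functools.lru_cache`. It is not the GFS caching layer: `compute_features`, `cached_features`, the cache size, and the placeholder feature values are all assumptions, and a real deployment would also need to invalidate entries as the graph changes.

```python
from functools import lru_cache
from typing import Tuple

# Counts how often the expensive path runs, for demonstration only.
CALLS = {"count": 0}


def compute_features(source: str, candidate: str) -> Tuple[Tuple[str, int], ...]:
    """Hypothetical stand-in for the feature extractors and aggregators."""
    CALLS["count"] += 1
    # Placeholder value; a real extractor would traverse the graph here.
    return (("interaction_count", 0),)


@lru_cache(maxsize=10_000)  # hypothetical capacity
def cached_features(source: str, candidate: str) -> Tuple[Tuple[str, int], ...]:
    """Serve repeated (source, candidate) requests from memory."""
    return compute_features(source, candidate)
```

Requesting the same pair twice runs `compute_features` only once; the second call is answered from the cache. The result is returned as an immutable tuple so that callers cannot mutate a cached entry by accident.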

Data Sources

GFS consumes data from:
  • Follow Graph: User follow relationships
  • Real Graph: Interaction predictions and aggregated engagements
  • Engagement Events: Favorites, retweets, clicks from UUA
  • Embeddings: SimClusters and TwHIN from Representation Manager
GFS acts as a bridge between raw graph data and machine learning models, providing pre-computed features at serving time.
