TwHIN

Overview

TwHIN (Twitter Heterogeneous Information Network) creates dense knowledge graph embeddings for users and tweets. Unlike SimClusters, which produces sparse community-based embeddings, TwHIN generates dense vector representations by learning from the heterogeneous interaction graph on X.

For detailed information about TwHIN, see the TwHIN project documentation in the algorithm-ml repository.

How It Works

Heterogeneous Graph Learning

TwHIN models X as a heterogeneous information network containing:

Users: Account entities
Tweets: Post content
Interactions: Follows, favorites, retweets, replies, and other engagement types

The model learns dense embeddings by capturing the structure and relationships within this heterogeneous graph.
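As a rough illustration of the graph described above, the sketch below stores interactions as typed (source, relation, target) edges. The `Edge` type, the `user:`/`tweet:` ID prefixes, and the relation names are assumptions made for this example; they are not taken from the TwHIN codebase.

```python
from collections import defaultdict
from typing import NamedTuple


class Edge(NamedTuple):
    source: str    # hypothetical ID format, e.g. "user:alice"
    relation: str  # engagement type, e.g. "follows" or "favorites"
    target: str    # e.g. "user:bob" or "tweet:101"


# Toy interaction log; every ID and relation name here is illustrative.
EDGES = [
    Edge("user:alice", "follows", "user:bob"),
    Edge("user:alice", "favorites", "tweet:101"),
    Edge("user:bob", "retweets", "tweet:101"),
    Edge("user:bob", "replies", "tweet:102"),
]


def entity_type(node_id: str) -> str:
    """Return the entity type encoded in the ID prefix ("user" or "tweet")."""
    return node_id.split(":", 1)[0]


def group_by_relation(edges):
    """Bucket (source, target) pairs by relation type."""
    grouped = defaultdict(list)
    for edge in edges:
        grouped[edge.relation].append((edge.source, edge.target))
    return dict(grouped)


def edge_signature(edge: Edge):
    """Describe an edge by the entity types it connects."""
    return (entity_type(edge.source), edge.relation, entity_type(edge.target))


by_relation = group_by_relation(EDGES)
signatures = sorted({edge_signature(e) for e in EDGES})
print(signatures)
```

The graph is heterogeneous because the signatures differ: some edges connect two users, others connect a user to a tweet, and each relation type is kept distinct.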

Dense Embeddings

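Unlike SimClusters’ sparse community vectors, TwHIN produces dense embeddings. For contrast, a SimClusters vector looks like this:

User A: [0, 0, 0.8, 0, 0.2, 0, 0, ...] (145K dims, mostly zeros)

A TwHIN embedding for the same user has a non-zero value in nearly every dimension.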

Dense embeddings can capture more nuanced relationships but require more computation and storage than sparse embeddings.
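The computational difference can be sketched with two dot-product routines. This is a toy comparison: the sparse vectors are stored as index-to-weight dictionaries, the dense vectors as plain lists, and the four-dimensional dense vectors are placeholders rather than real TwHIN dimensions.

```python
def sparse_dot(a: dict, b: dict) -> float:
    """Dot product of two sparse vectors stored as {index: weight}.

    Only indices present in both vectors contribute, so the cost scales
    with the number of non-zero entries, not the full dimensionality.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


def dense_dot(a: list, b: list) -> float:
    """Dot product of two dense vectors; every dimension is visited."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


# Sparse: a handful of non-zero entries out of a very large index space.
sparse_a = {2: 0.8, 4: 0.2}
sparse_b = {2: 0.5, 7: 0.5}

# Dense: every position holds a value (toy 4-dimensional vectors).
dense_a = [0.5, -0.5, 0.25, 1.0]
dense_b = [1.0, 2.0, 4.0, 0.5]

print(sparse_dot(sparse_a, sparse_b))
print(dense_dot(dense_a, dense_b))
```

The sparse routine touches only the shared non-zero index, while the dense routine multiplies every pair of coordinates.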

Key Characteristics

Dense Representations

All dimensions have non-zero values, capturing rich latent features from the graph structure.

Heterogeneous Learning

Learns from multiple entity types (users, tweets) and relationship types (follows, favorites, etc.) simultaneously.

Knowledge Graph Approach

Treats X as a knowledge graph where embeddings preserve structural and semantic relationships.
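This page does not say which scoring function TwHIN uses. As one illustration of how a knowledge graph embedding can preserve relationships, the sketch below assumes a translation-style score, a common choice in which a relation vector is added to the source embedding and the result is compared with the target. The function name and all vectors are hypothetical.

```python
def translation_score(source, relation, target):
    """Score an edge as the dot product of (source + relation) with target.

    Assumption for illustration: a translation-style scoring function.
    A higher score means the edge looks more plausible to the model.
    """
    return sum((s + r) * t for s, r, t in zip(source, relation, target))


# Toy 2-dimensional embeddings; real embeddings would be learned.
user = [1.0, 0.0]
follows = [0.0, 1.0]          # one vector per relation type
followed_account = [1.0, 1.0]
random_account = [-1.0, 0.0]  # stands in for a negative sample

positive = translation_score(user, follows, followed_account)
negative = translation_score(user, follows, random_account)
print(positive, negative)
```

Training under this assumption would adjust the vectors so that observed edges score higher than randomly sampled ones, as the toy values here already do.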

Complementary to SimClusters

Works alongside SimClusters, so the system has both sparse, interpretable representations and dense, expressive ones.

Where It’s Used

TwHIN embeddings are used across X’s recommendation systems:

Tweet Recommendations

Powers candidate generation and ranking in the For You timeline

User Recommendations

Suggests accounts to follow based on embedding similarity
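A minimal sketch of follow suggestions by embedding similarity, assuming cosine similarity as the measure and a brute-force scan over a tiny in-memory table. The `suggest_accounts` helper and the account names are hypothetical; a production system would likely use an approximate nearest-neighbor index instead.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def suggest_accounts(query_id, embeddings, already_following, k=2):
    """Return the k accounts most similar to query_id, skipping known follows."""
    query = embeddings[query_id]
    scored = [
        (cosine(query, vector), account_id)
        for account_id, vector in embeddings.items()
        if account_id != query_id and account_id not in already_following
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [account_id for _, account_id in scored[:k]]


# Toy 2-dimensional embeddings keyed by hypothetical account names.
EMBEDDINGS = {
    "alice": [1.0, 0.0],
    "bob": [0.9, 0.1],
    "carol": [0.0, 1.0],
    "dave": [0.7, 0.7],
    "erin": [-1.0, 0.0],
}

suggestions = suggest_accounts("alice", EMBEDDINGS, already_following={"bob"})
print(suggestions)
```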

Content Understanding

Represents semantic meaning of tweets and user interests

Architecture Integration

Representation Manager

TwHIN embeddings are served via the Representation Manager service, which:

Stores pre-computed embeddings
Provides fast retrieval APIs
Handles both SimClusters and TwHIN embeddings
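The responsibilities listed above can be pictured with a small in-memory stand-in. The `EmbeddingStore` class, its method names, and the `"twhin"`/`"simclusters"` type keys are hypothetical and do not reflect the actual Representation Manager API.

```python
from typing import Any, Dict, Iterable, Optional, Tuple


class EmbeddingStore:
    """Hypothetical in-memory stand-in for an embedding lookup service."""

    def __init__(self) -> None:
        # Keyed by (embedding type, entity ID) so one store serves both kinds.
        self._vectors: Dict[Tuple[str, str], Any] = {}

    def put(self, embedding_type: str, entity_id: str, vector: Any) -> None:
        """Store a pre-computed embedding."""
        self._vectors[(embedding_type, entity_id)] = vector

    def get(self, embedding_type: str, entity_id: str) -> Optional[Any]:
        """Fetch one embedding, or None if it is not stored."""
        return self._vectors.get((embedding_type, entity_id))

    def get_many(self, embedding_type: str, entity_ids: Iterable[str]) -> Dict[str, Any]:
        """Batch lookup that silently skips unknown entities."""
        found = {}
        for entity_id in entity_ids:
            vector = self.get(embedding_type, entity_id)
            if vector is not None:
                found[entity_id] = vector
        return found


store = EmbeddingStore()
store.put("twhin", "user:alice", [0.5, -0.25, 0.75])      # dense list
store.put("simclusters", "user:alice", {2: 0.8, 4: 0.2})  # sparse dict
store.put("twhin", "tweet:101", [0.1, 0.9, -0.3])

batch = store.get_many("twhin", ["user:alice", "tweet:101", "user:unknown"])
print(sorted(batch))
```

Keying on the embedding type lets the same entity carry a sparse and a dense representation side by side.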

Representation Scorer

The Representation Scorer uses TwHIN embeddings to:

Compute similarity scores between entities
Rank candidates based on embedding distance
Combine with other signals for final recommendations
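The three steps above can be sketched as follows, assuming cosine similarity as the embedding score and a simple weighted blend with one extra signal. The `rank_candidates` helper, the 0.7 weight, and the candidate data are illustrative assumptions, not the Representation Scorer's actual logic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(user_vector, candidates, embedding_weight=0.7):
    """Rank candidates by a blend of embedding similarity and another signal.

    candidates: iterable of (candidate_id, embedding, other_signal), where
    other_signal is a hypothetical score in [0, 1] from another source.
    """
    scored = []
    for candidate_id, embedding, other_signal in candidates:
        similarity = cosine(user_vector, embedding)
        blended = embedding_weight * similarity + (1 - embedding_weight) * other_signal
        scored.append((blended, candidate_id))
    scored.sort(reverse=True)  # best blended score first
    return [(candidate_id, round(score, 3)) for score, candidate_id in scored]


user_vector = [1.0, 0.0]
candidates = [
    ("tweet:1", [1.0, 0.0], 0.0),  # very similar, no extra signal
    ("tweet:2", [0.0, 1.0], 1.0),  # dissimilar, strong extra signal
    ("tweet:3", [1.0, 1.0], 1.0),  # moderately similar, strong extra signal
]

ranking = rank_candidates(user_vector, candidates)
print(ranking)
```

In this toy data the moderately similar candidate with a strong secondary signal ranks above the closest embedding match, which shows why similarity is combined with other signals rather than used alone.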

Comparison with SimClusters

Aspect	SimClusters	TwHIN
Vector Type	Sparse (145K dims, ~5-10 non-zero)	Dense (all dims non-zero)
Interpretability	High (community-based)	Lower (latent features)
Computation	Fast (sparse operations)	Slower (dense operations)
Expressiveness	Good for community patterns	Better for nuanced relationships
Use Case	Community-based recommendations	Semantic similarity matching

X uses both SimClusters and TwHIN embeddings in the recommendation pipeline, leveraging the strengths of each approach.

Training and Updates

The TwHIN project README in the algorithm-ml repository covers:

Model architecture
Training procedures
Update frequency
Performance characteristics

Related Components

SimClusters - Sparse community-based embeddings
Representation Manager - Embedding storage and retrieval service (representation-manager/)
Representation Scorer - Similarity scoring using embeddings (representation-scorer/)
