Overview
TwHIN (Twitter Heterogeneous Information Network) creates dense knowledge graph embeddings for users and tweets. Unlike SimClusters, which produces sparse community-based embeddings, TwHIN generates dense vector representations by learning from the heterogeneous interaction graph on X.
For detailed information about TwHIN, see the TwHIN project documentation in the algorithm-ml repository.
How It Works
Heterogeneous Graph Learning
TwHIN models X as a heterogeneous information network containing:
- Users: Account entities
- Tweets: Post content
- Interactions: Follows, favorites, retweets, replies, and other engagement types
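As a rough illustration of the structure listed above, a heterogeneous network can be held as typed edges between typed nodes. This is a minimal sketch only: the names `Node`, `Edge`, `HeteroGraph`, and the relation strings are hypothetical and are not taken from the TwHIN codebase.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import NamedTuple


class Node(NamedTuple):
    kind: str  # entity type, e.g. "user" or "tweet"
    id: int


class Edge(NamedTuple):
    source: Node
    relation: str  # interaction type, e.g. "follows", "favorites"
    target: Node


@dataclass
class HeteroGraph:
    """Hypothetical container: one edge list plus an index per relation type."""
    edges: list = field(default_factory=list)
    by_relation: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, source: Node, relation: str, target: Node) -> None:
        edge = Edge(source, relation, target)
        self.edges.append(edge)
        self.by_relation[relation].append(edge)

    def neighbors(self, node: Node, relation: str) -> list:
        # Targets reachable from `node` through edges of one relation type.
        return [e.target for e in self.by_relation[relation] if e.source == node]


graph = HeteroGraph()
alice, bob = Node("user", 1), Node("user", 2)
tweet = Node("tweet", 100)
graph.add(alice, "follows", bob)      # user -> user
graph.add(alice, "favorites", tweet)  # user -> tweet
graph.add(bob, "retweets", tweet)     # user -> tweet
```

Keeping the relation type on every edge is what makes the graph heterogeneous: the same pair of nodes can be connected by several different interaction types.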
Dense Embeddings
Unlike SimClusters’ sparse community vectors, TwHIN produces dense embeddings. Dense embeddings can capture more nuanced relationships, but they require more computation and storage than sparse embeddings.
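The storage and computation trade-off can be sketched with two toy representations. The values and the 4-dimensional dense vectors below are made up for illustration; the source does not state TwHIN's actual embedding dimension.

```python
# Sparse vector: only the non-zero entries are stored, keyed by dimension index.
sparse_user = {12: 0.8, 4071: 0.5, 98213: 0.3}
sparse_tweet = {12: 0.6, 98213: 0.9, 120044: 0.2}

# Dense vector: every dimension is stored (toy size, assumed for illustration).
dense_user = [0.12, -0.48, 0.33, 0.91]
dense_tweet = [0.10, -0.52, 0.20, 0.75]


def sparse_dot(a: dict, b: dict) -> float:
    """Dot product that only touches indices present in both vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(value * b[index] for index, value in a.items() if index in b)


def dense_dot(a: list, b: list) -> float:
    """Dot product that touches every dimension."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


sparse_score = sparse_dot(sparse_user, sparse_tweet)
dense_score = dense_dot(dense_user, dense_tweet)
```

The sparse product costs work proportional to the number of stored entries, while the dense product always costs work proportional to the full dimension.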
Key Characteristics
Dense Representations
All dimensions have non-zero values, capturing rich latent features from the graph structure.
Heterogeneous Learning
Learns from multiple entity types (users, tweets) and relationship types (follows, favorites, etc.) simultaneously.
Knowledge Graph Approach
Treats X as a knowledge graph where embeddings preserve structural and semantic relationships.
Complementary to SimClusters
Works alongside SimClusters to provide both sparse interpretable and dense expressive representations.
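This page does not describe how a relationship between two entities is scored. As one hedged illustration of how knowledge graph embeddings can encode typed relationships, the sketch below uses a translation-style score: a per-relation vector is added to the source embedding, and the result is compared with the target embedding by dot product. The function name, vectors, and values are all assumptions for illustration.

```python
def edge_score(source_vec, relation_vec, target_vec):
    """Translation-style score: dot(source + relation, target).

    A higher score means the (source, relation, target) triple is
    considered more plausible under this toy model.
    """
    translated = [s + r for s, r in zip(source_vec, relation_vec)]
    return sum(t * x for t, x in zip(translated, target_vec))


# Toy 3-dimensional embeddings (made-up values).
user = [0.2, -0.1, 0.4]
follows = [0.1, 0.0, -0.2]          # one vector per relation type
related_account = [0.3, -0.1, 0.2]
unrelated_account = [-0.3, 0.4, -0.1]

score_related = edge_score(user, follows, related_account)
score_unrelated = edge_score(user, follows, unrelated_account)
```

Because each relation type has its own vector, the same pair of entities can score differently for "follows" than for "favorites", which is one way multiple relationship types can be learned together.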
Where It’s Used
TwHIN embeddings are used across X’s recommendation systems:
Tweet Recommendations
Powers candidate generation and ranking in For You timeline
User Recommendations
Suggests accounts to follow based on embedding similarity
Content Understanding
Represents semantic meaning of tweets and user interests
Similar Content Discovery
Finds related tweets and accounts using dense vector similarity
Architecture Integration
Representation Manager
TwHIN embeddings are served via the Representation Manager service, which:
- Stores pre-computed embeddings
- Provides fast retrieval APIs
- Handles both SimClusters and TwHIN embeddings
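The responsibilities above can be pictured as a keyed lookup table. The following is a simplified in-memory sketch and not the actual Representation Manager API: the class `EmbeddingStore`, its methods, and the embedding type strings are hypothetical.

```python
class EmbeddingStore:
    """Hypothetical store of pre-computed embeddings, keyed by type and entity."""

    def __init__(self):
        self._table = {}

    def put(self, embedding_type, entity_id, vector):
        # Vectors are stored as given: a dict for sparse, a list for dense.
        self._table[(embedding_type, entity_id)] = vector

    def get(self, embedding_type, entity_id):
        # Returns None when no embedding has been stored for this key.
        return self._table.get((embedding_type, entity_id))

    def get_many(self, embedding_type, entity_ids):
        # Batch lookup that silently skips entities without an embedding.
        found = {}
        for entity_id in entity_ids:
            vector = self.get(embedding_type, entity_id)
            if vector is not None:
                found[entity_id] = vector
        return found


store = EmbeddingStore()
store.put("twhin", "user:1", [0.12, -0.48, 0.33, 0.91])       # dense
store.put("simclusters", "user:1", {12: 0.8, 4071: 0.5})       # sparse
batch = store.get_many("twhin", ["user:1", "user:2"])
```

Keying on the embedding type as well as the entity lets one service answer lookups for both SimClusters and TwHIN vectors for the same entity.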
Representation Scorer
The Representation Scorer uses TwHIN embeddings to:
- Compute similarity scores between entities
- Rank candidates based on embedding distance
- Combine with other signals for final recommendations
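The three steps above can be sketched as follows. This is an illustrative example under stated assumptions, not the Representation Scorer's implementation: cosine similarity is one common choice of similarity measure, and the linear blend, the `weight` parameter, and all names and values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two dense vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(user_vec, candidates, other_signals, weight=0.7):
    """Blend embedding similarity with another signal and sort best first.

    `weight` is the share given to embedding similarity (assumed value).
    """
    scored = []
    for candidate_id, vec in candidates.items():
        similarity = cosine_similarity(user_vec, vec)
        extra = other_signals.get(candidate_id, 0.0)
        scored.append((candidate_id, weight * similarity + (1 - weight) * extra))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


user_vec = [1.0, 0.0]
candidates = {
    "tweet_a": [1.0, 0.0],
    "tweet_b": [0.0, 1.0],
    "tweet_c": [1.0, 1.0],
}
other_signals = {"tweet_b": 1.0}  # e.g. a made-up popularity score
ranking = rank_candidates(user_vec, candidates, other_signals)
ranked_ids = [candidate_id for candidate_id, _ in ranking]
```

In this toy setup, a candidate with no embedding similarity can still earn a non-zero score from the other signal, which is the point of combining the two.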
Comparison with SimClusters
| Aspect | SimClusters | TwHIN |
|---|---|---|
| Vector Type | Sparse (145K dims, ~5-10 non-zero) | Dense (all dims non-zero) |
| Interpretability | High (community-based) | Lower (latent features) |
| Computation | Fast (sparse operations) | Slower (dense operations) |
| Expressiveness | Good for community patterns | Better for nuanced relationships |
| Use Case | Community-based recommendations | Semantic similarity matching |
Training and Updates
For information about the following topics, see the TwHIN project documentation in the algorithm-ml repository:
- Model architecture
- Training procedures
- Update frequency
- Performance characteristics
Related Components
- SimClusters - Sparse community-based embeddings
- Representation Manager - Embedding storage and retrieval service (representation-manager/)
- Representation Scorer - Similarity scoring using embeddings (representation-scorer/)