
Overview

The recommendation system is a two-stage pipeline: a lightweight retrieval model narrows the full catalogue to 500 candidates, then a ranking model orders those candidates using rich per-user feature vectors. The full pipeline must complete within a 150 ms budget per request. Features are served from Redis (online, low-latency) and updated from Kafka (near-real-time event stream).
Architecture note: The two-tower neural network was chosen over simpler alternatives specifically for cold-start handling and catalogue size. At 3 million+ content items, brute-force similarity search is not viable at inference time. See ADR-006 for the full decision record.

Behavioural Signal Collection

All engagement events are produced by the Engagement Service to the user.engagement.events Kafka topic. The Event Collector service consumes this topic and writes enriched events to the Feature Store’s offline store (Parquet on object storage) and increments online feature counters in Redis.
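As a minimal sketch of what the Event Collector does per record, the snippet below updates online counters keyed the way a Redis online store might be. The event field names (`type`, `content_id`, `user_id`) and key layout are assumptions; the real service would use Redis `HINCRBY` and also append the enriched event to the Parquet offline store.

```python
from collections import defaultdict

# In-memory stand-in for the Redis online store (illustrative only).
online_counters = defaultdict(int)

def handle_engagement_event(event: dict) -> None:
    """Consume one user.engagement.events record and bump online counters."""
    cid = event["content_id"]
    if event["type"] == "media_started":
        online_counters[f"content:{cid}:play_count_24h"] += 1
    elif event["type"] == "media_completed":
        online_counters[f"content:{cid}:completions_24h"] += 1
```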
| Event | Content Types | Signal Weight |
| --- | --- | --- |
| media_completed (>90% consumed) | Video + Audio | Strongest positive |
| media_replay | Video + Audio | Strong positive |
| media_liked | Video + Audio | Strong positive |
| media_shared | Video + Audio | Strong positive |
| media_added_to_playlist | Video + Audio | Strong affinity |
| media_started (any consumption) | Video + Audio | Weak positive |
| media_paused_early (under 30% consumed) | Video + Audio | Mild negative |
| media_disliked | Video + Audio | Moderate negative |
| media_skipped (auto-play skipped) | Video + Audio | Moderate negative |
| subscription_to_creator | Video + Audio | Strong affinity |
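When these signals are turned into training labels, each event type maps to a numeric weight whose ordering follows the table above. The specific values below are hypothetical; only the relative ordering is taken from the table.

```python
# Illustrative event-type -> training-weight mapping (values are hypothetical;
# ordering follows the signal-weight table: completed > liked/shared > started,
# skipped/disliked negative).
SIGNAL_WEIGHTS = {
    "media_completed": 1.0,
    "media_replay": 0.8,
    "media_liked": 0.8,
    "media_shared": 0.8,
    "media_added_to_playlist": 0.7,
    "subscription_to_creator": 0.7,
    "media_started": 0.1,
    "media_paused_early": -0.2,
    "media_disliked": -0.5,
    "media_skipped": -0.5,
}

def label_weight(event_type: str) -> float:
    """Return the training weight for an engagement event (0.0 if unknown)."""
    return SIGNAL_WEIGHTS.get(event_type, 0.0)
```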

Feature Store Design

The Feature Store (Feast or Tecton) maintains two feature classes.

Online features (Redis — low latency, served at inference):

| Feature Group | Features |
| --- | --- |
| User | recent_media_7d, top_categories_7d, avg_session_duration, preferred_language, preferred_content_type (VIDEO/AUDIO/MIXED), subscriber_count |
| Content | play_count_24h, like_ratio, completion_rate, category, duration, content_type, creator_subscriber_count, publish_age_hours |
| Trending | trending_score = views_1h × recency_weight × engagement_multiplier — refreshed every 15 minutes |
Offline features (Parquet on object storage — used for training):

| Feature Group | Features |
| --- | --- |
| User-Content Interaction | Historical watch sequences, long-term category affinity, creator affinity scores |
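The trending_score formula above leaves recency_weight and engagement_multiplier unspecified. A minimal sketch, assuming an exponential decay on publish age and an engagement multiplier built from like_ratio and completion_rate (both choices are illustrative, not the production definitions):

```python
import math

def trending_score(views_1h: int, publish_age_hours: float,
                   like_ratio: float, completion_rate: float,
                   half_life_hours: float = 6.0) -> float:
    """trending_score = views_1h * recency_weight * engagement_multiplier.

    recency_weight halves every `half_life_hours`; engagement_multiplier
    scales around 1.0 so zero engagement leaves the raw view count intact.
    """
    recency_weight = math.exp(-publish_age_hours * math.log(2) / half_life_hours)
    engagement_multiplier = 1.0 + like_ratio * completion_rate
    return views_1h * recency_weight * engagement_multiplier
```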

Model Architecture

Stage 1: Retrieval

A two-tower neural network (user embedding tower + content embedding tower) trained offline retrieves the top 500 candidate items from the full catalogue.
  • User embedding computed online at inference time (sub-10 ms via Feature Store lookup)
  • Content embeddings pre-computed offline and indexed in an ANN index (FAISS or ScaNN)
  • content_type is a feature in the content tower — a single retrieval model handles video and audio without separate models
  • New content enters the ANN index 10–30 minutes after publish (embedding computation lag)
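Conceptually, retrieval is a top-k inner-product search of the user embedding against the content embedding table. The pure-Python brute force below illustrates the operation; it is exactly what the ANN index (FAISS or ScaNN) approximates, because exhaustive search over 3 million+ items would blow the latency budget.

```python
def retrieve_top_k(user_emb, content_embs, k=500):
    """Brute-force inner-product retrieval: a stand-in for the ANN index.

    Returns the indices of the k content embeddings with the highest
    dot-product score against the user embedding, best first.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    scored = sorted(enumerate(content_embs),
                    key=lambda item: dot(user_emb, item[1]),
                    reverse=True)
    return [idx for idx, _ in scored[:k]]
```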

Stage 2: Ranking

A deep learning ranking model (DLRM or similar) scores the 500 retrieval candidates against the user’s feature vector.
  • Applies a preferred_content_type signal: users who predominantly consume audio see audio ranked higher
  • A configurable diversity parameter modulates cross-type discovery — surfacing audio to video-first users and vice versa
  • Total inference budget for both stages: < 100 ms
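One way the preferred_content_type signal and the diversity parameter can interact is a post-scoring re-rank: boost items matching the user's preferred type, and let the diversity knob shrink that boost so cross-type items can surface. This is a hypothetical sketch of that interaction, not the production DLRM behaviour.

```python
def rerank_with_diversity(candidates, preferred_type, diversity=0.2):
    """Re-rank scored candidates, boosting the user's preferred content type.

    `diversity` in [0, 1]: at 0 the preferred type gets the full boost;
    at 1 the boost vanishes and items rank purely by model score, so
    off-type content surfaces freely. Candidates are dicts with
    `score` and `content_type` keys (an illustrative schema).
    """
    boost = 1.0 + (1.0 - diversity) * 0.5  # boost shrinks as diversity grows
    def adjusted(c):
        return c["score"] * (boost if c["content_type"] == preferred_type else 1.0)
    return sorted(candidates, key=adjusted, reverse=True)
```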

Inference Pipeline Flow

1. Home feed request: Client calls GET /api/v1/content/feed. Content Service receives the request and calls the Model Server (Triton / TorchServe) via gRPC.

2. Online feature lookup: Model Server fetches the user’s online feature vector from the Feature Store (Redis). Feature lookup target: < 5 ms.

3. Retrieval (ANN search): The user embedding is computed, then an ANN search across the pre-indexed content embedding space returns the top 500 candidate IDs. Target: < 10 ms.

4. Ranking: The DLRM model scores all 500 candidates using the combined user + content feature vectors and returns a ranked list. Target: < 80 ms.

5. Feed response: Content Service fetches metadata for the top N ranked content IDs from the Postgres/Redis cache and returns the personalised feed to the client.
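The flow above can be sketched as one function with the service calls injected as callables. All the parameter names here are illustrative placeholders for the Feature Store, Model Server, and metadata-cache interfaces, not the real service APIs.

```python
def get_feed(user_id, *, fetch_features, embed_user, ann_search,
             rank, fetch_metadata, n=20):
    """End-to-end sketch of the five-step inference flow."""
    features = fetch_features(user_id)       # step 2: online feature lookup (< 5 ms)
    user_emb = embed_user(features)          # step 3: user-tower embedding
    candidates = ann_search(user_emb, 500)   # step 3: ANN search (< 10 ms)
    ranked = rank(features, candidates)      # step 4: DLRM scoring (< 80 ms)
    return fetch_metadata(ranked[:n])        # step 5: hydrate top N and respond
```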

Kafka Topics Used

| Topic | Direction | Purpose |
| --- | --- | --- |
| user.engagement.events | Consumed | Behavioural signals for feature updates and training data |
| media.published | Consumed | New content added to the embedding index pipeline |

Model Deployment and A/B Testing

1. Training: Offline training runs on a Spark + Ray cluster on a daily or weekly schedule. Training ingests the offline feature store (Parquet). Models are versioned in MLflow.

2. Canary deployment: The new model version is deployed to a canary pod set receiving 5% of traffic, and is evaluated against the incumbent on CTR, completion rate, and session depth metrics for 48 hours.

3. Significance gate: Statistical significance thresholds (p < 0.05, minimum detectable effect) must be met before promotion. The gate is automated by the A/B testing framework.
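For a metric like CTR, the significance gate amounts to a two-proportion test between canary and incumbent. The sketch below shows the kind of check the A/B framework automates; the real gate also enforces the minimum detectable effect and required sample size, which are omitted here.

```python
import math

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided two-proportion z-test (e.g. canary CTR vs incumbent CTR).

    Returns the p-value; promotion requires p < 0.05 among other gates.
    """
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    p_pool = (clicks_a + clicks_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # two-sided p-value from the standard normal CDF
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```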
4. Bias audit: A monthly bias audit runs on model outputs. The model must not exhibit statistically significant content or creator bias by demographic or language group. Audit failure blocks promotion.

5. Full promotion: The model is promoted to 100% traffic. The previous version is retained for rollback.

Failure Handling

| Failure | Behaviour |
| --- | --- |
| Model Server unavailable | Content Service falls back to the trending content list. Graceful degradation per Principle 11. |
| Feature Store unavailable | Model Server degrades to cached features (last known values). Recommendation quality degrades but requests succeed. |
| ANN index stale | New content published in the last 10–30 minutes is not yet in the index. This is expected behaviour: new content appears in trending and notification feeds before it enters personalised recommendations. |
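The Model Server fallback in the first row can be expressed as a simple guard in the Content Service. The injected `recommend` and `trending` callables below are illustrative placeholders for the real gRPC call and trending-list lookup.

```python
def personalised_or_trending(user_id, *, recommend, trending):
    """Graceful-degradation sketch (Principle 11): serve the trending
    list instead of failing when the Model Server call raises."""
    try:
        return recommend(user_id), "personalised"
    except Exception:
        return trending(), "trending_fallback"
```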
