Navi: High-Performance ML Serving

Navi is a high-performance, versatile machine learning serving server implemented in Rust and tailored for production usage at scale. It’s designed to efficiently serve models within X’s tech stack, offering top-notch performance while focusing on core features.

Overview

Navi serves as X’s primary ML model serving infrastructure, handling real-time inference requests across the recommendation pipeline. Built with a minimalist design philosophy, it prioritizes ultra-high performance, stability, and availability for production workloads.

Key Features

Production-Optimized

Minimalist design delivering ultra-high performance, stability, and availability for real-world application demands

TensorFlow Compatible

gRPC API compatibility with TensorFlow Serving for seamless integration with existing clients

Multi-Runtime Support

Pluggable architecture supporting TensorFlow and ONNX Runtime, with experimental PyTorch support

Rust Performance

Built in Rust for maximum performance and memory safety in production environments

Architecture

Navi’s plugin architecture enables support for different ML runtimes while maintaining a consistent serving interface:
Client Request (gRPC)
         |
         v
   Navi Server
         |
    +---------+---------+
    |         |         |
    v         v         v
TensorFlow  ONNX     PyTorch
 Runtime   Runtime   Runtime
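The pluggable design above can be sketched as a runtime trait behind a common serving interface. This is an illustration only: the trait, struct, and method names below are hypothetical and not Navi's actual internals.

```rust
/// A model runtime that can answer inference requests.
/// (Hypothetical trait, for illustration of the plugin architecture.)
trait Runtime {
    fn name(&self) -> &'static str;
    /// Run inference on a batch of flattened float features.
    fn predict(&self, inputs: &[f32]) -> Vec<f32>;
}

/// Stub TensorFlow-style runtime: scales inputs by 0.5 as a stand-in
/// for real model inference.
struct TensorFlowRuntime;

impl Runtime for TensorFlowRuntime {
    fn name(&self) -> &'static str {
        "tensorflow"
    }
    fn predict(&self, inputs: &[f32]) -> Vec<f32> {
        inputs.iter().map(|x| x * 0.5).collect()
    }
}

/// The server dispatches to whichever runtime was configured at startup,
/// keeping the serving interface identical across runtimes.
struct Server {
    runtime: Box<dyn Runtime>,
}

impl Server {
    fn handle(&self, inputs: &[f32]) -> Vec<f32> {
        self.runtime.predict(inputs)
    }
}

fn main() {
    let server = Server {
        runtime: Box::new(TensorFlowRuntime),
    };
    let scores = server.handle(&[1.0, 2.0]);
    println!("{} -> {:?}", server.runtime.name(), scores);
}
```

Swapping `TensorFlowRuntime` for an ONNX- or PyTorch-backed implementation would leave `Server::handle` unchanged, which is the point of the plugin design.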

Supported Runtimes

TensorFlow

Most Feature-Complete: Navi for TensorFlow is production-ready with full support for multiple input tensors of different types.
Supported Input Types:
  • Float tensors
  • Integer tensors
  • String tensors
  • Multiple input tensors per request
Use Cases:
  • Heavy ranker models
  • Multi-task learning models
  • Feature-rich ranking models
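One way to picture "multiple input tensors of different types" in a single request is a map from tensor name to a typed tensor value. The type and field names here are hypothetical sketches, not Navi's real API.

```rust
use std::collections::HashMap;

/// A single named input tensor; each variant mirrors one of the
/// supported input types listed above. (Illustrative, not Navi's types.)
#[derive(Debug, PartialEq)]
enum Tensor {
    Float(Vec<f32>),
    Int(Vec<i64>),
    Str(Vec<String>),
}

/// One request may carry several named input tensors of different types,
/// as in a feature-rich ranking model.
struct PredictRequest {
    inputs: HashMap<String, Tensor>,
}

impl PredictRequest {
    fn new() -> Self {
        Self { inputs: HashMap::new() }
    }
    /// Builder-style helper for attaching an input tensor by name.
    fn with_input(mut self, name: &str, t: Tensor) -> Self {
        self.inputs.insert(name.to_string(), t);
        self
    }
}

fn main() {
    // Hypothetical feature names for a ranking model.
    let req = PredictRequest::new()
        .with_input("user_embedding", Tensor::Float(vec![0.1, 0.2]))
        .with_input("item_id", Tensor::Int(vec![42]))
        .with_input("lang", Tensor::Str(vec!["en".into()]));
    assert_eq!(req.inputs.len(), 3);
}
```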

ONNX Runtime

Current Capabilities:
  • Primary support: Single input tensor of type string
  • Used in X’s home recommendation pipeline
  • Proprietary BatchPredictionRequest format
Use Cases:
  • Home timeline ranking
  • Optimized inference for ONNX-exported models

PyTorch

PyTorch support is experimental and not yet production-ready in terms of performance and stability.

Directory Structure

The Navi codebase is organized into several key components:
  • dr_transform: X-specific converter that transforms BatchPredictionRequest Thrift into the ndarray format used for model inference
  • X-specific configuration specifying how to retrieve feature values from a BatchPredictionRequest
  • Generated Thrift code for the BatchPredictionRequest protocol

Running Navi

Step 1: Create Model Directory Structure

Set up the models directory with versioned subdirectories using epoch timestamps:
mkdir -p models/web_click/1679693908377
mkdir -p models/web_click/1679693908400
The structure should look like:
models/
  └── web_click/
      ├── 1679693908377/
      └── 1679693908400/
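Because version directories are named by epoch-millisecond timestamps, a server can select the newest model by taking the numerically largest directory name. A small illustrative helper (hypothetical, not part of Navi):

```rust
/// Given version directory names, return the newest one, i.e. the
/// largest epoch-millisecond timestamp. Non-numeric entries are ignored.
/// (Illustrative helper, not a real Navi function.)
fn latest_version(dirs: &[&str]) -> Option<u64> {
    dirs.iter()
        .filter_map(|d| d.parse::<u64>().ok()) // skip non-numeric names
        .max()
}

fn main() {
    let versions = ["1679693908377", "1679693908400"];
    assert_eq!(latest_version(&versions), Some(1679693908400));
}
```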
Step 2: Run TensorFlow Serving

Execute the TensorFlow runtime script:
cd navi/navi
./scripts/run_tf2.sh
Step 3: Run ONNX Serving

Execute the ONNX runtime script:
cd navi/navi
./scripts/run_onnx.sh

Building from Source

cd navi/navi
cargo build --release --features tensorflow

Integration with X’s Recommendation Pipeline

Navi plays a critical role in X’s recommendation infrastructure:
  1. Home Timeline: Serves ONNX models for rapid candidate scoring
  2. Heavy Ranking: Provides TensorFlow model inference for detailed ranking
  3. Push Notifications: Powers real-time scoring for notification candidates

BatchPredictionRequest Format

For ONNX runtime, Navi uses a proprietary BatchPredictionRequest format:
// Example structure (simplified)
struct BatchPredictionRequest {
    // Dense features stored in segmented format
    dense_features: Vec<f32>,
    // Sparse features with indices
    sparse_features: HashMap<i64, f32>,
    // Feature configuration
    feature_config: FeatureConfig,
}
The dr_transform component converts this Thrift-based format into ndarray tensors suitable for model inference.
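The kind of conversion dr_transform performs can be sketched in miniature: turning a flat dense-feature buffer into per-example rows ready for inference. The real code targets the ndarray crate; this std-only sketch, with a hypothetical `to_rows` helper, only illustrates the shape logic.

```rust
/// Split a flat dense-feature buffer into `batch` rows of `feat_dim`
/// values each, analogous to building a [batch, feat_dim] ndarray.
/// Returns None if the buffer length does not match batch * feat_dim.
/// (Hypothetical helper; not dr_transform's actual API.)
fn to_rows(dense: &[f32], batch: usize, feat_dim: usize) -> Option<Vec<Vec<f32>>> {
    if dense.len() != batch * feat_dim {
        return None; // malformed request: refuse rather than mis-shape
    }
    Some(dense.chunks(feat_dim).map(|c| c.to_vec()).collect())
}

fn main() {
    // Two examples with three dense features each.
    let dense = vec![1.0, 2.0, 3.0, 4.0, 5.0, 6.0];
    let rows = to_rows(&dense, 2, 3).unwrap();
    assert_eq!(rows, vec![vec![1.0, 2.0, 3.0], vec![4.0, 5.0, 6.0]]);
}
```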

Performance Characteristics

Low Latency

Optimized for sub-millisecond inference latency at p99

High Throughput

Handles thousands of requests per second per instance

Memory Efficient

Rust’s zero-cost abstractions minimize memory overhead

Production Stable

Battle-tested in X’s production environment

API Compatibility

Navi implements the TensorFlow Serving gRPC API, making it compatible with existing TensorFlow Serving clients:
service PredictionService {
  rpc Predict(PredictRequest) returns (PredictResponse);
  rpc GetModelMetadata(GetModelMetadataRequest) returns (GetModelMetadataResponse);
}
This allows for drop-in replacement of TensorFlow Serving with Navi for improved performance.

Learn More

Ranking Systems

Learn how Navi integrates with light and heavy rankers

Product Mixer

Explore the service framework that orchestrates ML serving