Documentation Index
Fetch the complete documentation index at: https://mintlify.com/nubskr/walrus/llms.txt
Use this file to discover all available pages before exploring further.
Introduction
Distributed Walrus transforms the single-node Walrus storage engine into a fault-tolerant, distributed streaming log system. It combines Walrus’s high-performance write-ahead logging with Raft consensus to provide automatic load balancing, fault tolerance, and seamless horizontal scaling.
What is Distributed Walrus?
Distributed Walrus is a distributed log streaming engine built on top of the Walrus storage engine. It provides:
- Automatic load balancing via segment-based leadership rotation
- Fault tolerance through Raft consensus (3+ nodes minimum)
- Simple client protocol - connect to any node, auto-forwarding handles routing
- Sealed segments for historical reads from any replica
- Consensus on metadata only - data path remains fast and direct
Unlike standalone Walrus, which runs on a single machine, Distributed Walrus coordinates multiple nodes to provide redundancy and automatic failover.
Architecture Overview
High-Level System Design
At a high level, the cluster:
- Routes writes to the current segment leader
- Serves reads from the appropriate node (leader for active segment, original leader for sealed segments)
- Rotates leadership across nodes as segments fill up
Core Components
Each node in the cluster contains four key components:
Node Controller
Routes client requests to appropriate segment leaders, manages write leases, tracks logical offsets, and forwards operations to remote leaders when needed.
Raft Engine (Octopii)
Maintains consensus for metadata changes only (not data). Handles leader election and log replication. Syncs metadata across all nodes via AppendEntries RPCs.
Cluster Metadata
Raft state machine storing topic → segment → leader mappings, sealed segments with entry counts, and node addresses for routing. Replicated identically across all nodes.
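As a rough illustration of what that replicated state holds, the sketch below models topics, segments, and node addresses as plain data structures. All type and field names here are assumptions for explanation, not the actual internal types.

```python
# Illustrative sketch of the Raft-replicated cluster metadata.
# Names and fields are assumptions for explanation only.
from dataclasses import dataclass, field

@dataclass
class Segment:
    segment_id: int
    leader_node: str          # node that owns writes for this segment
    sealed: bool = False
    entry_count: int = 0      # final count once the segment is sealed

@dataclass
class ClusterMetadata:
    # topic -> ordered list of segments (the last one is the active segment)
    topics: dict[str, list[Segment]] = field(default_factory=dict)
    # node id -> address, used for routing and forwarding
    node_addresses: dict[str, str] = field(default_factory=dict)

    def active_segment(self, topic: str) -> Segment:
        return self.topics[topic][-1]
```

Because this state is replicated identically on every node, any node can answer "who leads the active segment for topic X?" locally before deciding whether to forward a request.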
Bucket Storage
Wraps Walrus engine with lease-based write fencing. Only accepts writes if node holds lease for that segment. Stores actual data in Walrus WAL files on disk.
Key Differences from Standalone Walrus
| Aspect | Standalone Walrus | Distributed Walrus |
|---|---|---|
| Deployment | Single process/machine | 3+ node cluster |
| Fault Tolerance | Process restart only | Automatic failover |
| Coordination | None required | Raft consensus for metadata |
| Load Distribution | All on one node | Automatic via segment rotation |
| Network Ports | Storage only | Client (:9091-9093) + Raft (:6001-6003) |
| Write Path | Direct to storage | Leader-based with forwarding |
| Read Path | Local only | Can read from any node with the data |
| Segment Management | Manual or time-based | Automatic rollover and leader rotation |
How Data Flows Through the System
Write Path
1. Client connects to any node (Node 2)
2. Node 2 checks metadata to determine the leader (Node 1)
3. Node 2 forwards the write to Node 1 via RPC
4. Node 1 verifies it holds the write lease
5. Node 1 writes to local Walrus storage
6. Response flows back to the client
If the client had connected directly to Node 1 (the leader), steps 2-3 would be skipped for lower latency.
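A condensed sketch of that decision on the receiving node is shown below. It builds on the metadata sketch above; the function, the lease set, and the injected helpers are illustrative assumptions, not the real routing code.

```python
# Sketch of the write path on the node that receives a PUT.
class NotLeaderError(Exception):
    """Raised when a write reaches a node that does not hold the lease."""

def handle_put(node_id, meta, leases, topic, payload, forward_rpc, walrus_append):
    """Route a PUT: forward to the segment leader, or write locally if leased."""
    segment = meta.active_segment(topic)
    if segment.leader_node != node_id:
        # Steps 2-3: not the leader, forward over internal RPC.
        return forward_rpc(meta.node_addresses[segment.leader_node], topic, payload)
    if segment.segment_id not in leases:
        # Lease-based fencing: metadata says we lead, but the lease has
        # not been granted (or was revoked) by the sync loop.
        raise NotLeaderError(topic)
    # Steps 4-5: leader path, append to the local Walrus WAL.
    return walrus_append(topic, segment.segment_id, payload)
```

Note that only the routing decision consults replicated metadata; the append itself goes straight to local storage, which is how the data path stays fast while consensus covers metadata only.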
Read Path with Cursor Management
The system maintains a shared cursor per topic that automatically advances across segments as they’re sealed.
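One way to picture that cursor is a (segment index, offset) pair that rolls over once a sealed segment is exhausted. The sketch below assumes segment objects like the metadata sketch earlier; the advance rule and field names are illustrative assumptions.

```python
# Illustrative per-topic cursor that walks across segments.
from dataclasses import dataclass

@dataclass
class TopicCursor:
    segment_index: int = 0   # which segment in the topic's segment list
    offset: int = 0          # next entry to read within that segment

def advance(cursor: TopicCursor, segments) -> None:
    """Move the cursor forward, hopping onto the next segment once a sealed one is exhausted."""
    seg = segments[cursor.segment_index]
    cursor.offset += 1
    if seg.sealed and cursor.offset >= seg.entry_count:
        # Sealed segments have a final entry count, so exhaustion is unambiguous.
        cursor.segment_index += 1
        cursor.offset = 0
```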
Metadata Replication via Raft
All cluster state changes (topic creation, segment rollovers, node additions) go through Raft consensus.
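As a hedged illustration, one such change, sealing a segment, can be modeled as a small command that Raft replicates and every node then applies deterministically to its copy of the metadata. The command shape below is an assumption, building on the earlier metadata sketch.

```python
# Sketch: a metadata command replicated through Raft, then applied
# identically on every node after it commits. Shape is an assumption.
from dataclasses import dataclass

@dataclass
class SealSegment:
    topic: str
    segment_id: int
    final_entry_count: int
    next_leader: str   # leader for the newly created active segment

def apply_seal(meta, cmd: SealSegment, next_segment_id: int) -> None:
    """Deterministic state-machine transition run once the entry is committed."""
    for seg in meta.topics[cmd.topic]:
        if seg.segment_id == cmd.segment_id:
            seg.sealed = True
            seg.entry_count = cmd.final_entry_count
    # Open a fresh active segment owned by the next leader.
    meta.topics[cmd.topic].append(Segment(next_segment_id, cmd.next_leader))
```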
Segment-Based Partitioning
Topics are divided into segments, with each segment having a single leader node responsible for writes:
- Active segment: Current segment accepting writes, owned by the leader node
- Rollover trigger: Monitor detects segment has exceeded threshold (default: 1M entries)
- Seal operation: Raft commits metadata change to seal segment with final entry count
- Leadership transfer: New segment created with next node as leader (round-robin)
- Sealed segment: Read-only, served by original leader node
Sealed segments never move between nodes. The original leader retains the data for reads, eliminating data migration overhead.
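A sketch of how the rollover decision might be made follows. The 1M-entry threshold comes from the list above; the monitor function, the round-robin helper, and the `propose` hook into Raft are illustrative assumptions that reuse the `SealSegment` sketch from the previous section.

```python
# Illustrative rollover check: seal the active segment once it passes the
# threshold and hand the next segment to the next node round-robin.
SEGMENT_ENTRY_THRESHOLD = 1_000_000  # default rollover trigger (1M entries)

def next_leader_round_robin(nodes: list[str], current_leader: str) -> str:
    """Pick the node after the current leader, wrapping around."""
    i = nodes.index(current_leader)
    return nodes[(i + 1) % len(nodes)]

def maybe_rollover(meta, topic, current_entry_count, nodes, propose) -> None:
    """If the active segment exceeds the threshold, propose a seal via Raft."""
    seg = meta.active_segment(topic)
    if current_entry_count >= SEGMENT_ENTRY_THRESHOLD:
        propose(SealSegment(
            topic=topic,
            segment_id=seg.segment_id,
            final_entry_count=current_entry_count,
            next_leader=next_leader_round_robin(nodes, seg.leader_node),
        ))
```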
Lease-Based Write Fencing
Write safety is enforced through a lease system:
- Only the leader node for a segment can write to it
- Leases are derived from Raft-replicated metadata
- 100ms sync loop ensures lease consistency across nodes
- Prevents split-brain writes during leadership changes
During a leadership change:
- Old leader: Next sync removes the lease → writes fail with `NotLeaderError`
- New leader: Next sync grants the lease → starts accepting writes
- Maximum inconsistency window: 100ms
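A minimal sketch of that sync loop, assuming leases are simply rederived from the replicated metadata every interval; the loop structure and names are assumptions, and it reuses the lease set consulted by the write-path sketch above.

```python
import time

def lease_sync_loop(node_id: str, meta, leases: set, stop) -> None:
    """Every ~100ms, rederive this node's leases from Raft-replicated metadata.

    Granting and revoking both happen here, which is why the window of
    inconsistency after a leadership change is bounded by the sync interval.
    """
    while not stop():
        current = {
            seg.segment_id
            for segments in meta.topics.values()
            for seg in segments
            if not seg.sealed and seg.leader_node == node_id
        }
        leases.clear()
        leases.update(current)   # revoked leases make writes fail with NotLeaderError
        time.sleep(0.1)          # 100ms sync interval
```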
Network Topology
Port Layout
- Client ports (`:9091-9093`): TCP connections for `REGISTER`, `PUT`, `GET`, `STATE`, `METRICS`
- Raft ports (`:6001-6003`): Internal RPC for metadata sync and node-to-node forwarding
Communication Patterns
- Client ↔ Node: Length-prefixed text protocol over TCP
- Node ↔ Node (Data): Internal RPC for forwarding writes/reads to segment leaders
- Node ↔ Node (Metadata): Raft consensus via AppendEntries and RequestVote RPCs
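The exact wire framing is not spelled out on this page, so the sketch below assumes a 4-byte big-endian length prefix around a plain-text command; treat both the framing and the command syntax as assumptions and consult the Client Protocol page for the real format.

```python
import socket
import struct

def send_command(host: str, port: int, command: str) -> str:
    """Send one length-prefixed text command and read one framed reply.

    The 4-byte big-endian length prefix is an assumption for illustration;
    see the Client Protocol docs for the actual framing.
    """
    with socket.create_connection((host, port)) as sock:
        payload = command.encode()
        sock.sendall(struct.pack(">I", len(payload)) + payload)
        (resp_len,) = struct.unpack(">I", _read_exact(sock, 4))
        return _read_exact(sock, resp_len).decode()

def _read_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-frame")
        buf += chunk
    return buf
```

Any node listening on the client ports can accept the connection; forwarding to the current segment leader happens server-side.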
Monitoring and Observability
STATE Command
Query topic metadata to understand segment distribution:
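Reusing the hypothetical `send_command` helper sketched above, a quick check against any client port might look like the following; the command names come from the command list earlier on this page, while the argument and response formats are not assumed here.

```python
# Hedged example: inspect cluster state from any node's client port.
# Assumes the send_command() sketch defined earlier.
print(send_command("127.0.0.1", 9091, "STATE"))    # topic/segment/leader layout
print(send_command("127.0.0.1", 9092, "METRICS"))  # Raft cluster health
```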
METRICS Command
Inspect Raft cluster health.
Next Steps
Deployment Guide
Learn how to deploy a 3-node cluster with Docker or manually
Client Protocol
Understand the TCP protocol and available commands
Segment Management
Deep dive into segment rollover and lease mechanics
Failure Recovery
Handle node failures and recovery procedures