Walrus is designed for high-performance streaming workloads. This guide covers optimization strategies, performance characteristics, and tuning parameters to maximize throughput and minimize latency.
## Performance Overview

### Benchmark Results

Walrus delivers strong performance compared to similar systems:

#### Without Fsync (Maximum Throughput)
| System | Avg Throughput (writes/s) | Avg Bandwidth (MB/s) | Max Throughput (writes/s) | Max Bandwidth (MB/s) |
|---|---|---|---|---|
| Walrus | 1,205,762 | 876.22 | 1,593,984 | 1,158.62 |
| Kafka | 1,112,120 | 808.33 | 1,424,073 | 1,035.74 |
| RocksDB | 432,821 | 314.53 | 1,000,000 | 726.53 |
#### With Fsync (Durability Enabled)
| System | Avg Throughput (writes/s) | Avg Bandwidth (MB/s) | Max Throughput (writes/s) | Max Bandwidth (MB/s) |
|---|---|---|---|---|
| RocksDB | 5,222 | 3.79 | 10,486 | 7.63 |
| Walrus | 4,980 | 3.60 | 11,389 | 8.19 |
| Kafka | 4,921 | 3.57 | 11,224 | 8.34 |
Benchmarks compare a single Kafka broker (no replication) and RocksDB's WAL against Walrus using `pwrite()` syscalls. With io_uring batching enabled, Walrus can achieve significantly higher throughput.

## io_uring: The Performance Multiplier
The most significant performance optimization in Walrus is io_uring support on Linux. io_uring provides:

- **Batched I/O submission**: submit multiple operations in a single syscall
- **Kernel-level parallelism**: concurrent disk operations
- **Reduced context switches**: fewer transitions between user and kernel space
- **Zero-copy operations**: direct memory access for reads/writes
### Enabling io_uring

io_uring is enabled by default on Linux when using the FD (file descriptor) backend.

### Disabling io_uring (Not Recommended)

Only disable io_uring if you encounter compatibility issues.

### io_uring Requirements

- **OS**: Linux kernel 5.1+ (5.6+ recommended)
- **Libraries**: `liburing` installed
- **Architecture**: x86_64, ARM64, or other supported platforms
## Batch Operations

Batch operations are the key to high throughput. They allow Walrus to amortize overhead across multiple entries.

### Batch Writes
- Batch write: 1 io_uring submission for 1,000 entries
- Single writes: 1,000 separate syscalls
- Speedup: 3-10x depending on entry size
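The speedup from batching can be sketched with a back-of-envelope cost model. The per-syscall and per-entry costs below are illustrative assumptions, not measured Walrus figures:

```python
# Back-of-envelope model: amortizing per-syscall overhead across a batch.
# Cost constants are illustrative assumptions, not measured Walrus numbers.

SYSCALL_OVERHEAD_US = 1.5   # assumed cost of one user<->kernel transition
WRITE_COST_US = 0.5         # assumed per-entry copy/serialization cost

def total_cost_us(entries: int, batched: bool) -> float:
    """Cost of writing `entries` entries, batched vs. one syscall each."""
    syscalls = 1 if batched else entries  # 1 io_uring submission vs. N pwrite() calls
    return syscalls * SYSCALL_OVERHEAD_US + entries * WRITE_COST_US

single = total_cost_us(1000, batched=False)  # 1,000 separate syscalls
batch = total_cost_us(1000, batched=True)    # 1 io_uring submission
print(f"single: {single:.0f}us, batched: {batch:.0f}us, "
      f"speedup: {single / batch:.1f}x")
```

With these assumed constants the model lands at roughly 4x, inside the 3-10x range quoted above; real speedups depend on entry size and disk latency.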
### Batch Reads

- Set `max_bytes` based on your processing buffer size
- At least 1 entry is always returned if available
- Maximum of 2,000 entries per call (configurable limit)
### Batch Size Limits

**Maximum entries per batch operation (read or write):** 2,000.

Why 2,000? The default io_uring submission queue size is 2,047 entries. Staying below this ensures all operations fit in a single submission.

**Maximum total payload size per batch write:** 10GB.

Why 10GB? Large enough for most use cases while keeping memory usage bounded during the planning phase.
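If your application accumulates more than a batch's worth of entries, it has to split them before submitting. A hypothetical helper (not part of the Walrus API) that respects both limits might look like this:

```python
# Hypothetical helper (NOT a Walrus API): split a large list of entries into
# sub-batches that respect both the entry-count and total-payload limits.

MAX_ENTRIES_PER_BATCH = 2_000     # stays under the 2,047-slot io_uring SQ
MAX_BATCH_BYTES = 10 * 1024**3    # 10GB payload cap per batch write

def chunk_batch(entries, max_entries=MAX_ENTRIES_PER_BATCH,
                max_bytes=MAX_BATCH_BYTES):
    """Yield lists of byte-string entries, each list within both limits."""
    chunk, chunk_bytes = [], 0
    for entry in entries:
        # Start a new chunk if adding this entry would exceed either limit.
        if chunk and (len(chunk) >= max_entries
                      or chunk_bytes + len(entry) > max_bytes):
            yield chunk
            chunk, chunk_bytes = [], 0
        chunk.append(entry)
        chunk_bytes += len(entry)
    if chunk:
        yield chunk

# 5,000 small entries split into 3 sub-batches: 2,000 + 2,000 + 1,000.
chunks = list(chunk_batch([b"x" * 100] * 5000))
print([len(c) for c in chunks])  # [2000, 2000, 1000]
```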
## Segment Size Tuning

Segment size affects rollover frequency and load distribution across nodes.

### Default Configuration
- Segment size: ~1GB
- Rollover time at 10k writes/sec: ~100 seconds
- Leadership rotates every segment
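The ~100 second figure can be reproduced with a quick calculation; the ~1KB average entry size is an assumed workload parameter, not a Walrus default:

```python
# Time to fill one segment = segment size / (write rate * avg entry size).
# The 1KB average entry size is a workload assumption for illustration.

def rollover_seconds(segment_bytes: float, writes_per_sec: float,
                     entry_bytes: float) -> float:
    return segment_bytes / (writes_per_sec * entry_bytes)

t = rollover_seconds(segment_bytes=1024**3,   # ~1GB default segment
                     writes_per_sec=10_000,
                     entry_bytes=1024)        # assumed ~1KB entries
print(f"~{t:.0f}s between rollovers")         # matches the ~100 second figure
```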
### Tuning Guidelines

#### High Write Rate (>100k writes/sec)

Increase segment size to reduce rollover overhead.

Benefits:
- Fewer Raft consensus operations
- Less frequent leadership rotation
- Reduced metadata updates

Tradeoffs:
- Longer time to distribute load to new nodes
- Larger disk space per segment
#### Dynamic Load Balancing

Decrease segment size for faster load distribution.

Benefits:
- More frequent leadership rotation
- Faster load distribution across nodes
- Quicker adaptation to cluster changes

Tradeoffs:
- More rollover overhead
- More frequent Raft consensus operations
#### Large Entry Sizes (>10KB per entry)

Decrease the per-segment entry limit to keep segment sizes reasonable.

Rule of thumb: target 1-10GB per segment regardless of entry count.
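Applying the rule of thumb is simple arithmetic; the entry sizes below are hypothetical workloads:

```python
# Pick a per-segment entry limit so segments land in the 1-10GB target range.
# Entry sizes here are hypothetical workload examples.

def entries_for_target(target_segment_bytes: int, avg_entry_bytes: int) -> int:
    """Entry-count limit that yields roughly the target segment size."""
    return target_segment_bytes // avg_entry_bytes

# With 64KB entries, ~16k entries gives a ~1GB segment:
print(entries_for_target(1024**3, 64 * 1024))   # 16384
# With 1MB entries, only ~1k entries fit in the same budget:
print(entries_for_target(1024**3, 1024**2))     # 1024
```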
## Monitor Interval Tuning

### Fsync Scheduling

Fsync scheduling controls the durability vs. throughput tradeoff.

#### Strategies
- High Throughput (Default)
- Maximum Throughput
- Maximum Durability
- Balanced (Recommended)

The Balanced strategy, for example:

- Batches fsyncs across multiple writes
- Good balance of durability and throughput
- Risk: up to 200ms of data loss on a crash
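The worst-case loss window is everything written since the last fsync, which is easy to quantify; the write rate below is a hypothetical figure:

```python
# Worst-case data loss for batched fsync: all entries written since the last
# fsync can be lost on a crash. The 50k writes/sec rate is hypothetical.

def max_entries_at_risk(fsync_interval_ms: float,
                        writes_per_sec: float) -> float:
    """Upper bound on unsynced entries at any instant."""
    return writes_per_sec * (fsync_interval_ms / 1000.0)

print(max_entries_at_risk(200, 50_000))  # up to 10,000 unsynced entries
```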
## Read Consistency Tuning

Read consistency affects how often read cursors are persisted.

### StrictlyAtOnce (Default)
- Every read checkpoint is immediately persisted
- Guarantees exactly-once delivery after restart
- Higher durability, moderate read throughput
### AtLeastOnce (Higher Throughput)
- Persist cursor every N reads
- After restart, may replay up to N entries
- Lower durability, higher read throughput
### Tuning `persist_every`
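A minimal model of the replay behavior (not the Walrus API) makes the tradeoff concrete: the cursor is flushed every `persist_every` reads, so a crash can replay at most `persist_every` entries.

```python
# Minimal model of AtLeastOnce cursor persistence (NOT the Walrus API).
# The cursor is persisted every `persist_every` reads; on restart, reads
# since the last persisted cursor are re-delivered.

def replayed_after_crash(reads_before_crash: int, persist_every: int) -> int:
    """Entries re-delivered on restart: reads since the last cursor flush."""
    return reads_before_crash % persist_every

print(replayed_after_crash(1_234, persist_every=100))  # 34 entries replayed
print(replayed_after_crash(1_200, persist_every=100))  # 0: crash right after a flush
```

Larger `persist_every` values mean fewer cursor writes (higher read throughput) but a larger worst-case replay window.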
## Storage Backend Selection

### FD Backend (Default, Recommended)

Pros:

- io_uring support on Linux
- Better batch operation performance
- `O_SYNC` support for `SyncEach` mode

Cons:

- Unix-only (not available on Windows)
### Mmap Backend

Pros:

- Cross-platform (works on Windows)
- Direct memory access for reads
- No file descriptor limits

Cons:

- No io_uring support
- Lower batch operation performance
- More complex memory management
## Distributed Cluster Tuning

### Lease Synchronization

The lease synchronization loop runs every 100ms by default. This interval is hardcoded but strikes a good balance:

- Fast enough for quick leader transitions
- Low overhead (~0.1% CPU per node)

If you need faster leader transitions, you'll need to modify the source code in `controller/mod.rs`. However, values below 50ms may increase CPU usage noticeably.

### Raft Configuration
While most Raft parameters are configured through Octopii (the Raft implementation), the key tuning points are:

- **Heartbeat interval**: how often the leader sends heartbeats to followers. Lower values detect failures faster but increase network traffic.
- **Election timeout**: how long a follower waits before starting an election. Higher values reduce false elections; lower values speed up failure recovery.
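These two knobs interact. A common rule of thumb (general Raft guidance, not specific to Octopii) is to keep the election timeout several heartbeat intervals long, so a single delayed heartbeat doesn't trigger a spurious election:

```python
# General Raft rule of thumb (not Octopii-specific): election timeout should
# be at least several heartbeat intervals, so one lost or delayed heartbeat
# doesn't force an unnecessary leader election.

def timings_ok(heartbeat_ms: float, election_timeout_ms: float,
               factor: float = 5.0) -> bool:
    """True if the election timeout leaves enough heartbeat headroom."""
    return election_timeout_ms >= factor * heartbeat_ms

print(timings_ok(100, 1000))  # True: 10x headroom
print(timings_ok(100, 200))   # False: one delayed heartbeat risks an election
```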
## System-Level Optimizations

### Linux I/O Scheduler

- **NVMe drives**: `none` (direct I/O)
- **SSD**: `mq-deadline` or `bfq`
- **HDD**: `deadline` (HDDs are not recommended for Walrus)
### File System Tuning

- **XFS**: best for large sequential writes
- **ext4**: good all-around performance
- **ZFS**: advanced features, higher overhead
### CPU Governor
## Performance Profiling

### Measuring Throughput
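A minimal, library-agnostic harness for measuring writes/s and MB/s might look like this; `write_entry` is a placeholder for whatever client call you are benchmarking, not a Walrus API:

```python
# Library-agnostic throughput harness sketch. `write_entry` is a stand-in
# callback for the client call under test; it is NOT a Walrus API.
import time

def measure_throughput(write_entry, num_entries: int, entry_size: int = 1024):
    """Return (writes/s, MB/s) for `num_entries` writes of `entry_size` bytes."""
    payload = b"x" * entry_size
    start = time.perf_counter()
    for _ in range(num_entries):
        write_entry(payload)
    elapsed = time.perf_counter() - start
    return (num_entries / elapsed,
            num_entries * entry_size / elapsed / 1e6)

# Example run against an in-memory sink (measures harness overhead only):
sink = []
writes_per_sec, mb_per_sec = measure_throughput(sink.append, 100_000)
print(f"{writes_per_sec:,.0f} writes/s, {mb_per_sec:.1f} MB/s")
```

For meaningful numbers, run against the real write path with production-like entry sizes, and report both average and max as in the benchmark tables above.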
### Profiling with perf

Hot spots to look for:

- `io_uring` submission/completion
- Memory allocation
- Serialization/deserialization
## Performance Checklist
## Next Steps

- **Monitoring**: monitor performance metrics in production
- **Troubleshooting**: diagnose performance issues