This guide covers performance optimization strategies for NativeLink deployments, from cache configuration to worker allocation.
Cache Optimization
Memory Cache Configuration
Memory caches provide the fastest access but are limited by available RAM.
Sizing recommendations
Recommended sizes for the content-addressable storage (CAS) and action cache (AC), in increasing order of deployment size:
- CAS: 5-10 GB, AC: 1-2 GB
- CAS: 20-50 GB, AC: 5-10 GB
- CAS: 100+ GB, AC: 20+ GB
Monitor nativelink_cache_size and eviction rates to right-size your cache.
Tiered Storage (FastSlow)
Combine a fast memory cache with slower persistent storage. A FastSlow store:
- Checks fast tier first on reads
- Promotes slow tier hits to fast tier
- Writes to both tiers simultaneously
- Assumes fast tier presence implies slow tier presence
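The behaviors listed above can be sketched with a toy two-tier store. This is a minimal illustration in Python, not NativeLink's implementation: the class name `FastSlowStore` and its methods are hypothetical, and plain dictionaries stand in for the memory and persistent tiers.

```python
class FastSlowStore:
    """Toy model of a two-tier store (hypothetical, for illustration only)."""

    def __init__(self):
        self.fast = {}  # stands in for the memory tier
        self.slow = {}  # stands in for the persistent tier

    def put(self, digest, data):
        # Writes go to both tiers.
        self.fast[digest] = data
        self.slow[digest] = data

    def get(self, digest):
        # Reads check the fast tier first.
        if digest in self.fast:
            return self.fast[digest]
        data = self.slow.get(digest)
        if data is not None:
            # A slow-tier hit is promoted so the next read is served from memory.
            self.fast[digest] = data
        return data

    def has(self, digest):
        # A fast-tier hit is trusted without consulting the slow tier,
        # because fast presence is assumed to imply slow presence.
        return digest in self.fast or digest in self.slow


store = FastSlowStore()
store.put("abc", b"payload")
del store.fast["abc"]        # simulate eviction from the memory tier
result = store.get("abc")    # served from the slow tier, then promoted
```

After the final `get`, the entry is back in the fast tier, so repeated reads of hot objects avoid the slower store.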
Deduplication Store
For workloads with many similar files (e.g., incremental builds), use a deduplication store.
When to use deduplication
Good fit:
- Incremental builds with mostly unchanged files
- Large binary artifacts with common sections
- Uncompressed content
Poor fit:
- Compressed or encrypted content
- Highly diverse files
- When upload/download isn’t the bottleneck
Trade-offs:
- CPU overhead for rolling hash computation
- Storage reduction: 30-70% for typical builds
- Network reduction: Similar to storage reduction
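To make the rolling-hash trade-off concrete, here is a minimal sketch of content-defined chunking with deduplicated chunk storage. It is a simplification under stated assumptions: the rolling hash is a plain windowed byte sum, and `split_chunks`, `dedup_store`, the window size, and the mask are all hypothetical choices, not NativeLink's algorithm or parameters.

```python
import hashlib


def split_chunks(data, window=16, mask=0x3F):
    """Cut data where the sum of the last `window` bytes has its low bits clear."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= window:
            rolling -= data[i - window]  # slide the window forward
        # Boundaries depend on content, not on absolute offsets.
        if i + 1 - start >= window and (rolling & mask) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def dedup_store(files):
    """Store each unique chunk once; keep a per-file list of chunk digests."""
    chunk_store, index = {}, {}
    for name, data in files.items():
        digests = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            chunk_store.setdefault(digest, chunk)
            digests.append(digest)
        index[name] = digests
    return chunk_store, index


payload = bytes(range(256)) * 8
chunk_store, index = dedup_store({"a.bin": payload, "b.bin": payload})
stored_bytes = sum(len(c) for c in chunk_store.values())
```

Two identical files share every chunk, so `stored_bytes` never exceeds the size of one copy. The per-byte loop is also where the CPU overhead comes from.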
Size Partitioning
Route small and large objects to different stores.
Compression
Reduce network transfer and storage at the cost of CPU.
Compression algorithm comparison
Faster, lower-ratio option:
- Compression ratio: 2-3x
- Speed: Very fast (500+ MB/s)
- CPU usage: Low
- Best for: Most use cases, hot path caches
Slower, higher-ratio option:
- Compression ratio: 3-5x
- Speed: Fast (200-400 MB/s)
- CPU usage: Medium
- Best for: Cold storage, WAN transfers
Enable compression when:
- Network bandwidth is limited
- Storage is expensive
- CPU capacity is available
Skip compression when:
- Content is already compressed (images, videos)
- CPU is constrained
- Local/datacenter networking with high bandwidth
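A rough way to judge whether compression pays off is to compare transfer time with and without it. The sketch below is a back-of-the-envelope model, assuming compression and transfer happen one after the other and ignoring decompression; `transfer_seconds` and the bandwidth figures are hypothetical, while the ratio and speed come from the ranges above.

```python
def transfer_seconds(size_mb, bandwidth_mb_s, ratio=1.0, compress_mb_s=None):
    """Estimate seconds to send size_mb, optionally compressing it first."""
    compress_time = size_mb / compress_mb_s if compress_mb_s else 0.0
    return compress_time + (size_mb / ratio) / bandwidth_mb_s


# 1000 MB object over a 50 MB/s WAN link (assumed bandwidth).
wan_raw = transfer_seconds(1000, 50)
wan_compressed = transfer_seconds(1000, 50, ratio=2.5, compress_mb_s=500)

# Same object over a 1000 MB/s datacenter link (assumed bandwidth).
lan_raw = transfer_seconds(1000, 1000)
lan_compressed = transfer_seconds(1000, 1000, ratio=2.5, compress_mb_s=500)
```

Under these assumptions the WAN transfer drops from about 20 s to about 10 s, while the datacenter transfer gets slower (about 1 s raw versus about 2.4 s compressed), which matches the guidance to skip compression on high-bandwidth local networks.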
Scheduler Optimization
Worker Allocation Strategy
least_recently_used (default)
Advantages:
- Balanced resource utilization
- Prevents worker overload
- Better for heterogeneous workloads
Trade-offs:
- Lower cache locality
- More cache misses on workers
most_recently_used
Advantages:
- Higher cache hit rate on workers
- Better for repeated builds
- Fewer cold starts
Trade-offs:
- Can create hot spots
- Some workers may be underutilized
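The difference between the two strategies comes down to which idle worker is picked. The following is an illustrative sketch only: the `Worker` class and `pick_worker` function are hypothetical and do not reflect the scheduler's internal data structures.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    last_used: float  # timestamp of the most recent assignment
    busy: bool = False


def pick_worker(workers, strategy="least_recently_used"):
    """Return the idle worker chosen by the given strategy, or None."""
    idle = [w for w in workers if not w.busy]
    if not idle:
        return None
    if strategy == "least_recently_used":
        # Spreads load: the worker that has waited longest goes next.
        return min(idle, key=lambda w: w.last_used)
    if strategy == "most_recently_used":
        # Favors locality: the worker with the warmest cache goes next.
        return max(idle, key=lambda w: w.last_used)
    raise ValueError(f"unknown strategy: {strategy}")


pool = [Worker("w1", 100.0), Worker("w2", 250.0), Worker("w3", 180.0)]
lru_choice = pick_worker(pool, "least_recently_used")   # w1
mru_choice = pick_worker(pool, "most_recently_used")    # w2
```

With most_recently_used, `w2` keeps receiving work as long as it is idle, which is how hot spots and underutilized workers arise.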
Timeout Configuration
worker_timeout_s (default: 5)
Lower values:
- Faster failure detection
- Quicker reallocation of stuck actions
- Risk: Network hiccups remove healthy workers
Higher values:
- Tolerates transient network issues
- Reduces worker churn
- Risk: Slow to detect truly dead workers
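The trade-off can be illustrated with a simple liveness check. This is a sketch under the assumption that a worker is dropped once it has been silent for longer than worker_timeout_s; the function name `is_worker_alive` and the 8-second gap are hypothetical.

```python
def is_worker_alive(last_seen_s, now_s, worker_timeout_s=5.0):
    """A worker counts as alive if it was heard from within the timeout."""
    return (now_s - last_seen_s) <= worker_timeout_s


# A healthy worker goes silent for 8 seconds during a network hiccup.
last_seen, now = 100.0, 108.0
kept_with_default = is_worker_alive(last_seen, now)                       # False
kept_with_30s = is_worker_alive(last_seen, now, worker_timeout_s=30.0)    # True
```

The same 8-second gap removes the worker at the 5-second default but is tolerated at 30 seconds, at the cost of waiting up to 30 seconds before a truly dead worker is noticed.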
client_action_timeout_s (default: 60)
Recommended values:
- 300s (5 min) for interactive builds
- 600s (10 min) for CI/CD
- Match your client’s expected update interval
max_action_executing_timeout_s (default: 0/disabled)
Enable when:
- Workers occasionally hang on specific actions
- You need a hard limit on execution time
- You want to enforce build time SLOs
Recommended values:
- 1800s (30 min) for typical builds
- 3600s (1 hour) for long-running tests
- 0 (disabled) if relying only on worker_timeout_s
retain_completed_for_s (default: 60)
Lower values:
- Less memory usage
- Risk: WaitExecution calls may miss results
Higher values:
- Better for slow clients
- More memory usage
- Useful for debugging
Retry Configuration
After an action has failed max_job_retries times, the scheduler returns the last error to the client instead of retrying indefinitely.
- 2-3 retries: Most deployments (default: 3)
- 0-1 retries: Flaky infrastructure, prefer failing fast
- 5+ retries: Very unreliable workers (investigate root cause instead)
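The bounded-retry behavior can be sketched as a loop that remembers the most recent failure. This is an illustration rather than the scheduler's code: `run_with_retries` is a hypothetical helper, and treating max_job_retries as the total number of attempts (with at least one attempt) is an assumption.

```python
def run_with_retries(action, max_job_retries=3):
    """Call action until it succeeds or the retry budget is exhausted."""
    attempts = max(1, max_job_retries)  # assumption: always try at least once
    last_error = None
    for _ in range(attempts):
        try:
            return action()
        except Exception as error:
            last_error = error  # keep only the most recent failure
    # Budget exhausted: surface the last error instead of retrying forever.
    raise last_error


state = {"calls": 0}


def flaky_action():
    """Fails twice, then succeeds (stand-in for a transient worker fault)."""
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError(f"failure {state['calls']}")
    return "ok"


outcome = run_with_retries(flaky_action, max_job_retries=3)
```

With a budget of 3 the flaky action succeeds on its third attempt. With a budget of 2 the caller would instead see the second failure.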
Worker Configuration
Concurrent Actions
Control how many actions a worker executes simultaneously.
Sizing guidelines
CPU-bound workloads:
- 1 action per CPU core
- Example: 8-core machine → max_concurrent_actions: 8
I/O-bound workloads:
- 2-4 actions per CPU core
- Example: 8-core machine → max_concurrent_actions: 16-32
Mixed workloads:
- Start with 1.5x CPU cores
- Monitor CPU and I/O wait
- Adjust based on utilization
Memory considerations:
- Calculate per-action memory: total_memory / max_concurrent_actions
- Ensure sufficient memory for largest expected action
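These guidelines can be combined into a starting value by taking the smaller of the CPU-based and memory-based limits. The sketch below is one possible heuristic: `suggest_max_concurrent_actions` and the workload labels are hypothetical, and the I/O-bound multiplier uses the low end (2x) of the 2-4x range.

```python
# Actions per core by workload type, taken from the guidelines above.
ACTIONS_PER_CORE = {"cpu_bound": 1.0, "mixed": 1.5, "io_bound": 2.0}


def suggest_max_concurrent_actions(cpu_cores, workload, total_memory_gb, largest_action_gb):
    """Suggest a starting max_concurrent_actions value for one worker."""
    by_cpu = int(cpu_cores * ACTIONS_PER_CORE[workload])
    # Every concurrent slot must be able to hold the largest expected action.
    by_memory = int(total_memory_gb // largest_action_gb)
    return max(1, min(by_cpu, by_memory))


cpu_bound = suggest_max_concurrent_actions(8, "cpu_bound", 64, 2)   # limited by cores
mixed = suggest_max_concurrent_actions(8, "mixed", 64, 2)           # 1.5x cores
io_bound = suggest_max_concurrent_actions(8, "io_bound", 16, 2)     # limited by memory
```

Treat the result as a starting point and adjust after watching CPU utilization and I/O wait.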
Platform Properties
Optimize worker matching.
Property type strategies
minimum:
- Worker must have at least the requested value
- Used for: cpu_count, memory_gb, disk_gb
- Example: Action requests cpu_count: 8, worker with 16 cores matches
exact:
- Worker must exactly match requested value
- Used for: os, cpu_arch, gpu_type
- Example: Action requests os: linux, only Linux workers match
priority:
- Informational only, doesn’t restrict matching
- Passed to worker but not enforced
- Future: May influence worker preference
ignore:
- Allows property in actions
- Doesn’t require workers to have it
- Used for optional capabilities
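The four property types can be expressed as a small matching function. This is a sketch of the rules as described above, not the scheduler's implementation: `worker_matches` is hypothetical, property values are assumed to be strings, and rejecting properties with no configured type is an assumption.

```python
def worker_matches(action_props, worker_props, property_types):
    """Return True if the worker satisfies every property the action requests."""
    for name, requested in action_props.items():
        kind = property_types.get(name)
        if kind is None:
            return False  # assumption: unconfigured properties never match
        if kind in ("priority", "ignore"):
            continue  # allowed on the action, never restricts matching
        if name not in worker_props:
            return False
        if kind == "minimum":
            if int(worker_props[name]) < int(requested):
                return False
        elif kind == "exact":
            if worker_props[name] != requested:
                return False
        else:
            raise ValueError(f"unknown property type: {kind}")
    return True


types = {"cpu_count": "minimum", "os": "exact", "pool": "priority", "gpu": "ignore"}
worker = {"cpu_count": "16", "os": "linux"}
action = {"cpu_count": "8", "os": "linux", "pool": "fast", "gpu": "optional"}
matched = worker_matches(action, worker, types)
```

Here the 16-core Linux worker satisfies the request for 8 cores, and the `pool` and `gpu` properties do not affect the result.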
Network Optimization
gRPC Connection Pooling
connections_per_endpoint
Fewer connections:
- Less memory overhead
- Fewer file descriptors
- May bottleneck on high throughput
More connections:
- Better throughput for concurrent requests
- More resource usage
- Diminishing returns beyond 10
rpc_timeout_s
Shorter timeouts:
- Fail fast on network issues
- Better for small objects
- May fail for large uploads/downloads
Longer timeouts:
- Tolerates slow networks
- Required for large objects
- Slower to detect hung connections
Recommended values:
- 5m for typical deployments
- 30m if transferring multi-GB objects
- Match to largest expected object transfer time
Retry Configuration
- max_retries: Number of retry attempts (exponential backoff)
- delay: Initial delay in seconds
- jitter: Random factor (0.0-1.0) to prevent thundering herd
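A minimal sketch of how these three parameters could combine into a per-attempt delay, assuming exponential backoff with symmetric jitter. The function name `backoff_delay` and the sample values are hypothetical.

```python
import random


def backoff_delay(attempt, delay=0.1, jitter=0.5, rng=random):
    """Delay in seconds before retry number `attempt` (0-based)."""
    # The base delay doubles each attempt; jitter scales it by a random
    # factor in [1 - jitter, 1 + jitter] so clients do not retry in lockstep.
    return delay * (2 ** attempt) * (1 + rng.uniform(-jitter, jitter))


max_retries = 4
schedule = [backoff_delay(attempt) for attempt in range(max_retries)]
no_jitter = [backoff_delay(attempt, jitter=0.0) for attempt in range(max_retries)]
```

Without jitter the delays are 0.1, 0.2, 0.4, and 0.8 seconds. With jitter of 0.5 each delay falls between half and one and a half times those values.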
Retry delay per attempt: delay * (2 ^ attempt) * (1 + random(-jitter, jitter))
Monitoring-Driven Optimization
Key Metrics to Track
- Cache Hit Rate
- Worker Utilization
- Queue Depth
- P95 Latency
Optimization Workflow
Identify bottleneck:
- High queue depth → Need more workers
- Low cache hit rate → Increase cache size or review keys
- High P95 latency → Use tiered storage or compression
- Low worker utilization → Reduce worker count or improve allocation
Make targeted change:
- Adjust configuration
- Monitor for 15-30 minutes
- Compare before/after metrics
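The bottleneck mapping above can be written as a small triage helper. This is only a sketch: the function `diagnose`, the metric names, and every threshold value are hypothetical placeholders that should be replaced with figures from your own baseline.

```python
# Hypothetical thresholds for illustration; tune them against your baseline.
HYPOTHETICAL_THRESHOLDS = {
    "queue_depth": 100,         # queued actions
    "cache_hit_rate": 0.8,      # fraction of lookups that hit
    "p95_latency_s": 1.0,       # seconds
    "worker_utilization": 0.3,  # fraction of worker capacity in use
}


def diagnose(metrics, thresholds=HYPOTHETICAL_THRESHOLDS):
    """Map observed metrics to the suggested actions from the workflow."""
    findings = []
    if metrics["queue_depth"] > thresholds["queue_depth"]:
        findings.append("High queue depth: add workers")
    if metrics["cache_hit_rate"] < thresholds["cache_hit_rate"]:
        findings.append("Low cache hit rate: increase cache size or review keys")
    if metrics["p95_latency_s"] > thresholds["p95_latency_s"]:
        findings.append("High P95 latency: use tiered storage or compression")
    if metrics["worker_utilization"] < thresholds["worker_utilization"]:
        findings.append("Low worker utilization: reduce workers or improve allocation")
    return findings


observed = {
    "queue_depth": 250,
    "cache_hit_rate": 0.92,
    "p95_latency_s": 0.4,
    "worker_utilization": 0.75,
}
suggestions = diagnose(observed)
```

Apply one suggestion at a time so the before/after comparison attributes any change to a single adjustment.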
Resource Limits
OpenTelemetry Collector
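Tuning guidelines
Larger profile (more memory, bigger batches, shorter batch timeout):
- limit_mib: 1024+
- send_batch_size: 2048
- timeout: 5s
Smaller profile (less memory, smaller batches, longer batch timeout):
- limit_mib: 256
- send_batch_size: 512
- timeout: 30s
Monitor otelcol_processor_refused_metric_points; it should be 0.
Prometheus Storage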
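Estimate storage as samples/sec * retention_seconds * 1-2 bytes/sample.
For 1000 series at a 15s interval retained for 30 days: about 67 samples/sec * 2,592,000 s ≈ 173 million samples, or roughly 170 MB at 1 byte/sample (up to about 350 MB at 2 bytes/sample).
Best Practices Summary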
Cache Configuration
- Use tiered storage (memory + disk) for best performance
- Size memory cache to 10-20% of working set
- Enable compression for remote stores
- Use deduplication for incremental builds
Scheduler Tuning
- Set worker_timeout_s to 30s for production
- Use most_recently_used allocation for CI/CD
- Configure max_action_executing_timeout_s to catch hung actions
- Keep max_job_retries at 2-3
Worker Optimization
- Match max_concurrent_actions to workload type
- Define precise platform properties
- Scale workers based on queue depth
- Monitor per-worker cache hit rates
Network Performance
- Use 5 connections per gRPC endpoint
- Set appropriate RPC timeouts for object sizes
- Configure retries with jitter
- Enable compression for WAN transfers