Architecture Overview
A multi-worker setup consists of:
- CAS Server: Stores build artifacts (Content Addressable Storage) and the action cache
- Scheduler: Assigns jobs to workers based on platform properties and availability
- Multiple Workers: Execute build actions in parallel
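The scheduler's role of matching jobs to workers by platform properties and availability can be illustrated with a minimal Python sketch. This is a conceptual model only, not the scheduler's actual algorithm; the `Worker` class, the `pick_worker` function, and the property names `OSFamily` and `cpu_arch` are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Worker:
    # Hypothetical model of a registered worker.
    name: str
    properties: dict[str, str]
    busy: bool = False


def pick_worker(workers: list[Worker], required: dict[str, str]) -> Worker | None:
    """Return the first idle worker whose properties satisfy every requirement."""
    for worker in workers:
        if worker.busy:
            continue  # availability check
        if all(worker.properties.get(key) == value for key, value in required.items()):
            return worker  # platform property check passed
    return None


workers = [
    Worker("worker-1", {"OSFamily": "linux", "cpu_arch": "x86_64"}, busy=True),
    Worker("worker-2", {"OSFamily": "linux", "cpu_arch": "x86_64"}),
    Worker("worker-3", {"OSFamily": "linux", "cpu_arch": "arm64"}),
]

# worker-1 matches but is busy, so the job goes to the next matching idle worker.
chosen = pick_worker(workers, {"OSFamily": "linux", "cpu_arch": "x86_64"})
print(chosen.name if chosen else "no worker available")
```

A real scheduler would also queue jobs that cannot be placed immediately; this sketch simply returns `None` in that case.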
Critical Requirement: All workers MUST share the same CAS storage path. Using isolated storage paths will cause “Object not found” errors when workers try to access artifacts stored by other workers.
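To see why isolated storage leads to this error, consider a toy in-memory model of content-addressable storage. This is a sketch under simplifying assumptions: the `ContentAddressableStore` class and the `ObjectNotFound` exception are hypothetical stand-ins, and a real CAS stores blobs on disk or in a remote service rather than in a dictionary.

```python
import hashlib


class ObjectNotFound(Exception):
    """Raised when a digest is not present in the store (hypothetical name)."""


class ContentAddressableStore:
    """Toy in-memory CAS: blobs are keyed by the SHA-256 of their content."""

    def __init__(self):
        self._blobs = {}  # maps hex digest -> bytes

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest: str) -> bytes:
        if digest not in self._blobs:
            raise ObjectNotFound(f"Object not found: {digest}")
        return self._blobs[digest]


# Shared storage: worker A and worker B use the same store.
shared = ContentAddressableStore()
digest = shared.put(b"compiled object file")  # worker A stores an output
shared_read = shared.get(digest)              # worker B can read it back

# Isolated storage: worker B has its own, empty store.
isolated_b = ContentAddressableStore()
try:
    isolated_b.get(digest)
    found = True
except ObjectNotFound as err:
    found = False
    print(err)
```

The digest produced by worker A is valid everywhere, but only a store that actually holds the blob can resolve it.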
Complete Configuration
CAS Server Configuration
Scheduler Configuration
Worker Configuration
Key Concepts
GRPC Store
Workers and schedulers connect to the remote CAS server using GRPC stores:
Environment Variables: Use ${CAS_ENDPOINT} and ${SCHEDULER_ENDPOINT} to make configurations portable across environments. Set these when starting services.
Fast-Slow Store with Remote Backend
Workers use a local cache with remote fallback:
- Read: Check local cache → Fetch from remote CAS → Cache locally
- Write: Write directly to remote CAS (skip local cache)
- Result: Warm local cache for reads, avoid storage waste from one-off writes
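The read and write paths described above can be sketched in Python as follows. This is an illustrative model, not the store's actual implementation: the `FastSlowStore` class here is a hypothetical stand-in, plain dictionaries play the role of the local cache and the remote CAS, and the digest `"abc123"` is a made-up key.

```python
class FastSlowStore:
    """Toy model: a private local cache in front of a shared remote CAS."""

    def __init__(self, remote: dict):
        self.local = {}       # fast store, private to this worker
        self.remote = remote  # slow store, shared by all workers

    def read(self, digest: str) -> bytes:
        if digest in self.local:
            return self.local[digest]  # local cache hit
        data = self.remote[digest]     # fall back to remote; KeyError if absent
        self.local[digest] = data      # cache locally for later reads
        return data

    def write(self, digest: str, data: bytes) -> None:
        # Writes go straight to the remote CAS and skip the local cache.
        self.remote[digest] = data


remote_cas = {}
worker_a = FastSlowStore(remote_cas)
worker_b = FastSlowStore(remote_cas)

worker_a.write("abc123", b"artifact")
written_locally = "abc123" in worker_a.local  # False: the write skipped the cache

first_read = worker_b.read("abc123")          # fetched from the remote CAS
cached_after_read = "abc123" in worker_b.local  # True: now warm for later reads
```

Because both workers share `remote_cas`, an artifact written by one worker is readable by the other, while each worker's local cache only fills with what it actually reads.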
Platform Property Queries
Workers can dynamically determine platform properties:
Docker Compose Deployment
docker-compose.yml
Shared Volume: The cas-data volume is mounted by both the CAS server and all workers. This ensures workers can access artifacts via hardlinks when possible, improving performance.
Starting the Multi-Worker Setup
Testing the Setup
Bazel Build
Verify Distribution
Common Issues and Solutions
“Object not found” Errors
Symptom:
Workers Not Receiving Jobs
Check Scheduler Connection:
High CAS Server Load
Symptom: The CAS server becomes a bottleneck.
Solution: Add local worker caches.
Scaling Considerations
Horizontal Scaling
Resource Limits
Network Optimization
For distributed workers across machines:
Production Deployment
For production multi-worker setups:
- Use persistent storage: Replace Docker volumes with NFS, S3, or a distributed filesystem
- Monitor worker health: Implement health checks and auto-restart
- Load balancing: Use multiple scheduler replicas for high availability
- Authentication: Add mTLS or token-based auth for worker registration
- Metrics: Export Prometheus metrics for monitoring
Example: S3 Shared Storage
Replace filesystem CAS with S3 for true distributed storage:
See Also
- Basic CAS Configuration - Single-worker setup
- S3 Backend - Cloud storage for distributed workers
- Kubernetes Configuration - Production Kubernetes deployment
- Store Types - Complete store configuration reference