Workers are the execution engines of NativeLink that run build and test actions submitted by clients through the scheduler. They download inputs from CAS, execute commands in isolated environments, and upload outputs back to CAS.
Overview
Workers form a pool of computational resources that:
- Connect to the scheduler and advertise their capabilities
- Receive action assignments based on platform property matching
- Download input files from Content Addressable Storage (CAS)
- Execute commands in clean, isolated working directories
- Upload output artifacts back to CAS
- Report execution results to the scheduler
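The property-based assignment listed above can be illustrated with a minimal sketch. It assumes exact string matching on every requested property. The `Worker` class, the `find_worker` function, and the property names and values are hypothetical, chosen for illustration; they are not NativeLink's API, and the real scheduler may apply richer matching rules.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Worker:
    """Hypothetical worker record: an id plus advertised platform properties."""
    worker_id: str
    properties: dict = field(default_factory=dict)


def matches(worker: Worker, required: dict) -> bool:
    # Assumption: every property the action requests must be advertised
    # by the worker with exactly the same value.
    return all(worker.properties.get(k) == v for k, v in required.items())


def find_worker(pool: list, required: dict) -> Optional[Worker]:
    # Return the first worker in the pool that satisfies the action, if any.
    return next((w for w in pool if matches(w, required)), None)


pool = [
    Worker("w1", {"OSFamily": "linux", "cpu_arch": "x86_64"}),
    Worker("w2", {"OSFamily": "linux", "cpu_arch": "aarch64"}),
]
chosen = find_worker(pool, {"cpu_arch": "aarch64"})
```

An action that requests a property no worker advertises gets `None` here, which corresponds to the action staying unassigned.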
Worker Lifecycle
1. Connection and Registration
When a worker starts, it connects to the scheduler and registers itself, advertising its platform properties.
Platform properties must match the scheduler’s supported_platform_properties configuration for the worker to receive matching actions.
2. Receiving Actions
The scheduler assigns actions to workers via the bidirectional stream. Each assignment includes:
- operation_id: Unique identifier for this execution
- action_digest: Hash of the Action proto
- action_info: Expanded action details (command, inputs, timeout)
3. Action Execution
The Running Actions Manager handles concurrent action execution.
Precondition Checks
Before accepting an action, workers can run a precondition script. Typical checks:
- Verify sufficient disk space
- Check required tools are installed
- Confirm GPU availability
- Validate license server connectivity
The script's exit code determines the outcome:
- Exit 0: Accept the action
- Non-zero exit: Reject the action (worker signals backpressure)
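As an illustration of the exit-code contract, here is a minimal sketch of a disk-space precondition check. The threshold, the function names, and the choice of Python are assumptions made for this example; any executable that exits 0 or non-zero would serve the same role.

```python
import shutil

# Hypothetical threshold: require at least 1 GiB free before accepting work.
REQUIRED_FREE_BYTES = 1 << 30


def precondition_exit_code(free_bytes: int, required_bytes: int) -> int:
    # 0 accepts the action; any non-zero value rejects it.
    return 0 if free_bytes >= required_bytes else 1


def main(path: str = ".") -> int:
    # shutil.disk_usage reports total, used, and free bytes for the filesystem.
    free = shutil.disk_usage(path).free
    return precondition_exit_code(free, REQUIRED_FREE_BYTES)
```

Used as a standalone script, the file would end with `sys.exit(main())` so that the process exit code carries the decision.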
Working Directory Setup
Each action executes in a clean, isolated directory:
- Create temporary working directory (e.g., /tmp/nativelink/<operation_id>)
- Download input root from CAS
- Materialize directory tree structure
- Set environment variables
- Execute command
- Capture stdout/stderr and exit code
- Upload outputs to CAS
- Delete working directory
This ensures hermetic execution: actions cannot interfere with each other or be affected by previous executions.
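The create, materialize, execute, delete cycle can be sketched in a few lines. This is a simplified illustration, not NativeLink's implementation: the `run_in_clean_dir` helper is hypothetical, an in-memory dict stands in for the CAS download, and environment setup and output upload are omitted.

```python
import shutil
import subprocess
import sys
import tempfile
from pathlib import Path


def run_in_clean_dir(inputs: dict, argv: list):
    """Materialize inputs in a fresh directory, run argv there, then delete it."""
    workdir = Path(tempfile.mkdtemp(prefix="action-"))
    try:
        # Materialize the input tree (stand-in for downloading from CAS).
        for rel_path, content in inputs.items():
            target = workdir / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(content)
        # Execute the command with the working directory as cwd.
        proc = subprocess.run(argv, cwd=workdir, capture_output=True)
        return proc.returncode, proc.stdout, proc.stderr, workdir
    finally:
        # Always remove the directory so the next action starts clean.
        shutil.rmtree(workdir, ignore_errors=True)


code, out, err, used_dir = run_in_clean_dir(
    {"src/msg.txt": b"hello"},
    [sys.executable, "-c", "print(open('src/msg.txt').read())"],
)
```

Putting the cleanup in `finally` means the directory is removed even when the command fails, which is what keeps one action's files from leaking into the next.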
Command Execution
The worker executes the command specified in the Action:
- Spawns process with specified arguments
- Sets environment variables
- Captures stdout and stderr
- Enforces timeout
- Monitors for completion
Two timeouts apply:
- Action timeout: Specified in Action proto or worker default
- Upload timeout: Maximum time to upload outputs
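Timeout enforcement can be sketched with the Python standard library. The `run_with_timeout` helper is a hypothetical name, and this shows only the general technique (kill the child once the deadline passes), not how NativeLink itself implements it.

```python
import subprocess
import sys
from typing import Optional, Tuple


def run_with_timeout(argv: list, timeout_s: float) -> Tuple[bool, Optional[int]]:
    """Return (timed_out, exit_code); exit_code is None when the timeout fired."""
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
        return False, proc.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising.
        return True, None


fast = run_with_timeout([sys.executable, "-c", "pass"], timeout_s=30.0)
slow = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout_s=0.5
)
```

Here `fast` completes normally, while `slow` is terminated after half a second and reported as timed out.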
Output Collection
After successful execution:
- Identify output files/directories specified in Command
- Hash each output file (compute digest)
- Upload outputs to CAS
- Create ActionResult proto with output digests
- Upload stdout/stderr to CAS
- Report result to scheduler
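The hashing and upload steps above can be sketched as follows. The sketch assumes SHA-256 digests and uses a plain dict as a stand-in for CAS; the `collect_outputs` function and the (hash, size) tuple are illustrative simplifications rather than the actual ActionResult proto.

```python
import hashlib
import tempfile
from pathlib import Path


def collect_outputs(workdir: Path, output_paths: list, cas: dict) -> dict:
    """Upload each declared output to `cas` and return path -> (hash, size)."""
    result = {}
    for rel_path in output_paths:
        data = (workdir / rel_path).read_bytes()
        digest_hash = hashlib.sha256(data).hexdigest()
        cas[digest_hash] = data  # content-addressed: the key is the hash
        result[rel_path] = (digest_hash, len(data))
    return result


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    (workdir / "out.bin").write_bytes(b"artifact")
    cas = {}
    action_result = collect_outputs(workdir, ["out.bin"], cas)
```

Because the key is derived from the content, uploading the same output twice stores it only once.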
4. Keepalive and Health
Workers send periodic keepalive messages that:
- Signal the worker is still alive
- Prevent scheduler from timing out the worker
- Update last-seen timestamp
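A minimal sketch of the scheduler-side bookkeeping follows, assuming a simple table of last-seen timestamps and a single timeout value. The `WorkerRegistry` class and its methods are hypothetical names for illustration; timestamps are passed in explicitly so the behavior is easy to follow.

```python
class WorkerRegistry:
    """Hypothetical scheduler-side table of last-seen timestamps."""

    def __init__(self, worker_timeout_s: float) -> None:
        self.worker_timeout_s = worker_timeout_s
        self.last_seen = {}

    def keepalive(self, worker_id: str, now: float) -> None:
        # Each keepalive refreshes the worker's last-seen timestamp.
        self.last_seen[worker_id] = now

    def evict_stale(self, now: float) -> list:
        # Remove workers whose last keepalive is older than the timeout.
        stale = [
            w for w, t in self.last_seen.items()
            if now - t > self.worker_timeout_s
        ]
        for w in stale:
            del self.last_seen[w]
        return stale


registry = WorkerRegistry(worker_timeout_s=5.0)
registry.keepalive("w1", now=0.0)
registry.keepalive("w2", now=0.0)
registry.keepalive("w1", now=4.0)  # w1 checks in again, w2 stays silent
evicted = registry.evict_stale(now=6.0)
```

In this run only `w2` is evicted, because `w1` refreshed its timestamp within the timeout window.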
If the scheduler doesn’t receive a keepalive within worker_timeout_s, the worker is removed from the pool.
5. Graceful Shutdown
Workers can gracefully drain:
- Worker receives shutdown signal (SIGTERM)
- Worker sends GoingAway to scheduler
- Scheduler stops assigning new actions
- Worker completes running actions
- Worker disconnects
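The drain sequence above amounts to a small state machine. The sketch below models it with a hypothetical `DrainingWorker` class; signal handling and the GoingAway message are reduced to a single method call, so it illustrates the ordering only.

```python
class DrainingWorker:
    """Hypothetical worker that tracks running actions and a draining flag."""

    def __init__(self) -> None:
        self.running = set()
        self.draining = False

    def try_start(self, operation_id: str) -> bool:
        # Once draining, no new actions are accepted.
        if self.draining:
            return False
        self.running.add(operation_id)
        return True

    def begin_shutdown(self) -> None:
        # Stand-in for handling SIGTERM and notifying the scheduler.
        self.draining = True

    def finish(self, operation_id: str) -> None:
        self.running.discard(operation_id)

    def can_disconnect(self) -> bool:
        # Safe to disconnect only after every in-flight action has completed.
        return self.draining and not self.running


worker = DrainingWorker()
worker.try_start("op-1")
worker.begin_shutdown()
accepted_after_shutdown = worker.try_start("op-2")
ready_before_finish = worker.can_disconnect()
worker.finish("op-1")
ready_after_finish = worker.can_disconnect()
```

The worker refuses `op-2` after shutdown begins and reports itself ready to disconnect only once `op-1` has finished.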
Worker Configuration
Basic Configuration
Configuration Options
Worker Settings
Advanced Features
Multi-Worker CAS
Workers can share a local CAS to reduce redundant downloads:
- Multiple workers on the same machine share cached inputs
- Reduces network traffic to remote CAS
- Faster action startup (inputs already local)
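The benefit of a shared local store can be shown with a read-through cache sketch. The `TwoTierCas` class is a hypothetical illustration using in-memory dicts for both tiers; it is not NativeLink's store implementation.

```python
class TwoTierCas:
    """Hypothetical read-through cache: check a shared local store first."""

    def __init__(self, remote: dict) -> None:
        self.local = {}
        self.remote = remote
        self.remote_fetches = 0

    def get(self, digest: str) -> bytes:
        if digest not in self.local:
            # Miss: fetch once from the remote CAS and keep a local copy.
            self.local[digest] = self.remote[digest]
            self.remote_fetches += 1
        return self.local[digest]


shared = TwoTierCas(remote={"abc123": b"input blob"})
first = shared.get("abc123")   # worker A: goes to the remote
second = shared.get("abc123")  # worker B: served locally
```

Both workers receive the same bytes, but the remote CAS is contacted only once.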
Directory Caching
Workers maintain a cache of downloaded directory trees to avoid re-downloading:
- Digest-based cache: Directories indexed by digest
- LRU eviction: Old directories removed when cache is full
- Atomic updates: Directories fully downloaded before use
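A digest-keyed LRU cache can be sketched with `collections.OrderedDict`. The `DirectoryCache` class, its entry-count limit, and the example paths are assumptions for illustration; a real cache would more likely bound total bytes on disk.

```python
from collections import OrderedDict
from typing import Optional


class DirectoryCache:
    """Hypothetical LRU cache mapping a directory digest to its local path."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get(self, digest: str) -> Optional[str]:
        if digest not in self._entries:
            return None
        self._entries.move_to_end(digest)  # mark as most recently used
        return self._entries[digest]

    def put(self, digest: str, path: str) -> None:
        # Call only after the directory is fully downloaded (atomic update).
        self._entries[digest] = path
        self._entries.move_to_end(digest)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used


cache = DirectoryCache(max_entries=2)
cache.put("d1", "/cache/d1")
cache.put("d2", "/cache/d2")
cache.get("d1")               # d1 becomes most recently used
cache.put("d3", "/cache/d3")  # cache is full, so d2 is evicted
```

Reading `d1` before inserting `d3` is what saves it from eviction; `d2` is the least recently used entry at that point.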
Resource Monitoring
Workers can monitor resource usage and reject actions when resources are constrained.
Metrics:
- CPU usage
- Memory usage
- Disk space
- Network I/O
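A threshold check over a resource snapshot might look like the sketch below. The `ResourceSnapshot` and `Limits` types and all threshold values are hypothetical, and network I/O is left out to keep the example short; how the snapshot is collected is outside its scope.

```python
from dataclasses import dataclass


@dataclass
class ResourceSnapshot:
    cpu_percent: float
    memory_percent: float
    disk_free_bytes: int


@dataclass
class Limits:
    # Hypothetical thresholds chosen for illustration only.
    max_cpu_percent: float = 90.0
    max_memory_percent: float = 85.0
    min_disk_free_bytes: int = 5 * 1024**3


def constrained_resources(snap: ResourceSnapshot, limits: Limits) -> list:
    """Return the names of resources over their limit; empty means accept."""
    reasons = []
    if snap.cpu_percent > limits.max_cpu_percent:
        reasons.append("cpu")
    if snap.memory_percent > limits.max_memory_percent:
        reasons.append("memory")
    if snap.disk_free_bytes < limits.min_disk_free_bytes:
        reasons.append("disk")
    return reasons


healthy = constrained_resources(ResourceSnapshot(40.0, 50.0, 20 * 1024**3), Limits())
overloaded = constrained_resources(ResourceSnapshot(95.0, 50.0, 1 * 1024**3), Limits())
```

Returning the list of constrained resources, rather than a bare boolean, makes it straightforward to log why an action was rejected.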
Running Workers
Standalone Worker
Worker Pool (Multiple Workers on One Machine)
Systemd Service
Docker Container
Kubernetes Deployment
Monitoring Workers
Metrics
Workers expose Prometheus metrics:
- Actions completed: Total actions executed
- Actions failed: Failed action count
- Action duration: Execution time histogram
- Download bytes: Total input download volume
- Upload bytes: Total output upload volume
- Working directory size: Current disk usage
Logging
Workers log execution details:
- Action received and started
- Input download progress
- Command execution (stdout/stderr)
- Output upload progress
- Execution result (success/failure)
- Errors and warnings
Log levels:
- ERROR: Critical failures
- WARN: Retryable issues (network errors, timeouts)
- INFO: Action lifecycle events
- DEBUG: Detailed execution traces
- TRACE: Low-level protocol details
Troubleshooting
Common Issues
Best Practices
- Size worker pool based on expected workload and machine resources
- Use local CAS cache (filesystem or memory) to reduce network traffic
- Configure precondition scripts for dynamic resource checks
- Set appropriate timeouts based on typical action duration
- Monitor metrics to track worker health and performance
- Use graceful shutdown to avoid killing in-progress actions
- Allocate sufficient disk space for working directories
- Run workers on fast storage (SSDs) for better I/O performance
Next Steps
- Schedulers: Configure the scheduler to manage workers
- Remote Execution: Understand the execution flow
- Stores: Optimize CAS configuration