How Build Caching Works
The build cache operates on the principle of content-addressable storage, where every build artifact is identified by a cryptographic hash of its content.
Action Digest Computation
An Action Digest uniquely identifies a build action and is computed from:
- Command: The exact command to execute (e.g., gcc -c file.c -o file.o)
- Input Files: Content hashes of all input files and directories
- Environment Variables: Specified environment variables
- Platform Properties: Execution requirements (OS, architecture, etc.)
- Timeout: Execution timeout configuration
If any of these inputs change, the action digest changes, resulting in a cache miss.
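The effect of each input on the digest can be illustrated with a minimal sketch. It assumes a JSON encoding of the action fields purely for illustration (remote execution systems typically hash a serialized protobuf message instead), and the function name `action_digest` and its parameters are hypothetical.

```python
import hashlib
import json


def action_digest(command, input_hashes, env, platform, timeout_s):
    """Hash every field that defines an action into one hex digest.

    Hypothetical helper: the JSON encoding is an assumption of this sketch.
    """
    canonical = json.dumps(
        {
            "command": command,
            # Sort mappings so dictionary ordering never changes the digest.
            "inputs": sorted(input_hashes.items()),
            "env": sorted(env.items()),
            "platform": sorted(platform.items()),
            "timeout_s": timeout_s,
        },
        sort_keys=True,
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


command = ["gcc", "-c", "file.c", "-o", "file.o"]
env = {"PATH": "/usr/bin"}
platform = {"OSFamily": "linux"}

original_inputs = {"file.c": hashlib.sha256(b"int main() { return 0; }").hexdigest()}
edited_inputs = {"file.c": hashlib.sha256(b"int main() { return 1; }").hexdigest()}

base = action_digest(command, original_inputs, env, platform, 600)
same = action_digest(command, original_inputs, env, platform, 600)
edited = action_digest(command, edited_inputs, env, platform, 600)
longer_timeout = action_digest(command, original_inputs, env, platform, 900)
```

Here `base` and `same` are equal, so the second request would be a cache hit, while `edited` and `longer_timeout` both differ from `base` and would miss.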
Cache Lookup Process
Content Addressable Storage (CAS)
The Content Addressable Storage (CAS) is where all build artifacts are stored, indexed by their content hash.
Digest-Based Addressing
Every blob in CAS is identified by a DigestInfo. The supported hash functions are:
- SHA-256: Industry standard, widely compatible
- BLAKE3: Faster alternative with parallel hashing
SHA-256 is the default hash function used by most build tools.
Deduplication
Content addressing provides automatic deduplication:
- Identical files share the same digest and are stored only once
- Saves storage space especially for common dependencies
- Reduces network transfer when artifacts already exist in CAS
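The deduplication behaviour listed above can be sketched with a toy in-memory store. The `InMemoryCas` class is hypothetical and keeps blobs in a plain dictionary keyed by SHA-256 digest; it is not how NativeLink stores data.

```python
import hashlib


class InMemoryCas:
    """Toy content-addressable store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # setdefault keeps the first copy, so identical content is stored once.
        self.blobs.setdefault(digest, data)
        return digest

    def has(self, digest: str) -> bool:
        # A client can ask this first and skip the upload when it is True.
        return digest in self.blobs


cas = InMemoryCas()
header = b"#define EXIT_SUCCESS 0\n"

# One hundred uploads of the same bytes yield one digest and one stored blob.
digests = {cas.put(header) for _ in range(100)}
stored_blobs = len(cas.blobs)
```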
Example: If 100 build actions all include the same stdlib.h, it’s stored only once in CAS.
Action Cache (AC)
The Action Cache maps action digests to their execution results.
ActionResult Structure
An ActionResult contains:
- Output Files/Directories: Digests of produced artifacts
- Exit Code: Command return code
- Stdout/Stderr: Digests of captured output
- Execution Metadata: Duration, worker info
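The fields above map naturally onto a small record type. The sketch below is an assumption about shape only: the class names, field names, and the dictionary-backed `ActionCache` are hypothetical and do not reproduce the actual message definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class ActionResult:
    """Hypothetical record mirroring the fields listed above."""

    output_files: Dict[str, str]  # output path -> content digest
    exit_code: int
    stdout_digest: str
    stderr_digest: str
    execution_metadata: Dict[str, str] = field(default_factory=dict)


class ActionCache:
    """Toy Action Cache: maps an action digest to its ActionResult."""

    def __init__(self):
        self._results: Dict[str, ActionResult] = {}

    def put(self, action_digest: str, result: ActionResult) -> None:
        self._results[action_digest] = result

    def get(self, action_digest: str) -> Optional[ActionResult]:
        # None signals a cache miss; the caller must then execute the action.
        return self._results.get(action_digest)


ac = ActionCache()
ac.put(
    "action-digest-1",
    ActionResult(
        output_files={"file.o": "digest-of-file-o"},
        exit_code=0,
        stdout_digest="digest-of-stdout",
        stderr_digest="digest-of-stderr",
        execution_metadata={"worker": "worker-1", "duration_ms": "42"},
    ),
)
hit = ac.get("action-digest-1")
miss = ac.get("action-digest-2")
```

Note that the result holds digests rather than file contents, so the outputs themselves still have to be fetched from CAS.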
Cache Validation
NativeLink ensures cache integrity through Completeness Checking:
Completeness Checking Store
When enabled, the CompletenessCheckingSpec wrapper verifies that all output digests referenced in an ActionResult actually exist in the CAS before returning a cache hit.
Recommended for AC stores to prevent returning incomplete cache entries.
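The idea behind completeness checking can be sketched as a lookup that refuses to report a hit when any referenced blob is missing. This is a simplified illustration, not NativeLink's implementation: the function name, the dictionary layout, and the choice to include stdout/stderr digests in the check are assumptions.

```python
def lookup_with_completeness_check(action_cache, cas_digests, action_digest):
    """Return the cached result only if every referenced digest is in CAS.

    Hypothetical helper: action_cache is a dict of results, cas_digests is
    the set of digests currently present in CAS.
    """
    result = action_cache.get(action_digest)
    if result is None:
        return None  # ordinary cache miss

    referenced = list(result["output_digests"])
    referenced += [result["stdout_digest"], result["stderr_digest"]]

    # Treat an incomplete entry as a miss so the action is simply re-run.
    if all(digest in cas_digests for digest in referenced):
        return result
    return None


action_cache = {
    "action-1": {
        "output_digests": ["out-a", "out-b"],
        "stdout_digest": "log-out",
        "stderr_digest": "log-err",
    }
}
full_cas = {"out-a", "out-b", "log-out", "log-err"}
partial_cas = {"out-a", "log-out", "log-err"}  # "out-b" was evicted

complete_hit = lookup_with_completeness_check(action_cache, full_cas, "action-1")
incomplete = lookup_with_completeness_check(action_cache, partial_cas, "action-1")
```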
Cache Hit Optimization Strategies
1. Deterministic Builds
Ensure builds are deterministic to maximize cache hits:
- Use Relative Paths: Avoid absolute paths in compiler flags that vary between machines.
- Fixed Timestamps: Remove timestamp dependencies from build outputs.
- Sorted Inputs: Process inputs in consistent order (e.g., sorted file lists).
- Hermetic Environments: Isolate builds from system-specific dependencies.
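Three of these practices (relative paths, fixed timestamps, sorted inputs) can be shown together in one small sketch that packs files into a tar archive using Python's standard `tarfile` module. The helper name `deterministic_archive_digest` is hypothetical, and the archive step stands in for any build action that bundles outputs.

```python
import hashlib
import io
import tarfile


def deterministic_archive_digest(files):
    """Pack files into a tar archive whose bytes depend only on names and content."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Sorted inputs: iteration order no longer depends on the caller.
        for name in sorted(files):
            info = tarfile.TarInfo(name=name)  # relative path inside the archive
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp instead of the current time
            info.uid = info.gid = 0  # no machine-specific ownership
            info.uname = info.gname = ""
            archive.addfile(info, io.BytesIO(files[name]))
    return hashlib.sha256(buffer.getvalue()).hexdigest()


first_order = {"src/a.c": b"int a;", "src/b.c": b"int b;"}
second_order = {"src/b.c": b"int b;", "src/a.c": b"int a;"}

digest_one = deterministic_archive_digest(first_order)
digest_two = deterministic_archive_digest(second_order)
```

Because the two digests match, a downstream action that consumes the archive sees identical input regardless of which machine produced it or in what order the files were listed.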
2. Fine-Grained Actions
Break builds into smaller, focused actions:
- Compile individual source files separately
- Link as a separate action
- Generate headers in dedicated actions
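The benefit of fine-grained actions shows up when a single file changes. The sketch below compares one cache key per source file against a single key for the whole build; the `cache_key` helper and the key layout are hypothetical simplifications.

```python
import hashlib


def cache_key(*parts: bytes) -> str:
    """Hash length-prefixed parts so that part boundaries stay unambiguous."""
    hasher = hashlib.sha256()
    for part in parts:
        hasher.update(len(part).to_bytes(8, "big"))
        hasher.update(part)
    return hasher.hexdigest()


def per_file_keys(sources):
    # Fine-grained: each compile action depends on exactly one source file.
    return {
        name: cache_key(b"compile", name.encode(), content)
        for name, content in sources.items()
    }


def monolithic_key(sources):
    # Coarse: one action depends on every source file at once.
    parts = [b"build-all"]
    for name in sorted(sources):
        parts += [name.encode(), sources[name]]
    return cache_key(*parts)


before = {"a.c": b"int a;", "b.c": b"int b;", "c.c": b"int c;"}
after = {**before, "b.c": b"int b = 1;"}

keys_before = per_file_keys(before)
keys_after = per_file_keys(after)
invalidated = [name for name in before if keys_before[name] != keys_after[name]]
monolithic_changed = monolithic_key(before) != monolithic_key(after)
```

Only the action for `b.c` gets a new key, so the other two compile results are still served from cache, whereas the monolithic action misses entirely.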
3. Input Root Minimization
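Include only the inputs an action actually needs. Every input contributes to the action digest, so an unrelated file in the input root can cause needless cache misses.
Cache Storage Backends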
NativeLink supports various storage backends for CAS and AC:
- Memory
- Filesystem
- S3 / GCS
- Redis
Memory: Ultra-fast in-memory cache.
Use Case: Development, small projects, fast local cache tier
Limitations: Volatile, limited by RAM
Multi-Tier Caching
Combine storage backends for optimal performance:
- Read: Check fast tier (memory), fall back to slow tier (S3)
- Write: Write to both tiers simultaneously
- Promotion: Slow tier hits are cached in fast tier
This pattern provides local speed with cloud persistence.
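The read, write, and promotion rules above can be sketched with two dictionaries standing in for the tiers. The `TieredStore` class is hypothetical, and for simplicity it writes to the tiers one after the other rather than concurrently.

```python
class TieredStore:
    """Toy two-tier store; plain dicts stand in for memory and S3 backends."""

    def __init__(self, fast, slow):
        self.fast = fast
        self.slow = slow

    def get(self, digest):
        # Read: check the fast tier first.
        if digest in self.fast:
            return self.fast[digest]
        # Fall back to the slow tier.
        data = self.slow.get(digest)
        if data is not None:
            # Promotion: keep a copy in the fast tier for the next read.
            self.fast[digest] = data
        return data

    def put(self, digest, data):
        # Write: store in both tiers (sequentially in this sketch).
        self.fast[digest] = data
        self.slow[digest] = data


memory_tier = {}
cloud_tier = {"digest-old": b"artifact built last week"}
store = TieredStore(memory_tier, cloud_tier)

promoted = store.get("digest-old")
store.put("digest-new", b"artifact built just now")
absent = store.get("digest-unknown")
```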
Cache Eviction Policies
Control cache size with eviction policies.
Zero-Byte File Handling
NativeLink optimizes for common cases:
Special Zero-Byte Digests
Empty files have well-known digests:
- SHA-256: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
- BLAKE3: af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262
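The SHA-256 value can be reproduced with Python's standard `hashlib` (BLAKE3 is not in the standard library, so it is left out here). The `is_zero_byte_digest` helper is hypothetical and only illustrates how a store could recognise an empty file without touching its backend.

```python
import hashlib

# Well-known SHA-256 digest of zero bytes of input.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"

computed_empty = hashlib.sha256(b"").hexdigest()


def is_zero_byte_digest(hash_hex: str, size_bytes: int) -> bool:
    """Hypothetical fast-path check for an empty blob."""
    return size_bytes == 0 and hash_hex == EMPTY_SHA256
```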
Cache Verification
The VerifySpec wrapper validates uploads so that corrupt data is caught early.
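Conceptually, upload verification compares the bytes received against the digest the client declared. The sketch below is an assumption about the general technique, not NativeLink's code: the function name and the two flags are illustrative names chosen for this example.

```python
import hashlib


def verify_upload(data, expected_hash, expected_size,
                  verify_size=True, verify_hash=True):
    """Raise ValueError if uploaded bytes do not match the declared digest.

    Hypothetical helper; flag names are illustrative only.
    """
    if verify_size and len(data) != expected_size:
        raise ValueError(
            f"size mismatch: declared {expected_size}, received {len(data)}"
        )
    if verify_hash:
        actual_hash = hashlib.sha256(data).hexdigest()
        if actual_hash != expected_hash:
            raise ValueError(
                f"hash mismatch: declared {expected_hash}, computed {actual_hash}"
            )


payload = b"compiled object file bytes"
declared_hash = hashlib.sha256(payload).hexdigest()

# A matching upload passes silently.
verify_upload(payload, declared_hash, len(payload))

# A truncated upload is rejected before it can enter the store.
try:
    verify_upload(payload[:-1], declared_hash, len(payload))
    truncated_rejected = False
except ValueError:
    truncated_rejected = True
```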
Cache Statistics
Monitor cache effectiveness.
Key Metrics:
- Cache Hit Rate: Percentage of actions served from cache
- Cache Size: Total bytes stored
- Eviction Rate: How often items are evicted
- Download/Upload Volume: Network transfer savings
A well-configured cache can achieve 80-95% hit rates for incremental builds.
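As a worked example of the hit-rate metric, the percentage is simply hits divided by total lookups. The function name and the sample counts below are made up for illustration.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Percentage of actions served from cache; 0.0 when nothing was looked up."""
    total = hits + misses
    if total == 0:
        return 0.0
    return 100.0 * hits / total


# Hypothetical incremental build: 450 of 500 actions were already cached.
incremental_rate = cache_hit_rate(hits=450, misses=50)
```

With these sample counts the rate is 90%, which falls inside the 80-95% range quoted above.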
Best Practices
- Use Completeness Checking on AC stores to ensure cache integrity
- Enable Verification on CAS uploads to catch corrupt data early
- Use Size Partitioning to separate small and large artifacts
- Enable Compression for network-backed stores to reduce transfer costs
- Monitor Cache Metrics to tune eviction policies
- Share Caches across teams to maximize reuse
Next Steps
- Storage Backends: Deep dive into store types and composition
- Remote Execution: Learn how actions are executed remotely