The Health Check service provides HTTP endpoints for monitoring NativeLink’s operational status.
Overview
The Health service:
- Exposes an HTTP endpoint for health checks
- Reports status of all registered components
- Returns JSON status for each subsystem
- Supports custom timeout configuration
- Non-gRPC HTTP endpoint (easy integration with load balancers)
- Component-level health reporting
- Configurable timeouts
- Service-unavailable status on failures
Configuration
- HTTP path for the health check endpoint
- Timeout for health check queries, set with timeout_seconds
The default path is /status, as documented in the source code.
HTTP Endpoint
GET /status
Query the health status of all components.
Response:
HTTP status code:
- 200 OK: All components healthy
- 503 SERVICE_UNAVAILABLE: One or more components failed
Array of component health descriptions
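As a rough illustration of consuming this endpoint, the Python sketch below sends GET /status and maps the documented status codes to a verdict. The helper names `interpret_status` and `check_health`, the base URL in the usage note, and the assumption that the body parses as JSON are illustrative choices, not part of NativeLink.

```python
import json
import urllib.error
import urllib.request


def interpret_status(code: int) -> str:
    """Map the documented HTTP status codes to a short verdict."""
    if code == 200:
        return "healthy"
    if code == 503:
        return "unhealthy"
    return "unknown"


def check_health(base_url: str, path: str = "/status", timeout: float = 5.0):
    """Query the health endpoint and return (verdict, parsed_body_or_None).

    Connection-level errors (urllib.error.URLError) are left to the caller.
    """
    request = urllib.request.Request(base_url.rstrip("/") + path, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            code, raw = response.status, response.read()
    except urllib.error.HTTPError as err:
        # urllib raises on non-2xx responses; a 503 can still carry a body.
        code, raw = err.code, err.read()
    try:
        body = json.loads(raw)
    except ValueError:
        body = None  # Body was not valid JSON; keep only the verdict.
    return interpret_status(code), body
```

A caller might run `check_health("http://localhost:8080")`, where the host and port are placeholders for wherever the health listener is exposed in your deployment.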
Health Status Types
Components can report these statuses:
- Ok: Component is healthy and operational
- Failed
- Timeout
Registered Components
NativeLink automatically registers health checks for:
- Stores: All configured stores (CAS, AC, etc.)
- Schedulers: Task schedulers
- Workers: Local workers (if configured)
- Backend connections: S3, Redis, database connections
Components are registered during service initialization based on your configuration
Kubernetes Integration
Use the health endpoint for liveness and readiness probes.
Load Balancer Integration
Configure health checks in your load balancer (for example, an AWS ALB) to target the health endpoint.
Timeout Configuration
The timeout_seconds setting controls how long to wait for each component.
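To make the idea of a per-component timeout concrete, here is a minimal Python sketch that bounds each check with `asyncio.wait_for` and reports Ok, Failed, or Timeout. It is not NativeLink's implementation (that is written in Rust); the function names, the probe coroutines, and the choice to run checks concurrently are all assumptions made for illustration.

```python
import asyncio


async def check_component(name, probe, timeout_seconds):
    """Run one probe coroutine, giving it at most timeout_seconds to finish."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout_seconds)
        return name, "Ok"
    except asyncio.TimeoutError:
        return name, "Timeout"
    except Exception:
        return name, "Failed"


async def check_all(probes, timeout_seconds):
    """Check every component concurrently (an assumption of this sketch)."""
    results = await asyncio.gather(
        *(check_component(name, probe, timeout_seconds) for name, probe in probes.items())
    )
    return dict(results)


async def fast_store():
    # Hypothetical probe standing in for a quick local store check.
    await asyncio.sleep(0.01)


async def slow_backend():
    # Hypothetical probe standing in for a slow remote backend.
    await asyncio.sleep(1.0)


if __name__ == "__main__":
    print(asyncio.run(check_all({"store": fast_store, "backend": slow_backend}, 0.1)))
```

With a timeout shorter than the slowest probe, that probe is reported as Timeout while faster components still report Ok, which is why the timeout should be sized to the slowest backend you expect to be healthy.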
Monitoring Best Practices
Set appropriate timeouts
Configure timeout_seconds based on your slowest backend (S3, database, etc.)
Custom Health Path
Change the health check path if it conflicts with other endpoints.
Implementation Details
The health server is implemented in nativelink-service/src/health_server.rs.
Error Handling
- 200 OK: All components healthy or no components registered
- 503 SERVICE_UNAVAILABLE: One or more components failed or timed out
- 500 INTERNAL_SERVER_ERROR: Failed to serialize health status (rare)
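The rules above can be sketched as a small Python function. This is only a model of the documented behavior, not the Rust code in the service; `build_response` and the JSON field names `component` and `status` are hypothetical.

```python
import json


def build_response(components):
    """Return (http_status, body) for a mapping of component name -> status."""
    try:
        body = json.dumps(
            [{"component": name, "status": status} for name, status in components.items()]
        )
    except (TypeError, ValueError):
        # Serialization failed, so no meaningful body can be returned.
        return 500, ""
    # all() over an empty mapping is True, so no registered components means 200.
    healthy = all(status == "Ok" for status in components.values())
    return (200 if healthy else 503), body
```

Treating Timeout the same as Failed keeps the endpoint conservative: a load balancer or probe sees 503 whenever any component cannot confirm it is healthy in time.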