The Worker API service enables workers to connect to the scheduler, receive work, and report execution status.

Overview

The Worker API provides:
  • Bidirectional gRPC streaming for worker-scheduler communication
  • Worker registration with platform properties
  • Task assignment and status updates
  • Keep-alive heartbeat mechanism
  • Graceful worker shutdown
Key features:
  • Long-lived bidirectional streaming connection
  • Platform property-based worker matching
  • Automatic timeout detection
  • Action result reporting

Configuration

scheduler (string, required): Reference to the scheduler that manages workers
{
  services: {
    worker_api: {
      scheduler: "MAIN_SCHEDULER"
    }
  }
}

gRPC Methods

ConnectWorker

Establish a bidirectional stream between worker and scheduler.

Request (stream): Workers send UpdateForScheduler messages:
  • worker_id (string): Unique worker identifier (generated if not provided)
  • platform_properties (PlatformProperties): Worker capabilities (CPU, memory, OS, etc.)
  • keep_alive (KeepAliveRequest): Heartbeat to indicate worker is alive
  • going_away (GoingAwayRequest): Signal that worker is shutting down
  • execute_result (ExecuteResult): Action execution result (exit code, outputs)

Response (stream): Scheduler sends UpdateForWorker messages:
  • update_timeout (Duration): Time until next keep-alive expected
  • start_action (StartExecute): Action to execute:
      • operation_id: Unique operation identifier
      • action_digest: Digest of the Action to execute
      • skip_cache_lookup: Whether to skip AC lookup
      • platform_properties: Required platform properties
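The two message types above can be pictured as tagged unions. The following is a minimal sketch, not the generated protobuf code: the Rust type names mirror the message names, but the variant layout, the string-typed digest, and the describe helper are assumptions made for illustration. Grouping worker_id and platform_properties into a single Register variant is also a simplification.

```rust
use std::time::Duration;

/// Hypothetical mirror of the worker -> scheduler message (not the real protobuf type).
#[derive(Debug, Clone, PartialEq)]
enum UpdateForScheduler {
    /// Registration: an empty `worker_id` asks the scheduler to generate one.
    Register {
        worker_id: String,
        platform_properties: Vec<(String, String)>,
    },
    KeepAlive,
    GoingAway,
    ExecuteResult { operation_id: String, exit_code: i32 },
}

/// Hypothetical mirror of the scheduler -> worker message.
#[derive(Debug, Clone, PartialEq)]
enum UpdateForWorker {
    UpdateTimeout(Duration),
    StartAction {
        operation_id: String,
        action_digest: String, // simplified to a string for this sketch
        skip_cache_lookup: bool,
    },
}

/// Builds a short log line, showing how a worker could dispatch on the variant.
fn describe(update: &UpdateForWorker) -> String {
    match update {
        UpdateForWorker::UpdateTimeout(d) => {
            format!("keep-alive due within {}s", d.as_secs())
        }
        UpdateForWorker::StartAction { operation_id, skip_cache_lookup, .. } => {
            format!("execute {} (skip cache: {})", operation_id, skip_cache_lookup)
        }
    }
}
```

A worker loop would read UpdateForWorker values from the response stream and match on them in the same way, replying with UpdateForScheduler values on the request stream.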

Connection Flow

  1. Worker initiates connection: Worker calls ConnectWorker and sends initial UpdateForScheduler with platform_properties
  2. Scheduler registers worker: Scheduler validates properties and assigns worker_id if needed
  3. Task assignment: Scheduler sends start_action when work is available
  4. Execution: Worker executes action and sends execute_result
  5. Keep-alive: Worker periodically sends keep_alive messages to prevent timeout
  6. Graceful shutdown: Worker sends going_away before disconnecting
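The connection flow can be read as a small state machine on the worker side. The sketch below is illustrative only: WorkerState, Event, and next_state are hypothetical names, and it assumes a worker runs one action at a time, which the flow above does not state. Keep-alive messages (step 5) are left out because they do not change the state.

```rust
/// Hypothetical worker-side states for the connection flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WorkerState {
    Connecting,
    Idle,
    Executing,
    ShuttingDown,
}

/// Hypothetical events that move the worker between states.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Event {
    Registered,  // scheduler accepted platform_properties (step 2)
    StartAction, // scheduler sent start_action (step 3)
    ResultSent,  // worker sent execute_result (step 4)
    GoingAway,   // worker sent going_away (step 6)
}

/// Returns the next state, or None if the event is not valid in the current state.
fn next_state(state: WorkerState, event: Event) -> Option<WorkerState> {
    match (state, event) {
        (WorkerState::Connecting, Event::Registered) => Some(WorkerState::Idle),
        (WorkerState::Idle, Event::StartAction) => Some(WorkerState::Executing),
        (WorkerState::Executing, Event::ResultSent) => Some(WorkerState::Idle),
        (WorkerState::Idle, Event::GoingAway)
        | (WorkerState::Executing, Event::GoingAway) => Some(WorkerState::ShuttingDown),
        _ => None,
    }
}
```

Returning None for unexpected events, such as a start_action that arrives before registration, makes protocol violations easy to log or reject.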

Worker Registration

Workers register by sending their platform properties. New workers leave worker_id empty:
{
  "worker_id": "",
  "platform_properties": {
    "properties": [
      {"name": "OSFamily", "value": "Linux"},
      {"name": "cpu_count", "value": "8"},
      {"name": "memory_kb", "value": "16777216"},
      {"name": "cpu_arch", "value": "x86_64"}
    ]
  }
}
The scheduler responds with:
{
  "update_timeout": {"seconds": 30}
}
Workers must send keep-alive messages within the update_timeout period.
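One way a worker could stay inside that window is to send keep-alives at a fraction of update_timeout. The sketch below assumes a margin of half the timeout; that factor and both function names are illustrative choices, not values taken from NativeLink.

```rust
use std::time::Duration;

/// Picks how often to send keep_alive for a given update_timeout.
/// Half the timeout is an assumed safety margin, not a NativeLink rule.
fn keep_alive_interval(update_timeout: Duration) -> Duration {
    update_timeout / 2
}

/// Returns true once enough time has passed since the last update
/// that a keep_alive should be sent now.
fn keep_alive_due(since_last_update: Duration, update_timeout: Duration) -> bool {
    since_last_update >= keep_alive_interval(update_timeout)
}
```

With the 30 second timeout from the response above, this sketch would send a keep-alive every 15 seconds, leaving room for one delayed or lost message before the scheduler's deadline.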

Execution Results

Workers report results via execute_result:
{
  "execute_result": {
    "operation_id": "abc123...",
    "result": {
      "exit_code": 0,
      "stdout_digest": {"hash": "...", "size_bytes": 1024},
      "output_files": [
        {
          "path": "out/binary",
          "digest": {"hash": "...", "size_bytes": 2048}
        }
      ]
    }
  }
}

Worker Timeout

When a worker fails to send updates within the timeout period:
  1. The worker is marked as timed out
  2. The worker is removed from the available worker pool
  3. Any actions assigned to it are rescheduled
Configure timeouts appropriately: a timeout that is too short causes unnecessary reconnections, while one that is too long delays failure detection.
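The timeout handling described above can be sketched as a periodic sweep over the known workers. This is a simplified illustration, not the scheduler's actual code: WorkerEntry and sweep_timed_out are hypothetical names, and time is modeled as a Duration since an arbitrary start point so the logic stays deterministic.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical per-worker bookkeeping kept by a scheduler.
struct WorkerEntry {
    /// Time of the last message from the worker, measured from an arbitrary start point.
    last_update: Duration,
    /// Operation ids currently assigned to this worker.
    assigned_actions: Vec<String>,
}

/// Removes every worker whose last update is older than `timeout` and
/// returns the operation ids that need to be rescheduled (sorted for stable output).
fn sweep_timed_out(
    workers: &mut HashMap<String, WorkerEntry>,
    now: Duration,
    timeout: Duration,
) -> Vec<String> {
    let mut to_reschedule = Vec::new();
    workers.retain(|_, entry| {
        let alive = now.saturating_sub(entry.last_update) <= timeout;
        if !alive {
            // Collect the lost worker's actions before the entry is dropped.
            to_reschedule.append(&mut entry.assigned_actions);
        }
        alive
    });
    to_reschedule.sort();
    to_reschedule
}
```

Calling such a sweep on a fixed tick bounds the detection delay to roughly the timeout plus one tick.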

Configuration Example

Complete worker configuration:
{
  workers: [
    {
      local: {
        worker_api_endpoint: {
          uri: "grpc://scheduler:50061"
        },
        platform_properties: {
          cpu_count: {values: ["8"]},
          memory_kb: {values: ["16777216"]},
          OSFamily: {values: ["Linux"]}
        }
      }
    }
  ]
}
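To illustrate how the platform_properties in this configuration relate to the required properties carried by start_action, here is a minimal matching sketch. It assumes exact string equality against any of the worker's advertised values; the real scheduler may treat some properties differently (for example numeric ones), and worker_matches is a hypothetical helper, not a NativeLink function.

```rust
use std::collections::HashMap;

/// Returns true when the worker advertises every property the action requires.
/// Exact string equality against any advertised value is an assumption of this sketch.
fn worker_matches(
    worker: &HashMap<String, Vec<String>>,
    required: &HashMap<String, String>,
) -> bool {
    required.iter().all(|(name, wanted)| {
        worker
            .get(name)
            .map_or(false, |values| values.iter().any(|v| v == wanted))
    })
}
```

Under this assumption, an action with no required properties matches every worker, and a required property the worker does not advertise at all never matches.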

Monitoring

Key metrics to monitor:
  • Active worker count
  • Worker connection duration
  • Task assignment rate
  • Worker timeout rate
  • Average execution time

Error Codes

Code               Description
INVALID_ARGUMENT   Invalid worker_id or platform properties
NOT_FOUND          Scheduler not configured
DEADLINE_EXCEEDED  Worker timed out
CANCELLED          Connection closed

Implementation Details

From nativelink-service/src/worker_api_server.rs:
pub struct WorkerApiServer {
    scheduler: Arc<dyn WorkerScheduler>,
    now_fn: Arc<NowFn>,
    node_id: [u8; 6],
}
The Worker API server maintains a reference to the scheduler and automatically removes timed-out workers every second.
Workers use a custom NativeLink protocol over gRPC, not the standard Remote Execution API.