The Worker API service enables workers to connect to the scheduler, receive work, and report execution status.

Overview

The Worker API provides:
  • Bidirectional gRPC streaming for worker-scheduler communication
  • Worker registration with platform properties
  • Task assignment and status updates
  • Keep-alive heartbeat mechanism
  • Graceful worker shutdown
Key features:
  • Long-lived bidirectional streaming connection
  • Platform property-based worker matching
  • Automatic timeout detection
  • Action result reporting

Configuration

scheduler (string, required): Reference to the scheduler that manages workers.
{
  services: {
    worker_api: {
      scheduler: "MAIN_SCHEDULER"
    }
  }
}

gRPC Methods

ConnectWorker

Establish a bidirectional stream between worker and scheduler.

Request (stream): Workers send UpdateForScheduler messages:
  • worker_id (string): Unique worker identifier (generated if not provided)
  • platform_properties (PlatformProperties): Worker capabilities (CPU, memory, OS, etc.)
  • keep_alive (KeepAliveRequest): Heartbeat to indicate worker is alive
  • going_away (GoingAwayRequest): Signal that worker is shutting down
  • execute_result (ExecuteResult): Action execution result (exit code, outputs)

Response (stream): Scheduler sends UpdateForWorker messages:
  • update_timeout (Duration): Time until next keep-alive expected
  • start_action (StartExecute): Action to execute, with these fields:
      • operation_id: Unique operation identifier
      • action_digest: Digest of the Action to execute
      • skip_cache_lookup: Whether to skip AC lookup
      • platform_properties: Required platform properties
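The two message streams listed above can be modelled as a pair of Rust enums. This is a minimal sketch, not the generated protobuf code: the type names, the `Connect` variant, the `kind` helper, and the assumption that each message carries exactly one kind of update are all hypothetical and chosen only for illustration.

```rust
use std::time::Duration;

/// One name/value pair describing a worker capability (hypothetical mirror type).
#[derive(Debug, Clone, PartialEq)]
pub struct Property {
    pub name: String,
    pub value: String,
}

/// Hypothetical mirror of the worker-to-scheduler messages.
/// Assumption: each message carries exactly one kind of update.
#[derive(Debug, Clone, PartialEq)]
pub enum UpdateForScheduler {
    /// Registration; an empty `worker_id` asks the scheduler to generate one.
    Connect { worker_id: String, platform_properties: Vec<Property> },
    KeepAlive,
    GoingAway,
    ExecuteResult { operation_id: String, exit_code: i32 },
}

/// Hypothetical mirror of the scheduler-to-worker messages.
#[derive(Debug, Clone, PartialEq)]
pub enum UpdateForWorker {
    UpdateTimeout(Duration),
    StartAction {
        operation_id: String,
        action_digest: String,
        skip_cache_lookup: bool,
        platform_properties: Vec<Property>,
    },
}

impl UpdateForScheduler {
    /// Short label for logging which kind of update is being sent.
    pub fn kind(&self) -> &'static str {
        match self {
            UpdateForScheduler::Connect { .. } => "register",
            UpdateForScheduler::KeepAlive => "keep_alive",
            UpdateForScheduler::GoingAway => "going_away",
            UpdateForScheduler::ExecuteResult { .. } => "execute_result",
        }
    }
}
```

Modelling the updates as enums lets the compiler check that every message kind is handled wherever a `match` is written.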

Connection Flow

  1. Worker initiates connection: the worker calls ConnectWorker and sends an initial UpdateForScheduler with platform_properties.
  2. Scheduler registers worker: the scheduler validates the properties and assigns a worker_id if needed.
  3. Task assignment: the scheduler sends start_action when work is available.
  4. Execution: the worker executes the action and sends execute_result.
  5. Keep-alive: the worker periodically sends keep_alive messages to prevent a timeout.
  6. Graceful shutdown: the worker sends going_away before disconnecting.
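The connection flow can be read as a small worker-side state machine. The sketch below is illustrative only: the state and event names are hypothetical, it assumes one action at a time per worker, and it leaves out keep-alive messages because they do not change the state.

```rust
/// Hypothetical worker-side states for the connection flow.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerState {
    Disconnected,
    Registering,
    Idle,
    Executing,
    ShuttingDown,
}

/// Hypothetical events: messages sent by the worker or received from the scheduler.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    ConnectSent,
    TimeoutReceived,
    StartActionReceived,
    ResultSent,
    GoingAwaySent,
}

/// Returns the next state, or `None` if the event is not valid in this state.
pub fn next_state(state: WorkerState, event: Event) -> Option<WorkerState> {
    match (state, event) {
        // Steps 1 and 2: connect, then the scheduler acknowledges with a timeout.
        (WorkerState::Disconnected, Event::ConnectSent) => Some(WorkerState::Registering),
        (WorkerState::Registering, Event::TimeoutReceived) => Some(WorkerState::Idle),
        // Steps 3 and 4: receive work, then report the result.
        (WorkerState::Idle, Event::StartActionReceived) => Some(WorkerState::Executing),
        (WorkerState::Executing, Event::ResultSent) => Some(WorkerState::Idle),
        // Step 6: going_away is allowed whether or not work is in flight.
        (WorkerState::Idle, Event::GoingAwaySent)
        | (WorkerState::Executing, Event::GoingAwaySent) => Some(WorkerState::ShuttingDown),
        _ => None,
    }
}
```

Returning `None` for unexpected events makes protocol violations explicit instead of silently ignoring them.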

Worker Registration

Workers register by sending their platform properties:
{
  "worker_id": "",  // Empty for new workers
  "platform_properties": {
    "properties": [
      {"name": "OSFamily", "value": "Linux"},
      {"name": "cpu_count", "value": "8"},
      {"name": "memory_kb", "value": "16777216"},
      {"name": "cpu_arch", "value": "x86_64"}
    ]
  }
}
The scheduler responds with:
{
  "update_timeout": {"seconds": 30}
}
Workers must send a keep-alive message before each update_timeout period elapses (30 seconds in this example).
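One way a worker could pace its heartbeats is to send them at a fraction of the advertised timeout, leaving headroom for network delay. The sketch below assumes half the timeout; that factor and both function names are hypothetical choices, not something the Worker API prescribes.

```rust
use std::time::Duration;

/// Interval between keep-alive messages.
/// Assumption: sending at half the timeout leaves headroom for network delay.
pub fn keep_alive_interval(update_timeout: Duration) -> Duration {
    update_timeout / 2
}

/// True when enough time has passed since the last send that a keep-alive is due.
pub fn keep_alive_due(since_last_send: Duration, update_timeout: Duration) -> bool {
    since_last_send >= keep_alive_interval(update_timeout)
}
```

With the 30 second timeout from the example response, this sketch sends a keep-alive every 15 seconds.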

Execution Results

Workers report results via execute_result:
{
  "execute_result": {
    "operation_id": "abc123...",
    "result": {
      "exit_code": 0,
      "stdout_digest": {"hash": "...", "size_bytes": 1024},
      "output_files": [
        {
          "path": "out/binary",
          "digest": {"hash": "...", "size_bytes": 2048}
        }
      ]
    }
  }
}
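The result payload above could be mirrored on the worker side with plain structs before it is serialized. The types and helper methods below are hypothetical and cover only the fields shown in the example; they are not the actual protobuf definitions.

```rust
/// Hypothetical mirror of a content digest (hash plus size in bytes).
#[derive(Debug, Clone, PartialEq)]
pub struct Digest {
    pub hash: String,
    pub size_bytes: u64,
}

/// Hypothetical mirror of one entry in `output_files`.
#[derive(Debug, Clone, PartialEq)]
pub struct OutputFile {
    pub path: String,
    pub digest: Digest,
}

/// Hypothetical mirror of the `result` object in the example above.
#[derive(Debug, Clone, PartialEq)]
pub struct ActionOutcome {
    pub exit_code: i32,
    pub stdout_digest: Digest,
    pub output_files: Vec<OutputFile>,
}

impl ActionOutcome {
    /// Treats exit code 0 as success, following the usual process convention.
    pub fn succeeded(&self) -> bool {
        self.exit_code == 0
    }

    /// Sum of the stdout size and all output file sizes, in bytes.
    pub fn total_output_bytes(&self) -> u64 {
        let files: u64 = self.output_files.iter().map(|f| f.digest.size_bytes).sum();
        self.stdout_digest.size_bytes + files
    }
}
```

A helper like `total_output_bytes` is handy for logging how much data an action produced.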

Worker Timeout

When a worker fails to send an update within the timeout period, the scheduler:
  1. Marks the worker as timed out
  2. Removes it from the available worker pool
  3. Reschedules any actions assigned to it
Configure timeouts appropriately: a timeout that is too short causes unnecessary reconnections, while one that is too long delays failure detection.
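The three timeout steps can be sketched as a single sweep over the worker pool. This is an illustrative model rather than the scheduler's actual code: `WorkerEntry`, `sweep_timed_out`, and the use of a `Duration` since an arbitrary epoch as the clock are all assumptions.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical bookkeeping the scheduler keeps per connected worker.
pub struct WorkerEntry {
    /// Time of the last update, as a duration since an arbitrary epoch.
    pub last_update: Duration,
    /// Operation ids currently assigned to this worker.
    pub assigned: Vec<String>,
}

/// Removes every worker whose last update is older than `timeout` and
/// returns the operation ids that must be rescheduled, sorted for determinism.
pub fn sweep_timed_out(
    workers: &mut HashMap<String, WorkerEntry>,
    now: Duration,
    timeout: Duration,
) -> Vec<String> {
    // Step 1: find the workers that have timed out.
    let expired: Vec<String> = workers
        .iter()
        .filter(|(_, w)| now.saturating_sub(w.last_update) > timeout)
        .map(|(id, _)| id.clone())
        .collect();

    // Steps 2 and 3: drop them from the pool and collect their actions.
    let mut requeue = Vec::new();
    for id in expired {
        if let Some(entry) = workers.remove(&id) {
            requeue.extend(entry.assigned);
        }
    }
    requeue.sort();
    requeue
}
```

Collecting the expired ids before removing them avoids mutating the map while iterating over it.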

Configuration Example

Worker-side configuration that points a local worker at the Worker API endpoint and advertises its platform properties:
{
  workers: [
    {
      local: {
        worker_api_endpoint: {
          uri: "grpc://scheduler:50061"
        },
        platform_properties: {
          cpu_count: {values: ["8"]},
          memory_kb: {values: ["16777216"]},
          OSFamily: {values: ["Linux"]}
        }
      }
    }
  ]
}
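To illustrate how advertised platform_properties relate to the properties an action requires, the sketch below uses the simplest possible rule: every required name/value pair must appear among the worker's advertised values. The function name and the exact-match rule are assumptions made for illustration; the real scheduler may apply other rules, for example for numeric properties.

```rust
use std::collections::HashMap;

/// Returns true when the worker advertises every required name/value pair.
/// Assumption: exact string matching against the worker's `values` list.
pub fn worker_matches(
    worker: &HashMap<String, Vec<String>>,
    required: &[(String, String)],
) -> bool {
    required.iter().all(|(name, value)| {
        worker
            .get(name)
            .map_or(false, |values| values.contains(value))
    })
}
```

Under this rule, an action with no required properties matches any worker, and a property the worker does not advertise never matches.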

Monitoring

Key metrics to monitor:
  • Active worker count
  • Worker connection duration
  • Task assignment rate
  • Worker timeout rate
  • Average execution time

Error Codes

Code                 Description
INVALID_ARGUMENT     Invalid worker_id or platform properties
NOT_FOUND            Scheduler not configured
DEADLINE_EXCEEDED    Worker timed out
CANCELLED            Connection closed

Implementation Details

From nativelink-service/src/worker_api_server.rs:
pub struct WorkerApiServer {
    scheduler: Arc<dyn WorkerScheduler>,
    now_fn: Arc<NowFn>,
    node_id: [u8; 6],
}
The Worker API server maintains a reference to the scheduler and automatically removes timed-out workers every second.
Workers use a custom NativeLink protocol over gRPC, not the standard Remote Execution API.
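The now_fn field in the struct above suggests that the server receives its clock as a value rather than reading the system time directly. The sketch below shows why that pattern is useful for timeout logic. The `NowFn` signature here is an assumption (the real alias is not shown in this document), and `Clocked` is a hypothetical stand-in for the server.

```rust
use std::sync::Arc;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Assumed shape of the clock callback; the real `NowFn` alias may differ.
pub type NowFn = dyn Fn() -> Duration + Send + Sync;

/// Hypothetical stand-in for a component that owns an injected clock.
pub struct Clocked {
    now_fn: Arc<NowFn>,
}

impl Clocked {
    pub fn new(now_fn: Arc<NowFn>) -> Self {
        Self { now_fn }
    }

    /// Production constructor: time since the Unix epoch from the system clock.
    pub fn with_system_clock() -> Self {
        Self::new(Arc::new(|| {
            SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .unwrap_or_default()
        }))
    }

    /// True when more than `timeout` has elapsed since `last_update`.
    pub fn is_timed_out(&self, last_update: Duration, timeout: Duration) -> bool {
        (self.now_fn)().saturating_sub(last_update) > timeout
    }
}
```

In a test, passing a closure that returns a fixed `Duration` makes timeout behaviour deterministic without sleeping.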
