Synopsis
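A usage sketch reconstructed from the options described below (placeholder values in angle brackets):

```shell
ampd worker --config <CONFIG_FILE> --node-id <NODE_ID>
```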
Description
Starts a worker node that connects to the metadata database and processes distributed dump tasks. The worker polls for available jobs and executes them. Multiple workers can run in parallel to process different portions of data extraction jobs. The node ID must be unique across all running workers. Workers require access to the metadata database configured in the config file and run continuously until terminated.
Options
--node-id
The unique identifier for this worker node. Used to track worker status and job assignments in the distributed system. Can also be set via the AMP_NODE_ID environment variable.
Worker Coordination
Workers operate in a coordination loop:
- Register with the metadata database using the node ID
- Maintain heartbeat every 1 second
- Listen for job notifications via PostgreSQL LISTEN/NOTIFY
- Execute assigned extraction jobs
- Write Parquet files to configured storage
- Update job status in database
| Mechanism | Description |
|---|---|
| Heartbeat | 1-second interval health signal |
| LISTEN/NOTIFY | PostgreSQL-based job notifications |
| State Reconciliation | 60-second periodic state sync |
| Graceful Resume | Jobs resume on worker restart |
Configuration
Worker settings inherit from the main ampd configuration file. The metadata database URL must be configured.
Environment Variables
| Variable | Description |
|---|---|
| AMP_NODE_ID | Sets the worker node ID (equivalent to --node-id) |
| AMP_CONFIG | Path to the configuration file (equivalent to --config) |
Directory Configuration
ampd worker requires --config (or the AMP_CONFIG environment variable), and --node-id is mandatory. When the config file does not specify them, default data, providers, and manifests directory paths are resolved relative to the config file's parent directory.
When the config file specifies data_dir, providers_dir, or manifests_dir, those values are used directly.
This command does not create directories itself; it relies on the configured paths and any downstream components to create or validate storage locations as needed.
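A minimal configuration sketch illustrating these settings. The TOML format, key names, and values here are assumptions for illustration; consult the actual ampd configuration schema for the exact keys:

```toml
# Hypothetical ampd configuration sketch; key names are assumptions.
metadata_db = "postgresql://amp:amp@localhost:5432/amp_metadata"

# Optional: when set, these are used directly; when omitted, defaults
# are resolved relative to this file's parent directory.
data_dir = "/var/lib/amp/data"
providers_dir = "/var/lib/amp/providers"
manifests_dir = "/var/lib/amp/manifests"
```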
Examples
Single Worker
Start a single worker with the node ID worker-01:
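A sketch of the invocation, assuming a config file at ./config.toml:

```shell
ampd worker --config config.toml --node-id worker-01
```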
Multiple Workers (Parallel Processing)
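Each worker needs its own unique node ID. A sketch running three workers in the background on one host (the config path is illustrative):

```shell
ampd worker --config config.toml --node-id worker-01 &
ampd worker --config config.toml --node-id worker-02 &
ampd worker --config config.toml --node-id worker-03 &
```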
Using Environment Variables
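The same options can be supplied via AMP_CONFIG and AMP_NODE_ID instead of flags (values illustrative):

```shell
export AMP_CONFIG=config.toml
AMP_NODE_ID=worker-01 ampd worker
```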
Geographic Distribution
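Workers on different hosts coordinate through the shared metadata database, so a region can be encoded in the node ID. The naming scheme below is purely illustrative:

```shell
# On a host in us-east:
ampd worker --config config.toml --node-id us-east-worker-01

# On a host in eu-west:
ampd worker --config config.toml --node-id eu-west-worker-01
```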
Worker Operations
Job Assignment
Workers receive job assignments through PostgreSQL LISTEN/NOTIFY. The controller assigns jobs based on:
- Worker availability (heartbeat status)
- Current workload
- Job priority and dependencies
Data Extraction
Workers execute extraction jobs by:
- Reading dataset manifests from the manifests directory
- Connecting to configured data sources (RPC, Firehose)
- Extracting blockchain data for assigned block ranges
- Writing Parquet files to the data directory
- Updating progress in the metadata database
Fault Tolerance
- Heartbeat Monitoring: Workers send heartbeat every 1 second
- Job Resumption: Jobs can resume from checkpoints on worker restart
- State Reconciliation: Periodic sync every 60 seconds ensures consistency
- Graceful Shutdown: Ctrl+C allows in-progress jobs to complete
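Graceful shutdown responds to SIGINT, so Ctrl+C in an interactive session and a signal sent to a backgrounded worker behave the same way (the PID lookup is illustrative):

```shell
# Send SIGINT (equivalent to Ctrl+C) to request a graceful shutdown:
kill -INT "$(pgrep -f 'ampd worker')"
```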
Scaling Workers
Horizontal scaling improves extraction throughput:
- Single worker: Sequential block processing
- Multiple workers: Parallel processing of block ranges
- Dynamic scaling: Add/remove workers based on load
Each worker must have a unique --node-id. Running multiple workers with the same ID will cause coordination failures.
Monitoring
Check worker status through the Admin API.
Exit Codes
- 0: Worker shut down gracefully
- Non-zero: An error occurred during worker operation, or the configuration is invalid
See Also
- ampd overview - Command overview and global options
- ampd controller - Job scheduling and coordination
- ampd solo - Development mode with embedded worker