Amp runs in one of two operational modes, and its components can be combined into several deployment patterns. Understanding these modes is essential for choosing the right architecture for your use case.

Operational Modes

Amp supports two primary operational modes:

Solo Mode

Single-process mode combining all components for local development and testing

Distributed Mode

Separate server, worker, and controller components for production deployments

Core Components

Amp provides several commands that can be combined into different deployment patterns:

Server

Query server providing Arrow Flight and JSON Lines interfaces for data access.
  • Port 1602: Arrow Flight (gRPC) - high-performance binary queries
  • Port 1603: JSON Lines (HTTP) - simple query interface
  • Use case: Query serving without extraction workers
ampd server

Worker

Standalone worker process for executing scheduled extraction jobs.
  • Coordinates via metadata database
  • Executes dump jobs and writes Parquet files
  • Supports horizontal scaling
  • Requires: --node-id for unique identification
ampd worker --node-id worker-01
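
Because each worker needs a unique --node-id, it can help to generate IDs and check them for duplicates before starting anything. The following is a minimal sketch under assumptions: the helper names make_node_id and duplicate_node_ids are hypothetical, and the zero-padded "prefix-NN" scheme simply mirrors the worker-01 example above rather than anything ampd enforces. The loop only prints commands; it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: build a zero-padded worker ID such as worker-01.
# The naming scheme is an assumption taken from the example above.
make_node_id() {
    # $1 = prefix, $2 = index (plain decimal, no leading zeros)
    printf '%s-%02d\n' "$1" "$2"
}

# Hypothetical helper: print every node ID that occurs more than once.
# Empty output means all IDs passed as arguments are unique.
duplicate_node_ids() {
    printf '%s\n' "$@" | sort | uniq -d
}

# Dry run: print (do not execute) the worker commands for three nodes.
for i in 1 2 3; do
    echo "ampd worker --node-id $(make_node_id worker "$i")"
done
```

A launch script could call duplicate_node_ids on its planned IDs and abort if the output is non-empty.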

Controller

Controller service providing the Admin API for job management.
  • Port 1610: Admin API (HTTP) - management operations
  • Schedules and monitors jobs
  • Tracks worker health
  • Manages dataset registry
ampd controller
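
The port assignments listed for the server and controller can be collected into a small lookup, which is convenient when wiring up health checks or client configuration. This is only a sketch: the function names and the short interface labels (flight, jsonl, admin) are hypothetical, and the ports are the ones listed on this page, so adjust them if your deployment overrides them.

```shell
#!/bin/sh
# Map a hypothetical interface label to the port listed in this document.
amp_port() {
    case "$1" in
        flight) echo 1602 ;;   # Arrow Flight (gRPC), served by ampd server
        jsonl)  echo 1603 ;;   # JSON Lines (HTTP), served by ampd server
        admin)  echo 1610 ;;   # Admin API (HTTP), served by ampd controller
        *)      echo "unknown interface: $1" >&2; return 1 ;;
    esac
}

# Print host:port for an interface; the host is whatever the caller supplies.
amp_endpoint() {
    # $1 = host, $2 = interface label
    port=$(amp_port "$2") || return 1
    printf '%s:%s\n' "$1" "$port"
}

amp_endpoint localhost flight
amp_endpoint localhost jsonl
amp_endpoint localhost admin
```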

Migrate

Run database migrations on the metadata database.
ampd migrate

Mode Selection Guide

Choose the right mode based on your deployment requirements:

Use Solo Mode When:

  • Local development and quick prototyping
  • Testing the full extract-query pipeline
  • CI/CD pipelines
  • Learning Amp capabilities

Never use solo mode for production deployments.

Use Distributed Mode When:

  • Production deployments requiring high availability
  • Resource isolation between queries and extraction
  • Horizontal scaling of extraction workers
  • Independent component failure handling
  • Multi-region deployments
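
The guidance above reduces to a simple rule: anything that is not production can use solo mode, and production uses distributed mode. As a rough sketch, a deployment script might encode that rule as follows. The function name choose_mode and the environment labels dev, test, ci, and prod are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: pick an Amp mode from an environment label.
choose_mode() {
    # $1 = environment label (dev, test, ci, or prod; labels are assumed)
    case "$1" in
        dev|test|ci) echo solo ;;          # local work, pipelines, learning
        prod)        echo distributed ;;   # never solo in production
        *)           echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

echo "ci   -> $(choose_mode ci)"
echo "prod -> $(choose_mode prod)"
```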

Architecture Differences

Solo Mode Architecture

┌──────────────────────────────────────────┐
│ ampd solo                                │
│ ┌──────────────┐ ┌────────────────────┐  │
│ │Server        │ │ Controller         │  │
│ │- Flight      │ │ - Admin API        │  │
│ │- JSON Lines  │ │                    │  │
│ └──────────────┘ └────────────────────┘  │
│ ┌──────────────┐                         │
│ │ Worker       │                         │
│ │ (embedded)   │                         │
│ └──────────────┘                         │
└──────────────────────────────────────────┘

    ├─ PostgreSQL (metadata)
    └─ Object Store (parquet files)
Characteristics:
  • Single process
  • All components share resources
  • Fixed worker node ID “worker”
  • No fault isolation
  • Resource contention between queries and extraction

Distributed Mode Architecture

┌────────────────────┐   ┌──────────────────┐
│ampd server         │   │ampd controller   │
│┌──────────────────┐│   │┌────────────────┐│
││Server            ││   ││Controller      ││
││- Flight          ││   ││- Admin API     ││
││- JSON Lines      ││   │└────────────────┘│
│└──────────────────┘│   └──────────────────┘
└────────────────────┘            │
         │                        │
         │               ┌──────────────────┐
         │               │ampd worker       │
         │               │┌────────────────┐│
         │               ││Worker-1        ││
         │               │└────────────────┘│
         │               └──────────────────┘
         │               ┌──────────────────┐
         │               │ampd worker       │
         │               │┌────────────────┐│
         │               ││Worker-2        ││
         │               │└────────────────┘│
         │               └──────────────────┘
         │                      │
         └──────────────────────┘

         ├─ PostgreSQL (metadata, coordination)
         └─ Object Store (parquet files)
Characteristics:
  • Separate processes for each component
  • Resource isolation
  • Independent scaling
  • Fault isolation (worker crash doesn’t affect queries)
  • Multiple workers for horizontal scaling

Common Deployment Patterns

Pattern 1: Local Development (Solo)

When to use:
  • Local development and testing
  • CI/CD pipelines
  • Quick prototyping
Solo mode is not suitable for production.
ampd solo

Pattern 2: Query-Only Server (Distributed)

When to use:
  • Read-only query serving
  • Datasets populated by external processes
  • Multiple query replicas for load balancing
ampd server

Pattern 3: Full Distributed (Production)

When to use:
  • Production deployments
  • Resource isolation needed
  • Horizontal scaling required
  • High availability important
# Server node
ampd server

# Controller node
ampd controller

# Worker nodes (multiple)
ampd worker --node-id worker-01
ampd worker --node-id worker-02
ampd worker --node-id worker-03
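
When the number of workers varies between environments, the command list above can be generated instead of written by hand. The sketch below only prints the commands as a dry run. The function name distributed_plan is hypothetical, and in a real deployment each line would typically run on its own node or under a process supervisor of your choice.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for one server, one controller,
# and N workers with IDs worker-01 .. worker-NN.
distributed_plan() {
    # $1 = number of workers
    echo "ampd server"
    echo "ampd controller"
    i=1
    while [ "$i" -le "$1" ]; do
        printf 'ampd worker --node-id worker-%02d\n' "$i"
        i=$((i + 1))
    done
}

# Dry run for the three-worker layout shown above.
distributed_plan 3
```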

Pattern 4: Multi-Region Distributed

When to use:
  • Global deployments with low-latency requirements
  • Geographic redundancy
  • Load distribution across regions
# Region A
ampd server
ampd controller
ampd worker --node-id us-east-1-worker

# Region B
ampd server
ampd worker --node-id eu-west-1-worker
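
The per-region layout above can be expressed as a small dry-run generator. This sketch assumes, based only on the example, that one region hosts the controller and the others run just a server and a worker. The function name region_plan and its yes/no flag are hypothetical, and the "<region>-worker" ID format is copied from the example.

```shell
#!/bin/sh
# Hypothetical helper: print the commands for one region.
region_plan() {
    # $1 = region name, $2 = "yes" if this region hosts the controller
    echo "# Region $1"
    echo "ampd server"
    if [ "$2" = "yes" ]; then
        echo "ampd controller"
    fi
    echo "ampd worker --node-id $1-worker"
}

# Dry run matching the two-region example above.
region_plan us-east-1 yes
region_plan eu-west-1 no
```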

Scaling Path

Recommended progression for growing deployments:
1. Development & Testing

Use ampd solo for local development and testing. Single machine, minimal setup. Not for production use.

2. Production Single-Region

Deploy separate ampd controller, ampd server, and ampd worker instances. Enable observability and configure compaction.

3. Scaled Distributed Extraction

Deploy multiple ampd server instances for query load balancing and multiple ampd worker instances for parallel extraction.

4. Multi-Region Production

Deploy ampd server in different regions for low-latency queries. Deploy ampd worker instances near data sources.

Next Steps

Solo Mode Setup

Get started with local development using solo mode

Distributed Deployment

Deploy Amp in distributed mode for production

Production Guide

Best practices for production deployments
