Operational Modes
Amp supports two primary operational modes:
Solo Mode
Single-process mode combining all components for local development and testing
Distributed Mode
Separate server, worker, and controller components for production deployments
Core Components
Amp provides several commands that can be combined into different deployment patterns:
Server
Query server providing Arrow Flight and JSON Lines interfaces for data access.
- Port 1602: Arrow Flight (gRPC), for high-performance binary queries
- Port 1603: JSON Lines (HTTP), a simple query interface
- Use case: query serving without extraction workers
Worker
Standalone worker process for executing scheduled extraction jobs.
- Coordinates via the metadata database
- Executes dump jobs and writes Parquet files
- Supports horizontal scaling
- Requires --node-id, a unique identifier for each worker
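Because every worker coordinates through the shared metadata database, each one needs its own --node-id. A minimal Python sketch that builds one argument list per worker and rejects duplicate IDs. It assumes the flag is passed as a separate argument (--node-id <value>); the numbered naming scheme is hypothetical and only for illustration:

```python
from typing import List


def numbered_node_ids(prefix: str, count: int) -> List[str]:
    """Hypothetical naming helper: prefix-1, prefix-2, ..., prefix-<count>."""
    return [f"{prefix}-{i}" for i in range(1, count + 1)]


def worker_commands(node_ids: List[str]) -> List[List[str]]:
    """Build one `ampd worker` argument list per node ID.

    Assumption: the flag is given as `--node-id <value>`.
    """
    if len(set(node_ids)) != len(node_ids):
        # Each worker must be uniquely identifiable in the metadata database.
        raise ValueError("worker node IDs must be unique")
    return [["ampd", "worker", "--node-id", node_id] for node_id in node_ids]


if __name__ == "__main__":
    for cmd in worker_commands(numbered_node_ids("worker-us-east", 3)):
        print(" ".join(cmd))
```

Each list can be handed to subprocess.run or a process supervisor. The duplicate check only covers one batch; it cannot see IDs already used by workers started elsewhere.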
Controller
Controller service providing the Admin API for job management.
- Port 1610: Admin API (HTTP), for management operations
- Schedules and monitors jobs
- Tracks worker health
- Manages the dataset registry
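You can check the default component ports listed above from a deployment host with a plain TCP connect. The following minimal sketch uses only the Python standard library. The hostnames are placeholders. A successful connect only shows that something is listening on the port, not that the Amp service behind it is healthy:

```python
import socket
from typing import Dict

# Default ports as documented for each component.
AMP_PORTS: Dict[str, int] = {
    "server/arrow-flight": 1602,
    "server/jsonl": 1603,
    "controller/admin-api": 1610,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timed out
        return False


def probe(hosts: Dict[str, str], timeout: float = 1.0) -> Dict[str, bool]:
    """Probe each endpoint; `hosts` maps an AMP_PORTS key to a hostname."""
    return {
        name: port_open(hosts[name], port, timeout)
        for name, port in AMP_PORTS.items()
        if name in hosts
    }


if __name__ == "__main__":
    # Placeholder hosts; in solo mode all three would be the same machine.
    print(probe({name: "localhost" for name in AMP_PORTS}))
```

In distributed mode, the server and controller entries would typically point at different hosts.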
Migrate
Run database migrations on the metadata database.
Mode Selection Guide
Choose the right mode based on your deployment requirements:
Use Solo Mode When:
Local development and quick prototyping
Testing the full extract-query pipeline
CI/CD pipelines
Learning Amp capabilities
Use Distributed Mode When:
Production deployments requiring high availability
Resource isolation between queries and extraction
Horizontal scaling of extraction workers
Independent component failure handling
Multi-region deployments
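The guide reduces to a single question: does the deployment need any distributed-mode property? The sketch below encodes both lists as data. The requirement keys are hypothetical labels chosen for this example, not Amp configuration options:

```python
from typing import Iterable

# Requirements that, per the guide, call for distributed mode.
DISTRIBUTED_REQUIREMENTS = {
    "high_availability",
    "query_extraction_isolation",
    "horizontal_worker_scaling",
    "independent_failure_handling",
    "multi_region",
}

# Situations the guide lists as a good fit for solo mode.
SOLO_USE_CASES = {"local_development", "full_pipeline_testing", "ci_cd", "learning"}


def choose_mode(requirements: Iterable[str]) -> str:
    """Return "distributed" if any requirement needs it, otherwise "solo"."""
    reqs = set(requirements)
    unknown = reqs - DISTRIBUTED_REQUIREMENTS - SOLO_USE_CASES
    if unknown:
        raise ValueError(f"unknown requirements: {sorted(unknown)}")
    return "distributed" if reqs & DISTRIBUTED_REQUIREMENTS else "solo"


print(choose_mode(["ci_cd"]))                              # solo
print(choose_mode(["local_development", "multi_region"]))  # distributed
```

A single distributed-mode requirement is enough to decide the outcome, because solo mode provides none of those properties and is not intended for production use.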
Architecture Differences
Solo Mode Architecture
- Single process
- All components share resources
- Fixed worker node ID “worker”
- No fault isolation
- Resource contention between queries and extraction
Distributed Mode Architecture
- Separate processes for each component
- Resource isolation
- Independent scaling
- Fault isolation (a worker crash doesn’t affect queries)
- Multiple workers for horizontal scaling
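The fault-isolation difference becomes concrete when you model which capabilities share a process. The following is an illustrative model of the two layouts described above, not Amp code. The process names and capability labels are assumptions:

```python
from typing import Dict, Set

# Which capabilities each OS process provides in each mode (illustrative).
SOLO_LAYOUT: Dict[str, Set[str]] = {
    "ampd solo": {"query", "extraction", "admin"},
}
DISTRIBUTED_LAYOUT: Dict[str, Set[str]] = {
    "ampd server": {"query"},
    "ampd controller": {"admin"},
    "ampd worker (node-1)": {"extraction"},
    "ampd worker (node-2)": {"extraction"},
}


def surviving_capabilities(layout: Dict[str, Set[str]], crashed: str) -> Set[str]:
    """Capabilities still provided by at least one process after `crashed` dies."""
    survivors: Set[str] = set()
    for process, caps in layout.items():
        if process != crashed:
            survivors |= caps
    return survivors


print(sorted(surviving_capabilities(SOLO_LAYOUT, "ampd solo")))  # []
print(sorted(surviving_capabilities(DISTRIBUTED_LAYOUT, "ampd worker (node-1)")))
# ['admin', 'extraction', 'query']
```

In solo mode, one crash removes every capability at once. In distributed mode, losing a worker leaves queries untouched, and a second worker keeps extraction running. The model ignores shared dependencies such as the metadata database, which both layouts still rely on.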
Common Deployment Patterns
Pattern 1: Local Development (Solo)
When to use:
- Local development and testing
- CI/CD pipelines
- Quick prototyping
Pattern 2: Query-Only Server (Distributed)
When to use:
- Read-only query serving
- Datasets populated by external processes
- Multiple query replicas for load balancing
Pattern 3: Full Distributed (Production)
When to use:
- Production deployments
- Resource isolation needed
- Horizontal scaling required
- High availability important
Pattern 4: Multi-Region Distributed
When to use:
- Global deployments with low-latency requirements
- Geographic redundancy
- Load distribution across regions
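Patterns 1 to 3 differ mainly in which ampd commands run and how many copies of each. The sketch below expresses them as process lists. The replica counts are parameters you choose, the worker node IDs are hypothetical, and configuration such as the metadata database connection is omitted. Pattern 4 is not modeled because the regional topology is not specified here:

```python
from typing import List


def plan(pattern: int, servers: int = 1, workers: int = 1) -> List[List[str]]:
    """Return the ampd invocations for deployment Patterns 1-3 (illustrative)."""
    if pattern == 1:
        # Local development: all components in a single process.
        return [["ampd", "solo"]]
    if pattern == 2:
        # Query-only: server replicas, no controller or workers.
        return [["ampd", "server"] for _ in range(servers)]
    if pattern == 3:
        # Full distributed: one controller, N servers, M uniquely named workers.
        return (
            [["ampd", "controller"]]
            + [["ampd", "server"] for _ in range(servers)]
            + [
                ["ampd", "worker", "--node-id", f"worker-{i}"]
                for i in range(1, workers + 1)
            ]
        )
    raise ValueError(f"unsupported pattern: {pattern}")


if __name__ == "__main__":
    for cmd in plan(3, servers=2, workers=3):
        print(" ".join(cmd))
```

Raising servers and workers in Pattern 3 corresponds to adding query replicas and parallel extraction capacity. Only the worker count interacts with --node-id, since every added worker needs a fresh ID.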
Scaling Path
Recommended progression for growing deployments:
Development & Testing
Use ampd solo for local development and testing: a single machine with minimal setup. Not for production use.
Production Single-Region
Deploy separate ampd controller, ampd server, and ampd worker instances. Enable observability and configure compaction.
Scaled Distributed Extraction
Deploy multiple ampd server instances for query load balancing and multiple ampd worker instances, each with a unique --node-id, for parallel extraction.
Next Steps
Solo Mode Setup
Get started with local development using solo mode
Distributed Deployment
Deploy Amp in distributed mode for production
Production Guide
Best practices for production deployments