Configuration File
Amp uses a TOML configuration file to configure both the extraction and serving of datasets. The configuration file path is specified via the AMP_CONFIG environment variable.
Solo Mode Auto-Discovery
For ampd solo, the configuration file is automatically discovered at .amp/config.toml if it exists. You can override this by passing --config <path> or setting the AMP_CONFIG environment variable.
For other commands (server, worker, controller), the --config flag or AMP_CONFIG environment variable is required.
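The lookup rules above can be sketched in a few lines of Python. This is only an illustration, not Amp's implementation: the function name `resolve_config_path` is hypothetical, the precedence of --config over AMP_CONFIG is an assumption (the text does not say which wins), and returning None when solo mode finds no file is likewise assumed.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location checked in solo mode, as described above.
DEFAULT_SOLO_CONFIG = Path(".amp") / "config.toml"


def resolve_config_path(
    command: str,
    cli_config: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
    cwd: Optional[Path] = None,
) -> Optional[Path]:
    """Return the config path to use, or None if solo mode finds no file."""
    environ = os.environ if environ is None else environ
    cwd = Path.cwd() if cwd is None else cwd

    # Assumption: an explicit --config flag takes precedence over AMP_CONFIG.
    if cli_config:
        return Path(cli_config)
    if environ.get("AMP_CONFIG"):
        return Path(environ["AMP_CONFIG"])

    # Solo mode falls back to auto-discovery; the file is optional.
    if command == "solo":
        candidate = cwd / DEFAULT_SOLO_CONFIG
        return candidate if candidate.is_file() else None

    # server, worker, and controller need an explicit path.
    raise ValueError(f"{command!r} requires --config or AMP_CONFIG")
```

For example, `resolve_config_path("server", environ={})` raises an error, while the same call with `command="solo"` returns either the discovered file or None.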
Sample Configuration
A complete sample configuration with all available options is provided in the source repository at docs/config.sample.toml. Copy this file and edit it to match your deployment requirements.
Key Configuration Directories
Amp requires three object storage directories to be configured:
- Directory where the dataset Parquet tables are stored once extracted. It can be empty initially. Supports both filesystem paths and object store URLs (S3, GCS, Azure).
- Directory containing dataset definitions (manifest JSON files). This is the input to the extraction process.
- Directory containing provider configurations for external services such as Firehose and RPC endpoints. Each provider is configured as a separate TOML file.
Service Addresses
The following optional configuration keys control the hostname and port that each service binds to:
- Arrow Flight RPC server address for high-performance binary queries.
- JSON Lines server address for HTTP-based queries.
- Admin API server address for management operations.
Environment Variable Overrides
All values in the configuration file can be overridden from the environment by prefixing the environment variable name with AMP_CONFIG_.
Top-Level Values
For top-level configuration values, use the uppercase key name with the AMP_CONFIG_ prefix.
Nested Configuration Values
For nested configuration values, use double underscores (__) to represent the nesting hierarchy.
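The two naming rules can be expressed as a small helper. This is a sketch under stated assumptions: `env_var_name` is a hypothetical function, the dotted notation for nested keys is only a convenience for the example, the key names `example_key` and `section` are placeholders rather than real Amp options, and it assumes nested segments are uppercased the same way top-level keys are.

```python
PREFIX = "AMP_CONFIG_"


def env_var_name(key_path: str) -> str:
    """Map a dotted config key path to its override variable name.

    Each segment is uppercased, and nesting levels are joined with a
    double underscore, following the rules described above.
    """
    parts = key_path.split(".")
    return PREFIX + "__".join(part.upper() for part in parts)


# Placeholder keys, not real Amp options:
print(env_var_name("example_key"))          # AMP_CONFIG_EXAMPLE_KEY
print(env_var_name("section.example_key"))  # AMP_CONFIG_SECTION__EXAMPLE_KEY
```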
Mixing Configuration File and Environment Variables
You can use a configuration file for base settings and override specific values with environment variables. This is useful for:
- Development: Use a local config file with environment-specific overrides
- Production: Store secrets in environment variables while keeping other config in files
- CI/CD: Override database URLs and object store paths per environment
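To make the layering concrete, the following sketch applies AMP_CONFIG_-prefixed variables on top of an already-parsed configuration. It is an illustration only: `apply_env_overrides` is a hypothetical helper, the keys are placeholders, and overridden values are left as strings because the text does not describe how Amp converts them to other types.

```python
import copy
from typing import Any, Dict, Mapping

PREFIX = "AMP_CONFIG_"


def apply_env_overrides(
    config: Dict[str, Any], environ: Mapping[str, str]
) -> Dict[str, Any]:
    """Return a copy of `config` with environment overrides applied."""
    merged = copy.deepcopy(config)
    for name, value in environ.items():
        # AMP_CONFIG itself (the file path) has no trailing underscore,
        # so it is not treated as an override.
        if not name.startswith(PREFIX) or name == PREFIX:
            continue
        # Double underscores separate nesting levels.
        path = name[len(PREFIX):].lower().split("__")
        node = merged
        for part in path[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = {}
                node[part] = child
            node = child
        node[path[-1]] = value  # kept as a string (conversion is assumed elsewhere)
    return merged
```

Calling it with a base of `{"section": {"a": "1", "b": "2"}}` and an environment containing `AMP_CONFIG_SECTION__B=9` changes only `section.b` and leaves the rest of the file's settings in place.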
Memory and Performance
Global memory limit, in MB, shared by all queries. A value of 0 means unlimited.
Per-query memory limit, in MB. A value of 0 means no per-query limit.
Paths where DataFusion writes temporary files when it spills to disk after a memory limit is exceeded.
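The "0 means unlimited" convention is easy to misread in code, so here is a minimal sketch of how a limit value might be interpreted. The helper `memory_limit_bytes` is hypothetical, and treating 1 MB as 1024 * 1024 bytes is an assumption; the text does not say whether Amp uses binary or decimal megabytes.

```python
from typing import Optional

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def memory_limit_bytes(limit_mb: int) -> Optional[int]:
    """Convert a limit in MB to bytes; None stands for 'unlimited'."""
    if limit_mb < 0:
        raise ValueError("memory limit must not be negative")
    if limit_mb == 0:
        return None  # 0 is the documented 'unlimited' sentinel
    return limit_mb * BYTES_PER_MB
```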
Operational Timing
Polling interval, in seconds, for new blocks during extraction.
Maximum size, in blocks, of a microbatch when dumping derived datasets.
Maximum size, in blocks, of a streaming server microbatch.
Keep-alive interval, in seconds, for the streaming server. Minimum value is 30.
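The 30-second minimum on the keep-alive interval could be checked as follows. This is a sketch, not Amp's behavior: `check_keep_alive` is a hypothetical name, and rejecting a too-small value (rather than silently raising it to 30) is an assumption.

```python
MIN_KEEP_ALIVE_SECS = 30  # minimum stated above


def check_keep_alive(seconds: int) -> int:
    """Return the interval unchanged if it meets the documented minimum."""
    if seconds < MIN_KEEP_ALIVE_SECS:
        raise ValueError(
            f"keep-alive interval must be at least {MIN_KEEP_ALIVE_SECS}s, got {seconds}s"
        )
    return seconds
```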
Logging
Logging verbosity is controlled by the AMP_LOG environment variable, not by the config file.
The RUST_LOG environment variable also applies to logging.
Configuration Validation
Amp validates the configuration file on startup and will report errors if:
- Required fields are missing
- Field types are incorrect
- Object store URLs are malformed
- Service addresses are invalid
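As an illustration of one of these checks, the sketch below validates a service address written as host:port. It is not Amp's validator: `parse_service_address` is a hypothetical helper, and the host:port string form is an assumption based on the description of service addresses as a hostname and port.

```python
from typing import Tuple


def parse_service_address(addr: str) -> Tuple[str, int]:
    """Split 'host:port' and check that the port is a valid TCP port."""
    # rpartition splits on the last colon, so '[::1]:8080' also works.
    host, sep, port_text = addr.rpartition(":")
    if not sep or not host:
        raise ValueError(f"expected host:port, got {addr!r}")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port is not a number: {port_text!r}")
    port = int(port_text)
    if port > 65535:
        raise ValueError(f"port out of range: {port}")
    return host, port
```

A value such as `localhost` (no port) or `localhost:70000` (port out of range) would be reported as invalid under this sketch.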
Next Steps
- Metadata Database: Configure PostgreSQL for metadata storage
- Storage: Set up object storage backends
- Telemetry: Configure OpenTelemetry and Grafana