Overview
Amp uses PostgreSQL to store metadata about datasets, jobs, workers, files, and extraction progress. The metadata database enables coordination across distributed components and provides transactional consistency for dataset operations.
Configuration
The metadata database is configured in the [metadata_db] section of the config file.
Configuration Fields
- Connection URL: Required for all commands (server, worker, controller, migrate) and can be overridden with an environment variable. Solo mode provides a managed PostgreSQL URL by default; you can optionally supply your own URL to use an external database instead.
- Pool size (pool_size): Size of the database connection pool, that is, how many concurrent connections Amp maintains to PostgreSQL.
- Automatic migrations: When enabled, Amp applies any pending schema migrations on startup, before starting services. Set to false if you want to run migrations manually with ampd migrate.
Connection String Format
PostgreSQL connection strings follow the standard libpq URI format:
postgresql://[user[:password]@][host][:port][/dbname][?param=value&...]
Examples
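The sketch below assembles connection URLs in Python and percent-encodes the user name, password, and database name, which is required when they contain reserved characters such as @ or /. The helper name, credentials, and hosts are hypothetical.

```python
from urllib.parse import quote, urlencode


def build_pg_url(user, password, host, port, dbname, **params):
    """Assemble a postgresql:// URL, percent-encoding credentials and database name."""
    auth = f"{quote(user, safe='')}:{quote(password, safe='')}@"
    query = f"?{urlencode(params)}" if params else ""
    return f"postgresql://{auth}{host}:{port}/{quote(dbname, safe='')}{query}"


# Local development database
local_url = build_pg_url("amp", "amp", "localhost", 5432, "amp_metadata")
# Remote database whose password contains reserved characters
remote_url = build_pg_url(
    "amp", "p@ss/word", "db.example.com", 5432, "amp_metadata", sslmode="require"
)
print(local_url)   # postgresql://amp:amp@localhost:5432/amp_metadata
print(remote_url)  # postgresql://amp:p%40ss%2Fword@db.example.com:5432/amp_metadata?sslmode=require
```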
Connection Parameters
Common query parameters for the connection URL:
- sslmode: SSL/TLS mode (disable, require, verify-ca, verify-full)
- connect_timeout: Connection timeout in seconds
- application_name: Application name reported to the server (visible in pg_stat_activity and, if configured, in PostgreSQL logs)
- host: Unix socket directory path
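To catch typos before Amp tries to connect, a URL's query parameters can be checked up front. The sketch below is a hypothetical pre-flight check, not part of Amp. It accepts the four sslmode values listed above plus libpq's "allow" and "prefer".

```python
from urllib.parse import parse_qs, urlsplit

# The four sslmode values listed above plus libpq's "allow" and "prefer".
VALID_SSLMODES = {"disable", "allow", "prefer", "require", "verify-ca", "verify-full"}


def inspect_pg_url(url):
    """Return the URL's query parameters, rejecting obviously invalid values."""
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # parse_qs returns lists; keep the last value if a key repeats
    params = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    sslmode = params.get("sslmode")
    if sslmode is not None and sslmode not in VALID_SSLMODES:
        raise ValueError(f"invalid sslmode: {sslmode!r}")
    timeout = params.get("connect_timeout")
    if timeout is not None and not timeout.isdigit():
        raise ValueError("connect_timeout must be a whole number of seconds")
    return params


# Local connection over a Unix socket directory (no host in the authority part)
print(inspect_pg_url("postgresql:///amp_metadata?host=/var/run/postgresql&application_name=ampd"))
# {'host': '/var/run/postgresql', 'application_name': 'ampd'}
```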
Solo Mode Auto-Managed PostgreSQL
ampd solo automatically manages a local PostgreSQL instance with zero configuration:
- Uses system-installed PostgreSQL binaries (initdb, postgres)
- Initializes a database cluster at .amp/metadb/ on first run
- Connects via Unix socket (no TCP port conflicts)
- Data persists across restarts
- Shuts down gracefully with Ctrl+C or SIGTERM
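Because solo mode relies on system-installed binaries, it can help to check for them before the first run. The sketch below is a hypothetical pre-flight check. It assumes .amp/metadb/ is resolved relative to the working directory and is itself the PostgreSQL data directory. Note that shutil.which only searches PATH, and some distributions install these binaries outside it.

```python
import shutil
from pathlib import Path

REQUIRED_BINARIES = ("initdb", "postgres")


def check_solo_prerequisites(project_dir="."):
    """Report missing PostgreSQL binaries and whether .amp/metadb/ looks initialized."""
    # shutil.which only searches PATH, so binaries installed elsewhere show as missing
    missing = [name for name in REQUIRED_BINARIES if shutil.which(name) is None]
    # Assumption: .amp/metadb/ is the data directory itself
    metadb = Path(project_dir) / ".amp" / "metadb"
    # initdb writes a PG_VERSION file at the top of every data directory it creates
    initialized = (metadb / "PG_VERSION").is_file()
    return {"missing_binaries": missing, "cluster_initialized": initialized}


print(check_solo_prerequisites())
```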
Using External Database in Solo Mode
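To use an external database instead of the auto-managed instance, provide your own PostgreSQL connection URL (in the [metadata_db] section or through its environment variable override).
Database Setup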
1. Create Database
Create a dedicated database for Amp metadata.
2. Run Migrations
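Migrations can be applied automatically or manually. Automatic migration on startup is the default; to apply migrations manually instead, turn automatic migrations off and run ampd migrate.
3. Verify Connection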
Test the database connection.
Database Migrations
Amp uses SQL migrations to manage the metadata database schema. Migrations are:
- Sequential: Applied in order by ID
- Idempotent: Safe to run multiple times
- Tracked: Status stored in the _amp_migrations table
- Versioned: Each Amp version includes its required migrations
Running Migrations
The ampd migrate command applies pending migrations as follows:
- Connects to PostgreSQL metadata database
- Checks which migrations have been applied
- Runs all unapplied migrations in order
- Updates the _amp_migrations tracking table
- Exits with a success or error status
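The steps above amount to a small bookkeeping loop. The sketch below models it in Python, using the standard-library sqlite3 module as a stand-in for PostgreSQL so that it runs without a server. Both databases support transactional DDL, so each migration and its tracking row commit or roll back together. The migration list and the single-column _amp_migrations layout are hypothetical, not Amp's actual schema.

```python
import sqlite3

# Hypothetical migrations as (id, SQL) pairs; real ones ship with each Amp release.
MIGRATIONS = [
    (1, "CREATE TABLE datasets (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"),
    (2, "CREATE TABLE jobs (id INTEGER PRIMARY KEY, dataset_id INTEGER REFERENCES datasets(id))"),
    (3, "ALTER TABLE jobs ADD COLUMN status TEXT"),
]


def migrate(conn, migrations):
    """Apply unapplied migrations in ID order and return the IDs applied in this run."""
    conn.execute("CREATE TABLE IF NOT EXISTS _amp_migrations (id INTEGER PRIMARY KEY)")
    applied = {row[0] for row in conn.execute("SELECT id FROM _amp_migrations")}
    newly_applied = []
    for mig_id, sql in sorted(migrations):
        if mig_id in applied:
            continue  # already recorded, so reruns are no-ops
        conn.execute("BEGIN")
        try:
            conn.execute(sql)
            conn.execute("INSERT INTO _amp_migrations (id) VALUES (?)", (mig_id,))
            conn.execute("COMMIT")
        except Exception:
            conn.execute("ROLLBACK")  # keep neither the schema change nor the record
            raise  # stop at the first failure, like a non-zero exit status
        newly_applied.append(mig_id)
    return newly_applied


# isolation_level=None disables sqlite3's implicit transactions so BEGIN/COMMIT are explicit
conn = sqlite3.connect(":memory:", isolation_level=None)
print(migrate(conn, MIGRATIONS))  # [1, 2, 3]
print(migrate(conn, MIGRATIONS))  # []
```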
Migration Safety
Best practices:
- Back up first: Take a full database backup before upgrading
- Test migrations: Run migrations in a staging environment first
- Check release notes: Review migration notes in release documentation
- Monitor logs: Watch for errors during migration
- Verify completion: Check migration status after running
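One way to enforce the backup-first practice is to script the upgrade so that migrations never run unless a backup has just succeeded. The sketch below builds the pg_dump and ampd migrate commands and runs them in order, stopping at the first failure. It assumes pg_dump and ampd are on PATH and that ampd migrate finds its configuration the same way the other ampd commands do. The helper names and the dump path are hypothetical.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def upgrade_commands(db_url, backup_dir="backups"):
    """Return the backup and migration commands, in the order they must run."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    dump_file = Path(backup_dir) / f"amp-metadata-{stamp}.dump"
    return [
        # Custom-format dump that pg_restore can restore if the upgrade goes wrong.
        # Keep the password out of db_url (use ~/.pgpass or PGPASSWORD) so it
        # does not appear in the process list.
        ["pg_dump", "--format=custom", f"--file={dump_file}", f"--dbname={db_url}"],
        ["ampd", "migrate"],
    ]


def run_upgrade(db_url, backup_dir="backups"):
    """Run the backup, then the migration; any failure raises and stops the sequence."""
    Path(backup_dir).mkdir(parents=True, exist_ok=True)  # pg_dump will not create it
    for cmd in upgrade_commands(db_url, backup_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


# Example (not executed here):
# run_upgrade("postgresql://amp@db.example.com:5432/amp_metadata?sslmode=require")
```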
Troubleshooting Migrations
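Permission errors: Ensure the database user can create, alter, and drop schema objects. In PostgreSQL this means the CREATE privilege on the target schema plus ownership of the tables that migrations alter or drop; ALTER and DROP are not separately grantable privileges and require ownership.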
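A simple way to meet these requirements is to let the Amp role own its database. If Amp connects as that role, the tables its migrations create are owned by it. The sketch below generates the corresponding SQL with safely quoted identifiers. It assumes the role already exists and that the statements are run by a superuser; the role and database names are hypothetical.

```python
def quote_ident(name):
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def ownership_setup_sql(role, database):
    """Return (statements to run on the server, statements to run inside the new database)."""
    r, d = quote_ident(role), quote_ident(database)
    on_server = [
        # The role owns the database; tables are owned by the role that creates them,
        # so if Amp connects as this role it can later alter or drop its own tables.
        f"CREATE DATABASE {d} OWNER {r};",
    ]
    in_database = [
        # Needed only if CREATE on the public schema was revoked or the schema is
        # owned by another role; harmless otherwise.
        f"GRANT CREATE ON SCHEMA public TO {r};",
    ]
    return on_server, in_database


on_server, in_database = ownership_setup_sql("amp", "amp_metadata")
print("\n".join(on_server + in_database))
```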
Database Maintenance
Connection Pooling
Adjust pool_size based on your workload.
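Assuming each Amp process keeps its own pool of up to pool_size connections, the sum across all server, worker, and controller processes must fit within the PostgreSQL server's connection limit. Once that limit is reached, new connections are rejected. The sketch below checks that budget using PostgreSQL's defaults of max_connections = 100 and superuser_reserved_connections = 3. The helper, process names, and sizes are hypothetical.

```python
def connection_headroom(pool_sizes, max_connections=100, reserved=3):
    """Return (total pooled connections, connections left for other clients).

    pool_sizes maps a process name to its pool_size. The defaults mirror
    PostgreSQL's max_connections and superuser_reserved_connections defaults.
    """
    total = sum(pool_sizes.values())
    return total, (max_connections - reserved) - total


# Hypothetical deployment: one server, one controller, two workers
deployment = {"server": 10, "controller": 5, "worker-1": 10, "worker-2": 10}
total, headroom = connection_headroom(deployment)
print(total, headroom)  # 35 62
```

A negative headroom means the pools together can request more connections than the server accepts.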
Monitoring
Monitor PostgreSQL performance.
Vacuum and Analyze
PostgreSQL requires periodic maintenance.
High Availability
For production deployments, consider:
- Replication: PostgreSQL streaming replication for read replicas
- Failover: Automatic failover with tools like Patroni or pg_auto_failover
- Backups: Regular backups with pg_dump or continuous archiving
- Monitoring: Track connection pool usage, query performance, and disk space
Security
Security checklist:
- Use strong passwords for database users
- Enable SSL/TLS for remote connections (sslmode=require at minimum; sslmode=verify-full also verifies the server certificate and hostname)
- Restrict network access with firewalls or VPCs
- Use dedicated database users with minimal privileges
- Store connection strings in environment variables, not in code
- Rotate credentials periodically
- Enable PostgreSQL audit logging for compliance
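Several checklist items can be enforced in a small startup check. The sketch below reads the connection URL from an environment variable, refuses remote hosts that do not use a TLS sslmode, and redacts the password before the URL is logged. The variable name AMP_METADATA_DB_URL is hypothetical (it is not Amp's documented override), and the notion of a "local" host is deliberately simple.

```python
import os
from urllib.parse import parse_qs, urlsplit

ENV_VAR = "AMP_METADATA_DB_URL"  # hypothetical name, not Amp's documented override
TLS_SSLMODES = {"require", "verify-ca", "verify-full"}
LOCAL_HOSTS = {"", "localhost", "127.0.0.1", "::1"}


def load_db_url(env=None):
    """Read the URL from the environment, requiring TLS for non-local hosts."""
    env = os.environ if env is None else env
    url = env.get(ENV_VAR)
    if not url:
        raise RuntimeError(f"{ENV_VAR} is not set")
    parts = urlsplit(url)
    host = parts.hostname or ""  # empty for Unix-socket URLs such as postgresql:///db
    sslmode = parse_qs(parts.query).get("sslmode", [None])[-1]
    if host not in LOCAL_HOSTS and sslmode not in TLS_SSLMODES:
        raise ValueError(f"remote host {host!r} requires sslmode=require or stricter")
    return url


def redact(url):
    """Replace the password in a connection URL so it is safe to log."""
    parts = urlsplit(url)
    if parts.password is None:
        return url
    netloc = parts.netloc.replace(f":{parts.password}@", ":****@", 1)
    return parts._replace(netloc=netloc).geturl()


# A dict stands in for the real environment here
url = load_db_url({ENV_VAR: "postgresql://amp:s3cret@db.example.com:5432/amp_metadata?sslmode=require"})
print(redact(url))  # postgresql://amp:****@db.example.com:5432/amp_metadata?sslmode=require
```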
Next Steps
Storage Configuration
Configure object storage backends
Telemetry
Set up metrics and traces