Overview
Amp is a high-performance blockchain data platform that extracts, transforms, and serves blockchain data through SQL queries. Built on the FDAP stack (Apache Arrow Flight, DataFusion, Arrow, and Parquet), Amp provides a complete ETL pipeline for blockchain analytics and data services.
Amp combines the familiarity of SQL with the performance of Apache Arrow to deliver real-time blockchain data access at scale.
What Amp Does
Amp solves the core challenge of working with blockchain data: how to efficiently extract, transform, and query ever-growing blockchain datasets.
- Extract: Pull data from multiple blockchain sources (EVM RPC, Firehose, Solana)
- Transform: Process data using SQL queries with custom user-defined functions (UDFs)
- Store: Save as optimized Parquet files in object storage
- Serve: Query data through Arrow Flight (gRPC) and JSON Lines (HTTP) interfaces
Key Features
Multi-Source Extraction
Connect to blockchain data from various sources:
- EVM RPC: Ethereum and EVM-compatible chains via JSON-RPC endpoints
- Firehose: StreamingFast’s high-performance gRPC streaming protocol
- Solana: Solana blockchain data via RPC and archive formats
SQL-Based Transformations
Define derived datasets using standard SQL.
Dual Query Interfaces
Query your data through two complementary interfaces:
Arrow Flight
High-performance gRPC interface using Apache Arrow format. Ideal for large-scale analytics, streaming queries, and applications that consume Arrow data directly.
JSON Lines
Simple HTTP POST interface returning newline-delimited JSON. Perfect for ad-hoc queries, curl commands, and tools that support HTTP.
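As a minimal sketch of what consuming newline-delimited JSON looks like on the client side, the snippet below decodes a response body one line at a time using only the Python standard library. The sample payload and its column names (`block_num`, `hash`) are hypothetical and chosen purely for illustration; they are not taken from an actual Amp dataset.

```python
import json
from typing import Iterable, Iterator


def parse_json_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded object per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if not line:
            # Skip blank lines, e.g. a trailing newline at the end of the body.
            continue
        yield json.loads(line)


# Hypothetical response body: one JSON object per row, one row per line.
sample = '{"block_num": 1, "hash": "0xabc"}\n{"block_num": 2, "hash": "0xdef"}\n'
rows = list(parse_json_lines(sample.splitlines()))
```

Because each line is a complete JSON document, a client can start processing rows before the full response has arrived, which is the main practical advantage over a single large JSON array.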
Flexible Deployment Modes
Amp adapts to your deployment needs:
- Solo Mode (ampd solo): All-in-one process for local development and testing
- Distributed Mode: Separate server, controller, and worker components for production deployments
- Query-Only Mode: Serve data without running extraction jobs
- Multi-Region: Deploy globally with shared metadata and object storage
Production-Ready Storage
- Columnar Format: Parquet files optimized for analytical queries
- Object Storage: Support for S3, GCS, and Azure Blob Storage
- Automatic Compaction: Configurable file compaction for optimal query performance
- Garbage Collection: Automatic cleanup of obsolete files
Architecture Overview
Amp follows a distributed architecture with three core components, coordinated through a shared metadata database.
Controller
Manages job scheduling and worker coordination, and provides the Admin API for:
- Dataset registration and versioning
- Job deployment and monitoring
- Worker health tracking
- File metadata queries
Server
Provides query interfaces:
- Arrow Flight (port 1602): High-performance binary protocol over gRPC
- JSON Lines (port 1603): HTTP POST endpoint returning NDJSON
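To make the JSON Lines endpoint concrete, here is a hedged sketch that builds (but does not send) an HTTP POST request with Python's standard library. Only the port number comes from the list above. The host `localhost`, the root path `/`, and the convention of sending the SQL text as the raw request body are assumptions made for illustration, so check the API reference for the actual request shape.

```python
import urllib.request

JSONL_PORT = 1603  # JSON Lines port from the list above


def build_query_request(
    sql: str, host: str = "localhost", port: int = JSONL_PORT
) -> urllib.request.Request:
    """Build a POST request carrying a SQL query.

    Assumption: the server accepts the SQL text as the raw request body
    at the root path. Neither detail is confirmed by this page.
    """
    url = f"http://{host}:{port}/"
    return urllib.request.Request(url, data=sql.encode("utf-8"), method="POST")


req = build_query_request("SELECT 1")

# Sending it would look like this (requires a running server):
# with urllib.request.urlopen(req) as resp:
#     for raw_line in resp:
#         print(raw_line.decode("utf-8").rstrip())
```

Iterating over the response object yields one line at a time, which matches the one-row-per-line shape of NDJSON output.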
Workers
Execute extraction jobs in parallel:
- Pull data from blockchain sources
- Write Parquet files to object storage
- Update metadata database with progress
- Coordinate via PostgreSQL LISTEN/NOTIFY
Metadata Database
PostgreSQL database tracking:
- Dataset manifests and versions
- Extraction job status and progress
- File metadata and locations
- Worker heartbeats and health
Technology Stack
Amp is built on proven open-source technologies:
- Language: Rust for performance and safety
- Query Engine: Apache DataFusion for SQL execution
- Storage Format: Apache Parquet for columnar storage
- Wire Format: Apache Arrow for zero-copy data transfer
- Database: PostgreSQL for metadata and coordination
- Observability: OpenTelemetry for metrics and traces
Use Cases
Blockchain Analytics
Extract and analyze blockchain data using familiar SQL.
Data Services
Build data APIs on top of Amp’s query interfaces:
- Serve blockchain data to applications via Arrow Flight
- Export data to warehouses using Arrow format
- Create derived datasets for specific use cases
Real-Time Monitoring
Stream blockchain data using Amp’s streaming query support.
Cross-Chain Analytics
Query multiple blockchain datasets in a single SQL statement.
Next Steps
Ready to get started with Amp?
Installation
Install Amp with ampup or Nix, or build it from source
Quickstart
Start querying blockchain data in under 5 minutes