Overview
Providers are external data source configurations that enable datasets to connect to blockchain networks. They abstract connection details from dataset definitions, allowing reusable, shareable configurations across multiple datasets. The provider system supports multiple blockchain protocols:
- EVM RPC - Ethereum-compatible chains via JSON-RPC
- Firehose - High-throughput gRPC streaming from StreamingFast
- Solana - Solana blockchain via RPC + Old Faithful archive
Key Concepts
Provider Components
Provider - A named configuration representing a connection to a blockchain data source:
- Stored as TOML files in providers_dir
- Contains endpoint URLs, credentials, and connection settings
- Matched to datasets by kind and network
Kind - The type of data source a provider connects to:
- evm-rpc - JSON-RPC for EVM-compatible chains
- firehose - StreamingFast Firehose gRPC protocol
- solana - Solana RPC with Old Faithful archive support
Network - The blockchain network a provider serves:
- mainnet, goerli, sepolia (Ethereum)
- base, polygon, arbitrum, optimism (L2s and other chains)
- Custom network names for private chains
Resolution - How a dataset is matched to a provider:
- Dataset requests a provider by (kind, network) tuple
- System finds all matching providers
- Providers are shuffled for load balancing
- Environment variable substitution is applied
- First successful connection is used
Architecture
Providers decouple dataset definitions from concrete data sources.
Benefits of Provider System
| Benefit | Description |
|---|---|
| Reusability | Multiple datasets share the same provider configuration |
| Flexibility | Switch endpoints without modifying dataset manifests |
| Load Balancing | Random selection among matching providers distributes load |
| Security | Credentials isolated from dataset definitions in environment variables |
| Environment-Specific | Different providers for dev/staging/production |
Provider Resolution Flow
When a dataset needs a provider, resolution proceeds through these steps:
- Match by criteria - Filter providers by kind and network
- Shuffle providers - Randomize order for load distribution
- Substitute variables - Replace ${VAR} with environment values
- Test connection - Attempt to connect to the blockchain endpoint
- Return on success - The first provider that connects is used
- Fail if all fail - Return an error if no provider connects successfully
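The resolution steps above can be sketched in a few lines of Python. This is an illustrative model only, not Amp's implementation: the `Provider` class, the `resolve` function, and the `connect` callback are hypothetical names, and treating an unset environment variable as a failed provider is an assumption.

```python
import os
import random
import re
from dataclasses import dataclass


@dataclass
class Provider:
    name: str     # hypothetical label, e.g. derived from the TOML file name
    kind: str
    network: str
    url: str


def substitute(value, env):
    """Replace each ${VAR} in value with env[VAR]; an unset variable raises KeyError."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env[m.group(1)], value)


def resolve(providers, kind, network, connect, env=os.environ):
    """Return (name, url) of the first matching provider that connects."""
    # Match by criteria: keep providers with the requested kind and network.
    candidates = [p for p in providers if (p.kind, p.network) == (kind, network)]
    # Shuffle providers so load spreads across equivalent endpoints.
    random.shuffle(candidates)
    for provider in candidates:
        try:
            # Substitute variables, then test the connection.
            url = substitute(provider.url, env)
            connect(url)
        except Exception:
            continue  # try the next candidate
        return provider.name, url  # return on first success
    # Fail if all fail (or if nothing matched).
    raise LookupError(f"no provider connected for ({kind}, {network})")
```

Here `connect` stands in for whatever connection test the real system performs; any callable that raises on failure works for the sketch.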
Provider Types
EVM RPC Provider
Connects to Ethereum-compatible chains via standard JSON-RPC.
Configuration:
| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | Must be "evm-rpc" |
| network | string | Yes | Network identifier (mainnet, base, polygon, etc.) |
| url | string | Yes | RPC endpoint URL (http/https/ws/wss/ipc) |
| concurrent_request_limit | number | No | Max concurrent requests (default: 1024) |
| rpc_batch_size | number | No | Requests per batch, 0 = disabled (default: 0) |
| rate_limit_per_minute | number | No | Rate limit in requests/minute |
| fetch_receipts_per_tx | boolean | No | Use per-tx receipt fetching (default: false) |
Supported URL schemes:
| Scheme | Type | Use Case |
|---|---|---|
| http://, https:// | HTTP | Standard RPC endpoints |
| ws://, wss:// | WebSocket | Persistent connections |
| ipc:// | IPC Socket | Local node connections |
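As a small illustration of the scheme table, the following sketch maps a provider URL to its transport type. The `transport_for` helper and the example URLs are hypothetical; rejecting unknown schemes with an error is an assumption about reasonable behavior, not documented behavior.

```python
from urllib.parse import urlparse

# Scheme-to-transport mapping taken from the table above.
TRANSPORTS = {
    "http": "HTTP",
    "https": "HTTP",
    "ws": "WebSocket",
    "wss": "WebSocket",
    "ipc": "IPC Socket",
}


def transport_for(url):
    """Return the transport type implied by the URL scheme."""
    scheme = urlparse(url).scheme
    if scheme not in TRANSPORTS:
        raise ValueError(f"unsupported URL scheme: {scheme!r}")
    return TRANSPORTS[scheme]


print(transport_for("https://rpc.example.com"))
print(transport_for("wss://rpc.example.com/ws"))
print(transport_for("ipc:///var/run/node.ipc"))
```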
Receipt fetching strategies:
Bulk receipts (fetch_receipts_per_tx = false):
- Uses eth_getBlockReceipts to fetch all receipts in a block at once
- Faster but requires RPC support
- Not all endpoints support this method
Per-transaction receipts (fetch_receipts_per_tx = true):
- Uses eth_getTransactionReceipt for each transaction
- Slower but more compatible
- Works with all standard RPC endpoints
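To make the difference between the two receipt strategies concrete, this sketch builds the JSON-RPC request objects each one would send for a single block. The `receipt_requests` helper is hypothetical and only constructs payloads; it does not reflect how Amp issues requests internally.

```python
import json


def receipt_requests(block_number, tx_hashes, fetch_receipts_per_tx=False):
    """Build the JSON-RPC requests needed to fetch one block's receipts."""
    if not fetch_receipts_per_tx:
        # Bulk mode: a single call returns every receipt in the block.
        return [{
            "jsonrpc": "2.0",
            "id": 1,
            "method": "eth_getBlockReceipts",
            "params": [hex(block_number)],
        }]
    # Per-transaction mode: one call per transaction hash.
    return [
        {
            "jsonrpc": "2.0",
            "id": i + 1,
            "method": "eth_getTransactionReceipt",
            "params": [tx_hash],
        }
        for i, tx_hash in enumerate(tx_hashes)
    ]


# Placeholder hashes for illustration only.
txs = ["0x" + "11" * 32, "0x" + "22" * 32]
print(json.dumps(receipt_requests(19_000_000, txs), indent=2))
print(len(receipt_requests(19_000_000, txs, fetch_receipts_per_tx=True)))
```

For a block with N transactions, bulk mode needs one request while per-transaction mode needs N, which is where the speed difference comes from.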
Extracted tables:
- blocks - Block headers
- transactions - Transaction data
- logs - Event logs
Firehose Provider
Connects to StreamingFast’s Firehose for high-throughput gRPC streaming.
Configuration:
| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | Must be "firehose" |
| network | string | Yes | Network identifier (mainnet, base, etc.) |
| url | string | Yes | Firehose gRPC endpoint URL |
| token | string | No | Bearer token for authentication |
Connection features:
| Feature | Description |
|---|---|
| Gzip Compression | Both send and receive compressed |
| Large Messages | Up to 100 MiB message size |
| Auto Retry | 5-second backoff on stream errors |
| TLS | Native TLS with system roots |
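The auto-retry behavior can be pictured as a reconnect loop with a fixed 5-second pause. The sketch below is a generic model under several assumptions: `open_stream` is a hypothetical callable yielding `(block_number, block)` pairs, `ConnectionError` stands in for whatever error the gRPC client raises, and resuming from the block after the last one received is a guess at sensible behavior rather than documented behavior.

```python
import time

RETRY_BACKOFF_SECS = 5  # backoff value from the features table


def stream_with_retry(open_stream, start_block, sleep=time.sleep):
    """Yield (number, block) pairs, reopening the stream after each error."""
    next_block = start_block
    while True:
        try:
            for number, block in open_stream(next_block):
                yield number, block
                next_block = number + 1  # resume point after a failure
            return  # stream ended normally
        except ConnectionError:
            # Wait, then reopen the stream from the resume point.
            sleep(RETRY_BACKOFF_SECS)
```

The `sleep` parameter exists so the loop can be exercised without real delays, for example by passing a function that only records the requested pause.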
Extracted tables:
- blocks - Block header information
- transactions - Transaction data
- calls - Internal call traces (Firehose-specific)
- logs - Event logs
Firehose is significantly faster than RPC for bulk extraction due to server-side streaming and optimized binary format. It’s ideal for syncing large block ranges.
Solana Provider
Connects to the Solana blockchain using a two-stage approach: historical data from the Old Faithful archive and real-time data from RPC.
Configuration:
| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | Must be "solana" |
| network | string | Yes | Network identifier (mainnet, devnet) |
| rpc_provider_url | string | Yes | Solana RPC HTTP endpoint |
| of1_car_directory | string | Yes | Local directory for CAR file cache |
| use_archive | string | No | Archive mode: "auto", "always", or "never" (default: "always") |
| max_rpc_calls_per_second | number | No | Rate limit for RPC calls |
| keep_of1_car_files | boolean | No | Retain CAR files after processing (default: false) |
Archive modes:
| Mode | Behavior | Use Case |
|---|---|---|
| "always" | Always use archive, even for recent data | Full historical extraction |
| "auto" | RPC for recent slots (<10k), archive for historical | Balanced approach |
| "never" | RPC-only mode | Demos, recent data only |
Extracted tables:
- block_headers - Slot, parent_slot, block_hash, block_height, block_time
- transactions - Slot, tx_index, signatures, status, fee, balances
- messages - Slot, tx_index, message fields
- instructions - Slot, tx_index, program_id_index, accounts, data
Slots and block numbers:
- Not every slot produces a block (skipped slots)
- Gaps in the block number sequence are normal
- Chain integrity is maintained through hash-based validation
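The three use_archive modes of the Solana provider can be read as a small decision function. The sketch below is one possible interpretation: `choose_source` is a hypothetical helper, and reading "<10k" as "fewer than 10,000 slots behind the latest slot" is an assumption, since the exact rule is not spelled out here.

```python
RECENT_SLOT_WINDOW = 10_000  # assumed meaning of "<10k" in the modes table


def choose_source(use_archive, slot, latest_slot):
    """Return "archive" or "rpc" for a slot under the given archive mode."""
    if use_archive == "always":
        return "archive"
    if use_archive == "never":
        return "rpc"
    if use_archive == "auto":
        # Recent slots come from RPC, older ones from the archive.
        is_recent = latest_slot - slot < RECENT_SLOT_WINDOW
        return "rpc" if is_recent else "archive"
    raise ValueError(f"unknown use_archive mode: {use_archive!r}")


latest = 300_000_000
print(choose_source("auto", latest - 500, latest))
print(choose_source("auto", latest - 1_000_000, latest))
```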
Provider Configuration
Directory Structure
Providers are stored as TOML files in the configured providers_dir.
Environment Variables
Provider configs support environment variable substitution using ${VAR} syntax.
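A minimal sketch of how ${VAR} substitution could work on a parsed provider config follows. The `substitute_env` helper, the `FIREHOSE_TOKEN` variable, and the endpoint URL are hypothetical, and raising an error for an unset variable is an assumption; the actual behavior in that case may differ.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env):
    """Recursively replace ${VAR} in every string of a parsed config."""
    if isinstance(value, str):
        def replace(match):
            name = match.group(1)
            if name not in env:
                raise KeyError(f"environment variable {name} is not set")
            return env[name]
        return _VAR.sub(replace, value)
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    return value  # numbers and booleans pass through unchanged


config = {
    "kind": "firehose",
    "network": "mainnet",
    "url": "https://firehose.example.com:443",
    "token": "${FIREHOSE_TOKEN}",
}
resolved = substitute_env(config, {"FIREHOSE_TOKEN": "secret-value"})
print(resolved["token"])
```

In practice the second argument would be `os.environ`; an explicit dictionary is used here so the example does not depend on the caller's environment.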
Multiple Providers
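Multiple providers for the same (kind, network) pair enable load balancing and failover: matching providers are shuffled on each resolution, and the first one that connects successfully is used.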
Provider Management
While providers are managed as files, the Admin API provides inspection capabilities.
Listing Providers
Viewing Provider Details
Best Practices
Security
Use environment variables for credentials - Never hardcode API keys in TOML files
Restrict file permissions - Ensure provider files are not world-readable if they contain sensitive URLs
Rotate credentials regularly - Update environment variables without modifying config files
Performance
Enable rate limiting for public endpoints - Prevent hitting API quotas
Use batching for RPC providers - Set rpc_batch_size = 100 for bulk extraction
Configure appropriate concurrency - Balance throughput with endpoint limits
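To show what rpc_batch_size means in practice, this sketch groups a list of JSON-RPC requests into batches, with 0 disabling batching as described in the EVM RPC configuration table. The `batch_requests` helper is hypothetical; representing "disabled" as one request per call is an assumption about the unbatched path.

```python
def batch_requests(requests, rpc_batch_size):
    """Split requests into lists of at most rpc_batch_size; 0 disables batching."""
    if rpc_batch_size < 0:
        raise ValueError("rpc_batch_size must be >= 0")
    size = rpc_batch_size or 1  # 0 means every request is sent on its own
    return [requests[i:i + size] for i in range(0, len(requests), size)]


# 250 block requests for illustration.
requests = [
    {"jsonrpc": "2.0", "id": n, "method": "eth_getBlockByNumber",
     "params": [hex(n), True]}
    for n in range(1, 251)
]
print([len(batch) for batch in batch_requests(requests, 100)])
print(len(batch_requests(requests, 0)))
```

With a batch size of 100, the 250 requests above travel in three HTTP round trips instead of 250, which is why batching helps bulk extraction.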
High Availability
Configure multiple providers - Provide fallback endpoints for critical networks
Use Firehose for production - Higher reliability and throughput than RPC
Monitor provider health - Track connection failures and switch providers if needed
Example Configurations
Production Ethereum Setup
Development with Local Node
Multi-Chain Configuration
Related Documentation
Architecture
Understand how providers fit into Amp’s architecture
Data Flow
See how providers enable data extraction
Datasets
Learn how datasets reference providers
Configuration
Configure provider directories and settings