Overview

Providers are external data source configurations that enable datasets to connect to blockchain networks. They abstract connection details from dataset definitions, allowing reusable, shareable configurations across multiple datasets. The provider system supports multiple blockchain protocols:
  • EVM RPC - Ethereum-compatible chains via JSON-RPC
  • Firehose - High-throughput gRPC streaming from StreamingFast
  • Solana - Solana blockchain via RPC + Old Faithful archive

Key Concepts

Provider Components

Provider - A named configuration representing a connection to a blockchain data source:
  • Stored as TOML files in providers_dir
  • Contains endpoint URLs, credentials, and connection settings
  • Matched to datasets by kind and network
Provider Kind - The protocol type:
  • evm-rpc - JSON-RPC for EVM-compatible chains
  • firehose - StreamingFast Firehose gRPC protocol
  • solana - Solana RPC with Old Faithful archive support
Network - The blockchain network identifier:
  • mainnet, goerli, sepolia (Ethereum)
  • base, polygon, arbitrum, optimism (L2s and other chains)
  • Custom network names for private chains
Provider Resolution - Automatic matching process:
  1. Dataset requests provider by (kind, network) tuple
  2. System finds all matching providers
  3. Providers are shuffled for load balancing
  4. Environment variable substitution applied
  5. First successful connection is used

Architecture

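Providers decouple dataset definitions from concrete data sources. A dataset names only a kind and a network, and whichever provider matches that pair supplies the endpoint URL, credentials, and connection settings.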

Benefits of Provider System

| Benefit | Description |
| --- | --- |
| Reusability | Multiple datasets share the same provider configuration |
| Flexibility | Switch endpoints without modifying dataset manifests |
| Load Balancing | Random selection among matching providers distributes load |
| Security | Credentials isolated from dataset definitions in environment variables |
| Environment-Specific | Different providers for dev/staging/production |

Provider Resolution Flow

When a dataset needs a provider, resolution proceeds in these steps:
  1. Match by criteria - Filter providers by kind and network
  2. Shuffle providers - Randomize order for load distribution
  3. Substitute variables - Replace ${VAR} with environment values
  4. Test connection - Attempt to connect to blockchain endpoint
  5. Return on success - First successful provider is used
  6. Fail if all fail - Error if no provider connects successfully
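The six steps above can be sketched in a few lines of Python. This is an illustration only, not Amp's implementation: the function names, the dict-based provider shape, and the `try_connect` callback are hypothetical, and treating an unset variable as a failed candidate is an assumption the text does not confirm.

```python
import os
import random
import re

# Matches ${VAR} placeholders such as ${ALCHEMY_API_KEY}.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, env):
    """Replace every ${VAR} in value with env[VAR]; KeyError if VAR is unset."""
    return _VAR.sub(lambda m: env[m.group(1)], value)


def resolve_provider(providers, kind, network, try_connect, env=None, rng=None):
    """Return the first matching provider config that connects (hypothetical helper).

    providers:   list of dicts, one per parsed provider TOML file
    try_connect: callable(config) -> bool, stands in for the real connection test
    """
    env = os.environ if env is None else env
    rng = rng or random.Random()

    # 1. Match by criteria.
    candidates = [p for p in providers if p["kind"] == kind and p["network"] == network]
    # 2. Shuffle for load distribution.
    rng.shuffle(candidates)

    failures = []
    for provider in candidates:
        try:
            # 3. Substitute ${VAR} in every string field.
            config = {
                key: substitute_env(val, env) if isinstance(val, str) else val
                for key, val in provider.items()
            }
        except KeyError as missing:
            failures.append(f"unset variable {missing}")
            continue
        # 4. Test the connection; 5. return on the first success.
        if try_connect(config):
            return config
        failures.append("connection failed")

    # 6. Nothing matched, or every candidate failed.
    raise LookupError(f"no usable provider for ({kind}, {network}): {failures}")
```

Because the candidates are shuffled before the loop, repeated calls spread across the matching providers, and the same loop gives failover when one of them is down.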

Provider Types

EVM RPC Provider

Connects to Ethereum-compatible chains via standard JSON-RPC.
Configuration:
# providers/alchemy-mainnet.toml
kind = "evm-rpc"
network = "mainnet"
url = "${ETH_MAINNET_RPC_URL}"

# Optional: Performance tuning
concurrent_request_limit = 512
rpc_batch_size = 100
rate_limit_per_minute = 1000
fetch_receipts_per_tx = false
Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | Yes | Must be "evm-rpc" |
| network | string | Yes | Network identifier (mainnet, base, polygon, etc.) |
| url | string | Yes | RPC endpoint URL (http/https/ws/wss/ipc) |
| concurrent_request_limit | number | No | Max concurrent requests (default: 1024) |
| rpc_batch_size | number | No | Requests per batch, 0 = disabled (default: 0) |
| rate_limit_per_minute | number | No | Rate limit in requests/minute |
| fetch_receipts_per_tx | boolean | No | Use per-tx receipt fetching (default: false) |
Supported URL schemes:
| Scheme | Type | Use Case |
| --- | --- | --- |
| http://, https:// | HTTP | Standard RPC endpoints |
| ws://, wss:// | WebSocket | Persistent connections |
| ipc:// | IPC Socket | Local node connections |
Examples:
# HTTP endpoint with API key
url = "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"

# WebSocket endpoint
url = "wss://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"

# Local IPC socket
url = "ipc:///home/user/.ethereum/geth.ipc"
Receipt fetching strategies:
Bulk receipts (default, fetch_receipts_per_tx = false):
  • Uses eth_getBlockReceipts for all receipts at once
  • Faster but requires RPC support
  • Not all endpoints support this method
Per-transaction receipts (fetch_receipts_per_tx = true):
  • Uses eth_getTransactionReceipt for each transaction
  • Slower but more compatible
  • Works with all standard RPC endpoints
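To make the difference concrete, the sketch below builds the JSON-RPC request bodies each strategy would send for one block. The helper name and the request `id` scheme are hypothetical; only the two method names come from the text above, and how Amp actually issues them is not shown here.

```python
def receipt_requests(block_number, tx_hashes, fetch_receipts_per_tx):
    """Build JSON-RPC request objects for one block's receipts (hypothetical helper)."""
    if fetch_receipts_per_tx:
        # One request per transaction: slower, but widely supported.
        return [
            {"jsonrpc": "2.0", "id": i, "method": "eth_getTransactionReceipt", "params": [tx_hash]}
            for i, tx_hash in enumerate(tx_hashes)
        ]
    # One request for the whole block: faster, but the endpoint must support it.
    return [
        {"jsonrpc": "2.0", "id": 0, "method": "eth_getBlockReceipts", "params": [hex(block_number)]}
    ]
```

For a block with 200 transactions, the per-transaction strategy produces 200 requests where the bulk strategy produces one, which is where the speed difference comes from.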
Extracted tables:
  • blocks - Block headers
  • transactions - Transaction data
  • logs - Event logs
Enable batching (rpc_batch_size = 100) to reduce HTTP overhead when extracting historical data.
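A JSON-RPC batch is an array of request objects sent in a single HTTP call. The sketch below shows how a list of block requests could be grouped according to rpc_batch_size; the helper names are hypothetical and the grouping logic is an assumption about what the setting implies, not Amp's code.

```python
def block_request(number):
    """JSON-RPC request for one block, including full transaction objects."""
    return {
        "jsonrpc": "2.0",
        "id": number,
        "method": "eth_getBlockByNumber",
        "params": [hex(number), True],
    }


def to_payloads(requests, rpc_batch_size):
    """Group requests into HTTP payloads (hypothetical helper).

    With rpc_batch_size = 0 batching is disabled and each payload is a single
    request object; otherwise each payload is a list of up to rpc_batch_size
    request objects.
    """
    if rpc_batch_size <= 0:
        return list(requests)
    return [
        requests[start:start + rpc_batch_size]
        for start in range(0, len(requests), rpc_batch_size)
    ]
```

With 250 block requests and rpc_batch_size = 100, this yields three HTTP payloads instead of 250.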

Firehose Provider

Connects to StreamingFast’s Firehose for high-throughput gRPC streaming.
Configuration:
# providers/firehose-mainnet.toml
kind = "firehose"
network = "mainnet"
url = "${FIREHOSE_ETH_MAINNET_URL}"

# Optional: Authentication
token = "${FIREHOSE_ETH_MAINNET_TOKEN}"
Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | Yes | Must be "firehose" |
| network | string | Yes | Network identifier (mainnet, base, etc.) |
| url | string | Yes | Firehose gRPC endpoint URL |
| token | string | No | Bearer token for authentication |
Connection features:
| Feature | Description |
| --- | --- |
| Gzip Compression | Both send and receive compressed |
| Large Messages | Up to 100 MiB message size |
| Auto Retry | 5-second backoff on stream errors |
| TLS | Native TLS with system roots |
Authentication: When a token is provided, it’s sent as a bearer token:
authorization: bearer <token>
Extracted tables:
  • blocks - Block header information
  • transactions - Transaction data
  • calls - Internal call traces (Firehose-specific)
  • logs - Event logs
Firehose is significantly faster than RPC for bulk extraction because it streams from the server and uses an optimized binary format, which makes it well suited to syncing large block ranges.

Solana Provider

Connects to the Solana blockchain using a two-stage approach: historical data from the Old Faithful archive and real-time data from RPC.
Configuration:
# providers/solana-mainnet.toml
kind = "solana"
network = "mainnet"
rpc_provider_url = "${SOLANA_MAINNET_RPC_URL}"
of1_car_directory = "${SOLANA_OF1_CAR_DIRECTORY}"

# Archive mode: "always", "auto", or "never"
use_archive = "always"

# Optional: Performance tuning
max_rpc_calls_per_second = 50
keep_of1_car_files = false
Fields:
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| kind | string | Yes | Must be "solana" |
| network | string | Yes | Network identifier (mainnet, devnet) |
| rpc_provider_url | string | Yes | Solana RPC HTTP endpoint |
| of1_car_directory | string | Yes | Local directory for CAR file cache |
| use_archive | string | No | Archive mode: "auto", "always", or "never" (default: "always") |
| max_rpc_calls_per_second | number | No | Rate limit for RPC calls |
| keep_of1_car_files | boolean | No | Retain CAR files after processing (default: false) |
Archive modes:
| Mode | Behavior | Use Case |
| --- | --- | --- |
| "always" | Always use archive, even for recent data | Full historical extraction |
| "auto" | RPC for recent slots (<10k), archive for historical | Balanced approach |
| "never" | RPC-only mode | Demos, recent data only |
Two-stage extraction: historical slots come from the Old Faithful archive and recent slots from RPC.
Extracted tables:
  • block_headers - Slot, parent_slot, block_hash, block_height, block_time
  • transactions - Slot, tx_index, signatures, status, fee, balances
  • messages - Slot, tx_index, message fields
  • instructions - Slot, tx_index, program_id_index, accounts, data
Slot handling: Solana uses slots (~400ms intervals) rather than sequential block numbers:
  • Not every slot produces a block (skipped slots)
  • Gaps in the block sequence are therefore normal
  • Chain integrity maintained through hash-based validation
CAR files are ~745 GB per epoch, and a download takes 10+ hours on typical connections. Set use_archive = "never" when testing against recent data.
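The archive modes listed under "Archive modes" can be read as a small decision function. The sketch below is one possible interpretation: the function name is hypothetical, and reading "<10k" as "fewer than 10,000 slots behind the chain tip" is an assumption, since the table does not say what the threshold is measured against.

```python
# Threshold taken from the "<10k" in the archive modes table; interpreting it
# as distance from the chain tip is an assumption.
RECENT_SLOT_WINDOW = 10_000


def choose_source(use_archive, slot, tip_slot):
    """Pick "archive" or "rpc" for a slot (hypothetical helper)."""
    if use_archive == "always":
        return "archive"
    if use_archive == "never":
        return "rpc"
    if use_archive == "auto":
        # Recent slots go to RPC, older ones to the Old Faithful archive.
        return "rpc" if tip_slot - slot < RECENT_SLOT_WINDOW else "archive"
    raise ValueError(f"unknown use_archive mode: {use_archive!r}")
```

Under this reading, "never" avoids CAR downloads entirely, which is why it suits quick tests against recent data.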

Provider Configuration

Directory Structure

Providers are stored as TOML files in the configured providers_dir:
providers/
├── alchemy-mainnet.toml
├── alchemy-base.toml
├── infura-mainnet.toml
├── firehose-mainnet.toml
├── firehose-base.toml
├── local-geth.toml
└── solana-mainnet.toml

Environment Variables

Provider configs support environment variable substitution using ${VAR} syntax:
# providers/alchemy-mainnet.toml
kind = "evm-rpc"
network = "mainnet"
url = "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"
Environment setup:
export ALCHEMY_API_KEY="your-api-key-here"
export FIREHOSE_ETH_MAINNET_URL="grpc://firehose.example.com:9000"
export FIREHOSE_ETH_MAINNET_TOKEN="your-bearer-token"
export SOLANA_MAINNET_RPC_URL="https://api.mainnet-beta.solana.com"
export SOLANA_OF1_CAR_DIRECTORY="/data/solana/car"
Never commit credentials to version control. Always use environment variables for API keys, tokens, and sensitive URLs.
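A quick pre-flight check can catch a forgotten export before any connection is attempted. The sketch below lists the ${VAR} names a provider file references that are missing from the environment; the function name is hypothetical and the accepted variable-name pattern is an assumption.

```python
import os
import re

# Matches ${VAR} placeholders; the allowed name characters are an assumption.
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def missing_vars(config_text, env=None):
    """Return the sorted names referenced as ${VAR} but absent from env."""
    env = os.environ if env is None else env
    referenced = _VAR.findall(config_text)
    return sorted({name for name in referenced if name not in env})
```

Running it over each file in providers_dir and refusing to start when the result is non-empty gives a clearer error than a failed connection to a URL with an unsubstituted placeholder.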

Multiple Providers

Multiple providers for the same (kind, network) pair enable load balancing, failover, and environment-specific configurations.
Load balancing:
providers/
├── alchemy-mainnet.toml      # kind=evm-rpc, network=mainnet
├── infura-mainnet.toml       # kind=evm-rpc, network=mainnet
└── quicknode-mainnet.toml    # kind=evm-rpc, network=mainnet
Random selection distributes load across endpoints.
Failover: If the first provider fails to connect, the system tries the next one.
Environment-specific configs:
# Development
export PROVIDERS_DIR=./providers/dev

# Production
export PROVIDERS_DIR=./providers/prod

Provider Management

While providers are managed as files, the Admin API provides inspection capabilities:

Listing Providers

curl http://localhost:1610/providers
Returns all loaded providers with their configurations (credentials masked).

Viewing Provider Details

curl http://localhost:1610/providers/{provider_id}
Shows detailed configuration for a specific provider.
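The same two endpoints can be called from a script. The sketch below mirrors the curl commands using only the Python standard library; the helper names are hypothetical, and parsing the response as JSON is an assumption because the response format is not described here.

```python
import json
import urllib.parse
import urllib.request

ADMIN_API = "http://localhost:1610"  # address used in the curl examples


def providers_url(provider_id=None, base=ADMIN_API):
    """Build the list URL, or the detail URL when provider_id is given."""
    if provider_id is None:
        return f"{base}/providers"
    return f"{base}/providers/{urllib.parse.quote(provider_id, safe='')}"


def get_json(url, timeout=5.0):
    """Fetch a URL and decode the body. Assumption: the body is JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, get_json(providers_url()) corresponds to the listing call, and get_json(providers_url("alchemy-mainnet")) to the detail call for a provider with that hypothetical ID.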

Best Practices

Security

  • Use environment variables for credentials - Never hardcode API keys in TOML files
  • Restrict file permissions - Ensure provider files are not world-readable if they contain sensitive URLs
  • Rotate credentials regularly - Update environment variables without modifying config files

Performance

  • Enable rate limiting for public endpoints - Prevent hitting API quotas
  • Use batching for RPC providers - Set rpc_batch_size = 100 for bulk extraction
  • Configure appropriate concurrency - Balance throughput with endpoint limits

High Availability

  • Configure multiple providers - Provide fallback endpoints for critical networks
  • Use Firehose for production - Higher reliability and throughput than RPC
  • Monitor provider health - Track connection failures and switch providers if needed

Example Configurations

Production Ethereum Setup

# providers/firehose-mainnet.toml (primary)
kind = "firehose"
network = "mainnet"
url = "${FIREHOSE_ETH_MAINNET_URL}"
token = "${FIREHOSE_ETH_MAINNET_TOKEN}"

# providers/alchemy-mainnet.toml (fallback)
kind = "evm-rpc"
network = "mainnet"
url = "https://eth-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"
concurrent_request_limit = 256
rpc_batch_size = 50
rate_limit_per_minute = 600

Development with Local Node

# providers/local-geth.toml
kind = "evm-rpc"
network = "mainnet"
url = "http://localhost:8545"
concurrent_request_limit = 1024
fetch_receipts_per_tx = true

Multi-Chain Configuration

# providers/base-mainnet.toml
kind = "evm-rpc"
network = "base"
url = "https://base-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"
concurrent_request_limit = 512
rpc_batch_size = 100

# providers/polygon-mainnet.toml
kind = "evm-rpc"
network = "polygon"
url = "https://polygon-mainnet.g.alchemy.com/v2/${ALCHEMY_API_KEY}"
concurrent_request_limit = 512
rpc_batch_size = 100
