The Firehose provider enables high-throughput blockchain data extraction via StreamingFast’s Firehose protocol. It uses gRPC streaming for efficient, continuous data delivery with full transaction traces and call data.
## Overview
Firehose is StreamingFast’s high-performance blockchain streaming protocol that provides:
- Real-time streaming: Continuous block delivery via gRPC
- Full transaction traces: Complete call trees and internal transactions
- High throughput: Optimized for large-scale data extraction
- Protocol buffers: Efficient binary serialization
Firehose extracts data into four tables:
- `blocks`: Block headers and metadata
- `transactions`: Transaction data and execution results
- `calls`: Internal call traces (function calls, contract creations, self-destructs)
- `logs`: Event logs emitted by smart contracts
Firehose provides calls table data that is not available through standard JSON-RPC endpoints.
## Configuration

### Required Fields

| Field | Type | Description |
|---|---|---|
| `kind` | string | Must be `"firehose"` |
| `network` | string | Network identifier (e.g., `mainnet`, `base`, `polygon`) |
| `url` | string | Firehose gRPC endpoint URL |
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `token` | string | none | Bearer token for authentication |
### Minimal Configuration

```toml
kind = "firehose"
network = "mainnet"
url = "${FIREHOSE_ETH_MAINNET_URL}"
```
### With Authentication

```toml
kind = "firehose"
network = "mainnet"
url = "${FIREHOSE_ETH_MAINNET_URL}"
token = "${FIREHOSE_ETH_MAINNET_TOKEN}"
```
Always use environment variables for Firehose tokens to prevent credential leakage.
## How It Works

### gRPC Streaming

Firehose uses server-side gRPC streaming for continuous data delivery:

```
Dataset Job → Provider Resolution → Firehose Client → gRPC Stream → Firehose Endpoint
                                         ↓
                                TLS + Compression
                                         ↓
                            Block Stream (with traces)
                                         ↓
                    Parquet Tables (blocks, txs, calls, logs)
```
### Connection Features

| Feature | Description |
|---|---|
| Gzip Compression | Both send and receive compressed for bandwidth efficiency |
| Large Messages | Up to 100 MiB message size for complex blocks with many calls |
| Auto Retry | 5-second backoff on stream errors for resilience |
| TLS | Native TLS with system root certificates |
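These features correspond to standard gRPC channel settings. A minimal sketch of the equivalent client-side options, assuming a grpcio client (the option keys are standard gRPC channel arguments; the endpoint in the comment is hypothetical):

```python
# Channel options mirroring the connection features above, using standard
# gRPC option keys. With grpcio installed, a TLS channel with gzip
# compression would be opened roughly as:
#   channel = grpc.secure_channel("firehose.example.com:443",
#                                 grpc.ssl_channel_credentials(),  # system roots
#                                 options=options,
#                                 compression=grpc.Compression.Gzip)
MAX_MESSAGE_BYTES = 100 * 1024 * 1024  # 100 MiB, for blocks with many calls
RETRY_BACKOFF_SECONDS = 5              # fixed backoff on stream errors

options = [
    ("grpc.max_receive_message_length", MAX_MESSAGE_BYTES),
    ("grpc.max_send_message_length", MAX_MESSAGE_BYTES),
]

print(options)
```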
### Data Flow

1. Dataset job requests a block range from the manifest
2. Provider resolution finds the matching `firehose` provider for the network
3. Client establishes a gRPC stream with authentication (if configured)
4. Firehose streams blocks with full transaction traces
5. Data is materialized as Parquet files in the data directory
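The reconnect behavior during streaming can be sketched as a loop that resumes from the last delivered block. Here `open_stream` is a hypothetical stand-in for the Firehose client; the real provider uses a fixed 5-second backoff:

```python
import time

def stream_blocks(open_stream, start_block, end_block, backoff_seconds=5.0):
    """Consume a block stream, reconnecting from the last delivered block.

    `open_stream(start, end)` is a hypothetical stand-in for the Firehose
    client; it yields block numbers and may raise on transient errors.
    """
    cursor = start_block
    delivered = []
    while cursor <= end_block:
        try:
            for block in open_stream(cursor, end_block):
                delivered.append(block)
                cursor = block + 1
        except ConnectionError:
            time.sleep(backoff_seconds)  # the provider uses a fixed 5 s backoff
    return delivered

# Demo: a stream that drops once mid-range (backoff shortened for the demo).
attempts = {"n": 0}
def flaky_stream(start, end):
    attempts["n"] += 1
    for block in range(start, end + 1):
        if attempts["n"] == 1 and block == start + 3:
            raise ConnectionError("stream reset")
        yield block

result = stream_blocks(flaky_stream, 100, 109, backoff_seconds=0)
print(result)  # all ten blocks, delivered across two connections
```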
## Authentication

When a token is provided, it's sent as a bearer token in the `authorization` header:

```
authorization: bearer <token>
```
### Example with Token

```toml
kind = "firehose"
network = "mainnet"
url = "grpc://firehose.example.com:9000"
token = "${FIREHOSE_TOKEN}"
```
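At the gRPC level, the header above is ordinary per-call metadata. A minimal sketch of building it (the stub and method names mentioned in the docstring are hypothetical):

```python
def bearer_metadata(token):
    """gRPC metadata carrying the Firehose bearer token.

    With a generated Firehose stub this would be passed per call, e.g.
    stub.Blocks(request, metadata=bearer_metadata(token)); the stub and
    method names here are hypothetical.
    """
    return [("authorization", f"bearer {token}")]

print(bearer_metadata("my-secret-token"))
```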
### Public Endpoints

Some Firehose endpoints are public and don't require authentication:

```toml
kind = "firehose"
network = "mainnet"
url = "grpc://public-firehose.example.com:9000"
# No token needed
```
## Supported Networks
Any EVM-compatible network with a Firehose endpoint:
### Ethereum and L2s

```toml
# Ethereum mainnet
network = "mainnet"

# Base L2
network = "base"

# Arbitrum One
network = "arbitrum-one"

# Optimism
network = "optimism"

# Polygon PoS
network = "polygon"
```
### Testnets

```toml
# Sepolia testnet
network = "sepolia"

# Goerli testnet (deprecated)
network = "goerli"
```
## Data Tables

### blocks

Block header information:

- `number`: Block number
- `hash`: Block hash
- `parent_hash`: Previous block hash
- `timestamp`: Block timestamp
- `miner`: Block producer address
- `size`: Block size in bytes
- `gas_limit`, `gas_used`: Gas metrics
- Additional protocol buffer fields
### transactions

Transaction data with execution results:

- `block_number`, `block_hash`: Block reference
- `transaction_index`: Position in block
- `hash`: Transaction hash
- `from`, `to`: Sender and recipient
- `value`: ETH transferred
- `gas`, `gas_price`, `gas_used`: Gas parameters
- `status`: Execution status (success/failure)
- `input`: Transaction input data
### calls

Internal call traces (unique to Firehose):

- `block_number`: Block reference
- `transaction_hash`, `transaction_index`: Transaction reference
- `call_type`: Call type (call, create, suicide, reward)
- `from`, `to`: Caller and callee addresses
- `value`: ETH transferred
- `gas`, `gas_used`: Gas consumption
- `input`, `output`: Call data and return value
- `depth`: Call depth in execution tree
The calls table provides complete transaction execution traces, including internal contract-to-contract calls, contract creations, and self-destructs.
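Because each row carries a `depth` field, the execution tree can be rebuilt from the flat table. A sketch assuming rows arrive in trace (pre-order) order; field names beyond `depth` and `call_type` are illustrative:

```python
def build_call_tree(trace):
    """Nest flat call rows (pre-order, with a `depth` field) into a tree.

    Assumes rows arrive in trace order with depth 0 as the top-level call;
    children are collected under a `subcalls` list.
    """
    root = None
    stack = []  # stack[i] holds the most recent call seen at depth i
    for row in trace:
        node = dict(row, subcalls=[])
        depth = row["depth"]
        if depth == 0:
            root = node
        else:
            stack[depth - 1]["subcalls"].append(node)
        del stack[depth:]   # unwind to this depth...
        stack.append(node)  # ...and make this node the open call there
    return root

# Illustrative rows: a top-level call with two children, one of which creates a contract
trace = [
    {"call_type": "call",   "depth": 0},
    {"call_type": "call",   "depth": 1},
    {"call_type": "create", "depth": 2},
    {"call_type": "call",   "depth": 1},
]
tree = build_call_tree(trace)
print(len(tree["subcalls"]))  # 2
```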
### logs

Event logs emitted by smart contracts:

- `block_number`, `block_hash`: Block reference
- `transaction_hash`, `transaction_index`: Transaction reference
- `log_index`: Position in block
- `address`: Contract address that emitted the log
- `topics`: Indexed event parameters (up to 4)
- `data`: Non-indexed event data
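A common query pattern is filtering the `logs` table by the first topic, which is the keccak-256 hash of the event signature. A sketch using the well-known ERC-20 `Transfer(address,address,uint256)` topic; the sample rows are fabricated for illustration:

```python
# topic0 is the keccak-256 hash of the event signature; this is the
# well-known ERC-20 Transfer(address,address,uint256) topic.
TRANSFER_TOPIC0 = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def filter_by_event(logs, topic0):
    """Select log rows whose first topic matches the event signature hash."""
    return [log for log in logs if log["topics"] and log["topics"][0] == topic0]

# Fabricated sample rows for illustration
sample_logs = [
    {"address": "0xaaaa", "topics": [TRANSFER_TOPIC0, "0x01", "0x02"], "data": "0x64"},
    {"address": "0xbbbb", "topics": ["0xdeadbeef"], "data": "0x"},
    {"address": "0xcccc", "topics": [], "data": "0x"},
]

transfers = filter_by_event(sample_logs, TRANSFER_TOPIC0)
print(len(transfers))  # 1
```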
## Protocol Buffers

Firehose uses Protocol Buffers for efficient binary serialization:

```protobuf
// Example: Block message structure
message Block {
  uint64 number = 1;
  bytes hash = 2;
  bytes parent_hash = 3;
  google.protobuf.Timestamp timestamp = 4;
  repeated Transaction transactions = 5;
  // ... additional fields
}
```
### Benefits
- Compact: Smaller message sizes than JSON
- Fast: Efficient serialization/deserialization
- Typed: Strong schema validation
- Backwards compatible: Versioned schemas
## Performance

### Throughput
Firehose is optimized for high-throughput extraction:
- Streaming: Continuous block delivery without polling
- Compression: Gzip reduces bandwidth by ~70%
- Batching: Multiple blocks per message
- Parallel: Multiple streams for different block ranges
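Parallel extraction amounts to partitioning the block range across streams. A minimal sketch of the split; the chunking policy here is an assumption, not the provider's actual scheduler:

```python
def split_range(start, end, workers):
    """Split an inclusive block range into contiguous chunks, one per stream."""
    total = end - start + 1
    base, extra = divmod(total, workers)
    ranges, cursor = [], start
    for i in range(workers):
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # more workers than blocks
        ranges.append((cursor, cursor + size - 1))
        cursor += size
    return ranges

print(split_range(0, 99, 4))  # [(0, 24), (25, 49), (50, 74), (75, 99)]
```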
### Resource Usage
- Network: ~10-50 MB/s during initial sync (compressed)
- Memory: ~500 MB per stream
- CPU: Low overhead (protocol buffer deserialization)
### Reliability
- Auto-retry: 5-second backoff on connection failures
- Resumable: Stream from specific block numbers
- Checkpointing: Progress tracked in metadata database
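Checkpoint-and-resume can be sketched with a small metadata table. The schema below is illustrative; the provider's actual metadata layout may differ:

```python
import sqlite3

def init_db(path=":memory:"):
    """Open the metadata database (schema is illustrative, not the real layout)."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "network TEXT PRIMARY KEY, last_block INTEGER)"
    )
    return db

def save_checkpoint(db, network, block):
    """Record the highest fully-materialized block for a network."""
    db.execute(
        "INSERT INTO checkpoints (network, last_block) VALUES (?, ?) "
        "ON CONFLICT(network) DO UPDATE SET last_block = excluded.last_block",
        (network, block),
    )
    db.commit()

def resume_from(db, network, default_start=0):
    """Return the block to resume streaming from after a restart."""
    row = db.execute(
        "SELECT last_block FROM checkpoints WHERE network = ?", (network,)
    ).fetchone()
    return default_start if row is None else row[0] + 1

db = init_db()
print(resume_from(db, "mainnet"))          # no checkpoint yet: 0
save_checkpoint(db, "mainnet", 18_000_000)
print(resume_from(db, "mainnet"))          # resume at 18000001
```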
## Example Configurations

### StreamingFast Hosted Firehose

```toml
kind = "firehose"
network = "mainnet"
url = "grpc://mainnet.eth.streamingfast.io:443"
token = "${STREAMINGFAST_API_KEY}"
```
### Self-Hosted Firehose

```toml
kind = "firehose"
network = "mainnet"
url = "grpc://firehose.internal.example.com:9000"
# No token for internal endpoints
```
### Base L2 Firehose

```toml
kind = "firehose"
network = "base"
url = "${FIREHOSE_BASE_URL}"
token = "${FIREHOSE_BASE_TOKEN}"
```
### Polygon Firehose

```toml
kind = "firehose"
network = "polygon"
url = "${FIREHOSE_POLYGON_URL}"
token = "${FIREHOSE_POLYGON_TOKEN}"
```
## Comparison: Firehose vs EVM RPC

| Feature | Firehose | EVM RPC |
|---|---|---|
| Protocol | gRPC streaming | JSON-RPC |
| Call traces | ✅ Yes (`calls` table) | ⚠️ Limited (`debug_trace*` only) |
| Throughput | Very high | Moderate |
| Latency | Low (streaming) | Higher (request/response) |
| Compatibility | Requires Firehose endpoint | Any JSON-RPC endpoint |
| Setup complexity | Moderate (need credentials) | Low (standard RPC) |
| Cost | Paid service | Free to paid |
Use Firehose when you need call traces or maximum throughput. Use EVM RPC for simpler setups with standard endpoints.
## Troubleshooting

### Connection Failures
- Verify the gRPC endpoint URL is correct
- Check that the endpoint is accessible from your network
- Ensure authentication token is valid and not expired
- Test connectivity with `grpcurl`:

```shell
grpcurl -H "authorization: bearer $TOKEN" firehose.example.com:443 list
```
### Stream Errors
- Check Firehose endpoint logs for errors
- Verify block range is available on the endpoint
- Ensure sufficient network bandwidth
- Monitor for rate limiting or quota issues
### Authentication Issues
- Verify token is set correctly in environment
- Check token hasn’t expired
- Ensure token has permissions for the requested network
- Test with a fresh token
### Performance Issues

- Check network bandwidth to Firehose endpoint
- Monitor CPU usage (protocol buffer deserialization)
- Verify sufficient disk I/O for Parquet writes
- Consider multiple workers for parallel extraction
## Migration from Firehose to EVM RPC
Firehose is being deprecated in favor of EVM RPC for most use cases. Consider migrating to EVM RPC providers.
To migrate from Firehose to EVM RPC:

- Create an equivalent EVM RPC provider configuration
- Update dataset manifests to use the `evm-rpc` kind
- Note: `calls` table data won't be available via standard RPC
- Re-extract data starting from the same block range
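Assuming the EVM RPC provider accepts analogous fields, the change is confined to the provider block. The `evm-rpc` field names below are illustrative, not confirmed by this document:

```toml
# Before: Firehose provider
kind = "firehose"
network = "mainnet"
url = "${FIREHOSE_ETH_MAINNET_URL}"
token = "${FIREHOSE_ETH_MAINNET_TOKEN}"

# After: EVM RPC provider (field names assumed analogous)
kind = "evm-rpc"
network = "mainnet"
url = "${ETH_MAINNET_RPC_URL}"
```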