Overview
Arrow Flight is Apache Arrow’s RPC framework for high-performance data transfer. It is the recommended transport for:
- Production workloads requiring maximum throughput
- Streaming queries with real-time updates
- Applications that can consume Arrow data directly
- Large result sets with efficient memory usage
Key Features
- gRPC-based - Uses HTTP/2 for efficient multiplexing and streaming
- Columnar format - Apache Arrow’s in-memory columnar format for zero-copy reads
- Streaming support - Continuous query execution with incremental results
- Schema metadata - Rich type information included with results
Server Configuration
The Arrow Flight endpoint runs on port 1602 by default.
Default Setup
Custom Port Configuration
.amp/config.toml
Flight-Only Mode
Run only the Arrow Flight endpoint:
Request Flow
Arrow Flight uses a two-step process for query execution:
- getFlightInfo - Submit a SQL query; receive metadata and a ticket
- doGet - Use the ticket to retrieve results as streaming data
FlightInfo Metadata
The FlightInfo response contains:
- Schema - Arrow schema describing result columns and types
- Ticket - Opaque token for retrieving results via doGet
- Endpoints - List of servers that can provide the data
FlightData Streaming
Each FlightData message contains:
- RecordBatch - Columnar data in Arrow format
- app_metadata - Optional metadata (used for streaming queries)
Client Examples
Python (pyarrow)
Python’s pyarrow library provides Arrow Flight support through its pyarrow.flight module.
Installation
Basic Query
Streaming Query
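Because a streaming query keeps executing and delivering incremental results, a client that only wants a bounded amount of data has to stop reading itself. A minimal sketch, assuming `reader` is the FlightStreamReader returned by do_get() for a streaming query; take_batches is a hypothetical name, and cancel() is pyarrow's FlightStreamReader method for abandoning the stream.

```python
def take_batches(reader, max_batches: int) -> list:
    """Collect up to `max_batches` RecordBatches from a doGet stream.

    `reader` is the FlightStreamReader returned by FlightClient.do_get().
    """
    batches = []
    try:
        while len(batches) < max_batches:
            batches.append(reader.read_chunk().data)
    except StopIteration:
        # The server closed the stream before the limit was reached.
        return batches
    # Limit reached while the query may still be running: stop the stream.
    reader.cancel()
    return batches
```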
Using Headers
You can override streaming mode using the amp-stream header:
Rust (arrow-flight)
Rust applications can use the arrow-flight crate.
Dependencies
Cargo.toml
Basic Query
Query Modes
Batch Queries
Default mode - the query runs once and returns complete results:
Streaming Queries
Continuous execution with incremental results:
Headers
Arrow Flight supports custom headers for request metadata. The server recognizes the following:
| Header | Type | Description |
|---|---|---|
| amp-stream | true or 1 | Override streaming mode |
| amp-resume | JSON object | Resume streaming from a cursor |
Resume Header Example
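The amp-resume value is a JSON object, so on the client it is a serialization step. A minimal sketch with a hypothetical resume_header helper; the cursor's fields are server-defined and deliberately not assumed here, the helper only encodes whatever object it is given.

```python
import json


def resume_header(cursor: dict) -> tuple:
    """Return a (name, value) header pair asking the server to resume from `cursor`.

    The cursor's structure is defined by the server; this helper only
    serializes the JSON object it receives.
    """
    value = json.dumps(cursor, separators=(",", ":"))
    return (b"amp-resume", value.encode("utf-8"))
```

The pair can then be placed in pyarrow.flight.FlightCallOptions(headers=[...]), presumably alongside amp-stream, although whether both are required together is an assumption.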
Streaming Metadata
For streaming queries, FlightData.app_metadata contains block range information:
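A sketch of decoding that metadata on the client. It assumes app_metadata carries a UTF-8 JSON document; that encoding, and the hypothetical helper name decode_app_metadata, are not confirmed here, and the block-range fields inside are left to the caller.

```python
import json
from typing import Optional


def decode_app_metadata(raw: Optional[bytes]) -> Optional[dict]:
    """Parse one batch's app_metadata, assuming it is UTF-8 encoded JSON.

    Returns None when the batch carries no metadata.
    """
    if not raw:
        return None
    return json.loads(raw.decode("utf-8"))
```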
Verifying Connection
Check whether the Arrow Flight server is running:
Performance Tips
Use columnar processing
Arrow’s columnar format enables efficient SIMD operations. Process entire columns rather than individual rows when possible.
Batch size tuning
For streaming queries, adjust server_microbatch_max_interval to control batch sizes:
Connection pooling
Reuse Flight clients across queries to avoid connection overhead.
Schema caching
Cache Arrow schemas from FlightInfo to avoid repeated schema parsing.
Limitations
- Arrow format only - Results are always Apache Arrow RecordBatches
- gRPC transport - Requires gRPC client library support
- Schema required - Client must handle Arrow schema for data interpretation
- Binary data - Not human-readable without conversion
Next Steps
SQL Basics
Learn SQL syntax and query patterns
Streaming
Set up real-time streaming queries
JSON Lines
Alternative HTTP/JSON interface