## Overview

Arrow Flight is Apache Arrow's RPC framework for high-performance data transfer over gRPC. This is the recommended transport for:

- Production workloads requiring high throughput
- Streaming queries with real-time data updates
- Applications that can consume Arrow data directly
- Use cases requiring reorg detection and watermarking
## Endpoint

- Protocol: gRPC (Arrow Flight)
- Default Port: 1602
- Default Address: `0.0.0.0:1602`
- Configuration: Set via `flight_addr` in `config.toml` or the `AMP_CONFIG_FLIGHT_ADDR` environment variable
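For reference, the corresponding `config.toml` entry might look like this (key name as documented above; the value shown is the default):

```toml
# config.toml — Flight listen address (default shown)
flight_addr = "0.0.0.0:1602"
```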
## Request Flow

- Client calls `getFlightInfo` with SQL query in `FlightDescriptor`
- Server parses SQL, builds query plan, returns `FlightInfo` with schema and ticket
- Client calls `doGet` with the ticket to retrieve results
- Server executes query and streams `FlightData` messages
- Each message contains an Arrow RecordBatch with result rows
## Query Modes

### Batch Queries

The default mode when no `SETTINGS` clause is specified. The query runs once and returns complete results.

### Streaming Queries

Continuous execution with incremental results. Enabled by adding `SETTINGS stream = true` to your SQL query.
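For example (table and column names are illustrative):

```sql
SELECT block_number, tx_count
FROM blocks
SETTINGS stream = true
```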
## Headers

Custom headers can be used to control query behavior:

| Header | Description | Values |
|---|---|---|
| `amp-stream` | Override streaming mode | `true` or `1` to enable streaming |
| `amp-resume` | Resume streaming from cursor | JSON-encoded cursor object |
## Streaming Metadata

For streaming queries, each `FlightData` message includes `app_metadata` containing block range information:

- `ranges`: Array of block ranges covered by this batch
- `ranges_complete`: `true` when this represents a watermark (ranges completed), `false` for data batches
## Client Libraries

### Python (pyarrow)
Install the PyArrow Flight SQL client (`pip install pyarrow`).

### Rust (arrow-flight)
Add to your `Cargo.toml`:
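A plausible dependency set (crate versions are illustrative; check current releases):

```toml
[dependencies]
arrow-flight = "53"                             # Flight RPC client/server types
tonic = "0.12"                                  # gRPC transport
tokio = { version = "1", features = ["full"] }  # async runtime
```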
### Amp Rust Client (amp-client)

The official Amp Rust client provides additional features like reorg detection and state management.

## Authentication
### Bearer Token (Python)
### Bearer Token (Rust)

Attach an `authorization: Bearer <token>` header to each request's gRPC metadata, mirroring the Python approach.
## Response Format

All responses are returned as Apache Arrow RecordBatches with the following characteristics:

- Columnar format: Data organized in columns for efficient processing
- Typed schema: Each column has a specific Arrow data type
- Zero-copy: Efficient memory usage and serialization
- Streaming: Results can be processed incrementally
## Limitations
- Arrow format only: Results are always Apache Arrow RecordBatches (use JSONL endpoint for JSON output)
- gRPC transport: Requires gRPC client library support
- Schema required: Client must handle Arrow schema for data interpretation
## Performance Considerations

### Message Size Limits
The default gRPC max decoding message size is 32 MiB. For large result sets, you may need to increase this:

### Connection Pooling
Reuse the same client instance for multiple queries to avoid connection overhead.

## Error Handling
### Invalid SQL

If the SQL query is malformed, the server returns a gRPC error during `getFlightInfo`:
### Streaming Errors

Errors during query execution are returned in the `doGet` stream:
## See Also
- JSON Lines Protocol - Simple HTTP alternative
- SQL Basics - SQL syntax and features
- Streaming Queries - Guide to streaming query execution