The vortex-datafusion crate integrates Vortex as a native DataFusion FileFormat, supporting both reads and writes with filter, projection, and limit pushdown. You can query .vortex files using standard SQL or the Rust API directly.
Setup
Add the dependency
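Add vortex-datafusion to your Cargo.toml. The examples on this page also import datafusion directly, so add it as well, using the same DataFusion version that vortex-datafusion depends on:
```toml
[dependencies]
vortex-datafusion = "<version>"
# Must match the DataFusion version that vortex-datafusion is built against.
datafusion = "<version>"
```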
Register VortexFormat with SessionContext
Build a SessionContext with the Vortex file format registered:
```rust
use std::sync::Arc;

use datafusion::common::GetExt;
use datafusion::datasource::provider::DefaultTableFactory;
use datafusion::execution::SessionStateBuilder;
use datafusion::prelude::SessionContext;
use vortex_datafusion::VortexFormatFactory;

let factory = Arc::new(VortexFormatFactory::new());
let state = SessionStateBuilder::new()
    .with_default_features()
    // Makes `STORED AS vortex` available in CREATE EXTERNAL TABLE.
    .with_table_factory(
        factory.get_ext().to_uppercase(),
        Arc::new(DefaultTableFactory::new()),
    )
    // Registers the Vortex file format itself.
    .with_file_formats(vec![factory])
    .build();
// `enable_url_table` allows querying files by path, e.g. SELECT * FROM '/path/to/data.vortex'.
let ctx = SessionContext::new_with_state(state).enable_url_table();
```
Reading Vortex files
SQL
Create an external table pointing at a directory of .vortex files, then query it with standard SQL:
```rust
ctx.sql(
    "CREATE EXTERNAL TABLE my_table \
     (name VARCHAR NOT NULL, age INT NOT NULL) \
     STORED AS vortex \
     LOCATION '/demo/'",
)
.await?;

let result = ctx
    .sql("SELECT name, age FROM my_table WHERE age > 28 ORDER BY age")
    .await?
    .collect()
    .await?;
```
You can also query individual files directly using the URL table syntax, which is available because the context was created with enable_url_table():
```sql
SELECT * FROM '/path/to/data.vortex'
```
Rust API
Register a ListingTable programmatically using VortexFormat:
```rust
use std::sync::Arc;

use datafusion::datasource::listing::{
    ListingOptions, ListingTable, ListingTableConfig, ListingTableUrl,
};
use datafusion::prelude::SessionContext;
use vortex::session::VortexSession;
use vortex_datafusion::VortexFormat;

let session = VortexSession::default().with_tokio();
let ctx = SessionContext::new();
let format = Arc::new(VortexFormat::new(session));

let table_url = ListingTableUrl::parse("/path/to/data.vortex")?;
let config = ListingTableConfig::new(table_url)
    .with_listing_options(
        ListingOptions::new(format).with_session_config_options(ctx.state().config()),
    )
    .infer_schema(&ctx.state())
    .await?;

let listing_table = Arc::new(ListingTable::try_new(config)?);
ctx.register_table("vortex_tbl", listing_table as _)?;
```
Writing Vortex files
Write rows to Vortex files with INSERT INTO on an external table. The inserted rows can come from a VALUES list, as here, or from any SELECT query:
```rust
ctx.sql(
    "CREATE EXTERNAL TABLE my_table \
     (name VARCHAR NOT NULL, age INT NOT NULL) \
     STORED AS vortex \
     LOCATION '/demo/'",
)
.await?;

ctx.sql(
    "INSERT INTO my_table VALUES \
     ('Alice', 30), ('Bob', 25), ('Charlie', 35), ('Diana', 28)",
)
.await?
.collect()
.await?;
```
Partitioned writes are supported. When writing to an external table declared with PARTITIONED BY, DataFusion automatically creates one subdirectory per partition value, named Hive-style as column=value.
Pushdown support
Filters, projections, and limits are pushed down into the Vortex scan so that only the data needed by your query is read and decompressed.
| Pushdown type | Details |
|---|---|
| Projections | Only referenced columns are read and decompressed. |
| Filters | Comparison (=, <, >), logical (AND, OR, NOT), IN, LIKE, IS NULL, and cast expressions are evaluated during the scan. Unsupported filters fall back to DataFusion post-scan evaluation. |
| Limits | Applied at the scan level when no filter is present. |
| File pruning | Files are eliminated without being opened based on partition values and file-level column statistics (min/max). |
Querying with pushdown
```rust
// Only `name` and `age` columns are read; the WHERE clause is pushed into the scan.
let result = ctx
    .sql("SELECT name, age FROM my_table WHERE age > 28 ORDER BY age")
    .await?
    .collect()
    .await?;
```