The vortex-data package provides Python bindings for reading and writing Vortex files, constructing arrays from Python objects or Apache Arrow tables, and querying data with filter and projection pushdown. It integrates with PyArrow, Pandas, and Polars out of the box.
Install vortex-data
Install the package using uv (recommended) or pip, then verify the install by checking the version.

Create an array and write it to a file
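In this step, the input to vx.array() is a list of row dicts, while Vortex stores data column by column. The plain-Python sketch below models that row-to-column pivot so the columnar layout is easy to picture. It illustrates the idea only and is not how vx.array() is implemented; rows_to_columns is a hypothetical helper, and treating a missing key as None is an assumption of this sketch.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row dicts into one list per field (hypothetical helper).

    Field order follows first appearance; a key missing from a row becomes None,
    so every column ends up with exactly one entry per row.
    """
    fields = list(dict.fromkeys(key for row in rows for key in row))
    return {field: [row.get(field) for row in rows] for field in fields}


rows = [
    {"name": "Ann", "age": 25},
    {"name": None, "age": 31},
    {"name": "Cy"},
]
columns = rows_to_columns(rows)
print(columns)  # {'name': ['Ann', None, 'Cy'], 'age': [25, 31, None]}
```

Because each field lives in its own list, a reader that needs only age can skip name entirely. That separation is what makes projection pushdown cheap in a columnar format.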
Use vx.array() to construct a Vortex array from a Python list of dicts, then write it to disk with vx.io.write().

vx.array() accepts Python lists, Apache Arrow arrays, and Arrow-compatible objects (including Pandas DataFrames and PyArrow tables). BtrBlocks compression is applied automatically when the file is written.

Read and filter with the scan API
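The scan semantics in this section (projection pushdown, filtering on a column that need not be projected, sorted non-null row indices, and a scan prepared once and executed per row range) can be modeled in plain Python over a dict of column lists. This is a conceptual sketch, not the Vortex scan API: scan_columns and PreparedScan are hypothetical names, and real scans take Vortex expressions rather than Python callables.

```python
from typing import Any, Callable, Dict, List, Optional, Sequence, Set

# A "table" here is a dict mapping column name -> list of values (one per row).
Columns = Dict[str, List[Any]]


def scan_columns(
    columns: Columns,
    projection: Sequence[str],
    filter_column: Optional[str] = None,
    predicate: Optional[Callable[[Any], bool]] = None,
    indices: Optional[Sequence[int]] = None,
    read_log: Optional[Set[str]] = None,
) -> Columns:
    """Hypothetical model of a scan with projection, filter, and row indices."""
    n_rows = len(next(iter(columns.values()))) if columns else 0
    rows: Sequence[int] = range(n_rows)
    if indices is not None:
        # Mirror the documented precondition: indices must be non-null and sorted.
        if any(i is None for i in indices):
            raise ValueError("row indices must be non-null")
        if any(a > b for a, b in zip(indices, indices[1:])):
            raise ValueError("row indices must be sorted")
        rows = indices

    def read(name: str) -> List[Any]:
        # Record every column the scan actually touches.
        if read_log is not None:
            read_log.add(name)
        return columns[name]

    # Filter pushdown: evaluate the predicate on its own column only.
    if filter_column is not None and predicate is not None:
        keys = read(filter_column)
        rows = [r for r in rows if predicate(keys[r])]

    # Projection pushdown: gather only the requested columns for surviving rows.
    result: Columns = {}
    for name in projection:
        values = read(name)
        result[name] = [values[r] for r in rows]
    return result


class PreparedScan:
    """Hypothetical model of a repeated scan: validate once, execute per row range."""

    def __init__(
        self,
        columns: Columns,
        projection: Sequence[str],
        filter_column: Optional[str] = None,
        predicate: Optional[Callable[[Any], bool]] = None,
    ) -> None:
        wanted = list(projection) + ([filter_column] if filter_column else [])
        missing = [c for c in wanted if c not in columns]
        if missing:
            raise KeyError(f"unknown columns: {missing}")
        self._args = (columns, list(projection), filter_column, predicate)

    def execute(self, start: int, stop: int) -> Columns:
        # Each call reuses the prepared settings and serves one row range.
        return scan_columns(*self._args, indices=list(range(start, stop)))


table = {
    "name": ["Ann", "Bob", "Cy", "Dee"],
    "age": [25, 31, None, 57],
    "city": ["Oslo", "Lima", "Pune", "Kyiv"],
}

log: Set[str] = set()
over_30 = scan_columns(table, ["name"], "age", lambda a: a is not None and a > 30, read_log=log)
print(over_30)      # {'name': ['Bob', 'Dee']}
print(sorted(log))  # ['age', 'name'] ("city" is never read)

picked = scan_columns(table, ["name", "age"], indices=[0, 3])
print(picked)       # {'name': ['Ann', 'Dee'], 'age': [25, 57]}

prepared = PreparedScan(table, ["name"], "age", lambda a: a is not None and a > 30)
print(prepared.execute(0, 2), prepared.execute(2, 4))  # {'name': ['Bob']} {'name': ['Dee']}
```

The read_log check shows the point of pushdown: the age column is read only to evaluate the predicate, and the city column is never touched because it is neither filtered nor projected.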
Open a Vortex file with vx.open(), then call .scan() to read it back. Scans support filter pushdown via expressions and projection pushdown via an explicit column list, so only the requested columns and rows are read from disk.

You can combine projection and filtering in a single scan. The filter columns do not need to appear in the projection: Vortex reads the filter column from disk to evaluate the predicate, then returns only the projected columns for matching rows.

Targeted random access with row indices

If you have an external index that identifies specific rows, pass a sorted, non-null vx.array of row indices to the scan so that IO for all non-matching rows is skipped.

Repeated scans

For workloads that execute the same scan many times (e.g., serving row ranges from an index), use to_repeated_scan() to prepare the scan once and then execute it many times efficiently.

Read as Arrow, Pandas, or Polars
VortexFile supports multiple output formats (Arrow, Pandas, and Polars) without extra copies.
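"Without extra copies" means the exported object refers to the same memory as the source rather than duplicating it. As a standard-library analogy only (this is not how VortexFile produces Arrow, Pandas, or Polars output), a memoryview slice shares its buffer with the original bytearray, while bytes() makes an independent copy:

```python
# Stdlib analogy for zero-copy views; this is not Vortex code.
buffer = bytearray(b"vortex")

view = memoryview(buffer)[0:3]  # a view onto the same memory, no copy made
copy = bytes(buffer[0:3])       # an independent copy of the same three bytes

buffer[0] = ord("V")            # mutate the source in place

print(bytes(view))  # b'Vor'  (the view sees the change)
print(copy)         # b'vor'  (the copy does not)
```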
Next Steps
- Rust Quickstart — use Vortex from Rust with async file IO
- Java Quickstart — Spark connector and JNI bindings
- Introduction — architecture overview and performance benchmarks