Vortex is a Rust monorepo for columnar array processing, compression encodings, and file I/O. The workspace is divided into four main groups — core crates, encodings, language bindings, and query engine integrations — each with a well-defined role and dependency boundary.
All external consumers and third-party encoding authors should depend only on the top-level vortex crate. It re-exports everything they need and shields them from internal crate boundaries that may change across releases.

The vortex Crate

The vortex crate is the single entry point for external consumers. It re-exports types from vortex-array, vortex-file, vortex-scan, and the standard encodings, and calls each encoding’s initialize function during VortexSession::default(). This single-dependency design provides:
  • A stable API surface that does not expose internal crate boundaries.
  • Freedom to refactor internal crate splits without breaking downstream code.
  • Consistent versioning across the entire encoding and file ecosystem.
Third-party encodings implement their vtables against types re-exported from vortex, and query engine integrations build on the file reading and scan APIs exposed through it.
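
The facade arrangement described above can be sketched in plain Rust. This is a minimal illustration, not Vortex's actual code: the modules `internal_array`, `internal_alp`, `internal_fsst`, and `facade`, the `Registry` and `Session` types, and the `example.*` ids are all hypothetical stand-ins for the internal crates, the `vortex` crate, and `VortexSession`.

```rust
// Hypothetical stand-in for an internal crate such as vortex-array.
mod internal_array {
    use std::collections::BTreeSet;

    /// Set of encoding ids registered with a session.
    #[derive(Default)]
    pub struct Registry {
        ids: BTreeSet<&'static str>,
    }

    impl Registry {
        pub fn register(&mut self, id: &'static str) {
            self.ids.insert(id);
        }

        pub fn contains(&self, id: &str) -> bool {
            self.ids.contains(id)
        }

        pub fn len(&self) -> usize {
            self.ids.len()
        }
    }
}

// Hypothetical stand-ins for two encoding crates. Each exposes an
// `initialize` function that registers the encoding.
mod internal_alp {
    use super::internal_array::Registry;

    pub fn initialize(registry: &mut Registry) {
        registry.register("example.alp");
    }
}

mod internal_fsst {
    use super::internal_array::Registry;

    pub fn initialize(registry: &mut Registry) {
        registry.register("example.fsst");
    }
}

// The facade: the only module downstream code imports from.
pub mod facade {
    // Re-export so consumers never name the internal module directly.
    pub use super::internal_array::Registry;

    pub struct Session {
        pub encodings: Registry,
    }

    impl Default for Session {
        fn default() -> Self {
            // Call every bundled encoding's initialize function once.
            let mut encodings = Registry::default();
            super::internal_alp::initialize(&mut encodings);
            super::internal_fsst::initialize(&mut encodings);
            Session { encodings }
        }
    }
}
```

Downstream code would write `facade::Session::default()` and `facade::Registry` only, so the internal modules can be split or merged without changing any consumer import.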

Core Crates

The core crates provide the foundation for the Vortex type system, array representation, file format, and I/O.
| Crate | Role |
| --- | --- |
| vortex-error | VortexError and VortexResult<T> types; vortex_err! / vortex_bail! macros |
| vortex-buffer | Zero-copy aligned Buffer<T> and BufferMut<T>, guaranteed aligned to T or a requested runtime alignment |
| vortex-mask | Bitmask operations for validity and selection |
| vortex-array/src/dtype | DType enum: Null, Bool, Primitive, UTF8, Binary, Struct, List, Extension |
| vortex-array/src/scalar | Single-value representations of each dtype |
| vortex-array | Core Array trait, canonical encodings, vtable dispatch system, statistics |
| vortex-expr | Expression representation and optimization |
| vortex-session | Session object holding registries for encodings, layouts, and extension types |
| vortex-io | Async I/O abstraction (local filesystem, object store, HTTP) |
| vortex-layout | Layout traits and built-in layouts (Flat, Struct, Chunked, Zoned, Dict) |
| vortex-ipc | IPC format for inter-process communication |
| vortex-flatbuffers | FlatBuffer schema definitions for arrays, layouts, footers, and IPC messages |
| vortex-file | .vortex file reading and writing via LayoutReader |
| vortex-scan | Table scan with filter and projection pushdown |

Dependency Order

The crates form a strict dependency DAG. Building from the bottom up:
```
vortex-error
  └── vortex-buffer
        └── vortex-array  (includes dtype, scalar, vtable system)
              ├── vortex-session
              ├── vortex-io
              ├── vortex-layout
              ├── vortex-file
              ├── vortex-scan
              └── encodings/*
```
vortex-session depends on vortex-array for ArraySession and DTypeSession, and vortex-io adds RuntimeSession. Crates higher in the stack are never imported by those below them.
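
To make the "strict DAG" property concrete, the sketch below transcribes the edges drawn in the diagram and derives a bottom-up build order with Kahn's algorithm. It only illustrates the idea: `EDGES` and `build_order` are hypothetical names, the edge list covers only what the diagram shows (the `encodings/*` entry is left out), and Cargo performs its own dependency resolution.

```rust
use std::collections::{BTreeMap, VecDeque};

/// (crate, dependency) pairs transcribed from the diagram.
pub const EDGES: &[(&str, &str)] = &[
    ("vortex-buffer", "vortex-error"),
    ("vortex-array", "vortex-buffer"),
    ("vortex-session", "vortex-array"),
    ("vortex-io", "vortex-array"),
    ("vortex-layout", "vortex-array"),
    ("vortex-file", "vortex-array"),
    ("vortex-scan", "vortex-array"),
];

/// Returns a bottom-up build order, or `None` if the edges contain a cycle.
pub fn build_order(edges: &[(&'static str, &'static str)]) -> Option<Vec<&'static str>> {
    // Number of not-yet-built dependencies for each crate.
    let mut pending: BTreeMap<&'static str, usize> = BTreeMap::new();
    // Crates that depend on a given crate.
    let mut dependents: BTreeMap<&'static str, Vec<&'static str>> = BTreeMap::new();
    for &(krate, dep) in edges {
        *pending.entry(krate).or_insert(0) += 1;
        pending.entry(dep).or_insert(0);
        dependents.entry(dep).or_default().push(krate);
    }

    // Start from crates that have no dependencies at all.
    let mut ready: VecDeque<&'static str> = pending
        .iter()
        .filter(|(_, count)| **count == 0)
        .map(|(&name, _)| name)
        .collect();

    let mut order = Vec::new();
    while let Some(krate) = ready.pop_front() {
        order.push(krate);
        if let Some(users) = dependents.get(krate) {
            for &user in users {
                let count = pending.get_mut(user).expect("every crate was inserted above");
                *count -= 1;
                if *count == 0 {
                    ready.push_back(user);
                }
            }
        }
    }

    // If a cycle exists, some crates never become ready.
    (order.len() == pending.len()).then_some(order)
}
```

A cycle, such as a lower crate importing a higher one, would leave some crates permanently waiting, and `build_order` would return `None`.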

Encodings

Each encoding lives in its own crate under encodings/, implements the array vtable, and registers itself with the session. The standard set is bundled into the top-level vortex crate.
| Crate | Technique |
| --- | --- |
| vortex-alp | Adaptive Lossless floating-Point compression |
| vortex-fastlanes | FastLanes bit-packing, delta, and frame-of-reference |
| vortex-fsst | Fast Static Symbol Table compression for strings |
| vortex-runend | Run-end encoding for repetitive data |
| vortex-sparse | Sparse array encoding |
| vortex-zigzag | ZigZag encoding for signed integers |
| vortex-roaring | Roaring bitmap encoding |
| vortex-dict | Dictionary encoding |
| vortex-bytebool | Byte-per-boolean encoding |
| vortex-datetime-parts | DateTime field decomposition |
| vortex-decimal-byte-parts | Decimal byte decomposition |
| vortex-sequence | Arithmetic sequence encoding |
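
For readers unfamiliar with the techniques named in the table, here is a textbook sketch of two of them, ZigZag and run-end encoding. It shows the general algorithms only and makes no claim about how vortex-zigzag or vortex-runend are implemented. The names `zigzag_encode`, `zigzag_decode`, `RunEnd`, and `run_end_encode` are hypothetical.

```rust
/// ZigZag maps signed integers to unsigned ones so that small magnitudes
/// stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
pub fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
pub fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Run-end encoding: one value per run plus the exclusive end index of that run.
pub struct RunEnd<T> {
    pub values: Vec<T>,
    pub ends: Vec<usize>,
}

pub fn run_end_encode<T: PartialEq + Clone>(data: &[T]) -> RunEnd<T> {
    let mut values: Vec<T> = Vec::new();
    let mut ends: Vec<usize> = Vec::new();
    for (i, item) in data.iter().enumerate() {
        if values.last() == Some(item) {
            // Same value as the current run: move its exclusive end forward.
            *ends.last_mut().expect("ends has one entry per value") = i + 1;
        } else {
            // A new run starts at index i.
            values.push(item.clone());
            ends.push(i + 1);
        }
    }
    RunEnd { values, ends }
}

impl<T> RunEnd<T> {
    /// Random access without decoding: binary-search for the first run
    /// whose exclusive end is greater than `index`.
    pub fn get(&self, index: usize) -> Option<&T> {
        let run = self.ends.partition_point(|&end| end <= index);
        self.values.get(run)
    }
}
```

With this sketch, the input `[7, 7, 7, 2, 2, 9]` encodes to values `[7, 2, 9]` and ends `[3, 5, 6]`, which is why run-end encoding suits repetitive data: the storage cost scales with the number of runs, not the number of elements.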

Language Bindings

Vortex exposes a tiered API surface to non-Rust languages. Each tier is a strict superset of the one below it, ranging from basic Arrow I/O (Tier 0) through native array access (Tier 2) to full plugin registration (Tier 3).
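
One way to picture the superset relationship is as an ordered enum, where a binding at one tier satisfies any requirement at that tier or lower. This is only a model of the sentence above: the `Tier` type and `covers` method are hypothetical and do not exist in Vortex, and the sketch says nothing about which capabilities Tier 1 contains.

```rust
/// Tiers in ascending order. The derived `Ord` follows declaration order.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Tier {
    /// Basic Arrow I/O.
    Tier0,
    Tier1,
    /// Native array access.
    Tier2,
    /// Full plugin registration.
    Tier3,
}

impl Tier {
    /// Each tier is a superset of the ones below it, so a binding at `self`
    /// covers a requirement exactly when `self >= required`.
    pub fn covers(self, required: Tier) -> bool {
        self >= required
    }
}
```

Under this model, a Tier 2 binding covers Tier 0 requirements, while a Tier 1 binding does not cover Tier 2.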

Python (PyO3)

vortex-python/ — Python bindings via PyO3 and Maturin. Currently near Tier 2: native expressions, array inspection, and Arrow export. Tier 3 (plugin registration) is planned.

C FFI (cbindgen)

vortex-ffi/ — C API generated via cbindgen. This is the foundation ABI for all non-Rust bindings. Not yet ABI-stable; should be statically linked. Target: Tier 2.

C++ wrapper

vortex-cxx/ — Higher-level C++ RAII wrapper around the C FFI. CMake integration. The plan is to migrate from cxx to wrapping the C API directly. Target: Tier 2.

Java (JNI)

java/vortex-jni/ — JNI bindings for broad JDK compatibility. Used by Spark and Trino connectors. Target: Tier 1. A separate Panama FFI path targeting Tier 2 is planned for JDK 22+.

Query Engine Integrations

| Crate / Directory | Engine | Notes |
| --- | --- | --- |
| vortex-datafusion/ | DataFusion | TableProvider and FileFormat integration |
| vortex-duckdb/ | DuckDB | Table function integration using CurrentThreadRuntime |
| java/vortex-spark_{2.12,2.13}/ | Spark | DataSource V2 connector via JNI |
| java/vortex-trino/ | Trino | Connector in development |

Other Crates

| Crate | Role |
| --- | --- |
| vortex-cuda | GPU-accelerated decompression and compute (Linux only) |
| vortex-tui | Terminal UI for inspecting Vortex files |
| vortex-bench | Benchmark harness and data generators |

Building a Single Crate

When iterating on a specific crate, prefer narrow builds to keep feedback loops fast:
```bash
cargo build -p vortex-array
cargo nextest run -p vortex-array
```
Use workspace-wide builds only when a change spans crate boundaries:
```bash
cargo build --workspace
```
