A core principle of Vortex is that its data types — called DTypes — are logical rather than physical. The dtype defines the domain of values an array may hold, but says nothing about how that data is actually stored in memory. A u32 dtype represents an unsigned integer between 0 and 2^32 - 1 regardless of whether the underlying array is stored flat, dictionary-encoded, run-length encoded, or bit-packed. This separation between logical and physical representation is what enables Vortex’s most powerful features — including performing compute directly on compressed data without first decompressing it.
Vortex has no concept of a schema. Instead, it uses a struct dtype to represent columnar data. This means you can write a Vortex file containing a single integer array just as easily as one with many columns.
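To make the logical/physical split concrete, here is a minimal sketch in plain Rust. The `U32Array` enum and its two variants are hypothetical stand-ins for a flat and a run-length encoding; they are not Vortex's actual array or encoding types. The sketch assumes a non-nullable u32 array and shows one computation (a sum) running on the encoded form without expanding it.

```rust
/// Hypothetical physical encodings of one logical type: a non-nullable u32 array.
/// Illustrative only; these are not Vortex's real encoding types.
#[derive(Debug, Clone, PartialEq)]
pub enum U32Array {
    /// Values stored one after another.
    Flat(Vec<u32>),
    /// (value, run length) pairs.
    RunLength(Vec<(u32, usize)>),
}

impl U32Array {
    /// Logical length, independent of the encoding.
    pub fn len(&self) -> usize {
        match self {
            U32Array::Flat(values) => values.len(),
            U32Array::RunLength(runs) => runs.iter().map(|&(_, n)| n).sum(),
        }
    }

    /// Decode to the logical sequence of values.
    pub fn to_vec(&self) -> Vec<u32> {
        match self {
            U32Array::Flat(values) => values.clone(),
            U32Array::RunLength(runs) => runs
                .iter()
                .flat_map(|&(v, n)| std::iter::repeat(v).take(n))
                .collect(),
        }
    }

    /// Sum computed on the encoded form; the run-length branch
    /// multiplies value by run length instead of expanding the runs.
    pub fn sum(&self) -> u64 {
        match self {
            U32Array::Flat(values) => values.iter().map(|&v| u64::from(v)).sum(),
            U32Array::RunLength(runs) => runs
                .iter()
                .map(|&(v, n)| u64::from(v) * n as u64)
                .sum(),
        }
    }
}
```

Both `U32Array::Flat(vec![7, 7, 7, 2, 2])` and `U32Array::RunLength(vec![(7, 3), (2, 2)])` describe the same logical values, so `len`, `to_vec`, and `sum` agree for them even though the stored bytes differ.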

Logical types

Every DType can be marked as either nullable or non-nullable. Nullability is part of the type itself, not a separate field annotation as in Apache Arrow.
| DType | Domain |
| --- | --- |
| Null | null only |
| Bool | true, false |
| Primitive | Fixed-width numeric values (see below) |
| Decimal | Fixed-precision real numbers |
| Utf8 | Variable-length valid UTF-8 encoded strings |
| Binary | Arbitrary variable-length bytes |
| List | Variable-length sequences of an element type |
| FixedSizeList | Fixed-length sequences of an element type |
| Struct | Ordered collection of named fields |
| Extension | User-defined types built on storage types |
Some logical types are not yet supported, including fixed-length binary, maps, and variants. These may be added in future versions.

Primitive types

The Primitive dtype is an enumeration of fixed-width numeric types. It is parameterized by a PType that selects the specific numeric format:
| PType | Domain | Width |
| --- | --- | --- |
| I8 | 8-bit signed integer | 1 byte |
| I16 | 16-bit signed integer | 2 bytes |
| I32 | 32-bit signed integer | 4 bytes |
| I64 | 64-bit signed integer | 8 bytes |
| U8 | 8-bit unsigned integer | 1 byte |
| U16 | 16-bit unsigned integer | 2 bytes |
| U32 | 32-bit unsigned integer | 4 bytes |
| U64 | 64-bit unsigned integer | 8 bytes |
| F16 | IEEE 754-2008 half precision | 2 bytes |
| F32 | IEEE 754-1985 single precision | 4 bytes |
| F64 | IEEE 754-1985 double precision | 8 bytes |
For example, a DType::Primitive(PType::I32, NonNullable) represents a non-nullable 32-bit signed integer array — regardless of whether it is stored as a flat buffer, run-length encoded, or bit-packed in memory.
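The width column can be expressed as a small lookup. The sketch below defines its own `PType` enum that mirrors the listed variants; it is a hypothetical stand-in, not the definition from the Vortex crates, and `flat_buffer_bytes` is an illustrative helper that assumes an uncompressed, contiguous buffer.

```rust
/// Hypothetical mirror of the PType variants listed above.
/// Not the Vortex crate's own definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PType {
    I8, I16, I32, I64,
    U8, U16, U32, U64,
    F16, F32, F64,
}

impl PType {
    /// Width in bytes of a single value of this primitive type.
    pub fn byte_width(self) -> usize {
        match self {
            PType::I8 | PType::U8 => 1,
            PType::I16 | PType::U16 | PType::F16 => 2,
            PType::I32 | PType::U32 | PType::F32 => 4,
            PType::I64 | PType::U64 | PType::F64 => 8,
        }
    }

    /// True for the three floating-point formats.
    pub fn is_float(self) -> bool {
        matches!(self, PType::F16 | PType::F32 | PType::F64)
    }
}

/// Size of a flat (uncompressed) buffer holding `len` values.
/// An encoded array of the same dtype may occupy fewer bytes.
pub fn flat_buffer_bytes(ptype: PType, len: usize) -> usize {
    ptype.byte_width() * len
}
```

The width describes the logical value size only. A bit-packed or run-length encoded array with the same `PType` keeps the same dtype while using a different number of bytes.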

Composite types

A List dtype has a single element type, itself a logical dtype, and represents an array of variable-length sequences. For example, a list of strings has element type Utf8.
A FixedSizeList is similar, but every sequence in the array has the same fixed length. The element count per entry is part of the dtype itself.
A Struct dtype is an ordered collection of named fields, each with its own dtype. Structs are how Vortex represents multi-column (tabular) data. There is no separate concept of a schema: a struct dtype is the schema.
Fields may have different nullabilities from one another. Each field’s dtype is independent, allowing a struct to mix integers, strings, lists, and nested structs.
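The nesting of composite dtypes can be sketched with a small recursive enum. Everything here is a simplified, hypothetical model: the real Vortex `DType` has more variants and different payloads, and the column names in `example_table` are invented for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Nullability {
    NonNullable,
    Nullable,
}

/// Simplified, hypothetical model of a logical dtype tree.
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    I32(Nullability),
    Utf8(Nullability),
    /// Variable-length sequences of one element dtype.
    List(Box<DType>, Nullability),
    /// Sequences whose length (second field) is part of the dtype.
    FixedSizeList(Box<DType>, u32, Nullability),
    /// Ordered, named fields; this plays the role of a schema.
    Struct(Vec<(String, DType)>, Nullability),
}

impl DType {
    /// Top-level field names if this dtype is a struct, otherwise empty.
    pub fn field_names(&self) -> Vec<&str> {
        match self {
            DType::Struct(fields, _) => {
                fields.iter().map(|(name, _)| name.as_str()).collect()
            }
            _ => Vec::new(),
        }
    }
}

/// A four-column "table" described purely as a struct dtype.
pub fn example_table() -> DType {
    use Nullability::*;
    DType::Struct(
        vec![
            ("id".to_string(), DType::I32(NonNullable)),
            ("name".to_string(), DType::Utf8(Nullable)),
            (
                "tags".to_string(),
                DType::List(Box::new(DType::Utf8(NonNullable)), Nullable),
            ),
            (
                "rgb".to_string(),
                DType::FixedSizeList(Box::new(DType::I32(NonNullable)), 3, NonNullable),
            ),
        ],
        NonNullable,
    )
}
```

In this model a bare `DType::I32(Nullability::NonNullable)` is just as valid a top-level type as the struct returned by `example_table()`, which is the sense in which a single-array file and a multi-column file are described the same way.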
An Extension dtype is a user-defined logical type composed of an id, a storage dtype, and an optional metadata field. The id and metadata together may restrict the domain of the underlying storage dtype.
For example, a vortex.date type is logically stored as a U32 representing the number of days since the Unix epoch. The storage dtype carries the physical data; the extension id and metadata give it semantic meaning.
Extension types allow Vortex to represent domain-specific types (such as timestamps, geospatial coordinates, or decimals with specific precision) without requiring changes to the core type system.
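The three parts of an extension dtype can be modelled as a plain struct. This is a rough sketch: `ExtDType`, `StorageDType`, `date_dtype`, and `days_to_seconds` are hypothetical names, not Vortex's API. Only the `vortex.date` id and its U32 day-count storage come from the description above.

```rust
/// Hypothetical subset of storage dtypes an extension could build on.
#[derive(Debug, Clone, PartialEq)]
pub enum StorageDType {
    U32,
    I64,
}

/// Hypothetical model of an extension dtype: id + storage + optional metadata.
#[derive(Debug, Clone, PartialEq)]
pub struct ExtDType {
    pub id: String,
    pub storage: StorageDType,
    pub metadata: Option<Vec<u8>>,
}

/// The date example from the text: days since the Unix epoch, stored as U32.
pub fn date_dtype() -> ExtDType {
    ExtDType {
        id: "vortex.date".to_string(),
        storage: StorageDType::U32,
        metadata: None,
    }
}

/// Interpreting a stored day count: the u32 is the physical value, and the
/// extension id is what tells us it counts days since 1970-01-01.
pub fn days_to_seconds(days_since_epoch: u32) -> u64 {
    u64::from(days_since_epoch) * 86_400
}
```

Two extension dtypes with the same storage but different ids compare as different logical types in this model, which reflects the idea that the id, not the storage, carries the meaning.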

Nullability

Every DType (except Null, which is always null) carries an explicit nullability marker: either NonNullable or Nullable. This is different from Apache Arrow, where nullability is a property of a Field rather than the type itself.
```rust
// Non-nullable 32-bit integer
DType::Primitive(PType::I32, Nullability::NonNullable)

// Nullable UTF-8 string
DType::Utf8(Nullability::Nullable)
```
Non-nullable types allow Vortex to skip allocating or reading validity bitmaps, which saves both memory and I/O.
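The saving can be illustrated with a toy array that only carries a validity bitmap when its type is nullable. `I32Array` and `Validity` below are hypothetical and much simpler than Vortex's real array types; the byte count assumes one validity bit per value.

```rust
/// Hypothetical validity representation for a toy i32 array.
#[derive(Debug, Clone, PartialEq)]
pub enum Validity {
    /// The dtype is non-nullable, so no bitmap exists at all.
    NonNullable,
    /// One flag per value; `true` means the value is present.
    Bitmap(Vec<bool>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct I32Array {
    values: Vec<i32>,
    validity: Validity,
}

impl I32Array {
    pub fn non_nullable(values: Vec<i32>) -> Self {
        I32Array { values, validity: Validity::NonNullable }
    }

    pub fn nullable(values: Vec<Option<i32>>) -> Self {
        let bits = values.iter().map(|v| v.is_some()).collect();
        // Null slots still occupy a value slot; 0 is an arbitrary filler.
        let values = values.into_iter().map(|v| v.unwrap_or(0)).collect();
        I32Array { values, validity: Validity::Bitmap(bits) }
    }

    /// Returns `None` for a null slot (and, in this sketch, for an
    /// out-of-bounds index as well).
    pub fn get(&self, i: usize) -> Option<i32> {
        let value = *self.values.get(i)?;
        match &self.validity {
            Validity::NonNullable => Some(value),
            Validity::Bitmap(bits) => {
                if bits[i] { Some(value) } else { None }
            }
        }
    }

    /// Bytes a bit-packed validity bitmap would need (one bit per value).
    pub fn validity_bytes(&self) -> usize {
        match &self.validity {
            Validity::NonNullable => 0,
            Validity::Bitmap(bits) => (bits.len() + 7) / 8,
        }
    }
}
```

For the non-nullable array, `get` never consults a bitmap and `validity_bytes` is zero, so there is nothing extra to allocate or read.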

Differences from Apache Arrow

Vortex dtypes are intentionally simpler and more uniform than Arrow’s type system:
  • Nullability is part of the dtype in Vortex; in Arrow it belongs to a Field.
  • No duplicate types — Arrow has string and large_string, both representing UTF-8. Vortex has a single Utf8.
  • Encodings are separate — Arrow uses types like dictionary to describe physical layout. In Vortex, encodings are an entirely separate concept from dtypes.
  • No first-class date/time types — Timestamps and dates are represented as Extension dtypes built on primitive storage, keeping the core type system small.
  • No schemas — A struct dtype serves the same purpose, allowing columnar and non-columnar data to be described uniformly.
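A few of these differences can be sketched as a toy mapping from Arrow-style fields to a logical dtype. All types and functions here (`ArrowType`, `ArrowField`, `DType`, `to_logical`) are hypothetical and cover only a handful of cases; this is not Vortex's actual Arrow conversion.

```rust
/// Hypothetical, tiny subset of Arrow-style physical types.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Int32,
    Utf8,
    LargeUtf8,
    Dictionary { keys: Box<ArrowType>, values: Box<ArrowType> },
}

/// In Arrow, nullability lives on the field rather than the type.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrowField {
    pub name: String,
    pub data_type: ArrowType,
    pub nullable: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub enum Nullability {
    NonNullable,
    Nullable,
}

/// Hypothetical logical dtype with nullability folded into the type.
#[derive(Debug, Clone, PartialEq)]
pub enum DType {
    I32(Nullability),
    Utf8(Nullability),
}

/// Map a field to a logical dtype, moving nullability into the type.
pub fn to_logical(field: &ArrowField) -> DType {
    let nullability = if field.nullable {
        Nullability::Nullable
    } else {
        Nullability::NonNullable
    };
    logical(&field.data_type, nullability)
}

fn logical(ty: &ArrowType, n: Nullability) -> DType {
    match ty {
        ArrowType::Int32 => DType::I32(n),
        // Both string widths collapse to a single logical Utf8.
        ArrowType::Utf8 | ArrowType::LargeUtf8 => DType::Utf8(n),
        // Dictionary is a physical layout; only the value type is logical.
        ArrowType::Dictionary { values, .. } => logical(values, n),
    }
}
```

In this model a `Utf8` field, a `LargeUtf8` field, and a dictionary-encoded `Utf8` field all map to the same logical dtype, differing only by the nullability carried over from the field.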
