Vortex arrays are physical representations of data — each encoding defines how values are stored in memory and how they are decoded back into a canonical form. Built-in encodings include run-end encoding, dictionary encoding, and FastLanes bit-packing. You can extend Vortex with your own encoding by implementing the VTable trait and registering it with a session.
This part of the Vortex documentation is still being written. The content below reflects the current source code in vortex-array/src/array/vtable/. For questions or guidance not yet covered here, join the Vortex Slack or open a GitHub Discussion.
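Dictionary encoding, one of the built-in encodings listed above, shows what "decoded back into a canonical form" means in practice. The plain-Rust sketch below does not use Vortex types. The encoded form keeps each distinct value once in `values`, plus one `codes` entry per element, and decoding gathers `values[code]` for every code. The function name `decode_dict` and the `u32` code width are assumptions made for this example.

```rust
/// Decodes a dictionary-encoded column (hypothetical helper, not Vortex's API).
/// `codes[i]` is the index into `values` for logical element i.
pub fn decode_dict<T: Clone>(codes: &[u32], values: &[T]) -> Result<Vec<T>, String> {
    codes
        .iter()
        .map(|&code| {
            // Look up each code; an out-of-range code means corrupt input.
            values
                .get(code as usize)
                .cloned()
                .ok_or_else(|| format!("code {code} out of range for {} values", values.len()))
        })
        .collect()
}
```

Storing each distinct value only once pays off when a column has few distinct values relative to its length.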

What is an encoding?

An encoding is the unit of physical representation in Vortex. When Vortex reads an array from disk or receives one over IPC, it dispatches deserialization through the encoding registered for that array’s ID. When it executes (canonicalizes) an array, it calls the encoding’s execute method to decode values into a canonical form. Every encoding is identified by a namespaced string ID such as "vortex.runend". Encodings can hold child arrays in named slots (e.g., ends and values for run-end encoding) and optional byte buffers for raw data.
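The idea of child arrays held in named slots can be pictured with a plain-Rust model of a run-end array. The sketch assumes `ends[i]` is the exclusive end index of run i, that `ends` is ascending, and that `values` holds one value per run. `RunEndArray` is a hypothetical stand-in, not the Vortex RunEnd type.

```rust
/// Plain-Rust model of a run-end encoded array with two child "slots".
/// Assumption: `ends[i]` is the exclusive end of run i and `ends` is ascending.
pub struct RunEndArray {
    pub ends: Vec<usize>,
    pub values: Vec<i32>,
}

impl RunEndArray {
    /// Logical length is the end of the last run (0 when there are no runs).
    pub fn len(&self) -> usize {
        self.ends.last().copied().unwrap_or(0)
    }

    /// Expands the runs into a canonical, one-value-per-element vector.
    pub fn decode(&self) -> Vec<i32> {
        let mut out = Vec::with_capacity(self.len());
        let mut start = 0;
        for (&end, &value) in self.ends.iter().zip(&self.values) {
            // Repeat this run's value for every position in [start, end).
            out.extend(std::iter::repeat(value).take(end - start));
            start = end;
        }
        out
    }
}
```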

The VTable trait

The core trait to implement is VTable from vortex_array::vtable. It controls serialization, deserialization, execution (canonicalization), and child management for your encoding.
```rust
pub trait VTable: 'static + Clone + Sized + Send + Sync + Debug {
    /// Per-array metadata stored alongside the type, dtype, and length.
    type TypedArrayData: 'static + Send + Sync + Clone + Debug + Display + ArrayHash + ArrayEq;

    /// VTable for scalar-at and other operations.
    type OperationsVTable: OperationsVTable<Self>;

    /// VTable for computing array validity.
    type ValidityVTable: ValidityVTable<Self>;

    /// The unique identifier for this encoding.
    fn id(&self) -> ArrayId;

    /// Validate that the given metadata, dtype, length, and slots are consistent.
    fn validate(
        &self,
        data: &Self::TypedArrayData,
        dtype: &DType,
        len: usize,
        slots: &[Option<ArrayRef>],
    ) -> VortexResult<()>;

    /// Number of raw byte buffers this encoding stores directly.
    fn nbuffers(array: ArrayView<'_, Self>) -> usize;

    /// Retrieve a buffer by index.
    fn buffer(array: ArrayView<'_, Self>, idx: usize) -> BufferHandle;

    /// Optional name for a buffer (used in debugging / IPC).
    fn buffer_name(array: ArrayView<'_, Self>, idx: usize) -> Option<String>;

    /// Serialize per-array metadata to bytes for IPC or file storage.
    fn serialize(
        array: ArrayView<'_, Self>,
        session: &VortexSession,
    ) -> VortexResult<Option<Vec<u8>>>;

    /// Reconstruct an array from serialized components.
    fn deserialize(
        &self,
        dtype: &DType,
        len: usize,
        metadata: &[u8],
        buffers: &[BufferHandle],
        children: &dyn ArrayChildren,
        session: &VortexSession,
    ) -> VortexResult<ArrayParts<Self>>;

    /// Return a name for the slot at the given index (used in debugging).
    fn slot_name(array: ArrayView<'_, Self>, idx: usize) -> String;

    /// Decode this array, returning an ExecutionResult.
    ///
    /// Use ExecutionResult::execute_slot to request that a child be executed first.
    /// Use ExecutionResult::done when you can produce a result directly.
    fn execute(array: Array<Self>, ctx: &mut ExecutionCtx) -> VortexResult<ExecutionResult>;
}
```
Two sub-traits are attached via associated types:
  • OperationsVTable — provides scalar_at for indexed element access.
  • ValidityVTable — computes the Validity of a nullable array.
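How an associated type "attaches" a sub-trait can be shown with a small toy model. The traits below (`ToyVTable`, `ToyValidityVTable`) are hypothetical and much simpler than Vortex's. They only illustrate the wiring: generic code never calls the validity logic directly, it reaches it through `V::ValidityVTable`, and setting that associated type to `Self` works because the encoding implements the sub-trait for itself.

```rust
/// Toy sub-trait, parameterized by the encoding it serves (not Vortex's ValidityVTable).
pub trait ToyValidityVTable<V: ?Sized> {
    /// Returns one flag per element: true = valid (non-null).
    fn validity(values: &[Option<i32>]) -> Vec<bool>;
}

/// Toy main trait that names its validity implementation via an associated type.
pub trait ToyVTable: Sized {
    type ValidityVTable: ToyValidityVTable<Self>;
    fn id(&self) -> &'static str;
}

#[derive(Clone, Debug)]
pub struct ToyEncoding;

impl ToyVTable for ToyEncoding {
    // `Self` is valid here because ToyEncoding implements the sub-trait below.
    type ValidityVTable = Self;
    fn id(&self) -> &'static str {
        "myorg.toy"
    }
}

impl ToyValidityVTable<ToyEncoding> for ToyEncoding {
    fn validity(values: &[Option<i32>]) -> Vec<bool> {
        values.iter().map(|v| v.is_some()).collect()
    }
}

/// Generic code reaches the validity logic only through the associated type.
pub fn validity_of<V: ToyVTable>(values: &[Option<i32>]) -> Vec<bool> {
    <V::ValidityVTable as ToyValidityVTable<V>>::validity(values)
}
```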

Step-by-step: writing a new encoding

Step 1: Define your VTable struct and per-array data

Create a unit (zero-sized) struct to act as the vtable and a separate data struct for per-array state. If that state must be persisted, encode it as metadata in serialize (prost is a common choice) and decode it again in deserialize.
```rust
use vortex_array::vtable::VTable;
use vortex_session::registry::CachedId;

/// The vtable marker type for MyEncoding.
#[derive(Clone, Debug)]
pub struct MyEncoding;

/// Per-array data held alongside dtype and length.
#[derive(Clone, Debug)]
pub struct MyEncodingData {
    pub offset: usize,
}
```
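Besides the derived Clone and Debug, MyEncodingData must implement Display (from std::fmt) and ArrayHash and ArrayEq (from vortex_array) to satisfy the bounds on VTable::TypedArrayData.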
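Display is the one required trait that comes from the standard library, so it is the one shown here. ArrayHash and ArrayEq are Vortex traits whose signatures this page does not cover. The sketch also shows one hypothetical way to persist `offset` without prost: a fixed 8-byte little-endian encoding that `serialize` could return and `deserialize` could parse. The `to_metadata`/`from_metadata` names and the byte format are assumptions, not part of Vortex.

```rust
use std::fmt;

/// Same shape as MyEncodingData above (Vortex's ArrayHash/ArrayEq impls not shown).
#[derive(Clone, Debug, PartialEq)]
pub struct MyEncodingData {
    pub offset: usize,
}

impl fmt::Display for MyEncodingData {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "offset: {}", self.offset)
    }
}

impl MyEncodingData {
    /// Hypothetical metadata format: `offset` as 8 little-endian bytes.
    pub fn to_metadata(&self) -> Vec<u8> {
        (self.offset as u64).to_le_bytes().to_vec()
    }

    /// Reverse of `to_metadata`; rejects input that is not exactly 8 bytes.
    pub fn from_metadata(metadata: &[u8]) -> Result<Self, String> {
        let bytes: [u8; 8] = metadata
            .try_into()
            .map_err(|_| format!("expected 8 metadata bytes, got {}", metadata.len()))?;
        let offset = usize::try_from(u64::from_le_bytes(bytes))
            .map_err(|_| "offset does not fit in usize".to_string())?;
        Ok(Self { offset })
    }
}
```

A fixed-width format keeps the sketch dependency-free; prost becomes more attractive once the metadata has several optional or evolving fields.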
Step 2: Implement the VTable trait

Implement VTable for MyEncoding. id() must return a stable, unique, namespaced string such as "myorg.myencoding", because Vortex uses it to find your encoding when deserializing. Use CachedId to avoid repeated string allocations.
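```rust
use vortex_error::{VortexExpect, VortexResult, vortex_bail, vortex_panic};
// ArrayId, ArrayRef, ArrayView, Array, ArrayParts, ArrayChildren, BufferHandle,
// DType, ExecutionCtx, ExecutionResult, NotSupported and VortexSession must also be
// in scope; see vortex-array/src/array/vtable/mod.rs for their exact paths.

impl VTable for MyEncoding {
    type TypedArrayData = MyEncodingData;
    type OperationsVTable = NotSupported; // or your own OperationsVTable impl
    type ValidityVTable = Self;           // MyEncoding implements ValidityVTable in step 3

    fn id(&self) -> ArrayId {
        // Built once, then reused on every call.
        static ID: CachedId = CachedId::new("myorg.myencoding");
        *ID
    }

    fn validate(
        &self,
        _data: &Self::TypedArrayData,
        _dtype: &DType,
        _len: usize,
        slots: &[Option<ArrayRef>],
    ) -> VortexResult<()> {
        // This encoding has exactly one child ("values"). A real encoding should
        // also check that the child's dtype and length are what it expects.
        if slots.len() != 1 || slots[0].is_none() {
            vortex_bail!("MyEncoding expects exactly one child slot (values)");
        }
        Ok(())
    }

    fn nbuffers(_array: ArrayView<'_, Self>) -> usize { 0 }

    fn buffer(_array: ArrayView<'_, Self>, idx: usize) -> BufferHandle {
        vortex_panic!("MyEncoding has no buffers (requested index {})", idx)
    }

    fn buffer_name(_array: ArrayView<'_, Self>, _idx: usize) -> Option<String> {
        None
    }

    fn serialize(
        _array: ArrayView<'_, Self>,
        _session: &VortexSession,
    ) -> VortexResult<Option<Vec<u8>>> {
        // Placeholder: encode MyEncodingData (e.g. `offset`) here, for example with
        // prost. Returning empty bytes means `offset` is not persisted.
        Ok(Some(vec![]))
    }

    fn deserialize(
        &self,
        dtype: &DType,
        len: usize,
        _metadata: &[u8],
        _buffers: &[BufferHandle],
        children: &dyn ArrayChildren,
        _session: &VortexSession,
    ) -> VortexResult<ArrayParts<Self>> {
        // Reconstruct the single child, which shares this array's dtype and length.
        let child = children.get(0, dtype, len)?;
        let slots = vec![Some(child)];
        // Placeholder: decode the metadata back into MyEncodingData; this sketch
        // ignores it and sets offset to 0.
        let data = MyEncodingData { offset: 0 };
        Ok(ArrayParts::new(self.clone(), dtype.clone(), len, data)
            .with_slots(slots))
    }

    fn slot_name(_array: ArrayView<'_, Self>, idx: usize) -> String {
        ["values"][idx].to_string()
    }

    fn execute(array: Array<Self>, _ctx: &mut ExecutionCtx) -> VortexResult<ExecutionResult> {
        // Placeholder: forwards the values child unchanged and ignores `offset`.
        // A real encoding decodes here, returning ExecutionResult::execute_slot(idx)
        // to request a child first, or ExecutionResult::done(result) when finished.
        let child = array.slots()[0]
            .as_ref()
            .vortex_expect("child slot")
            .clone();
        Ok(ExecutionResult::done(child))
    }
}
```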
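The execute_slot/done protocol can be modeled in plain Rust to see why it is a loop rather than a single call. In this hypothetical sketch, which uses toy types instead of Vortex's ExecutionResult and ExecutionCtx, a node returns `Step::ExecuteSlot(0)` while its child is still encoded. The driver executes that child, substitutes the canonical result, and calls the node again until it returns `Step::Done`. `AddDelta` is an invented encoding used only for illustration.

```rust
/// Toy analogue of ExecutionResult (hypothetical, not Vortex's type).
pub enum Step {
    /// Ask the driver to execute the child in this slot, then call again.
    ExecuteSlot(usize),
    /// The canonical result is ready.
    Done(Vec<i32>),
}

/// Toy array node: canonical values, or an encoded node with one child slot.
pub enum Node {
    Canonical(Vec<i32>),
    /// Invented encoding: adds `delta` to every element of its child.
    AddDelta { delta: i32, child: Box<Node> },
}

impl Node {
    /// One execution attempt; never recurses on its own.
    fn execute_once(&self) -> Step {
        match self {
            Node::Canonical(v) => Step::Done(v.clone()),
            Node::AddDelta { delta, child } => match child.as_ref() {
                // Child is still encoded: request that slot 0 be executed first.
                Node::AddDelta { .. } => Step::ExecuteSlot(0),
                Node::Canonical(v) => Step::Done(v.iter().map(|x| x + delta).collect()),
            },
        }
    }
}

/// Driver: executes requested slots first, then retries the parent.
pub fn execute(node: Node) -> Vec<i32> {
    let mut node = node;
    loop {
        match node.execute_once() {
            Step::Done(v) => return v,
            Step::ExecuteSlot(0) => {
                if let Node::AddDelta { delta, child } = node {
                    let canonical = execute(*child);
                    node = Node::AddDelta { delta, child: Box::new(Node::Canonical(canonical)) };
                } else {
                    unreachable!("only AddDelta requests a slot");
                }
            }
            Step::ExecuteSlot(i) => panic!("no slot {i}"),
        }
    }
}
```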
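Step 3: Implement ValidityVTable

The VTable impl above sets type ValidityVTable = Self, so MyEncoding must also implement ValidityVTable<MyEncoding> (alternatively, point the associated type at a helper type that implements it). For nullable dtypes, this is where the encoding reports which elements are null: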
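```rust
use vortex_array::validity::Validity;
use vortex_array::vtable::ValidityVTable;
use vortex_error::{VortexExpect, VortexResult};

impl ValidityVTable<MyEncoding> for MyEncoding {
    fn validity(array: ArrayView<'_, MyEncoding>) -> VortexResult<Validity> {
        // Delegate to a child that holds validity,
        // or return Validity::AllValid / Validity::NonNullable as appropriate.
        // Direct delegation is only correct when the child is element-aligned with
        // this array (same length), as the single "values" child is here.
        array.slots()[0]
            .as_ref()
            .vortex_expect("values slot")
            .validity()
    }
}
```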
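Delegating to a child works when the child is element-aligned with the parent. For an encoding like run-end, where the values child has one entry per run, validity must be expanded through the runs. The sketch below models this with a simplified, hypothetical `ToyValidity` enum whose variant names echo those mentioned above; it is not `vortex_array::validity::Validity`, and `ends` is assumed to hold exclusive, ascending run ends.

```rust
/// Simplified stand-in for a validity result (not vortex_array::validity::Validity).
#[derive(Clone, Debug, PartialEq)]
pub enum ToyValidity {
    NonNullable,
    AllValid,
    AllInvalid,
    /// One flag per element: true = valid.
    Mask(Vec<bool>),
}

impl ToyValidity {
    pub fn is_valid(&self, idx: usize) -> bool {
        match self {
            ToyValidity::NonNullable | ToyValidity::AllValid => true,
            ToyValidity::AllInvalid => false,
            ToyValidity::Mask(m) => m[idx],
        }
    }
}

/// Element i is valid iff the value of the run containing i is valid.
pub fn run_end_validity(ends: &[usize], values_validity: &ToyValidity) -> ToyValidity {
    match values_validity {
        ToyValidity::Mask(_) => {
            let mut mask = Vec::with_capacity(ends.last().copied().unwrap_or(0));
            let mut start = 0;
            for (run, &end) in ends.iter().enumerate() {
                // Every element of this run inherits the run value's validity.
                mask.extend(std::iter::repeat(values_validity.is_valid(run)).take(end - start));
                start = end;
            }
            ToyValidity::Mask(mask)
        }
        // Uniform validity on the values child carries over unchanged.
        other => other.clone(),
    }
}
```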
Step 4: Register parent kernels (optional)

If your encoding can handle an operation that a parent array applies to it more efficiently than the generic decode path (e.g., slicing a run-end encoded child by trimming its runs instead of decoding every value), implement ExecuteParentKernel and register it in a ParentKernelSet. Return Ok(None) to decline and fall through to the next kernel.
```rust
use vortex_array::kernel::{ExecuteParentKernel, ParentKernelSet};
use vortex_error::VortexResult;
// `Slice` (the parent array vtable handled below) also comes from vortex_array.

#[derive(Debug)]
struct MySliceKernel;

impl ExecuteParentKernel<MyEncoding> for MySliceKernel {
    type Parent = Slice; // the parent array type this handles

    fn execute_parent(
        &self,
        _array: ArrayView<'_, MyEncoding>,
        _parent: ArrayView<'_, Slice>,
        _child_idx: usize,
        _ctx: &mut ExecutionCtx,
    ) -> VortexResult<Option<ArrayRef>> {
        // Return Ok(Some(result)) to handle the slice directly,
        // or Ok(None) to decline and fall through to the next kernel.
        Ok(None)
    }
}

pub(crate) const PARENT_KERNELS: ParentKernelSet<MyEncoding> =
    ParentKernelSet::new(&[ParentKernelSet::lift(&MySliceKernel)]);
```
Then override execute_parent in your VTable impl:
```rust
// Inside `impl VTable for MyEncoding`:
fn execute_parent(
    array: ArrayView<'_, Self>,
    parent: &ArrayRef,
    child_idx: usize,
    ctx: &mut ExecutionCtx,
) -> VortexResult<Option<ArrayRef>> {
    PARENT_KERNELS.execute(array, parent, child_idx, ctx)
}
```
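The decline-and-fall-through contract can be modeled without Vortex types. In this hypothetical sketch, each kernel inspects the parent operation and returns either `Some(result)` or `None`, and the driver tries kernels in order, taking the first `Some`. `SliceKernel` shows why such kernels are worth writing: it slices run-end data by clipping runs to the requested range instead of expanding every element. `RunEnd`, `ParentOp`, and `ToyParentKernel` are illustrative stand-ins, and `ends` are assumed to be exclusive run ends.

```rust
/// Toy run-end array: `ends` are exclusive run ends, `values` one per run.
#[derive(Clone, Debug, PartialEq)]
pub struct RunEnd {
    pub ends: Vec<usize>,
    pub values: Vec<i32>,
}

/// Hypothetical parent operations a kernel may be offered.
pub enum ParentOp {
    Slice { start: usize, stop: usize },
    Reverse,
}

pub trait ToyParentKernel {
    /// Return Some(result) to handle the parent, None to decline.
    fn execute_parent(&self, child: &RunEnd, parent: &ParentOp) -> Option<RunEnd>;
}

/// Slices run-end data directly, without decoding it.
pub struct SliceKernel;

impl ToyParentKernel for SliceKernel {
    fn execute_parent(&self, child: &RunEnd, parent: &ParentOp) -> Option<RunEnd> {
        let (start, stop) = match parent {
            ParentOp::Slice { start, stop } => (*start, *stop),
            _ => return None, // not a slice: decline
        };
        let mut out = RunEnd { ends: Vec::new(), values: Vec::new() };
        let mut run_start = 0;
        for (&end, &value) in child.ends.iter().zip(&child.values) {
            // Overlap of this run [run_start, end) with the slice [start, stop).
            let lo = run_start.max(start);
            let hi = end.min(stop);
            if lo < hi {
                out.ends.push(hi - start); // re-base ends onto the slice
                out.values.push(value);
            }
            run_start = end;
        }
        Some(out)
    }
}

/// Tries kernels in order; the first Some wins, otherwise None (generic path).
pub fn run_parent_kernels(
    kernels: &[&dyn ToyParentKernel],
    child: &RunEnd,
    parent: &ParentOp,
) -> Option<RunEnd> {
    kernels.iter().find_map(|k| k.execute_parent(child, parent))
}
```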
Step 5: Register with a session

Vortex encodings are registered per session through ArraySession. Call register before decoding arrays with your encoding:
```rust
use vortex_array::session::ArraySession;
use vortex_session::VortexSession;

let session = VortexSession::empty()
    .with::<ArraySession>();

session.arrays().register(MyEncoding);
```
For the default session used in tests (LEGACY_SESSION), registrations are global. For production use, build a session explicitly and pass it through your pipeline.
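Per-session registration can be pictured as a map from encoding ID to decoder, owned by one session. `ToySession` below is a hypothetical std-only model, not Vortex's ArraySession; unlike `session.arrays().register`, which is called on a non-`mut` session above, its `register` takes `&mut self` because the model has no interior mutability. Decoding an unregistered ID fails, which is why registration must come first.

```rust
use std::collections::HashMap;

/// Hypothetical decoder signature: serialized bytes in, canonical values out.
pub type DecodeFn = fn(&[u8]) -> Vec<i32>;

/// Toy per-session registry mapping encoding IDs to decoders.
#[derive(Default)]
pub struct ToySession {
    encodings: HashMap<&'static str, DecodeFn>,
}

impl ToySession {
    pub fn register(&mut self, id: &'static str, decode: DecodeFn) {
        self.encodings.insert(id, decode);
    }

    /// Dispatches on the ID stored with the serialized array.
    pub fn decode(&self, id: &str, bytes: &[u8]) -> Result<Vec<i32>, String> {
        let decode = self
            .encodings
            .get(id)
            .ok_or_else(|| format!("unknown encoding: {id}"))?;
        Ok(decode(bytes))
    }
}

/// Trivial example decoder: widens each byte to an i32.
pub fn decode_bytes_as_i32(bytes: &[u8]) -> Vec<i32> {
    bytes.iter().map(|&b| i32::from(b)).collect()
}
```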

Testing your encoding

Use assert_arrays_eq! for array comparisons and VortexResult<()> return types in tests. A minimal round-trip test pattern:
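```rust
#[cfg(test)]
mod tests {
    use vortex_array::arrays::PrimitiveArray;
    use vortex_array::session::ArraySession;
    use vortex_array::{IntoArray, VortexSessionExecute, assert_arrays_eq};
    use vortex_buffer::buffer;
    use vortex_error::VortexResult;
    use vortex_session::VortexSession;

    use super::*; // MyEncoding

    #[test]
    fn roundtrip() -> VortexResult<()> {
        // A session that knows about MyEncoding (see step 5).
        let session = VortexSession::empty().with::<ArraySession>();
        session.arrays().register(MyEncoding);

        // Build your encoded array. `MyEncoding::encode` stands for whatever
        // constructor your encoding provides; it is not defined in this guide.
        let encoded = MyEncoding::encode(buffer![1i32, 1, 2, 3].into_array())?;

        // Canonicalize via session execution
        let mut ctx = session.create_execution_ctx();
        let canonical = encoded.into_array().execute_as::<PrimitiveArray>("test", &mut ctx)?;

        let expected = buffer![1i32, 1, 2, 3].into_array();
        assert_arrays_eq!(canonical, expected);
        Ok(())
    }
}
```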
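The same round-trip property can be checked outside Vortex on a plain-Rust run-end codec, which is handy for reasoning about expected outputs before wiring up a real encoding. `encode_run_end` and `decode_run_end` are hypothetical helpers written for this example, with `ends` stored as exclusive run ends.

```rust
/// Collapses consecutive equal values into runs: (exclusive run ends, run values).
pub fn encode_run_end(data: &[i32]) -> (Vec<usize>, Vec<i32>) {
    let mut ends: Vec<usize> = Vec::new();
    let mut values: Vec<i32> = Vec::new();
    for (i, &v) in data.iter().enumerate() {
        if values.last() == Some(&v) {
            // Same value as the current run: extend that run to include i.
            *ends.last_mut().unwrap() = i + 1;
        } else {
            // New value: start a new run ending just after i.
            ends.push(i + 1);
            values.push(v);
        }
    }
    (ends, values)
}

/// Expands runs back into one value per element.
pub fn decode_run_end(ends: &[usize], values: &[i32]) -> Vec<i32> {
    let mut out = Vec::new();
    let mut start = 0;
    for (&end, &v) in ends.iter().zip(values) {
        out.extend(std::iter::repeat(v).take(end - start));
        start = end;
    }
    out
}
```

With the input used in the test above, `[1, 1, 2, 3]` becomes three runs, and decoding them restores the original four values.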

Further reading

  • vortex-array/src/array/vtable/mod.rs: full VTable trait definition
  • encodings/runend/src/array.rs: RunEnd encoding as a complete reference implementation
  • encodings/fastlanes/src/: FastLanes bit-packing and delta encodings
  • GitHub Discussions: ask questions about the encoding model
