

Vortex provides two JVM integration paths: a Spark connector (vortex-spark) for reading and writing Vortex tables from Apache Spark jobs, and a lower-level JNI library for direct access to Vortex arrays and files from Java or Kotlin without a Spark cluster.
Java support is under active development. The API surface may change between releases. For production use cases, track the GitHub releases page for stability announcements.

Spark Connector

The Spark connector lets you read and write Vortex files using the standard Spark DataFrame and SQL APIs.
1. Add the dependency

The connector is published to Maven Central under the dev.vortex group.
<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-spark</artifactId>
  <version>LATEST</version>
</dependency>
Replace LATEST with the current version shown on Maven Central. You can also use the Maven badge in the Vortex README to get the latest published version.
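If your build uses Gradle rather than Maven, the same Maven Central coordinates apply. A minimal sketch of the equivalent declaration (the version placeholder is yours to fill in from Maven Central):

```kotlin
// build.gradle.kts: pin a concrete version in place of the placeholder
dependencies {
    implementation("dev.vortex:vortex-spark:<version>")
}
```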

2. Write a Vortex table from Spark

With the connector on the classpath, write a DataFrame to a .vortex file by passing vortex as the format specifier:
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

SparkSession spark = SparkSession.builder()
    .appName("VortexExample")
    .getOrCreate();

// Read from an existing Parquet file
Dataset<Row> df = spark.read()
    .parquet("yellow_tripdata_2024-01.parquet");

// Write as a Vortex file
df.write()
    .format("vortex")
    .save("yellow_tripdata_2024-01.vortex");
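By default, Spark raises an error if the target path already exists, so repeated runs of the snippet above will fail. A hedged variant, continuing from the df defined above and using Spark's standard save modes (nothing Vortex-specific here):

```java
// Overwrite any existing output at the target path; "append" and
// "errorifexists" are the other common Spark save modes.
df.write()
    .format("vortex")
    .mode("overwrite")
    .save("yellow_tripdata_2024-01.vortex");
```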

3. Read a Vortex table from Spark

Read a Vortex file back into a Spark DataFrame and run SQL queries. Filter and projection pushdown are applied automatically:
Dataset<Row> vortexDf = spark.read()
    .format("vortex")
    .load("yellow_tripdata_2024-01.vortex");

vortexDf.createOrReplaceTempView("trips");

spark.sql(
    "SELECT PULocationID, COUNT(*) AS num_trips, " +
    "ROUND(AVG(trip_distance), 2) AS avg_distance " +
    "FROM trips " +
    "GROUP BY PULocationID " +
    "ORDER BY num_trips DESC " +
    "LIMIT 10"
).show();
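To confirm that pushdown is actually happening for a given query, Spark's standard explain output shows which filters and columns reach the data source. A sketch using the DataFrame API on the vortexDf defined above (column names come from the taxi dataset used earlier):

```java
import static org.apache.spark.sql.functions.col;

// Only the projected columns and the pushed filter should appear
// in the scan node of the physical plan printed by explain().
vortexDf
    .select("PULocationID", "trip_distance")
    .filter(col("trip_distance").gt(10.0))
    .explain();
```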

JNI Library (Standalone)

For use cases that do not involve Spark, such as embedding Vortex reads inside a Java service or a custom query engine, the JNI library exposes Vortex’s core array and file APIs directly.
1. Add the JNI dependency

<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-jni</artifactId>
  <version>LATEST</version>
</dependency>

2. Open and read a Vortex file

The JNI library mirrors the Rust and Python APIs: open a file, create a scan, apply predicates, and read results as Arrow record batches.
import dev.vortex.VortexFile;
import dev.vortex.VortexScan;
import org.apache.arrow.vector.ipc.ArrowReader;

try (VortexFile file = VortexFile.open("example.vortex")) {
    VortexScan scan = file.scan()
        .withProjection("name", "age")
        .withFilter("age > 30");

    try (ArrowReader reader = scan.toArrow()) {
        while (reader.loadNextBatch()) {
            System.out.println(reader.getVectorSchemaRoot().contentToTSVString());
        }
    }
}
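Each loadNextBatch() call refreshes the reader's VectorSchemaRoot in place, so any aggregate has to be accumulated batch by batch. A small sketch, under the same assumed VortexFile/VortexScan API as above, that counts the rows matching a filter:

```java
import dev.vortex.VortexFile;
import org.apache.arrow.vector.ipc.ArrowReader;

try (VortexFile file = VortexFile.open("example.vortex")) {
    long total = 0;
    try (ArrowReader reader = file.scan()
            .withProjection("name", "age")
            .withFilter("age > 30")
            .toArrow()) {
        while (reader.loadNextBatch()) {
            // getRowCount() reports the size of the current batch only,
            // so sum across batches for the full result count.
            total += reader.getVectorSchemaRoot().getRowCount();
        }
    }
    System.out.println("rows matching filter: " + total);
}
```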

Planned Features

The Java integration roadmap includes:
  • Reading and writing Vortex files without Spark via the standalone JNI library
  • Full Apache Arrow Java integration for zero-copy data transfer
  • Apache Iceberg table format support
Follow progress in the vortex-data/vortex repository or join the community Slack for updates.
