

Vortex provides two JVM integration paths: a Spark connector (vortex-spark) for reading and writing Vortex tables from Apache Spark jobs, and a lower-level JNI library for direct access to Vortex arrays and files from Java or Kotlin without a Spark cluster.
Java support is under active development. The API surface may change between releases. For production use cases, track the GitHub releases page for stability announcements.

Spark Connector

The Spark connector lets you read and write Vortex files using the standard Spark DataFrame and SQL APIs.
1. Add the dependency

The connector is published to Maven Central under the dev.vortex group.
<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-spark</artifactId>
  <version>LATEST</version>
</dependency>
Replace LATEST with the current version shown on Maven Central. You can also use the Maven badge in the Vortex README to get the latest published version.
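If your build uses Gradle rather than Maven, the same Maven Central coordinates apply. A minimal sketch of the equivalent declaration (the version placeholder is yours to fill in from Maven Central):

```kotlin
// build.gradle.kts: pin a concrete version in place of the placeholder
dependencies {
    implementation("dev.vortex:vortex-spark:<version>")
}
```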

2. Write a Vortex table from Spark

With the connector on the classpath, write a DataFrame to a .vortex file by passing vortex as the format specifier:
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

SparkSession spark = SparkSession.builder()
    .appName("VortexExample")
    .getOrCreate();

// Read from an existing Parquet file
Dataset<Row> df = spark.read()
    .parquet("yellow_tripdata_2024-01.parquet");

// Write as a Vortex file
df.write()
    .format("vortex")
    .save("yellow_tripdata_2024-01.vortex");
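By default, Spark raises an error if the target path already exists, so repeated runs of the snippet above will fail. A hedged variant, continuing from the df defined above and using Spark's standard save modes (nothing Vortex-specific here):

```java
// Overwrite any existing output at the target path; "append" and
// "errorifexists" are the other common Spark save modes.
df.write()
    .format("vortex")
    .mode("overwrite")
    .save("yellow_tripdata_2024-01.vortex");
```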

3. Read a Vortex table from Spark

Read a Vortex file back into a Spark DataFrame and run SQL queries. Filter and projection pushdown are applied automatically:
Dataset<Row> vortexDf = spark.read()
    .format("vortex")
    .load("yellow_tripdata_2024-01.vortex");

vortexDf.createOrReplaceTempView("trips");

spark.sql(
    "SELECT PULocationID, COUNT(*) AS num_trips, " +
    "ROUND(AVG(trip_distance), 2) AS avg_distance " +
    "FROM trips " +
    "GROUP BY PULocationID " +
    "ORDER BY num_trips DESC " +
    "LIMIT 10"
).show();
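To confirm that pushdown is actually happening for a given query, Spark's standard explain output shows which filters and columns reach the data source. A sketch using the DataFrame API on the vortexDf defined above (column names come from the taxi dataset used earlier):

```java
import static org.apache.spark.sql.functions.col;

// Only the projected columns and the pushed filter should appear
// in the scan node of the physical plan printed by explain().
vortexDf
    .select("PULocationID", "trip_distance")
    .filter(col("trip_distance").gt(10.0))
    .explain();
```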

JNI Library (Standalone)

For use cases that do not involve Spark, such as embedding Vortex reads inside a Java service or a custom query engine, the JNI library exposes Vortex’s core array and file APIs directly.
1. Add the JNI dependency

<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-jni</artifactId>
  <version>LATEST</version>
</dependency>

2. Open and read a Vortex file

The JNI library mirrors the Rust and Python APIs: open a file, create a scan, apply predicates, and read results as Arrow record batches.
import dev.vortex.VortexFile;
import dev.vortex.VortexScan;
import org.apache.arrow.vector.ipc.ArrowReader;

try (VortexFile file = VortexFile.open("example.vortex")) {
    VortexScan scan = file.scan()
        .withProjection("name", "age")
        .withFilter("age > 30");

    try (ArrowReader reader = scan.toArrow()) {
        while (reader.loadNextBatch()) {
            System.out.println(reader.getVectorSchemaRoot().contentToTSVString());
        }
    }
}
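Each loadNextBatch() call refreshes the reader's VectorSchemaRoot in place, so any aggregate has to be accumulated batch by batch. A small sketch, under the same assumed VortexFile/VortexScan API as above, that counts the rows matching a filter:

```java
import dev.vortex.VortexFile;
import org.apache.arrow.vector.ipc.ArrowReader;

try (VortexFile file = VortexFile.open("example.vortex")) {
    long total = 0;
    try (ArrowReader reader = file.scan()
            .withProjection("name", "age")
            .withFilter("age > 30")
            .toArrow()) {
        while (reader.loadNextBatch()) {
            // getRowCount() reports the size of the current batch only,
            // so sum across batches for the full result count.
            total += reader.getVectorSchemaRoot().getRowCount();
        }
    }
    System.out.println("rows matching filter: " + total);
}
```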

Planned Features

The Java integration roadmap includes:
  • Reading and writing Vortex files without Spark via the standalone JNI library
  • Full Apache Arrow Java integration for zero-copy data transfer
  • Apache Iceberg table format support
Follow progress in the vortex-data/vortex repository or join the community Slack for updates.
