Vortex provides two JVM integration paths: a Spark connector (vortex-spark) for reading and writing Vortex tables from Apache Spark jobs, and a lower-level JNI library for direct access to Vortex arrays and files from Java or Kotlin without a Spark cluster.
Java support is under active development. The API surface may change between releases. For production use cases, track the GitHub releases page for stability announcements.
Replace LATEST with the current version published on Maven Central; the Maven badge in the Vortex README also shows the latest released version.
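For reference, a Maven dependency declaration would look roughly like the following. The `dev.vortex` group ID and `vortex-spark` artifact ID are assumptions here; confirm the exact coordinates on Maven Central before using them:

```xml
<!-- Hypothetical coordinates; verify groupId/artifactId on Maven Central -->
<dependency>
  <groupId>dev.vortex</groupId>
  <artifactId>vortex-spark</artifactId>
  <!-- Replace LATEST with the current version from Maven Central -->
  <version>LATEST</version>
</dependency>
```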
2. Write a Vortex table from Spark
Register the Vortex data source and write a DataFrame to a .vortex file using the vortex format specifier:
```java
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

SparkSession spark = SparkSession.builder()
    .appName("VortexExample")
    .getOrCreate();

// Read from an existing Parquet file
Dataset<Row> df = spark.read()
    .parquet("yellow_tripdata_2024-01.parquet");

// Write as a Vortex file
df.write()
    .format("vortex")
    .save("yellow_tripdata_2024-01.vortex");
```
3. Read a Vortex table from Spark
Read a Vortex file back into a Spark DataFrame and run SQL queries. Filter and projection pushdown are applied automatically:
```java
Dataset<Row> vortexDf = spark.read()
    .format("vortex")
    .load("yellow_tripdata_2024-01.vortex");

vortexDf.createOrReplaceTempView("trips");

spark.sql(
    "SELECT PULocationID, COUNT(*) AS num_trips, "
        + "ROUND(AVG(trip_distance), 2) AS avg_distance "
        + "FROM trips "
        + "GROUP BY PULocationID "
        + "ORDER BY num_trips DESC "
        + "LIMIT 10").show();
```
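Pushdown means that filters and column projections expressed in Spark are handed to the Vortex reader, so only matching rows and requested columns are materialized. A sketch using the standard Spark `Dataset` API (the file path matches the example above; `explain()` is the usual way to inspect what was pushed into the scan):

```java
// The filter and the projection below are candidates for pushdown into
// the Vortex scan, so Spark does not need to read the full table.
Dataset<Row> longTrips = spark.read()
    .format("vortex")
    .load("yellow_tripdata_2024-01.vortex")
    .filter("trip_distance > 10.0")
    .select("PULocationID", "trip_distance");

// Inspect the physical plan to confirm pushed filters and the pruned schema.
longTrips.explain();
```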
For use cases that do not involve Spark—such as embedding Vortex reads inside a Java service or a custom query engine—the JNI library exposes Vortex’s core array and file APIs directly.