
Vortex provides a Spark DataSource V2 connector for reading and writing Vortex files. The connector is published to Maven Central as dev.vortex:vortex-spark and is built against Spark 4.x with Scala 2.13.

Installation

```kotlin
implementation("dev.vortex:vortex-spark:<version>")
```

The connector ships as a shadow JAR that relocates its Arrow, Guava, and Protobuf dependencies to avoid classpath conflicts with Spark.

Reading Vortex files

Use the vortex format to read a single file or a directory of Vortex files:
```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "/path/to/data.vortex")
    .load();
```

When pointed at a directory, the connector discovers all .vortex files and creates one read partition per file. Column pruning is pushed down — only the columns referenced by the query are read from the file.
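
As a sketch of the pruning behavior (the column names here are hypothetical, not from a real dataset), a projection reads only the referenced columns from each file:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PruningExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("vortex-pruning")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> df = spark.read()
                .format("vortex")
                .option("path", "/path/to/data")   // a directory of .vortex files
                .load();

        // Only "user_id" and "amount" are read from disk; every other
        // column is pruned at the scan before decoding.
        Dataset<Row> slim = df.select("user_id", "amount");
        slim.show();

        spark.stop();
    }
}
```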

Reading from S3

```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "s3://bucket/path/to/data")
    .load();
```
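
Credential configuration is not covered above. If the s3:// path is resolved through Hadoop's S3A filesystem layer (an assumption, not stated on this page), the standard S3A settings can be supplied via Spark config before reading:

```java
// Assumes Hadoop's S3A filesystem handles s3:// paths; the hadoop-aws
// module must then be on the classpath. Credential values come from the
// environment here purely for illustration.
SparkSession spark = SparkSession.builder()
    .appName("vortex-s3")
    .config("spark.hadoop.fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"))
    .config("spark.hadoop.fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"))
    .getOrCreate();
```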

Writing Vortex files

```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Overwrite)
    .save();
```

Each Spark partition produces one output file named part-{partitionId}-{taskId}.vortex.
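
Because the output file count tracks the partition count, repartitioning before the write controls how many files are produced. A sketch (the partition count is illustrative):

```java
// Four partitions produce four output files:
//   part-0-{taskId}.vortex ... part-3-{taskId}.vortex
df.repartition(4)
  .write()
  .format("vortex")
  .option("path", "/path/to/output")
  .mode(SaveMode.Overwrite)
  .save();
```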

Write options

| Option | Default | Description |
| --- | --- | --- |
| vortex.write.batch.size | 2048 | Number of rows per batch (1–65536) |
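
For example, to override the default batch size (the value 16384 is illustrative; any value in the documented 1–65536 range is accepted):

```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .option("vortex.write.batch.size", "16384")  // illustrative; default is 2048
    .mode(SaveMode.Overwrite)
    .save();
```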

Save modes

The connector supports all standard Spark save modes: Overwrite, Append, Ignore, and ErrorIfExists.
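
For example, appending to an existing output directory rather than replacing it:

```java
// Append keeps existing files and adds new part files alongside them.
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Append)
    .save();
```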

Supported types

| Spark type | Vortex type |
| --- | --- |
| BooleanType | Bool |
| ByteType | Int8 / UInt8 |
| ShortType | Int16 / UInt16 |
| IntegerType | Int32 / UInt32 |
| LongType | Int64 / UInt64 |
| FloatType | Float32 |
| DoubleType | Float64 |
| StringType | Utf8 |
| BinaryType | Binary |
| DecimalType | Decimal |
| DateType | Date (days) |
| TimestampType | Timestamp (microseconds, UTC) |
| TimestampNTZType | Timestamp (microseconds, no timezone) |
| ArrayType | List |
| StructType | Struct |
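
Nested Spark types round-trip through the corresponding Vortex types. A sketch of a schema exercising several of the mappings above (field names are hypothetical):

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType()
    .add("id", DataTypes.LongType)                                 // Int64
    .add("score", DataTypes.DoubleType)                            // Float64
    .add("tags", DataTypes.createArrayType(DataTypes.StringType))  // List of Utf8
    .add("address", new StructType()                               // Struct
        .add("city", DataTypes.StringType)                         // Utf8
        .add("zip", DataTypes.StringType));                        // Utf8
```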
