
Vortex provides a Spark DataSource V2 connector for reading and writing Vortex files. The connector is published to Maven Central as dev.vortex:vortex-spark and is built against Spark 4.x with Scala 2.13.

Installation

```kotlin
implementation("dev.vortex:vortex-spark:<version>")
```

The connector ships as a shadow JAR that relocates its Arrow, Guava, and Protobuf dependencies to avoid classpath conflicts with Spark.

Reading Vortex files

Use the vortex format to read a single file or a directory of Vortex files:
```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "/path/to/data.vortex")
    .load();
```

When pointed at a directory, the connector discovers all .vortex files and creates one read partition per file. Column pruning is pushed down — only the columns referenced by the query are read from the file.
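
As a sketch of the pruning behavior (the column names here are hypothetical, not from a real dataset), a projection reads only the referenced columns from each file:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class PruningExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("vortex-pruning")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> df = spark.read()
                .format("vortex")
                .option("path", "/path/to/data")   // a directory of .vortex files
                .load();

        // Only "user_id" and "amount" are read from disk; every other
        // column is pruned at the scan before decoding.
        Dataset<Row> slim = df.select("user_id", "amount");
        slim.show();

        spark.stop();
    }
}
```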

Reading from S3

```java
Dataset<Row> df = spark.read()
    .format("vortex")
    .option("path", "s3://bucket/path/to/data")
    .load();
```
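
Credential configuration is not covered above. If the s3:// path is resolved through Hadoop's S3A filesystem layer (an assumption, not stated on this page), the standard S3A settings can be supplied via Spark config before reading:

```java
// Assumes Hadoop's S3A filesystem handles s3:// paths; the hadoop-aws
// module must then be on the classpath. Credential values come from the
// environment here purely for illustration.
SparkSession spark = SparkSession.builder()
    .appName("vortex-s3")
    .config("spark.hadoop.fs.s3a.access.key", System.getenv("AWS_ACCESS_KEY_ID"))
    .config("spark.hadoop.fs.s3a.secret.key", System.getenv("AWS_SECRET_ACCESS_KEY"))
    .getOrCreate();
```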

Writing Vortex files

```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Overwrite)
    .save();
```

Each Spark partition produces one output file named part-{partitionId}-{taskId}.vortex.
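
Because the output file count tracks the partition count, repartitioning before the write controls how many files are produced. A sketch (the partition count is illustrative):

```java
// Four partitions produce four output files:
//   part-0-{taskId}.vortex ... part-3-{taskId}.vortex
df.repartition(4)
  .write()
  .format("vortex")
  .option("path", "/path/to/output")
  .mode(SaveMode.Overwrite)
  .save();
```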

Write options

| Option | Default | Description |
| --- | --- | --- |
| vortex.write.batch.size | 2048 | Number of rows per batch (1–65536) |
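
For example, to override the default batch size (the value 16384 is illustrative; any value in the documented 1–65536 range is accepted):

```java
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .option("vortex.write.batch.size", "16384")  // illustrative; default is 2048
    .mode(SaveMode.Overwrite)
    .save();
```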

Save modes

The connector supports all standard Spark save modes: Overwrite, Append, Ignore, and ErrorIfExists.
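
For example, appending to an existing output directory rather than replacing it:

```java
// Append keeps existing files and adds new part files alongside them.
df.write()
    .format("vortex")
    .option("path", "/path/to/output")
    .mode(SaveMode.Append)
    .save();
```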

Supported types

| Spark type | Vortex type |
| --- | --- |
| BooleanType | Bool |
| ByteType | Int8 / UInt8 |
| ShortType | Int16 / UInt16 |
| IntegerType | Int32 / UInt32 |
| LongType | Int64 / UInt64 |
| FloatType | Float32 |
| DoubleType | Float64 |
| StringType | Utf8 |
| BinaryType | Binary |
| DecimalType | Decimal |
| DateType | Date (days) |
| TimestampType | Timestamp (microseconds, UTC) |
| TimestampNTZType | Timestamp (microseconds, no timezone) |
| ArrayType | List |
| StructType | Struct |
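
Nested Spark types round-trip through the corresponding Vortex types. A sketch of a schema exercising several of the mappings above (field names are hypothetical):

```java
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

StructType schema = new StructType()
    .add("id", DataTypes.LongType)                                 // Int64
    .add("score", DataTypes.DoubleType)                            // Float64
    .add("tags", DataTypes.createArrayType(DataTypes.StringType))  // List of Utf8
    .add("address", new StructType()                               // Struct
        .add("city", DataTypes.StringType)                         // Utf8
        .add("zip", DataTypes.StringType));                        // Utf8
```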
