Vortex provides a
`VortexDatasource` for Ray Data that reads `.vortex` files in distributed Ray pipelines. Each file in a directory becomes a read partition, and the datasource supports column projection and filter pushdown to minimize I/O across the cluster.
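The one-file-per-partition model can be illustrated with a small standard-library sketch. This is not the datasource's actual discovery code. The helper name `discover_partitions` is hypothetical, and the sketch assumes that partitions are simply the regular files with a `.vortex` suffix found directly in the directory, listed in sorted order.

```python
import tempfile
from pathlib import Path


def discover_partitions(directory: str) -> list[Path]:
    """Hypothetical helper: return one read partition per .vortex file.

    Assumption: only regular files directly inside `directory` with a
    .vortex suffix count; everything else is ignored. Sorting makes the
    partition order deterministic.
    """
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix == ".vortex"
    )


with tempfile.TemporaryDirectory() as tmp:
    # Three placeholder .vortex files plus one unrelated file.
    for name in ["part-0.vortex", "part-1.vortex", "part-2.vortex", "README.txt"]:
        (Path(tmp) / name).touch()
    print([p.name for p in discover_partitions(tmp)])
    # prints ['part-0.vortex', 'part-1.vortex', 'part-2.vortex']
```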
Installation
Reading Vortex files with Ray
Write some Vortex files
Prepare one or more `.vortex` files in a directory. Each file becomes one read partition in Ray.
Column projection and filtering
`VortexDatasource` accepts optional `columns` and `filter` arguments that push projection and predicate evaluation into the scan, reducing the amount of data read across the cluster.
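To make the semantics of pushdown concrete, the sketch below models a scan over an in-memory list of row dictionaries. It is a conceptual illustration only: the `scan` function, the sample rows, and the use of a plain Python callable as the predicate are assumptions standing in for the real `columns` and `filter` arguments, which take column names and a `pc.Expression` or `VortexExpr`.

```python
from typing import Callable, Iterable, Optional

Row = dict[str, object]


def scan(
    rows: Iterable[Row],
    columns: Optional[list[str]] = None,
    predicate: Optional[Callable[[Row], bool]] = None,
) -> list[Row]:
    """Hypothetical scan that applies the filter and projection at the source.

    The predicate runs before projection, so it may reference columns that
    are not part of the projected output. Rows that fail it are never returned.
    """
    result = []
    for row in rows:
        if predicate is not None and not predicate(row):
            continue  # row is dropped inside the scan
        if columns is None:
            result.append(dict(row))  # no projection: keep every column
        else:
            result.append({c: row[c] for c in columns})
    return result


rows = [
    {"id": 1, "city": "Oslo", "temp": 4.5},
    {"id": 2, "city": "Lima", "temp": 19.0},
    {"id": 3, "city": "Cairo", "temp": 27.3},
]

# Only `id` and `city` are returned, and only for rows with temp > 10.
print(scan(rows, columns=["id", "city"], predicate=lambda r: r["temp"] > 10))
# [{'id': 2, 'city': 'Lima'}, {'id': 3, 'city': 'Cairo'}]
```

In a real columnar reader the savings come from never fetching the unprojected columns or the pruned data from storage; this in-memory model shows only which rows and columns reach the caller.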
VortexDatasource reference
VortexDatasource accepts the following constructor arguments:
| Argument | Type | Description |
|---|---|---|
| `url` | `str` | Path to a directory of `.vortex` files |
| `columns` | `list[str] \| None` | Columns to project. Reads all columns if omitted. |
| `filter` | `pc.Expression \| VortexExpr \| None` | Predicate to push into the scan |
| `batch_size` | `int \| None` | Maximum number of rows per batch |
| `meta_provider` | `BaseFileMetadataProvider` | Custom metadata provider for file discovery |
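
The `batch_size` argument caps how many rows each batch holds. The sketch below shows that contract on plain Python lists. `iter_batches` is a hypothetical helper, not part of the datasource, and it deliberately does not model what happens when `batch_size` is `None`.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(rows: Sequence[T], batch_size: int) -> Iterator[list[T]]:
    """Hypothetical helper: yield consecutive batches of at most batch_size rows.

    Every batch except possibly the last holds exactly batch_size rows.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(rows), batch_size):
        yield list(rows[start:start + batch_size])


sizes = [len(b) for b in iter_batches(list(range(10)), batch_size=4)]
print(sizes)  # [4, 4, 2]
```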
Distributed processing
`VortexDatasource` sets `supports_distributed_reads = True`, so Ray schedules read tasks across the cluster instead of running every read on the driver node. The degree of parallelism is controlled by the `parallelism` argument passed to `read_datasource`, and files are distributed across read tasks as evenly as possible.
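One way to distribute files "as evenly as possible" is to split the file list into `parallelism` contiguous groups whose sizes differ by at most one. The sketch below assumes that strategy purely for illustration; `assign_files_to_tasks` is hypothetical and may not match how Ray or `VortexDatasource` actually groups files. It also assumes that asking for more tasks than there are files yields one task per file rather than empty tasks.

```python
def assign_files_to_tasks(files: list[str], parallelism: int) -> list[list[str]]:
    """Hypothetical even split: group sizes differ by at most one file."""
    if parallelism <= 0:
        raise ValueError("parallelism must be a positive integer")
    num_tasks = min(parallelism, len(files))  # never create empty tasks
    base, extra = divmod(len(files), num_tasks) if num_tasks else (0, 0)
    groups, start = [], 0
    for i in range(num_tasks):
        # The first `extra` tasks each take one additional file.
        size = base + (1 if i < extra else 0)
        groups.append(files[start:start + size])
        start += size
    return groups


files = [f"part-{i}.vortex" for i in range(7)]
for task_id, group in enumerate(assign_files_to_tasks(files, parallelism=3)):
    print(task_id, group)
```

With seven files and three tasks this yields groups of three, two, and two files. A round-robin assignment would balance the counts equally well; contiguous groups simply keep each task's files adjacent in the listing.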
Ray does not start correctly inside a `uv run` environment. If you are running Ray locally for development, activate your virtual environment with `source .venv/bin/activate` before starting Ray.