Apache Wayang is the first open-source cross-platform data processing system. Write your pipeline once against a single unified API, register the engines you have, and let Wayang’s cost-based optimizer decide which platform to use for each operator — or take full control and pin execution to a specific engine. When your data outgrows one machine, you don’t rewrite anything; you just make another engine available.

Introduction

Understand what Wayang is, how it works, and where it fits in your data stack.

Quickstart

Run your first cross-platform pipeline in minutes with a working WordCount example.

Installation

Add Wayang to your Maven project or build from source with full dependency details.

Core Concepts

Learn the architecture, optimizer, operator model, and plugin system that power Wayang.

Explore by Topic

Java API

Fluent JavaPlanBuilder and DataQuanta for building pipelines in Java.

Scala API

Idiomatic Scala PlanBuilder with full type inference.

Python API

Build and execute Wayang plans from Python with PyWayang.

SQL API

Run SQL queries across platforms via the Apache Calcite integration.

Platforms

Supported engines: Java Streams, Spark, Flink, PostgreSQL, SQLite, Kafka, and more.

Configuration

Tune the optimizer, logging, and per-platform settings via properties files.

How It Works

1. Write your pipeline once

Use the Java, Scala, Python, or SQL API to build a logical plan. Your pipeline is engine-agnostic: no platform-specific code is required.

2. Register your engines

Call .withPlugin(Java.basicPlugin()), .withPlugin(Spark.basicPlugin()), or any combination of supported platforms on your WayangContext. Wayang considers only the engines you register.

3. Let the optimizer choose

Wayang’s cost-based optimizer estimates cardinality and load for each operator, then assigns each step to the cheapest available platform, mixing engines within a single job when beneficial.

4. Scale without rewriting

When your dataset grows, add more engines. The pipeline code stays the same; only the registered platforms change.
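
To make the "let the optimizer choose" and "scale without rewriting" steps more concrete, here is a minimal toy sketch in plain Java. It is not Wayang's optimizer and uses none of Wayang's classes: ToyPlatformChooser, Platform, Operator, the cost formula (start-up cost plus cost per record), and every number in it are assumptions made up for illustration. It also ignores things a real optimizer would have to weigh, such as the cost of moving data between engines. It only shows the general idea: propagate a cardinality estimate through the pipeline and pick the cheapest registered platform for each operator.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes, names, or numbers come from Wayang.
class ToyPlatformChooser {

    /** Hypothetical platform: a fixed start-up cost plus a cost per input record. */
    static final class Platform {
        final String name;
        final double startupCost;
        final double costPerRecord;

        Platform(String name, double startupCost, double costPerRecord) {
            this.name = name;
            this.startupCost = startupCost;
            this.costPerRecord = costPerRecord;
        }

        double estimateCost(long inputCardinality) {
            return startupCost + costPerRecord * inputCardinality;
        }
    }

    /** Hypothetical logical operator: output cardinality = input cardinality * selectivity. */
    static final class Operator {
        final String name;
        final double selectivity;

        Operator(String name, double selectivity) {
            this.name = name;
            this.selectivity = selectivity;
        }
    }

    /** A WordCount-like pipeline with made-up selectivities. */
    static List<Operator> wordCountPipeline() {
        return List.of(
                new Operator("split", 10.0),    // each line yields about 10 words
                new Operator("filter", 0.99),   // drop a few empty tokens
                new Operator("count", 0.0001),  // collapse words into distinct keys
                new Operator("sort", 1.0));     // order the small result
    }

    /** Maps each operator to the cheapest registered platform for its estimated input size. */
    static Map<String, String> assign(List<Operator> pipeline,
                                      List<Platform> registered,
                                      long inputCardinality) {
        if (registered.isEmpty()) {
            throw new IllegalArgumentException("register at least one platform");
        }
        Map<String, String> assignment = new LinkedHashMap<>();
        long cardinality = inputCardinality;
        for (Operator op : pipeline) {
            Platform best = registered.get(0);
            for (Platform candidate : registered) {
                // Strict comparison: on a tie, the platform registered first wins.
                if (candidate.estimateCost(cardinality) < best.estimateCost(cardinality)) {
                    best = candidate;
                }
            }
            assignment.put(op.name, best.name);
            // Propagate the cardinality estimate to the next operator.
            cardinality = Math.round(cardinality * op.selectivity);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Platform javaStreams = new Platform("java-streams", 0, 1.0); // cheap to start
        Platform spark = new Platform("spark", 1000, 0.1);           // cheap per record

        List<Operator> pipeline = wordCountPipeline();

        // Same pipeline, one engine registered.
        System.out.println(assign(pipeline, List.of(javaStreams), 1_000_000));
        // Same pipeline, two engines registered: large steps move, the small one stays.
        System.out.println(assign(pipeline, List.of(javaStreams, spark), 1_000_000));
    }
}
```

In this toy model the pipeline definition is identical in both calls; only the list of registered platforms differs. With both platforms registered and one million input lines, the three large steps land on the platform with the lower per-record cost, while the final step, which sees only about a thousand records, stays on the platform with no start-up cost.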
Apache Wayang is available on Maven Central as org.apache.wayang:wayang-core. See Installation for the full dependency list.