Apache Wayang is composed of a family of Maven artifacts grouped under the org.apache.wayang groupId. Every module has a focused responsibility — the core optimizer, a specific execution-engine adapter, an API surface, or a utility — so you only include what you actually use. This page lists every published artifact, explains its purpose, and shows the exact Maven dependency XML to paste into your pom.xml.
All Wayang artifacts share a single version number. Replace WAYANG_VERSION in every snippet below with the latest release available on Maven Central. The current development version in source is 1.1.2-SNAPSHOT.
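Because every artifact shares one version, it can be convenient to pin that version once and let individual dependencies inherit it. The following is a minimal sketch using standard Maven dependencyManagement; the property name wayang.version is a naming choice, and only two artifacts are shown for brevity.

```xml
<properties>
    <!-- Replace WAYANG_VERSION with the release you want to use -->
    <wayang.version>WAYANG_VERSION</wayang.version>
</properties>

<dependencyManagement>
    <dependencies>
        <!-- Pin the shared version once for each Wayang artifact you use -->
        <dependency>
            <groupId>org.apache.wayang</groupId>
            <artifactId>wayang-core</artifactId>
            <version>${wayang.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.wayang</groupId>
            <artifactId>wayang-basic</artifactId>
            <version>${wayang.version}</version>
        </dependency>
    </dependencies>
</dependencyManagement>

<dependencies>
    <!-- No <version> here: it is inherited from dependencyManagement -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-core</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-basic</artifactId>
    </dependency>
</dependencies>
```

In a multi-module build, the dependencyManagement block would normally live in the parent POM so that child modules cannot drift to different Wayang versions.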

Versioning and snapshot builds

Wayang follows standard Apache versioning. Release artifacts are published to Maven Central and require no extra repository configuration. Snapshot artifacts (versions ending in -SNAPSHOT) are published to the Apache Foundation snapshot repository and require an additional <repositories> block:
<repositories>
  <repository>
    <id>apache-snapshots</id>
    <name>Apache Foundation Snapshot Repository</name>
    <url>https://repository.apache.org/content/repositories/snapshots</url>
    <snapshots>
      <enabled>true</enabled>
    </snapshots>
    <releases>
      <enabled>false</enabled>
    </releases>
  </repository>
</repositories>
Snapshot builds reflect the current state of the main branch and may contain breaking changes between builds. Use a release version for production deployments.
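With the repository block in place, a snapshot dependency is declared like any other. This is a minimal sketch, assuming 1.1.2-SNAPSHOT (the development version mentioned at the top of this page) is still what the main branch publishes; check the snapshot repository for the current value.

```xml
<!-- Resolved from the apache-snapshots repository, not from Maven Central -->
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-core</artifactId>
    <version>1.1.2-SNAPSHOT</version>
</dependency>
```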

Core modules

These two artifacts are required in every Wayang project. They provide the data model, the cost-based optimizer, the plan compiler, and the built-in operators that every platform adapter depends on.

wayang-core

The foundation of the entire framework. Contains the WayangPlan data structures, the cross-platform optimizer, cardinality estimators, cost functions, and the execution engine abstraction layer. You always need this.

wayang-basic

The standard operator library: TextFileSource, MapOperator, FilterOperator, FlatMapOperator, ReduceByOperator, KafkaTopicSource, KafkaTopicSink, and more. Depends on wayang-core.
<!-- wayang-core: optimizer, plan model, execution abstraction — always required -->
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-core</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>

<!-- wayang-basic: standard operators and data types; always required -->
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-basic</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>

API modules

These modules provide the developer-facing API surfaces. Include the ones that match the language and style you want to write pipelines in.
wayang-api-python

Enables writing Wayang pipelines in Python. The Python code is serialised and executed by the JVM-side runtime, bridging the Python data-science ecosystem with Wayang’s cross-platform execution.
When to include: When you want to express pipelines in Python or integrate with Python ML libraries.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-api-python</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-api-sql

Parses standard SQL queries using Apache Calcite and translates them into Wayang plans. Lets you submit SQL to Wayang and have it execute across multiple engines.
When to include: When you want to write queries in SQL rather than a programmatic API.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-api-sql</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-api-json

Exposes Wayang as a Spring Boot REST application that accepts job definitions as JSON payloads. Useful for building services or UIs on top of Wayang without writing JVM code.
When to include: When you need a REST endpoint for submitting Wayang jobs programmatically or from a web interface.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-api-json</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-api-utils

Internal utility classes shared across the various API modules (wayang-api-scala-java, wayang-api-sql, wayang-api-json). You typically do not depend on this directly; it is pulled in transitively by whichever API module you use.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-api-utils</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>

Platform adapters

Each adapter below teaches Wayang how to run operators on a specific execution engine. Include exactly the adapters for the engines you want available. The optimizer will only consider platforms whose adapters are registered on the WayangContext.
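You must call .withPlugin(Engine.basicPlugin()) on your WayangContext at runtime for every platform adapter you include, where Engine stands for the adapter's entry class (Java or Spark in the registration snippets on this page). Adding the dependency alone is not enough.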

Local / in-process

wayang-java

Executes operators using the Java Streams API, entirely in-process with no external cluster. Startup overhead is negligible, making this the right choice for development, unit tests, and small datasets.
Typical use: Always register this alongside heavier engines so the optimizer can keep small operators local.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-java</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
// Registration
wayangContext.withPlugin(Java.basicPlugin());

Batch processing

wayang-spark

Translates Wayang operators into Spark RDD and Dataset operations. Best suited for large-scale batch workloads where Spark’s parallel shuffle is worth its startup overhead.
Typical use: Register alongside wayang-java in production. The optimizer will route large shuffles to Spark and keep small lookups local.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-spark</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayangContext.withPlugin(Spark.basicPlugin());

Databases

wayang-postgres

Pushes relational operators (filters, projections, joins, aggregations) down into a PostgreSQL database using JDBC. Particularly efficient when data already lives in Postgres, because no ETL step is required.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-postgres</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-sqlite3

Works like wayang-postgres but targets SQLite. Useful for lightweight local testing with a relational platform without running a database server.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-sqlite3</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-jdbc-template

Provides a generic JDBC-based platform template used as the foundation for wayang-postgres and wayang-sqlite3. You do not typically depend on this directly; instead depend on the specific database adapter you want.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-jdbc-template</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-generic-jdbc

A generic JDBC platform adapter that can be pointed at any JDBC-compatible data source without a dedicated Wayang adapter module. Useful for databases not covered by the built-in adapters.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-generic-jdbc</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
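A generic JDBC adapter still needs the target database's JDBC driver on the classpath. The following is a minimal sketch, assuming MySQL as the target database; the coordinates are those of MySQL Connector/J, DRIVER_VERSION is a placeholder for whichever driver release you use, and none of this is taken from the Wayang documentation itself.

```xml
<!-- Assumption: MySQL is the target database; swap in your own vendor's driver -->
<dependency>
    <groupId>com.mysql</groupId>
    <artifactId>mysql-connector-j</artifactId>
    <!-- Placeholder: replace with a driver release from Maven Central -->
    <version>DRIVER_VERSION</version>
    <!-- The driver is only loaded at runtime, so it need not be on the compile classpath -->
    <scope>runtime</scope>
</dependency>
```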

Streaming

Kafka source and sink operators (KafkaTopicSource, KafkaTopicSink) are part of wayang-basic and are executed by the Java platform adapter. Add kafka-clients to your classpath and configure wayang-kafka-defaults.properties with your broker settings.
<!-- Kafka client — required at runtime for KafkaTopicSource / KafkaTopicSink -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>3.9.2</version>
</dependency>

Specialised

wayang-giraph

Executes graph-parallel operators on Apache Giraph, a Pregel-inspired graph processing framework that runs on Hadoop. Include when your pipeline contains graph algorithms.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-giraph</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-tensorflow

Delegates ML inference and training operators to TensorFlow. Allows embedding TensorFlow model execution as a step inside a broader Wayang pipeline.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-tensorflow</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>

Plugin modules

These modules add optional capabilities layered on top of the core and platform modules.
wayang-iejoin

Implements the IEJoin algorithm for efficient inequality-predicate joins (e.g., a.ts < b.ts). Standard hash or sort-merge joins cannot handle these efficiently. Include this plugin when your pipeline contains range or inequality join conditions.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-iejoin</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-spatial

Adds spatial data types and geometry operators backed by the JTS Topology Suite. Include when working with geospatial or geometry data.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-spatial</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-ml

Bridges Wayang pipelines with ML frameworks via the SQL API. Enables hybrid pipelines that mix data-processing operators with ML model training or scoring steps.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-ml</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>

Utility modules

wayang-profiler

Observes actual operator execution times and cardinalities to learn and refine the cost functions used by the optimizer. Run the profiler on representative workloads to improve placement decisions in production.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-profiler</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-benchmark

Contains benchmark applications (TPC-H queries and synthetic workloads, including production-ready k-means and WordCount implementations) used to evaluate the optimizer and compare performance across platform combinations. Useful for performance testing and regression checks.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-benchmark</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
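If the benchmark applications are only exercised from your own performance or regression tests, one option is to keep them off the production classpath with Maven's test scope. This is a sketch of that choice, not a recommendation from the Wayang project; drop the scope element if you launch the benchmark classes from main code.

```xml
<!-- Assumption: benchmarks are only run from src/test, so test scope is enough -->
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-benchmark</artifactId>
    <version>WAYANG_VERSION</version>
    <scope>test</scope>
</dependency>
```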
wayang-ml4all

Provides a collection of machine learning algorithms (k-means, SGD, and others) implemented as Wayang pipelines via the ML4all abstraction. Includes ready-to-run entry points such as org.apache.wayang.ml4all.examples.RunKMeans.
<dependency>
    <groupId>org.apache.wayang</groupId>
    <artifactId>wayang-ml4all</artifactId>
    <version>WAYANG_VERSION</version>
</dependency>
wayang-assembly

Produces the wayang-submit binary distribution tarball. This module is not included as a dependency in application code; it is invoked during the build to create the standalone distribution:
./mvnw clean package -pl :wayang-assembly -Pdistribution
The resulting archive under wayang-assembly/target/ contains the bin/wayang-submit launcher script used to run Wayang jobs from the command line.

Minimum dependency set by use case

Rows that begin with + list what to add on top of the local development set in the first row.

Use case | Required modules
Local development / unit tests | wayang-core, wayang-basic, wayang-api-scala-java, wayang-java
Production batch on Spark | + wayang-spark
Kafka source or sink | + kafka-clients (transitive via wayang-basic)
SQL queries | + wayang-api-sql
Geospatial data | + wayang-spatial
Graph algorithms | + wayang-giraph
ML inference | + wayang-tensorflow or wayang-ml
Python pipelines | + wayang-api-python
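To show how the use cases combine, here is a sketch of the dependency list for a hypothetical project that runs SQL queries over geospatial data on a developer machine: the four local development modules plus wayang-api-sql and wayang-spatial. The combination is an illustration only; WAYANG_VERSION is the same placeholder used throughout this page.

```xml
<dependencies>
    <!-- Local development set -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-core</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-basic</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-api-scala-java</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-java</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>

    <!-- Added for the "SQL queries" use case -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-api-sql</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>

    <!-- Added for the "Geospatial data" use case -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-spatial</artifactId>
        <version>WAYANG_VERSION</version>
    </dependency>
</dependencies>
```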

Minimum pom.xml for a Java project

<properties>
    <wayang.version>WAYANG_VERSION</wayang.version>
</properties>

<dependencies>
    <!-- Always required -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-core</artifactId>
        <version>${wayang.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-basic</artifactId>
        <version>${wayang.version}</version>
    </dependency>

    <!-- Fluent Java/Scala API -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-api-scala-java</artifactId>
        <version>${wayang.version}</version>
    </dependency>

    <!-- Execution engines — add one per platform you want available -->
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-java</artifactId>
        <version>${wayang.version}</version>
    </dependency>
    <dependency>
        <groupId>org.apache.wayang</groupId>
        <artifactId>wayang-spark</artifactId>
        <version>${wayang.version}</version>
    </dependency>

    <!-- Scala runtime (required when using Scala API or Spark) -->
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.12.17</version>
    </dependency>

    <!-- Logging -->
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.7.13</version>
    </dependency>
</dependencies>