Supported Execution Platforms in Apache Wayang

Apache Wayang abstracts your data pipeline away from any single execution engine by treating every engine as a pluggable platform. You register the platforms you want available, and Wayang’s cost-based optimizer decides — at runtime — which platform (or combination of platforms) best executes each operator in your job. Adding more platforms to the registry gives the optimizer more choices, which generally produces better execution plans, especially when data volumes or query shapes change across pipeline stages.

All supported platforms

Java Streams

Single-JVM execution backed by java.util.stream. Zero setup, ideal for development, testing, and small datasets.

Apache Spark

Distributed batch processing on Spark 3.4.4 / Scala 2.12. Handles large-scale data across a cluster or locally.

Apache Flink

Unified stream and batch engine. DataSet API (legacy) and bounded DataStream API both supported.

PostgreSQL

Push filter, projection, join, and reduce operators into PostgreSQL via JDBC. Avoids moving data out of the database.

SQLite3

Lightweight embedded JDBC platform. Good for edge devices and integration tests that need an in-process SQL engine.

Apache Kafka

KafkaTopicSource and KafkaTopicSink are first-class logical operators. Both Java Streams and Spark can execute them.

Apache Giraph

Graph-processing platform for large-scale PageRank and similar BSP algorithms. Entry point: Giraph.plugin().

TensorFlow

Machine-learning platform exposing DLTrainingOperator and PredictOperator. Entry point: Tensorflow.plugin().

Platform table

Platform	Maven module	Entry-point class	Notes
Java Streams	`wayang-java`	`Java`	Always available; fallback for many operator types
Apache Spark	`wayang-spark`	`Spark`	Requires `SPARK_HOME`; Spark 3.4.4, Scala 2.12
Apache Flink	`wayang-flink`	`Flink`	DataSet (deprecated) and bounded stream modes
PostgreSQL	`wayang-postgres`	`Postgres`	JDBC; filter/join/projection pushed into SQL
SQLite3	`wayang-sqlite3`	`Sqlite3`	JDBC; embedded, file-based or in-memory
Apache Kafka	`wayang-java` / `wayang-spark`	`KafkaTopicSource` / `KafkaTopicSink`	Logical operators; executed on Java or Spark
Apache Giraph	`wayang-giraph`	`Giraph`	PageRank; needs a Hadoop/Giraph cluster
TensorFlow	`wayang-tensorflow`	`Tensorflow`	DL training and inference operators

Plugin architecture

Every platform ships as one or more plugins. A plugin bundles:

Operator mappings — a table that tells the optimizer “logical operator X can be executed as physical operator Y on this platform.”
Channel conversions — cost-annotated rules for moving data between platforms (for example, collecting a Spark RDD back to a Java Collection to hand off to a downstream operator on Java Streams).
Default configuration — a bundled .properties file with cost-model parameters, parallelism settings, and connection defaults.

You register plugins on the WayangContext:

WayangContext wayang = new WayangContext(new Configuration())
        .withPlugin(Java.basicPlugin())   // Java Streams operators
        .withPlugin(Spark.basicPlugin()); // Spark operators

The optimizer inspects every registered mapping when it builds the execution plan. More mappings available means more candidate plans to compare, which typically lets the optimizer find cheaper routes, especially when it can split a single logical plan across platforms.

You only pay the overhead of a platform (driver startup, network round-trips, JVM serialization) when the optimizer actually chooses it for at least one operator in the job. Registering an additional platform does not force its use.

Choosing which plugins to register

Scenario	Recommended plugins
Local development / unit tests	`Java.basicPlugin()`
Small–medium production jobs	`Java.basicPlugin()`
Large datasets on a cluster	`Spark.basicPlugin()`
Let Wayang pick automatically	`Java.basicPlugin()` + `Spark.basicPlugin()`
Graph analytics	add `Spark.graphPlugin()` or `Giraph.plugin()`
SQL-heavy pipelines	add `Postgres.plugin()` or `Sqlite3.plugin()`
ML training	add `Tensorflow.plugin()`

In production, register every platform you have available and let the cost-based optimizer decide. Artificially limiting the platform set only helps when you need to guarantee a specific engine for compliance or environment reasons.

Get Started

Core Concepts

API Guides

Platforms

Advanced Guides

Examples & Reference

Supported Execution Platforms in Apache Wayang

All supported platforms

Java Streams

Apache Spark

Apache Flink

PostgreSQL

SQLite3

Apache Kafka

Apache Giraph

TensorFlow

Platform table

Plugin architecture

Choosing which plugins to register

Build docs developers (and LLMs) love

Get Started

Core Concepts

API Guides

Platforms

Advanced Guides

Examples & Reference

Documentation Index

​All supported platforms

Java Streams

Apache Spark

Apache Flink

PostgreSQL

SQLite3

Apache Kafka

Apache Giraph

TensorFlow

​Platform table

​Plugin architecture

​Choosing which plugins to register

Build docs developers (and LLMs) love

All supported platforms

Platform table

Plugin architecture

Choosing which plugins to register