Apache Wayang’s behavior is governed by a layered Configuration object that every WayangContext and Job carries. You can supply defaults through .properties files without touching application code, override them programmatically, and fork child configurations for per-job tuning. This page explains the hierarchy, the key properties to know, and how to configure Spark, Flink, and the Python API.

Configuration hierarchy

Wayang resolves configuration values through a parent chain. The layers are listed here from lowest to highest precedence; each one overrides those before it:
  1. Built-in defaults (wayang-core-defaults.properties)
  2. Platform defaults (spark/default.properties, etc.)
  3. User file (wayang.properties on classpath, or -Dwayang.configuration=<url>)
  4. Programmatic overrides (config.setProperty(...))

When you call new Configuration(), Wayang looks for a configuration file in this order:
  1. The system property wayang.configuration — point it at any URL Wayang can open.
  2. A wayang.properties resource on the classpath.
  3. If neither is found, a blank (defaults-only) configuration is used.
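
The lookup order above can be sketched with plain JDK classes. ConfigLocator is a hypothetical helper, not part of Wayang, and it only mirrors the three documented steps. Wayang's own loader may differ in details such as error handling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.util.Properties;

// Hypothetical helper, not part of Wayang: mirrors the documented lookup order.
final class ConfigLocator {

    // Returns the URL of the user file, or null when only defaults apply.
    static URL locate(ClassLoader loader) throws IOException {
        // Step 1: the wayang.configuration system property wins if it is set.
        String explicit = System.getProperty("wayang.configuration");
        if (explicit != null && !explicit.isEmpty()) {
            return URI.create(explicit).toURL();
        }
        // Step 2: otherwise look for wayang.properties on the classpath.
        // Step 3: getResource returns null if it is missing (defaults only).
        return loader.getResource("wayang.properties");
    }

    // Loads the properties behind the URL; a null URL yields an empty set.
    static Properties load(URL url) throws IOException {
        Properties props = new Properties();
        if (url != null) {
            try (InputStream in = url.openStream()) {
                props.load(in);
            }
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        URL url = locate(ConfigLocator.class.getClassLoader());
        System.out.println(url == null ? "defaults only" : "user file: " + url);
        System.out.println(load(url).size() + " user properties loaded");
    }
}
```

Running it with -Dwayang.configuration=file:///etc/wayang/production.properties makes step 1 take effect, so any wayang.properties on the classpath is ignored.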

The Configuration class

org.apache.wayang.core.api.Configuration is the central handle. Key methods:

| Method | Purpose |
| --- | --- |
| new Configuration() | Creates an instance backed by built-in defaults, then loads the user file if found. |
| new Configuration(String url) | As above, but explicitly loads from the given URL. |
| config.fork() | Creates a child Configuration that inherits all values but can override them independently. Useful for per-job settings. |
| config.fork(String name) | Same as fork() with a debug-friendly name. |
| config.setProperty(key, value) | Adds or overrides a string property. |
| config.getStringProperty(key) | Retrieves a property value; throws if not present. |
| config.getStringProperty(key, fallback) | Returns fallback if the key is absent. |
| config.getBooleanProperty(key) | Parses the value as a boolean. |
| config.getLongProperty(key) | Parses the value as a long. |
| config.getDoubleProperty(key) | Parses the value as a double. |
| config.load(String url) | Loads additional properties from a URL at runtime. |
| config.setCostModel(EstimatableCost) | Replaces the optimizer cost model (see ML Cost Model). |
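
To make the fork() semantics concrete, here is a minimal model of a parent-chained lookup. LayeredConfig is a simplified stand-in written for illustration, not the Wayang class. It assumes a child consults its own values first and then walks up to its parents, which matches the behavior described in the table but says nothing about how Wayang implements it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NoSuchElementException;

// Simplified stand-in for a parent-chained configuration; not the Wayang class.
final class LayeredConfig {
    private final LayeredConfig parent; // null for the root layer
    private final Map<String, String> values = new HashMap<>();

    LayeredConfig(LayeredConfig parent) {
        this.parent = parent;
    }

    // A fork is an empty child layer that falls back to this instance.
    LayeredConfig fork() {
        return new LayeredConfig(this);
    }

    // Writes only to this layer, so parents are never modified.
    void setProperty(String key, String value) {
        values.put(key, value);
    }

    // Walks from this layer up to the root; null when no layer has the key.
    private String lookup(String key) {
        for (LayeredConfig layer = this; layer != null; layer = layer.parent) {
            String value = layer.values.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    String getStringProperty(String key) {
        String value = lookup(key);
        if (value == null) {
            throw new NoSuchElementException("No value for key: " + key);
        }
        return value;
    }

    String getStringProperty(String key, String fallback) {
        String value = lookup(key);
        return value != null ? value : fallback;
    }

    public static void main(String[] args) {
        LayeredConfig shared = new LayeredConfig(null);
        shared.setProperty("spark.master", "local[1]");

        LayeredConfig job = shared.fork();
        job.setProperty("spark.master", "local[4]");

        System.out.println(job.getStringProperty("spark.master"));    // local[4]
        System.out.println(shared.getStringProperty("spark.master")); // local[1]
    }
}
```

In this model, a value set on the parent after the fork is still visible to the child unless the child overrides it. Whether Wayang behaves the same way is not stated on this page.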

Loading configuration from a file

import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.java.Java;

// Load from an explicit file path
Configuration config = new Configuration("file:///etc/wayang/production.properties");

WayangContext wayang = new WayangContext(config)
    .withPlugin(Java.basicPlugin());

Setting properties programmatically

import org.apache.wayang.core.api.Configuration;

Configuration config = new Configuration();

// Override a single property
config.setProperty("wayang.core.log.enabled", "false");

// Add multiple overrides
config.setProperty("spark.master", "spark://my-cluster:7077");
config.setProperty("spark.app.name", "MyWayangJob");

// Per-job fork — leaves the parent untouched
Configuration jobConfig = config.fork("my-job");
jobConfig.setProperty("wayang.core.optimizer.reoptimize", "true");

Core configuration properties

These live in wayang-core-defaults.properties and are always available.

Statistics and logging

| Property | Default | Description |
| --- | --- | --- |
| wayang.core.log.enabled | true | Enables runtime statistics collection. Set to false to reduce I/O in production. |
| wayang.core.log.cardinalities | ~/.wayang/cardinalities.json | Path where cardinality log entries are written. |
| wayang.core.log.executions | ~/.wayang/executions.json | Path where per-execution timing data is appended. |

Plan explanation

| Property | Default | Description |
| --- | --- | --- |
| wayang.core.explain.enabled | false | When true, Wayang writes a human-readable plan explanation to disk after optimization. |
| wayang.core.explain.directrory | ~/.wayang/ | Directory for plan explanation files. Note: the property key contains a typo (directrory); use exactly this spelling. |

Enable plan explanation during development to inspect which platforms were selected for each operator. Disable it in production as it generates file I/O on every job.

Optimizer settings

| Property | Default | Description |
| --- | --- | --- |
| wayang.core.optimizer.pruning.strategies | LatentOperatorPruningStrategy | Comma-separated list of pruning strategy class names. |
| wayang.core.optimizer.reoptimize | false | Enables re-optimization after partial execution. |
| wayang.core.optimizer.reoptimize.proactive | false | Triggers re-optimization even before a stage boundary. |
| wayang.core.optimizer.cardinality.maxspread | 10 | Maximum acceptable ratio between upper and lower cardinality bounds. |
| wayang.core.optimizer.cardinality.minconfidence | 0.5 | Minimum cardinality estimate confidence to suppress fallback. |
| wayang.core.optimizer.enumeration.parallel-tasks | false | Enables parallel plan enumeration (experimental). |

Fallback cost estimates

When no cost model is provided for an operator, Wayang uses these interval values (in abstract cost units):

| Property | Default | Description |
| --- | --- | --- |
| wayang.core.fallback.udf.cpu.lower | 100 | Lower bound for UDF CPU cost. |
| wayang.core.fallback.udf.cpu.upper | 1000 | Upper bound for UDF CPU cost. |
| wayang.core.fallback.udf.cpu.confidence | 0.2 | Confidence for UDF CPU estimate interval. |
| wayang.core.fallback.udf.ram.lower | 100 | Lower bound for UDF RAM cost. |
| wayang.core.fallback.udf.ram.upper | 1000 | Upper bound for UDF RAM cost. |
| wayang.core.fallback.udf.ram.confidence | 0.2 | Confidence for UDF RAM estimate interval. |
| wayang.core.fallback.operator.cpu.lower | 100 | Lower bound for operator CPU cost. |
| wayang.core.fallback.operator.cpu.upper | 1000 | Upper bound for operator CPU cost. |
| wayang.core.fallback.operator.cpu.confidence | 0.2 | Confidence for operator CPU estimate interval. |
| wayang.core.fallback.operator.ram.lower | 100 | Lower bound for operator RAM cost. |
| wayang.core.fallback.operator.ram.upper | 1000 | Upper bound for operator RAM cost. |
| wayang.core.fallback.operator.ram.confidence | 0.2 | Confidence for operator RAM estimate interval. |
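
Each fallback estimate is a triple of lower bound, upper bound, and confidence that shares a key prefix. The sketch below reads one such triple from a java.util.Properties object. FallbackInterval is a hypothetical type, not a Wayang class, and two points are assumptions rather than documented facts: that the bounds are integers and that confidence lies in [0, 1].

```java
import java.util.Properties;

// Hypothetical value type for one (lower, upper, confidence) fallback estimate.
final class FallbackInterval {
    final long lower;
    final long upper;
    final double confidence;

    FallbackInterval(long lower, long upper, double confidence) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower bound exceeds upper bound");
        }
        // Assumption: confidence is a probability-like value in [0, 1].
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("confidence must be in [0, 1]");
        }
        this.lower = lower;
        this.upper = upper;
        this.confidence = confidence;
    }

    // Reads <prefix>.lower, <prefix>.upper and <prefix>.confidence.
    static FallbackInterval fromProperties(Properties props, String prefix) {
        return new FallbackInterval(
                Long.parseLong(required(props, prefix + ".lower")),
                Long.parseLong(required(props, prefix + ".upper")),
                Double.parseDouble(required(props, prefix + ".confidence")));
    }

    private static String required(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // Documented defaults for the UDF CPU estimate.
        Properties props = new Properties();
        props.setProperty("wayang.core.fallback.udf.cpu.lower", "100");
        props.setProperty("wayang.core.fallback.udf.cpu.upper", "1000");
        props.setProperty("wayang.core.fallback.udf.cpu.confidence", "0.2");

        FallbackInterval udfCpu = fromProperties(props, "wayang.core.fallback.udf.cpu");
        System.out.println(udfCpu.lower + ".." + udfCpu.upper + " @ " + udfCpu.confidence);
    }
}
```

Validating the triple as a unit catches a common mistake when tuning these values, namely raising a lower bound above its matching upper bound.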

Spark configuration properties

These defaults are defined in conf/spark/default.properties and are merged when the Spark plugin is active.

| Property | Default | Description |
| --- | --- | --- |
| spark.master | local[1] | Spark master URL. Use local[*] for all cores, or spark://host:7077 for a cluster. |
| spark.app.name | Wayang App | Application name shown in the Spark UI. |
| spark.ui.showConsoleProgress | false | Suppresses Spark’s progress bar output to the console. |
| spark.driver.allowMultipleContexts | true | Permits multiple SparkContext instances; required when Wayang manages the context lifecycle. |

Example cluster overrides in a properties file:

spark.master = spark://my-cluster:7077
spark.app.name = Production Wayang Job
spark.executor.memory = 8g
spark.executor.cores = 4

Flink configuration properties

These defaults are in conf/flink/default.properties.

| Property | Default | Description |
| --- | --- | --- |
| wayang.flink.mode.run | collection | Execution mode: collection (local), local, or distribution (cluster). |
| wayang.flink.paralelism | 1 | Default operator parallelism. Note: property key has a single l (paralelism). |
| wayang.flink.master | (unset) | Flink JobManager host (required for distribution mode). |
| wayang.flink.port | (unset) | Flink JobManager port (required for distribution mode). |

Example overrides in a properties file:

wayang.flink.mode.run = local
wayang.flink.paralelism = 4
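
The Flink keys are easy to get wrong: the mode must be one of three values, distribution mode needs a host and port, and the parallelism key is spelled with a single l. The check below is a hypothetical pre-flight helper built only from the table above. FlinkSettingsCheck is not a Wayang API, and Wayang itself may or may not report these problems.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check derived from the table above; not a Wayang API.
final class FlinkSettingsCheck {

    // Returns a human-readable message per problem; empty means nothing was found.
    static List<String> problems(Properties props) {
        List<String> problems = new ArrayList<>();

        // The documented default mode is "collection".
        String mode = props.getProperty("wayang.flink.mode.run", "collection");
        if (!List.of("collection", "local", "distribution").contains(mode)) {
            problems.add("unknown wayang.flink.mode.run: " + mode);
        }

        // Host and port are documented as required for distribution mode.
        if ("distribution".equals(mode)) {
            for (String key : List.of("wayang.flink.master", "wayang.flink.port")) {
                if (props.getProperty(key) == null) {
                    problems.add(key + " is required in distribution mode");
                }
            }
        }

        // The documented key has a single l; flag the correctly spelled variant.
        if (props.containsKey("wayang.flink.parallelism")) {
            problems.add("use wayang.flink.paralelism (single l) instead of wayang.flink.parallelism");
        }
        return problems;
    }

    // Parses properties-file syntax from a string.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse(
                "wayang.flink.mode.run = distribution\n"
              + "wayang.flink.parallelism = 4\n");
        problems(props).forEach(System.out::println);
    }
}
```

The main method deliberately uses a broken fragment, so it prints three messages: the missing host, the missing port, and the misspelled key.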

Python API configuration

The Python API requires three properties so Wayang can launch the Python worker process: the worker script, the Python executable, and the site-packages directory.

| Property | Default | Description |
| --- | --- | --- |
| wayang.api.python.worker | /var/www/html/python/src/pywy/execution/worker.py | Absolute path to the Python worker entry point. |
| wayang.api.python.path | python3 | Python executable name or absolute path. |
| wayang.api.python.env.path | /usr/local/lib/python3.8/dist-packages | Path to the Python site-packages directory used by the worker. |

Example programmatic overrides:

config.setProperty("wayang.api.python.worker", "/opt/wayang/python/src/pywy/execution/worker.py");
config.setProperty("wayang.api.python.path", "/usr/bin/python3");
config.setProperty("wayang.api.python.env.path", "/home/user/.venv/lib/python3.10/site-packages");
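
The default paths are specific to one installation layout, so it can help to check the configured values before submitting a job. The sketch below is a hypothetical check written against java.util.Properties. PythonPathCheck is not part of Wayang. It skips wayang.api.python.path on purpose, because that value may be a bare executable name resolved through PATH.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical pre-flight check for the Python API paths; not part of Wayang.
final class PythonPathCheck {

    // Returns the keys whose values are unset or do not exist on disk.
    static List<String> missing(Properties props) {
        List<String> missing = new ArrayList<>();

        // The worker entry point must be an existing file.
        String worker = props.getProperty("wayang.api.python.worker");
        if (worker == null || !Files.isRegularFile(Paths.get(worker))) {
            missing.add("wayang.api.python.worker");
        }

        // The site-packages location must be an existing directory.
        String envPath = props.getProperty("wayang.api.python.env.path");
        if (envPath == null || !Files.isDirectory(Paths.get(envPath))) {
            missing.add("wayang.api.python.env.path");
        }
        return missing;
    }

    public static void main(String[] args) {
        // Documented defaults; on most machines at least one will be reported.
        Properties props = new Properties();
        props.setProperty("wayang.api.python.worker",
                "/var/www/html/python/src/pywy/execution/worker.py");
        props.setProperty("wayang.api.python.env.path",
                "/usr/local/lib/python3.8/dist-packages");
        System.out.println("Unresolvable settings: " + missing(props));
    }
}
```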

Monitor configuration

| Property | Default | Description |
| --- | --- | --- |
| wayang.core.monitor.enabled | false | Enables the built-in job monitor. |

Configuration precedence: complete example

import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;

// 1. Built-in defaults load automatically
Configuration config = new Configuration();

// 2. User file (wayang.properties on classpath) already merged by constructor

// 3. Programmatic overrides take highest precedence
config.setProperty("spark.master", "local[4]");
config.setProperty("wayang.core.log.enabled", "true");
config.setProperty("wayang.core.explain.enabled", "true");

// 4. Per-job child config — useful for A/B testing different settings
Configuration jobConfig = config.fork("experiment-1");
jobConfig.setProperty("wayang.core.optimizer.reoptimize", "true");

WayangContext wayang = new WayangContext(config);

Use config.fork() to create job-specific overrides without affecting the shared parent configuration. This is especially useful in multi-tenant or batch-submission scenarios.
