Apache Wayang's behavior is governed by a layered Configuration object that every WayangContext and Job carries. You can supply defaults through .properties files, override them programmatically, and fork child configurations for per-job tuning. Settings supplied through files take effect without changes to application code. This page explains the hierarchy, the key properties to know, and how to configure each supported platform.
Configuration hierarchy
Wayang resolves configuration values through a parent chain. Each layer overrides the layers listed above it, so programmatic overrides take the highest precedence:
Built-in defaults (wayang-core-defaults.properties)
↓
Platform defaults (spark/default.properties, etc.)
↓
User file (wayang.properties on classpath, or -Dwayang.configuration=<url>)
↓
Programmatic overrides (config.setProperty(...))
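The fall-through behaviour of this chain can be sketched with the JDK's java.util.Properties, whose defaults constructor delegates missing keys to a parent. This is only an illustration of the idea, not Wayang's implementation: the class LayeredLookupSketch and its methods are hypothetical, and the values are the defaults and overrides that appear elsewhere on this page.

```java
import java.util.Properties;

/** Sketch of parent-chain lookup with java.util.Properties (not Wayang's own classes). */
final class LayeredLookupSketch {

    /** Creates a layer whose missing keys fall through to the given parent (may be null). */
    static Properties layer(Properties parent) {
        return parent == null ? new Properties() : new Properties(parent);
    }

    /** Builds the four layers from the diagram, lowest precedence first. */
    static Properties buildExample() {
        Properties builtIn = layer(null);
        builtIn.setProperty("wayang.core.log.enabled", "true");
        builtIn.setProperty("wayang.core.optimizer.reoptimize", "false");

        Properties platform = layer(builtIn);
        platform.setProperty("spark.master", "local[1]");

        Properties userFile = layer(platform);
        userFile.setProperty("spark.master", "spark://my-cluster:7077");

        Properties programmatic = layer(userFile);
        programmatic.setProperty("wayang.core.log.enabled", "false");
        return programmatic;
    }

    public static void main(String[] args) {
        Properties effective = buildExample();
        // Overridden in the user-file layer.
        System.out.println(effective.getProperty("spark.master"));
        // Overridden in the programmatic layer.
        System.out.println(effective.getProperty("wayang.core.log.enabled"));
        // Never overridden, so the built-in default is visible.
        System.out.println(effective.getProperty("wayang.core.optimizer.reoptimize"));
    }
}
```

A lookup walks from the most specific layer toward the built-in defaults and stops at the first layer that defines the key.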
When you call new Configuration(), Wayang looks for a configuration file in this order:
1. The system property wayang.configuration, which can point at any URL Wayang can open.
2. A wayang.properties resource on the classpath.
If neither is found, a blank (defaults-only) configuration is used.
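The lookup order above can be expressed in a few lines of plain Java. The sketch below is an assumption about how such a search could be written, not a copy of Wayang's code; ConfigLocationSketch and locate() are hypothetical names, and only the system property name and the resource name come from this page.

```java
import java.net.URL;
import java.util.Optional;

/** Sketch of the documented search order; not Wayang's actual implementation. */
final class ConfigLocationSketch {
    static final String SYSTEM_PROPERTY = "wayang.configuration";
    static final String CLASSPATH_RESOURCE = "wayang.properties";

    /** Returns the URL of the user configuration, or empty for a defaults-only setup. */
    static Optional<String> locate() {
        // 1. An explicit URL passed as -Dwayang.configuration=<url> wins.
        String explicit = System.getProperty(SYSTEM_PROPERTY);
        if (explicit != null && !explicit.isEmpty()) {
            return Optional.of(explicit);
        }
        // 2. Otherwise look for wayang.properties on the classpath.
        URL resource = ClassLoader.getSystemResource(CLASSPATH_RESOURCE);
        if (resource != null) {
            return Optional.of(resource.toString());
        }
        // 3. Nothing found: the caller falls back to built-in defaults only.
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(locate().orElse("(defaults only)"));
    }
}
```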
The Configuration class
org.apache.wayang.core.api.Configuration is the central handle. Key methods:
| Method | Purpose |
| --- | --- |
| new Configuration() | Creates an instance backed by built-in defaults, then loads the user file if found. |
| new Configuration(String url) | As above, but explicitly loads from the given URL. |
| config.fork() | Creates a child Configuration that inherits all values but can override them independently. Useful for per-job settings. |
| config.fork(String name) | Same as fork() with a debug-friendly name. |
| config.setProperty(key, value) | Adds or overrides a string property. |
| config.getStringProperty(key) | Retrieves a property value; throws if not present. |
| config.getStringProperty(key, fallback) | Returns fallback if the key is absent. |
| config.getBooleanProperty(key) | Parses the value as a boolean. |
| config.getLongProperty(key) | Parses the value as a long. |
| config.getDoubleProperty(key) | Parses the value as a double. |
| config.load(String url) | Loads additional properties from a URL at runtime. |
| config.setCostModel(EstimatableCost) | Replaces the optimizer cost model (see ML Cost Model). |
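To make the difference between the strict and the fallback getters concrete, here is a small stand-in written against java.util.Properties. It mirrors the method names in the table for readability, but TypedGetterSketch is a hypothetical class, and the exception type thrown for a missing key is an assumption (the table only says the call throws).

```java
import java.util.NoSuchElementException;
import java.util.Properties;

/** Illustrative stand-in for the getter semantics in the table; not Wayang's Configuration. */
final class TypedGetterSketch {
    private final Properties values = new Properties();

    void setProperty(String key, String value) {
        values.setProperty(key, value);
    }

    /** Strict lookup: a missing key is an error (exception type is an assumption). */
    String getStringProperty(String key) {
        String value = values.getProperty(key);
        if (value == null) {
            throw new NoSuchElementException("No value for key: " + key);
        }
        return value;
    }

    /** Lenient lookup: a missing key yields the caller's fallback. */
    String getStringProperty(String key, String fallback) {
        return values.getProperty(key, fallback);
    }

    boolean getBooleanProperty(String key) {
        return Boolean.parseBoolean(getStringProperty(key).trim());
    }

    long getLongProperty(String key) {
        return Long.parseLong(getStringProperty(key).trim());
    }

    double getDoubleProperty(String key) {
        return Double.parseDouble(getStringProperty(key).trim());
    }
}
```

The strict variant suits required settings, where a missing key should fail fast; the fallback variant suits optional settings that have a sensible local default.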
Loading configuration from a file
Programmatic file load
```java
import org.apache.wayang.core.api.Configuration;
import org.apache.wayang.core.api.WayangContext;
import org.apache.wayang.java.Java;

// Load from an explicit file path
Configuration config = new Configuration("file:///etc/wayang/production.properties");
WayangContext wayang = new WayangContext(config)
    .withPlugin(Java.basicPlugin());
```
Setting properties programmatically
```java
Configuration config = new Configuration();

// Override a single property
config.setProperty("wayang.core.log.enabled", "false");

// Add multiple overrides
config.setProperty("spark.master", "spark://my-cluster:7077");
config.setProperty("spark.app.name", "MyWayangJob");

// Per-job fork: leaves the parent untouched
Configuration jobConfig = config.fork("my-job");
jobConfig.setProperty("wayang.core.optimizer.reoptimize", "true");
```
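The isolation that fork() provides can be modelled with a child object that stores its own overrides and defers everything else to its parent. The sketch below shows that pattern in plain Java; ForkSketch is a hypothetical class written for this page and makes no claim about how Wayang stores values internally.

```java
import java.util.HashMap;
import java.util.Map;

/** Minimal model of fork semantics: local overrides plus a parent pointer. */
final class ForkSketch {
    private final String name;
    private final ForkSketch parent;
    private final Map<String, String> local = new HashMap<>();

    ForkSketch(String name) {
        this(name, null);
    }

    private ForkSketch(String name, ForkSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a child that inherits every value but keeps its own overrides. */
    ForkSketch fork(String childName) {
        return new ForkSketch(childName, this);
    }

    /** Writes only to this node, so ancestors never see the change. */
    void setProperty(String key, String value) {
        local.put(key, value);
    }

    /** Walks from this node up to the root and returns the first value found, or null. */
    String getProperty(String key) {
        for (ForkSketch current = this; current != null; current = current.parent) {
            String value = current.local.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    String name() {
        return name;
    }
}
```

Because writes go only to the child's own map, two jobs forked from the same parent can carry conflicting settings without interfering with each other.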
Core configuration properties
These live in wayang-core-defaults.properties and are always available.
Statistics and logging
| Property | Default | Description |
| --- | --- | --- |
| wayang.core.log.enabled | true | Enables runtime statistics collection. Set to false to reduce I/O in production. |
| wayang.core.log.cardinalities | ~/.wayang/cardinalities.json | Path where cardinality log entries are written. |
| wayang.core.log.executions | ~/.wayang/executions.json | Path where per-execution timing data is appended. |
Plan explanation
| Property | Default | Description |
| --- | --- | --- |
| wayang.core.explain.enabled | false | When true, Wayang writes a human-readable plan explanation to disk after optimization. |
| wayang.core.explain.directrory | ~/.wayang/ | Directory for plan explanation files. Note: the property key contains a typo (directrory); use exactly this spelling. |
Enable plan explanation during development to inspect which platforms were selected for each operator. Disable it in production as it generates file I/O on every job.
Optimizer settings
| Property | Default | Description |
| --- | --- | --- |
| wayang.core.optimizer.pruning.strategies | LatentOperatorPruningStrategy | Comma-separated list of pruning strategy class names. |
| wayang.core.optimizer.reoptimize | false | Enables re-optimization after partial execution. |
| wayang.core.optimizer.reoptimize.proactive | false | Triggers re-optimization even before a stage boundary. |
| wayang.core.optimizer.cardinality.maxspread | 10 | Maximum acceptable ratio between upper and lower cardinality bounds. |
| wayang.core.optimizer.cardinality.minconfidence | 0.5 | Minimum cardinality estimate confidence to suppress fallback. |
| wayang.core.optimizer.enumeration.parallel-tasks | false | Enables parallel plan enumeration (experimental). |
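One plausible reading of the two cardinality thresholds is a simple acceptance test: an estimate is trusted when its upper-to-lower ratio stays within maxspread and its confidence reaches minconfidence. The sketch below encodes that reading only. It is an assumption drawn from the table descriptions, not Wayang's optimizer logic; CardinalityCheckSketch is a hypothetical class, and clamping the lower bound to 1 to avoid division by zero is a choice made for this example.

```java
/** Interpretation of maxspread and minconfidence as an acceptance test (assumption). */
final class CardinalityCheckSketch {

    /**
     * Returns true when the interval [lower, upper] is narrow enough and
     * the estimate is confident enough under the given thresholds.
     */
    static boolean isAcceptable(long lower, long upper, double confidence,
                                double maxSpread, double minConfidence) {
        if (lower < 0 || upper < lower) {
            throw new IllegalArgumentException("Expected 0 <= lower <= upper");
        }
        // Assumption: clamp the lower bound to 1 so that a zero bound does not divide by zero.
        double spread = (double) upper / Math.max(1L, lower);
        return spread <= maxSpread && confidence >= minConfidence;
    }

    public static void main(String[] args) {
        // Documented defaults: maxspread = 10, minconfidence = 0.5.
        System.out.println(isAcceptable(100, 500, 0.9, 10, 0.5));
        System.out.println(isAcceptable(100, 5000, 0.9, 10, 0.5));
    }
}
```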
Fallback cost estimates
When no cost model is provided for an operator, Wayang uses these interval values (in abstract cost units):
| Property | Default | Description |
| --- | --- | --- |
| wayang.core.fallback.udf.cpu.lower | 100 | Lower bound for UDF CPU cost. |
| wayang.core.fallback.udf.cpu.upper | 1000 | Upper bound for UDF CPU cost. |
| wayang.core.fallback.udf.cpu.confidence | 0.2 | Confidence for UDF CPU estimate interval. |
| wayang.core.fallback.udf.ram.lower | 100 | Lower bound for UDF RAM cost. |
| wayang.core.fallback.udf.ram.upper | 1000 | Upper bound for UDF RAM cost. |
| wayang.core.fallback.udf.ram.confidence | 0.2 | Confidence for UDF RAM estimate interval. |
| wayang.core.fallback.operator.cpu.lower | 100 | Lower bound for operator CPU cost. |
| wayang.core.fallback.operator.cpu.upper | 1000 | Upper bound for operator CPU cost. |
| wayang.core.fallback.operator.cpu.confidence | 0.2 | Confidence for operator CPU estimate interval. |
| wayang.core.fallback.operator.ram.lower | 100 | Lower bound for operator RAM cost. |
| wayang.core.fallback.operator.ram.upper | 1000 | Upper bound for operator RAM cost. |
| wayang.core.fallback.operator.ram.confidence | 0.2 | Confidence for operator RAM estimate interval. |
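Each fallback estimate is a triple of lower bound, upper bound, and confidence that shares a common key prefix. As a rough sketch of how such a triple could be read and sanity-checked, the hypothetical FallbackIntervalSketch class below loads one from a java.util.Properties object. The class and its validation rules are assumptions for illustration; only the key names and default values come from the table.

```java
import java.util.Properties;

/** Holds one fallback cost interval read from properties (illustrative only). */
final class FallbackIntervalSketch {
    final long lower;
    final long upper;
    final double confidence;

    FallbackIntervalSketch(long lower, long upper, double confidence) {
        if (lower > upper) {
            throw new IllegalArgumentException("lower must not exceed upper");
        }
        if (confidence < 0.0 || confidence > 1.0) {
            throw new IllegalArgumentException("confidence must be within [0, 1]");
        }
        this.lower = lower;
        this.upper = upper;
        this.confidence = confidence;
    }

    /** Reads prefix.lower, prefix.upper and prefix.confidence from the given properties. */
    static FallbackIntervalSketch fromProperties(Properties props, String prefix) {
        long lower = Long.parseLong(require(props, prefix + ".lower"));
        long upper = Long.parseLong(require(props, prefix + ".upper"));
        double confidence = Double.parseDouble(require(props, prefix + ".confidence"));
        return new FallbackIntervalSketch(lower, upper, confidence);
    }

    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return value.trim();
    }
}
```

Calling fromProperties(props, "wayang.core.fallback.udf.cpu") with the documented defaults yields the interval 100 to 1000 at confidence 0.2.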
Spark configuration properties
These defaults are defined in conf/spark/default.properties and are merged when the Spark plugin is active.
| Property | Default | Description |
| --- | --- | --- |
| spark.master | local[1] | Spark master URL. Use local[*] for all cores, or spark://host:7077 for a cluster. |
| spark.app.name | Wayang App | Application name shown in the Spark UI. |
| spark.ui.showConsoleProgress | false | Suppresses Spark's progress bar output to the console. |
| spark.driver.allowMultipleContexts | true | Permits multiple SparkContext instances. Required when Wayang manages the context lifecycle. |
conf/spark/default.properties
```properties
spark.master = spark://my-cluster:7077
spark.app.name = Production Wayang Job
spark.executor.memory = 8g
spark.executor.cores = 4
```
Flink configuration properties
These defaults are in conf/flink/default.properties.
| Property | Default | Description |
| --- | --- | --- |
| wayang.flink.mode.run | collection | Execution mode: collection (local), local, or distribution (cluster). |
| wayang.flink.paralelism | 1 | Default operator parallelism. Note: property key has a single l (paralelism). |
| wayang.flink.master | (unset) | Flink JobManager host (required for distribution mode). |
| wayang.flink.port | (unset) | Flink JobManager port (required for distribution mode). |
```properties
wayang.flink.mode.run = local
wayang.flink.paralelism = 4
```
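Because distribution mode needs a master and a port, and the parallelism key is easy to misspell, a small pre-flight check can catch mistakes before a job is submitted. The following sketch is a hypothetical helper (FlinkModeCheckSketch is not part of Wayang) that applies only the rules stated in the table above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Pre-flight check for the Flink properties described above (illustrative helper). */
final class FlinkModeCheckSketch {
    private static final List<String> MODES = Arrays.asList("collection", "local", "distribution");

    /** Returns a list of human-readable problems; an empty list means no issue was found. */
    static List<String> validate(Properties props) {
        List<String> problems = new ArrayList<>();
        // The documented default mode is "collection".
        String mode = props.getProperty("wayang.flink.mode.run", "collection").trim();
        if (!MODES.contains(mode)) {
            problems.add("Unknown wayang.flink.mode.run: " + mode);
        }
        if ("distribution".equals(mode)) {
            if (isBlank(props.getProperty("wayang.flink.master"))) {
                problems.add("wayang.flink.master is required in distribution mode");
            }
            if (isBlank(props.getProperty("wayang.flink.port"))) {
                problems.add("wayang.flink.port is required in distribution mode");
            }
        }
        // The documented key is spelled with a single l; flag the conventional spelling.
        if (props.getProperty("wayang.flink.parallelism") != null) {
            problems.add("Use wayang.flink.paralelism (single l) instead of wayang.flink.parallelism");
        }
        return problems;
    }

    private static boolean isBlank(String value) {
        return value == null || value.trim().isEmpty();
    }
}
```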
Python API configuration
The Python API relies on three properties so that Wayang can locate and launch the Python worker process: the worker script, the Python executable, and the site-packages directory.
| Property | Default | Description |
| --- | --- | --- |
| wayang.api.python.worker | /var/www/html/python/src/pywy/execution/worker.py | Absolute path to the Python worker entry point. |
| wayang.api.python.path | python3 | Python executable name or absolute path. |
| wayang.api.python.env.path | /usr/local/lib/python3.8/dist-packages | Path to the Python site-packages directory used by the worker. |
```java
config.setProperty("wayang.api.python.worker", "/opt/wayang/python/src/pywy/execution/worker.py");
config.setProperty("wayang.api.python.path", "/usr/bin/python3");
config.setProperty("wayang.api.python.env.path", "/home/user/.venv/lib/python3.10/site-packages");
```
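Since the default paths are unlikely to match most installations, it can help to verify the configured locations before starting a job. The sketch below is a hypothetical helper (PythonPathCheckSketch is not a Wayang class) that checks the worker script and the site-packages directory with java.nio.file. It skips wayang.api.python.path on purpose, because that value may be a bare executable name resolved through PATH.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Checks that the configured Python locations exist on disk (illustrative helper). */
final class PythonPathCheckSketch {

    /** Returns a list of problems; an empty list means both paths were found. */
    static List<String> check(Properties props) {
        List<String> problems = new ArrayList<>();

        String worker = props.getProperty("wayang.api.python.worker");
        if (worker == null || !Files.isRegularFile(Paths.get(worker))) {
            problems.add("Worker script not found: " + worker);
        }

        String envPath = props.getProperty("wayang.api.python.env.path");
        if (envPath == null || !Files.isDirectory(Paths.get(envPath))) {
            problems.add("site-packages directory not found: " + envPath);
        }
        return problems;
    }
}
```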
Monitor configuration
| Property | Default | Description |
| --- | --- | --- |
| wayang.core.monitor.enabled | false | Enables the built-in job monitor. |
Configuration precedence: complete example
```java
// 1. Built-in defaults load automatically
Configuration config = new Configuration();

// 2. User file (wayang.properties on classpath) already merged by constructor

// 3. Programmatic overrides take highest precedence
config.setProperty("spark.master", "local[4]");
config.setProperty("wayang.core.log.enabled", "true");
config.setProperty("wayang.core.explain.enabled", "true");

// 4. Per-job child config, useful for A/B testing different settings
Configuration jobConfig = config.fork("experiment-1");
jobConfig.setProperty("wayang.core.optimizer.reoptimize", "true");

// This context uses the shared parent configuration.
// Pass jobConfig instead to run with the experiment's overrides.
WayangContext wayang = new WayangContext(config);
```
Use config.fork() to create job-specific overrides without affecting the shared parent configuration. This is especially useful in multi-tenant or batch-submission scenarios.