SQL API: Run SQL Queries Across Multiple Platforms
Execute SQL across Java, Spark, and Postgres with Wayang’s SQL API. Calcite parses and validates queries; Wayang routes each operator to the best platform.
Apache Wayang’s SQL API lets you write standard SQL and have Wayang execute it on any combination of registered platforms — local Java Streams, Apache Spark, PostgreSQL, and others. Under the hood, the SQL string is parsed by Apache Calcite, validated and converted into a relational algebra tree, translated into Wayang operators by WayangRelConverter, and handed to the platform-agnostic cost-based optimizer. The pipeline from SQL text to execution is entirely transparent; you write SQL, Wayang handles the rest.
SqlContext is the entry point for the SQL API. It extends WayangContext and adds the Apache Calcite schema registry and the executeSql method.
import org.apache.wayang.api.sql.context.SqlContext;
import org.apache.wayang.core.api.Configuration;
import java.sql.SQLException;

Configuration configuration = new Configuration();
try {
    SqlContext sqlContext = new SqlContext(configuration);
} catch (SQLException e) {
    throw new RuntimeException(e);
}
SqlContext automatically registers the Java, Spark, and Postgres plugins when constructed with only a Configuration. To control which plugins are active, use the two-argument constructor, which also takes the plugins to register.
Wayang’s SQL API uses Apache Calcite schemas to map table names in SQL to physical data sources. You configure the schema by setting the wayang.calcite.model property in your Configuration to an inline Calcite model JSON string.
The Calcite file adapter (org.apache.calcite.adapter.file.FileSchemaFactory) maps all CSV files in a directory as tables whose names match their filenames.
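As a rough illustration of the two paragraphs above, the sketch below builds an inline Calcite model that exposes a CSV directory as a schema named fs (matching the fs.orders references used elsewhere on this page). The class name, the buildModel helper, and the /data/csv path are hypothetical. The commented-out setProperty call assumes Configuration offers a string setter for wayang.calcite.model, and the directory is assumed to contain no characters that need JSON escaping.

```java
class CalciteModelSketch {

    /**
     * Builds an inline Calcite model JSON string for a directory of CSV files.
     * Hypothetical helper: csvDirectory is inserted verbatim, so it must not
     * contain quotes or backslashes.
     */
    static String buildModel(String csvDirectory) {
        return "{\n"
            + "  \"version\": \"1.0\",\n"
            + "  \"defaultSchema\": \"fs\",\n"
            + "  \"schemas\": [\n"
            + "    {\n"
            + "      \"name\": \"fs\",\n"
            + "      \"type\": \"custom\",\n"
            + "      \"factory\": \"org.apache.calcite.adapter.file.FileSchemaFactory\",\n"
            + "      \"operand\": { \"directory\": \"" + csvDirectory + "\" }\n"
            + "    }\n"
            + "  ]\n"
            + "}";
    }

    public static void main(String[] args) {
        String model = buildModel("/data/csv"); // hypothetical path
        System.out.println(model);
        // Assumed wiring, not executed here because it needs Wayang on the classpath:
        // configuration.setProperty("wayang.calcite.model", model);
    }
}
```

With a file such as orders.csv in that directory, a query would then be expected to refer to the table as fs.orders.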
The Calcite schema drives validation and planning. The actual data reading at runtime goes through the registered Wayang platform operators, not the JDBC connection used by the schema introspection.
public Collection<Record> executeSql(String sql) throws SqlParseException
Parses, plans, optimises, and executes the SQL string, returning all result rows as a Collection<Record>. Every Record holds an Object[] of field values accessible by position or name.
Collection<Record> results = sqlContext.executeSql(
    "SELECT name, COUNT(*) AS cnt " +
    "FROM fs.orders " +
    "GROUP BY name " +
    "ORDER BY cnt DESC");

for (Record record : results) {
    System.out.println(record.getField(0) + " → " + record.getField(1));
}
SELECT u.name, o.total
FROM fs.users u
JOIN fs.orders o ON u.id = o.user_id
Single-condition equi-joins are translated to JoinOperator. Multi-condition joins (e.g., ON a.x = b.x AND a.y = b.y) use WayangMultiConditionJoinVisitor, which chains two key extractors. Cross joins (ON TRUE / no condition) use WayangCrossJoinVisitor.
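To make the single- versus multi-condition distinction concrete, here is a self-contained sketch in plain Java. It does not use Wayang's JoinOperator or its visitors; it only shows how ON a.x = b.x AND a.y = b.y can be treated as an equi-join on a composite key built from two field extractors. The class, the hashJoin and compositeKey helpers, and the sample rows are illustrative assumptions, not Wayang's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class CompositeKeyJoinSketch {

    /** Generic hash join: index the right side by key, then probe it with each left row. */
    static <K> List<Object[][]> hashJoin(List<Object[]> left, List<Object[]> right,
            Function<Object[], K> leftKey, Function<Object[], K> rightKey) {
        Map<K, List<Object[]>> index = new HashMap<>();
        for (Object[] r : right) {
            index.computeIfAbsent(rightKey.apply(r), k -> new ArrayList<>()).add(r);
        }
        List<Object[][]> joined = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : index.getOrDefault(leftKey.apply(l), Collections.emptyList())) {
                joined.add(new Object[][] {l, r});
            }
        }
        return joined;
    }

    /** Combines two single-field extractors into one composite key (a List has value equality). */
    static Function<Object[], List<Object>> compositeKey(int first, int second) {
        return row -> Arrays.asList(row[first], row[second]);
    }

    public static void main(String[] args) {
        // Columns: x, y, payload (hypothetical sample data)
        List<Object[]> a = Arrays.asList(
            new Object[] {1, "p", "a1"},
            new Object[] {1, "q", "a2"});
        List<Object[]> b = Arrays.asList(
            new Object[] {1, "p", "b1"},
            new Object[] {2, "p", "b2"});

        // ON a.x = b.x AND a.y = b.y
        List<Object[][]> result = hashJoin(a, b, compositeKey(0, 1), compositeKey(0, 1));
        for (Object[][] pair : result) {
            System.out.println(pair[0][2] + " joins " + pair[1][2]);
        }
    }
}
```

Passing a single-field extractor such as row -> row[0] on both sides gives the single-condition case. With the sample rows above, only a1 and b1 agree on both x and y, so the composite-key join yields one pair.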
The optimizer applies Calcite’s SubQueryRemoveRule to rewrite IN, EXISTS, and scalar subqueries as joins before planning, so correlated subqueries in WHERE clauses are normalised automatically.
The pipeline from SQL text to execution goes through three stages:
1. Parse and validate
Optimizer.parseSql(sql) feeds the SQL string through Calcite’s SqlParser and SqlValidator. The result is a validated SqlNode AST. Syntax errors or unknown table/column references surface here as SqlParseException.
2. Convert to relational algebra
Optimizer.convert(validatedSqlNode) converts the AST to Calcite’s RelNode tree using a SqlToRelConverter. This is the standard Calcite logical plan.
3. Apply Wayang rules and translate
Optimizer.optimize(relNode, ...) applies the WayangRules rule set to convert each standard Calcite RelNode into a Wayang-convention counterpart (WayangTableScan, WayangFilter, WayangProject, WayangJoin, WayangAggregate, WayangSort). WayangRelConverter.convert() then recursively walks the Wayang-convention tree and instantiates the corresponding Wayang operator objects, which are wired together into a WayangPlan and submitted to the cross-platform executor.
SQL string
    │
    ▼  Calcite SqlParser + SqlValidator
SqlNode (AST)
    │
    ▼  SqlToRelConverter
RelNode (logical relational algebra)
    │
    ▼  WayangRules (Volcano planner)
WayangRelNode tree
    │
    ▼  WayangRelConverter
Wayang operator DAG
    │
    ▼  Wayang cost-based optimizer
Execution on Java / Spark / Postgres / ...
When wayang-postgres is on the classpath and the Calcite schema points to a PostgreSQL database, Wayang can push SELECT, WHERE, GROUP BY, and ORDER BY clauses directly into the database, retrieving only the aggregated results. For this to work, register both the conversion plugin and the basic plugin.
The Wayang cost model will evaluate whether it is cheaper to pull the data to Java/Spark or to push the aggregation down to PostgreSQL, and will choose accordingly.
Non-equi joins expressed directly in SQL are not supported; they throw IllegalStateException.
For use cases that require such features, use the Java API or Scala API directly.
All result rows are returned as Record objects regardless of platform. Columns are ordered as they appear in the SELECT clause. Use SELECT with explicit column names rather than SELECT * to ensure a stable column order across schema evolution.
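One way to follow this advice is to keep the explicit column list in a single place and derive both the SQL text and the positional lookups from it. The sketch below uses plain Java with an Object[] standing in for a row's field values. The class, the helper names, the column names, and the sample row are hypothetical, and the code does not call Wayang.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class StableColumnsSketch {

    /** Builds "SELECT c1, c2, ... FROM table" from an explicit column list. */
    static String selectFrom(String table, List<String> columns) {
        return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }

    /** Maps each selected column to its position in the result row. */
    static Map<String, Integer> positions(List<String> columns) {
        Map<String, Integer> byName = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            byName.put(columns.get(i), i);
        }
        return byName;
    }

    public static void main(String[] args) {
        List<String> columns = Arrays.asList("name", "total");
        String sql = selectFrom("fs.orders", columns);
        Map<String, Integer> pos = positions(columns);

        // Stand-in for the field values of one result row (hypothetical data).
        Object[] row = {"alice", 42.0};
        System.out.println(sql);
        System.out.println(row[pos.get("total")]);
    }
}
```

Because the query and the lookup table come from the same list, adding a column to the underlying table does not shift the positions the consuming code relies on.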