SQLMorph provides seven evaluation techniques through the EvaluationTechnique enum. Each technique implements a different strategy for comparing the result set of a predicted SQL query against the ground-truth result set. They range from a simple binary row-set equality check (EXECUTION_ACCURACY) to embedding-based approaches that align columns semantically and score rows by cosine similarity (UNIFIED_COLUMN_AND_SEMANTIC_ROW). You select the technique once, when you construct an Evaluation instance, and it controls the entire pipeline.
Selecting a technique
config dictionary:
All seven techniques
EXECUTION_ACCURACY
Enum value: EvaluationTechnique.EXECUTION_ACCURACY
String value: "execution_accuracy"
Embedding required: No

What it measures

Binary row-set equality. Returns EX = 1 if set(predicted_rows) == set(ground_truth_rows), otherwise EX = 0. This is the standard metric used by the BIRD benchmark. It produces only an EX score, with no EXP, EXR, or F1.

When to use it

Use when you need a direct comparison against published BIRD leaderboard scores, or as a fast sanity check before running a heavier technique. Because it ignores column names entirely and treats rows as opaque tuples, it is sensitive to column ordering in SELECT clauses and will return EX = 0 if the predicted query returns extra columns, even if the data is correct.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | 1 if result sets match exactly, 0 otherwise |
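
The set comparison described above can be sketched in a few lines. This is a minimal illustration, not SQLMorph's own code: the function name `execution_accuracy` and the sample rows are assumptions made for the example.

```python
def execution_accuracy(predicted_rows, ground_truth_rows):
    """Return 1 if both result sets are equal as sets of tuples, else 0.

    Hypothetical helper for illustration; not part of the SQLMorph API.
    """
    # set() discards row order and duplicate rows; each row is an opaque tuple.
    predicted = set(map(tuple, predicted_rows))
    ground_truth = set(map(tuple, ground_truth_rows))
    return int(predicted == ground_truth)


gold = [("Alice", 30), ("Bob", 25)]

# Same rows in a different order: still a match.
ex_reordered = execution_accuracy([("Bob", 25), ("Alice", 30)], gold)

# Same data but columns swapped in the SELECT clause: no match.
ex_swapped = execution_accuracy([(30, "Alice"), (25, "Bob")], gold)

# Correct data plus one extra column: no match.
ex_extra = execution_accuracy([("Alice", 30, "NY"), ("Bob", 25, "LA")], gold)

print(ex_reordered, ex_swapped, ex_extra)  # 1 0 0
```

The last two calls show the sensitivity noted above: swapped or extra columns change the tuples, so the sets differ even though the underlying data is right.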
EXACT_COLUMN_AND_EXACT_CELL
Enum value: EvaluationTechnique.EXACT_COLUMN_AND_EXACT_CELL
String value: "exact_column_and_exact_cell"
Embedding required: No

What it measures

Column names are matched by exact string equality. The pipeline finds the intersection of predicted and ground-truth column names, projects both result sets onto those shared columns, and then counts matched rows using exact tuple equality. Precision and recall are computed at the cell level: matched_cells / predicted_cells and matched_cells / ground_truth_cells respectively.

When to use it

Use when the Text-to-SQL system is expected to produce column aliases that exactly match the ground truth, which is common when the ground truth uses bare column names with no renaming. This is the strictest column-aware technique. If the predicted query renames a column (e.g., phone AS contact_phone), that column will not be matched even if the values are identical.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Execution precision over matched cells |
| EXR | float | Execution recall over matched cells |
| F1 | float | Harmonic mean of EXP and EXR |
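
To make the cell-level arithmetic concrete, here is a rough sketch of the steps described above. It rests on two assumptions that the text does not confirm: that predicted_cells and ground_truth_cells count every cell in the full result sets (rows × columns), and that duplicate rows are matched as a multiset. The function name is hypothetical.

```python
from collections import Counter


def exact_column_exact_cell(pred_cols, pred_rows, gold_cols, gold_rows):
    """Illustrative sketch only; not the SQLMorph implementation."""
    # Columns are shared only if their names are string-identical.
    shared = [c for c in gold_cols if c in pred_cols]
    pred_idx = [pred_cols.index(c) for c in shared]
    gold_idx = [gold_cols.index(c) for c in shared]

    # Project both result sets onto the shared columns.
    pred_proj = Counter(tuple(r[i] for i in pred_idx) for r in pred_rows)
    gold_proj = Counter(tuple(r[i] for i in gold_idx) for r in gold_rows)

    # Assumption: rows are matched as a multiset, by exact tuple equality.
    matched_rows = sum((pred_proj & gold_proj).values())
    matched_cells = matched_rows * len(shared)

    # Assumption: denominators count all cells in each full result set.
    predicted_cells = len(pred_rows) * len(pred_cols)
    ground_truth_cells = len(gold_rows) * len(gold_cols)

    exp = matched_cells / predicted_cells if predicted_cells else 0.0
    exr = matched_cells / ground_truth_cells if ground_truth_cells else 0.0
    f1 = 2 * exp * exr / (exp + exr) if exp + exr else 0.0
    return {"EXP": exp, "EXR": exr, "F1": f1}


# The prediction renames phone to contact_phone, so only "name" is shared.
scores = exact_column_exact_cell(
    ["name", "contact_phone"], [("Alice", "555-1"), ("Bob", "555-2")],
    ["name", "phone"], [("Alice", "555-1"), ("Bob", "555-2")],
)
print(scores)  # {'EXP': 0.5, 'EXR': 0.5, 'F1': 0.5}
```

Although the values are identical, the renamed column contributes nothing, so only half of the cells count as matched.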
EXACT_COLUMN_AND_PARTIAL_CELL
Enum value: EvaluationTechnique.EXACT_COLUMN_AND_PARTIAL_CELL
String value: "exact_column_and_partial_cell"
Embedding required: No

What it measures

Columns are matched by exact name, the same as EXACT_COLUMN_AND_EXACT_CELL. The difference is in row matching: after exact row matches are counted, the remaining unmatched rows undergo a greedy partial-row matching phase. Each remaining predicted row is paired with the ground-truth row that shares the most cell values in common (normalised by the number of shared columns). This produces fractional cell-match contributions for partially correct rows.

When to use it

Use when you expect queries that are directionally correct but off by a few cell values, for example when the system predicts slightly wrong aggregation values or has off-by-one filter errors. Partial matching gives credit for rows that are close but not identical.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Execution precision, including fractional partial matches |
| EXR | float | Execution recall, including fractional partial matches |
| F1 | float | Harmonic mean of EXP and EXR |
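
The two-phase matching can be sketched as follows. This is an approximation under stated assumptions: rows are already projected onto the shared columns, each ground-truth row can be paired at most once, and a partial pair contributes its count of equal cells. The function name is hypothetical and the real pipeline may break ties or consume rows differently.

```python
def partial_matched_cells(pred_rows, gold_rows):
    """Count matched cells with exact matching first, then greedy partial matching.

    Illustrative sketch; rows are tuples over the same shared columns.
    """
    remaining_gold = list(gold_rows)
    leftovers = []
    matched_cells = 0.0

    # Phase 1: exact row matches contribute every cell in the row.
    for row in pred_rows:
        if row in remaining_gold:
            remaining_gold.remove(row)
            matched_cells += len(row)
        else:
            leftovers.append(row)

    # Phase 2: pair each leftover predicted row with the remaining
    # ground-truth row that shares the most cell values, position by position.
    for row in leftovers:
        if not remaining_gold:
            break
        overlaps = [sum(p == g for p, g in zip(row, gold)) for gold in remaining_gold]
        best = max(range(len(overlaps)), key=overlaps.__getitem__)
        if overlaps[best] > 0:
            # overlaps[best] / len(row) is the fractional row score; multiplied
            # back by len(row) it contributes overlaps[best] matched cells.
            matched_cells += overlaps[best]
            remaining_gold.pop(best)  # assumption: a gold row is used only once

    return matched_cells


gold = [("Alice", 30, "NY"), ("Bob", 25, "LA")]
pred = [("Alice", 30, "NY"), ("Bob", 26, "LA")]  # one wrong cell in the second row

matched = partial_matched_cells(pred, gold)
print(matched)  # 5.0
```

With exact matching alone, the second row would contribute nothing and only 3 of 6 cells would count; the partial phase recovers the two correct cells in that row.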
SEMANTIC_COLUMN_AND_EXACT_CELL
Enum value: EvaluationTechnique.SEMANTIC_COLUMN_AND_EXACT_CELL
String value: "semantic_column_and_exact_cell"
Embedding required: Yes

What it measures

Columns are matched semantically using OpenAI embeddings and the Hungarian algorithm. For each column, a text representation is built from the column name, its inferred data type, and up to ten of its most frequent values. Embeddings of those texts are compared with cosine similarity, and the Hungarian algorithm finds the optimal one-to-one pairing. Only pairs with similarity > 0.7 are kept. Row matching on the semantically aligned columns is then performed with exact tuple equality (no partial credit).

When to use it

Use when the predicted query uses different column aliases than the ground truth but those columns contain the same data (for instance, phone_number vs Phone vs contact). This technique tolerates naming discrepancies that would cause EXACT_COLUMN_AND_EXACT_CELL to miss valid matches.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Execution precision over semantically matched cells |
| EXR | float | Execution recall over semantically matched cells |
| F1 | float | Harmonic mean of EXP and EXR |
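
The column-alignment step can be illustrated with a small, dependency-free sketch. Several things here are assumptions made for the example: the text format produced by `column_text`, the hand-written toy vectors standing in for real OpenAI embeddings, and the use of a brute-force search over permutations in place of the Hungarian algorithm. The brute-force search finds the same optimal one-to-one pairing but is only practical for a handful of columns.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def column_text(name, values):
    """Build a text description of a column (hypothetical format)."""
    top = [str(v) for v, _ in Counter(values).most_common(10)]  # up to ten values
    dtype = type(values[0]).__name__ if values else "unknown"
    return f"{name} ({dtype}): {', '.join(top)}"


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match_columns(pred_vecs, gold_vecs, threshold=0.7):
    """Optimal one-to-one pairing by total cosine similarity, then thresholding.

    Brute force stands in for the Hungarian algorithm in this sketch.
    """
    pred_names, gold_names = list(pred_vecs), list(gold_vecs)
    sim = {(p, g): cosine(pred_vecs[p], gold_vecs[g])
           for p in pred_names for g in gold_names}

    pred_is_small = len(pred_names) <= len(gold_names)
    small, large = (pred_names, gold_names) if pred_is_small else (gold_names, pred_names)

    best_pairs, best_total = [], float("-inf")
    for perm in permutations(large, len(small)):
        pairs = list(zip(small, perm)) if pred_is_small else list(zip(perm, small))
        total = sum(sim[p, g] for p, g in pairs)
        if total > best_total:
            best_pairs, best_total = pairs, total

    # Keep only pairs whose similarity is strictly above the threshold.
    return {p: g for p, g in best_pairs if sim[p, g] > threshold}


text = column_text("phone", ["555-1", "555-2", "555-1"])

# Toy 3-dimensional "embeddings"; real embeddings would come from a model.
pred_vecs = {"contact": [0.9, 0.1, 0.0], "full_name": [0.1, 0.9, 0.1], "row_id": [0.0, 0.1, 0.9]}
gold_vecs = {"phone_number": [1.0, 0.0, 0.0], "name": [0.0, 1.0, 0.0]}

mapping = match_columns(pred_vecs, gold_vecs)
print(mapping)  # {'contact': 'phone_number', 'full_name': 'name'}
```

In this toy case `row_id` stays unmatched because there are only two ground-truth columns, and both surviving pairs score well above 0.7. For larger schemas, an assignment solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice.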
SEMANTIC_COLUMN_AND_PARTIAL_CELL
Enum value: EvaluationTechnique.SEMANTIC_COLUMN_AND_PARTIAL_CELL
String value: "semantic_column_and_partial_cell"
Embedding required: Yes

What it measures

Combines semantic column matching (identical to SEMANTIC_COLUMN_AND_EXACT_CELL) with partial row matching (identical to EXACT_COLUMN_AND_PARTIAL_CELL). After semantically aligned columns are determined, rows are matched first exactly, then residual unmatched rows are scored greedily by per-cell value overlap. Both exact and fractional contributions accumulate into the final matched_cells count.

When to use it

This is the most commonly recommended technique for evaluating Text-to-SQL systems on BIRD. It handles both renamed columns and partially correct result sets, making it robust to the two most common failure modes: alias mismatch and aggregation error. It is the technique used in the __main__ example in evaluation.py.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Precision with semantic column alignment and partial row credit |
| EXR | float | Recall with semantic column alignment and partial row credit |
| F1 | float | Harmonic mean of EXP and EXR |
NO_COLUMN_AND_PARTIAL_CELL
Enum value: EvaluationTechnique.NO_COLUMN_AND_PARTIAL_CELL
String value: "no_column_and_partial_cell"
Embedding required: No

What it measures

Column names are completely ignored. Each row is converted to a set of cell value strings, discarding column identity entirely. Rows are matched first by exact cell-set equality, then residual rows are paired using Jaccard similarity over their cell-value sets. Precision and recall are computed over individual cell values rather than whole cells in a column-aligned grid.

When to use it

Use when the ground truth and prediction may return equivalent data under entirely different schemas, for example after schema renaming experiments in TQA, or when comparing systems that produce differently structured outputs. Because column identity is ignored, this technique cannot distinguish between a query that returns (name, city) and one that returns (city, name) if the values happen to be the same; it treats them as identical.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Precision based on cell-value set overlap |
| EXR | float | Recall based on cell-value set overlap |
| F1 | float | Harmonic mean of EXP and EXR |
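
A short sketch shows how rows behave once they are reduced to cell-value sets. The helper names `row_to_cell_set` and `jaccard` are hypothetical, and treating two empty sets as fully similar is an assumption of this example.

```python
def row_to_cell_set(row):
    """Convert a row to a set of cell-value strings, dropping column identity."""
    return frozenset(str(value) for value in row)


def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two cell-value sets."""
    union = a | b
    # Assumption for this sketch: two empty sets count as identical.
    return len(a & b) / len(union) if union else 1.0


# (name, city) versus (city, name): identical once column order is discarded.
same_values = jaccard(row_to_cell_set(("Alice", "Paris")),
                      row_to_cell_set(("Paris", "Alice")))

# An extra cell in the prediction lowers the overlap to 2 shared / 3 total.
extra_cell = jaccard(row_to_cell_set(("Alice", "Paris", 30)),
                     row_to_cell_set(("Alice", "Paris")))

print(same_values)           # 1.0
print(round(extra_cell, 3))  # 0.667
```

The first comparison illustrates the limitation described above: swapped columns are indistinguishable from a correct result because only the set of values is compared.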
UNIFIED_COLUMN_AND_SEMANTIC_ROW
Enum value: EvaluationTechnique.UNIFIED_COLUMN_AND_SEMANTIC_ROW
String value: "unified_column_and_semantic_row"
Embedding required: Yes

What it measures

This technique takes a fundamentally different approach from the others. Instead of projecting onto matched columns, it builds a unified column space from the union of all predicted and ground-truth columns (sorted lexicographically). Both result sets are padded with "N/A" for any column they don't contain, and each row is serialised as a pipe-separated string. The Hungarian algorithm then matches predicted rows to ground-truth rows based on cosine similarity between their row-level embeddings. A coverage penalty, the fraction of ground-truth columns that appear in the predicted result, is multiplied into each match score. EXP and EXR are computed as sum(penalised_similarity) / P and sum(penalised_similarity) / G respectively, where P and G are the row counts.

When to use it

Use when you want holistic row-level semantic similarity rather than cell-counting. This technique is most appropriate for exploratory analysis where the predicted query may return a structurally different but semantically related result set. It is the most computationally expensive technique because it embeds every row in both result sets.

Returned metrics

| Key | Type | Description |
|---|---|---|
| EX | int | Binary row-set equality |
| EXP | float | Mean penalised row-similarity across predicted rows |
| EXR | float | Mean penalised row-similarity across ground-truth rows |
| F1 | float | Harmonic mean of EXP and EXR |
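
The serialisation and scoring steps can be sketched without calling an embedding model. In the example below the function names, the exact `|` separator with no surrounding spaces, and the similarity value 0.8 are all assumptions; in the real pipeline the similarities would come from row embeddings after Hungarian matching.

```python
def serialise_rows(cols, rows, unified_cols):
    """Pad each row to the unified column space and join cells with '|'."""
    index = {c: i for i, c in enumerate(cols)}
    return [
        "|".join(str(row[index[c]]) if c in index else "N/A" for c in unified_cols)
        for row in rows
    ]


def coverage_penalty(pred_cols, gold_cols):
    """Fraction of ground-truth columns that appear in the predicted result."""
    if not gold_cols:
        return 0.0
    return sum(c in pred_cols for c in gold_cols) / len(gold_cols)


def unified_scores(match_similarities, coverage, n_pred_rows, n_gold_rows):
    """EXP = sum(penalised) / P and EXR = sum(penalised) / G."""
    total = sum(s * coverage for s in match_similarities)
    exp = total / n_pred_rows if n_pred_rows else 0.0
    exr = total / n_gold_rows if n_gold_rows else 0.0
    return exp, exr


pred_cols, pred_rows = ["name", "city"], [("Alice", "Paris")]
gold_cols, gold_rows = ["name", "age"], [("Alice", 30), ("Bob", 25)]

# Union of all column names, sorted lexicographically.
unified = sorted(set(pred_cols) | set(gold_cols))
pred_text = serialise_rows(pred_cols, pred_rows, unified)
gold_text = serialise_rows(gold_cols, gold_rows, unified)
print(unified)    # ['age', 'city', 'name']
print(pred_text)  # ['N/A|Paris|Alice']
print(gold_text)  # ['30|N/A|Alice', '25|N/A|Bob']

coverage = coverage_penalty(pred_cols, gold_cols)  # only "name" is covered: 0.5

# Hypothetical cosine similarity for the single matched row pair.
exp, exr = unified_scores([0.8], coverage, len(pred_rows), len(gold_rows))
```

Here one matched pair with similarity 0.8 is halved by the coverage penalty, then divided by P = 1 for EXP and by G = 2 for EXR, so the missing ground-truth row lowers recall further.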
Technique comparison at a glance
| Technique | Column matching | Row matching | Embedding | Metrics |
|---|---|---|---|---|
| EXECUTION_ACCURACY | None | Exact set equality | No | EX |
| EXACT_COLUMN_AND_EXACT_CELL | Exact name | Exact tuple | No | EX, EXP, EXR, F1 |
| EXACT_COLUMN_AND_PARTIAL_CELL | Exact name | Exact + partial greedy | No | EX, EXP, EXR, F1 |
| SEMANTIC_COLUMN_AND_EXACT_CELL | Semantic (cosine > 0.7) | Exact tuple | Yes | EX, EXP, EXR, F1 |
| SEMANTIC_COLUMN_AND_PARTIAL_CELL | Semantic (cosine > 0.7) | Exact + partial greedy | Yes | EX, EXP, EXR, F1 |
| NO_COLUMN_AND_PARTIAL_CELL | None (ignored) | Jaccard cell-set | No | EX, EXP, EXR, F1 |
| UNIFIED_COLUMN_AND_SEMANTIC_ROW | Union + coverage penalty | Row embedding cosine | Yes | EX, EXP, EXR, F1 |