This guide walks you through setting up SQLMorph from scratch and running your first evaluation. You will need Python 3.12 or later, Git, and an OpenAI API key for semantic metrics. The entire setup takes under five minutes once you have those prerequisites in place.
Install uv
SQLMorph uses uv to manage dependencies and Python versions. Follow the official installation instructions for your operating system, then verify it is available.
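One way to confirm that uv is reachable is to look it up on your PATH. The sketch below uses only the Python standard library; the helper name `find_tool` is hypothetical and not part of SQLMorph.

```python
import shutil
from typing import Optional


def find_tool(name: str = "uv") -> Optional[str]:
    """Return the absolute path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool("uv")
    if path is None:
        print("uv was not found on PATH; revisit the installation step.")
    else:
        print(f"uv found at {path}")
```

If the lookup returns nothing right after installing, opening a new terminal session is usually enough for the updated PATH to take effect.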
Install dependencies
Install all project dependencies, including optional extras for notebooks and model backends.
The install command creates a virtual environment at ./venv and installs everything declared in pyproject.toml, including duckdb, openai, sqlglot, transformers, and the full suite of model provider clients.
Install pre-commit hooks (optional)
If you plan to contribute or modify source code, install the pre-commit hooks. These run ruff for linting and black for formatting on every commit.
Configure your environment
Copy the example environment file and add your OpenAI API key. The key is required for any evaluation technique that uses semantic column matching or embedding-based row comparison.
Open .env and replace the placeholder value with your key.
Only the semantic evaluation techniques (SEMANTIC_COLUMN_AND_EXACT_CELL, SEMANTIC_COLUMN_AND_PARTIAL_CELL, UNIFIED_COLUMN_AND_SEMANTIC_ROW) require an OpenAI API key. You can run EXECUTION_ACCURACY and EXACT_COLUMN_AND_PARTIAL_CELL without one.
Source the metrics configuration
Before running evaluations from the CLI, source the metrics configuration script. This sets the environment variables that evaluation.py reads at runtime.
The script sets EVAL_TECHNIQUE, DBMS, DB_PATH, EMBEDDING_MODEL, LOGS_DIR_PATH, and PENALIZE_EXTRA_PRED_COLS. Edit the script directly to change these defaults.
Run your first evaluation
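Before the first run, it can be worth checking that the variables set by the metrics configuration script are actually visible to Python, and whether the selected technique needs an OpenAI key. The following is a minimal pre-flight sketch, not part of SQLMorph: the function names are hypothetical. It assumes that EVAL_TECHNIQUE holds the technique name in the upper-case form used in this guide, and that the key is exposed as OPENAI_API_KEY, the variable the openai Python client reads by default.

```python
import os
from typing import List, Mapping

# Variable names as listed for the metrics configuration script.
CONFIG_VARS = [
    "EVAL_TECHNIQUE",
    "DBMS",
    "DB_PATH",
    "EMBEDDING_MODEL",
    "LOGS_DIR_PATH",
    "PENALIZE_EXTRA_PRED_COLS",
]

# Techniques this guide says require an OpenAI API key.
SEMANTIC_TECHNIQUES = {
    "SEMANTIC_COLUMN_AND_EXACT_CELL",
    "SEMANTIC_COLUMN_AND_PARTIAL_CELL",
    "UNIFIED_COLUMN_AND_SEMANTIC_ROW",
}


def missing_config(env: Mapping[str, str]) -> List[str]:
    """Return the configuration variables that are unset or empty."""
    return [name for name in CONFIG_VARS if not env.get(name)]


def needs_openai_key(technique: str) -> bool:
    """True if the named technique is one of the semantic ones."""
    return technique in SEMANTIC_TECHNIQUES


def preflight(env: Mapping[str, str] = os.environ) -> List[str]:
    """Collect human-readable problems; an empty list means nothing was found."""
    problems = [f"{name} is not set" for name in missing_config(env)]
    technique = env.get("EVAL_TECHNIQUE", "")
    # Assumption: the key lives in OPENAI_API_KEY.
    if needs_openai_key(technique) and not env.get("OPENAI_API_KEY"):
        problems.append(f"{technique} needs an OpenAI API key")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

Run it from the same shell in which you sourced the script. Variables set by a sourced script only reach child processes if the script exports them.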
You can evaluate a predicted SQL query against a ground-truth query using either the Python API or the CLI.
Python API
The Python example reproduces the california_schools evaluation included at the bottom of src/metrics/evaluation.py. It uses SEMANTIC_COLUMN_AND_PARTIAL_CELL, which matches columns by embedding similarity and compares result cells with partial matching. This is the most informative technique for production use.
In that example, predicted_sql returns four columns while ground_truth_sql returns only Phone. With penalize_extra_pred_cols=True, this will score low on Execution Precision: exactly the kind of over-prediction that binary EX would pass as correct.
CLI
After sourcing scripts/metrics_config.sh, pass queries directly on the command line.
Results are printed to stdout and, when ENABLE_LOG=true, written as a timestamped JSON file under LOGS_DIR_PATH.
Next steps
Core concepts overview
Understand how JQE, TQA, and the fine-grained metrics fit together.
JQE usage guide
Generate structurally diverse evaluation sets by expanding queries with valid joins.
TQA usage guide
Create linguistically varied question perturbations to test robustness.
Metrics usage guide
Run all evaluation techniques and interpret EXP, EXR, and F1 results.
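To make the EXP, EXR, and F1 terminology concrete, here is a deliberately simplified, column-level illustration of the over-prediction case from the Python API section, where the prediction returns four columns and the ground truth returns only Phone. This is a toy sketch, not SQLMorph's implementation: the real techniques also compare cell values and may match columns semantically. The function name and every column name other than Phone are hypothetical.

```python
from typing import Iterable, Tuple


def column_scores(predicted: Iterable[str], truth: Iterable[str]) -> Tuple[float, float, float]:
    """Exact-name precision, recall, and F1 over two sets of column names."""
    pred, gold = set(predicted), set(truth)
    matched = len(pred & gold)
    precision = matched / len(pred) if pred else 0.0
    recall = matched / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical column names; only "Phone" comes from the guide.
predicted_cols = ["Phone", "School", "City", "Zip"]
truth_cols = ["Phone"]

precision, recall, f1 = column_scores(predicted_cols, truth_cols)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Under these assumptions the prediction recovers everything the ground truth asks for, so recall is 1.0, but only one of its four columns is wanted, so precision drops to 0.25 and F1 to 0.4. A binary pass/fail check that only looks for the expected values would hide that gap.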