Textual Query Augmentation (TQA) concepts

Textual Query Augmentation addresses a different weakness than structural complexity: linguistic brittleness. Real-world database schemas often use terse, abbreviated, or domain-specific names that don’t match how users phrase their questions. TQA simulates this mismatch by renaming schema elements — tables and columns — using a controlled obfuscation mapping, then asking GPT-4o to rewrite the original natural language question so it reflects the new names. The ground-truth SQL stays the same; only the NL side changes. When applied to CHESS, DIN-SQL, and MAC-SQL, forcing heavy abbreviations caused up to a 17% drop in execution accuracy — a fragility invisible to standard benchmark evaluation.

The decrease-naturalness technique

TQA’s primary technique is decrease naturalness: it systematically degrades the readability of schema identifiers and propagates those changes into the NL queries that reference them. The process has two phases: schema renaming and NL query alteration.

Phase 1: schema renaming

A schema mappings CSV file defines how each table and column in a BIRD database should be renamed. The mapping records the original name and its obfuscated replacement — for example, replacing a readable column name like school_name with a terse abbreviation like schl_nm. The read_schema_mapping function loads this CSV and organises it into a nested dictionary keyed by db_id, then table name, then column name:

# src/textual_query_augmentation/decrease_naturalness/alter_nl_queries.py

def read_schema_mapping(mapping_csv_path):
    df = pd.read_csv(mapping_csv_path)
    db_mappings = defaultdict(lambda: defaultdict(dict))

    for _, row in df.iterrows():
        db_id = row["db_id"]
        table_name = row["table_name"]
        new_table_name = row["new_table_name"]

        if "columns" not in db_mappings[db_id][table_name]:
            db_mappings[db_id][table_name]["new_name"] = new_table_name
            db_mappings[db_id][table_name]["columns"] = {}

    for _, row in df.iterrows():
        db_id = row["db_id"]
        table_name = row["table_name"]
        column_name = row["column_name"]
        new_column_name = row["new_column_name"]
        db_mappings[db_id][table_name]["columns"][column_name] = new_column_name

    return db_mappings

Phase 2: NL query alteration via LLM

Once the schema mapping is loaded, generate_less_natural_nl constructs a prompt that includes the formatted mapping and the original NL question, then sends it to GPT-4o. The prompt template is a few-shot template stored at src/textual_query_augmentation/decrease_naturalness/prompts_templates/fewshot-decrease-nl-naturalness. The model returns a JSON object with an updated_question and updated_evidence field:

def generate_less_natural_nl(
    question,
    evidence,
    schema_mapping,
    model_name=OpenAIModel.GPT_4O,
    seed=42,
    temperature=0,
):
    prompt_renderer = PromptRenderer(
        templates_dir_path="src/textual_query_augmentation/decrease_naturalness/prompts_templates/"
    )
    llm = ModelManager.create_model(
        model_provider=ModelProvider.OPENAI,
        model_type=ModelType.COMPLETION,
        model_name=model_name,
        openai_api_key=os.getenv("OPENAI_API_KEY", None),
    )

    formatted_schema_mapping = format_schema_mapping(schema_mapping)
    prompt_context = {
        "_SCHEMA_MAPPING_": formatted_schema_mapping,
        "_QUESTION_": question,
        "_EVIDENCE_": evidence,
    }
    prompt = prompt_renderer.render("fewshot-decrease-nl-naturalness", prompt_context)
    messages = compose_chat_messages(user_messages=[prompt])
    llm_response = llm.get_chat_completion(messages=messages, seed=seed, temperature=temperature)

    response_json = json.loads(llm_response["completion_content"][0].strip())
    new_question = response_json.get("updated_question", question)
    new_evidence = response_json.get("updated_evidence", evidence)
    return new_question, new_evidence

TQA uses temperature=0 and a fixed seed=42 by default to ensure reproducible outputs. The same NL question will always produce the same altered version when using the same schema mapping and model.

The full pipeline: `process_nl_queries`

The process_nl_queries function orchestrates the end-to-end TQA pipeline for a batch of queries. It reads a CSV of original queries, loads the schema mappings, checks for already-processed entries (so it can resume interrupted runs), and writes results incrementally to an output CSV:

def process_nl_queries(
    queries_csv_path,
    mapping_csv_path,
    output_csv_path,
    model_name=OpenAIModel.GPT_4O,
    seed=42,
    temperature=0,
):
    db_mappings = read_schema_mapping(mapping_csv_path)
    df = pd.read_csv(queries_csv_path)
    existing_results = load_existing_results(output_csv_path)

    for idx, row in tqdm(df.iterrows(), total=len(df)):
        db_id = row["db_id"]
        schema_mapping = db_mappings.get(db_id, {})

        new_question, new_evidence = generate_less_natural_nl(
            row["original_question"],
            row["original_evidence"],
            schema_mapping,
            model_name,
            seed,
            temperature,
        )

        write_result_to_csv(output_csv_path, row_data, is_first_write)

The output CSV preserves both the original_question and new_question columns, so you can compare the two versions directly and trace how the LLM rewrote each query.

How the pipeline fits together

Prepare schema mappings

Create or obtain the schema mappings CSV (databases_naturalness_decreased_fixed.csv). Each row specifies a db_id, table_name, new_table_name, column_name, and new_column_name. The rename strategy determines the degree of naturalness decrease — partial abbreviations produce moderate difficulty; full obfuscation produces maximum difficulty.

Load the source queries

Provide a CSV of queries to augment (new_sql_queries.csv). Each row must have question_id, db_id, original_question, and original_evidence columns. These are typically a subset of the BIRD development set.

Run the augmentation

Call process_nl_queries with the paths to the query CSV, the mappings CSV, and the desired output path. The function processes queries one at a time, writing results to disk immediately so progress is not lost if the run is interrupted.

from src.textual_query_augmentation.decrease_naturalness.alter_nl_queries import process_nl_queries
from src.core.model_manager import OpenAIModel

stats = process_nl_queries(
    queries_csv_path="data/augmentation/decrease_naturalness/new_sql_queries.csv",
    mapping_csv_path="data/augmentation/decrease_naturalness/databases_naturalness_decreased_fixed.csv",
    output_csv_path="data/augmentation/decrease_naturalness/new_sql_nl_queries.csv",
    model_name=OpenAIModel.GPT_4O,
    seed=42,
    temperature=0,
)

Evaluate a Text-to-SQL system

Feed the augmented NL queries (with the unchanged ground-truth SQL) to a Text-to-SQL system, collect its predicted SQL, and evaluate using the SQLMorph metrics module. Compare EX scores on the original vs augmented queries to measure linguistic sensitivity.

Key finding: abbreviations cause large accuracy drops

Experiments with CHESS, DIN-SQL, and MAC-SQL showed that forcing heavy abbreviations in table and column names — propagated into the NL questions via TQA — caused up to a 17% drop in execution accuracy. This finding indicates that SOTA systems have learned to leverage readable schema names as implicit signals during SQL generation. When those signals are removed or distorted, generation quality degrades significantly.

You can control the degree of linguistic degradation by adjusting the schema mapping CSV. A more aggressive renaming strategy (e.g., replacing full words with single-letter codes) will produce harder perturbations and likely larger accuracy drops.

What happens when there is no schema mapping for a database?

If process_nl_queries encounters a db_id with no entry in the schema mappings CSV, it keeps the original NL question and evidence unchanged and logs a warning. The row is still written to the output CSV with the original values, so the output is always complete.

Can I resume an interrupted TQA run?

Yes. process_nl_queries calls load_existing_results at startup to read any already-processed entries from the output CSV. Rows whose (db_id, question_id) pair is already present are skipped automatically. This makes long batch runs safe to interrupt and restart.

Which OpenAI model does TQA use?

TQA defaults to OpenAIModel.GPT_4O. You can pass a different model_name to process_nl_queries or generate_less_natural_nl to use another supported model. Ensure your OPENAI_API_KEY is set in the environment before running.

Get Started

Core Concepts

Guides

Configuration

Textual Query Augmentation (TQA) concepts

The decrease-naturalness technique

Phase 1: schema renaming

Phase 2: NL query alteration via LLM

The full pipeline: `process_nl_queries`

How the pipeline fits together

Key finding: abbreviations cause large accuracy drops

Build docs developers (and LLMs) love

Get Started

Core Concepts

Guides

Configuration

Documentation Index

​The decrease-naturalness technique

​Phase 1: schema renaming

​Phase 2: NL query alteration via LLM

​The full pipeline: process_nl_queries

​How the pipeline fits together

​Key finding: abbreviations cause large accuracy drops

Build docs developers (and LLMs) love

The decrease-naturalness technique

Phase 1: schema renaming

Phase 2: NL query alteration via LLM

The full pipeline: `process_nl_queries`

How the pipeline fits together

Key finding: abbreviations cause large accuracy drops