Data training is a library of example pairs: a natural language question matched with the correct SQL query that answers it. When you submit a question in chat, SQLBot searches this library using semantic similarity and retrieves the most relevant examples. Those examples are included in the LLM prompt as few-shot demonstrations: concrete evidence of the correct SQL patterns, table names, and join logic specific to your database. In general, the more relevant examples SQLBot can retrieve for a question, the more accurate its generated queries tend to be.
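The idea of few-shot demonstrations can be illustrated with a minimal sketch. This is not SQLBot's actual prompt template: the `TrainingExample` class, the `build_few_shot_prompt` function, and the prompt wording are all hypothetical and only show how retrieved question/SQL pairs might be placed ahead of a new question.

```python
from dataclasses import dataclass


@dataclass
class TrainingExample:
    """One stored pair: a natural language question and its SQL (hypothetical type)."""
    question: str
    sql: str


def build_few_shot_prompt(examples: list[TrainingExample], user_question: str) -> str:
    """Format retrieved examples as demonstrations, then append the new question."""
    parts = ["Reference examples for this database:"]
    for i, ex in enumerate(examples, start=1):
        parts.append(f"Example {i}\nQuestion: {ex.question}\nSQL: {ex.sql}")
    # The new question comes last so the model completes the final "SQL:" line.
    parts.append(
        "Now answer this question with a single SQL query.\n"
        f"Question: {user_question}\nSQL:"
    )
    return "\n\n".join(parts)


if __name__ == "__main__":
    retrieved = [
        TrainingExample(
            "How many orders did we receive today?",
            "SELECT COUNT(*) FROM orders WHERE created_at >= CURRENT_DATE",
        )
    ]
    print(build_few_shot_prompt(retrieved, "How many orders did we receive this week?"))
```

The prompt SQLBot really sends also carries schema and terminology context, which this sketch leaves out.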

Why data training matters

Out of the box, SQLBot relies on table structure, column descriptions, and terminology entries to understand your schema. For simple queries (“How many orders did we receive today?”), this is often sufficient. But complex queries involving multi-table joins, subqueries, window functions, or company-specific conventions are harder to get right from schema context alone. Data training examples give SQLBot a direct reference for patterns it has seen before.

Complex JOINs

If your schema requires a specific sequence of joins across four tables to answer revenue questions, an example showing that join pattern is far more reliable than leaving the LLM to infer it.

Unusual conventions

Tables with non-standard naming, status columns stored as integers, or composite primary keys are difficult to use correctly without examples showing the right approach.

Aggregate patterns

Examples for common aggregation questions (daily active users, 7-day rolling averages, cohort retention) save the LLM from having to reconstruct these patterns from scratch every time.

Filtered subsets

Questions that always need a specific WHERE clause (for example, excluding test accounts or internal users) are much more reliably handled with an example that demonstrates the required filter.

How examples are retrieved

When you ask a question, SQLBot embeds your question as a vector and searches the data training library for the most semantically similar stored questions. The top matching examples are injected into the prompt as context. Examples are stored in the data_training table with a pgvector embedding and matched via similarity search at query time.
Examples are workspace-scoped. Training examples created in one workspace do not affect query generation in other workspaces.
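The retrieval step can be sketched in plain Python. This is an illustration under stated assumptions, not SQLBot's implementation: the vectors are toy values rather than real model embeddings, cosine similarity is assumed as the similarity measure, and `StoredExample` and `top_k_examples` are hypothetical names. In SQLBot itself the search runs inside the database through pgvector.

```python
import math
from dataclasses import dataclass


@dataclass
class StoredExample:
    """A stored training example with a toy embedding (hypothetical type)."""
    workspace_id: int
    question: str
    sql: str
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_examples(
    query_embedding: list[float],
    library: list[StoredExample],
    workspace_id: int,
    k: int = 3,
) -> list[StoredExample]:
    """Return the k most similar examples, restricted to one workspace."""
    candidates = [ex for ex in library if ex.workspace_id == workspace_id]
    ranked = sorted(
        candidates,
        key=lambda ex: cosine_similarity(query_embedding, ex.embedding),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    library = [
        StoredExample(1, "Total revenue by product category", "SELECT 1", [0.9, 0.1, 0.0]),
        StoredExample(1, "Count of active users", "SELECT 2", [0.0, 0.2, 0.9]),
        StoredExample(2, "Revenue by category", "SELECT 3", [0.9, 0.1, 0.0]),
    ]
    # Toy embedding standing in for the user's question.
    for ex in top_k_examples([0.8, 0.2, 0.1], library, workspace_id=1, k=1):
        print(ex.question)
```

The workspace filter runs before ranking, which mirrors the scoping rule: an example from another workspace is never a candidate, however similar its question is.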

Adding an example

1. Open the Data Training section

In the left sidebar, navigate to Data Training. The list shows all examples already saved in your workspace.

2. Click Add

Click Add to open the example editor.

3. Write the question

In the Question field, write the natural language question exactly as a user is likely to ask it. Use realistic phrasing, because the similarity search compares your users’ actual questions against these stored questions.
What was the total revenue by product category for Q1 2025?

4. Write the correct SQL

In the SQL field, write the query that correctly answers the question using your actual table and column names. The SQL should be complete and executable.
SELECT
    p.category,
    SUM(o.amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2025-01-01'
  AND o.created_at < '2025-04-01'
  AND o.status = 'completed'
GROUP BY p.category
ORDER BY total_revenue DESC
Always test the SQL against your actual database before saving it as a training example. An incorrect example teaches SQLBot the wrong pattern.

5. Select the datasource

Choose which datasource this example applies to. Examples are matched to the active datasource in the chat session: an example linked to your production database will not be retrieved during queries against your analytics warehouse.

6. Save

Click Save. SQLBot generates an embedding for the question and the example is immediately available for retrieval.
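The advice to test the SQL before saving can be turned into a small scripted check. The sketch below is one possible approach, not a SQLBot feature: it uses Python's built-in `sqlite3` module with an in-memory stand-in schema, and the `check_example_sql` helper and the sample rows are hypothetical. Against a real datasource you would connect with the driver for your own engine and run the query there.

```python
import sqlite3


def check_example_sql(conn: sqlite3.Connection, sql: str) -> tuple[bool, str]:
    """Execute the candidate SQL and report whether it ran; returns (ok, message)."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error as exc:
        return False, f"SQL failed: {exc}"
    return True, f"SQL ran and returned {len(rows)} row(s)"


# Stand-in schema and data (assumed for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        product_id INTEGER,
        amount REAL,
        status TEXT,
        created_at TEXT
    );
    INSERT INTO products VALUES (1, 'books'), (2, 'games');
    INSERT INTO orders VALUES
        (1, 1, 20.0, 'completed', '2025-01-15'),
        (2, 2, 50.0, 'completed', '2025-02-10'),
        (3, 2, 99.0, 'cancelled', '2025-03-01');
    """
)

# The example query from step 4, unchanged.
candidate_sql = """
SELECT
    p.category,
    SUM(o.amount) AS total_revenue
FROM orders o
JOIN products p ON o.product_id = p.id
WHERE o.created_at >= '2025-01-01'
  AND o.created_at < '2025-04-01'
  AND o.status = 'completed'
GROUP BY p.category
ORDER BY total_revenue DESC
"""

if __name__ == "__main__":
    ok, message = check_example_sql(conn, candidate_sql)
    print(ok, message)
```

A check like this only proves that the query executes. Whether the result actually answers the question still needs a manual look at the returned rows.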

Best practices for training examples

Start with the questions your team asks most frequently. Ten well-chosen examples for your top query types will deliver more value than a hundred examples covering rare edge cases.

Each table-to-table relationship that users might query across is worth at least one example. SQLBot learns from the join condition, the alias conventions, and the column references in each example.
-- Example: joining users to their workspace assignments
SELECT u.name, w.name AS workspace
FROM users u
JOIN user_ws uw ON u.id = uw.uid
JOIN workspaces w ON uw.oid = w.id
WHERE u.status = 'active'

If certain queries should always exclude test records, internal users, or deleted rows, add an example that demonstrates the required filter:
-- Always exclude test accounts when counting active users
SELECT COUNT(DISTINCT id)
FROM users
WHERE status = 'active'
  AND is_test_account = false
  AND deleted_at IS NULL

Date handling varies between database engines. Examples that show the correct syntax for your specific database prevent the LLM from using functions that work in MySQL but not in your PostgreSQL instance:
-- PostgreSQL: last 30 days
SELECT COUNT(*) FROM orders
WHERE created_at >= NOW() - INTERVAL '30 days'

-- MySQL equivalent
SELECT COUNT(*) FROM orders
WHERE created_at >= DATE_SUB(NOW(), INTERVAL 30 DAY)

The question text in a training example is what gets compared against incoming user questions. Write questions the way your actual users talk, not the way a database engineer would describe a query. Multiple variations covering different phrasings of the same question help retrieval accuracy.

Bulk importing examples

If you have many examples to add, import them from a spreadsheet:
1. Download the template

Click Download template to get an .xlsx file. The template includes sample data showing the correct structure:
Column 1: Question (natural language)
Column 2: SQL (the correct query)
Column 3: Datasource name

2. Fill in the spreadsheet

Each row is one question-SQL pair. The datasource name must match the exact name of a datasource in your workspace.

3. Upload

Click Import and select your filled spreadsheet. SQLBot processes each row and reports the number of examples successfully created, duplicated, and failed.
If any rows fail — for example because the datasource name does not match — SQLBot produces a downloadable error report showing which rows failed and why.
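Some failures can be caught before uploading. The sketch below is a hypothetical pre-check, not part of SQLBot: it assumes the rows have already been read from the spreadsheet as (question, SQL, datasource name) tuples, that row 1 of the sheet is a header, and that a duplicate means the same question text for the same datasource. SQLBot's own duplicate rule is not described here and may differ.

```python
def precheck_rows(
    rows: list[tuple[str, str, str]],
    datasource_names: set[str],
) -> list[tuple[int, str]]:
    """Return (spreadsheet_row_number, reason) for rows likely to fail import."""
    problems: list[tuple[int, str]] = []
    seen: set[tuple[str, str]] = set()
    # Start at 2 on the assumption that row 1 of the sheet holds column headers.
    for row_number, (question, sql, datasource) in enumerate(rows, start=2):
        question, sql, datasource = question.strip(), sql.strip(), datasource.strip()
        if not question:
            problems.append((row_number, "question is empty"))
        if not sql:
            problems.append((row_number, "SQL is empty"))
        # The datasource name must match exactly, so no case folding here.
        if datasource not in datasource_names:
            problems.append((row_number, f"unknown datasource: {datasource!r}"))
        key = (question.lower(), datasource)
        if question and key in seen:
            problems.append((row_number, "duplicate question for this datasource"))
        seen.add(key)
    return problems


if __name__ == "__main__":
    sample_rows = [
        ("How many orders today?", "SELECT COUNT(*) FROM orders", "sales_db"),
        ("Active users?", "", "Sales DB"),
    ]
    for row_number, reason in precheck_rows(sample_rows, {"sales_db"}):
        print(f"row {row_number}: {reason}")
```

The set of valid datasource names has to come from your own workspace. The error report SQLBot produces after import remains the authoritative record of what failed.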

Enabling and disabling examples

Each example has an enabled/disabled toggle. Disabling an example removes it from retrieval without deleting it. This is useful when a schema change makes an example temporarily incorrect — disable it during the migration, update the SQL, then re-enable it.

Exporting examples

Click Export to download all examples in your workspace as an .xlsx file. Use this for backup, sharing example sets with a new workspace, or reviewing your training coverage offline.
