This guide walks through building a Text2SQL agent that converts natural language questions into BigQuery SQL. The agent uses two MCP servers in tandem: the Neocarta MCP server for semantic schema discovery (finding the right tables, understanding foreign keys, and resolving business terms) and the BigQuery MCP server for executing the generated queries against live data. The full runnable source is inDocumentation Index
Fetch the complete documentation index at: https://mintlify.com/neo4j-labs/neocarta/llms.txt
Use this file to discover all available pages before exploring further.
run_agent.py and agent/agent.py in the Neocarta repository.
Architecture Overview
The agent runs locally. It connects to a Neo4j graph containing your BigQuery schema metadata (ingested by Neocarta) and to the Google-hosted BigQuery MCP endpoint for query execution. An LLM (via LiteLLM) handles the reasoning and SQL generation step. How it works, step by step:User asks a question
The user types a natural language question in the terminal chat loop — for example, “Which customers placed the largest orders last quarter?”
Agent retrieves schema context
The agent calls the Neocarta MCP server with the question text. The server performs a hybrid vector + full-text search over the semantic graph and returns the most relevant tables with their columns, data types, sample values, and foreign key references.
LLM generates SQL
The agent passes the retrieved schema context and the original question to the LLM. The LLM writes a BigQuery SQL query grounded in the actual table and column names from the graph.
Agent executes the query
The agent calls
execute_sql on the BigQuery MCP server with the generated SQL and the project ID. BigQuery runs the query and returns the results.Prerequisites
Before running the agent, ensure the following are in place:Neocarta installed
Install with the
cli and mcp extras:BigQuery schema ingested
Run the schema connector to load your BigQuery dataset into Neo4j:
LangGraph dependencies
Install agent dependencies from the repository:
Google Cloud credentials
Authenticate with Application Default Credentials:
Environment Variables
Add the following to a.env file in the project root:
AGENT_MODEL accepts any LiteLLM model ID, including gemini/gemini-1.5-pro, anthropic/claude-3-5-sonnet-20241022, and Azure OpenAI deployment names.
BigQuery MCP Setup
Enable the BigQuery MCP server on your GCP project:GoogleAuth class in run_agent.py handles token refresh automatically:
Agent Factory
The agent is created withcreate_text2sql_agent in agent/agent.py. It uses LangGraph’s create_agent with a LiteLLM-backed chat model and in-memory conversation history:
MCP Server Configuration
run_agent.py configures both MCP servers and passes their tools to the agent. Neocarta tools are accepted in full; only the execute_sql tool is allowlisted from BigQuery to prevent the agent from calling administrative operations.
Running the Agent
From the repository root, start the interactive chat session:> prompt. Type a question and press Enter. Type exit, quit, or q to stop.
Example Interaction
get_context_by_table_hybrid_search and retrieved both orders and customers tables. The graph showed that orders.customer_id carries a REFERENCES edge to customers.id, so the agent wrote the correct JOIN and applied the quarterly date filter using the column flagged as the time dimension.
The specific MCP tool the agent chooses depends on which indexes are present in your Neo4j graph. If embeddings were generated with
--embeddings, the server registers hybrid search tools. Without embeddings, full-text search tools are registered instead. Both produce valid schema context.Available Neocarta MCP Tools
The MCP server self-configures based on your graph’s indexes. Depending on what’s present, the agent may have access to:| Tool | Requires |
|---|---|
list_schemas | Any ingested schema |
list_tables_by_schema | Any ingested schema |
get_full_metadata_schema | Any ingested schema |
get_context_by_table_full_text_search | Full-text index |
get_context_by_column_full_text_search | Full-text index |
get_context_by_table_vector_search | Vector index (embeddings) |
get_context_by_column_vector_search | Vector index (embeddings) |
get_context_by_table_hybrid_search | Both full-text and vector indexes |
get_context_by_column_hybrid_search | Both full-text and vector indexes |
get_context_by_table_business_term_hybrid_search | Both indexes + glossary |
get_context_by_column_business_term_hybrid_search | Both indexes + glossary |