A semantic layer is a structured representation of your data landscape that sits between raw databases and the consumers that query them. For AI agents, it answers three questions that a bare database schema cannot: what data exists, what it means in business terms, and how it connects to other data. Neocarta builds this layer as a graph in Neo4j and exposes it to agents through an MCP server — giving agents the contextual grounding they need to produce correct, trustworthy queries.

The Problem: Agents Querying Without Context

When an AI agent talks directly to a database, it operates without the context that a human analyst carries. The result is a predictable set of failure modes:

Wrong Assumptions

Columns named cust_id, customer_key, and client_no may all be foreign keys to a customers table. Without metadata, an agent cannot distinguish them from ordinary integer columns.

Missing Joins

Foreign key relationships are often not enforced at the database level, especially in analytical warehouses like BigQuery. An agent that can’t see these relationships will write queries that miss critical joins.

Business Terminology Gap

Business users ask about “net revenue,” “active subscribers,” or “fulfilled orders” — terms that don’t map directly to column names. Agents without a business glossary either fail or hallucinate.

Routing Errors

Modern data platforms spread data across multiple schemas, datasets, and even services. An agent without a catalog doesn’t know which database holds which tables.

How Neocarta Solves This

Neocarta reads metadata from your data sources — schema structure, business glossaries, query history, governance tags, and semantic model definitions — and loads it into a unified graph in Neo4j. Only metadata crosses into Neo4j; your source data always stays in the source. The graph unifies four layers of context:

1. Schema Metadata

Tables, columns, data types, nullability, primary keys, foreign keys, and sample values. This is the structural skeleton — the information needed to write syntactically correct SQL.

2. Business Glossary

Human-readable terms and categories linked to the tables and columns they describe. When a user asks about “net revenue,” the agent can look up the business term, find the columns tagged with it, and build the query around real column names.

3. Query History

Real SQL queries that have been run against the database, with parsed records of which tables and columns they touched. Query history reveals usage patterns that schema alone cannot: which joins are actually used in practice, which columns appear together, and which CTEs are commonly defined.

4. Metrics and Governance

Governed metric definitions (from OSI semantic models), governance tag assignments (sensitivity, cost center, data classification), and structured semantic models that define how data assets relate across a domain.
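
To make the business glossary layer above more concrete, here is a minimal in-memory sketch of how a business term might resolve to real column names. It is not Neocarta's data model: the `Column` and `BusinessTerm` classes, the `GLOSSARY` contents, and the `orders.net_amount` column are all hypothetical, chosen only to illustrate the lookup.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Column:
    table: str
    name: str
    data_type: str


@dataclass
class BusinessTerm:
    name: str
    definition: str
    # Columns tagged with this term (hypothetical links for illustration).
    columns: list[Column] = field(default_factory=list)


# Hypothetical glossary; a real deployment would hold this in the graph.
GLOSSARY = {
    "net revenue": BusinessTerm(
        name="net revenue",
        definition="Gross revenue minus refunds and discounts.",
        columns=[Column("orders", "net_amount", "NUMERIC")],
    ),
}


def columns_for_term(term: str) -> list[str]:
    """Return qualified column names tagged with a business term."""
    entry = GLOSSARY.get(term.strip().lower())
    if entry is None:
        return []  # Unknown term: report nothing rather than guess.
    return [f"{c.table}.{c.name}" for c in entry.columns]


print(columns_for_term("Net Revenue"))
```

Returning an empty list for an unknown term is a deliberate choice in this sketch: the caller can then fall back to search instead of inventing a column name.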

What the Graph Enables

Query Routing

The graph records which database platform (e.g., BigQuery) and service holds each schema. When a user asks a question, the agent can discover which database contains the relevant tables before writing a single line of SQL — avoiding queries sent to the wrong system.
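
The routing idea can be pictured with a small lookup. This is a sketch under assumptions, not the way Neocarta stores or queries its graph: the `CATALOG` dictionary, the schema names, and the service names are hypothetical stand-ins for the platform and service records described above.

```python
# Hypothetical catalog: schema name -> where it lives and which tables it holds.
CATALOG = {
    "sales": {
        "platform": "BigQuery",
        "service": "example-analytics",
        "tables": {"orders", "customers"},
    },
    "support": {
        "platform": "BigQuery",
        "service": "example-support",
        "tables": {"tickets"},
    },
}


def locate(table: str) -> list[tuple[str, str, str]]:
    """Return (schema, platform, service) for every schema holding the table."""
    return sorted(
        (schema, info["platform"], info["service"])
        for schema, info in CATALOG.items()
        if table in info["tables"]
    )


print(locate("orders"))
```

An agent would run a lookup of this kind before generating SQL, so that the query is sent to the system that actually holds the table.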

Text2SQL with Correct Joins

Foreign key relationships are stored as (:Column)-[:REFERENCES]->(:Column) edges. An agent searching for tables related to “orders” will not just find the orders table — it will also retrieve the columns that link it to customers, products, and line_items, with the join conditions attached. The resulting SQL is built from graph traversal, not from guessing.
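
As a rough illustration of building joins from traversal rather than guessing, the sketch below models REFERENCES edges as plain tuples and derives join conditions for a table. The edge list and the `join_conditions` helper are hypothetical; only the (:Column)-[:REFERENCES]->(:Column) shape comes from the description above.

```python
# Each edge is ((source_table, source_column), (target_table, target_column)),
# mirroring a (:Column)-[:REFERENCES]->(:Column) relationship.
# The specific columns are assumed for illustration.
REFERENCES = [
    (("orders", "customer_id"), ("customers", "id")),
    (("line_items", "order_id"), ("orders", "id")),
    (("line_items", "product_id"), ("products", "id")),
]


def join_conditions(table: str) -> list[str]:
    """Return a join condition for every REFERENCES edge touching the table."""
    conditions = []
    for (src_table, src_col), (dst_table, dst_col) in REFERENCES:
        if table in (src_table, dst_table):
            conditions.append(f"{src_table}.{src_col} = {dst_table}.{dst_col}")
    return conditions


for condition in join_conditions("orders"):
    print(condition)
```

The lookup follows edges in both directions, so a search for orders surfaces both the table it points to (customers) and the table that points to it (line_items).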

Data Discovery

When a user asks “what data do we have about customer churn?” the agent performs a semantic search over the graph, finds tables and columns whose descriptions match the query, and returns a structured summary — including business terms that describe the concept, columns that relate to it, and the schemas they live in. The graph becomes a discoverable data catalog, not a static schema dump.

The MCP Connection

Neocarta exposes the graph to agents as retrieval tools via an MCP server, not as raw Cypher queries. Tools like get_context_by_table_hybrid_search and list_tables_by_schema return structured table-and-column context blocks that any agent can consume without knowing anything about Neo4j or graph traversal. The MCP server probes the target graph at startup and registers only the tools whose backing indexes are present — so an agent never gets a tool it can’t use. Full-text search tools work from schema metadata alone; vector and hybrid search tools activate once embeddings are generated.
The neocarta-mcp server speaks stdio MCP and works with any MCP-compatible agent framework. The same tools are also available as CLI commands under neocarta tool <tool> for shell use or non-MCP agents.
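
The startup probing described above amounts to filtering tools by the indexes they depend on. The sketch below shows that filtering step only. The two tool names come from the text, but the index names and the requirement mapping are assumptions made for illustration and do not reflect the real server's internals.

```python
# Assumed mapping of tool name -> indexes the tool needs.
# Index names ("table_fulltext", "table_vector") are hypothetical.
TOOL_REQUIREMENTS = {
    "list_tables_by_schema": set(),
    "get_context_by_table_hybrid_search": {"table_fulltext", "table_vector"},
}


def registered_tools(present_indexes: set[str]) -> list[str]:
    """Return the tools whose required indexes all exist in the graph."""
    return sorted(
        name
        for name, required in TOOL_REQUIREMENTS.items()
        if required <= present_indexes  # subset check
    )


# Before embeddings exist, only the index-free tool is registered.
print(registered_tools({"table_fulltext"}))
# Once the vector index is present, the hybrid tool is registered as well.
print(registered_tools({"table_fulltext", "table_vector"}))
```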

Example: A Question the Agent Gets Right

“Which customers placed the largest orders last quarter?”
Without a semantic layer, an agent might query only the orders table, fail to join to customers, or use the wrong date column. With Neocarta:
  1. The agent calls get_context_by_table_hybrid_search with the text “customers orders.”
  2. The graph returns both the orders table and the customers table, along with the column orders.customer_id, which carries a REFERENCES edge to customers.id.
  3. The agent sees orders.customer_id → customers.id, writes the correct JOIN, applies the date filter using the column marked as the time dimension, and submits the query.
The agent didn’t guess — it followed the graph.
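
The final step of the walkthrough can be sketched as follows. This assumes the agent has already received a context block. The `CONTEXT` dictionary, the `orders.order_date`, `orders.total_amount`, and `customers.name` columns, and the `DATE '...'` literal syntax are hypothetical choices for illustration; only the orders.customer_id to customers.id reference comes from the example above.

```python
from datetime import date


def previous_quarter(today: date) -> tuple[date, date]:
    """Return the [start, end) bounds of the quarter before today's quarter."""
    current_start_month = 3 * ((today.month - 1) // 3) + 1
    end = date(today.year, current_start_month, 1)  # exclusive upper bound
    if current_start_month == 1:
        start = date(today.year - 1, 10, 1)
    else:
        start = date(today.year, current_start_month - 3, 1)
    return start, end


# Hypothetical context block, shaped only for this sketch.
CONTEXT = {
    "reference": ("orders.customer_id", "customers.id"),
    "time_dimension": "orders.order_date",
    "measure": "orders.total_amount",
    "label": "customers.name",
}


def build_query(ctx: dict, today: date) -> str:
    """Assemble SQL from the retrieved context instead of guessed names."""
    start, end = previous_quarter(today)
    src, dst = ctx["reference"]
    time_col = ctx["time_dimension"]
    return (
        f"SELECT {ctx['label']}, MAX({ctx['measure']}) AS largest_order\n"
        f"FROM orders\n"
        f"JOIN customers ON {src} = {dst}\n"
        f"WHERE {time_col} >= DATE '{start}' AND {time_col} < DATE '{end}'\n"
        f"GROUP BY {ctx['label']}\n"
        f"ORDER BY largest_order DESC\n"
        f"LIMIT 10"
    )


print(build_query(CONTEXT, date.today()))
```

Every identifier in the generated SQL is read from the context block, so a wrong join or date column would trace back to the metadata rather than to the agent's guess.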

What Stays in the Source

Neocarta is metadata-only. It reads schema information, descriptions, query logs, and semantic model definitions. It does not copy table data, row values (beyond optional sample values for column context), or any personally identifiable or sensitive records. The semantic layer is a map of your data landscape — not a copy of it.
Sample column values (stored as Value nodes) are limited to representative distinct values used for column-context disambiguation. They are ingested explicitly and only when the connector is configured to include them.
