Neocarta: Semantic Layer Graphs for AI Agents

Neocarta is a Python library that builds a semantic layer in Neo4j from your data sources and serves it to AI agents through a Model Context Protocol (MCP) server. Instead of pointing an agent directly at a raw database and hoping it figures out the schema, Neocarta extracts only the metadata — table definitions, column types, foreign-key relationships, business glossary terms, governance tags, and real query history — loads it into a richly connected graph, and gives the agent structured tools to search and traverse that graph. Your data never leaves its source.

Neocarta is a Neo4j Labs project supported by the Neo4j field team. It is experimental and actively developed. The API and graph schema may change between minor versions.

The three-phase workflow

Neocarta follows a simple three-phase pipeline from raw source to agent-ready context.

Ingest

A connector reads schema metadata from a supported source (BigQuery, Dataplex, CSV, JDBC, Unity Catalog, Databricks, OSI, or exported query logs) and loads it into the Neo4j semantic graph through a shared ETL pipeline of extractors, transformers, and loaders. Only metadata crosses into Neo4j; your data stays in the source.
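
The extract, transform, load split can be illustrated in plain Python. This is a minimal sketch, not Neocarta's implementation: the `SourceColumn` record, the `extract`/`transform`/`load` functions and the in-memory `GraphSink` are hypothetical stand-ins, and a real loader would write to Neo4j rather than to Python sets.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceColumn:
    """Hypothetical raw metadata row as a source might report it."""
    database: str
    schema: str
    table: str
    column: str


@dataclass
class GraphSink:
    """Hypothetical in-memory stand-in for a Neo4j loader target."""
    nodes: set = field(default_factory=set)
    rels: set = field(default_factory=set)


def extract() -> list[SourceColumn]:
    # Extractor: read metadata only (hard-coded rows here, no data values).
    return [
        SourceColumn("shop", "sales", "orders", "order_id"),
        SourceColumn("shop", "sales", "orders", "customer_id"),
    ]


def transform(rows):
    # Transformer: map raw rows onto canonical (label, key) nodes and relationships.
    nodes, rels = set(), set()
    for r in rows:
        db = ("Database", r.database)
        sc = ("Schema", f"{r.database}.{r.schema}")
        tb = ("Table", f"{r.database}.{r.schema}.{r.table}")
        col = ("Column", f"{r.database}.{r.schema}.{r.table}.{r.column}")
        nodes.update({db, sc, tb, col})
        rels.update({(db, "HAS_SCHEMA", sc), (sc, "HAS_TABLE", tb), (tb, "HAS_COLUMN", col)})
    return nodes, rels


def load(nodes, rels, sink: GraphSink) -> None:
    # Loader: idempotent upsert; set union mimics Cypher MERGE semantics.
    sink.nodes |= nodes
    sink.rels |= rels


sink = GraphSink()
load(*transform(extract()), sink)
print(len(sink.nodes), len(sink.rels))
```

Keeping the load step idempotent means re-running ingestion refreshes the graph instead of duplicating nodes, which is why the sketch uses sets as a stand-in for `MERGE`.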

Enrich

An optional embeddings connector generates vector embeddings for the description fields on Database, Schema, Table, Column, and BusinessTerm nodes, stores them on those nodes, and indexes them with Neo4j vector indexes. Enrichment unlocks semantic similarity search alongside the standard full-text and catalog retrieval tools.
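
Semantic similarity search boils down to ranking nodes by how close their description embedding is to an embedded question. The sketch below uses cosine similarity over tiny hand-written vectors; the descriptions and vectors are invented, real embeddings from a provider such as OpenAI have hundreds or thousands of dimensions, and in Neo4j the ranking is performed by the vector index rather than in Python.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); 0.0 for zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical 3-dimensional "embeddings" of node description fields.
description_vectors = {
    "Table orders: one row per customer purchase": [0.9, 0.1, 0.0],
    "Table customers: registered shoppers": [0.2, 0.9, 0.1],
    "BusinessTerm Revenue: total order value": [0.8, 0.0, 0.5],
}


def top_k(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the k descriptions most similar to the query vector."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in description_vectors.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]


for name, score in top_k([1.0, 0.0, 0.2]):
    print(f"{score:.3f}  {name}")
```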

Serve

The MCP server (neocarta-mcp) exposes the semantic graph as a set of retrieval tools. An agent connects to the server and calls tools like list_schemas, get_context_by_table_hybrid_search, or list_tables_by_schema to discover the right tables, follow foreign keys, and build correct queries — without guessing at the schema.
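
Under MCP, an agent invokes a server tool by sending a JSON-RPC 2.0 `tools/call` request that names the tool and passes its arguments. The sketch below only builds such a message; the `schema_name` argument for `list_tables_by_schema` is a hypothetical parameter name, so check the MCP Server reference for each tool's actual input schema. In practice an MCP client library constructs and sends this for you.

```python
import json


def tools_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": request_id,
            "method": "tools/call",
            "params": {"name": tool, "arguments": arguments},
        }
    )


# Hypothetical argument name; the real tool's input schema may differ.
msg = tools_call_request(1, "list_tables_by_schema", {"schema_name": "sales"})
print(msg)
```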

What Neocarta builds

The semantic graph is richer than a plain schema dump. Depending on the connectors you run, it can contain:

Schema metadata

Tables, columns, data types, nullability, primary keys, foreign-key references, and sample column values — the raw structural layer that enables join inference.
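
Foreign-key references are what make join inference possible: given two tables, an agent looks for a REFERENCES edge between their columns and turns it into a join condition. A minimal in-memory sketch, assuming a simplified representation where each REFERENCES edge is a pair of `table.column` strings; the e-commerce tables are hypothetical.

```python
# Hypothetical REFERENCES edges: (referencing column, referenced column).
references = [
    ("orders.customer_id", "customers.customer_id"),
    ("order_items.order_id", "orders.order_id"),
    ("order_items.product_id", "products.product_id"),
]


def join_conditions(left: str, right: str) -> list[str]:
    """Return SQL ON-clauses for every FK edge linking two tables, in either direction."""
    conditions = []
    for src, dst in references:
        src_table, dst_table = src.split(".")[0], dst.split(".")[0]
        if {src_table, dst_table} == {left, right}:
            conditions.append(f"{src} = {dst}")
    return conditions


print(join_conditions("customers", "orders"))
```

Tables with no direct edge (for example `customers` and `products`) return an empty list; in the graph, an agent would instead traverse several REFERENCES hops through intermediate tables.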

Business glossary

Glossaries, categories, and BusinessTerm nodes linked to the tables and columns they describe via TAGGED_WITH relationships — grounding agent answers in authoritative definitions.
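
Resolving a business term to the assets it governs is a one-hop lookup over TAGGED_WITH. The sketch below performs that lookup on hypothetical in-memory edges and records an equivalent Cypher query for reference; the query assumes a `name` property on Table, Column, and BusinessTerm nodes, which is an assumption rather than documented Neocarta schema.

```python
# Hypothetical TAGGED_WITH edges: (label of tagged node, node name, BusinessTerm name).
TAGGED_WITH = [
    ("Table", "orders", "Order"),
    ("Column", "orders.total_amount", "Revenue"),
    ("Column", "order_items.line_total", "Revenue"),
    ("Column", "customers.customer_id", "Customer"),
]

# Equivalent Cypher (Neo4j 5 label expression), assuming a `name` property on nodes.
CYPHER = """
MATCH (n:Table|Column)-[:TAGGED_WITH]->(:BusinessTerm {name: $term})
RETURN labels(n) AS labels, n.name AS name
"""


def assets_for_term(term: str) -> list[tuple[str, str]]:
    """Return (label, name) for every Table or Column tagged with a business term."""
    wanted = term.casefold()
    return [(label, name) for label, name, t in TAGGED_WITH if t.casefold() == wanted]


print(assets_for_term("revenue"))
```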

Governance tags

GovernanceTag, GovernanceTagKey, and GovernanceTagValue nodes from sources such as Databricks Unity Catalog governed tags and Dataplex metadata types.

Query history

Query nodes parsed from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT or local log files, linked to the tables and columns they touch via USES_TABLE and USES_COLUMN relationships — revealing which parts of the schema actually matter in practice.
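
Query history becomes useful once repeated queries are deduplicated and table usage is counted. The sketch below hashes a whitespace-normalized SQL string with SHA-256 as a stand-in for a Query node's content hash, then counts USES_TABLE edges per table. The normalization rule and the choice of SHA-256 are assumptions, not Neocarta's documented scheme, and the log entries are invented.

```python
import hashlib
from collections import Counter

# Hypothetical parsed log entries: SQL text plus the tables it touches.
log = [
    ("SELECT * FROM orders WHERE status = 'open'", ["orders"]),
    ("SELECT *  FROM orders\nWHERE status = 'open';", ["orders"]),
    ("SELECT c.email FROM customers c JOIN orders o ON o.customer_id = c.customer_id",
     ["customers", "orders"]),
]


def content_hash(sql: str) -> str:
    # Assumed normalization: collapse whitespace and drop a trailing semicolon.
    normalized = " ".join(sql.split()).rstrip(";")
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


queries: dict[str, list[str]] = {}  # one Query node per distinct hash
for sql, tables in log:
    queries.setdefault(content_hash(sql), tables)

# Count USES_TABLE edges per table across distinct queries.
usage = Counter(t for tables in queries.values() for t in tables)
print(len(queries), usage.most_common())
```

The first two log entries differ only in formatting, so they collapse into a single Query, and `orders` surfaces as the most-used table.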

Key concepts

NodeLabel and RelationshipType enums

The NodeLabel and RelationshipType enums (exported directly from the neocarta package) define the canonical graph schema shared by every connector and the MCP server. Using these enums in code, rather than raw strings, is strongly recommended, though their .value strings are also accepted. Core node labels used in the structural schema:

`NodeLabel` member	Neo4j label	Description
`NodeLabel.DATABASE`	`Database`	Top-level source database or GCP project
`NodeLabel.SCHEMA`	`Schema`	Dataset or schema within a database
`NodeLabel.TABLE`	`Table`	A table or view within a schema
`NodeLabel.COLUMN`	`Column`	A column within a table, with type and constraints
`NodeLabel.VALUE`	`Value`	A sample value observed in a column

Glossary and governance node labels:

`NodeLabel` member	Neo4j label	Description
`NodeLabel.GLOSSARY`	`Glossary`	A named business glossary
`NodeLabel.CATEGORY`	`Category`	A category within a glossary
`NodeLabel.BUSINESS_TERM`	`BusinessTerm`	A governed business term, embeddable
`NodeLabel.GOVERNANCE_TAG_KEY`	`GovernanceTagKey`	A governance tag key
`NodeLabel.GOVERNANCE_TAG_VALUE`	`GovernanceTagValue`	A governance tag value
`NodeLabel.GOVERNANCE_TAG`	`GovernanceTag`	A concrete tag instance

Query and OSI node labels:

`NodeLabel` member	Neo4j label	Description
`NodeLabel.QUERY`	`Query`	A parsed SQL query with a content hash
`NodeLabel.CTE`	`CTE`	A common table expression within a query
`NodeLabel.DOMAIN`	`Domain`	An OSI domain container
`NodeLabel.METRIC`	`Metric`	A governed metric definition

Core relationship types:

`RelationshipType` member	Cypher pattern	Description
`RelationshipType.HAS_SCHEMA`	`(:Database)-[:HAS_SCHEMA]->(:Schema)`	Database owns a schema
`RelationshipType.HAS_TABLE`	`(:Schema)-[:HAS_TABLE]->(:Table)`	Schema contains a table
`RelationshipType.HAS_COLUMN`	`(:Table)-[:HAS_COLUMN]->(:Column)`	Table contains a column
`RelationshipType.HAS_VALUE`	`(:Column)-[:HAS_VALUE]->(:Value)`	Column has a sample value
`RelationshipType.REFERENCES`	`(:Column)-[:REFERENCES]->(:Column)`	Foreign-key reference
`RelationshipType.TAGGED_WITH`	`(:Table|Column)-[:TAGGED_WITH]->(:BusinessTerm)`	Governance annotation
`RelationshipType.USES_TABLE`	`(:Query)-[:USES_TABLE]->(:Table)`	Query references a table
`RelationshipType.USES_COLUMN`	`(:Query)-[:USES_COLUMN]->(:Column)`	Query references a column
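
The member values are the label and type strings listed above, which is why either a member or its .value string works. The sketch below mirrors a few members with `str`-based Python enums to show both forms and to render a Cypher pattern from them. It is a local stand-in: the real enums come from `from neocarta import NodeLabel, RelationshipType`, and whether they subclass `str` is an assumption.

```python
from enum import Enum


# Hypothetical local mirror of a few members; use the real enums via
# `from neocarta import NodeLabel, RelationshipType`.
class NodeLabel(str, Enum):
    DATABASE = "Database"
    SCHEMA = "Schema"
    TABLE = "Table"
    COLUMN = "Column"


class RelationshipType(str, Enum):
    HAS_SCHEMA = "HAS_SCHEMA"
    HAS_TABLE = "HAS_TABLE"
    HAS_COLUMN = "HAS_COLUMN"
    REFERENCES = "REFERENCES"


def as_label(label: "NodeLabel | str") -> str:
    """Accept an enum member or its .value string; unknown strings raise ValueError."""
    return label.value if isinstance(label, Enum) else NodeLabel(label).value


def hop(start: NodeLabel, rel: RelationshipType, end: NodeLabel) -> str:
    """Render a one-hop Cypher pattern from enum members."""
    return f"(:{as_label(start)})-[:{rel.value}]->(:{as_label(end)})"


print(hop(NodeLabel.TABLE, RelationshipType.HAS_COLUMN, NodeLabel.COLUMN))
print(as_label("Schema"))
```

Validating strings through the enum, as `as_label` does, catches a typo such as `"Tabel"` immediately instead of silently creating a new label in the graph.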

How all connectors share the schema

Every connector, regardless of source, transforms its native metadata into this canonical schema before loading. This means the MCP server and its retrieval tools work identically whether the underlying data came from BigQuery, a CSV file, JDBC, or an OSI YAML spec.
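
The idea of a shared canonical schema can be sketched as two source-specific transformers that emit the same record type. `CanonicalColumn` and both functions are hypothetical; the BigQuery keys follow the real `INFORMATION_SCHEMA.COLUMNS` field names, while the CSV headers are invented and may not match Neocarta's actual CSV layout.

```python
from typing import NamedTuple


class CanonicalColumn(NamedTuple):
    """Hypothetical canonical record that every transformer emits."""
    database: str
    schema: str
    table: str
    column: str
    data_type: str
    nullable: bool


def from_bigquery(row: dict) -> CanonicalColumn:
    # Keys follow BigQuery INFORMATION_SCHEMA.COLUMNS field names.
    return CanonicalColumn(
        row["table_catalog"], row["table_schema"], row["table_name"],
        row["column_name"], row["data_type"], row["is_nullable"] == "YES",
    )


def from_csv(row: dict) -> CanonicalColumn:
    # Hypothetical CSV headers; Neocarta's actual CSV layout may differ.
    return CanonicalColumn(
        row["database_name"], row["schema_name"], row["table_name"],
        row["column_name"], row["type"], row["nullable"].lower() == "true",
    )


bq = {"table_catalog": "shop", "table_schema": "sales", "table_name": "orders",
      "column_name": "order_id", "data_type": "INT64", "is_nullable": "NO"}
csv_row = {"database_name": "shop", "schema_name": "sales", "table_name": "orders",
           "column_name": "order_id", "type": "INT64", "nullable": "False"}
print(from_bigquery(bq) == from_csv(csv_row))
```

Because both paths converge on one record type before loading, everything downstream (the loader, the graph, the MCP tools) needs no source-specific logic.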

Supported sources

BigQuery

Two connectors: BigQuerySchemaConnector reads the INFORMATION_SCHEMA views for database, schema, table, column, and foreign-key metadata; BigQueryLogsConnector parses INFORMATION_SCHEMA.JOBS_BY_PROJECT for real query history.

GCP Dataplex

DataplexSchemaConnector reads BigQuery metadata surfaced through Dataplex Universal Catalog; DataplexGlossaryConnector ingests the full Dataplex business glossary including categories, terms, and column-level TAGGED_WITH links.

CSV files

CSVConnector loads metadata from a directory of structured CSV files following a standard naming convention. The bundled sample e-commerce dataset (datasets/csv/) is the fastest way to get started — no cloud account needed.

JDBC

JDBCConnector uses SchemaCrawler under the hood to extract schema metadata from any JDBC-compatible database (PostgreSQL, MySQL, Oracle, SQL Server, and others). Requires Java 11+ and the JDBC driver JAR for your target database.

Unity Catalog

UnityCatalogConnector reads catalog, schema, table, and column metadata from any Unity Catalog-conformant server via the open UC REST API — works with both open-source and managed Unity Catalog.
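
The kind of request involved in reading metadata over the open REST API can be sketched without any Neocarta code. The helper below only builds the URL for the Unity Catalog `GET /tables` endpoint, which lists the tables in one catalog and schema. The `/api/2.1/unity-catalog` base path follows the open-source Unity Catalog server, `localhost:8080` is its usual local address, and the function name is hypothetical; verify the path against your server before relying on it.

```python
from urllib.parse import urlencode


def uc_list_tables_url(base_url: str, catalog: str, schema: str) -> str:
    """Build a Unity Catalog REST `GET .../tables` URL for one catalog.schema (hypothetical helper)."""
    query = urlencode({"catalog_name": catalog, "schema_name": schema})
    return f"{base_url.rstrip('/')}/api/2.1/unity-catalog/tables?{query}"


# Assumed local address of an open-source Unity Catalog server.
print(uc_list_tables_url("http://localhost:8080", "unity", "default"))
```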

Databricks

DatabricksConnector reads governed-tag definitions from managed Databricks Unity Catalog via the Databricks SDK. Requires the neocarta[databricks] extra and a Databricks personal access token.

OSI (Open Semantic Interchange)

OsiConnector is a bidirectional connector for the OSI YAML spec. It ingests semantic models (tables, columns, metrics, joins, AI context, business terms) from a local path or HTTPS URL, and can export a subgraph back to a spec-compliant OSI YAML file.

Query Log files

QueryLogConnector parses local query-log JSON files (distinct from the live BigQuery Logs connector) and loads Query and CTE nodes, together with their table and column references, from the exported logs.
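
Separating CTEs from real tables is the key step when turning a logged query into Query, CTE, and USES_TABLE data: a name defined in a `WITH` clause looks like a table in `FROM`, but it is not one. The sketch below shows that step on a hypothetical JSON log format (an array of objects with a `query` field) using deliberately naive regular expressions; production SQL needs a real parser, and this is not how Neocarta necessarily parses logs.

```python
import json
import re

# Hypothetical exported log: a JSON array of objects with a "query" field.
raw = json.dumps([
    {"query": "WITH recent AS (SELECT * FROM orders WHERE ts > '2024-01-01') "
              "SELECT r.order_id, c.email FROM recent r JOIN customers c "
              "ON r.customer_id = c.customer_id"},
])

CTE_RE = re.compile(r"(?:\bWITH|,)\s+(\w+)\s+AS\s*\(", re.IGNORECASE)
TABLE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def parse_entry(sql: str) -> dict:
    """Naively split a query's FROM/JOIN targets into CTE names and real table references."""
    ctes = set(CTE_RE.findall(sql))
    refs = set(TABLE_RE.findall(sql))
    return {"ctes": sorted(ctes), "tables": sorted(refs - ctes)}


for entry in json.loads(raw):
    print(parse_entry(entry["query"]))
```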

Prerequisites

Before using Neocarta, you will need the following:

- Python 3.10 or higher. Python 3.11+ is required if you use the neocarta[performance] extra.
- A running Neo4j instance. Any of the following works:
  - Neo4j AuraDB: managed cloud service with a free tier.
  - Neo4j Desktop: local GUI-based instance for development.
  - Docker: lightweight local instance, no installer needed.
- Source credentials: the API keys or service account credentials for the data source you intend to ingest (e.g. a GCP service account for BigQuery, a Databricks PAT for the Databricks connector).
- An embedding provider key (optional): required only if you want to generate embeddings. OPENAI_API_KEY is the most common; LiteLLM supports Gemini, Cohere, Bedrock, Azure OpenAI, and others.
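
A quick preflight check for the local prerequisites can be scripted with the standard library alone. `preflight` is a hypothetical helper, not part of Neocarta; it checks only the Python version and, when embeddings are wanted, the `OPENAI_API_KEY` variable (other LiteLLM providers use their own variables), and it does not test the Neo4j connection.

```python
import os
import sys


def preflight(want_embeddings: bool = False, want_performance: bool = False) -> list[str]:
    """Return a list of unmet prerequisites; an empty list means ready (hypothetical helper)."""
    problems = []
    minimum = (3, 11) if want_performance else (3, 10)
    if sys.version_info < minimum:
        problems.append(f"Python {minimum[0]}.{minimum[1]}+ required")
    # Only the OpenAI variable is checked; other providers use different names.
    if want_embeddings and not os.environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY not set (or configure another LiteLLM provider)")
    return problems


print(preflight(want_embeddings=True) or "ready")
```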
