Neocarta is a Python library that builds a semantic layer in Neo4j from your data sources and serves it to AI agents through a Model Context Protocol (MCP) server. Instead of pointing an agent directly at a raw database and hoping it figures out the schema, Neocarta extracts only the metadata (table definitions, column types, foreign-key relationships, business glossary terms, governance tags, and real query history), loads it into a richly connected graph, and gives the agent structured tools to search and traverse that graph. Your data never leaves its source.
Neocarta is a Neo4j Labs project supported by the Neo4j field team. It is experimental and actively developed. The API and graph schema may change between minor versions.
The three-phase workflow
Neocarta follows a simple three-phase pipeline from raw source to agent-ready context.
Ingest
A connector reads schema metadata from a supported source (BigQuery, Dataplex, CSV, JDBC, Unity Catalog, Databricks, OSI) and loads it into the Neo4j semantic graph using a shared ETL pipeline of extractors, transformers, and loaders. Only metadata crosses into Neo4j — your data stays in the source.
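To make the extractor, transformer, and loader roles concrete, here is a minimal in-memory sketch. Every name in it (extract, transform, load, the row fields, and the dict-based graph) is hypothetical and is not Neocarta's API; the real pipeline writes to Neo4j rather than to a dict.

```python
def extract(catalog_rows):
    """Extractor: keep only the metadata fields the graph needs."""
    wanted = ("schema", "table", "column", "data_type")
    return [{key: row[key] for key in wanted} for row in catalog_rows]


def transform(records):
    """Transformer: turn flat records into nodes and relationships."""
    nodes, relationships = {}, []
    for rec in records:
        table_id = f"{rec['schema']}.{rec['table']}"
        column_id = f"{table_id}.{rec['column']}"
        nodes[table_id] = {"label": "Table", "name": rec["table"]}
        nodes[column_id] = {
            "label": "Column",
            "name": rec["column"],
            "type": rec["data_type"],
        }
        relationships.append((table_id, "HAS_COLUMN", column_id))
    return nodes, relationships


def load(nodes, relationships, graph):
    """Loader: merge into the target store (a dict here, Neo4j in practice)."""
    graph.setdefault("nodes", {}).update(nodes)
    graph.setdefault("relationships", []).extend(relationships)
    return graph


# Hypothetical catalog rows; storage_bytes is dropped by the extractor.
catalog_rows = [
    {"schema": "sales", "table": "orders", "column": "order_id",
     "data_type": "INT64", "storage_bytes": 4096},
    {"schema": "sales", "table": "orders", "column": "customer_id",
     "data_type": "INT64", "storage_bytes": 4096},
]

graph = load(*transform(extract(catalog_rows)), {})
print(len(graph["nodes"]), "nodes,", len(graph["relationships"]), "relationships")
```

Keeping the three stages as separate functions mirrors the idea of a shared pipeline: only the extractor would need to change per source, while the transformer and loader stay the same.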
Enrich
An optional embeddings connector generates vector embeddings for the description fields on Database, Schema, Table, Column, and BusinessTerm nodes, writing them into vector indexes. Enrichment unlocks semantic similarity search alongside the standard full-text and catalog retrieval tools.
Serve
The MCP server (neocarta-mcp) exposes the semantic graph as a set of retrieval tools. An agent connects to the server and calls tools like list_schemas, get_context_by_table_hybrid_search, or list_tables_by_schema to discover the right tables, follow foreign keys, and build correct queries without guessing at the schema.
What Neocarta builds
The semantic graph is richer than a plain schema dump. Depending on the connectors you run, it can contain:
Schema metadata
Tables, columns, data types, nullability, primary keys, foreign-key references, and sample column values — the raw structural layer that enables join inference.
Business glossary
Glossaries, categories, and BusinessTerm nodes linked to the tables and columns they describe via TAGGED_WITH relationships, grounding agent answers in authoritative definitions.
Governance tags
GovernanceTag, GovernanceTagKey, and GovernanceTagValue nodes from sources such as Databricks Unity Catalog governed tags and Dataplex metadata types.
Query history
Query nodes parsed from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT or local log files, linked to the tables and columns they touch via USES_TABLE and USES_COLUMN relationships, revealing which parts of the schema actually matter in practice.
Key concepts
NodeLabel and RelationshipType enums
The NodeLabel and RelationshipType enums (exported from neocarta directly) define the canonical graph schema shared by every connector and the MCP server. Using these enums in code rather than raw strings is strongly recommended, though their .value strings are also accepted.
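As a rough illustration of why enum members are safer than raw strings, the sketch below defines local stand-ins for a few of the members documented in this section and uses them to assemble a Cypher pattern. The stand-in classes and the name_of and pattern helpers are assumptions for illustration only. In real code the enums would be imported from neocarta, and the assumption that each RelationshipType value equals its member name is inferred from the Cypher patterns in this section rather than confirmed.

```python
from enum import Enum


# Local stand-ins that mirror a few documented members.
# Real code would import NodeLabel and RelationshipType from neocarta.
class NodeLabel(Enum):
    DATABASE = "Database"
    SCHEMA = "Schema"
    TABLE = "Table"
    COLUMN = "Column"


class RelationshipType(Enum):
    # Assumption: each value matches its member name.
    HAS_SCHEMA = "HAS_SCHEMA"
    HAS_TABLE = "HAS_TABLE"
    HAS_COLUMN = "HAS_COLUMN"
    REFERENCES = "REFERENCES"


def name_of(member_or_string):
    """Accept an enum member or its raw .value string (hypothetical helper)."""
    if isinstance(member_or_string, Enum):
        return member_or_string.value
    return member_or_string


def pattern(start, relationship, end):
    """Build a Cypher pattern such as (:Table)-[:HAS_COLUMN]->(:Column)."""
    return f"(:{name_of(start)})-[:{name_of(relationship)}]->(:{name_of(end)})"


print(pattern(NodeLabel.TABLE, RelationshipType.HAS_COLUMN, NodeLabel.COLUMN))
print(pattern("Database", "HAS_SCHEMA", "Schema"))
```

A misspelled member such as NodeLabel.TABEL raises AttributeError as soon as the line runs, whereas a misspelled raw string would produce a pattern that quietly matches nothing.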
Core node labels used in the structural schema:

| NodeLabel member | Neo4j label | Description |
|---|---|---|
| NodeLabel.DATABASE | Database | Top-level source database or GCP project |
| NodeLabel.SCHEMA | Schema | Dataset or schema within a database |
| NodeLabel.TABLE | Table | A table or view within a schema |
| NodeLabel.COLUMN | Column | A column within a table, with type and constraints |
| NodeLabel.VALUE | Value | A sample value observed in a column |

Node labels for the business glossary and governance tags:

| NodeLabel member | Neo4j label | Description |
|---|---|---|
| NodeLabel.GLOSSARY | Glossary | A named business glossary |
| NodeLabel.CATEGORY | Category | A category within a glossary |
| NodeLabel.BUSINESS_TERM | BusinessTerm | A governed business term, embeddable |
| NodeLabel.GOVERNANCE_TAG_KEY | GovernanceTagKey | A governance tag key |
| NodeLabel.GOVERNANCE_TAG_VALUE | GovernanceTagValue | A governance tag value |
| NodeLabel.GOVERNANCE_TAG | GovernanceTag | A concrete tag instance |

Node labels for query history and semantic models:

| NodeLabel member | Neo4j label | Description |
|---|---|---|
| NodeLabel.QUERY | Query | A parsed SQL query with a content hash |
| NodeLabel.CTE | CTE | A common table expression within a query |
| NodeLabel.DOMAIN | Domain | An OSI domain container |
| NodeLabel.METRIC | Metric | A governed metric definition |

Core relationship types:

| RelationshipType member | Cypher pattern | Description |
|---|---|---|
| HAS_SCHEMA | (:Database)-[:HAS_SCHEMA]->(:Schema) | Database owns a schema |
| HAS_TABLE | (:Schema)-[:HAS_TABLE]->(:Table) | Schema contains a table |
| HAS_COLUMN | (:Table)-[:HAS_COLUMN]->(:Column) | Table contains a column |
| HAS_VALUE | (:Column)-[:HAS_VALUE]->(:Value) | Column has a sample value |
| REFERENCES | (:Column)-[:REFERENCES]->(:Column) | Foreign-key reference |
| TAGGED_WITH | (:Table\|:Column)-[:TAGGED_WITH]->(:BusinessTerm) | Governance annotation |
| USES_TABLE | (:Query)-[:USES_TABLE]->(:Table) | Query references a table |
| USES_COLUMN | (:Query)-[:USES_COLUMN]->(:Column) | Query references a column |
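
The HAS_COLUMN and REFERENCES patterns above are what make join inference possible. The sketch below models a tiny graph in plain Python, with no Neo4j involved, and follows those two edge types to derive a join condition between two tables. The table and column names and the find_join helper are hypothetical, and the MCP server's own retrieval logic may work differently.

```python
# (:Table)-[:HAS_COLUMN]->(:Column), keyed by table name (hypothetical sample data).
has_column = {
    "orders": ["orders.order_id", "orders.customer_id"],
    "customers": ["customers.customer_id", "customers.email"],
    "products": ["products.product_id"],
}

# (:Column)-[:REFERENCES]->(:Column): foreign-key column -> referenced column.
references = {
    "orders.customer_id": "customers.customer_id",
}


def find_join(left, right):
    """Return an ON clause linking two tables via a REFERENCES edge, or None."""
    # Try both directions, since the foreign key may live on either table.
    for source, target in ((left, right), (right, left)):
        target_columns = set(has_column.get(target, []))
        for column in has_column.get(source, []):
            referenced = references.get(column)
            if referenced in target_columns:
                return f"{column} = {referenced}"
    return None


print(find_join("orders", "customers"))
print(find_join("customers", "orders"))
print(find_join("orders", "products"))
```

Because the lookup checks both directions, the caller does not need to know which table holds the foreign key; tables with no REFERENCES edge between them yield None instead of a guessed join.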
How all connectors share the schema
Every connector, regardless of source, transforms its native metadata into this canonical schema before loading. This means the MCP server and its retrieval tools work identically whether the underlying data came from BigQuery, a CSV file, JDBC, or an OSI YAML spec.
Supported sources
BigQuery
Two connectors: BigQuerySchemaConnector reads INFORMATION_SCHEMA tables for database, schema, table, column, and foreign-key metadata; BigQueryLogsConnector parses INFORMATION_SCHEMA.JOBS_BY_PROJECT for real query history.
GCP Dataplex
DataplexSchemaConnector reads BigQuery metadata surfaced through Dataplex Universal Catalog; DataplexGlossaryConnector ingests the full Dataplex business glossary including categories, terms, and column-level TAGGED_WITH links.
CSV files
CSVConnector loads metadata from a directory of structured CSV files following a standard naming convention. The bundled sample e-commerce dataset (datasets/csv/) is the fastest way to get started, with no cloud account needed.
JDBC
JDBCConnector uses SchemaCrawler under the hood to extract schema metadata from any JDBC-compatible database (PostgreSQL, MySQL, Oracle, SQL Server, and others). Requires Java 11+ and a JDBC driver JAR.
Unity Catalog
UnityCatalogConnector reads catalog, schema, table, and column metadata from any Unity Catalog-conformant server via the open UC REST API — works with both open-source and managed Unity Catalog.
Databricks
DatabricksConnector reads governed-tag definitions from managed Databricks Unity Catalog via the Databricks SDK. Requires the neocarta[databricks] extra and a Databricks personal access token.
OSI (Open Semantic Interchange)
OsiConnector is a bidirectional connector for the OSI YAML spec. It ingests semantic models (tables, columns, metrics, joins, AI context, business terms) from a local path or HTTPS URL, and can export a subgraph back to a spec-compliant OSI YAML file.
Query Log files
QueryLogConnector parses local query-log JSON files (distinct from the live BigQuery Logs connector) to load Query and CTE nodes, along with their table and column reference relationships, from exported logs.
Prerequisites
Before using Neocarta you will need the following:
- Python 3.10 or higher. Python 3.11+ is required if you use the [performance] extra.
- A running Neo4j instance. Any of the three options below work:
  - Neo4j AuraDB: managed cloud service with a free tier.
  - Neo4j Desktop: local GUI-based instance for development.
  - Docker: lightweight local instance, no installer needed.
- Source credentials: relevant API keys or service account credentials for the data source you intend to ingest (e.g. a GCP service account for BigQuery, a Databricks PAT for the Databricks connector).
- An embedding provider key (optional): required only if you want to generate embeddings. OPENAI_API_KEY is the most common; LiteLLM supports Gemini, Cohere, Bedrock, Azure OpenAI, and others.
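
A quick way to sanity-check the local prerequisites before installing is a small script like the one below. It is only a sketch: the check_prerequisites function and its parameters are hypothetical, it covers just the Python version and the optional OPENAI_API_KEY variable named above, and it does not test Neo4j connectivity or source credentials.

```python
import os
import sys


def check_prerequisites(version_info=sys.version_info, environ=os.environ,
                        use_performance_extra=False):
    """Return (problems, notes) for the current interpreter and environment."""
    problems, notes = [], []

    # Python 3.10+ is the baseline; the [performance] extra needs 3.11+.
    minimum = (3, 11) if use_performance_extra else (3, 10)
    if tuple(version_info[:2]) < minimum:
        problems.append(f"Python {minimum[0]}.{minimum[1]}+ is required")

    # The embedding key is optional, so a missing key is a note, not a problem.
    if not environ.get("OPENAI_API_KEY"):
        notes.append("OPENAI_API_KEY is not set; OpenAI embeddings are unavailable")

    return problems, notes


if __name__ == "__main__":
    problems, notes = check_prerequisites()
    for line in problems:
        print("PROBLEM:", line)
    for line in notes:
        print("NOTE:", line)
```

Passing the version and environment as parameters keeps the function easy to exercise with different values, for example to see what would be reported on an older interpreter.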