Neocarta is a Python library that builds a semantic layer in Neo4j from your data sources and serves it to AI agents through a Model Context Protocol (MCP) server. Instead of pointing an agent directly at a raw database and hoping it figures out the schema, Neocarta extracts only the metadata — table definitions, column types, foreign-key relationships, business glossary terms, governance tags, and real query history — loads it into a richly connected graph, and gives the agent structured tools to search and traverse that graph. Your data never leaves its source.

Neocarta is a Neo4j Labs project supported by the Neo4j field team. It is experimental and actively developed. The API and graph schema may change between minor versions.

The three-phase workflow

Neocarta follows a simple three-phase pipeline from raw source to agent-ready context.

1. Ingest

A connector reads schema metadata from a supported source (BigQuery, Dataplex, CSV, JDBC, Unity Catalog, Databricks, OSI) and loads it into the Neo4j semantic graph using a shared ETL pipeline of extractors, transformers, and loaders. Only metadata crosses into Neo4j; your data stays in the source.

2. Enrich

An optional embeddings connector generates vector embeddings for the description fields on Database, Schema, Table, Column, and BusinessTerm nodes, writing them into vector indexes. Enrichment unlocks semantic similarity search alongside the standard full-text and catalog retrieval tools.

3. Serve

The MCP server (neocarta-mcp) exposes the semantic graph as a set of retrieval tools. An agent connects to the server and calls tools like list_schemas, get_context_by_table_hybrid_search, or list_tables_by_schema to discover the right tables, follow foreign keys, and build correct queries — without guessing at the schema.
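
The extractor, transformer, and loader roles in the ingest phase can be pictured as three small functions. The following is a minimal sketch in plain Python, not Neocarta's actual API: `extract_tables`, `transform_to_nodes`, `load_nodes`, the `Node` dataclass, and the in-memory `graph` dict are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A canonical graph node: a label, a unique id, and metadata properties."""
    label: str
    node_id: str
    properties: tuple  # sorted (key, value) pairs so the node stays hashable


def extract_tables(source_rows):
    """Extract phase (hypothetical): keep only metadata fields, never row data."""
    return [
        {"schema": r["schema"], "table": r["table"], "description": r.get("description", "")}
        for r in source_rows
    ]


def transform_to_nodes(records):
    """Transform phase (hypothetical): map source records to canonical Table nodes."""
    nodes = []
    for rec in records:
        node_id = f'{rec["schema"]}.{rec["table"]}'
        props = tuple(sorted({"name": rec["table"], "description": rec["description"]}.items()))
        nodes.append(Node(label="Table", node_id=node_id, properties=props))
    return nodes


def load_nodes(nodes, graph):
    """Load phase (hypothetical): upsert by id, standing in for a write to Neo4j."""
    for node in nodes:
        graph[node.node_id] = node
    return graph


# Illustrative input only; "row_count" is dropped by the extractor.
rows = [
    {"schema": "sales", "table": "orders", "description": "Customer orders", "row_count": 10},
    {"schema": "sales", "table": "customers"},
]
graph = load_nodes(transform_to_nodes(extract_tables(rows)), {})
```

Keying the load step on a stable id means re-running the pipeline overwrites existing entries instead of duplicating them, which is the property you would want from a real loader as well.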

What Neocarta builds

The semantic graph is richer than a plain schema dump. Depending on the connectors you run, it can contain:

Schema metadata

Tables, columns, data types, nullability, primary keys, foreign-key references, and sample column values — the raw structural layer that enables join inference.

Business glossary

Glossaries, categories, and BusinessTerm nodes linked to the tables and columns they describe via TAGGED_WITH relationships — grounding agent answers in authoritative definitions.

Governance tags

GovernanceTag, GovernanceTagKey, and GovernanceTagValue nodes from sources such as Databricks Unity Catalog governed tags and Dataplex metadata types.

Query history

Query nodes parsed from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT or local log files, linked to the tables and columns they touch via USES_TABLE and USES_COLUMN relationships — revealing which parts of the schema actually matter in practice.
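
As an illustration of how query history can show which tables matter in practice, the sketch below counts Query nodes per Table. It uses only the `Query` and `Table` labels and the `USES_TABLE` relationship named here. Two things are assumptions: that Table nodes carry a `name` property, and that `session` behaves like a Neo4j Python driver session (an object with a `run(query)` method that yields mapping-like records). `FakeSession` and its numbers are made up so the example runs without a database.

```python
# Cypher built only from labels and relationship types named on this page.
# ASSUMPTION: Table nodes expose a `name` property; check your graph's real schema.
TABLE_USAGE_QUERY = (
    "MATCH (q:Query)-[:USES_TABLE]->(t:Table) "
    "RETURN t.name AS table_name, count(q) AS query_count "
    "ORDER BY query_count DESC"
)


def rank_tables_by_usage(session):
    """Return (table_name, query_count) pairs in the order the query yields them."""
    result = session.run(TABLE_USAGE_QUERY)
    return [(record["table_name"], record["query_count"]) for record in result]


class FakeSession:
    """In-memory stand-in so the sketch runs without a Neo4j instance."""

    def run(self, query):
        # Return canned, already-sorted records in place of real query results.
        return [
            {"table_name": "orders", "query_count": 42},
            {"table_name": "customers", "query_count": 7},
        ]


ranking = rank_tables_by_usage(FakeSession())
```

With a real driver session in place of `FakeSession`, the same function would return the ranking computed by the database.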

Key concepts

NodeLabel and RelationshipType enums

The NodeLabel and RelationshipType enums (exported from neocarta directly) define the canonical graph schema shared by every connector and the MCP server. Using these enums in code, rather than raw strings, is strongly recommended, though their .value strings are also accepted.

Core node labels used in the structural schema:

NodeLabel member    Neo4j label  Description
------------------  -----------  --------------------------------------------------
NodeLabel.DATABASE  Database     Top-level source database or GCP project
NodeLabel.SCHEMA    Schema       Dataset or schema within a database
NodeLabel.TABLE     Table        A table or view within a schema
NodeLabel.COLUMN    Column       A column within a table, with type and constraints
NodeLabel.VALUE     Value        A sample value observed in a column

Glossary and governance node labels:

NodeLabel member                Neo4j label         Description
------------------------------  ------------------  ------------------------------------
NodeLabel.GLOSSARY              Glossary            A named business glossary
NodeLabel.CATEGORY              Category            A category within a glossary
NodeLabel.BUSINESS_TERM         BusinessTerm        A governed business term, embeddable
NodeLabel.GOVERNANCE_TAG_KEY    GovernanceTagKey    A governance tag key
NodeLabel.GOVERNANCE_TAG_VALUE  GovernanceTagValue  A governance tag value
NodeLabel.GOVERNANCE_TAG        GovernanceTag       A concrete tag instance

Query and OSI node labels:

NodeLabel member  Neo4j label  Description
----------------  -----------  ---------------------------------------
NodeLabel.QUERY   Query        A parsed SQL query with a content hash
NodeLabel.CTE     CTE          A common table expression within a query
NodeLabel.DOMAIN  Domain       An OSI domain container
NodeLabel.METRIC  Metric       A governed metric definition

Core relationship types:

RelationshipType member  Cypher pattern                                    Description
-----------------------  ------------------------------------------------  -------------------------
HAS_SCHEMA               (:Database)-[:HAS_SCHEMA]->(:Schema)              Database owns a schema
HAS_TABLE                (:Schema)-[:HAS_TABLE]->(:Table)                  Schema contains a table
HAS_COLUMN               (:Table)-[:HAS_COLUMN]->(:Column)                 Table contains a column
HAS_VALUE                (:Column)-[:HAS_VALUE]->(:Value)                  Column has a sample value
REFERENCES               (:Column)-[:REFERENCES]->(:Column)                Foreign-key reference
TAGGED_WITH              (:Table|:Column)-[:TAGGED_WITH]->(:BusinessTerm)  Governance annotation
USES_TABLE               (:Query)-[:USES_TABLE]->(:Table)                  Query references a table
USES_COLUMN              (:Query)-[:USES_COLUMN]->(:Column)                Query references a column
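
To show why enum members are preferable to raw strings, here is a small sketch that builds a Cypher pattern for foreign-key joins from enum values. The two classes are local stand-ins that copy a few members from the label and relationship listings in this section so the example runs on its own; in real code you would import the enums from neocarta instead. Whether the real enums are string-based, and the helper name `foreign_key_join_query`, are assumptions.

```python
from enum import Enum


class NodeLabel(str, Enum):
    """Local stand-in mirroring a few of the node labels listed in this section."""
    SCHEMA = "Schema"
    TABLE = "Table"
    COLUMN = "Column"


class RelationshipType(str, Enum):
    """Local stand-in mirroring a few of the relationship types listed in this section."""
    HAS_TABLE = "HAS_TABLE"
    HAS_COLUMN = "HAS_COLUMN"
    REFERENCES = "REFERENCES"


def foreign_key_join_query():
    """Build a Cypher query that lists table pairs connected by a foreign key."""
    t, c = NodeLabel.TABLE.value, NodeLabel.COLUMN.value
    has_col = RelationshipType.HAS_COLUMN.value
    refs = RelationshipType.REFERENCES.value
    return (
        f"MATCH (src:{t})-[:{has_col}]->(fk:{c})"
        f"-[:{refs}]->(pk:{c})<-[:{has_col}]-(dst:{t}) "
        "RETURN src, fk, pk, dst"
    )


query = foreign_key_join_query()
```

A misspelled member such as `NodeLabel.TABEL` fails immediately with an AttributeError, whereas a misspelled raw string would only surface later as a query that silently matches nothing.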

How all connectors share the schema

Every connector — regardless of source — transforms its native metadata into this canonical schema before loading. This means the MCP server and its retrieval tools work identically whether the underlying data came from BigQuery, a CSV file, JDBC, or an OSI YAML spec.
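
The idea of normalizing before loading can be sketched with two toy transformers that accept differently shaped inputs and emit the same canonical record. Everything here is hypothetical: the function names, the canonical dict keys, and the CSV header names are invented for illustration. Only the BigQuery-style field names follow the usual INFORMATION_SCHEMA.COLUMNS columns.

```python
def from_bigquery_row(row):
    """Hypothetical transformer for a row shaped like INFORMATION_SCHEMA.COLUMNS."""
    return {
        "label": "Column",
        "id": f'{row["table_schema"]}.{row["table_name"]}.{row["column_name"]}',
        "name": row["column_name"],
        "type": row["data_type"].upper(),
    }


def from_csv_row(row):
    """Hypothetical transformer for a CSV row; these header names are invented."""
    return {
        "label": "Column",
        "id": f'{row["schema"]}.{row["table"]}.{row["column"]}',
        "name": row["column"],
        "type": row["type"].upper(),
    }


def describe(node):
    """A downstream consumer that only knows the canonical shape."""
    return f'{node["label"]} {node["id"]} ({node["type"]})'


bq_node = from_bigquery_row(
    {"table_schema": "sales", "table_name": "orders", "column_name": "order_id", "data_type": "INT64"}
)
csv_node = from_csv_row(
    {"schema": "sales", "table": "orders", "column": "order_id", "type": "int64"}
)
```

Because both transformers return the same shape, `describe` (standing in for a retrieval tool) never needs to know which source a node came from.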

Supported sources

BigQuery

Two connectors: BigQuerySchemaConnector reads INFORMATION_SCHEMA tables for database, schema, table, column, and foreign-key metadata; BigQueryLogsConnector parses INFORMATION_SCHEMA.JOBS_BY_PROJECT for real query history.

GCP Dataplex

DataplexSchemaConnector reads BigQuery metadata surfaced through Dataplex Universal Catalog; DataplexGlossaryConnector ingests the full Dataplex business glossary including categories, terms, and column-level TAGGED_WITH links.

CSV files

CSVConnector loads metadata from a directory of structured CSV files following a standard naming convention. The bundled sample e-commerce dataset (datasets/csv/) is the fastest way to get started — no cloud account needed.

JDBC

JDBCConnector uses SchemaCrawler under the hood to extract schema metadata from any JDBC-compatible database (PostgreSQL, MySQL, Oracle, SQL Server, and others). Requires Java 11+ and a JDBC driver JAR.

Unity Catalog

UnityCatalogConnector reads catalog, schema, table, and column metadata from any Unity Catalog-conformant server via the open UC REST API — works with both open-source and managed Unity Catalog.

Databricks

DatabricksConnector reads governed-tag definitions from managed Databricks Unity Catalog via the Databricks SDK. Requires the neocarta[databricks] extra and a Databricks personal access token.

OSI (Open Semantic Interchange)

OsiConnector is a bidirectional connector for the OSI YAML spec. It ingests semantic models (tables, columns, metrics, joins, AI context, business terms) from a local path or HTTPS URL, and can export a subgraph back to a spec-compliant OSI YAML file.

Query log files

QueryLogConnector parses local query-log JSON files (distinct from the live BigQuery Logs connector) to load Query and CTE nodes, along with their table and column reference relationships, from exported logs.

Prerequisites

Before using Neocarta you will need the following:
  • Python 3.10 or higher — Python 3.11+ is required if you use the [performance] extra.
  • A running Neo4j instance. Any of the three options below will work:
    • Neo4j AuraDB — managed cloud service with a free tier.
    • Neo4j Desktop — local GUI-based instance for development.
    • Docker — lightweight local instance, no installer needed.
  • Source credentials — relevant API keys or service account credentials for the data source you intend to ingest (e.g. a GCP service account for BigQuery, a Databricks PAT for the Databricks connector).
  • An embedding provider key (optional) — required only if you want to generate embeddings. OPENAI_API_KEY is the most common; LiteLLM supports Gemini, Cohere, Bedrock, Azure OpenAI, and others.
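
A quick preflight check along these lines can catch a missing prerequisite before you run a connector. This is a sketch under assumptions: the Python version floors and `OPENAI_API_KEY` come from the list above, but the `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD` variable names and the `preflight` function itself are illustrative and are not Neocarta's documented configuration.

```python
import os
import sys


def preflight(env=None, version=None, use_performance_extra=False, want_embeddings=False):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []

    # Python 3.10+ in general, 3.11+ when the [performance] extra is used.
    minimum = (3, 11) if use_performance_extra else (3, 10)
    if version < minimum:
        problems.append(
            f"Python {minimum[0]}.{minimum[1]}+ required, found {version[0]}.{version[1]}"
        )

    # ASSUMPTION: illustrative variable names, not Neocarta's documented settings.
    for name in ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"):
        if not env.get(name):
            problems.append(f"{name} is not set")

    # The embedding provider key is optional and only checked on request.
    if want_embeddings and not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")

    return problems
```

Calling `preflight()` with no arguments inspects the current interpreter and environment; passing `env` and `version` explicitly makes the check easy to test.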
