Every Neocarta connector transforms its source metadata into a shared graph schema stored in Neo4j. This shared schema is what makes connectors interoperable — a glossary loaded from Dataplex attaches to tables ingested from BigQuery because both use the same TAGGED_WITH relationship and the same Table node label. Every connector must conform to this schema to be compatible with the MCP server and the Neocarta CLI tooling.
The data model is organized into layers of concern. The structural RDBMS layer forms the foundation; the glossary, governance, query, and OSI layers extend it.
Core Schema (RDBMS)
The structural core models the hierarchy of a relational database from platform down to individual column values.
Node Properties
Database

| Property | Type | Notes |
|---|---|---|
| id | STRING | Unique key; generated via generate_id |
| name | STRING | Database or project name |
| description | STRING \| null | Human-readable description; backed by full-text and vector indexes when populated |
| embedding | VECTOR \| null | Embedding of description; present only after running an embeddings connector |
| platform | STRING \| null | Platform name, uppercased (e.g., GCP) |
| service | STRING \| null | Service name, uppercased (e.g., BIGQUERY) |

Schema

| Property | Type | Notes |
|---|---|---|
| id | STRING | Unique key |
| name | STRING | Schema or dataset name |
| description | STRING \| null | Human-readable description |
| embedding | VECTOR \| null | Embedding of description |

Table

| Property | Type | Notes |
|---|---|---|
| id | STRING | Unique key |
| name | STRING | Table name |
| description | STRING \| null | Human-readable description |
| embedding | VECTOR \| null | Embedding of description |

Column

| Property | Type | Notes |
|---|---|---|
| id | STRING | Unique key |
| name | STRING | Column name |
| description | STRING \| null | Human-readable description |
| embedding | VECTOR \| null | Embedding of description |
| type | STRING \| null | Data type; may be absent for query-log-derived columns |
| nullable | BOOLEAN | Defaults to true |
| is_primary_key | BOOLEAN \| null | null when source exposes no key metadata |
| is_foreign_key | BOOLEAN \| null | null when source exposes no key metadata |

Value

| Property | Type | Notes |
|---|---|---|
| id | STRING | Unique key |
| value | STRING | A sample distinct value from the column |
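As a rough illustration of the Column properties listed above, the sketch below mirrors them in a Python dataclass. The class name `ColumnNode`, the helper `to_properties`, and the sample id are hypothetical and not part of Neocarta; only the property names, types, and the `nullable` default come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ColumnNode:
    """Hypothetical in-memory mirror of a Column node's properties."""

    id: str
    name: str
    description: Optional[str] = None
    embedding: Optional[list[float]] = None  # set only after an embeddings run
    type: Optional[str] = None               # may be absent for query-log-derived columns
    nullable: bool = True                    # documented default
    is_primary_key: Optional[bool] = None    # None: source exposes no key metadata
    is_foreign_key: Optional[bool] = None    # None: source exposes no key metadata


def to_properties(column: ColumnNode) -> dict:
    """Drop None-valued fields so missing metadata is simply left off the node."""
    return {key: value for key, value in vars(column).items() if value is not None}


# The id below is a made-up placeholder, not the output of generate_id.
col = ColumnNode(id="example-column-id", name="order_id", type="INT64")
print(to_properties(col))
```

Dropping `None` values before writing matches how Neo4j treats nulls: a property set to null is not stored on the node.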
Core Relationships
| Pattern | Meaning |
|---|---|
| (:Database)-[:HAS_SCHEMA]->(:Schema) | Database-to-schema hierarchy |
| (:Schema)-[:HAS_TABLE]->(:Table) | Schema-to-table hierarchy |
| (:Table)-[:HAS_COLUMN]->(:Column) | Table-to-column membership |
| (:Column)-[:HAS_VALUE]->(:Value) | Sample values for column disambiguation |
| (:Column)-[:REFERENCES]->(:Column) | Foreign key; the target column is the referenced primary/unique key. Carries an optional criteria property holding the join condition. |
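The core relationships above chain into a single traversal from a database down to its columns. The following is a minimal sketch of such a query, wrapped in a Python function so it can be passed to any Neo4j client. The function name, the `$database_name` parameter, and the returned aliases are illustrative choices; the labels, relationship types, and the `criteria` property are taken from the tables above.

```python
def columns_of_database_query() -> str:
    """Return a Cypher query listing every column under one database.

    Foreign keys are optional, so REFERENCES is matched with OPTIONAL MATCH.
    """
    return (
        "MATCH (d:Database {name: $database_name})"
        "-[:HAS_SCHEMA]->(s:Schema)"
        "-[:HAS_TABLE]->(t:Table)"
        "-[:HAS_COLUMN]->(c:Column)\n"
        "OPTIONAL MATCH (c)-[r:REFERENCES]->(target:Column)\n"
        "RETURN s.name AS schema_name, t.name AS table_name, c.name AS column_name,\n"
        "       target.id AS references_column_id, r.criteria AS join_criteria\n"
        "ORDER BY schema_name, table_name, column_name"
    )


print(columns_of_database_query())
```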
Glossary / Business Terms
The glossary layer links business vocabulary to the structural nodes it describes. It is populated by the Dataplex glossary connector, the OSI connector (via synonyms), and the CSV connector.
Glossary Node Properties
| Node | Key Properties |
|---|---|
| Glossary | id, name, description, resource_path (e.g., Dataplex resource name) |
| Category | id, name, description, resource_path |
| BusinessTerm | id, name, description, embedding, resource_path |
BusinessTerm nodes are MERGEd on name, so the same term arriving from different sources (Dataplex catalog, OSI synonyms, CSV) resolves to a single node instead of creating duplicates.
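To make the merge-on-name behavior concrete, here is a small in-memory simulation. It is only a sketch: `merge_business_terms` is a hypothetical helper, the sample terms are made up, and the rule of keeping the first non-empty value per property is an assumption of this example rather than documented connector behavior.

```python
def merge_business_terms(*sources: list[dict]) -> dict[str, dict]:
    """Simulate MERGE-on-name: one record per term name across all sources."""
    merged: dict[str, dict] = {}
    for source in sources:
        for term in source:
            node = merged.setdefault(term["name"], {"name": term["name"]})
            # Keep the first non-empty value seen for each property
            # (an assumption made for this sketch).
            for key, value in term.items():
                if value is not None and node.get(key) is None:
                    node[key] = value
    return merged


# Made-up sample data standing in for two different sources.
dataplex_terms = [{"name": "Customer", "description": "A person or company that buys goods."}]
csv_terms = [
    {"name": "Customer", "description": None},
    {"name": "Churn", "description": "Loss of customers."},
]

terms = merge_business_terms(dataplex_terms, csv_terms)
print(sorted(terms))  # two term names, even though three records were loaded
```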
Query History
Query history records real SQL queries and the schema objects they touched. It is populated by the BigQuery logs connector, the query-log file connector, and the OSI connector (for SQL-sourced datasets).
| Node | Key Properties |
|---|---|
| Query | id, content (SQL text), name (logical name, if available), description, embedding |
| CTE | id, name (alias), definition (inner SELECT body), query_id |

| Relationship | Meaning |
|---|---|
| (:Query)-[:USES_TABLE]->(:Table) | Table referenced in the query |
| (:Query)-[:USES_COLUMN]->(:Column) | Column referenced in the query |
| (:Query)-[:DEFINES]->(:CTE) | CTE defined inline in the query |
CTE nodes represent query-scoped virtual tables — they are distinct from catalog Table nodes so that agents can distinguish real tables from inline sub-queries.
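One way to read the query-history layer is to ask which recorded queries touched a given table and which CTEs they defined along the way. The sketch below builds such a Cypher query as a Python string. The function name, the `$table_id` parameter, and the returned aliases are illustrative; the labels, relationship types, and the `content` and `name` properties come from the tables above.

```python
def queries_using_table_query() -> str:
    """Return a Cypher query for the queries that reference one catalog table.

    CTEs are collected separately because they are query-scoped, not catalog tables.
    """
    return (
        "MATCH (q:Query)-[:USES_TABLE]->(t:Table {id: $table_id})\n"
        "OPTIONAL MATCH (q)-[:DEFINES]->(cte:CTE)\n"
        "RETURN q.id AS query_id, q.content AS sql_text,\n"
        "       collect(cte.name) AS cte_names"
    )


print(queries_using_table_query())
```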
Governance
The governance layer attaches controlled-vocabulary labels to schema objects for policy, classification, and ownership. It is vendor-neutral and models patterns from Databricks Unity Catalog governed tags, Snowflake object tags, and GCP resource Tags.
The model has two layers:
- Definition layer: GovernanceTagKey and GovernanceTagValue describe which tags exist, what they mean, and (for governed platforms) which values are allowed. This surface is full-text and vector searchable.
- Instance layer: GovernanceTag represents a single applied (key, value) assignment on a schema object. When the value is governed, it links to its GovernanceTagValue definition via HAS_DEFINITION; a missing link means the value is free-form.
| Node | Key Properties |
|---|---|
| GovernanceTagKey | id, name (e.g., sensitivity), description, embedding |
| GovernanceTagValue | id, name (e.g., pii), description (optional) |
| GovernanceTag | id, key, value (denormalized for single-hop lookups) |

| Relationship | Meaning |
|---|---|
| (:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue) | Allowed values for a governed tag key |
| (:Column)-[:TAGGED_WITH]->(:GovernanceTag) | Column carries this tag assignment |
| (:Table)-[:TAGGED_WITH]->(:GovernanceTag) | Table carries this tag assignment |
| (:Schema)-[:TAGGED_WITH]->(:GovernanceTag) | Schema carries this tag assignment |
| (:GovernanceTag)-[:HAS_DEFINITION]->(:GovernanceTagValue) | Applied value matches a governed definition (optional) |
Both the glossary and governance layers use the TAGGED_WITH relationship type, but they target different node labels: BusinessTerm for the glossary layer and GovernanceTag for the governance layer.
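The split between the definition layer and the instance layer can be sketched in plain Python. Everything here is illustrative: `GOVERNED_VALUES`, `AppliedTag`, and `apply_tag` are hypothetical names, the sample keys and values are made up, and treating an unlisted value as free-form is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Optional

# Definition layer: allowed values per governed key (made-up sample data).
GOVERNED_VALUES = {"sensitivity": {"pii", "public"}}


@dataclass(frozen=True)
class AppliedTag:
    """Instance layer: one (key, value) assignment on a schema object."""

    key: str
    value: str
    definition: Optional[str]  # matching GovernanceTagValue name, or None if free-form


def apply_tag(key: str, value: str) -> AppliedTag:
    """Link the tag to a definition only when the value is governed."""
    allowed = GOVERNED_VALUES.get(key, set())
    return AppliedTag(key, value, value if value in allowed else None)


print(apply_tag("sensitivity", "pii"))  # governed: would carry a HAS_DEFINITION link
print(apply_tag("owner", "data-eng"))   # free-form: no definition link
```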
OSI Semantic Model
The OSI layer models the Open Semantic Interchange format — a YAML-based interchange for semantic models. It extends the structural core with domain containers, metric definitions, join specifications, and AI-context aspects.
Key nodes:
| Node | Description |
|---|---|
| OsiSemanticModel (subtype of Domain) | Top-level container for a full OSI spec instance. Stored as (:Domain:OsiSemanticModel). |
| OsiTable (subtype of Table) | Dataset with OSI-specific key metadata: source, primary_key, unique_keys. Stored as (:Table:OsiTable). |
| OsiColumn (subtype of Column) | Column with OSI display metadata: label, is_time_dimension. Stored as (:Column:OsiColumn). |
| Metric | A measurable quantity with name, description, and embedding. |
| Expression | A dialect-specific computation: dialect (e.g., bigquery) and expression text. |
| Join | A join between two tables with ordered from_columns and to_columns lists for composite-key support. |
| OsiAiContext (subtype of Aspect) | Agent-facing context stored as a JSON-encoded data string (instructions, synonyms, examples). |
Key relationships beyond the structural core:
| Relationship | Meaning |
|---|---|
| (:Domain)-[:HAS_TABLE]->(:OsiTable) | Semantic model owns a dataset directly |
| (:Domain)-[:HAS_METRIC]->(:Metric) | Semantic model defines a metric |
| (:Metric)-[:HAS_EXPRESSION]->(:Expression) | Metric dialect-specific expression |
| (:Join)-[:HAS_SOURCE_TABLE]->(:Table) | Foreign-key (from) side of the join |
| (:Join)-[:HAS_TARGET_TABLE]->(:Table) | Primary/unique-key (to) side of the join |
| (:Column)-[:USED_IN_JOIN]->(:Join) | Column participates in a join |
| (:*)-[:HAS_ASPECT]->(:Aspect) | Any entity carries an AI context or custom extension aspect |
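Because a Join keeps from_columns and to_columns as ordered lists, a composite-key join condition can be rebuilt by pairing the two lists position by position. The sketch below shows that idea. The function `join_condition` and the sample table and column names are hypothetical, and the generated SQL is one plausible rendering rather than what Neocarta itself emits.

```python
def join_condition(
    from_table: str,
    from_columns: list[str],
    to_table: str,
    to_columns: list[str],
) -> str:
    """Pair the ordered column lists into a SQL equality condition."""
    if len(from_columns) != len(to_columns):
        raise ValueError("from_columns and to_columns must have the same length")
    return " AND ".join(
        f"{from_table}.{src} = {to_table}.{dst}"
        for src, dst in zip(from_columns, to_columns)
    )


# Made-up composite key: (order_id, region) on the foreign-key side.
print(join_condition("order_items", ["order_id", "region"], "orders", ["id", "region"]))
# -> order_items.order_id = orders.id AND order_items.region = orders.region
```

Keeping the lists ordered is what makes the pairing unambiguous; an unordered set of columns could not express which column matches which.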
NodeLabel and RelationshipType Enums
All node labels and relationship types are defined as canonical string enums in neocarta.enums. Using the enums ensures your code stays in sync with the schema and works with connector filtering.
```python
from neocarta import NodeLabel, RelationshipType

# Node labels
NodeLabel.DATABASE              # "Database"
NodeLabel.SCHEMA                # "Schema"
NodeLabel.TABLE                 # "Table"
NodeLabel.COLUMN                # "Column"
NodeLabel.VALUE                 # "Value"
NodeLabel.GLOSSARY              # "Glossary"
NodeLabel.CATEGORY              # "Category"
NodeLabel.BUSINESS_TERM         # "BusinessTerm"
NodeLabel.GOVERNANCE_TAG_KEY    # "GovernanceTagKey"
NodeLabel.GOVERNANCE_TAG_VALUE  # "GovernanceTagValue"
NodeLabel.GOVERNANCE_TAG        # "GovernanceTag"
NodeLabel.QUERY                 # "Query"
NodeLabel.CTE                   # "CTE"
NodeLabel.METRIC                # "Metric"
NodeLabel.JOIN                  # "Join"
NodeLabel.EXPRESSION            # "Expression"
NodeLabel.OSI_SEMANTIC_MODEL    # "OsiSemanticModel"
NodeLabel.OSI_TABLE             # "OsiTable"
NodeLabel.OSI_COLUMN            # "OsiColumn"

# Relationship types
RelationshipType.HAS_SCHEMA         # "HAS_SCHEMA"
RelationshipType.HAS_TABLE          # "HAS_TABLE"
RelationshipType.HAS_COLUMN         # "HAS_COLUMN"
RelationshipType.HAS_VALUE          # "HAS_VALUE"
RelationshipType.REFERENCES         # "REFERENCES"
RelationshipType.TAGGED_WITH        # "TAGGED_WITH"
RelationshipType.HAS_CATEGORY       # "HAS_CATEGORY"
RelationshipType.HAS_BUSINESS_TERM  # "HAS_BUSINESS_TERM"
RelationshipType.HAS_VALUE_OPTION   # "HAS_VALUE_OPTION"
RelationshipType.HAS_DEFINITION     # "HAS_DEFINITION"
RelationshipType.USES_TABLE         # "USES_TABLE"
RelationshipType.USES_COLUMN        # "USES_COLUMN"
RelationshipType.DEFINES            # "DEFINES"
RelationshipType.HAS_METRIC         # "HAS_METRIC"
RelationshipType.HAS_EXPRESSION     # "HAS_EXPRESSION"
RelationshipType.HAS_ASPECT         # "HAS_ASPECT"
RelationshipType.USED_IN_JOIN       # "USED_IN_JOIN"
RelationshipType.HAS_SOURCE_TABLE   # "HAS_SOURCE_TABLE"
RelationshipType.HAS_TARGET_TABLE   # "HAS_TARGET_TABLE"
```
Both NodeLabel and RelationshipType subclass str, so they can be used anywhere a plain string is expected and will format correctly in f-strings and Cypher queries.
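As a small illustration of building Cypher from the enums, the sketch below uses local stand-in classes with the same member names and values so that it runs without Neocarta installed; in real code you would import the enums from the package instead. It reads `.value` explicitly so the output does not depend on how a given Python version formats enum members.

```python
from enum import Enum


class NodeLabel(str, Enum):
    """Local stand-in for neocarta's NodeLabel (two members only)."""

    TABLE = "Table"
    COLUMN = "Column"


class RelationshipType(str, Enum):
    """Local stand-in for neocarta's RelationshipType (one member only)."""

    HAS_COLUMN = "HAS_COLUMN"


# Members compare equal to plain strings because the enums subclass str.
assert NodeLabel.TABLE == "Table"

query = (
    f"MATCH (t:{NodeLabel.TABLE.value})"
    f"-[:{RelationshipType.HAS_COLUMN.value}]->"
    f"(c:{NodeLabel.COLUMN.value}) "
    "RETURN t.name AS table_name, c.name AS column_name"
)
print(query)
```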
Neo4j Indexes
Neocarta creates two classes of indexes during the load phase:
Full-Text Indexes
One full-text index is created per searchable label, over the name and description properties. These power the full-text search tools in the MCP server and are available without embeddings.
| Index name | Label |
|---|---|
| database_full_text_index | Database |
| schema_full_text_index | Schema |
| table_full_text_index | Table |
| column_full_text_index | Column |
| business_term_full_text_index | BusinessTerm |
Vector Indexes
One vector index is created per label when embeddings are present. These power semantic and hybrid search tools.
| Index name | Label |
|---|---|
| database_vector_index | Database |
| schema_vector_index | Schema |
| table_vector_index | Table |
| column_vector_index | Column |
| business_term_vector_index | BusinessTerm |
The MCP server probes for these indexes at startup and registers only the tools whose indexes are present. Running neocarta bigquery schema --embeddings creates both classes of index in a single step.
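The index names in the two tables follow a visible pattern: the label in snake case, then the index kind, then `_index`. The sketch below derives names from that pattern and shows how a startup probe might decide which search modes are available per label. It is an assumption-laden illustration: `index_name` and `available_search_modes` are hypothetical helpers, the naming rule is inferred from the tables, and the set of existing index names is passed in rather than read from Neo4j.

```python
import re

SEARCHABLE_LABELS = ["Database", "Schema", "Table", "Column", "BusinessTerm"]
INDEX_KINDS = ("full_text", "vector")


def index_name(label: str, kind: str) -> str:
    """Derive an index name such as business_term_vector_index from a label."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", label).lower()
    return f"{snake}_{kind}_index"


def available_search_modes(existing_indexes: set[str]) -> dict[str, list[str]]:
    """Map each label to the index kinds that are actually present."""
    modes: dict[str, list[str]] = {}
    for label in SEARCHABLE_LABELS:
        found = [kind for kind in INDEX_KINDS if index_name(label, kind) in existing_indexes]
        if found:
            modes[label] = found
    return modes


# Made-up probe result: embeddings were generated for tables only.
existing = {"table_full_text_index", "table_vector_index", "column_full_text_index"}
print(available_search_modes(existing))
```

In this example a table could be searched by full text or by vector, a column by full text only, and the other labels would get no search tools.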