Every Neocarta connector transforms its source metadata into a shared graph schema stored in Neo4j. This shared schema is what makes connectors interoperable: a glossary loaded from Dataplex attaches to tables ingested from BigQuery because both use the same TAGGED_WITH relationship and the same Table node label. Every connector must conform to this schema to be compatible with the MCP server and the Neocarta CLI tooling.

The data model is organized into layers of concern. The structural RDBMS layer forms the foundation; the glossary, governance, query, and OSI layers extend it.

Core Schema (RDBMS)

The structural core models the hierarchy of a relational database from platform down to individual column values.

Node Properties

| Property | Type | Notes |
| --- | --- | --- |
| id | STRING | Unique key; generated via generate_id |
| name | STRING | Database or project name |
| description | STRING \| null | Human-readable description; backed by full-text and vector indexes when populated |
| embedding | VECTOR \| null | Embedding of description; present only after running an embeddings connector |
| platform | STRING \| null | Platform name, uppercased (e.g., GCP) |
| service | STRING \| null | Service name, uppercased (e.g., BIGQUERY) |

Core Relationships

| Pattern | Meaning |
| --- | --- |
| `(:Database)-[:HAS_SCHEMA]->(:Schema)` | Database-to-schema hierarchy |
| `(:Schema)-[:HAS_TABLE]->(:Table)` | Schema-to-table hierarchy |
| `(:Table)-[:HAS_COLUMN]->(:Column)` | Table-to-column membership |
| `(:Column)-[:HAS_VALUE]->(:Value)` | Sample values for column disambiguation |
| `(:Column)-[:REFERENCES]->(:Column)` | Foreign key; the target column is the referenced primary/unique key. Carries an optional criteria property holding the join condition. |
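
To make the hierarchy concrete, the sketch below builds two Cypher queries from the patterns in the table: one walks from a Database down to its columns, the other lists foreign keys with their optional criteria. It is a minimal illustration and not part of Neocarta. The function names, the `$database_name` parameter, and the result aliases are assumptions, and the strings would still need to be run through a Neo4j driver session.

```python
def columns_query() -> str:
    """Return a Cypher query listing every column under one database.

    Hypothetical helper: only the labels and relationship types come from
    the schema described above.
    """
    return (
        "MATCH (d:Database {name: $database_name})"
        "-[:HAS_SCHEMA]->(s:Schema)"
        "-[:HAS_TABLE]->(t:Table)"
        "-[:HAS_COLUMN]->(c:Column)\n"
        "RETURN s.name AS schema_name, t.name AS table_name, "
        "c.name AS column_name\n"
        "ORDER BY schema_name, table_name, column_name"
    )


def foreign_keys_query() -> str:
    """Return a Cypher query listing foreign-key edges between columns."""
    return (
        "MATCH (src:Column)-[r:REFERENCES]->(dst:Column)\n"
        "RETURN src.name AS from_column, dst.name AS to_column, "
        "r.criteria AS criteria"
    )


if __name__ == "__main__":
    print(columns_query())
    print(foreign_keys_query())
```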

Glossary / Business Terms

The glossary layer links business vocabulary to the structural nodes it describes. It is populated by the Dataplex glossary connector, the OSI connector (via synonyms), and the CSV connector.

Glossary Node Properties

| Node | Key Properties |
| --- | --- |
| Glossary | id, name, description, resource_path (e.g., Dataplex resource name) |
| Category | id, name, description, resource_path |
| BusinessTerm | id, name, description, embedding, resource_path |

BusinessTerm nodes are MERGEd on name, so terms from different sources (Dataplex catalog, OSI synonyms, CSV) collide cleanly into a single node.
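
The effect of merging on name can be pictured with a plain dictionary keyed by term name. The sketch below is an in-memory analogy only, not Neocarta's loader: the `BusinessTerm` dataclass, the `sources` field, and the rule of keeping the first non-empty description are all assumptions made for illustration, since the source does not say how conflicting properties are reconciled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set, Tuple


@dataclass
class BusinessTerm:
    """Hypothetical in-memory stand-in for a BusinessTerm node."""
    name: str
    description: Optional[str] = None
    sources: Set[str] = field(default_factory=set)


def merge_terms(
    records: Iterable[Tuple[str, str, Optional[str]]],
) -> Dict[str, BusinessTerm]:
    """Fold (source, name, description) records into one term per name."""
    terms: Dict[str, BusinessTerm] = {}
    for source, name, description in records:
        # Same name -> same entry, mirroring a MERGE keyed on name.
        term = terms.setdefault(name, BusinessTerm(name=name))
        term.sources.add(source)
        # Assumption: the first non-empty description wins.
        if term.description is None and description:
            term.description = description
    return terms


if __name__ == "__main__":
    merged = merge_terms([
        ("dataplex", "Revenue", "Total booked revenue"),
        ("osi", "Revenue", None),
        ("csv", "Churn", "Customers lost in a period"),
    ])
    for term in merged.values():
        print(term.name, sorted(term.sources))
```

Matching here is exact and case-sensitive, so "Revenue" and "revenue" would remain separate entries.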

Query History

Query history records real SQL queries and the schema objects they touched. It is populated by the BigQuery logs connector, the query-log file connector, and the OSI connector (for SQL-sourced datasets).

| Node | Key Properties |
| --- | --- |
| Query | id, content (SQL text), name (logical name, if available), description, embedding |
| CTE | id, name (alias), definition (inner SELECT body), query_id |

| Relationship | Meaning |
| --- | --- |
| `(:Query)-[:USES_TABLE]->(:Table)` | Table referenced in the query |
| `(:Query)-[:USES_COLUMN]->(:Column)` | Column referenced in the query |
| `(:Query)-[:DEFINES]->(:CTE)` | CTE defined inline in the query |

CTE nodes represent query-scoped virtual tables. They are distinct from catalog Table nodes so that agents can distinguish real tables from inline sub-queries.
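
As an illustration of why the Table and CTE labels are kept apart, the following sketch holds a Cypher query that returns a query's catalog tables and its inline CTEs as two separate lists. The constant name, the `$query_id` parameter, and the result aliases are assumptions; only the labels, relationship types, and the id and name properties come from the tables above.

```python
# Hypothetical query text: catalog tables and CTEs come back in separate
# lists, so a caller can tell real tables from query-scoped ones.
TABLES_AND_CTES_QUERY = (
    "MATCH (q:Query {id: $query_id})\n"
    "OPTIONAL MATCH (q)-[:USES_TABLE]->(t:Table)\n"
    "OPTIONAL MATCH (q)-[:DEFINES]->(c:CTE)\n"
    "RETURN collect(DISTINCT t.name) AS tables, "
    "collect(DISTINCT c.name) AS ctes"
)


if __name__ == "__main__":
    print(TABLES_AND_CTES_QUERY)
```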

Governance Tags

The governance layer attaches controlled-vocabulary labels to schema objects for policy, classification, and ownership. It is vendor-neutral and models patterns from Databricks Unity Catalog governed tags, Snowflake object tags, and GCP resource Tags. The model has two layers:
  • Definition layer: GovernanceTagKey and GovernanceTagValue describe which tags exist, what they mean, and (for governed platforms) which values are allowed. This surface is full-text and vector searchable.
  • Instance layer: GovernanceTag represents a single applied (key, value) assignment on a schema object. When the value is governed, it links to its GovernanceTagValue definition via HAS_DEFINITION; a missing link means the value is free-form.

| Node | Key Properties |
| --- | --- |
| GovernanceTagKey | id, name (e.g., sensitivity), description, embedding |
| GovernanceTagValue | id, name (e.g., pii), description (optional) |
| GovernanceTag | id, key, value (denormalized for single-hop lookups) |

| Relationship | Meaning |
| --- | --- |
| `(:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue)` | Allowed values for a governed tag key |
| `(:Column)-[:TAGGED_WITH]->(:GovernanceTag)` | Column carries this tag assignment |
| `(:Table)-[:TAGGED_WITH]->(:GovernanceTag)` | Table carries this tag assignment |
| `(:Schema)-[:TAGGED_WITH]->(:GovernanceTag)` | Schema carries this tag assignment |
| `(:GovernanceTag)-[:HAS_DEFINITION]->(:GovernanceTagValue)` | Applied value matches a governed definition (optional) |

Both the glossary and governance layers use the TAGGED_WITH relationship type, but they target different node labels: BusinessTerm for the glossary layer and GovernanceTag for the governance layer.
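
The split between governed and free-form values can be sketched in memory. In the example below, a dictionary of allowed values plays the role of the HAS_VALUE_OPTION edges, and the return value of `resolve_definition` plays the role of the optional HAS_DEFINITION link. The dataclass, the function name, and the sample keys and values are hypothetical; this is an analogy for the model, not Neocarta code.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class GovernanceTag:
    """Hypothetical stand-in for an applied (key, value) assignment."""
    key: str
    value: str


def resolve_definition(
    tag: GovernanceTag,
    allowed_values: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the governed value the tag matches, or None if free-form.

    allowed_values maps a tag key to its permitted values, analogous to
    HAS_VALUE_OPTION. A None result corresponds to a missing
    HAS_DEFINITION link.
    """
    if tag.value in allowed_values.get(tag.key, set()):
        return tag.value
    return None


if __name__ == "__main__":
    allowed = {"sensitivity": {"pii", "public"}}
    print(resolve_definition(GovernanceTag("sensitivity", "pii"), allowed))
    print(resolve_definition(GovernanceTag("owner", "data-platform"), allowed))
```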

OSI Semantic Model

The OSI layer models the Open Semantic Interchange format, a YAML-based interchange for semantic models. It extends the structural core with domain containers, metric definitions, join specifications, and AI-context aspects. Key nodes:

| Node | Description |
| --- | --- |
| OsiSemanticModel (subtype of Domain) | Top-level container for a full OSI spec instance. Stored as `(:Domain:OsiSemanticModel)`. |
| OsiTable (subtype of Table) | Dataset with OSI-specific key metadata: source, primary_key, unique_keys. Stored as `(:Table:OsiTable)`. |
| OsiColumn (subtype of Column) | Column with OSI display metadata: label, is_time_dimension. Stored as `(:Column:OsiColumn)`. |
| Metric | A measurable quantity with name, description, and embedding. |
| Expression | A dialect-specific computation: dialect (e.g., bigquery) and expression text. |
| Join | A join between two tables with ordered from_columns and to_columns lists for composite-key support. |
| OsiAiContext (subtype of Aspect) | Agent-facing context stored as a JSON-encoded data string (instructions, synonyms, examples). |

Key relationships beyond the structural core:

| Relationship | Meaning |
| --- | --- |
| `(:Domain)-[:HAS_TABLE]->(:OsiTable)` | Semantic model owns a dataset directly |
| `(:Domain)-[:HAS_METRIC]->(:Metric)` | Semantic model defines a metric |
| `(:Metric)-[:HAS_EXPRESSION]->(:Expression)` | Metric dialect-specific expression |
| `(:Join)-[:HAS_SOURCE_TABLE]->(:Table)` | Foreign-key (from) side of the join |
| `(:Join)-[:HAS_TARGET_TABLE]->(:Table)` | Primary/unique-key (to) side of the join |
| `(:Column)-[:USED_IN_JOIN]->(:Join)` | Column participates in a join |
| `(:*)-[:HAS_ASPECT]->(:Aspect)` | Any entity carries an AI context or custom extension aspect |
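
Because a Join node stores from_columns and to_columns as ordered lists, a composite-key join condition can be rebuilt by pairing the two lists position by position. The sketch below assumes that positional pairing is the intended reading; the function name, its signature, and the sample table and column names are hypothetical.

```python
from typing import Sequence


def join_condition(
    from_table: str,
    from_columns: Sequence[str],
    to_table: str,
    to_columns: Sequence[str],
) -> str:
    """Build a SQL ON clause by pairing columns in list order."""
    if not from_columns:
        raise ValueError("a join needs at least one column pair")
    if len(from_columns) != len(to_columns):
        raise ValueError("from_columns and to_columns must be the same length")
    return " AND ".join(
        f"{from_table}.{src} = {to_table}.{dst}"
        for src, dst in zip(from_columns, to_columns)
    )


if __name__ == "__main__":
    print(join_condition(
        "order_items", ["order_id", "region"],
        "orders", ["id", "region"],
    ))
```

Keeping the lists ordered is what lets a two-column key survive the round trip; an unordered set of columns would lose which source column matches which target column.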

NodeLabel and RelationshipType Enums

All node labels and relationship types are defined as canonical string enums in neocarta.enums. Using the enums ensures your code stays in sync with the schema and works with connector filtering.
```python
from neocarta import NodeLabel, RelationshipType

# Node labels
NodeLabel.DATABASE        # "Database"
NodeLabel.SCHEMA          # "Schema"
NodeLabel.TABLE           # "Table"
NodeLabel.COLUMN          # "Column"
NodeLabel.VALUE           # "Value"
NodeLabel.GLOSSARY        # "Glossary"
NodeLabel.CATEGORY        # "Category"
NodeLabel.BUSINESS_TERM   # "BusinessTerm"
NodeLabel.GOVERNANCE_TAG_KEY    # "GovernanceTagKey"
NodeLabel.GOVERNANCE_TAG_VALUE  # "GovernanceTagValue"
NodeLabel.GOVERNANCE_TAG        # "GovernanceTag"
NodeLabel.QUERY           # "Query"
NodeLabel.CTE             # "CTE"
NodeLabel.METRIC          # "Metric"
NodeLabel.JOIN            # "Join"
NodeLabel.EXPRESSION      # "Expression"
NodeLabel.OSI_SEMANTIC_MODEL    # "OsiSemanticModel"
NodeLabel.OSI_TABLE       # "OsiTable"
NodeLabel.OSI_COLUMN      # "OsiColumn"

# Relationship types
RelationshipType.HAS_SCHEMA         # "HAS_SCHEMA"
RelationshipType.HAS_TABLE          # "HAS_TABLE"
RelationshipType.HAS_COLUMN         # "HAS_COLUMN"
RelationshipType.HAS_VALUE          # "HAS_VALUE"
RelationshipType.REFERENCES         # "REFERENCES"
RelationshipType.TAGGED_WITH        # "TAGGED_WITH"
RelationshipType.HAS_CATEGORY       # "HAS_CATEGORY"
RelationshipType.HAS_BUSINESS_TERM  # "HAS_BUSINESS_TERM"
RelationshipType.HAS_VALUE_OPTION   # "HAS_VALUE_OPTION"
RelationshipType.HAS_DEFINITION     # "HAS_DEFINITION"
RelationshipType.USES_TABLE         # "USES_TABLE"
RelationshipType.USES_COLUMN        # "USES_COLUMN"
RelationshipType.DEFINES            # "DEFINES"
RelationshipType.HAS_METRIC         # "HAS_METRIC"
RelationshipType.HAS_EXPRESSION     # "HAS_EXPRESSION"
RelationshipType.HAS_ASPECT         # "HAS_ASPECT"
RelationshipType.USED_IN_JOIN       # "USED_IN_JOIN"
RelationshipType.HAS_SOURCE_TABLE   # "HAS_SOURCE_TABLE"
RelationshipType.HAS_TARGET_TABLE   # "HAS_TARGET_TABLE"
```
Both NodeLabel and RelationshipType subclass str, so they can be used anywhere a plain string is expected and will format correctly in f-strings and Cypher queries.
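
A short sketch of building a Cypher query from enum members follows. So that it runs without Neocarta installed, it defines two small stand-in enums with the same member names and values as in the listing; these stand-ins are an assumption and are not the library's actual classes. The stand-in reads `.value` explicitly because the way a plain `(str, Enum)` mix-in renders inside an f-string has varied across Python versions; the real enums may not need this.

```python
from enum import Enum


class NodeLabel(str, Enum):
    """Stand-in with two members copied from the listing above."""
    TABLE = "Table"
    COLUMN = "Column"


class RelationshipType(str, Enum):
    """Stand-in with one member copied from the listing above."""
    HAS_COLUMN = "HAS_COLUMN"


def columns_of_table_query() -> str:
    """Build a Cypher query from enum values instead of string literals."""
    return (
        f"MATCH (t:{NodeLabel.TABLE.value} {{name: $table_name}})"
        f"-[:{RelationshipType.HAS_COLUMN.value}]->"
        f"(c:{NodeLabel.COLUMN.value})\n"
        "RETURN c.name AS column_name"
    )


if __name__ == "__main__":
    # A str-based enum member compares equal to its plain string value.
    print(NodeLabel.TABLE == "Table")
    print(columns_of_table_query())
```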

Neo4j Indexes

Neocarta creates two classes of indexes during the load phase:

Full-Text Indexes

One full-text index is created per searchable label, over the name and description properties. These power the full-text search tools in the MCP server and are available without embeddings.

| Index name | Label |
| --- | --- |
| database_full_text_index | Database |
| schema_full_text_index | Schema |
| table_full_text_index | Table |
| column_full_text_index | Column |
| business_term_full_text_index | BusinessTerm |
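
These indexes can also be queried directly with Neo4j's built-in `db.index.fulltext.queryNodes` procedure. The sketch below maps each label to its index name from the table and builds such a call. The helper name, the `$search_text` and `$limit` parameters, and the returned columns are assumptions; the queries the MCP server actually issues are not shown in this page.

```python
# Index names copied from the table above, keyed by node label.
FULL_TEXT_INDEXES = {
    "Database": "database_full_text_index",
    "Schema": "schema_full_text_index",
    "Table": "table_full_text_index",
    "Column": "column_full_text_index",
    "BusinessTerm": "business_term_full_text_index",
}


def full_text_search_query(label: str) -> str:
    """Return a Cypher full-text search for one of the listed labels."""
    if label not in FULL_TEXT_INDEXES:
        raise ValueError(f"no full-text index listed for label {label!r}")
    index_name = FULL_TEXT_INDEXES[label]
    return (
        f"CALL db.index.fulltext.queryNodes('{index_name}', $search_text)\n"
        "YIELD node, score\n"
        "RETURN node.name AS name, node.description AS description, score\n"
        "ORDER BY score DESC\n"
        "LIMIT $limit"
    )


if __name__ == "__main__":
    print(full_text_search_query("Table"))
```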

Vector Indexes

One vector index is created per label when embeddings are present. These power semantic and hybrid search tools.

| Index name | Label |
| --- | --- |
| database_vector_index | Database |
| schema_vector_index | Schema |
| table_vector_index | Table |
| column_vector_index | Column |
| business_term_vector_index | BusinessTerm |

The MCP server probes for these indexes at startup and registers only the tools whose indexes are present. Running neocarta bigquery schema --embeddings creates both classes of index in a single step.
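
The probing step can be approximated as a lookup against the set of index names that exist in the database (for example, names gathered from Cypher's `SHOW INDEXES`). The sketch below is a guess at the shape of that logic, not the MCP server's implementation: the function name and the "full_text" and "vector" mode names are hypothetical, while the index naming pattern comes from the two tables above.

```python
from typing import Dict, Iterable, List

# Name prefixes shared by the full-text and vector indexes listed above.
SEARCHABLE_PREFIXES = ["database", "schema", "table", "column", "business_term"]


def available_search_modes(existing_indexes: Iterable[str]) -> Dict[str, List[str]]:
    """Map each prefix to the search modes its existing indexes allow."""
    present = set(existing_indexes)
    modes: Dict[str, List[str]] = {}
    for prefix in SEARCHABLE_PREFIXES:
        found: List[str] = []
        if f"{prefix}_full_text_index" in present:
            found.append("full_text")
        if f"{prefix}_vector_index" in present:
            found.append("vector")
        if found:
            # Prefixes with no index at all are left out entirely.
            modes[prefix] = found
    return modes


if __name__ == "__main__":
    print(available_search_modes([
        "table_full_text_index",
        "table_vector_index",
        "column_full_text_index",
    ]))
```

A database loaded without embeddings would expose only the "full_text" mode for each label, which matches the statement that full-text search is available without embeddings.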
