Neocarta Graph Data Model Reference

Every Neocarta connector transforms its source metadata into a shared graph schema stored in Neo4j. This shared schema is what makes connectors interoperable: a glossary loaded from Dataplex attaches to tables ingested from BigQuery because both connectors use the same TAGGED_WITH relationship and the same Table node label. Every connector must conform to this schema to be compatible with the MCP server and the Neocarta CLI tooling.

The data model is organized into layers of concern. The structural RDBMS layer forms the foundation; the glossary, governance, query, and OSI layers extend it.

Core Schema (RDBMS)

The structural core models the hierarchy of a relational database, from the database (annotated with its platform and service) down to individual column values.

Node Properties

Database

Property	Type	Notes
`id`	`STRING`	Unique key; generated via `generate_id`
`name`	`STRING`	Database or project name
`description`	`STRING | null`	Human-readable description; backed by full-text and vector indexes when populated
`embedding`	`VECTOR | null`	Embedding of `description`; present only after running an embeddings connector
`platform`	`STRING | null`	Platform name, uppercased (e.g., `GCP`)
`service`	`STRING | null`	Service name, uppercased (e.g., `BIGQUERY`)

Schema

Property	Type	Notes
`id`	`STRING`	Unique key
`name`	`STRING`	Schema or dataset name
`description`	`STRING | null`	Human-readable description
`embedding`	`VECTOR | null`	Embedding of `description`

Table

Property	Type	Notes
`id`	`STRING`	Unique key
`name`	`STRING`	Table name
`description`	`STRING | null`	Human-readable description
`embedding`	`VECTOR | null`	Embedding of `description`

Column

Property	Type	Notes
`id`	`STRING`	Unique key
`name`	`STRING`	Column name
`description`	`STRING | null`	Human-readable description
`embedding`	`VECTOR | null`	Embedding of `description`
`type`	`STRING | null`	Data type; may be absent for query-log-derived columns
`nullable`	`BOOLEAN`	Defaults to `true`
`is_primary_key`	`BOOLEAN | null`	`null` when source exposes no key metadata
`is_foreign_key`	`BOOLEAN | null`	`null` when source exposes no key metadata

Value

Property	Type	Notes
`id`	`STRING`	Unique key
`value`	`STRING`	A sample distinct value from the column

Core Relationships

Pattern	Meaning
`(:Database)-[:HAS_SCHEMA]->(:Schema)`	Database-to-schema hierarchy
`(:Schema)-[:HAS_TABLE]->(:Table)`	Schema-to-table hierarchy
`(:Table)-[:HAS_COLUMN]->(:Column)`	Table-to-column membership
`(:Column)-[:HAS_VALUE]->(:Value)`	Sample values for column disambiguation
`(:Column)-[:REFERENCES]->(:Column)`	Foreign key — the target column is the referenced primary/unique key. Carries an optional `criteria` property holding the join condition.
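
As an illustration of how these relationships chain together, the following sketch builds a Cypher query that walks the hierarchy from a database down to the columns of one table and picks up any foreign-key target along the way. The function name and the parameter names (`$database`, `$schema`, `$table`) are hypothetical and not part of Neocarta; only the labels, relationship types, and properties come from the tables above.

```python
from typing import Dict, Tuple


def columns_of_table_query(database: str, schema: str, table: str) -> Tuple[str, Dict[str, str]]:
    """Return (cypher, params) for listing a table's columns.

    Hypothetical helper: it only assembles text and does not talk to Neo4j.
    """
    cypher = (
        "MATCH (d:Database {name: $database})"
        "-[:HAS_SCHEMA]->(s:Schema {name: $schema})"
        "-[:HAS_TABLE]->(t:Table {name: $table})"
        "-[:HAS_COLUMN]->(c:Column)\n"
        # REFERENCES is optional: most columns are not foreign keys.
        "OPTIONAL MATCH (c)-[r:REFERENCES]->(target:Column)\n"
        "RETURN c.name AS column, c.type AS type, "
        "target.id AS references, r.criteria AS criteria"
    )
    params = {"database": database, "schema": schema, "table": table}
    return cypher, params


cypher, params = columns_of_table_query("my-project", "sales", "orders")
```

The resulting pair can be passed to any Neo4j client that accepts a query string with parameters.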

Glossary / Business Terms

The glossary layer links business vocabulary to the structural nodes it describes. It is populated by the Dataplex glossary connector, the OSI connector (via synonyms), and the CSV connector.

Glossary Node Properties

Node	Key Properties
`Glossary`	`id`, `name`, `description`, `resource_path` (e.g., Dataplex resource name)
`Category`	`id`, `name`, `description`, `resource_path`
`BusinessTerm`	`id`, `name`, `description`, `embedding`, `resource_path`

BusinessTerm nodes are MERGEd on name, so terms with the same name from different sources (Dataplex catalog, OSI synonyms, CSV) resolve to a single node instead of producing duplicates.
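
The effect of merging on name can be sketched in memory. The example below is not the connector code: it uses plain dictionaries, and the rule that the first non-null property value wins is an assumption made for illustration. Matching is by exact string equality, as in a Cypher MERGE on a property.

```python
from typing import Dict, Iterable, Optional


def merge_business_terms(records: Iterable[Dict[str, Optional[str]]]) -> Dict[str, Dict[str, Optional[str]]]:
    """Collapse term records that share a name into one node per name."""
    merged: Dict[str, Dict[str, Optional[str]]] = {}
    for record in records:
        name = record["name"]
        node = merged.setdefault(name, {"name": name})
        for key, value in record.items():
            # Assumption: keep the first non-null value seen for each property.
            if value is not None and node.get(key) is None:
                node[key] = value
    return merged


terms = merge_business_terms([
    {"name": "Churn", "description": None, "resource_path": "glossaries/g1/terms/churn"},
    {"name": "Churn", "description": "Customers lost in a period", "resource_path": None},
    {"name": "Revenue", "description": "Recognized income", "resource_path": None},
])
```

Here the two `Churn` records, which stand in for a Dataplex term and a CSV row, end up as one node that carries both the resource path and the description.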

Query History

Query history records real SQL queries and the schema objects they touched. It is populated by the BigQuery logs connector, the query-log file connector, and the OSI connector (for SQL-sourced datasets).

Node	Key Properties
`Query`	`id`, `content` (SQL text), `name` (logical name, if available), `description`, `embedding`
`CTE`	`id`, `name` (alias), `definition` (inner SELECT body), `query_id`

Relationship	Meaning
`(:Query)-[:USES_TABLE]->(:Table)`	Table referenced in the query
`(:Query)-[:USES_COLUMN]->(:Column)`	Column referenced in the query
`(:Query)-[:DEFINES]->(:CTE)`	CTE defined inline in the query

CTE nodes represent query-scoped virtual tables — they are distinct from catalog Table nodes so that agents can distinguish real tables from inline sub-queries.
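
To show why the separation matters, here is a small sketch that splits the names a query reads from into catalog tables and query-scoped CTEs. It assumes the referenced names and the CTE aliases have already been extracted by a SQL parser; the function name is hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def split_references(referenced: Iterable[str], cte_names: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (catalog_tables, ctes), preserving the order of appearance."""
    tables: List[str] = []
    ctes: List[str] = []
    for name in referenced:
        # A name that matches a CTE alias is query-scoped, not a catalog table.
        (ctes if name in cte_names else tables).append(name)
    return tables, ctes


tables, ctes = split_references(["recent_orders", "customers"], {"recent_orders"})
```

In graph terms, the first list corresponds to USES_TABLE relationships and the second to CTE nodes reached through DEFINES.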

Governance Tags

The governance layer attaches controlled-vocabulary labels to schema objects for policy, classification, and ownership. It is vendor-neutral and models patterns from Databricks Unity Catalog governed tags, Snowflake object tags, and GCP resource Tags. The model has two layers:

Definition layer — GovernanceTagKey and GovernanceTagValue describe which tags exist, what they mean, and (for governed platforms) which values are allowed. This surface is full-text and vector searchable.
Instance layer — GovernanceTag represents a single applied (key, value) assignment on a schema object. When the value is governed, it links to its GovernanceTagValue definition via HAS_DEFINITION; a missing link means the value is free-form.

Node	Key Properties
`GovernanceTagKey`	`id`, `name` (e.g., `sensitivity`), `description`, `embedding`
`GovernanceTagValue`	`id`, `name` (e.g., `pii`), `description` (optional)
`GovernanceTag`	`id`, `key`, `value` (denormalized for single-hop lookups)

Relationship	Meaning
`(:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue)`	Allowed values for a governed tag key
`(:Column)-[:TAGGED_WITH]->(:GovernanceTag)`	Column carries this tag assignment
`(:Table)-[:TAGGED_WITH]->(:GovernanceTag)`	Table carries this tag assignment
`(:Schema)-[:TAGGED_WITH]->(:GovernanceTag)`	Schema carries this tag assignment
`(:GovernanceTag)-[:HAS_DEFINITION]->(:GovernanceTagValue)`	Applied value matches a governed definition (optional)
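
The difference between a governed and a free-form assignment can be sketched with plain Python objects. The class and function below are hypothetical stand-ins for the graph nodes; the `definition` field plays the role of the optional HAS_DEFINITION link.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class AppliedTag:
    """Stand-in for a GovernanceTag node (instance layer)."""
    key: str
    value: str
    definition: Optional[str]  # name of the linked GovernanceTagValue, if any


def apply_tag(key: str, value: str, allowed: Dict[str, Set[str]]) -> AppliedTag:
    """Create an applied tag and link it to a definition when one exists.

    `allowed` maps a tag key to its HAS_VALUE_OPTION values (definition layer).
    """
    governed = value in allowed.get(key, set())
    return AppliedTag(key=key, value=value, definition=value if governed else None)


allowed_values = {"sensitivity": {"pii", "public"}}
governed_tag = apply_tag("sensitivity", "pii", allowed_values)
free_form_tag = apply_tag("owner", "team-a", allowed_values)
```

`governed_tag` carries a definition, while `free_form_tag` does not, which mirrors the rule that a missing HAS_DEFINITION link means the value is free-form.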

Both the glossary and governance layers use the TAGGED_WITH relationship type, but they target different node labels: BusinessTerm for the glossary layer and GovernanceTag for the governance layer.

OSI Semantic Model

The OSI layer models the Open Semantic Interchange format — a YAML-based interchange for semantic models. It extends the structural core with domain containers, metric definitions, join specifications, and AI-context aspects. Key nodes:

Node	Description
`OsiSemanticModel` (subtype of `Domain`)	Top-level container for a full OSI spec instance. Stored as `(:Domain:OsiSemanticModel)`.
`OsiTable` (subtype of `Table`)	Dataset with OSI-specific key metadata: `source`, `primary_key`, `unique_keys`. Stored as `(:Table:OsiTable)`.
`OsiColumn` (subtype of `Column`)	Column with OSI display metadata: `label`, `is_time_dimension`. Stored as `(:Column:OsiColumn)`.
`Metric`	A measurable quantity with `name`, `description`, and `embedding`.
`Expression`	A dialect-specific computation: `dialect` (e.g., `bigquery`) and `expression` text.
`Join`	A join between two tables with ordered `from_columns` and `to_columns` lists for composite-key support.
`OsiAiContext` (subtype of `Aspect`)	Agent-facing context stored as a JSON-encoded `data` string (instructions, synonyms, examples).
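
Because `from_columns` and `to_columns` are ordered, a composite-key join condition can be rebuilt by pairing the two lists position by position. The following is a minimal sketch of that idea; the function name and the SQL rendering are assumptions, not Neocarta output.

```python
from typing import Sequence


def join_condition(from_table: str, from_columns: Sequence[str],
                   to_table: str, to_columns: Sequence[str]) -> str:
    """Render an equi-join condition from two position-aligned column lists."""
    if not from_columns or len(from_columns) != len(to_columns):
        raise ValueError("from_columns and to_columns must be non-empty and equal in length")
    pairs = zip(from_columns, to_columns)
    return " AND ".join(f"{from_table}.{f} = {to_table}.{t}" for f, t in pairs)


condition = join_condition("orders", ["customer_id", "region"], "customers", ["id", "region"])
```

Reordering either list would pair the wrong columns, which is why the order of the lists is part of the model.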

Key relationships beyond the structural core:

Relationship	Meaning
`(:Domain)-[:HAS_TABLE]->(:OsiTable)`	Semantic model owns a dataset directly
`(:Domain)-[:HAS_METRIC]->(:Metric)`	Semantic model defines a metric
`(:Metric)-[:HAS_EXPRESSION]->(:Expression)`	Metric dialect-specific expression
`(:Join)-[:HAS_SOURCE_TABLE]->(:Table)`	Foreign-key (from) side of the join
`(:Join)-[:HAS_TARGET_TABLE]->(:Table)`	Primary/unique-key (to) side of the join
`(:Column)-[:USED_IN_JOIN]->(:Join)`	Column participates in a join
`(:*)-[:HAS_ASPECT]->(:Aspect)`	Any entity carries an AI context or custom extension aspect
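
Since an `OsiAiContext` aspect stores its payload as a JSON-encoded `data` string, a consumer has to decode it before use. The sketch below assumes the payload is a JSON object with `instructions`, `synonyms`, and `examples` keys; that exact shape is an assumption based on the description above and is not confirmed by the schema.

```python
import json
from typing import Any, Dict


def decode_ai_context(data: str) -> Dict[str, Any]:
    """Decode an aspect's `data` string into a dictionary with default values."""
    parsed = json.loads(data)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object in the aspect data")
    return {
        "instructions": parsed.get("instructions"),
        "synonyms": parsed.get("synonyms", []),
        "examples": parsed.get("examples", []),
    }


context = decode_ai_context('{"instructions": "Prefer net revenue.", "synonyms": ["sales"]}')
```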

NodeLabel and RelationshipType Enums

All node labels and relationship types are defined as canonical string enums in neocarta.enums. Using the enums ensures your code stays in sync with the schema and works with connector filtering.

from neocarta import NodeLabel, RelationshipType

# Node labels
NodeLabel.DATABASE        # "Database"
NodeLabel.SCHEMA          # "Schema"
NodeLabel.TABLE           # "Table"
NodeLabel.COLUMN          # "Column"
NodeLabel.VALUE           # "Value"
NodeLabel.GLOSSARY        # "Glossary"
NodeLabel.CATEGORY        # "Category"
NodeLabel.BUSINESS_TERM   # "BusinessTerm"
NodeLabel.GOVERNANCE_TAG_KEY    # "GovernanceTagKey"
NodeLabel.GOVERNANCE_TAG_VALUE  # "GovernanceTagValue"
NodeLabel.GOVERNANCE_TAG        # "GovernanceTag"
NodeLabel.QUERY           # "Query"
NodeLabel.CTE             # "CTE"
NodeLabel.METRIC          # "Metric"
NodeLabel.JOIN            # "Join"
NodeLabel.EXPRESSION      # "Expression"
NodeLabel.OSI_SEMANTIC_MODEL    # "OsiSemanticModel"
NodeLabel.OSI_TABLE       # "OsiTable"
NodeLabel.OSI_COLUMN      # "OsiColumn"

# Relationship types
RelationshipType.HAS_SCHEMA         # "HAS_SCHEMA"
RelationshipType.HAS_TABLE          # "HAS_TABLE"
RelationshipType.HAS_COLUMN         # "HAS_COLUMN"
RelationshipType.HAS_VALUE          # "HAS_VALUE"
RelationshipType.REFERENCES         # "REFERENCES"
RelationshipType.TAGGED_WITH        # "TAGGED_WITH"
RelationshipType.HAS_CATEGORY       # "HAS_CATEGORY"
RelationshipType.HAS_BUSINESS_TERM  # "HAS_BUSINESS_TERM"
RelationshipType.HAS_VALUE_OPTION   # "HAS_VALUE_OPTION"
RelationshipType.HAS_DEFINITION     # "HAS_DEFINITION"
RelationshipType.USES_TABLE         # "USES_TABLE"
RelationshipType.USES_COLUMN        # "USES_COLUMN"
RelationshipType.DEFINES            # "DEFINES"
RelationshipType.HAS_METRIC         # "HAS_METRIC"
RelationshipType.HAS_EXPRESSION     # "HAS_EXPRESSION"
RelationshipType.HAS_ASPECT         # "HAS_ASPECT"
RelationshipType.USED_IN_JOIN       # "USED_IN_JOIN"
RelationshipType.HAS_SOURCE_TABLE   # "HAS_SOURCE_TABLE"
RelationshipType.HAS_TARGET_TABLE   # "HAS_TARGET_TABLE"

Both NodeLabel and RelationshipType subclass str, so they can be used anywhere a plain string is expected and will format correctly in f-strings and Cypher queries.
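
The behavior of a str-based enum can be demonstrated with stand-in classes. These are not imported from Neocarta and list only a few members; the explicit `__str__` is an assumption added here because the f-string rendering of a plain `(str, Enum)` member has differed between Python versions.

```python
from enum import Enum


class NodeLabel(str, Enum):
    """Stand-in for the real enum, reduced to two members."""
    TABLE = "Table"
    COLUMN = "Column"

    def __str__(self) -> str:
        # Render as the raw label so f-strings produce valid Cypher.
        return self.value


class RelationshipType(str, Enum):
    """Stand-in for the real enum, reduced to one member."""
    HAS_COLUMN = "HAS_COLUMN"

    def __str__(self) -> str:
        return self.value


query = (
    f"MATCH (t:{NodeLabel.TABLE})-[:{RelationshipType.HAS_COLUMN}]->"
    f"(c:{NodeLabel.COLUMN}) RETURN c.name"
)
```

Members also compare equal to plain strings, so `NodeLabel.TABLE == "Table"` holds and a label read back from the database can be turned into a member with `NodeLabel("Table")`.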

Neo4j Indexes

Neocarta creates two classes of indexes during the load phase:

Full-Text Indexes

One full-text index is created per searchable label, over the name and description properties. These power the full-text search tools in the MCP server and are available without embeddings.

Index name	Label
`database_full_text_index`	`Database`
`schema_full_text_index`	`Schema`
`table_full_text_index`	`Table`
`column_full_text_index`	`Column`
`business_term_full_text_index`	`BusinessTerm`
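
The index names follow a regular pattern: the label in snake case followed by the index kind. The sketch below derives a name from a label and assembles a full-text lookup using Neo4j's `db.index.fulltext.queryNodes` procedure. Both helper functions are hypothetical, and the returned columns are chosen for illustration.

```python
import re
from typing import Any, Dict, Tuple


def index_name(label: str, kind: str = "full_text") -> str:
    """Turn a label such as 'BusinessTerm' into 'business_term_full_text_index'."""
    snake = re.sub(r"(?<!^)(?=[A-Z])", "_", label).lower()
    return f"{snake}_{kind}_index"


def full_text_search(label: str, text: str, limit: int = 10) -> Tuple[str, Dict[str, Any]]:
    """Return (cypher, params) for a full-text search over one label."""
    cypher = (
        "CALL db.index.fulltext.queryNodes($index, $text) YIELD node, score\n"
        "RETURN node.id AS id, node.name AS name, score\n"
        "LIMIT $limit"
    )
    return cypher, {"index": index_name(label), "text": text, "limit": limit}


cypher, params = full_text_search("BusinessTerm", "churn")
```

The same naming helper with `kind="vector"` yields names of the form `table_vector_index`.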

Vector Indexes

One vector index is created per label when embeddings are present. These power semantic and hybrid search tools.

Index name	Label
`database_vector_index`	`Database`
`schema_vector_index`	`Schema`
`table_vector_index`	`Table`
`column_vector_index`	`Column`
`business_term_vector_index`	`BusinessTerm`

The MCP server probes for these indexes at startup and registers only the tools whose indexes are present. Running neocarta bigquery schema --embeddings creates both classes of index in a single step.
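
The probe-then-register behavior can be sketched as a lookup from tools to the indexes they need. The tool names below are hypothetical, as is the assumption that a hybrid tool needs both index classes; the index names come from the tables above. In practice the set of existing index names could be obtained with a Cypher command such as `SHOW INDEXES YIELD name`.

```python
from typing import Dict, Iterable, List

# Hypothetical tool names mapped to the indexes each one depends on.
TOOL_REQUIREMENTS: Dict[str, List[str]] = {
    "full_text_search_tables": ["table_full_text_index"],
    "semantic_search_tables": ["table_vector_index"],
    "hybrid_search_tables": ["table_full_text_index", "table_vector_index"],
}


def tools_to_register(existing_indexes: Iterable[str]) -> List[str]:
    """Return the tools whose required indexes are all present."""
    present = set(existing_indexes)
    return [
        tool
        for tool, needed in TOOL_REQUIREMENTS.items()
        if all(name in present for name in needed)
    ]


without_embeddings = tools_to_register(["table_full_text_index"])
with_embeddings = tools_to_register(["table_full_text_index", "table_vector_index"])
```

Under these assumptions, a graph loaded without embeddings exposes only the full-text tool, and the semantic and hybrid tools appear once the vector index exists.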
