The Databricks connector ingests governed-tag definitions from managed Databricks Unity Catalog into the Neo4j semantic graph. Governed tags are account-level controlled vocabularies: a tag key (e.g. sensitivity) with an optional description and a list of allowed values (e.g. public, internal, pii). The connector reads tag definitions via the Databricks SDK (WorkspaceClient.tag_policies), so no SQL warehouse and no cluster are required.
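As a rough illustration of reading definitions through the SDK, the sketch below walks the tag policies exposed by a workspace client. It assumes the SDK's `tag_policies.list_tag_policies()` method and `tag_key`, `description`, and `values[].name` attributes on each policy; the helper name `read_tag_definitions` is hypothetical and is not the connector's own code.

```python
def read_tag_definitions(client):
    """Collect (tag_key, description, allowed_value_names) tuples.

    `client` is expected to behave like databricks.sdk.WorkspaceClient.
    The method and attribute names used here are assumptions about the
    SDK surface, not taken from the connector.
    """
    definitions = []
    for policy in client.tag_policies.list_tag_policies():
        # A governed tag may have no allowed values at all.
        names = [value.name for value in (policy.values or [])]
        definitions.append((policy.tag_key, policy.description, names))
    return definitions
```

Against a real workspace this would be called as `read_tag_definitions(WorkspaceClient())`, with `WorkspaceClient` imported from `databricks.sdk` and credentials picked up from the environment.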
Governance tags are modelled in their own right rather than as a business glossary. A tag’s values are controls and classifications (sensitivity = {pii, non_pii}), not business terms. This connector populates a vendor-neutral governance-tag layer shared across platforms (Snowflake object tags and GCP resource Tags have the same two-layer shape).
This connector is distinct from the Unity Catalog connector, which speaks the vendor-neutral open Unity Catalog REST API. Governed tags are a managed-Databricks feature and live in this separate databricks package.
What It Ingests
| Graph node / edge | Description |
|---|---|
| GovernanceTagKey | One node per governed tag key, carrying name and optional description |
| GovernanceTagValue | One node per allowed value, carrying name |
| (:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue) | Connects each key to its allowed values |
Tag assignments (which objects carry which tags) are sourced from information_schema.*_tags and require a SQL warehouse to read. That is the instance layer of the governance model and a planned follow-up; no TAGGED_WITH edges are produced by this connector.
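The node and edge shapes in the table can be sketched as plain records. This is an illustration only: the function name `to_graph_records`, the dict layout, and the `<namespace>/<key>/<value>` ID scheme are hypothetical and are not taken from the connector.

```python
from typing import Optional


def to_graph_records(
    namespace: str,
    key: str,
    description: Optional[str],
    values: list[str],
) -> dict:
    """Build node and edge records for one governed tag definition."""
    # Hypothetical ID scheme, chosen only to keep the example readable.
    key_id = f"{namespace}/{key}"
    key_node = {"label": "GovernanceTagKey", "id": key_id, "name": key}
    if description is not None:
        key_node["description"] = description

    # One GovernanceTagValue node per allowed value.
    value_nodes = [
        {"label": "GovernanceTagValue", "id": f"{key_id}/{v}", "name": v}
        for v in values
    ]
    # One HAS_VALUE_OPTION edge from the key to each of its values.
    edges = [
        {"type": "HAS_VALUE_OPTION", "from": key_id, "to": node["id"]}
        for node in value_nodes
    ]
    return {"nodes": [key_node, *value_nodes], "edges": edges}


records = to_graph_records("example-namespace", "sensitivity", None, ["pii", "non_pii"])
```

A tag with an empty `values` list yields a single key node and no edges.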
System Prefix Filtering
Platform-managed and partner-managed tags whose keys match one of the default system prefixes are excluded by default. This prevents auto-applied platform tags from swamping your user-authored governance vocabulary. Default excluded prefixes: system., class., ai., sap.
Pass include_system_tags=True to ingest() to include everything, or supply a custom system_prefixes tuple to the constructor to widen or narrow the filter.
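The filtering rule described above can be sketched as a small standalone function. The helper name `filter_tag_keys` is hypothetical. Only the default prefixes and the `system_prefixes` / `include_system_tags` semantics come from this page, and the connector's actual implementation may differ.

```python
DEFAULT_SYSTEM_PREFIXES = ("system.", "class.", "ai.", "sap.")


def filter_tag_keys(keys, system_prefixes=None, include_system_tags=False):
    """Drop tag keys that start with a system prefix.

    None means "use the defaults"; an empty tuple disables filtering.
    """
    prefixes = (
        DEFAULT_SYSTEM_PREFIXES if system_prefixes is None else tuple(system_prefixes)
    )
    if include_system_tags or not prefixes:
        return list(keys)
    # str.startswith accepts a tuple and matches any of its entries.
    return [key for key in keys if not key.startswith(prefixes)]


keys = ["sensitivity", "system.owner", "ai.model", "team"]
user_keys = filter_tag_keys(keys)  # ["sensitivity", "team"]
```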
Import
Parameters
An authenticated Databricks workspace client. The SDK natively honors DATABRICKS_HOST / DATABRICKS_TOKEN (and other unified-auth variables) when built with no arguments.
Connected Neo4j driver instance.
Target Neo4j database name.
Explicit namespace for governance-tag node IDs. When None, derived from the workspace’s metastore ID (falling back to the workspace host). Useful when the workspace has no readable metastore assignment.
Tag-key prefixes treated as platform/system tags and excluded unless include_system_tags=True. When None, uses the default set ("system.", "class.", "ai.", "sap."). Pass an empty tuple () to disable prefix filtering entirely.
ingest() Parameters
When True, ingest all governed tags, including those matching system_prefixes. This overrides prefix filtering completely.
Code Example
CLI
Required Environment Variables
| Variable | Example | Purpose |
|---|---|---|
| NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USERNAME | neo4j | Neo4j username |
| NEO4J_PASSWORD | your-password | Neo4j password |
| NEO4J_DATABASE | neo4j | Target Neo4j database |
| DATABRICKS_HOST | https://dbc-xxxx.cloud.databricks.com | Workspace URL |
| DATABRICKS_TOKEN | dapi... | Personal access token |
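Before running the CLI it can help to confirm that every variable in the table is set. The check below is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical and not part of the connector.

```python
import os

REQUIRED_VARS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "NEO4J_DATABASE",
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise in isolation.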
Limitations
- Definitions only: no TAGGED_WITH edges yet. Tag assignments (which objects carry which tags) are sourced from information_schema.*_tags and require a SQL warehouse. That is a planned follow-up.
- No per-value descriptions. Databricks allowed values carry no description, so GovernanceTagValue.description is not written.
- Value-less tags. A governed tag with no allowed values produces a GovernanceTagKey with no GovernanceTagValue options.
- Account-scoped IDs. Node IDs are namespaced by the metastore ID. The same tag ingested from different workspaces may produce separate nodes if their metastore IDs differ.
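The namespace fallback described under Parameters (explicit namespace, then metastore ID, then workspace host) helps explain the last limitation. The sketch below is hypothetical: the function name `resolve_namespace` and its arguments are illustrative, and how the connector actually obtains the metastore ID is not shown here.

```python
from typing import Optional


def resolve_namespace(
    explicit: Optional[str],
    metastore_id: Optional[str],
    workspace_host: str,
) -> str:
    """Pick the namespace used to prefix governance-tag node IDs."""
    if explicit:
        return explicit
    if metastore_id:
        return metastore_id
    # Last resort when no metastore assignment is readable.
    return workspace_host


# Two workspaces on different metastores resolve to different namespaces,
# so the same tag key would end up as two separate nodes.
ns_a = resolve_namespace(None, "metastore-a", "https://workspace-a.example")
ns_b = resolve_namespace(None, "metastore-b", "https://workspace-b.example")
```

Passing the same explicit namespace for both workspaces is one way to make them share nodes, assuming their tag vocabularies really are the same.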