Databricks Tags Connector for Governance Metadata

The Databricks Tags connector ingests governed-tag definitions from managed Databricks Unity Catalog into the Neo4j semantic graph. Governed tags are account-level controlled vocabularies: a tag key (e.g. sensitivity) with an optional description and a list of allowed values (e.g. public, internal, pii). The connector reads tag definitions via the Databricks SDK (WorkspaceClient.tag_policies), so neither a SQL warehouse nor a cluster is required.

Governance tags are modelled in their own right rather than as a business glossary, because a tag’s values are controls and classifications (sensitivity = {pii, non_pii}), not business terms. The connector populates a vendor-neutral governance-tag layer shared across platforms; Snowflake object tags and GCP resource Tags have the same two-layer shape of keys and allowed values.

This connector requires the optional databricks extra. Install it before use:

pip install "neocarta[databricks]"

This connector is distinct from the Unity Catalog connector, which speaks the vendor-neutral open Unity Catalog REST API. Governed tags are a managed-Databricks feature and live in this separate databricks package.

What It Ingests

Graph node / edge	Description
`GovernanceTagKey`	One node per governed tag key, carrying `name` and optional `description`
`GovernanceTagValue`	One node per allowed value, carrying `name`
`(:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue)`	Connects each key to its allowed values
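
To make the table concrete, here is a minimal sketch of how one tag definition maps onto those nodes and edges. `TagDefinition` and `to_graph_rows` are hypothetical names used only for illustration; they are not part of neocarta or the Databricks SDK, and the real connector may structure its rows differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TagDefinition:
    """Hypothetical stand-in for one governed-tag definition."""

    key: str
    description: Optional[str] = None
    allowed_values: tuple = ()


def to_graph_rows(tag: TagDefinition) -> dict:
    """Map one definition to the node and edge rows described in the table."""
    key_node = {"label": "GovernanceTagKey", "name": tag.key}
    if tag.description is not None:
        # The description is optional, so it is only written when present.
        key_node["description"] = tag.description
    value_nodes = [
        {"label": "GovernanceTagValue", "name": value}
        for value in tag.allowed_values
    ]
    # One HAS_VALUE_OPTION edge per allowed value.
    edges = [(tag.key, "HAS_VALUE_OPTION", value) for value in tag.allowed_values]
    return {"key": key_node, "values": value_nodes, "edges": edges}


rows = to_graph_rows(
    TagDefinition("sensitivity", "Data sensitivity level", ("public", "internal", "pii"))
)
print(rows["edges"])
```

A definition with no allowed values yields a key node and empty `values` and `edges` lists, which matches the value-less case listed under Limitations.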

Tag assignments (which columns/tables carry which tags) live in information_schema.*_tags and require a SQL warehouse to read. That is the instance layer of the governance model and a planned follow-up — no TAGGED_WITH edges are produced by this connector.

System Prefix Filtering

Platform-managed and partner-managed tags whose keys match one of the default system prefixes are excluded by default. This prevents auto-applied platform tags from swamping your user-authored governance vocabulary. Default excluded prefixes: system., class., ai., sap. Pass include_system_tags=True to ingest() to include everything, or supply a custom system_prefixes tuple to the constructor to widen or narrow the filter.
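
The filtering rule can be sketched in a few lines of plain Python. This is an illustration of the behaviour described above, not the connector's actual implementation: `filter_tag_keys` is a hypothetical helper, and case-sensitive prefix matching is an assumption.

```python
from typing import Iterable, List, Optional, Tuple

# Default prefixes as listed in this section.
DEFAULT_SYSTEM_PREFIXES = ("system.", "class.", "ai.", "sap.")


def filter_tag_keys(
    keys: Iterable[str],
    system_prefixes: Optional[Tuple[str, ...]] = None,
    include_system_tags: bool = False,
) -> List[str]:
    """Return the tag keys that survive system-prefix filtering."""
    # None means "use the defaults"; an empty tuple disables filtering.
    prefixes = DEFAULT_SYSTEM_PREFIXES if system_prefixes is None else tuple(system_prefixes)
    if include_system_tags or not prefixes:
        return list(keys)
    # str.startswith accepts a tuple and matches any of its entries.
    return [key for key in keys if not key.startswith(prefixes)]


keys = ["sensitivity", "system.owner", "class.pii", "ai.model", "sap.region", "team"]
print(filter_tag_keys(keys))
print(filter_tag_keys(keys, system_prefixes=("system.",)))
```

With the defaults only `sensitivity` and `team` remain; narrowing the tuple to `("system.",)` lets the `class.`, `ai.`, and `sap.` keys through.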

Import

from neocarta.connectors.databricks import DatabricksTagsConnector

Parameters

workspace_client

databricks.sdk.WorkspaceClient

required

An authenticated Databricks workspace client. The SDK natively honors DATABRICKS_HOST / DATABRICKS_TOKEN (and other unified-auth variables) when built with no arguments.

neo4j_driver

neo4j.Driver

required

Connected Neo4j driver instance.

database_name

str

default:"neo4j"

Target Neo4j database name.

source

str

Explicit namespace for governance-tag node IDs. When None, derived from the workspace’s metastore ID (falling back to the workspace host). Useful when the workspace has no readable metastore assignment.

system_prefixes

tuple[str, ...]

Tag-key prefixes treated as platform/system tags and excluded unless include_system_tags=True. When None, uses the default set ("system.", "class.", "ai.", "sap."). Pass an empty tuple () to disable prefix filtering entirely.
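
The fallback order for source (explicit value, then metastore ID, then workspace host) can be sketched as follows. Both functions are hypothetical, and the `namespace:key` ID layout in particular is an assumption made for illustration; the connector's real ID format is not documented here.

```python
from typing import Optional


def resolve_source(
    source: Optional[str],
    metastore_id: Optional[str],
    workspace_host: str,
) -> str:
    """Pick the namespace using the fallback order described for `source`."""
    if source is not None:
        return source  # an explicit namespace always wins
    if metastore_id:
        return metastore_id  # the usual case: a readable metastore assignment
    return workspace_host  # last resort when no metastore ID is available


def tag_key_node_id(namespace: str, tag_key: str) -> str:
    """Build a node ID. The 'namespace:key' layout is an assumed format."""
    return f"{namespace}:{tag_key}"


namespace = resolve_source(None, None, "https://dbc-xxxx.cloud.databricks.com")
print(tag_key_node_id(namespace, "sensitivity"))
```

Under this sketch, two workspaces attached to different metastores would produce different IDs for the same tag key unless both pass the same explicit source.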

`ingest()` Parameters

include_system_tags

bool

default:False

When True, ingest all governed tags including those matching the system_prefixes. This overrides prefix filtering completely.

Code Example

import os
from databricks.sdk import WorkspaceClient
from dotenv import load_dotenv
from neo4j import GraphDatabase
from neocarta.connectors.databricks import DatabricksTagsConnector

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_database = os.getenv("NEO4J_DATABASE", "neo4j")

# The SDK also reads DATABRICKS_HOST / DATABRICKS_TOKEN on its own when built with
# no arguments; they are passed explicitly here to make the dependency visible.
workspace_client = WorkspaceClient(
    host=os.getenv("DATABRICKS_HOST"),
    token=os.getenv("DATABRICKS_TOKEN"),
)

connector = DatabricksTagsConnector(
    workspace_client=workspace_client,
    neo4j_driver=neo4j_driver,
    database_name=neo4j_database,
)

try:
    # Pass include_system_tags=True to also pull system./class./ai./sap. tags.
    connector.ingest()
    print("Connector completed successfully!")
finally:
    # Release the Neo4j connection pool even if ingestion fails.
    neo4j_driver.close()

CLI

pip install "neocarta[cli]"

neocarta databricks tags \
  --workspace-host "https://dbc-xxxx.cloud.databricks.com" \
  --workspace-token "dapi..."

# Include system-managed tags:
neocarta databricks tags \
  --workspace-host "https://dbc-xxxx.cloud.databricks.com" \
  --workspace-token "dapi..." \
  --include-system-tags

Required Environment Variables

Variable	Example	Purpose
`NEO4J_URI`	`bolt://localhost:7687`	Neo4j connection URI
`NEO4J_USERNAME`	`neo4j`	Neo4j username
`NEO4J_PASSWORD`	`your-password`	Neo4j password
`NEO4J_DATABASE`	`neo4j`	Target Neo4j database
`DATABRICKS_HOST`	`https://dbc-xxxx.cloud.databricks.com`	Workspace URL
`DATABRICKS_TOKEN`	`dapi...`	Personal access token
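
A small pre-flight check can catch a missing variable before any connection is attempted. This is a sketch: `missing_env_vars` is a hypothetical helper, and it treats NEO4J_DATABASE as optional on the assumption that the script falls back to "neo4j", as the code example above does.

```python
import os
from typing import List, Mapping

# Variables from the table that have no fallback in the code example.
REQUIRED_VARS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "DATABRICKS_HOST",
    "DATABRICKS_TOKEN",
)


def missing_env_vars(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Call it after `load_dotenv()` so that values from a `.env` file are already present in the environment.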

Limitations

Definitions only — no TAGGED_WITH edges yet. Tag assignments (which objects carry which tags) are sourced from information_schema.*_tags and require a SQL warehouse. That is a planned follow-up.
No per-value descriptions. Databricks allowed values carry no description, so GovernanceTagValue.description is not written.
Value-less tags. A governed tag with no allowed values produces a GovernanceTagKey with no GovernanceTagValue options.
Account-scoped IDs. Governed tags are defined at the account level, but node IDs are namespaced by the metastore ID unless source is set. The same tag ingested from different workspaces may therefore produce separate nodes if their metastore IDs differ.
