The Databricks connector ingests governed-tag definitions from managed Databricks Unity Catalog into the Neo4j semantic graph. Governed tags are account-level controlled vocabularies: a tag key (e.g. sensitivity) with an optional description and a list of allowed values (e.g. public, internal, pii). The connector reads tag definitions through the Databricks SDK (WorkspaceClient.tag_policies), so neither a SQL warehouse nor a cluster is required.
Governance tags are modelled in their own right rather than as a business glossary, because a tag's values are controls and classifications (sensitivity = {pii, non_pii}), not business terms. The connector populates a vendor-neutral governance-tag layer shared across platforms; Snowflake object tags and GCP resource tags have the same two-layer shape.
This connector requires the optional databricks extra. Install it before use:
pip install "neocarta[databricks]"
This connector is distinct from the Unity Catalog connector, which speaks the vendor-neutral open Unity Catalog REST API. Governed tags are a managed-Databricks feature and live in this separate databricks package.

What It Ingests

| Graph node / edge | Description |
| --- | --- |
| GovernanceTagKey | One node per governed tag key, carrying name and optional description |
| GovernanceTagValue | One node per allowed value, carrying name |
| (:GovernanceTagKey)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue) | Connects each key to its allowed values |
Tag assignments (which columns/tables carry which tags) live in information_schema.*_tags and require a SQL warehouse to read. That is the instance layer of the governance model and a planned follow-up — no TAGGED_WITH edges are produced by this connector.
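Once definitions are ingested, the two-layer shape can be read back with one Cypher query. The following is a minimal sketch, assuming the labels, relationship type, and name property listed in the table above. The helper names fetch_tag_vocabulary and group_vocabulary are illustrative and not part of neocarta, and Driver.execute_query assumes a recent 5.x release of the neo4j Python driver.

```python
from collections import defaultdict

# Cypher built from the documented labels and relationship type.
# OPTIONAL MATCH keeps keys that have no allowed values.
VOCAB_QUERY = """
MATCH (k:GovernanceTagKey)
OPTIONAL MATCH (k)-[:HAS_VALUE_OPTION]->(v:GovernanceTagValue)
RETURN k.name AS key, v.name AS value
ORDER BY key, value
"""


def group_vocabulary(rows):
    """Group (key, value) rows into {key: [values, ...]}; value may be None."""
    vocab = defaultdict(list)
    for row in rows:
        values = vocab[row["key"]]  # touching the key registers value-less tags
        if row["value"] is not None:
            values.append(row["value"])
    return dict(vocab)


def fetch_tag_vocabulary(driver, database="neo4j"):
    """Run the query with a connected neo4j.Driver and group the records."""
    result = driver.execute_query(VOCAB_QUERY, database_=database)
    return group_vocabulary(result.records)
```

Calling fetch_tag_vocabulary(neo4j_driver) returns a plain dictionary keyed by tag name. Keys that share a name across different ID namespaces are merged in this sketch.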

System Prefix Filtering

Platform-managed and partner-managed tags whose keys start with one of the system prefixes are excluded by default. This prevents auto-applied platform tags from swamping your user-authored governance vocabulary. Default excluded prefixes: "system.", "class.", "ai.", and "sap." (the trailing dot is part of each prefix). Pass include_system_tags=True to ingest() to include everything, or supply a custom system_prefixes tuple to the constructor to widen or narrow the filter.
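The filter rules above can be sketched in a few lines of plain Python. This is an illustration of the documented behaviour, not the connector's actual implementation: the function names is_system_tag and select_tag_keys are hypothetical, and the case-sensitive match is an assumption.

```python
# Default prefixes as listed in this section.
DEFAULT_SYSTEM_PREFIXES = ("system.", "class.", "ai.", "sap.")


def is_system_tag(tag_key, system_prefixes=DEFAULT_SYSTEM_PREFIXES):
    """Return True when the key starts with any configured prefix."""
    # str.startswith accepts a tuple; an empty tuple never matches.
    return tag_key.startswith(tuple(system_prefixes))


def select_tag_keys(tag_keys, system_prefixes=None, include_system_tags=False):
    """Return the keys that would be kept under the documented rules."""
    if system_prefixes is None:
        system_prefixes = DEFAULT_SYSTEM_PREFIXES
    if include_system_tags:
        # Overrides prefix filtering completely.
        return list(tag_keys)
    return [key for key in tag_keys if not is_system_tag(key, system_prefixes)]
```

Because the trailing dot is part of each prefix, a user-authored key such as classification is kept even though it begins with the letters of "class.".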

Import

from neocarta.connectors.databricks import DatabricksTagsConnector

Parameters

workspace_client
databricks.sdk.WorkspaceClient
required
An authenticated Databricks workspace client. The SDK natively honors DATABRICKS_HOST / DATABRICKS_TOKEN (and other unified-auth variables) when built with no arguments.
neo4j_driver
neo4j.Driver
required
Connected Neo4j driver instance.
database_name
str
default:"neo4j"
Target Neo4j database name.
source
str | None
default:None
Explicit namespace for governance-tag node IDs. When None, derived from the workspace’s metastore ID (falling back to the workspace host). Useful when the workspace has no readable metastore assignment.
system_prefixes
tuple[str, ...] | None
default:None
Tag-key prefixes treated as platform/system tags and excluded unless include_system_tags=True. When None, uses the default set ("system.", "class.", "ai.", "sap."). Pass an empty tuple () to disable prefix filtering entirely.
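The two optional constructor parameters can be combined as follows. This is a sketch under stated assumptions: connector_options is a hypothetical helper, and the values "prod-metastore" and "partner." are placeholders rather than names defined by neocarta or Databricks.

```python
# Default prefixes as documented for system_prefixes.
DEFAULT_SYSTEM_PREFIXES = ("system.", "class.", "ai.", "sap.")


def connector_options(source=None, extra_prefixes=None, disable_filter=False):
    """Build keyword arguments for the optional constructor parameters."""
    options = {}
    if source is not None:
        # Pins the node-ID namespace instead of deriving it from the metastore.
        options["source"] = source
    if disable_filter:
        # An empty tuple disables prefix filtering entirely.
        options["system_prefixes"] = ()
    elif extra_prefixes:
        # Widen the documented defaults with additional prefixes.
        options["system_prefixes"] = DEFAULT_SYSTEM_PREFIXES + tuple(extra_prefixes)
    return options


# Usage (requires neocarta[databricks] and already-built clients):
# DatabricksTagsConnector(
#     workspace_client=workspace_client,
#     neo4j_driver=neo4j_driver,
#     **connector_options(source="prod-metastore", extra_prefixes=("partner.",)),
# )
```

Omitting a key from the returned dictionary leaves that parameter at None, so the connector falls back to its documented defaults.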

ingest() Parameters

include_system_tags
bool
default:False
When True, ingest all governed tags including those matching the system_prefixes. This overrides prefix filtering completely.

Code Example

import os
from databricks.sdk import WorkspaceClient
from dotenv import load_dotenv
from neo4j import GraphDatabase
from neocarta.connectors.databricks import DatabricksTagsConnector

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_database = os.getenv("NEO4J_DATABASE", "neo4j")

# Credentials are passed explicitly here; WorkspaceClient() with no arguments
# would read the same DATABRICKS_HOST / DATABRICKS_TOKEN variables on its own.
workspace_client = WorkspaceClient(
    host=os.getenv("DATABRICKS_HOST"),
    token=os.getenv("DATABRICKS_TOKEN"),
)

try:
    DatabricksTagsConnector(
        workspace_client=workspace_client,
        neo4j_driver=neo4j_driver,
        database_name=neo4j_database,
    ).ingest()  # pass include_system_tags=True to also pull system./class./ai./sap. tags
    print("Connector completed successfully!")
finally:
    # Release the Neo4j connection pool even if ingestion fails.
    neo4j_driver.close()

CLI

pip install "neocarta[cli]"

neocarta databricks tags \
  --workspace-host "https://dbc-xxxx.cloud.databricks.com" \
  --workspace-token "dapi..."

# Include system-managed tags:
neocarta databricks tags \
  --workspace-host "https://dbc-xxxx.cloud.databricks.com" \
  --workspace-token "dapi..." \
  --include-system-tags

Required Environment Variables

| Variable | Example | Purpose |
| --- | --- | --- |
| NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USERNAME | neo4j | Neo4j username |
| NEO4J_PASSWORD | your-password | Neo4j password |
| NEO4J_DATABASE | neo4j | Target Neo4j database |
| DATABRICKS_HOST | https://dbc-xxxx.cloud.databricks.com | Workspace URL |
| DATABRICKS_TOKEN | dapi... | Personal access token |

Limitations

  • Definitions only — no TAGGED_WITH edges yet. Tag assignments (which objects carry which tags) are sourced from information_schema.*_tags and require a SQL warehouse. That is a planned follow-up.
  • No per-value descriptions. Databricks allowed values carry no description, so GovernanceTagValue.description is not written.
  • Value-less tags. A governed tag with no allowed values produces a GovernanceTagKey with no GovernanceTagValue options.
  • Account-scoped IDs. Node IDs are namespaced by the metastore ID, or by the workspace host or an explicit source when no metastore ID is used. The same tag ingested from different workspaces may therefore produce separate nodes if their namespaces differ.
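Two of these limitations can be checked directly in the graph after ingestion. The sketch below assumes the documented labels, relationship type, and name property. The function names are illustrative, and Driver.execute_query assumes a recent 5.x release of the neo4j Python driver.

```python
# Keys that were ingested without any allowed values.
VALUELESS_KEYS_QUERY = """
MATCH (k:GovernanceTagKey)
WHERE NOT (k)-[:HAS_VALUE_OPTION]->(:GovernanceTagValue)
RETURN k.name AS name
ORDER BY name
"""

# Key names that appear on more than one node, e.g. after ingesting the
# same tag under different ID namespaces.
DUPLICATE_KEY_NAMES_QUERY = """
MATCH (k:GovernanceTagKey)
WITH k.name AS name, count(k) AS copies
WHERE copies > 1
RETURN name, copies
ORDER BY name
"""


def valueless_keys(driver, database="neo4j"):
    """List names of tag keys that have no HAS_VALUE_OPTION edges."""
    result = driver.execute_query(VALUELESS_KEYS_QUERY, database_=database)
    return [record["name"] for record in result.records]


def duplicate_key_names(driver, database="neo4j"):
    """Map each repeated key name to the number of nodes carrying it."""
    result = driver.execute_query(DUPLICATE_KEY_NAMES_QUERY, database_=database)
    return {record["name"]: record["copies"] for record in result.records}
```

A non-empty result from duplicate_key_names suggests the same vocabulary was ingested under more than one namespace.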
