Dataplex Schema and Glossary Connectors

GCP Dataplex Universal Catalog is Google’s managed metadata and data governance service. Neocarta provides two purpose-scoped connectors for it: DataplexSchemaConnector reads BigQuery structural metadata (tables and columns) from the Dataplex catalog, and DataplexGlossaryConnector reads business glossary terms and links them to the schema entities via TAGGED_WITH edges.

Always run DataplexSchemaConnector before DataplexGlossaryConnector. The glossary connector creates (:Column)-[:TAGGED_WITH]->(:BusinessTerm) and (:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges, which require the Column and Table nodes to already exist in Neo4j. If the glossary connector runs first, those edges have no Column or Table nodes to attach to.
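One way to guard against running the connectors out of order is a pre-flight check that confirms Table and Column nodes already exist. The sketch below is not part of Neocarta: `count_nodes` and `schema_is_loaded` are hypothetical helper names, and it assumes a neo4j Python driver version that provides `Driver.execute_query`.

```python
# Hypothetical helpers (not part of Neocarta) for a pre-flight ordering check.
ALLOWED_LABELS = {"Table", "Column"}


def count_nodes(driver, label, database="neo4j"):
    """Count nodes carrying `label`, using an open neo4j.Driver."""
    # Labels cannot be passed as query parameters, so restrict them to a
    # known set before interpolating into the Cypher text.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    records, _summary, _keys = driver.execute_query(
        f"MATCH (n:{label}) RETURN count(n) AS n",
        database_=database,
    )
    return records[0]["n"]


def schema_is_loaded(driver, database="neo4j"):
    """Return True when at least one Table node and one Column node exist."""
    return all(
        count_nodes(driver, label, database) > 0 for label in ("Table", "Column")
    )
```

Calling `schema_is_loaded(neo4j_driver)` before the glossary step, and passing `include_entry_links=False` when it returns False, keeps the glossary content load independent of the schema load.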

DataplexSchemaConnector

Extracts BigQuery catalog metadata via the Dataplex CatalogServiceClient and maps it to Database, Schema, Table, and Column nodes.

Dataplex catalog metadata is less comprehensive than reading BigQuery Information Schema directly. Primary and foreign keys are not available via the Dataplex API, so columns are loaded without is_primary_key / is_foreign_key flags and no REFERENCES edges are produced. Use BigQuerySchemaConnector if you need foreign key relationships.

What it ingests:

Database node for the GCP project
Schema nodes for each dataset
Table nodes with descriptions
Column nodes with types, nullability, and descriptions

Import

from neocarta.connectors.dataplex import DataplexSchemaConnector

Parameters

catalog_client

dataplex_v1.CatalogServiceClient

required

An authenticated Dataplex Catalog client.

project_id

str

required

GCP project ID (e.g. "my-project").

project_number

str

required

GCP project number (e.g. "123456789012"). Required for Dataplex entry path construction.

dataplex_location

str

required

Dataplex location, e.g. "us" or "us-central1".

neo4j_driver

neo4j.Driver

required

Connected Neo4j driver instance.

database_name

str

default:"neo4j"

Target Neo4j database name.
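To illustrate why both project_id and project_number are required, the sketch below builds a catalog entry name for a BigQuery table. It assumes the naming format that Dataplex documents for BigQuery system entries; this page does not confirm that the connector builds exactly this string, and `bigquery_table_entry_name` and the table name "orders" are hypothetical.

```python
def bigquery_table_entry_name(project_number, project_id, location, dataset_id, table_id):
    """Build a Dataplex entry name for a BigQuery table.

    Assumption: the format follows the one Dataplex documents for entries in
    the system-managed @bigquery entry group. The project *number* identifies
    the catalog location; the project *ID* appears in the BigQuery resource path.
    """
    return (
        f"projects/{project_number}/locations/{location}"
        f"/entryGroups/@bigquery/entries/"
        f"bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )


# Example values mirror the placeholders used elsewhere on this page.
entry_name = bigquery_table_entry_name(
    project_number="123456789012",
    project_id="my-project",
    location="us",
    dataset_id="sales",
    table_id="orders",  # hypothetical table name
)
```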

`ingest()` Parameters

dataset_id

str

required

The BigQuery dataset ID to extract.

Code Example

import os
from dotenv import load_dotenv
from google.cloud import dataplex_v1
from neo4j import GraphDatabase
from neocarta.connectors.dataplex import DataplexSchemaConnector, DataplexGlossaryConnector

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_database = os.getenv("NEO4J_DATABASE", "neo4j")

catalog_client = dataplex_v1.CatalogServiceClient()
glossary_client = dataplex_v1.BusinessGlossaryServiceClient()

common = dict(
    project_id=os.getenv("GCP_PROJECT_ID"),
    project_number=os.getenv("GCP_PROJECT_NUMBER"),
    dataplex_location=os.getenv("DATAPLEX_LOCATION"),
    neo4j_driver=neo4j_driver,
    database_name=neo4j_database,
)

# Step 1 — Schema (must run before glossary)
DataplexSchemaConnector(catalog_client=catalog_client, **common).ingest(
    dataset_id=os.getenv("BIGQUERY_DATASET_ID")
)

CLI

neocarta dataplex schema \
  --project-id my-proj \
  --project-number 123456789012 \
  --dataplex-location us \
  --dataset-id sales

Required Environment Variables

Variable	Purpose
`NEO4J_URI`	Neo4j connection URI
`NEO4J_USERNAME`	Neo4j username
`NEO4J_PASSWORD`	Neo4j password
`NEO4J_DATABASE`	Target Neo4j database (default: `neo4j`)
`GCP_PROJECT_ID`	GCP project ID
`GCP_PROJECT_NUMBER`	GCP project number
`DATAPLEX_LOCATION`	Dataplex location (e.g. `us`, `us-central1`)
`BIGQUERY_DATASET_ID`	BigQuery dataset ID
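Because the code example reads every setting with `os.getenv`, an unset variable surfaces as `None` only when a connector is constructed or run. A minimal sketch of an early check follows; `missing_env_vars` is a hypothetical helper, not part of Neocarta, and `NEO4J_DATABASE` is left out because it has a default.

```python
import os

# Variables from the table above that have no default value.
REQUIRED_VARS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "GCP_PROJECT_ID",
    "GCP_PROJECT_NUMBER",
    "DATAPLEX_LOCATION",
    "BIGQUERY_DATASET_ID",
)


def missing_env_vars(required=REQUIRED_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]
```

A script can call `missing_env_vars()` right after `load_dotenv()` and exit with the returned names before any GCP or Neo4j client is created.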

DataplexGlossaryConnector

Reads Dataplex business glossary content and catalog-to-glossary entry links, producing Glossary, Category, and BusinessTerm nodes along with TAGGED_WITH edges that connect them to the schema entities ingested by DataplexSchemaConnector.

What it ingests:

Glossary nodes with name, description, and resource path
Category nodes within each glossary
BusinessTerm nodes within each category
(:Glossary)-[:HAS_CATEGORY]->(:Category) edges
(:Category)-[:HAS_BUSINESS_TERM]->(:BusinessTerm) edges
(:Column)-[:TAGGED_WITH]->(:BusinessTerm) edges (when include_entry_links=True)
(:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges (when include_entry_links=True)
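The edge types listed above can be spot-checked after ingestion by counting each pattern. This is a sketch rather than a Neocarta feature: `glossary_edge_counts` is a hypothetical helper, and it assumes a neo4j Python driver version that provides `Driver.execute_query`. The labels and relationship types are the ones named in the list.

```python
# Cypher patterns for each edge type the glossary connector is described as creating.
EDGE_PATTERNS = {
    "HAS_CATEGORY": "(:Glossary)-[r:HAS_CATEGORY]->(:Category)",
    "HAS_BUSINESS_TERM": "(:Category)-[r:HAS_BUSINESS_TERM]->(:BusinessTerm)",
    "COLUMN_TAGGED_WITH": "(:Column)-[r:TAGGED_WITH]->(:BusinessTerm)",
    "TABLE_TAGGED_WITH": "(:Table)-[r:TAGGED_WITH]->(:BusinessTerm)",
}


def glossary_edge_counts(driver, database="neo4j"):
    """Return a dict mapping each pattern name to its relationship count."""
    counts = {}
    for name, pattern in EDGE_PATTERNS.items():
        records, _summary, _keys = driver.execute_query(
            f"MATCH {pattern} RETURN count(r) AS n",
            database_=database,
        )
        counts[name] = records[0]["n"]
    return counts
```

Zero counts for both TAGGED_WITH patterns after a run with include_entry_links=True suggest that the schema was not loaded first or that no entry links exist in the catalog.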

Import

from neocarta.connectors.dataplex import DataplexGlossaryConnector

Parameters

glossary_client

dataplex_v1.BusinessGlossaryServiceClient

required

An authenticated Dataplex Business Glossary client.

project_id

str

required

GCP project ID.

project_number

str

required

GCP project number.

dataplex_location

str

required

Dataplex location, e.g. "us" or "us-central1".

neo4j_driver

neo4j.Driver

required

Connected Neo4j driver instance.

database_name

str

default:"neo4j"

Target Neo4j database name.

`ingest()` Parameters

include_entry_links

bool

default:True

Whether to also ingest catalog↔glossary entry links (TAGGED_WITH edges). Set to False if the schema catalog is not loaded in this Neo4j instance, or to skip the REST API round-trips when you only want the glossary content itself.

Code Example

# (continuing from the DataplexSchemaConnector example above)

# Step 2 — Glossary (after schema)
DataplexGlossaryConnector(glossary_client=glossary_client, **common).ingest(
    include_entry_links=True,   # default — attaches TAGGED_WITH edges to Column/Table
)

neo4j_driver.close()
print("Dataplex ingestion completed successfully!")

To load glossary content without creating TAGGED_WITH edges (for example, because you skipped schema ingestion):

DataplexGlossaryConnector(glossary_client=glossary_client, **common).ingest(
    include_entry_links=False,
)

CLI

neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789012 \
  --dataplex-location us

Required Environment Variables

Variable	Purpose
`NEO4J_URI`	Neo4j connection URI
`NEO4J_USERNAME`	Neo4j username
`NEO4J_PASSWORD`	Neo4j password
`NEO4J_DATABASE`	Target Neo4j database (default: `neo4j`)
`GCP_PROJECT_ID`	GCP project ID
`GCP_PROJECT_NUMBER`	GCP project number
`DATAPLEX_LOCATION`	Dataplex location (e.g. `us`, `us-central1`)
