GCP Dataplex Universal Catalog is Google’s managed metadata and data governance service. Neocarta provides two purpose-scoped connectors for it: DataplexSchemaConnector reads BigQuery structural metadata (tables and columns) from the Dataplex catalog, and DataplexGlossaryConnector reads business glossary terms and links them to the schema entities via TAGGED_WITH edges.
Always run DataplexSchemaConnector before DataplexGlossaryConnector. The glossary connector creates (:Column)-[:TAGGED_WITH]->(:BusinessTerm) and (:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges, which require the Column and Table nodes to already exist in Neo4j. If you run the glossary connector first, those edges will reference missing nodes.
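One way to guard against running the connectors in the wrong order is to check for schema nodes before starting the glossary step. The following is a minimal sketch, not part of Neocarta: `schema_is_loaded` and `FakeDriver` are hypothetical names, and the function assumes a driver that behaves like `neo4j.Driver` in the 5.x Python driver, where `execute_query()` returns an object with a `records` list.

```python
from types import SimpleNamespace

# Counts the node types that DataplexSchemaConnector is documented to create
# and that the TAGGED_WITH edges attach to.
SCHEMA_CHECK_QUERY = """
MATCH (n)
WHERE n:Table OR n:Column
RETURN count(n) AS schema_nodes
"""


def schema_is_loaded(driver, database="neo4j"):
    """Return True if at least one Table or Column node exists (hypothetical helper)."""
    result = driver.execute_query(SCHEMA_CHECK_QUERY, database_=database)
    return result.records[0]["schema_nodes"] > 0


class FakeDriver:
    """Stand-in for neo4j.Driver so the sketch runs without a database."""

    def __init__(self, node_count):
        self.node_count = node_count

    def execute_query(self, query, database_=None):
        return SimpleNamespace(records=[{"schema_nodes": self.node_count}])


print(schema_is_loaded(FakeDriver(0)))   # False: run the schema connector first
print(schema_is_loaded(FakeDriver(42)))  # True: safe to create TAGGED_WITH edges
```

With a real driver, the same check could decide whether to call the glossary connector with `include_entry_links=True` or `False`.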

DataplexSchemaConnector

Extracts BigQuery catalog metadata via the Dataplex CatalogServiceClient and maps it to Database, Schema, Table, and Column nodes.
Dataplex catalog metadata is less comprehensive than reading BigQuery Information Schema directly. Primary and foreign keys are not available via the Dataplex API, so columns are loaded without is_primary_key / is_foreign_key flags and no REFERENCES edges are produced. Use BigQuerySchemaConnector if you need foreign key relationships.
What it ingests:
  • Database node for the GCP project
  • Schema nodes for each dataset
  • Table nodes with descriptions
  • Column nodes with types, nullability, and descriptions

Import

from neocarta.connectors.dataplex import DataplexSchemaConnector

Parameters

catalog_client
dataplex_v1.CatalogServiceClient
required
An authenticated Dataplex Catalog client.
project_id
str
required
GCP project ID (e.g. "my-project").
project_number
str
required
GCP project number (e.g. "123456789012"). Required for Dataplex entry path construction.
dataplex_location
str
required
Dataplex location, e.g. "us" or "us-central1".
neo4j_driver
neo4j.Driver
required
Connected Neo4j driver instance.
database_name
str
default:"neo4j"
Target Neo4j database name.
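
To illustrate why both the project ID and the project number are requested, the sketch below builds a Dataplex entry name for a BigQuery table. It assumes the entry-name layout that Google documents for BigQuery system entries (the `@bigquery` entry group). Neocarta's own path construction is not shown on this page and may differ, and `bigquery_table_entry_name` is a hypothetical helper.

```python
def bigquery_table_entry_name(project_id, project_number, location, dataset_id, table_id):
    """Build a Dataplex entry name for a BigQuery table (assumed layout)."""
    # The entry group is addressed with the project number and Dataplex location.
    entry_group = (
        f"projects/{project_number}/locations/{location}/entryGroups/@bigquery"
    )
    # The entry ID embeds the BigQuery resource path, which uses the project ID.
    entry_id = (
        f"bigquery.googleapis.com/projects/{project_id}"
        f"/datasets/{dataset_id}/tables/{table_id}"
    )
    return f"{entry_group}/entries/{entry_id}"


print(
    bigquery_table_entry_name(
        project_id="my-project",
        project_number="123456789012",
        location="us",
        dataset_id="sales",
        table_id="orders",
    )
)
```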

ingest() Parameters

dataset_id
str
required
The BigQuery dataset ID to extract.

Code Example

import os
from dotenv import load_dotenv
from google.cloud import dataplex_v1
from neo4j import GraphDatabase
from neocarta.connectors.dataplex import DataplexSchemaConnector, DataplexGlossaryConnector

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_database = os.getenv("NEO4J_DATABASE", "neo4j")

catalog_client = dataplex_v1.CatalogServiceClient()
glossary_client = dataplex_v1.BusinessGlossaryServiceClient()

common = dict(
    project_id=os.getenv("GCP_PROJECT_ID"),
    project_number=os.getenv("GCP_PROJECT_NUMBER"),
    dataplex_location=os.getenv("DATAPLEX_LOCATION"),
    neo4j_driver=neo4j_driver,
    database_name=neo4j_database,
)

# Step 1 — Schema (must run before glossary)
DataplexSchemaConnector(catalog_client=catalog_client, **common).ingest(
    dataset_id=os.getenv("BIGQUERY_DATASET_ID")
)

CLI

neocarta dataplex schema \
  --project-id my-proj \
  --project-number 123456789012 \
  --dataplex-location us \
  --dataset-id sales

Required Environment Variables

| Variable | Purpose |
| --- | --- |
| NEO4J_URI | Neo4j connection URI |
| NEO4J_USERNAME | Neo4j username |
| NEO4J_PASSWORD | Neo4j password |
| NEO4J_DATABASE | Target Neo4j database (default: neo4j) |
| GCP_PROJECT_ID | GCP project ID |
| GCP_PROJECT_NUMBER | GCP project number |
| DATAPLEX_LOCATION | Dataplex location (e.g. us, us-central1) |
| BIGQUERY_DATASET_ID | BigQuery dataset ID |
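
Because `os.getenv` returns `None` for unset variables, a missing value would otherwise surface only when a connector is constructed or run. A small pre-flight check, sketched below, reports the gaps up front. `missing_env_vars` is a hypothetical helper, not a Neocarta function. `NEO4J_DATABASE` is left out of the required list because it has a default.

```python
import os

# Variables listed above that have no default value.
REQUIRED_ENV_VARS = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "GCP_PROJECT_ID",
    "GCP_PROJECT_NUMBER",
    "DATAPLEX_LOCATION",
    "BIGQUERY_DATASET_ID",
)


def missing_env_vars(required=REQUIRED_ENV_VARS, environ=None):
    """Return the names in `required` that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in required if not environ.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"NEO4J_URI": "neo4j://localhost:7687", "GCP_PROJECT_ID": "my-project"}
missing = missing_env_vars(environ=example_env)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
```

Calling `missing_env_vars()` with no arguments checks the process environment, for example right after `load_dotenv()`.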

DataplexGlossaryConnector

Reads Dataplex business glossary content and catalog-to-glossary entry links, producing Glossary, Category, and BusinessTerm nodes along with TAGGED_WITH edges that connect them to the schema entities ingested by DataplexSchemaConnector.
What it ingests:
  • Glossary nodes with name, description, and resource path
  • Category nodes within each glossary
  • BusinessTerm nodes within each category
  • (:Glossary)-[:HAS_CATEGORY]->(:Category) edges
  • (:Category)-[:HAS_BUSINESS_TERM]->(:BusinessTerm) edges
  • (:Column)-[:TAGGED_WITH]->(:BusinessTerm) edges (when include_entry_links=True)
  • (:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges (when include_entry_links=True)
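
The graph shape listed above can be explored with a single Cypher query once both connectors have run. The sketch below counts how many Table or Column nodes are tagged with each business term. It uses only the labels and relationship types named in this section. `term_usage`, `untagged_terms`, and `FakeDriver` are hypothetical names, and the driver is assumed to behave like `neo4j.Driver` in the 5.x Python driver.

```python
from types import SimpleNamespace

# Walks Glossary -> Category -> BusinessTerm, then counts tagged assets per term.
TERM_USAGE_QUERY = """
MATCH (g:Glossary)-[:HAS_CATEGORY]->(c:Category)-[:HAS_BUSINESS_TERM]->(t:BusinessTerm)
OPTIONAL MATCH (asset)-[:TAGGED_WITH]->(t)
WHERE asset:Table OR asset:Column
RETURN elementId(t) AS term_id, count(asset) AS tagged_assets
ORDER BY tagged_assets DESC
"""


def term_usage(driver, database="neo4j"):
    """Return (term_id, tagged_assets) pairs for every business term."""
    result = driver.execute_query(TERM_USAGE_QUERY, database_=database)
    return [(r["term_id"], r["tagged_assets"]) for r in result.records]


def untagged_terms(usage):
    """Return the IDs of terms that no Table or Column is tagged with."""
    return [term_id for term_id, tagged in usage if tagged == 0]


class FakeDriver:
    """Stand-in for neo4j.Driver that returns canned rows."""

    def __init__(self, rows):
        self.rows = rows

    def execute_query(self, query, database_=None):
        return SimpleNamespace(records=self.rows)


rows = [
    {"term_id": "t1", "tagged_assets": 3},
    {"term_id": "t2", "tagged_assets": 0},
]
usage = term_usage(FakeDriver(rows))
print(untagged_terms(usage))  # ['t2']
```

If the glossary was ingested with `include_entry_links=False`, every term would be expected to show zero tagged assets.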

Import

from neocarta.connectors.dataplex import DataplexGlossaryConnector

Parameters

glossary_client
dataplex_v1.BusinessGlossaryServiceClient
required
An authenticated Dataplex Business Glossary client.
project_id
str
required
GCP project ID.
project_number
str
required
GCP project number.
dataplex_location
str
required
Dataplex location, e.g. "us" or "us-central1".
neo4j_driver
neo4j.Driver
required
Connected Neo4j driver instance.
database_name
str
default:"neo4j"
Target Neo4j database name.

ingest() Parameters

include_entry_links
bool
default:True
Whether to also ingest catalog↔glossary entry links (TAGGED_WITH edges). Set to False if the schema catalog is not loaded in this Neo4j instance, or to skip the REST API round-trips when you only want the glossary content itself.

Code Example

# (continuing from the DataplexSchemaConnector example above)

# Step 2 — Glossary (after schema)
DataplexGlossaryConnector(glossary_client=glossary_client, **common).ingest(
    include_entry_links=True,   # default — attaches TAGGED_WITH edges to Column/Table
)

neo4j_driver.close()
print("Dataplex ingestion completed successfully!")

To load glossary content without creating TAGGED_WITH edges (e.g. you skipped schema ingest):

DataplexGlossaryConnector(glossary_client=glossary_client, **common).ingest(
    include_entry_links=False,
)

CLI

neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789012 \
  --dataplex-location us

Required Environment Variables

| Variable | Purpose |
| --- | --- |
| NEO4J_URI | Neo4j connection URI |
| NEO4J_USERNAME | Neo4j username |
| NEO4J_PASSWORD | Neo4j password |
| NEO4J_DATABASE | Target Neo4j database (default: neo4j) |
| GCP_PROJECT_ID | GCP project ID |
| GCP_PROJECT_NUMBER | GCP project number |
| DATAPLEX_LOCATION | Dataplex location (e.g. us, us-central1) |
