Connectors are the ETL backbone of Neocarta. Each connector reads metadata from a specific data source, transforms it into the shared graph schema, and loads it into Neo4j. Only metadata crosses into Neo4j — your data stays in the source. All connectors share the same public API contract so you can mix and match sources in a single graph.

Connector Lifecycle

Every connector follows the same three-stage ETL pipeline. You can call the stages individually for fine-grained control, or let ingest() orchestrate everything at once.
1. extract()
   Connects to the external system and reads raw metadata into an internal cache. Accepts source-specific arguments such as dataset_id or catalog.

2. transform()
   Converts the cached raw data into typed graph data model objects (nodes and relationships). Raises StateError if called before extract().

3. load()
   Writes the transformed objects into Neo4j using MERGE semantics. Raises StateError if called before transform().

ingest() runs all three stages in order and records a neocarta metadata node at the end. Format connectors (CSV and OSI) also expose an export() method that reads an entity subgraph from Neo4j and writes it back out in the connector's native format.

Calling stages out of order (for example, transform() before extract()) raises StateError with a helpful suggestion. Use ingest() unless you need stage-level control.
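
The stage-ordering rule can be illustrated with a small self-contained stand-in. This is a sketch, not Neocarta's implementation: ToyConnector, its in-memory "graph", and the local StateError class are hypothetical names, and a Python set stands in for Neo4j's MERGE behaviour.

```python
class StateError(RuntimeError):
    """Hypothetical stand-in for the error raised on out-of-order calls."""


class ToyConnector:
    """Illustrative three-stage pipeline; not the real Neocarta base class."""

    def __init__(self, source, graph):
        self._source = source  # pretend external system: {table: [columns]}
        self._graph = graph    # pretend Neo4j: a set of (label, key) tuples
        self._raw = None       # filled by extract()
        self._nodes = None     # filled by transform()

    def extract(self):
        # Read raw metadata into an internal cache.
        self._raw = dict(self._source)

    def transform(self):
        if self._raw is None:
            raise StateError("transform() called before extract(); "
                             "call extract() first or use ingest()")
        # Convert cached raw data into typed node tuples.
        self._nodes = [("Table", table) for table in self._raw]
        self._nodes += [
            ("Column", f"{table}.{column}")
            for table, columns in self._raw.items()
            for column in columns
        ]

    def load(self):
        if self._nodes is None:
            raise StateError("load() called before transform(); "
                             "call transform() first or use ingest()")
        # Set union is MERGE-like: loading an existing node is a no-op.
        self._graph.update(self._nodes)

    def ingest(self):
        # Run the three stages in their required order.
        self.extract()
        self.transform()
        self.load()


graph = set()
connector = ToyConnector({"orders": ["id", "total"]}, graph)

try:
    connector.transform()  # out of order: nothing has been extracted yet
except StateError as exc:
    print(f"StateError: {exc}")

connector.ingest()
connector.ingest()  # a second run adds no duplicate nodes
print(sorted(graph))
```

Running ingest() twice leaves the toy graph unchanged after the first run, which is the property that MERGE semantics are meant to give repeated loads.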

Context Manager Support

All connectors implement Python’s context manager protocol. Using with ensures that any connector-owned resources (HTTP clients, connection pools) are released cleanly on exit, even if an error occurs.
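
import os

from google.cloud import bigquery
from neo4j import GraphDatabase
from neocarta.connectors.bigquery import BigQuerySchemaConnector

# Neo4j connection settings are read from the environment.
driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)

# Assumption: the BigQuery client is built with Application Default Credentials.
bigquery_client = bigquery.Client(project=os.getenv("GCP_PROJECT_ID"))

# Connector-owned resources are released when the with block exits.
with BigQuerySchemaConnector(
    client=bigquery_client,
    project_id=os.getenv("GCP_PROJECT_ID"),
    neo4j_driver=driver,
) as connector:
    connector.ingest(dataset_id=os.getenv("BIGQUERY_DATASET_ID"))
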
For connectors that own no long-lived client (BigQuery, Dataplex, query log), context manager usage is optional but encouraged for consistency. For connectors that do own a client (Unity Catalog creates an HTTP connection pool), it is strongly recommended.
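
To see what "released cleanly on exit, even if an error occurs" means for a connector that owns a client, here is a minimal sketch of the context manager protocol. FakePool and PooledConnector are hypothetical stand-ins, not Neocarta classes; the real Unity Catalog connector's internals may differ.

```python
class FakePool:
    """Hypothetical stand-in for an HTTP connection pool."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class PooledConnector:
    """Illustrative connector that owns a pool and releases it on exit."""

    def __init__(self):
        self.pool = FakePool()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the block raises.
        self.pool.close()
        return False  # do not swallow the exception

    def ingest(self):
        # Simulate a failure in the middle of the pipeline.
        raise RuntimeError("source unavailable")


try:
    with PooledConnector() as connector:
        pool = connector.pool
        connector.ingest()
except RuntimeError as exc:
    print(f"ingest failed: {exc}")

print(f"pool closed: {pool.closed}")
```

Without the with block, the caller would need its own try/finally to guarantee the same cleanup.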

Required Environment Variables

All connectors need a running Neo4j instance. Configure the connection with these four environment variables (read from a .env file or the shell):

| Variable | Example | Purpose |
| --- | --- | --- |
| NEO4J_URI | bolt://localhost:7687 | Neo4j connection URI |
| NEO4J_USERNAME | neo4j | Neo4j username |
| NEO4J_PASSWORD | your-password | Neo4j password |
| NEO4J_DATABASE | neo4j | Target database (default: neo4j) |

Each connector page lists any additional source-specific variables required beyond these four.
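
One way to collect these variables before creating a driver is a small helper that fails fast on missing values. This is a sketch under assumptions: load_neo4j_settings is a hypothetical function, not part of Neocarta, and it treats the URI, username, and password as required while applying the documented neo4j default for NEO4J_DATABASE. It reads from any mapping so it can be tried without touching the real environment; loading a .env file into the environment is a separate step that is not shown here.

```python
import os

# Assumption: these three have no default and must be set.
REQUIRED = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def load_neo4j_settings(env=None):
    """Return Neo4j connection settings from a mapping (default: os.environ)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError(f"missing required variables: {', '.join(missing)}")
    return {
        "uri": env["NEO4J_URI"],
        "auth": (env["NEO4J_USERNAME"], env["NEO4J_PASSWORD"]),
        # Fall back to the documented default database name.
        "database": env.get("NEO4J_DATABASE") or "neo4j",
    }


example = load_neo4j_settings({
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "your-password",
})
print(example["uri"], example["database"])
```

The uri and auth values map directly onto the arguments of GraphDatabase.driver() used in the context manager example.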

Available Connectors

BigQuery

Schema metadata and query log extraction from Google BigQuery, including foreign keys and sample values.

Dataplex

BigQuery schema and business glossary from GCP Dataplex Universal Catalog, with TAGGED_WITH entry links.

CSV

Load any metadata from structured CSV files — useful for manual curation or systems without direct API access.

JDBC

Schema metadata from any JDBC-compatible relational database via SchemaCrawler (PostgreSQL, MySQL, Oracle, and more).

Unity Catalog

Schema metadata from the open Unity Catalog REST API — works with any conformant server, not just Databricks.

Databricks

Governed-tag definitions from managed Databricks Unity Catalog, mapped into the vendor-neutral governance-tag layer.

Query Log

Parse a local query-log JSON file into Query, CTE, and usage relationship nodes in the graph.

OSI

Bidirectional connector for Open Semantic Interchange YAML — ingest a semantic model spec or export one from Neo4j.
