Neocarta ships two BigQuery connectors that complement each other.
BigQuerySchemaConnector reads structural metadata — tables, columns, foreign keys, and sample values — from BigQuery Information Schema tables. BigQueryLogsConnector reads historical SQL queries from INFORMATION_SCHEMA.JOBS_BY_PROJECT, parses them, and maps the table and column usage patterns into the graph. Run both together for the most complete picture of your dataset.
BigQuerySchemaConnector
Extracts structural metadata for a BigQuery dataset and maps it to the Neocarta graph schema. Primary and foreign keys must be defined in the BigQuery Information Schema for REFERENCES edges to be created.
What it extracts:
- Database node representing the GCP project
- Schema nodes for each dataset
- Table nodes with descriptions
- Column nodes with types, nullability, primary/foreign key flags, and descriptions
- (:Column)-[:REFERENCES]->(:Column) edges from foreign key definitions
- Value nodes with sampled unique column values
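To make the extraction concrete, here is a minimal sketch of the kind of Information Schema queries that could supply the column and key metadata listed above. It is not the connector's actual SQL: the function names are hypothetical, and only the standard BigQuery views COLUMNS, KEY_COLUMN_USAGE, and TABLE_CONSTRAINTS are assumed.

```python
def _view(project_id: str, dataset_id: str, view: str) -> str:
    """Qualified name of a dataset-level INFORMATION_SCHEMA view."""
    return f"`{project_id}`.{dataset_id}.INFORMATION_SCHEMA.{view}"


def columns_sql(project_id: str, dataset_id: str) -> str:
    """Hypothetical helper: list every column with its type and nullability."""
    return (
        "SELECT table_name, column_name, data_type, is_nullable\n"
        f"FROM {_view(project_id, dataset_id, 'COLUMNS')}\n"
        "ORDER BY table_name, ordinal_position"
    )


def key_columns_sql(project_id: str, dataset_id: str) -> str:
    """Hypothetical helper: list columns that take part in a declared key."""
    return (
        "SELECT kcu.table_name, kcu.column_name, tc.constraint_type\n"
        f"FROM {_view(project_id, dataset_id, 'KEY_COLUMN_USAGE')} AS kcu\n"
        f"JOIN {_view(project_id, dataset_id, 'TABLE_CONSTRAINTS')} AS tc\n"
        "  ON kcu.constraint_name = tc.constraint_name\n"
        "WHERE tc.constraint_type IN ('PRIMARY KEY', 'FOREIGN KEY')"
    )
```

If the second query returns no rows for a dataset, no keys are declared there, which is the situation in which no REFERENCES edges can be created.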
Import
Parameters
An authenticated BigQuery client. The client's project attribute is used as the project ID when project_id is omitted.
GCP project ID. Falls back to client.project when not supplied explicitly.
Connected Neo4j driver instance. The caller owns the driver; the connector does not close it.
Target Neo4j database name.
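The project ID fallback described above can be sketched as follows. This is an illustration only: resolve_project_id and FakeClient are hypothetical names, and treating None as "omitted" is an assumption about how the connector decides that no project ID was supplied.

```python
from typing import Optional


class FakeClient:
    """Stand-in for an authenticated BigQuery client; only .project is used here."""

    def __init__(self, project: str) -> None:
        self.project = project


def resolve_project_id(client: FakeClient, project_id: Optional[str] = None) -> str:
    """Prefer an explicit project ID, otherwise fall back to the client's project."""
    # Assumption: "omitted" means the argument was left as None.
    return project_id if project_id is not None else client.project


client = FakeClient(project="analytics-prod")
default_id = resolve_project_id(client)                   # falls back to client.project
explicit_id = resolve_project_id(client, "other-project")  # explicit value wins
```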
ingest() Parameters
The BigQuery dataset to ingest. Pass it here rather than to the constructor.
Code Example
CLI
Required Environment Variables
| Variable | Purpose |
|---|---|
| NEO4J_URI | Neo4j connection URI |
| NEO4J_USERNAME | Neo4j username |
| NEO4J_PASSWORD | Neo4j password |
| NEO4J_DATABASE | Target Neo4j database (default: neo4j) |
| GCP_PROJECT_ID | GCP project ID |
| BIGQUERY_DATASET_ID | BigQuery dataset to ingest |
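When scripting around the CLI, it can help to validate these variables before starting an ingest. The sketch below is one possible way to do that with the standard library; the IngestSettings class and load_settings function are hypothetical and not part of Neocarta. Only the variable names and the neo4j default come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables with no default in the table above.
REQUIRED = (
    "NEO4J_URI",
    "NEO4J_USERNAME",
    "NEO4J_PASSWORD",
    "GCP_PROJECT_ID",
    "BIGQUERY_DATASET_ID",
)


@dataclass(frozen=True)
class IngestSettings:
    neo4j_uri: str
    neo4j_username: str
    neo4j_password: str
    neo4j_database: str
    gcp_project_id: str
    bigquery_dataset_id: str


def load_settings(env: Mapping[str, str] = os.environ) -> IngestSettings:
    """Read the connector settings, failing early if a variable is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return IngestSettings(
        neo4j_uri=env["NEO4J_URI"],
        neo4j_username=env["NEO4J_USERNAME"],
        neo4j_password=env["NEO4J_PASSWORD"],
        # NEO4J_DATABASE is the only variable the table gives a default for.
        neo4j_database=env.get("NEO4J_DATABASE", "neo4j"),
        gcp_project_id=env["GCP_PROJECT_ID"],
        bigquery_dataset_id=env["BIGQUERY_DATASET_ID"],
    )
```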
BigQueryLogsConnector
Reads SQL queries from INFORMATION_SCHEMA.JOBS_BY_PROJECT, parses them to discover table and column usage, and loads query patterns into the graph. This shows how your data is actually used, rather than relying solely on the declared schema.
What it extracts:
- SQL queries from BigQuery job history
- Tables and columns referenced in each query (via SQL parsing)
- Join relationships between tables (from SQL JOIN clauses)
| Node / Relationship | Properties |
|---|---|
| Query | content (query text), query_id (hash of content) |
| (:Query)-[:USES_TABLE]->(:Table) | none |
| (:Query)-[:USES_COLUMN]->(:Column) | none |
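As a rough illustration of how one logged query could turn into the rows above, the sketch below derives a query_id from the query text and collects the tables named after FROM and JOIN. Both parts are assumptions: the source only says query_id is a hash of the content, so SHA-256 is a placeholder, and the regular expression is a toy substitute for real SQL parsing (it ignores subqueries and CTEs and would misread expressions such as EXTRACT(YEAR FROM ts)). All function names are hypothetical.

```python
import hashlib
import re

# Toy pattern: the identifier that directly follows FROM or JOIN.
_TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+`?([\w.\-]+)`?", re.IGNORECASE)


def query_id_for(content: str) -> str:
    """Derive a stable ID from the query text (SHA-256 is an assumption)."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def referenced_tables(content: str) -> list[str]:
    """Table names after FROM/JOIN, de-duplicated, in first-seen order."""
    seen: dict[str, None] = {}
    for name in _TABLE_REF.findall(content):
        seen.setdefault(name, None)
    return list(seen)


def to_graph_rows(content: str) -> dict:
    """Shape one query as a Query node plus its USES_TABLE targets."""
    return {
        "query": {"content": content, "query_id": query_id_for(content)},
        "uses_table": referenced_tables(content),
    }


sql = (
    "SELECT o.id FROM shop.orders AS o "
    "JOIN shop.customers AS c ON o.customer_id = c.id"
)
rows = to_graph_rows(sql)
```

Because the ID depends only on the text, the same query run many times maps to a single Query node under this scheme.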
Import
Parameters
An authenticated BigQuery client.
GCP project ID.
Connected Neo4j driver instance.
Target Neo4j database name.
ingest() Parameters
The BigQuery dataset to filter queries by.
The BigQuery region for INFORMATION_SCHEMA.JOBS_BY_PROJECT.
Optional ISO-8601 start of the query window, e.g. "2024-01-01 00:00:00".
Optional ISO-8601 end of the query window, e.g. "2024-01-31 23:59:59".
Maximum number of queries to extract.
Whether to exclude failed queries from the extract.
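The parameters above map naturally onto a filter over the job history view. The sketch below shows one way such a query could be assembled; it is not the connector's actual SQL. The function and argument names are hypothetical, the region is assumed to be passed without the region- prefix (for example "us"), and filtering by dataset through referenced_tables is an assumption about how the dataset filter works. The @dataset_id, @start, and @end placeholders are BigQuery named query parameters that the caller would bind separately.

```python
from typing import Optional


def jobs_query_sql(
    region: str,
    limit: int,
    start: Optional[str] = None,
    end: Optional[str] = None,
    exclude_failed: bool = True,
) -> str:
    """Build a JOBS_BY_PROJECT query using named parameters for user values."""
    where = [
        "job_type = 'QUERY'",
        # Keep only jobs that touched the target dataset.
        "EXISTS (SELECT 1 FROM UNNEST(referenced_tables) AS t "
        "WHERE t.dataset_id = @dataset_id)",
    ]
    if start is not None:
        where.append("creation_time >= TIMESTAMP(@start)")
    if end is not None:
        where.append("creation_time <= TIMESTAMP(@end)")
    if exclude_failed:
        # Failed jobs carry a populated error_result.
        where.append("error_result IS NULL")
    lines = [
        "SELECT query",
        f"FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT",
        "WHERE " + "\n  AND ".join(where),
        "ORDER BY creation_time DESC",
        f"LIMIT {int(limit)}",
    ]
    return "\n".join(lines)
```

Leaving the start or end unset simply drops that bound from the WHERE clause, which matches their optional status above.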
Code Example
CLI
Required Environment Variables
| Variable | Purpose |
|---|---|
| NEO4J_URI | Neo4j connection URI |
| NEO4J_USERNAME | Neo4j username |
| NEO4J_PASSWORD | Neo4j password |
| NEO4J_DATABASE | Target Neo4j database (default: neo4j) |
| GCP_PROJECT_ID | GCP project ID |
| BIGQUERY_DATASET_ID | BigQuery dataset to filter queries by |