The connector commands form the core of the Neocarta CLI — each one wraps a Python connector class behind a consistent noun-verb interface that handles credential resolution, Neo4j driver lifecycle, and optional embedding generation. This page documents every flag, its corresponding environment variable, and runnable examples for each command group.
All connector commands accept --neo4j-uri, --neo4j-username, and --neo4j-database to override the corresponding env vars per invocation. NEO4J_PASSWORD is always env-only. See Configuration for the full environment variable reference.
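As a rough sketch of per-invocation overrides, the helper below runs one connector command against several Neo4j databases by appending --neo4j-database each time, while NEO4J_PASSWORD stays in the environment. The function name for_each_database and the NEOCARTA_BIN variable are assumptions made for illustration; NEOCARTA_BIN is not a neocarta feature and exists only so the sketch can be exercised with a stand-in binary.

```shell
# Hypothetical helper: run one neocarta command against several Neo4j databases.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
for_each_database() {
  dbs=$1   # space-separated database names, e.g. "staging prod"
  shift    # remaining arguments are the neocarta command and its flags
  if [ -z "${NEO4J_PASSWORD:-}" ]; then
    echo "NEO4J_PASSWORD must be set in the environment" >&2
    return 1
  fi
  for db in $dbs; do
    # Append the per-invocation override; stop at the first failure.
    "${NEOCARTA_BIN:-neocarta}" "$@" --neo4j-database "$db" || return $?
  done
}
```

For example, for_each_database "staging prod" csv ingest --csv-directory ./datasets/csv would issue the same ingest twice, once per database.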
neocarta bigquery
Run BigQuery connectors against your data warehouse. Two verbs are available: schema for structural metadata and logs for query history.
neocarta bigquery schema
Extracts BigQuery schema metadata and loads Database, Schema, Table, and Column nodes plus their relationships into the Neo4j semantic graph. Reads from BigQuery INFORMATION_SCHEMA tables; primary and foreign keys must be defined there for column-level REFERENCES edges to be created.
neocarta bigquery schema [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --project-id | text | — | GCP_PROJECT_ID | GCP project ID |
| --dataset-id | text | — | BIGQUERY_DATASET_ID | BigQuery dataset to ingest |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions (for models supporting truncation) |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or BigQuery |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# Basic ingest
neocarta bigquery schema \
--project-id my-proj \
--dataset-id sales
# With embeddings
neocarta bigquery schema \
--project-id my-proj \
--dataset-id sales \
--embeddings
# Dry run (JSON)
neocarta bigquery schema \
--project-id my-proj \
--dataset-id sales \
--dry-run --json
# From env vars
GCP_PROJECT_ID=my-proj \
BIGQUERY_DATASET_ID=sales \
neocarta bigquery schema --json
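One way to use --dry-run as a preflight check is to run the command twice: first with --dry-run, then for real only if the dry run succeeded. The sketch below assumes that a dry run with invalid configuration exits with a non-zero status, which this page does not state explicitly. The names preflight_then_run and NEOCARTA_BIN are hypothetical and not part of the CLI.

```shell
# Hypothetical helper: dry-run a neocarta command, then run it for real only
# if the dry run exits with status 0.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
preflight_then_run() {
  if "${NEOCARTA_BIN:-neocarta}" "$@" --dry-run >/dev/null; then
    "${NEOCARTA_BIN:-neocarta}" "$@"
  else
    status=$?   # exit status of the failed dry run
    echo "dry run failed with exit code $status; skipping real run" >&2
    return "$status"
  fi
}
```

A call such as preflight_then_run bigquery schema --project-id my-proj --dataset-id sales would then touch Neo4j and BigQuery only after the planned ingestion printed cleanly.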
neocarta bigquery logs
Extracts query history from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT and loads Query and CTE nodes plus the table/column references each query touches. This is distinct from neocarta query-log ingest, which reads a local file instead of querying BigQuery.
neocarta bigquery logs [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --project-id | text | — | GCP_PROJECT_ID | GCP project ID |
| --dataset-id | text | — | BIGQUERY_DATASET_ID | Dataset whose queries to ingest |
| --region | text | region-us | BIGQUERY_REGION | BigQuery region for INFORMATION_SCHEMA queries |
| --start-date | text (ISO 8601) | 30 days ago | — | Inclusive start timestamp |
| --end-date | text (ISO 8601) | now | — | Inclusive end timestamp |
| --limit | int | 100 | — | Maximum number of queries to extract |
| --include-failed-queries | flag | off | — | Retain queries that errored (default: exclude) |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or BigQuery |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# Basic
neocarta bigquery logs \
--dataset-id sales \
--limit 500
# Date range
neocarta bigquery logs \
--dataset-id sales \
--start-date 2025-01-01 \
--end-date 2025-01-31 \
--json
# Include failed queries
neocarta bigquery logs \
--dataset-id sales \
--include-failed-queries \
--limit 1000
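For a rolling window narrower than the 30-day default, the --start-date and --end-date values can be computed in the shell. The sketch below prints both flags for the last N days in UTC, using the same date-only form as the date range example. The function name last_n_days_flags is hypothetical, and the date arithmetic relies on GNU date with a fallback to BSD/macOS date; other date implementations may need a different invocation.

```shell
# Hypothetical helper: print "--start-date YYYY-MM-DD --end-date YYYY-MM-DD"
# covering the last N days (UTC).
last_n_days_flags() {
  days=$1
  end=$(date -u +%Y-%m-%d)
  # GNU date understands -d "N days ago"; BSD/macOS date uses -v-Nd instead.
  start=$(date -u -d "$days days ago" +%Y-%m-%d 2>/dev/null) ||
    start=$(date -u -v-"${days}"d +%Y-%m-%d)
  echo "--start-date $start --end-date $end"
}
```

The output is meant to be word-split by the shell, for example: neocarta bigquery logs --dataset-id sales $(last_n_days_flags 7).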
neocarta csv
Load metadata from a directory of CSV files into the Neo4j semantic graph using CSVConnector.
neocarta csv ingest
Ingests every entity CSV found in the directory (database_info.csv, schema_info.csv, table_info.csv, column_info.csv, column_references_info.csv, value_info.csv, query_info.csv, and glossary CSVs). Files that are absent are silently skipped. When --embeddings is enabled, description embeddings are generated and written back.
neocarta csv ingest [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --csv-directory | text | — | CSV_DIRECTORY | Directory containing the CSV metadata files |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# Basic ingest from a local directory
neocarta csv ingest --csv-directory ./datasets/csv
# With embeddings
neocarta csv ingest --csv-directory ./datasets/csv --embeddings
# Dry run to validate configuration
CSV_DIRECTORY=./datasets/csv neocarta csv ingest --dry-run --json
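Because absent files are skipped silently, it can help to check a directory before ingesting. The sketch below lists which of the entity CSVs named in the description are missing. The function name missing_entity_csvs is hypothetical, and glossary CSVs are left out because their filenames are not listed on this page.

```shell
# Hypothetical helper: print the entity CSV filenames that are absent from a directory.
# The filename list comes from the csv ingest description; glossary CSVs are not covered.
missing_entity_csvs() {
  dir=$1
  for name in database_info schema_info table_info column_info \
              column_references_info value_info query_info; do
    [ -f "$dir/$name.csv" ] || echo "$name.csv"
  done
}
```

Running missing_entity_csvs ./datasets/csv before neocarta csv ingest prints nothing when all seven files are present.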
neocarta dataplex
Run Dataplex connectors against your Google Cloud catalog. Two verbs are available: schema for BigQuery schema from Dataplex, and glossary for the business glossary.
neocarta dataplex schema
Loads BigQuery schema metadata (Database, Schema, Table, Column) from the Dataplex Universal Catalog using DataplexSchemaConnector. When --embeddings is enabled, Table and Column description embeddings are generated via LiteLLM and written back.
neocarta dataplex schema [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --project-id | text | — | GCP_PROJECT_ID | GCP project ID |
| --project-number | text | — | GCP_PROJECT_NUMBER | GCP project number |
| --dataplex-location | text | — | DATAPLEX_LOCATION | Dataplex location (e.g. us) |
| --dataset-id | text | — | BIGQUERY_DATASET_ID | BigQuery dataset to ingest |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or Dataplex |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
neocarta dataplex schema \
--project-id my-proj \
--project-number 123456789 \
--dataplex-location us \
--dataset-id sales \
--embeddings
neocarta dataplex glossary
Loads the Dataplex business glossary (Glossary, Category, BusinessTerm) plus their relationships using DataplexGlossaryConnector. With --entry-links (the default), it also loads catalog↔glossary entry links as (:Column|:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges. These edges attach to existing schema nodes, so run dataplex schema first for the tags to land correctly.
neocarta dataplex glossary [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --project-id | text | — | GCP_PROJECT_ID | GCP project ID |
| --project-number | text | — | GCP_PROJECT_NUMBER | GCP project number |
| --dataplex-location | text | — | DATAPLEX_LOCATION | Dataplex location (e.g. us) |
| --entry-links / --no-entry-links | flag | --entry-links | — | Load TAGGED_WITH catalog entry links (requires schema nodes to exist) |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate BusinessTerm description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or Dataplex |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
Run neocarta dataplex schema before neocarta dataplex glossary when --entry-links is enabled. Entry links attach TAGGED_WITH edges to schema nodes — if those nodes don’t exist yet, the tags have nothing to land on.
# Load glossary with entry links (default)
neocarta dataplex glossary \
--project-id my-proj \
--project-number 123456789 \
--dataplex-location us \
--embeddings
# Load glossary content only, skip entry link REST round-trips
neocarta dataplex glossary \
--project-id my-proj \
--project-number 123456789 \
--dataplex-location us \
--no-entry-links
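The ordering requirement for entry links can be captured in a small wrapper that loads the schema first and only proceeds to the glossary if that step succeeds. This is a sketch under assumptions: the function name dataplex_full_load and the NEOCARTA_BIN variable are hypothetical, and only flags documented in the two tables above are used.

```shell
# Hypothetical helper: load Dataplex schema first, then the glossary with entry links.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
dataplex_full_load() {
  project_id=$1 project_number=$2 location=$3 dataset_id=$4
  # Step 1: create the Table and Column nodes that TAGGED_WITH edges attach to.
  "${NEOCARTA_BIN:-neocarta}" dataplex schema \
    --project-id "$project_id" \
    --project-number "$project_number" \
    --dataplex-location "$location" \
    --dataset-id "$dataset_id" || return $?
  # Step 2: load the glossary; entry links are requested explicitly for clarity.
  "${NEOCARTA_BIN:-neocarta}" dataplex glossary \
    --project-id "$project_id" \
    --project-number "$project_number" \
    --dataplex-location "$location" \
    --entry-links
}
```

For example, dataplex_full_load my-proj 123456789 us sales runs both commands in the required order and stops if the schema load fails.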
neocarta jdbc
Extract schema from any JDBC-accessible relational database via SchemaCrawler.
neocarta jdbc schema
Shells out to SchemaCrawler (Java) to read Database, Schema, Table, Column, and foreign-key references from any JDBC-accessible database, then loads them into Neo4j using JdbcSchemaConnector. Works against PostgreSQL, MySQL, SQL Server, Oracle, and any other JDBC-compatible source.
neocarta jdbc schema [OPTIONS]
Requires Java 11+, a SchemaCrawler distribution JAR, and a JDBC driver JAR installed on the host. The database password is read only from JDBC_PASSWORD — it is never a CLI flag — to keep it out of shell history and the process list.
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --jdbc-url | text | — | JDBC_URL | JDBC connection URL (e.g. jdbc:postgresql://host:5432/mydb) |
| --jdbc-driver | text | — | JDBC_DRIVER | Fully-qualified JDBC driver class (e.g. org.postgresql.Driver) |
| --jdbc-driver-jar | text | — | JDBC_DRIVER_JAR | Filesystem path to the JDBC driver JAR |
| --schemacrawler-jar | text | — | SCHEMACRAWLER_JAR | Path or classpath glob to the SchemaCrawler distribution JARs |
| --db-user | text | — | JDBC_USER | Database username |
| --source-database-name | text | derived from URL | JDBC_SOURCE_DATABASE_NAME | Name for the graph Database node (required for Oracle SID, SQL Server URLs) |
| --platform | text | — | JDBC_PLATFORM | Hosting platform for the graph Database node (e.g. AWS_RDS) |
| --service | text | SchemaCrawler-reported | JDBC_SERVICE | Database service/engine for the graph Database node |
| --timeout | int | 120 | JDBC_TIMEOUT | Max seconds to wait for the SchemaCrawler subprocess |
| --schema | text (repeatable) | all schemas | — | Schema name to include; repeatable for multiple schemas |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or the source database |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# PostgreSQL
JDBC_PASSWORD=secret \
neocarta jdbc schema \
--jdbc-url jdbc:postgresql://localhost:5432/sales \
--jdbc-driver org.postgresql.Driver \
--jdbc-driver-jar ./drivers/postgresql.jar \
--schemacrawler-jar './schemacrawler/lib/*' \
--db-user analytics
# Scoped schemas
neocarta jdbc schema \
--jdbc-url jdbc:postgresql://localhost:5432/sales \
--jdbc-driver org.postgresql.Driver \
--jdbc-driver-jar ./drivers/postgresql.jar \
--schemacrawler-jar './schemacrawler/lib/*' \
--schema public \
--schema sales \
--embeddings
# Dry run
JDBC_URL=jdbc:postgresql://localhost:5432/sales \
neocarta jdbc schema --dry-run --json
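When the list of schemas comes from a variable rather than being typed by hand, the repeatable --schema flag can be expanded in a loop. The sketch below also refuses to run when JDBC_PASSWORD is unset, since the password is env-only. The names jdbc_schema_for and NEOCARTA_BIN are hypothetical; the remaining connection settings are assumed to come from flags passed through or from the JDBC_* environment variables.

```shell
# Hypothetical helper: run `neocarta jdbc schema` with one --schema flag per name.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
jdbc_schema_for() {
  schemas=$1   # space-separated schema names, e.g. "public sales"
  shift        # remaining arguments are passed through unchanged
  if [ -z "${JDBC_PASSWORD:-}" ]; then
    echo "JDBC_PASSWORD must be set in the environment" >&2
    return 1
  fi
  for s in $schemas; do
    set -- "$@" --schema "$s"   # append one --schema flag per schema name
  done
  "${NEOCARTA_BIN:-neocarta}" jdbc schema "$@"
}
```

With JDBC_PASSWORD exported, jdbc_schema_for "public sales" --db-user analytics expands to the same --schema public --schema sales pair shown in the scoped schemas example.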
neocarta osi
Run OSI (Open Semantic Interchange) connectors. Two verbs are available: ingest loads a YAML spec into Neo4j, and export reads an OsiSemanticModel back out to a YAML file.
neocarta osi ingest
Reads an OSI YAML spec from a local path or an HTTP(S) URL and loads its semantic model into Neo4j using OsiConnector. Ingests OsiSemanticModel, OsiTable, OsiColumn, Query, Metric, Join, and aspect nodes; synonyms in ai_context are upserted as BusinessTerm nodes (merged on name, deduplicating against catalog-derived terms from Dataplex).
neocarta osi ingest [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --spec-source | text | — | OSI_SPEC_SOURCE | Local filesystem path or HTTP(S) URL to the OSI YAML spec |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# Ingest from a local file
neocarta osi ingest --spec-source ./datasets/osi/acme_semantic_model.yaml
# Ingest from a URL with embeddings
neocarta osi ingest \
--spec-source https://example.com/models/my_model.yaml \
--embeddings
# Dry run
OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml \
neocarta osi ingest --dry-run --json
neocarta osi export
Exports an OsiSemanticModel subgraph from Neo4j back to a spec-compliant OSI YAML file using OsiConnector. Reads everything owned by the named model (tables, columns, metrics, joins, aspects) and serializes it with preserved column ordering, native ai_context structure, and literal-block JSON for custom extensions.
neocarta osi export [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --semantic-model-name | text | — | OSI_SEMANTIC_MODEL_NAME | Name of the OsiSemanticModel to export |
| --output-path | text | — | — | Destination path for the exported OSI YAML file (required) |
| --dry-run | flag | off | — | Print planned export without touching Neo4j |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
If --semantic-model-name does not match any OsiSemanticModel in the graph, the command exits with code 3 (not_found). Use neocarta tool list-schemas to inspect available models.
# Export to a local file
neocarta osi export \
--semantic-model-name acme_corp_model \
--output-path ./acme.yaml
# Using env var for model name
OSI_SEMANTIC_MODEL_NAME=acme_corp_model \
neocarta osi export --output-path ./acme.yaml
# Dry run
neocarta osi export \
--semantic-model-name acme_corp_model \
--output-path ./acme.yaml \
--dry-run --json
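In scripts, the not_found exit code (3) can be told apart from other failures. The sketch below wraps the export and prints a different message for each case. The function name export_model, the NEOCARTA_BIN variable, and the message wording are all hypothetical; only exit code 3 is taken from this page, and other non-zero codes are treated generically.

```shell
# Hypothetical helper: export a model and distinguish not_found (exit code 3)
# from other failures.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
export_model() {
  model=$1 output=$2
  status=0
  "${NEOCARTA_BIN:-neocarta}" osi export \
    --semantic-model-name "$model" \
    --output-path "$output" || status=$?
  case $status in
    0) echo "exported $model to $output" ;;
    3) echo "no OsiSemanticModel named '$model'; check the model name" >&2 ;;
    *) echo "export failed with exit code $status" >&2 ;;
  esac
  return "$status"
}
```

The wrapper returns the original exit status, so callers can still branch on it after the message is printed.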
neocarta query-log
Parse a local query-log JSON file into Neo4j.
neocarta query-log ingest
Reads a local query-log JSON file (currently the BigQuery export format) and loads Query and CTE nodes plus the Database, Schema, Table, and Column structure and table/column references each query touches. No embeddings are generated — query-log nodes carry no descriptions to embed.
This is distinct from neocarta bigquery logs, which pulls query history live from BigQuery (INFORMATION_SCHEMA.JOBS_BY_PROJECT). Use this command when you already have an exported log file on disk.
neocarta query-log ingest [OPTIONS]
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --query-log-file | text | — | QUERY_LOG_FILE | Path to the query-log JSON file |
| --source | text | bigquery | — | Source/format of the query-log file |
| --dry-run | flag | off | — | Print planned ingestion without reading the file or touching Neo4j |
| --neo4j-uri | text | — | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | — | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |
# Ingest from a local file
neocarta query-log ingest --query-log-file ./query_logs.json
# Using env var
QUERY_LOG_FILE=./query_logs.json neocarta query-log ingest --dry-run --json
neocarta databricks
Load Databricks Unity Catalog governed-tag definitions into the Neo4j semantic graph.
neocarta databricks tags
Reads governed-tag definitions (tag policies) via the Databricks SDK — no SQL warehouse required — and loads GovernanceTagKey and GovernanceTagValue nodes with HAS_VALUE_OPTION edges. Platform/partner-managed tags (matching --system-prefixes) are excluded by default. When --embeddings is enabled, GovernanceTagKey description embeddings are generated via LiteLLM and written back.
neocarta databricks tags [OPTIONS]
Requires pip install "neocarta[databricks]". The workspace access token is read only from DATABRICKS_TOKEN — it is never a CLI flag. If the databricks extra is not installed, the command exits with a clear usage_error (code 2) rather than a raw import traceback.
| Flag | Type | Default | Env var | Description |
|---|---|---|---|---|
| --host | text | — | DATABRICKS_HOST | Databricks workspace URL (e.g. https://dbc-xxxx.cloud.databricks.com) |
| --source | text | metastore ID | — | Explicit namespace for governance-tag node IDs |
| --system-prefixes | text | system.,class.,ai.,sap. | — | Comma-separated tag-key prefixes treated as system tags and excluded by default |
| --include-system-tags / --no-include-system-tags | flag | --no-include-system-tags | — | Also ingest platform-managed tags matching --system-prefixes |
| --embeddings / --no-embeddings | flag | --no-embeddings | — | Generate GovernanceTagKey description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | — | Print planned ingestion without touching Neo4j or Databricks |
# Load user-defined tags (excluding system tags)
DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com \
DATABRICKS_TOKEN=dapi... \
neocarta databricks tags
# Include system tags and generate embeddings
neocarta databricks tags \
--host https://dbc-xxxx.cloud.databricks.com \
--include-system-tags \
--embeddings
# Dry run to verify configuration
neocarta databricks tags \
--host https://dbc-xxxx.cloud.databricks.com \
--dry-run --json
neocarta mcp serve
Starts the Neocarta MCP server over stdio, exposing the Neo4j semantic-layer retrieval tools to any MCP client (Claude Desktop, the bundled Text2SQL agent, or any other MCP-compatible host). This is the CLI-native equivalent of the standalone neocarta-mcp console script.
neocarta mcp serve [OPTIONS]
Requires pip install "neocarta[mcp]" in addition to [cli]. If the mcp extra is not installed, the command exits with a clear usage_error (code 2). Install both at once with pip install "neocarta[cli,mcp]".
| Flag | Type | Default | Description |
|---|---|---|---|
| --dry-run | flag | off | Print the planned server configuration without starting it |
The server communicates over stdio and owns stdout for the MCP protocol. The startup notice and all diagnostics go to stderr. Configuration is read from the standard NEO4J_* and EMBEDDING_* environment variables (or a .env file).
# Start the MCP server
neocarta mcp serve
# Verify configuration without starting
neocarta mcp serve --dry-run --json
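Since stdout carries the MCP protocol and diagnostics go to stderr, a launcher script can capture diagnostics by redirecting stderr alone. The sketch below appends stderr to a log file and leaves stdout untouched. The function name serve_with_log and the NEOCARTA_BIN variable are hypothetical, and the log path is whatever the caller supplies.

```shell
# Hypothetical helper: start the MCP server with diagnostics appended to a log file.
# NEOCARTA_BIN is an illustrative override (defaults to the real neocarta binary).
serve_with_log() {
  log_file=$1
  # stdout is left alone for the MCP protocol; only stderr is redirected.
  "${NEOCARTA_BIN:-neocarta}" mcp serve 2>>"$log_file"
}
```

Redirecting stdout as well (for example with &> or 2>&1) would mix diagnostics into the protocol stream, so the sketch deliberately avoids it.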
For Claude Desktop integration, use the standalone neocarta-mcp console script via uvx rather than neocarta mcp serve. See the MCP server documentation for the full claude_desktop_config.json snippet.