Neocarta CLI Connector Command Reference

The connector commands form the core of the Neocarta CLI — each one wraps a Python connector class behind a consistent noun-verb interface that handles credential resolution, Neo4j driver lifecycle, and optional embedding generation. This page documents every flag, its corresponding environment variable, and runnable examples for each command group.

All connector commands accept --neo4j-uri, --neo4j-username, and --neo4j-database to override the corresponding env vars per invocation. NEO4J_PASSWORD is read only from the environment; there is no corresponding flag. See Configuration for the full environment variable reference.
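The override order described above can be pictured as a simple fallback chain. The sketch below is illustrative only: resolve_setting is a hypothetical helper, not part of Neocarta, and it assumes the documented default applies only when neither the flag nor the environment variable is set.

```shell
# Hypothetical helper (not part of Neocarta): models flag > env var > default.
resolve_setting() {
  flag_value="$1"     # value passed on the command line, may be empty
  env_value="$2"      # value of the matching environment variable, may be empty
  default_value="$3"  # documented default, may be empty
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "$env_value" ]; then
    printf '%s\n' "$env_value"
  else
    printf '%s\n' "$default_value"
  fi
}

# The flag value wins over the env var, so this prints "analytics".
resolve_setting "analytics" "neo4j_from_env" "neo4j"
```

In practice this means a value such as NEO4J_DATABASE can stay exported in the shell while a single run targets a different database through --neo4j-database.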

`neocarta bigquery`

Run BigQuery connectors against your data warehouse. Two verbs are available: schema for structural metadata and logs for query history.

`neocarta bigquery schema`

Extracts BigQuery schema metadata and loads Database, Schema, Table, and Column nodes plus their relationships into the Neo4j semantic graph. Metadata is read from the BigQuery INFORMATION_SCHEMA views; column-level REFERENCES edges are created only where primary and foreign keys are declared on the tables and therefore appear in those views.

neocarta bigquery schema [OPTIONS]

Flag	Type	Default	Env var	Description
`--project-id`	text	—	`GCP_PROJECT_ID`	GCP project ID
`--dataset-id`	text	—	`BIGQUERY_DATASET_ID`	BigQuery dataset to ingest
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate description embeddings after load
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions (for models supporting truncation)
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or BigQuery
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# Basic ingest
neocarta bigquery schema \
  --project-id my-proj \
  --dataset-id sales

# With embeddings
neocarta bigquery schema \
  --project-id my-proj \
  --dataset-id sales \
  --embeddings

# Dry run (JSON)
neocarta bigquery schema \
  --project-id my-proj \
  --dataset-id sales \
  --dry-run --json

# From env vars
GCP_PROJECT_ID=my-proj \
BIGQUERY_DATASET_ID=sales \
neocarta bigquery schema --json

`neocarta bigquery logs`

Extracts query history from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT and loads Query and CTE nodes plus the table/column references each query touches. This is distinct from neocarta query-log ingest, which reads an exported log file from disk instead of querying BigQuery live.

neocarta bigquery logs [OPTIONS]

Flag	Type	Default	Env var	Description
`--project-id`	text	—	`GCP_PROJECT_ID`	GCP project ID
`--dataset-id`	text	—	`BIGQUERY_DATASET_ID`	Dataset whose queries to ingest
`--region`	text	`region-us`	`BIGQUERY_REGION`	BigQuery region for `INFORMATION_SCHEMA` queries
`--start-date`	text (ISO 8601)	30 days ago	—	Inclusive start timestamp
`--end-date`	text (ISO 8601)	now	—	Inclusive end timestamp
`--limit`	int	`100`	—	Maximum number of queries to extract
`--include-failed-queries`	flag	off	—	Retain queries that errored (default: exclude)
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate embeddings after load
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or BigQuery
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# Basic
neocarta bigquery logs \
  --dataset-id sales \
  --limit 500

# Date range
neocarta bigquery logs \
  --dataset-id sales \
  --start-date 2025-01-01 \
  --end-date 2025-01-31 \
  --json

# Include failed queries
neocarta bigquery logs \
  --dataset-id sales \
  --include-failed-queries \
  --limit 1000
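When the default 30-day window is too wide, the start and end dates can be computed in the shell rather than typed by hand. The following is a minimal sketch: days_ago is a hypothetical helper, and it assumes either GNU date or BSD/macOS date is available. It produces plain YYYY-MM-DD values, the same form used in the date range example above.

```shell
# Print the UTC date N days ago as YYYY-MM-DD.
# Tries GNU date syntax first, then falls back to BSD/macOS syntax.
days_ago() {
  date -u -d "$1 days ago" +%Y-%m-%d 2>/dev/null || date -u -v-"$1"d +%Y-%m-%d
}

START_DATE="$(days_ago 7)"
END_DATE="$(date -u +%Y-%m-%d)"
echo "window: $START_DATE to $END_DATE"
```

The two variables can then be passed as --start-date "$START_DATE" --end-date "$END_DATE" to neocarta bigquery logs.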

`neocarta csv`

Load metadata from a directory of CSV files into the Neo4j semantic graph using CSVConnector.

`neocarta csv ingest`

Ingests every entity CSV found in the directory (database_info.csv, schema_info.csv, table_info.csv, column_info.csv, column_references_info.csv, value_info.csv, query_info.csv, and glossary CSVs). Files that are absent are silently skipped. When --embeddings is enabled, description embeddings are generated and written back.

neocarta csv ingest [OPTIONS]

Flag	Type	Default	Env var	Description
`--csv-directory`	text	—	`CSV_DIRECTORY`	Directory containing the CSV metadata files
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate description embeddings after ingest
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# Basic ingest from a local directory
neocarta csv ingest --csv-directory ./datasets/csv

# With embeddings
neocarta csv ingest --csv-directory ./datasets/csv --embeddings

# Dry run to validate configuration
CSV_DIRECTORY=./datasets/csv neocarta csv ingest --dry-run --json
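Because absent files are skipped silently, a misspelled file name does not produce an error. One way to catch that before ingesting is to list which of the named entity files are actually present. This is a sketch under assumptions: report_csv_files is a hypothetical helper, and the glossary CSV names are not listed on this page, so they are left out of the check.

```shell
# Entity files named in the csv ingest description. Glossary CSV names are
# not listed there, so this check does not cover them.
EXPECTED_CSVS="database_info.csv schema_info.csv table_info.csv column_info.csv column_references_info.csv value_info.csv query_info.csv"

# Hypothetical helper: report which expected files exist in a directory.
report_csv_files() {
  dir="$1"
  for name in $EXPECTED_CSVS; do
    if [ -f "$dir/$name" ]; then
      echo "found:   $name"
    else
      echo "missing: $name"
    fi
  done
}
```

Running report_csv_files ./datasets/csv before neocarta csv ingest shows at a glance which entity types will be loaded.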

`neocarta dataplex`

Run Dataplex connectors against your Google Cloud catalog. Two verbs are available: schema for BigQuery schema from Dataplex, and glossary for the business glossary.

`neocarta dataplex schema`

Loads BigQuery schema metadata (Database, Schema, Table, Column) from the Dataplex Universal Catalog using DataplexSchemaConnector. When --embeddings is enabled, Table and Column description embeddings are generated via LiteLLM and written back.

neocarta dataplex schema [OPTIONS]

Flag	Type	Default	Env var	Description
`--project-id`	text	—	`GCP_PROJECT_ID`	GCP project ID
`--project-number`	text	—	`GCP_PROJECT_NUMBER`	GCP project number
`--dataplex-location`	text	—	`DATAPLEX_LOCATION`	Dataplex location (e.g. `us`)
`--dataset-id`	text	—	`BIGQUERY_DATASET_ID`	BigQuery dataset to ingest
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate description embeddings after load
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or Dataplex
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

neocarta dataplex schema \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --dataset-id sales \
  --embeddings

`neocarta dataplex glossary`

Loads the Dataplex business glossary (Glossary, Category, BusinessTerm) plus their relationships using DataplexGlossaryConnector. With --entry-links (the default), it also loads catalog↔glossary entry links as (:Column|:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges. These edges attach to existing schema nodes, so run dataplex schema first for the tags to land correctly.

neocarta dataplex glossary [OPTIONS]

Flag	Type	Default	Env var	Description
`--project-id`	text	—	`GCP_PROJECT_ID`	GCP project ID
`--project-number`	text	—	`GCP_PROJECT_NUMBER`	GCP project number
`--dataplex-location`	text	—	`DATAPLEX_LOCATION`	Dataplex location (e.g. `us`)
`--entry-links` / `--no-entry-links`	flag	`--entry-links`	—	Load `TAGGED_WITH` catalog entry links (requires schema nodes to exist)
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate `BusinessTerm` description embeddings after load
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or Dataplex
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

Run neocarta dataplex schema before neocarta dataplex glossary when --entry-links is enabled. Entry links attach TAGGED_WITH edges to schema nodes — if those nodes don’t exist yet, the tags have nothing to land on.

# Load glossary with entry links (default)
neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --embeddings

# Load glossary content only, skip entry link REST round-trips
neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --no-entry-links
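Since entry links only attach to schema nodes that already exist, the two Dataplex commands are often run as a pair. The wrapper below is a hypothetical sketch, not something shipped with Neocarta: it chains the commands with && so the glossary load runs only if the schema load succeeds. The NEOCARTA variable is an assumption added here so the command can be swapped for a stub.

```shell
# Hypothetical wrapper: load schema first, then glossary, stopping on failure.
# Set NEOCARTA=echo to preview the two commands without running them.
load_dataplex() {
  project_id="$1"; project_number="$2"; location="$3"; dataset_id="$4"
  "${NEOCARTA:-neocarta}" dataplex schema \
    --project-id "$project_id" \
    --project-number "$project_number" \
    --dataplex-location "$location" \
    --dataset-id "$dataset_id" &&
  "${NEOCARTA:-neocarta}" dataplex glossary \
    --project-id "$project_id" \
    --project-number "$project_number" \
    --dataplex-location "$location"
}
```

For example, load_dataplex my-proj 123456789 us sales runs both steps in order with the values used in the examples above.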

`neocarta jdbc`

Extract schema from any JDBC-accessible relational database via SchemaCrawler.

`neocarta jdbc schema`

Shells out to SchemaCrawler (Java) to read Database, Schema, Table, Column, and foreign-key references from any JDBC-accessible database, then loads them into Neo4j using JdbcSchemaConnector. Works against PostgreSQL, MySQL, SQL Server, Oracle, and any other JDBC-compatible source.

neocarta jdbc schema [OPTIONS]

Requires Java 11+, a SchemaCrawler distribution JAR, and a JDBC driver JAR installed on the host. The database password is read only from JDBC_PASSWORD — it is never a CLI flag — to keep it out of shell history and the process list.

Flag	Type	Default	Env var	Description
`--jdbc-url`	text	—	`JDBC_URL`	JDBC connection URL (e.g. `jdbc:postgresql://host:5432/mydb`)
`--jdbc-driver`	text	—	`JDBC_DRIVER`	Fully-qualified JDBC driver class (e.g. `org.postgresql.Driver`)
`--jdbc-driver-jar`	text	—	`JDBC_DRIVER_JAR`	Filesystem path to the JDBC driver JAR
`--schemacrawler-jar`	text	—	`SCHEMACRAWLER_JAR`	Path or classpath glob to the SchemaCrawler distribution JARs
`--db-user`	text	—	`JDBC_USER`	Database username
`--source-database-name`	text	derived from URL	`JDBC_SOURCE_DATABASE_NAME`	Name for the graph `Database` node (required for Oracle SID, SQL Server URLs)
`--platform`	text	—	`JDBC_PLATFORM`	Hosting platform for the graph `Database` node (e.g. `AWS_RDS`)
`--service`	text	SchemaCrawler-reported	`JDBC_SERVICE`	Database service/engine for the graph `Database` node
`--timeout`	int	`120`	`JDBC_TIMEOUT`	Max seconds to wait for the SchemaCrawler subprocess
`--schema`	text (repeatable)	all schemas	—	Schema name to include; repeatable for multiple schemas
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate description embeddings after ingest
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or the source database
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# PostgreSQL
JDBC_PASSWORD=secret \
neocarta jdbc schema \
  --jdbc-url jdbc:postgresql://localhost:5432/sales \
  --jdbc-driver org.postgresql.Driver \
  --jdbc-driver-jar ./drivers/postgresql.jar \
  --schemacrawler-jar './schemacrawler/lib/*' \
  --db-user analytics

# Scoped schemas
neocarta jdbc schema \
  --jdbc-url jdbc:postgresql://localhost:5432/sales \
  --jdbc-driver org.postgresql.Driver \
  --jdbc-driver-jar ./drivers/postgresql.jar \
  --schemacrawler-jar './schemacrawler/lib/*' \
  --schema public \
  --schema sales \
  --embeddings

# Dry run
JDBC_URL=jdbc:postgresql://localhost:5432/sales \
neocarta jdbc schema --dry-run --json
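The host requirements for neocarta jdbc schema (Java, a driver JAR, and JDBC_PASSWORD in the environment) can be checked up front. The function below is a hypothetical preflight sketch, not part of Neocarta. It only checks that java is on the PATH and does not verify the Java version or the SchemaCrawler JARs.

```shell
# Hypothetical preflight: prints one line per problem found and returns
# non-zero if any check fails. Does not verify the Java version.
jdbc_preflight() {
  driver_jar="$1"
  problems=0
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH (Java 11+ is required)"
    problems=1
  fi
  if [ ! -f "$driver_jar" ]; then
    echo "JDBC driver JAR not found: $driver_jar"
    problems=1
  fi
  if [ -z "${JDBC_PASSWORD:-}" ]; then
    echo "JDBC_PASSWORD is not set"
    problems=1
  fi
  return "$problems"
}
```

A typical use would be jdbc_preflight ./drivers/postgresql.jar followed by the neocarta jdbc schema command only when the function returns zero.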

`neocarta osi`

Run OSI (Open Semantic Interchange) connectors. Two verbs are available: ingest loads a YAML spec into Neo4j, and export reads an OsiSemanticModel back out to a YAML file.

`neocarta osi ingest`

Reads an OSI YAML spec from a local path or an HTTP(S) URL and loads its semantic model into Neo4j using OsiConnector. Ingests OsiSemanticModel, OsiTable, OsiColumn, Query, Metric, Join, and aspect nodes; synonyms in ai_context are upserted as BusinessTerm nodes (merged on name, deduplicating against catalog-derived terms from Dataplex).

neocarta osi ingest [OPTIONS]

Flag	Type	Default	Env var	Description
`--spec-source`	text	—	`OSI_SPEC_SOURCE`	Local filesystem path or HTTP(S) URL to the OSI YAML spec
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate description embeddings after ingest
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# Ingest from a local file
neocarta osi ingest --spec-source ./datasets/osi/acme_semantic_model.yaml

# Ingest from a URL with embeddings
neocarta osi ingest \
  --spec-source https://example.com/models/my_model.yaml \
  --embeddings

# Dry run
OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml \
neocarta osi ingest --dry-run --json

`neocarta osi export`

Exports an OsiSemanticModel subgraph from Neo4j back to a spec-compliant OSI YAML file using OsiConnector. Reads everything owned by the named model (tables, columns, metrics, joins, aspects) and serializes it with preserved column ordering, native ai_context structure, and literal-block JSON for custom extensions.

neocarta osi export [OPTIONS]

Flag	Type	Default	Env var	Description
`--semantic-model-name`	text	—	`OSI_SEMANTIC_MODEL_NAME`	Name of the `OsiSemanticModel` to export
`--output-path`	text	—	—	Destination path for the exported OSI YAML file (required)
`--dry-run`	flag	off	—	Print planned export without touching Neo4j
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

If --semantic-model-name does not match any OsiSemanticModel in the graph, the command exits with code 3 (not_found). Use neocarta tool list-schemas to inspect available models.

# Export to a local file
neocarta osi export \
  --semantic-model-name acme_corp_model \
  --output-path ./acme.yaml

# Using env var for model name
OSI_SEMANTIC_MODEL_NAME=acme_corp_model \
neocarta osi export --output-path ./acme.yaml

# Dry run
neocarta osi export \
  --semantic-model-name acme_corp_model \
  --output-path ./acme.yaml \
  --dry-run --json
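Scripts that call neocarta osi export can branch on the exit code instead of parsing output. The mapping below is a sketch: describe_exit is a hypothetical helper that covers only the two codes named on this page (2 for usage_error, 3 for not_found). Treating 0 as success and grouping every other code as "other" are assumptions.

```shell
# Hypothetical helper: map an exit code to the label used on this page.
# Codes 2 and 3 come from this reference; 0 = ok and "other" are assumptions.
describe_exit() {
  case "$1" in
    0) echo "ok" ;;
    2) echo "usage_error" ;;
    3) echo "not_found" ;;
    *) echo "other" ;;
  esac
}
```

After an export, describe_exit "$?" yields not_found when the model name did not match any OsiSemanticModel in the graph.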

`neocarta query-log`

Parse a local query-log JSON file into Neo4j.

`neocarta query-log ingest`

Reads a local query-log JSON file (currently the BigQuery export format) and loads Query and CTE nodes plus the Database, Schema, Table, and Column structure and table/column references each query touches. No embeddings are generated because query-log nodes carry no descriptions to embed. This is distinct from neocarta bigquery logs, which pulls query history live from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT. Use this command when you already have an exported log file on disk.

neocarta query-log ingest [OPTIONS]

Flag	Type	Default	Env var	Description
`--query-log-file`	text	—	`QUERY_LOG_FILE`	Path to the query-log JSON file
`--source`	text	`bigquery`	—	Source/format of the query-log file
`--dry-run`	flag	off	—	Print planned ingestion without reading the file or touching Neo4j
`--neo4j-uri`	text	—	`NEO4J_URI`	Neo4j Bolt URI override
`--neo4j-username`	text	—	`NEO4J_USERNAME`	Neo4j username override
`--neo4j-database`	text	`neo4j`	`NEO4J_DATABASE`	Neo4j database name override

# Ingest from a local file
neocarta query-log ingest --query-log-file ./query_logs.json

# Using env var
QUERY_LOG_FILE=./query_logs.json neocarta query-log ingest --dry-run --json

`neocarta databricks`

Load Databricks Unity Catalog governed-tag definitions into the Neo4j semantic graph.

`neocarta databricks tags`

Reads governed-tag definitions (tag policies) via the Databricks SDK — no SQL warehouse required — and loads GovernanceTagKey and GovernanceTagValue nodes with HAS_VALUE_OPTION edges. Platform/partner-managed tags (matching --system-prefixes) are excluded by default. When --embeddings is enabled, GovernanceTagKey description embeddings are generated via LiteLLM and written back.

neocarta databricks tags [OPTIONS]

Requires pip install "neocarta[databricks]". The workspace access token is read only from DATABRICKS_TOKEN — it is never a CLI flag. If the databricks extra is not installed, the command exits with a clear usage_error (code 2) rather than a raw import traceback.

Flag	Type	Default	Env var	Description
`--host`	text	—	`DATABRICKS_HOST`	Databricks workspace URL (e.g. `https://dbc-xxxx.cloud.databricks.com`)
`--source`	text	metastore ID	—	Explicit namespace for governance-tag node IDs
`--system-prefixes`	text	`system.,class.,ai.,sap.`	—	Comma-separated tag-key prefixes treated as system tags and excluded by default
`--include-system-tags` / `--no-include-system-tags`	flag	`--no-include-system-tags`	—	Also ingest platform-managed tags matching `--system-prefixes`
`--embeddings` / `--no-embeddings`	flag	`--no-embeddings`	—	Generate `GovernanceTagKey` description embeddings after load
`--embedding-model`	text	`text-embedding-3-small`	`EMBEDDING_MODEL`	LiteLLM embedding model id
`--embedding-dimensions`	int	auto-detected	`EMBEDDING_DIMENSIONS`	Vector dimensions
`--embedding-batch-size`	int	`100`	`EMBEDDING_BATCH_SIZE`	Nodes per embedding batch
`--dry-run`	flag	off	—	Print planned ingestion without touching Neo4j or Databricks

# Load user-defined tags (excluding system tags)
DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com \
DATABRICKS_TOKEN=dapi... \
neocarta databricks tags

# Include system tags and generate embeddings
neocarta databricks tags \
  --host https://dbc-xxxx.cloud.databricks.com \
  --include-system-tags \
  --embeddings

# Dry run to verify configuration
neocarta databricks tags \
  --host https://dbc-xxxx.cloud.databricks.com \
  --dry-run --json
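To see which tag keys the default --system-prefixes value would exclude, the prefix test can be reproduced in the shell. This is a sketch that assumes plain starts-with matching against each comma-separated prefix; is_system_tag is a hypothetical helper and the connector's actual matching logic may differ.

```shell
# Hypothetical helper: succeeds when the tag key starts with any prefix in a
# comma-separated list. Assumes plain starts-with matching.
is_system_tag() {
  key="$1"
  old_ifs="$IFS"
  IFS=','
  for prefix in $2; do
    # Skip empty entries so a stray comma does not match every key.
    [ -n "$prefix" ] || continue
    case "$key" in
      "$prefix"*) IFS="$old_ifs"; return 0 ;;
    esac
  done
  IFS="$old_ifs"
  return 1
}
```

With the default list, is_system_tag "system.pii" "system.,class.,ai.,sap." succeeds, while a key such as team.owner does not match and would be ingested.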

`neocarta mcp serve`

Starts the Neocarta MCP server over stdio, exposing the Neo4j semantic-layer retrieval tools to any MCP client (Claude Desktop, the bundled Text2SQL agent, or any other MCP-compatible host). This is the CLI-native equivalent of the standalone neocarta-mcp console script.

neocarta mcp serve [OPTIONS]

Requires pip install "neocarta[mcp]" in addition to [cli]. If the mcp extra is not installed, the command exits with a clear usage_error (code 2). Install both at once with pip install "neocarta[cli,mcp]".

Flag	Type	Default	Description
`--dry-run`	flag	off	Print the planned server configuration without starting it

The server communicates over stdio and owns stdout for the MCP protocol. The startup notice and all diagnostics go to stderr. Configuration is read from the standard NEO4J_* and EMBEDDING_* environment variables (or a .env file).

# Start the MCP server
neocarta mcp serve

# Verify configuration without starting
neocarta mcp serve --dry-run --json

For Claude Desktop integration, use the standalone neocarta-mcp console script via uvx rather than neocarta mcp serve. See the MCP server documentation for the full claude_desktop_config.json snippet.
