

The connector commands form the core of the Neocarta CLI. Each one wraps a Python connector class behind a consistent noun-verb interface that handles credential resolution, the Neo4j driver lifecycle, and optional embedding generation. This page documents every flag and its corresponding environment variable, with runnable examples for each command group.
All connector commands accept --neo4j-uri, --neo4j-username, and --neo4j-database to override the corresponding env vars per invocation. NEO4J_PASSWORD is always env-only. See Configuration for the full environment variable reference.
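Flags take precedence over env vars for a single run, so one exported set of NEO4J_* variables can serve several target databases. The bash function below is a hypothetical convenience wrapper and is not part of Neocarta. It appends --neo4j-database to whatever command it is given. It also refuses to run when NEO4J_PASSWORD is unset, because the password has no flag equivalent.

```bash
# Hypothetical wrapper (not part of Neocarta): run a connector command
# against a named Neo4j database, reusing NEO4J_URI, NEO4J_USERNAME and
# NEO4J_PASSWORD from the environment.
neocarta_into() {
  if [ $# -lt 2 ]; then
    echo "usage: neocarta_into DATABASE COMMAND..." >&2
    return 2
  fi
  # NEO4J_PASSWORD is env-only, so check for it before calling the CLI.
  if [ -z "${NEO4J_PASSWORD:-}" ]; then
    echo "export NEO4J_PASSWORD first" >&2
    return 2
  fi
  local db="$1"
  shift
  # The per-invocation flag overrides NEO4J_DATABASE for this run only.
  neocarta "$@" --neo4j-database "$db"
}

# Example:
#   neocarta_into staging csv ingest --csv-directory ./datasets/csv
```

Only use it with commands whose flag table lists --neo4j-database.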

neocarta bigquery

Run BigQuery connectors against your data warehouse. Two verbs are available: schema for structural metadata and logs for query history.

neocarta bigquery schema

Extracts BigQuery schema metadata and loads Database, Schema, Table, and Column nodes plus their relationships into the Neo4j semantic graph. Metadata is read from BigQuery's INFORMATION_SCHEMA views. Column-level REFERENCES edges are therefore created only when primary and foreign key constraints are declared on the source tables and appear there.
neocarta bigquery schema [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --project-id | text | | GCP_PROJECT_ID | GCP project ID |
| --dataset-id | text | | BIGQUERY_DATASET_ID | BigQuery dataset to ingest |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions (for models supporting truncation) |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or BigQuery |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

neocarta bigquery schema \
  --project-id my-proj \
  --dataset-id sales
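A cautious way to run any of these loads is to dry-run first and perform the real ingestion only if the dry run succeeds. The function below sketches that pattern for bigquery schema. The helper name is hypothetical. The sketch also assumes, though this page does not confirm it, that --dry-run exits non-zero when required configuration such as GCP_PROJECT_ID is missing.

```bash
# Hypothetical helper: dry-run a neocarta command, then run it for real
# only if the dry run exited with status 0.
preflight_then_run() {
  # On failure, `return` with no argument propagates the dry run's status.
  neocarta "$@" --dry-run || return
  neocarta "$@"
}

# Example, with GCP_PROJECT_ID and BIGQUERY_DATASET_ID exported:
#   preflight_then_run bigquery schema
```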

neocarta bigquery logs

Extracts query history from BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT and loads Query and CTE nodes plus the table/column references each query touches. This is distinct from neocarta query-log ingest, which reads an exported log file from local disk instead of querying BigQuery.
neocarta bigquery logs [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --project-id | text | | GCP_PROJECT_ID | GCP project ID |
| --dataset-id | text | | BIGQUERY_DATASET_ID | Dataset whose queries to ingest |
| --region | text | region-us | BIGQUERY_REGION | BigQuery region for INFORMATION_SCHEMA queries |
| --start-date | text (ISO 8601) | 30 days ago | | Inclusive start timestamp |
| --end-date | text (ISO 8601) | now | | Inclusive end timestamp |
| --limit | int | 100 | | Maximum number of queries to extract |
| --include-failed-queries | flag | off | | Retain queries that errored (default: exclude) |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or BigQuery |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

neocarta bigquery logs \
  --dataset-id sales \
  --limit 500
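You can bound the history window explicitly with --start-date and --end-date instead of relying on the 30-days-ago default. The sketch below computes a trailing N-day window in UTC. Several parts of it are assumptions:

- The function name is hypothetical.
- The Z-suffixed timestamp form is an assumption about what the flags accept.
- `date -d @EPOCH` requires GNU coreutils or BusyBox. On macOS, date uses -r instead.

```bash
# Hypothetical helper: ingest the last N days of BigQuery query history.
# Assumes a date(1) that accepts -d @EPOCH (GNU coreutils or BusyBox).
ingest_recent_bq_logs() {
  if [ $# -lt 1 ]; then
    echo "usage: ingest_recent_bq_logs DAYS [FLAGS...]" >&2
    return 2
  fi
  local days="$1"
  shift
  local now start end
  # Use one "now" for both ends so the window is exactly N days long.
  now="$(date -u +%s)"
  start="$(date -u -d "@$(( now - days * 86400 ))" +%Y-%m-%dT%H:%M:%SZ)" || return
  end="$(date -u -d "@${now}" +%Y-%m-%dT%H:%M:%SZ)"
  neocarta bigquery logs --start-date "$start" --end-date "$end" "$@"
}

# Example:
#   ingest_recent_bq_logs 7 --dataset-id sales --limit 500
```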

neocarta csv

Load metadata from a directory of CSV files into the Neo4j semantic graph using CSVConnector.

neocarta csv ingest

Ingests every entity CSV found in the directory (database_info.csv, schema_info.csv, table_info.csv, column_info.csv, column_references_info.csv, value_info.csv, query_info.csv, and glossary CSVs). Files that are absent are silently skipped. When --embeddings is enabled, description embeddings are generated and written back.
neocarta csv ingest [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --csv-directory | text | | CSV_DIRECTORY | Directory containing the CSV metadata files |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

# Basic ingest from a local directory
neocarta csv ingest --csv-directory ./datasets/csv

# With embeddings
neocarta csv ingest --csv-directory ./datasets/csv --embeddings

# Dry run to validate configuration
CSV_DIRECTORY=./datasets/csv neocarta csv ingest --dry-run --json
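Missing CSVs are skipped without an error, so a misnamed file simply produces an emptier graph. The preflight check below reports which of the entity files listed above are present. It is a hypothetical helper, not a Neocarta command. Glossary CSVs are left out because their filenames are not given on this page.

```bash
# Hypothetical preflight check: list which entity CSVs csv ingest will find.
check_csv_directory() {
  local dir="${1:-${CSV_DIRECTORY:-}}"
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "not a directory: ${dir:-<unset>}" >&2
    return 2
  fi
  local f
  for f in database_info.csv schema_info.csv table_info.csv column_info.csv \
           column_references_info.csv value_info.csv query_info.csv; do
    if [ -f "$dir/$f" ]; then
      echo "present: $f"
    else
      echo "missing (will be skipped): $f"
    fi
  done
}

# Example:
#   check_csv_directory ./datasets/csv
```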

neocarta dataplex

Run Dataplex connectors against your Google Cloud catalog. Two verbs are available: schema for BigQuery schema from Dataplex, and glossary for the business glossary.

neocarta dataplex schema

Loads BigQuery schema metadata (Database, Schema, Table, Column) from the Dataplex Universal Catalog using DataplexSchemaConnector. When --embeddings is enabled, Table and Column description embeddings are generated via LiteLLM and written back.
neocarta dataplex schema [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --project-id | text | | GCP_PROJECT_ID | GCP project ID |
| --project-number | text | | GCP_PROJECT_NUMBER | GCP project number |
| --dataplex-location | text | | DATAPLEX_LOCATION | Dataplex location (e.g. us) |
| --dataset-id | text | | BIGQUERY_DATASET_ID | BigQuery dataset to ingest |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or Dataplex |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

neocarta dataplex schema \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --dataset-id sales \
  --embeddings

neocarta dataplex glossary

Loads the Dataplex business glossary (Glossary, Category, BusinessTerm) plus their relationships using DataplexGlossaryConnector. With --entry-links (the default), it also loads catalog↔glossary entry links as (:Column|:Table)-[:TAGGED_WITH]->(:BusinessTerm) edges. These edges attach to existing schema nodes, so run dataplex schema first for the tags to land correctly.
neocarta dataplex glossary [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --project-id | text | | GCP_PROJECT_ID | GCP project ID |
| --project-number | text | | GCP_PROJECT_NUMBER | GCP project number |
| --dataplex-location | text | | DATAPLEX_LOCATION | Dataplex location (e.g. us) |
| --entry-links / --no-entry-links | flag | --entry-links | | Load TAGGED_WITH catalog entry links (requires schema nodes to exist) |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate BusinessTerm description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or Dataplex |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

Run neocarta dataplex schema before neocarta dataplex glossary when --entry-links is enabled. Entry links attach TAGGED_WITH edges to schema nodes — if those nodes don’t exist yet, the tags have nothing to land on.
# Load glossary with entry links (default)
neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --embeddings

# Load glossary content only, skip entry link REST round-trips
neocarta dataplex glossary \
  --project-id my-proj \
  --project-number 123456789 \
  --dataplex-location us \
  --no-entry-links
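You can encode the ordering requirement in a small wrapper that loads the glossary only after the schema load succeeds. This is a sketch with a hypothetical function name. It relies on both commands reading GCP_PROJECT_ID, GCP_PROJECT_NUMBER, DATAPLEX_LOCATION, and (for the schema step) BIGQUERY_DATASET_ID from the environment. It forwards any extra flags to both commands, so pass only flags that both accept, such as --embeddings.

```bash
# Hypothetical helper: load Dataplex schema first, then the glossary, so
# TAGGED_WITH entry links have schema nodes to attach to.
load_dataplex_catalog() {
  # Stop before the glossary step if the schema load fails.
  neocarta dataplex schema "$@" || return
  neocarta dataplex glossary "$@"
}

# Example, with the GCP/Dataplex env vars exported:
#   load_dataplex_catalog --embeddings
```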

neocarta jdbc

Extract schema from any JDBC-accessible relational database via SchemaCrawler.

neocarta jdbc schema

Shells out to SchemaCrawler (Java) to read Database, Schema, Table, Column, and foreign-key references from any JDBC-accessible database, then loads them into Neo4j using JdbcSchemaConnector. Works against PostgreSQL, MySQL, SQL Server, Oracle, and any other JDBC-compatible source.
neocarta jdbc schema [OPTIONS]
Requires Java 11+, a SchemaCrawler distribution JAR, and a JDBC driver JAR installed on the host. The database password is read only from JDBC_PASSWORD — it is never a CLI flag — to keep it out of shell history and the process list.

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --jdbc-url | text | | JDBC_URL | JDBC connection URL (e.g. jdbc:postgresql://host:5432/mydb) |
| --jdbc-driver | text | | JDBC_DRIVER | Fully-qualified JDBC driver class (e.g. org.postgresql.Driver) |
| --jdbc-driver-jar | text | | JDBC_DRIVER_JAR | Filesystem path to the JDBC driver JAR |
| --schemacrawler-jar | text | | SCHEMACRAWLER_JAR | Path or classpath glob to the SchemaCrawler distribution JARs |
| --db-user | text | | JDBC_USER | Database username |
| --source-database-name | text | derived from URL | JDBC_SOURCE_DATABASE_NAME | Name for the graph Database node (required for Oracle SID and SQL Server URLs, where it cannot be derived) |
| --platform | text | | JDBC_PLATFORM | Hosting platform for the graph Database node (e.g. AWS_RDS) |
| --service | text | SchemaCrawler-reported | JDBC_SERVICE | Database service/engine for the graph Database node |
| --timeout | int | 120 | JDBC_TIMEOUT | Max seconds to wait for the SchemaCrawler subprocess |
| --schema | text (repeatable) | all schemas | | Schema name to include; repeat the flag for multiple schemas |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or the source database |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

# Prompt for the password without echoing it or recording it in shell history
read -rs JDBC_PASSWORD && export JDBC_PASSWORD
neocarta jdbc schema \
  --jdbc-url jdbc:postgresql://localhost:5432/sales \
  --jdbc-driver org.postgresql.Driver \
  --jdbc-driver-jar ./drivers/postgresql.jar \
  --schemacrawler-jar './schemacrawler/lib/*' \
  --db-user analytics
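--schema is repeatable, so ingesting several schemas means passing the flag once per name. The hypothetical helper below expands a comma-separated list into repeated --schema flags. It stops early if JDBC_PASSWORD has not been exported. All other settings come from the flags or JDBC_* env vars documented above.

```bash
# Hypothetical helper: expand "a,b,c" into --schema a --schema b --schema c
# and refuse to run without JDBC_PASSWORD (which is never a CLI flag).
run_jdbc_schema() {
  if [ $# -lt 1 ]; then
    echo "usage: run_jdbc_schema SCHEMA[,SCHEMA...] [FLAGS...]" >&2
    return 2
  fi
  if [ -z "${JDBC_PASSWORD:-}" ]; then
    echo "export JDBC_PASSWORD before running" >&2
    return 2
  fi
  local -a names args=()
  IFS=',' read -r -a names <<< "$1"
  shift
  local name
  for name in "${names[@]}"; do
    args+=(--schema "$name")
  done
  neocarta jdbc schema "${args[@]}" "$@"
}

# Example, with JDBC_URL, JDBC_DRIVER, etc. exported:
#   run_jdbc_schema public,sales --db-user analytics
```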

neocarta osi

Run OSI (Open Semantic Interchange) connectors. Two verbs are available: ingest loads a YAML spec into Neo4j, and export reads an OsiSemanticModel back out to a YAML file.

neocarta osi ingest

Reads an OSI YAML spec from a local path or an HTTP(S) URL and loads its semantic model into Neo4j using OsiConnector. Ingests OsiSemanticModel, OsiTable, OsiColumn, Query, Metric, Join, and aspect nodes; synonyms in ai_context are upserted as BusinessTerm nodes (merged on name, deduplicating against catalog-derived terms from Dataplex).
neocarta osi ingest [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --spec-source | text | | OSI_SPEC_SOURCE | Local filesystem path or HTTP(S) URL to the OSI YAML spec |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate description embeddings after ingest |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

# Ingest from a local file
neocarta osi ingest --spec-source ./datasets/osi/acme_semantic_model.yaml

# Ingest from a URL with embeddings
neocarta osi ingest \
  --spec-source https://example.com/models/my_model.yaml \
  --embeddings

# Dry run
OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml \
neocarta osi ingest --dry-run --json
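BusinessTerm nodes are merged on name, so synonyms shared across several specs should resolve to the same node when the specs are ingested one after another. The sketch below loops over a directory and ingests each .yaml or .yml file in turn. It stops at the first spec that fails. The function name is hypothetical.

```bash
# Hypothetical helper: run osi ingest once per YAML spec in a directory.
ingest_osi_directory() {
  if [ $# -lt 1 ]; then
    echo "usage: ingest_osi_directory DIR [FLAGS...]" >&2
    return 2
  fi
  local dir="$1"
  shift
  local spec found=0
  for spec in "$dir"/*.yaml "$dir"/*.yml; do
    # Without nullglob an unmatched pattern stays literal, so skip non-files.
    [ -f "$spec" ] || continue
    found=1
    neocarta osi ingest --spec-source "$spec" "$@" || return
  done
  if [ "$found" -eq 0 ]; then
    echo "no .yaml or .yml specs found in $dir" >&2
    return 1
  fi
}

# Example:
#   ingest_osi_directory ./datasets/osi --embeddings
```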

neocarta osi export

Exports an OsiSemanticModel subgraph from Neo4j back to a spec-compliant OSI YAML file using OsiConnector. Reads everything owned by the named model (tables, columns, metrics, joins, aspects) and serializes it with preserved column ordering, native ai_context structure, and literal-block JSON for custom extensions.
neocarta osi export [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --semantic-model-name | text | | OSI_SEMANTIC_MODEL_NAME | Name of the OsiSemanticModel to export |
| --output-path | text | | | Destination path for the exported OSI YAML file (required) |
| --dry-run | flag | off | | Print planned export without touching Neo4j |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

If --semantic-model-name does not match any OsiSemanticModel in the graph, the command exits with code 3 (not_found). Use neocarta tool list-schemas to inspect available models.
# Export to a local file
neocarta osi export \
  --semantic-model-name acme_corp_model \
  --output-path ./acme.yaml

# Using env var for model name
OSI_SEMANTIC_MODEL_NAME=acme_corp_model \
neocarta osi export --output-path ./acme.yaml

# Dry run
neocarta osi export \
  --semantic-model-name acme_corp_model \
  --output-path ./acme.yaml \
  --dry-run --json
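Scripts that export models can branch on the documented not_found exit code. The hypothetical wrapper below passes every exit status through unchanged. It adds a readable message only when the status is 3.

```bash
# Hypothetical wrapper: export a semantic model and explain exit code 3.
export_osi_model() {
  if [ $# -lt 2 ]; then
    echo "usage: export_osi_model MODEL_NAME OUTPUT_PATH" >&2
    return 2
  fi
  local status=0
  neocarta osi export \
    --semantic-model-name "$1" --output-path "$2" || status=$?
  # 3 is the not_found code documented for an unknown model name.
  if [ "$status" -eq 3 ]; then
    echo "no OsiSemanticModel named '$1' in the graph" >&2
  fi
  return "$status"
}

# Example:
#   export_osi_model acme_corp_model ./acme.yaml
```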

neocarta query-log

Parse a local query-log JSON file into Neo4j.

neocarta query-log ingest

Reads a local query-log JSON file (currently the BigQuery export format) and loads Query and CTE nodes. It also loads the Database, Schema, Table, and Column structure and the table/column references that each query touches. No embeddings are generated, because query-log nodes carry no descriptions to embed. This is distinct from neocarta bigquery logs, which queries BigQuery INFORMATION_SCHEMA.JOBS_BY_PROJECT live. Use this command when you already have an exported log file on disk.
neocarta query-log ingest [OPTIONS]

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --query-log-file | text | | QUERY_LOG_FILE | Path to the query-log JSON file |
| --source | text | bigquery | | Source/format of the query-log file |
| --dry-run | flag | off | | Print planned ingestion without reading the file or touching Neo4j |
| --neo4j-uri | text | | NEO4J_URI | Neo4j Bolt URI override |
| --neo4j-username | text | | NEO4J_USERNAME | Neo4j username override |
| --neo4j-database | text | neo4j | NEO4J_DATABASE | Neo4j database name override |

# Ingest from a local file
neocarta query-log ingest --query-log-file ./query_logs.json

# Using env var
QUERY_LOG_FILE=./query_logs.json neocarta query-log ingest --dry-run --json
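Each query-log ingest run takes one file. To load several exported files, a loop is enough. The sketch below (hypothetical function name) first checks that every path exists, so a mistyped path is caught before anything is loaded. It then ingests the files in order and stops at the first failure.

```bash
# Hypothetical helper: ingest several exported BigQuery query-log files.
ingest_query_log_files() {
  if [ $# -eq 0 ]; then
    echo "usage: ingest_query_log_files FILE..." >&2
    return 2
  fi
  local file
  # Validate every path before ingesting any of them.
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "not a file: $file" >&2
      return 2
    fi
  done
  for file in "$@"; do
    neocarta query-log ingest --query-log-file "$file" --source bigquery || return
  done
}

# Example:
#   ingest_query_log_files ./logs/2024-05.json ./logs/2024-06.json
```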

neocarta databricks

Load Databricks Unity Catalog governed-tag definitions into the Neo4j semantic graph.

neocarta databricks tags

Reads governed-tag definitions (tag policies) via the Databricks SDK — no SQL warehouse required — and loads GovernanceTagKey and GovernanceTagValue nodes with HAS_VALUE_OPTION edges. Platform/partner-managed tags (matching --system-prefixes) are excluded by default. When --embeddings is enabled, GovernanceTagKey description embeddings are generated via LiteLLM and written back.
neocarta databricks tags [OPTIONS]
Requires pip install "neocarta[databricks]". The workspace access token is read only from DATABRICKS_TOKEN — it is never a CLI flag. If the databricks extra is not installed, the command exits with a clear usage_error (code 2) rather than a raw import traceback.

| Flag | Type | Default | Env var | Description |
| --- | --- | --- | --- | --- |
| --host | text | | DATABRICKS_HOST | Databricks workspace URL (e.g. https://dbc-xxxx.cloud.databricks.com) |
| --source | text | metastore ID | | Explicit namespace for governance-tag node IDs |
| --system-prefixes | text | system.,class.,ai.,sap. | | Comma-separated tag-key prefixes treated as system tags and excluded by default |
| --include-system-tags / --no-include-system-tags | flag | --no-include-system-tags | | Also ingest platform-managed tags matching --system-prefixes |
| --embeddings / --no-embeddings | flag | --no-embeddings | | Generate GovernanceTagKey description embeddings after load |
| --embedding-model | text | text-embedding-3-small | EMBEDDING_MODEL | LiteLLM embedding model id |
| --embedding-dimensions | int | auto-detected | EMBEDDING_DIMENSIONS | Vector dimensions |
| --embedding-batch-size | int | 100 | EMBEDDING_BATCH_SIZE | Nodes per embedding batch |
| --dry-run | flag | off | | Print planned ingestion without touching Neo4j or Databricks |

# Load user-defined tags (excluding system tags)
DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com \
DATABRICKS_TOKEN=dapi... \
neocarta databricks tags

# Include system tags and generate embeddings
neocarta databricks tags \
  --host https://dbc-xxxx.cloud.databricks.com \
  --include-system-tags \
  --embeddings

# Dry run to verify configuration
neocarta databricks tags \
  --host https://dbc-xxxx.cloud.databricks.com \
  --dry-run --json
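--system-prefixes takes a comma-separated list. This page does not say whether a supplied value extends or replaces the default. The sketch below therefore assumes it replaces the default, and restates the documented defaults before appending one extra prefix. The function name and the example prefix are hypothetical.

```bash
# Hypothetical helper: exclude one extra tag-key prefix on top of the
# documented defaults. Assumes --system-prefixes replaces the default list.
load_databricks_tags_excluding() {
  if [ $# -lt 1 ]; then
    echo "usage: load_databricks_tags_excluding PREFIX [FLAGS...]" >&2
    return 2
  fi
  # The token is env-only; fail early with a clear message if it is missing.
  if [ -z "${DATABRICKS_TOKEN:-}" ]; then
    echo "export DATABRICKS_TOKEN before running" >&2
    return 2
  fi
  local defaults="system.,class.,ai.,sap."
  local extra="$1"
  shift
  neocarta databricks tags --system-prefixes "${defaults},${extra}" "$@"
}

# Example, with DATABRICKS_HOST and DATABRICKS_TOKEN exported:
#   load_databricks_tags_excluding internal. --dry-run --json
```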

neocarta mcp serve

Starts the Neocarta MCP server over stdio, exposing the Neo4j semantic-layer retrieval tools to any MCP client (Claude Desktop, the bundled Text2SQL agent, or any other MCP-compatible host). This is the CLI-native equivalent of the standalone neocarta-mcp console script.
neocarta mcp serve [OPTIONS]
Requires pip install "neocarta[mcp]" in addition to [cli]. If the mcp extra is not installed, the command exits with a clear usage_error (code 2). Install both at once with pip install "neocarta[cli,mcp]".

| Flag | Type | Default | Description |
| --- | --- | --- | --- |
| --dry-run | flag | off | Print the planned server configuration without starting it |

The server communicates over stdio and owns stdout for the MCP protocol. The startup notice and all diagnostics go to stderr. Configuration is read from the standard NEO4J_* and EMBEDDING_* environment variables (or a .env file).
# Start the MCP server
neocarta mcp serve

# Verify configuration without starting
neocarta mcp serve --dry-run --json
For Claude Desktop integration, use the standalone neocarta-mcp console script via uvx rather than neocarta mcp serve. See the MCP server documentation for the full claude_desktop_config.json snippet.
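Because stdout carries the MCP protocol, anything that wraps the server must leave stdout alone. The minimal launcher sketch below appends the server's stderr diagnostics to a log file. The function name and default log path are hypothetical.

```bash
# Hypothetical launcher: keep stdout untouched for the MCP protocol and
# append stderr (startup notice, diagnostics) to a log file.
serve_mcp_with_log() {
  local log="${1:-${TMPDIR:-/tmp}/neocarta-mcp.log}"
  neocarta mcp serve 2>>"$log"
}

# Example:
#   serve_mcp_with_log "$HOME/.cache/neocarta-mcp.log"
```

In a standalone launcher script, prefixing the command with exec replaces the shell process, so the MCP host's stdio pipes connect to the server directly.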
