The CLI reads configuration from environment variables. A .env file in the working directory is loaded automatically by python-dotenv before any settings are resolved. CLI flags take the highest priority, overriding both the environment and built-in defaults. Secrets are the exception: NEO4J_PASSWORD and JDBC_PASSWORD (and DATABRICKS_TOKEN, per the Databricks section) are environment-only and are never accepted as flags, which keeps them out of shell history and the process list.

Flag Override Precedence

Resolution order, highest priority first:
  1. CLI flag — e.g. --project-id my-proj on the command line
  2. Environment variable — e.g. GCP_PROJECT_ID=my-proj in the shell or .env
  3. Built-in default — the value declared in the settings model (if any)
NEO4J_PASSWORD and JDBC_PASSWORD are environment-only, and DATABRICKS_TOKEN is documented the same way in the Databricks section. None of them is accepted as a CLI flag, so the raw secrets stay out of shell history, ps output, and log files.
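The resolution order above can be modelled in a few lines of Python. This is an illustrative sketch only, not Neocarta's settings code: the function name resolve_setting and its parameters are hypothetical, and the env argument exists just so the example does not depend on your real shell.

```python
import os
from typing import Mapping, Optional


def resolve_setting(
    flag_value: Optional[str],
    env_name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first value found: CLI flag, then environment, then default.

    Hypothetical helper for illustration; not part of the neocarta package.
    """
    env = os.environ if env is None else env
    if flag_value is not None:   # 1. CLI flag wins
        return flag_value
    if env_name in env:          # 2. environment variable (shell or .env)
        return env[env_name]
    return default               # 3. built-in default, possibly None


env = {"GCP_PROJECT_ID": "env-proj"}
print(resolve_setting("flag-proj", "GCP_PROJECT_ID", env=env))    # flag-proj
print(resolve_setting(None, "GCP_PROJECT_ID", env=env))           # env-proj
print(resolve_setting(None, "NEO4J_DATABASE", "neo4j", env=env))  # neo4j
```

For the env-only secrets, the flag tier simply does not exist, so resolution starts at the environment.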

Environment Variable Reference

Neo4j Connection

Required by every connector command and tool command.
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| NEO4J_URI | Yes | | Neo4j Bolt URI (e.g. bolt://localhost:7687 or neo4j+s://xxx.databases.neo4j.io) |
| NEO4J_USERNAME | Yes | | Neo4j username |
| NEO4J_PASSWORD | Yes | | Neo4j password (secret; env-only, never a flag) |
| NEO4J_DATABASE | No | neo4j | Target Neo4j database name |
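A malformed NEO4J_URI is an easy mistake to catch before running a command. The sketch below checks the value against the URI schemes the official Neo4j drivers accept. The helper name check_neo4j_uri is hypothetical, and whether Neocarta performs a similar check itself is not stated here.

```python
from urllib.parse import urlparse

# URI schemes accepted by the official Neo4j drivers.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def check_neo4j_uri(uri: str) -> bool:
    """Return True when the URI has a known Neo4j scheme and a host part."""
    parsed = urlparse(uri)
    return parsed.scheme in NEO4J_SCHEMES and bool(parsed.hostname)


print(check_neo4j_uri("bolt://localhost:7687"))             # True
print(check_neo4j_uri("neo4j+s://xxx.databases.neo4j.io"))  # True
print(check_neo4j_uri("http://localhost:7474"))             # False
```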

Embeddings

Read when --embeddings is passed to any connector command, or when running a vector or hybrid search tool. Every variable in this table has a default; the provider credentials listed after it do not.
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| EMBEDDING_MODEL | No | text-embedding-3-small | LiteLLM embedding model id (e.g. text-embedding-3-small, gemini-embedding-001) |
| EMBEDDING_DIMENSIONS | No | auto-detected | Vector dimension for models that support truncation; silently ignored by models that don't |
| EMBEDDING_BATCH_SIZE | No | 100 | Nodes embedded per provider request during CLI ingest runs. Not used by the MCP server. |
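To make the effect of EMBEDDING_BATCH_SIZE concrete, the sketch below splits a list of items into groups of at most that size, one group per provider request. It is a generic batching illustration under that assumption, not the ingest code itself; the batches helper and the node-N placeholder texts are hypothetical.

```python
import os
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Fall back to the documented default of 100 when the variable is unset.
batch_size = int(os.environ.get("EMBEDDING_BATCH_SIZE", "100"))
texts = [f"node-{i}" for i in range(250)]  # placeholder inputs
print([len(batch) for batch in batches(texts, batch_size)])
```

With the default of 100, 250 nodes would be sent as three requests of 100, 100, and 50 items. A smaller value means more, lighter requests; a larger one means fewer, heavier ones.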
Provider credentials: set whichever variable your EMBEDDING_MODEL requires.

| Provider | Variable(s) |
| --- | --- |
| OpenAI | OPENAI_API_KEY |
| Gemini (AI Studio) | GEMINI_API_KEY |
| Cohere | COHERE_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Azure OpenAI | AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION |
| AWS Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME |
| Vertex AI | VERTEXAI_PROJECT, VERTEXAI_LOCATION (+ Application Default Credentials) |
The embedding vector dimension is auto-detected from the model on first use, and the Neo4j vector index is created at that size. Set EMBEDDING_DIMENSIONS only to request truncation on models that support it. If you switch to a model with a different dimension on an existing graph, drop the existing *_vector_index indexes and re-ingest with --embeddings.
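When switching to a model with a different dimension, the indexes to drop are those whose names end in _vector_index. The sketch below builds the corresponding Cypher DROP INDEX statements from a list of index names, such as the names reported by SHOW INDEXES. The helper and the example index names are hypothetical; check the names in your own database before running anything.

```python
from typing import Iterable, List


def drop_vector_index_statements(index_names: Iterable[str]) -> List[str]:
    """Build DROP INDEX statements for names ending in _vector_index."""
    statements = []
    for name in index_names:
        if name.endswith("_vector_index"):
            escaped = name.replace("`", "``")  # escape backticks in the name
            statements.append(f"DROP INDEX `{escaped}` IF EXISTS")
    return statements


# Made-up index names for illustration only.
names = ["table_vector_index", "column_vector_index", "table_fulltext_index"]
for statement in drop_vector_index_statements(names):
    print(statement)
```

The generated statements can then be run with any Cypher client before re-ingesting with --embeddings.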

Google Cloud / BigQuery

Required for neocarta bigquery * commands. The Dataplex variables listed in the same table apply to neocarta dataplex * commands.
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| GCP_PROJECT_ID | Yes (bigquery *, dataplex *) | | GCP project ID (string, e.g. my-project) |
| GCP_PROJECT_NUMBER | Yes (dataplex *) | | Numeric GCP project number |
| BIGQUERY_DATASET_ID | Yes (bigquery *) | | BigQuery dataset ID to extract metadata from |
| BIGQUERY_REGION | No | region-us | Region string used when querying INFORMATION_SCHEMA job logs |
| DATAPLEX_LOCATION | Yes (dataplex *) | | Dataplex location (e.g. us, us-central1) |
| GOOGLE_APPLICATION_CREDENTIALS | When outside a GCP-authenticated shell | | Path to a GCP service-account JSON key file (secret) |
GCP authentication uses Application Default Credentials. Run gcloud auth application-default login before using any BigQuery or Dataplex command when running outside a GCP-authenticated environment.

JDBC

Required for neocarta jdbc schema.
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| JDBC_URL | Yes | | JDBC connection URL (e.g. jdbc:postgresql://host:5432/mydb) |
| JDBC_DRIVER | Yes | | Fully-qualified JDBC driver class (e.g. org.postgresql.Driver) |
| JDBC_DRIVER_JAR | Yes | | Filesystem path to the JDBC driver JAR |
| SCHEMACRAWLER_JAR | Yes | | Filesystem path or classpath glob to the SchemaCrawler distribution JARs |
| JDBC_USER | No | | Database username |
| JDBC_PASSWORD | No | | Database password (secret; env-only, never a flag) |
| JDBC_SOURCE_DATABASE_NAME | No | derived from URL | Name for the graph Database node; required when it cannot be derived from the URL (e.g. Oracle SID, SQL Server) |
| JDBC_PLATFORM | No | | Hosting platform for the graph Database node (e.g. AWS_RDS) |
| JDBC_SERVICE | No | SchemaCrawler-reported | Database service/engine for the graph Database node |
| JDBC_TIMEOUT | No | 120 | Maximum seconds to wait for the SchemaCrawler subprocess |
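The sketch below shows why JDBC_SOURCE_DATABASE_NAME is sometimes required: a database name is easy to read from a URL with a path component, but not from Oracle SID or SQL Server style URLs. This is one plausible derivation, assuming the name is the first path segment. It is not Neocarta's actual logic, and the function name is hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def database_name_from_jdbc_url(jdbc_url: str) -> Optional[str]:
    """Return the first path segment of a jdbc:<scheme>://host/<name> URL."""
    if not jdbc_url.startswith("jdbc:"):
        return None
    parsed = urlparse(jdbc_url[len("jdbc:"):])
    if not parsed.netloc:  # e.g. jdbc:oracle:thin:@host:1521:ORCL
        return None
    name = parsed.path.strip("/").split("/")[0]
    return name or None    # empty path, e.g. SQL Server ;databaseName=...


print(database_name_from_jdbc_url("jdbc:postgresql://host:5432/mydb"))   # mydb
print(database_name_from_jdbc_url("jdbc:oracle:thin:@host:1521:ORCL"))   # None
print(database_name_from_jdbc_url("jdbc:sqlserver://host:1433;databaseName=sales"))  # None
```

When a helper like this returns None, the name has to come from JDBC_SOURCE_DATABASE_NAME instead.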

Databricks

Required for neocarta databricks tags. Also requires pip install "neocarta[databricks]".
| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| DATABRICKS_HOST | Yes | | Databricks workspace URL (e.g. https://dbc-xxxx.cloud.databricks.com) |
| DATABRICKS_TOKEN | Yes | | Databricks personal access token (secret; env-only, never a flag) |

Connector-Specific

| Variable | Required | Default | Used by | Description |
| --- | --- | --- | --- | --- |
| CSV_DIRECTORY | Yes (csv ingest) | | neocarta csv ingest | Directory containing CSV metadata files |
| OSI_SPEC_SOURCE | Yes (osi ingest) | | neocarta osi ingest | Local filesystem path or HTTP(S) URL to the OSI YAML spec |
| OSI_SEMANTIC_MODEL_NAME | Yes (osi export) | | neocarta osi export | Name of the OsiSemanticModel to export |
| QUERY_LOG_FILE | Yes (query-log ingest) | | neocarta query-log ingest | Path to a local query-log JSON file |

.env File

The CLI automatically loads a .env file from the current working directory using python-dotenv. Copy the example below to .env and fill in the values relevant to your setup.
```
# .env: Neocarta CLI configuration
# Copy this file, fill in the values you need, and save it as .env
# in the directory where you run neocarta commands.

# ── Neo4j (required for all commands) ─────────────────────────────────────────
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-neo4j-password
NEO4J_DATABASE=neo4j

# ── Embeddings ─────────────────────────────────────────────────────────────────
# Set the provider API key for your chosen embedding model.
OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=...
# COHERE_API_KEY=...
# AZURE_API_KEY=...
# AZURE_API_BASE=https://your-resource.openai.azure.com/
# AZURE_API_VERSION=2024-02-01

EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIMENSIONS=1536   # optional: request truncation on supported models
# EMBEDDING_BATCH_SIZE=100    # optional: tune ingest throughput

# ── Google Cloud / BigQuery ────────────────────────────────────────────────────
GCP_PROJECT_ID=my-gcp-project
GCP_PROJECT_NUMBER=123456789
BIGQUERY_DATASET_ID=my_dataset
# BIGQUERY_REGION=region-us   # default: region-us

# ── Dataplex ───────────────────────────────────────────────────────────────────
# DATAPLEX_LOCATION=us

# ── JDBC ───────────────────────────────────────────────────────────────────────
# JDBC_URL=jdbc:postgresql://localhost:5432/mydb
# JDBC_DRIVER=org.postgresql.Driver
# JDBC_DRIVER_JAR=/path/to/postgresql.jar
# SCHEMACRAWLER_JAR=/path/to/schemacrawler/lib/*
# JDBC_USER=analytics
# JDBC_PASSWORD=your-db-password   # env-only, never a flag
# JDBC_SOURCE_DATABASE_NAME=       # needed for Oracle SID / SQL Server
# JDBC_PLATFORM=                   # e.g. AWS_RDS
# JDBC_SERVICE=                    # defaults to SchemaCrawler-reported product
# JDBC_TIMEOUT=120

# ── Databricks ─────────────────────────────────────────────────────────────────
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
# DATABRICKS_TOKEN=dapi...   # env-only, never a flag

# ── Connector-specific ─────────────────────────────────────────────────────────
# CSV_DIRECTORY=./datasets/csv
# OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml
# OSI_SEMANTIC_MODEL_NAME=acme_corp_model
# QUERY_LOG_FILE=./query_logs.json
```
python-dotenv loads the .env file before the CLI reads its settings. Shell environment variables take precedence over .env values, which in turn take precedence over built-in defaults. CLI flags override everything else, except that the env-only secrets (NEO4J_PASSWORD, JDBC_PASSWORD, and DATABRICKS_TOKEN) have no flag form at all.
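The "shell beats .env" rule matches python-dotenv's default behaviour, where load_dotenv does not override variables that are already set. The sketch below reproduces that merge in plain Python so it runs without python-dotenv installed. The parser is deliberately simplified (no quoting, escapes, or inline comments), and both function names are hypothetical.

```python
from typing import Dict, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blank and # comment lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def effective_env(dotenv_text: str, shell_env: Mapping[str, str]) -> Dict[str, str]:
    """Shell variables win; .env only fills in names the shell left unset."""
    merged = parse_dotenv(dotenv_text)
    merged.update(shell_env)
    return merged


dotenv_text = """
# Neo4j
NEO4J_URI=bolt://localhost:7687
NEO4J_DATABASE=neo4j
"""
shell = {"NEO4J_URI": "neo4j+s://xxx.databases.neo4j.io"}
print(effective_env(dotenv_text, shell))
```

Here NEO4J_URI comes from the shell and NEO4J_DATABASE from the .env text.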

Quick Reference by Command

A leading + means "in addition to the Neo4j variables in the first row".

| Command | Required variables |
| --- | --- |
| Any connector or tool | NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD |
| neocarta bigquery schema | + GCP_PROJECT_ID, BIGQUERY_DATASET_ID |
| neocarta bigquery logs | + GCP_PROJECT_ID, BIGQUERY_DATASET_ID, BIGQUERY_REGION (defaults to region-us when unset) |
| neocarta dataplex schema | + GCP_PROJECT_ID, GCP_PROJECT_NUMBER, DATAPLEX_LOCATION, BIGQUERY_DATASET_ID |
| neocarta dataplex glossary | + GCP_PROJECT_ID, GCP_PROJECT_NUMBER, DATAPLEX_LOCATION |
| neocarta jdbc schema | + JDBC_URL, JDBC_DRIVER, JDBC_DRIVER_JAR, SCHEMACRAWLER_JAR |
| neocarta csv ingest | + CSV_DIRECTORY |
| neocarta osi ingest | + OSI_SPEC_SOURCE |
| neocarta osi export | + OSI_SEMANTIC_MODEL_NAME |
| neocarta query-log ingest | + QUERY_LOG_FILE |
| neocarta databricks tags | + DATABRICKS_HOST, DATABRICKS_TOKEN |
| Any command with --embeddings | + provider key (e.g. OPENAI_API_KEY) |
| Vector / hybrid tool commands | + EMBEDDING_MODEL + provider key |
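The table above lends itself to a small preflight check that reports which variables are still unset for a given command. The sketch below encodes a few rows as an example. The missing_variables helper is hypothetical and inspects the environment only, so it will also report non-secret values that you intend to pass as CLI flags.

```python
from typing import Dict, List, Mapping

# Needed by every connector and tool command.
NEO4J_VARS = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]

# A subset of the quick-reference rows, keyed by subcommand.
EXTRA_VARS: Dict[str, List[str]] = {
    "bigquery schema": ["GCP_PROJECT_ID", "BIGQUERY_DATASET_ID"],
    "dataplex glossary": ["GCP_PROJECT_ID", "GCP_PROJECT_NUMBER", "DATAPLEX_LOCATION"],
    "jdbc schema": ["JDBC_URL", "JDBC_DRIVER", "JDBC_DRIVER_JAR", "SCHEMACRAWLER_JAR"],
    "csv ingest": ["CSV_DIRECTORY"],
    "databricks tags": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
}


def missing_variables(command: str, env: Mapping[str, str]) -> List[str]:
    """List required variables that are unset or empty for `command`."""
    required = NEO4J_VARS + EXTRA_VARS.get(command, [])
    return [name for name in required if not env.get(name)]


env = {
    "NEO4J_URI": "bolt://localhost:7687",
    "NEO4J_USERNAME": "neo4j",
    "NEO4J_PASSWORD": "your-neo4j-password",
    "GCP_PROJECT_ID": "my-gcp-project",
}
print(missing_variables("bigquery schema", env))  # ['BIGQUERY_DATASET_ID']
```

Passing os.environ as env after loading .env would check the real configuration.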
