Configuring the Neocarta CLI with Environment Variables

The CLI reads configuration from environment variables. Before any settings are resolved, python-dotenv automatically loads a .env file from the working directory. CLI flags take the highest priority, overriding both the environment and built-in defaults. Secrets (NEO4J_PASSWORD, JDBC_PASSWORD, and DATABRICKS_TOKEN) are environment-only and are never accepted as flags, which keeps them out of shell history and the process list.

Flag Override Precedence

Resolution order, highest priority first:

1. CLI flag: e.g. --project-id my-proj on the command line
2. Environment variable: e.g. GCP_PROJECT_ID=my-proj in the shell or .env
3. Built-in default: the value declared in the settings model (if any)

NEO4J_PASSWORD, JDBC_PASSWORD, and DATABRICKS_TOKEN are environment-only. They have no CLI flag equivalents, so the raw secrets stay out of shell history, ps output, and log files.
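The resolution order can be sketched as a small function. This is a minimal illustration, not Neocarta's actual settings code: it assumes flags have already been parsed into a dict keyed by the variable they override (for example, --project-id stored under GCP_PROJECT_ID), and `resolve` and `ENV_ONLY` are hypothetical names.

```python
from typing import Mapping, Optional

# Hypothetical: the secrets this page documents as env-only (no flag form).
ENV_ONLY = {"NEO4J_PASSWORD", "JDBC_PASSWORD", "DATABRICKS_TOKEN"}


def resolve(
    name: str,
    flags: Mapping[str, str],
    env: Mapping[str, str],
    default: Optional[str] = None,
) -> Optional[str]:
    """Return the effective value of `name`: CLI flag, then environment, then default.

    Hypothetical helper; assumes `flags` is keyed by the variable name each flag overrides.
    """
    if name in flags:
        if name in ENV_ONLY:
            # Secrets must never arrive via the command line.
            raise ValueError(f"{name} is env-only and cannot be passed as a flag")
        return flags[name]
    return env.get(name, default)


env = {"GCP_PROJECT_ID": "from-env", "NEO4J_PASSWORD": "s3cret"}
flags = {"GCP_PROJECT_ID": "my-proj"}
print(resolve("GCP_PROJECT_ID", flags, env))           # flag wins: my-proj
print(resolve("NEO4J_DATABASE", flags, env, "neo4j"))  # falls back to the default: neo4j
```

Checking secrets only when they appear in `flags` keeps the common path (reading from the environment) identical for every variable.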

Environment Variable Reference

Neo4j Connection

Required by every connector command and tool command.

Variable	Required	Default	Description
`NEO4J_URI`	Yes	—	Neo4j Bolt URI (e.g. `bolt://localhost:7687` or `neo4j+s://xxx.databases.neo4j.io`)
`NEO4J_USERNAME`	Yes	—	Neo4j username
`NEO4J_PASSWORD`	Yes	—	Neo4j password (secret — env-only, never a flag)
`NEO4J_DATABASE`	No	`neo4j`	Target Neo4j database name
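A minimal sketch of reading these variables and checking that the server is reachable, assuming the official `neo4j` Python driver (5.x) is installed. `Neo4jSettings`, `load_neo4j_settings`, and `check_connection` are hypothetical names for illustration, not part of the Neocarta CLI.

```python
import os
from typing import Mapping, NamedTuple


class Neo4jSettings(NamedTuple):
    uri: str
    username: str
    password: str
    database: str


def load_neo4j_settings(env: Mapping[str, str] = os.environ) -> Neo4jSettings:
    """Hypothetical helper: read the Neo4j variables and apply the documented default."""
    required = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"missing required Neo4j settings: {', '.join(missing)}")
    return Neo4jSettings(
        uri=env["NEO4J_URI"],
        username=env["NEO4J_USERNAME"],
        password=env["NEO4J_PASSWORD"],
        database=env.get("NEO4J_DATABASE", "neo4j"),  # documented default
    )


def check_connection(settings: Neo4jSettings) -> None:
    """Hypothetical helper: open a driver and verify the server accepts these credentials."""
    from neo4j import GraphDatabase  # official driver, assumed installed (pip install neo4j)

    auth = (settings.username, settings.password)
    with GraphDatabase.driver(settings.uri, auth=auth) as driver:
        driver.verify_connectivity()
```

Running `check_connection(load_neo4j_settings())` before a long ingest surfaces a wrong URI or password immediately instead of partway through a run.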

Embeddings

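Used when --embeddings is passed to any connector command, or when running a vector or hybrid search tool. The variables below all have defaults, but the matching provider credentials are required in those cases.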

Variable	Required	Default	Description
`EMBEDDING_MODEL`	No	`text-embedding-3-small`	LiteLLM embedding model id (e.g. `text-embedding-3-small`, `gemini-embedding-001`)
`EMBEDDING_DIMENSIONS`	No	auto-detected	Vector dimension for models that support truncation; silently ignored by models that don’t
`EMBEDDING_BATCH_SIZE`	No	`100`	Nodes embedded per provider request during CLI ingest runs. Not used by the MCP server.
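To make the throughput trade-off concrete, here is a sketch of how a batch size splits nodes into provider requests. It assumes simple fixed-size chunking; the CLI's real batching code may differ, and `batch_size`, `batches`, and `request_count` are hypothetical names.

```python
import math
import os
from typing import Iterator, List, Mapping, Sequence, TypeVar

T = TypeVar("T")


def batch_size(env: Mapping[str, str] = os.environ) -> int:
    """Hypothetical: read EMBEDDING_BATCH_SIZE with the documented default of 100."""
    size = int(env.get("EMBEDDING_BATCH_SIZE", "100"))
    if size < 1:
        raise ValueError("EMBEDDING_BATCH_SIZE must be a positive integer")
    return size


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `size` items; each chunk is one provider request."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def request_count(n_nodes: int, size: int) -> int:
    """Number of embedding requests needed for `n_nodes` at a given batch size."""
    return math.ceil(n_nodes / size)
```

With 2,350 nodes, the default of 100 means 24 requests; raising EMBEDDING_BATCH_SIZE to 500 cuts that to 5, at the cost of a larger payload per request.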

Provider credentials — set whichever variable your EMBEDDING_MODEL requires:

Provider	Variable(s)
OpenAI	`OPENAI_API_KEY`
Gemini (AI Studio)	`GEMINI_API_KEY`
Cohere	`COHERE_API_KEY`
Anthropic	`ANTHROPIC_API_KEY`
Azure OpenAI	`AZURE_API_KEY`, `AZURE_API_BASE`, `AZURE_API_VERSION`
AWS Bedrock	`AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `AWS_REGION_NAME`
Vertex AI	`VERTEXAI_PROJECT`, `VERTEXAI_LOCATION` (+ Application Default Credentials)

The embedding vector dimension is auto-detected from the model on first use, and the Neo4j vector index is created at that size. Set EMBEDDING_DIMENSIONS only to request truncation on models that support it. If you switch to a model with a different dimension on an existing graph, drop the existing *_vector_index indexes and re-ingest with --embeddings.
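A sketch of the clean-up step, assuming the official `neo4j` Python driver (5.x) and that the vector indexes follow the *_vector_index naming described above. Index names cannot be passed as Cypher parameters, so the helper quotes them with backticks. `drop_vector_index_statements` and `drop_vector_indexes` are hypothetical names.

```python
from typing import Iterable, List


def drop_vector_index_statements(index_names: Iterable[str]) -> List[str]:
    """Hypothetical: build DROP INDEX statements for every index named *_vector_index."""
    statements = []
    for name in index_names:
        if name.endswith("_vector_index"):
            # Backtick-quote the identifier, doubling any embedded backtick as Cypher requires.
            quoted = name.replace("`", "``")
            statements.append(f"DROP INDEX `{quoted}` IF EXISTS")
    return statements


def drop_vector_indexes(uri: str, user: str, password: str, database: str = "neo4j") -> None:
    """Hypothetical: list VECTOR indexes on the server and drop the *_vector_index ones."""
    from neo4j import GraphDatabase  # official driver, assumed installed (pip install neo4j)

    with GraphDatabase.driver(uri, auth=(user, password)) as driver:
        records, _, _ = driver.execute_query(
            "SHOW INDEXES YIELD name, type WHERE type = 'VECTOR' RETURN name",
            database_=database,
        )
        for statement in drop_vector_index_statements(r["name"] for r in records):
            driver.execute_query(statement, database_=database)
```

After the drop, re-running an ingest with --embeddings recreates the indexes at the new model's dimension.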

Google Cloud / BigQuery

Required for neocarta bigquery * and neocarta dataplex * commands.

Variable	Required	Default	Description
`GCP_PROJECT_ID`	Yes (`bigquery *`, `dataplex *`)	—	GCP project ID (string, e.g. `my-project`)
`GCP_PROJECT_NUMBER`	Yes (`dataplex *`)	—	Numeric GCP project number
`BIGQUERY_DATASET_ID`	Yes (`bigquery *`, `dataplex schema`)	—	BigQuery dataset ID to extract metadata from
`BIGQUERY_REGION`	No	`region-us`	Region string used when querying `INFORMATION_SCHEMA` job logs
`DATAPLEX_LOCATION`	Yes (`dataplex *`)	—	Dataplex location (e.g. `us`, `us-central1`)
`GOOGLE_APPLICATION_CREDENTIALS`	When outside a GCP-authenticated shell	—	Path to a GCP service-account JSON key file (secret)

GCP authentication uses Application Default Credentials (ADC). When running outside a GCP-authenticated environment, either run gcloud auth application-default login or set GOOGLE_APPLICATION_CREDENTIALS to a service-account key file before using any BigQuery or Dataplex command.
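A quick local check for which credentials file ADC would pick up can save a confusing failure later. This hypothetical helper only approximates the Google client libraries' lookup order: it checks GOOGLE_APPLICATION_CREDENTIALS, then the file that gcloud auth application-default login writes (under CLOUDSDK_CONFIG if set, otherwise gcloud's default config directory). It does not cover credentials served by the GCP metadata server.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_adc_file(env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Hypothetical: return the local ADC credentials file, or None if none is usable.

    Metadata-server credentials (GCE, Cloud Run, GKE) are not detected here.
    """
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        path = Path(explicit).expanduser()
        return path if path.is_file() else None

    # Directory where `gcloud auth application-default login` stores its file.
    config_dir = env.get("CLOUDSDK_CONFIG")
    if config_dir:
        base = Path(config_dir)
    elif os.name == "nt":
        base = Path(env.get("APPDATA", "")) / "gcloud"
    else:
        base = Path.home() / ".config" / "gcloud"

    candidate = base / "application_default_credentials.json"
    return candidate if candidate.is_file() else None
```

A `None` result outside GCP means the BigQuery and Dataplex commands are unlikely to authenticate until one of the two steps above is done.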

JDBC

Required for neocarta jdbc schema.

Variable	Required	Default	Description
`JDBC_URL`	Yes	—	JDBC connection URL (e.g. `jdbc:postgresql://host:5432/mydb`)
`JDBC_DRIVER`	Yes	—	Fully-qualified JDBC driver class (e.g. `org.postgresql.Driver`)
`JDBC_DRIVER_JAR`	Yes	—	Filesystem path to the JDBC driver JAR
`SCHEMACRAWLER_JAR`	Yes	—	Filesystem path or classpath glob to the SchemaCrawler distribution JARs
`JDBC_USER`	No	—	Database username
`JDBC_PASSWORD`	No	—	Database password (secret — env-only, never a flag)
`JDBC_SOURCE_DATABASE_NAME`	No	derived from URL	Name for the graph `Database` node; required when it cannot be derived from the URL (e.g. Oracle SID, SQL Server)
`JDBC_PLATFORM`	No	—	Hosting platform for the graph `Database` node (e.g. `AWS_RDS`)
`JDBC_SERVICE`	No	SchemaCrawler-reported	Database service/engine for the graph `Database` node
`JDBC_TIMEOUT`	No	`120`	Maximum seconds to wait for the SchemaCrawler subprocess
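When JDBC_SOURCE_DATABASE_NAME is unset, the name is derived from the URL. The sketch below shows one plausible derivation, assuming only scheme://host[:port]/database URLs are parseable; it is an illustration, not Neocarta's actual logic, and `derive_database_name` and `source_database_name` are hypothetical names. Oracle SID URLs and SQL Server URLs (which carry the database in a ;databaseName= property) return None, matching the table's note that those need the variable set explicitly.

```python
from typing import Mapping, Optional
from urllib.parse import urlparse


def derive_database_name(jdbc_url: str) -> Optional[str]:
    """Hypothetical: best-effort database name from a JDBC URL, or None if not derivable."""
    if not jdbc_url.startswith("jdbc:"):
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    rest = jdbc_url[len("jdbc:"):]
    # Only handle URLs shaped like scheme://host[:port]/db[?params];
    # Oracle thin URLs (@host:port:SID) never contain "://".
    if "://" not in rest:
        return None
    # SQL Server puts ";databaseName=..." in the authority, leaving the path empty.
    name = urlparse(rest).path.lstrip("/")
    return name or None


def source_database_name(env: Mapping[str, str]) -> str:
    """Hypothetical: prefer JDBC_SOURCE_DATABASE_NAME, else derive it from JDBC_URL."""
    explicit = env.get("JDBC_SOURCE_DATABASE_NAME")
    if explicit:
        return explicit
    derived = derive_database_name(env["JDBC_URL"])
    if derived is None:
        raise ValueError("set JDBC_SOURCE_DATABASE_NAME: it cannot be derived from JDBC_URL")
    return derived
```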

Databricks

Required for neocarta databricks tags. Also requires pip install "neocarta[databricks]".

Variable	Required	Default	Description
`DATABRICKS_HOST`	Yes	—	Databricks workspace URL (e.g. `https://dbc-xxxx.cloud.databricks.com`)
`DATABRICKS_TOKEN`	Yes	—	Databricks personal access token (secret — env-only, never a flag)

Connector-Specific

Variable	Required	Default	Used by	Description
`CSV_DIRECTORY`	Yes (`csv ingest`)	—	`neocarta csv ingest`	Directory containing CSV metadata files
`OSI_SPEC_SOURCE`	Yes (`osi ingest`)	—	`neocarta osi ingest`	Local filesystem path or HTTP(S) URL to the OSI YAML spec
`OSI_SEMANTIC_MODEL_NAME`	Yes (`osi export`)	—	`neocarta osi export`	Name of the `OsiSemanticModel` to export
`QUERY_LOG_FILE`	Yes (`query-log ingest`)	—	`neocarta query-log ingest`	Path to a local query-log JSON file
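Because OSI_SPEC_SOURCE accepts either a local path or an HTTP(S) URL, the loader has to dispatch on the value. A minimal standard-library sketch, assuming UTF-8 text; `read_osi_spec` is a hypothetical name, and parsing the returned YAML is left out.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_osi_spec(source: str, timeout: float = 30.0) -> str:
    """Hypothetical: return the raw YAML text from a local path or an http(s) URL."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        with urlopen(source, timeout=timeout) as response:
            return response.read().decode("utf-8")
    # Anything else (including Windows paths like C:\specs\model.yaml) is a file path.
    return Path(source).expanduser().read_text(encoding="utf-8")
```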

.env File

The CLI automatically loads a .env file from the current working directory using python-dotenv. Copy the example below to .env and fill in the values relevant to your setup.

# .env — Neocarta CLI configuration
# Copy this file, fill in the values you need, and save it as .env
# in the directory where you run neocarta commands.

# ── Neo4j (required for all commands) ─────────────────────────────────────────
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-neo4j-password
NEO4J_DATABASE=neo4j

# ── Embeddings ─────────────────────────────────────────────────────────────────
# Set the provider API key for your chosen embedding model.
OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=...
# COHERE_API_KEY=...
# AZURE_API_KEY=...
# AZURE_API_BASE=https://your-resource.openai.azure.com/
# AZURE_API_VERSION=2024-02-01

EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIMENSIONS=1536   # optional: request truncation on supported models
# EMBEDDING_BATCH_SIZE=100    # optional: tune ingest throughput

# ── Google Cloud / BigQuery ────────────────────────────────────────────────────
GCP_PROJECT_ID=my-gcp-project
GCP_PROJECT_NUMBER=123456789
BIGQUERY_DATASET_ID=my_dataset
# BIGQUERY_REGION=region-us   # default: region-us

# ── Dataplex ───────────────────────────────────────────────────────────────────
# DATAPLEX_LOCATION=us

# ── JDBC ───────────────────────────────────────────────────────────────────────
# JDBC_URL=jdbc:postgresql://localhost:5432/mydb
# JDBC_DRIVER=org.postgresql.Driver
# JDBC_DRIVER_JAR=/path/to/postgresql.jar
# SCHEMACRAWLER_JAR=/path/to/schemacrawler/lib/*
# JDBC_USER=analytics
# JDBC_PASSWORD=your-db-password   # env-only, never a flag
# JDBC_SOURCE_DATABASE_NAME=       # needed for Oracle SID / SQL Server
# JDBC_PLATFORM=                   # e.g. AWS_RDS
# JDBC_SERVICE=                    # defaults to SchemaCrawler-reported product
# JDBC_TIMEOUT=120

# ── Databricks ─────────────────────────────────────────────────────────────────
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
# DATABRICKS_TOKEN=dapi...   # env-only, never a flag

# ── Connector-specific ─────────────────────────────────────────────────────────
# CSV_DIRECTORY=./datasets/csv
# OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml
# OSI_SEMANTIC_MODEL_NAME=acme_corp_model
# QUERY_LOG_FILE=./query_logs.json

python-dotenv loads the .env file before settings are resolved and does not overwrite variables that are already set, so shell environment variables take precedence over .env values, which in turn take precedence over built-in defaults. CLI flags override all of these layers. The secrets NEO4J_PASSWORD, JDBC_PASSWORD, and DATABRICKS_TOKEN have no flag form and can only be supplied through the environment or .env.

Quick Reference by Command

Command	Required variables
Any connector or tool	`NEO4J_URI`, `NEO4J_USERNAME`, `NEO4J_PASSWORD`
`neocarta bigquery schema`	+ `GCP_PROJECT_ID`, `BIGQUERY_DATASET_ID`
`neocarta bigquery logs`	+ `GCP_PROJECT_ID`, `BIGQUERY_DATASET_ID` (`BIGQUERY_REGION` optional, default `region-us`)
`neocarta dataplex schema`	+ `GCP_PROJECT_ID`, `GCP_PROJECT_NUMBER`, `DATAPLEX_LOCATION`, `BIGQUERY_DATASET_ID`
`neocarta dataplex glossary`	+ `GCP_PROJECT_ID`, `GCP_PROJECT_NUMBER`, `DATAPLEX_LOCATION`
`neocarta jdbc schema`	+ `JDBC_URL`, `JDBC_DRIVER`, `JDBC_DRIVER_JAR`, `SCHEMACRAWLER_JAR`
`neocarta csv ingest`	+ `CSV_DIRECTORY`
`neocarta osi ingest`	+ `OSI_SPEC_SOURCE`
`neocarta osi export`	+ `OSI_SEMANTIC_MODEL_NAME`
`neocarta query-log ingest`	+ `QUERY_LOG_FILE`
`neocarta databricks tags`	+ `DATABRICKS_HOST`, `DATABRICKS_TOKEN`
Any command with `--embeddings`	+ provider key (e.g. `OPENAI_API_KEY`)
Vector / hybrid tool commands	+ `EMBEDDING_MODEL` + provider key
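The table can also be checked before running a command. This is a hypothetical pre-flight helper, not part of the CLI, transcribed from the rows above. It leaves out the embedding provider key (which depends on EMBEDDING_MODEL) and treats BIGQUERY_REGION as optional because it defaults to region-us.

```python
import os
from typing import Dict, List, Mapping, Tuple

# Needed by every connector or tool command.
NEO4J: Tuple[str, ...] = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")

# Hypothetical mapping: `neocarta <command>` -> extra required variables.
REQUIRED: Dict[str, Tuple[str, ...]] = {
    "bigquery schema": ("GCP_PROJECT_ID", "BIGQUERY_DATASET_ID"),
    "bigquery logs": ("GCP_PROJECT_ID", "BIGQUERY_DATASET_ID"),  # BIGQUERY_REGION has a default
    "dataplex schema": ("GCP_PROJECT_ID", "GCP_PROJECT_NUMBER", "DATAPLEX_LOCATION", "BIGQUERY_DATASET_ID"),
    "dataplex glossary": ("GCP_PROJECT_ID", "GCP_PROJECT_NUMBER", "DATAPLEX_LOCATION"),
    "jdbc schema": ("JDBC_URL", "JDBC_DRIVER", "JDBC_DRIVER_JAR", "SCHEMACRAWLER_JAR"),
    "csv ingest": ("CSV_DIRECTORY",),
    "osi ingest": ("OSI_SPEC_SOURCE",),
    "osi export": ("OSI_SEMANTIC_MODEL_NAME",),
    "query-log ingest": ("QUERY_LOG_FILE",),
    "databricks tags": ("DATABRICKS_HOST", "DATABRICKS_TOKEN"),
}


def missing_vars(command: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """List required variables for `neocarta <command>` that are unset or empty.

    Raises KeyError for a command not in the table.
    """
    needed = NEO4J + REQUIRED[command]
    return [name for name in needed if not env.get(name)]
```

For example, missing_vars("osi ingest") returns ["OSI_SPEC_SOURCE"] when only the Neo4j variables are set.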
