The CLI reads its configuration from environment variables. A .env file in the working directory is loaded automatically by python-dotenv before any settings are resolved. CLI flags take the highest priority and override both the environment and the built-in defaults. Secrets, including NEO4J_PASSWORD and JDBC_PASSWORD, are environment-only and are never accepted as flags, which keeps them out of shell history and the process list.
Flag Override Precedence
Resolution order, highest priority first:
- CLI flag (e.g. --project-id my-proj on the command line)
- Environment variable (e.g. GCP_PROJECT_ID=my-proj in the shell or .env)
- Built-in default (the value declared in the settings model, if any)
NEO4J_PASSWORD and JDBC_PASSWORD are environment-only. They are never accepted as CLI flags so the raw secrets stay out of shell history, ps output, and log files.
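The resolution order above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the CLI's actual code: resolve_setting and ENV_ONLY are hypothetical names introduced here.

```python
import os

# Hypothetical set of settings that may only come from the environment.
ENV_ONLY = {"NEO4J_PASSWORD", "JDBC_PASSWORD"}


def resolve_setting(name, flag_value=None, default=None, environ=None):
    """Return the first value found: CLI flag, then environment, then default."""
    env = os.environ if environ is None else environ
    if flag_value is not None:
        if name in ENV_ONLY:
            # Secrets are rejected as flags so they never reach shell history.
            raise ValueError(f"{name} is environment-only and cannot be a flag")
        return flag_value
    if name in env:
        return env[name]
    return default


if __name__ == "__main__":
    env = {"GCP_PROJECT_ID": "from-env"}
    # The flag wins over the environment value.
    print(resolve_setting("GCP_PROJECT_ID", flag_value="from-flag", environ=env))
    # With no flag and no environment value, the default is used.
    print(resolve_setting("NEO4J_DATABASE", default="neo4j", environ=env))
```

Passing environ explicitly keeps the sketch easy to test; a real settings model would more likely read os.environ directly.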
Environment Variable Reference
Neo4j Connection
Required by every connector command and tool command.
| Variable | Required | Default | Description |
|---|---|---|---|
| NEO4J_URI | Yes | — | Neo4j Bolt URI (e.g. bolt://localhost:7687 or neo4j+s://xxx.databases.neo4j.io) |
| NEO4J_USERNAME | Yes | — | Neo4j username |
| NEO4J_PASSWORD | Yes | — | Neo4j password (secret; env-only, never a flag) |
| NEO4J_DATABASE | No | neo4j | Target Neo4j database name |
Embeddings
Required when --embeddings is passed to any connector command, or when running a vector or hybrid search tool.
| Variable | Required | Default | Description |
|---|---|---|---|
| EMBEDDING_MODEL | No | text-embedding-3-small | LiteLLM embedding model id (e.g. text-embedding-3-small, gemini-embedding-001) |
| EMBEDDING_DIMENSIONS | No | auto-detected | Vector dimension for models that support truncation; silently ignored by models that don’t |
| EMBEDDING_BATCH_SIZE | No | 100 | Nodes embedded per provider request during CLI ingest runs. Not used by the MCP server. |
Provider credentials — set whichever variable your EMBEDDING_MODEL requires:
| Provider | Variable(s) |
|---|---|
| OpenAI | OPENAI_API_KEY |
| Gemini (AI Studio) | GEMINI_API_KEY |
| Cohere | COHERE_API_KEY |
| Anthropic | ANTHROPIC_API_KEY |
| Azure OpenAI | AZURE_API_KEY, AZURE_API_BASE, AZURE_API_VERSION |
| AWS Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION_NAME |
| Vertex AI | VERTEXAI_PROJECT, VERTEXAI_LOCATION (+ Application Default Credentials) |
The embedding vector dimension is auto-detected from the model on first use, and the Neo4j vector index is created at that size. Set EMBEDDING_DIMENSIONS only to request truncation on models that support it. If you switch to a model with a different dimension on an existing graph, drop the existing *_vector_index indexes and re-ingest with --embeddings.
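EMBEDDING_BATCH_SIZE controls how many nodes are sent in each provider request during ingest. The Python sketch below shows one way such batching could work. It is an illustration only, not the CLI's implementation, and batched and batch_size_from_env are hypothetical helpers.

```python
import os
from typing import Iterator, List, Optional, Sequence


def batch_size_from_env(environ: Optional[dict] = None) -> int:
    """Read EMBEDDING_BATCH_SIZE, falling back to the documented default of 100."""
    env = os.environ if environ is None else environ
    return int(env.get("EMBEDDING_BATCH_SIZE", "100"))


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    texts = [f"node-{i}" for i in range(250)]
    for batch in batched(texts, batch_size_from_env({})):
        # Each batch would correspond to one embedding request.
        print(len(batch))
```

A smaller batch size means more requests with smaller payloads, which may help when a provider enforces per-request limits.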
Google Cloud / BigQuery
Required for neocarta bigquery * and neocarta dataplex * commands.
| Variable | Required | Default | Description |
|---|---|---|---|
| GCP_PROJECT_ID | Yes (bigquery *, dataplex *) | — | GCP project ID (string, e.g. my-project) |
| GCP_PROJECT_NUMBER | Yes (dataplex *) | — | Numeric GCP project number |
| BIGQUERY_DATASET_ID | Yes (bigquery *) | — | BigQuery dataset ID to extract metadata from |
| BIGQUERY_REGION | No | region-us | Region string used when querying INFORMATION_SCHEMA job logs |
| DATAPLEX_LOCATION | Yes (dataplex *) | — | Dataplex location (e.g. us, us-central1) |
| GOOGLE_APPLICATION_CREDENTIALS | When outside a GCP-authenticated shell | — | Path to a GCP service-account JSON key file (secret) |
GCP authentication uses Application Default Credentials. Run gcloud auth application-default login before using any BigQuery or Dataplex command when running outside a GCP-authenticated environment.
JDBC
Required for neocarta jdbc schema.
| Variable | Required | Default | Description |
|---|---|---|---|
| JDBC_URL | Yes | — | JDBC connection URL (e.g. jdbc:postgresql://host:5432/mydb) |
| JDBC_DRIVER | Yes | — | Fully-qualified JDBC driver class (e.g. org.postgresql.Driver) |
| JDBC_DRIVER_JAR | Yes | — | Filesystem path to the JDBC driver JAR |
| SCHEMACRAWLER_JAR | Yes | — | Filesystem path or classpath glob to the SchemaCrawler distribution JARs |
| JDBC_USER | No | — | Database username |
| JDBC_PASSWORD | No | — | Database password (secret; env-only, never a flag) |
| JDBC_SOURCE_DATABASE_NAME | No | derived from URL | Name for the graph Database node; required when it cannot be derived from the URL (e.g. Oracle SID, SQL Server) |
| JDBC_PLATFORM | No | — | Hosting platform for the graph Database node (e.g. AWS_RDS) |
| JDBC_SERVICE | No | SchemaCrawler-reported | Database service/engine for the graph Database node |
| JDBC_TIMEOUT | No | 120 | Maximum seconds to wait for the SchemaCrawler subprocess |
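JDBC_SOURCE_DATABASE_NAME is only needed when the name cannot be derived from the URL. How Neocarta performs that derivation is not described here. The Python sketch below is a hypothetical illustration (derive_database_name is an invented helper, not part of the CLI) of why a PostgreSQL-style URL yields a name while an Oracle SID or SQL Server URL may not.

```python
from typing import Optional
from urllib.parse import urlsplit


def derive_database_name(jdbc_url: str) -> Optional[str]:
    """Return the path component of a jdbc:<scheme>://host/db URL, if present."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        return None
    parts = urlsplit(jdbc_url[len(prefix):])
    if not parts.netloc:
        # No //host section, e.g. the Oracle thin SID form.
        return None
    name = parts.path.lstrip("/")
    return name or None


if __name__ == "__main__":
    for url in (
        "jdbc:postgresql://host:5432/mydb",
        "jdbc:oracle:thin:@host:1521:ORCL",
        "jdbc:sqlserver://host:1433;databaseName=sales",
    ):
        print(url, "->", derive_database_name(url))
```

When a helper like this returns None, setting JDBC_SOURCE_DATABASE_NAME explicitly removes the ambiguity.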
Databricks
Required for neocarta databricks tags. Also requires pip install "neocarta[databricks]".
| Variable | Required | Default | Description |
|---|---|---|---|
| DATABRICKS_HOST | Yes | — | Databricks workspace URL (e.g. https://dbc-xxxx.cloud.databricks.com) |
| DATABRICKS_TOKEN | Yes | — | Databricks personal access token (secret; env-only, never a flag) |
Connector-Specific
| Variable | Required | Default | Used by | Description |
|---|---|---|---|---|
| CSV_DIRECTORY | Yes (csv ingest) | — | neocarta csv ingest | Directory containing CSV metadata files |
| OSI_SPEC_SOURCE | Yes (osi ingest) | — | neocarta osi ingest | Local filesystem path or HTTP(S) URL to the OSI YAML spec |
| OSI_SEMANTIC_MODEL_NAME | Yes (osi export) | — | neocarta osi export | Name of the OsiSemanticModel to export |
| QUERY_LOG_FILE | Yes (query-log ingest) | — | neocarta query-log ingest | Path to a local query-log JSON file |
.env File
The CLI automatically loads a .env file from the current working directory using python-dotenv. Copy the example below to .env and fill in the values relevant to your setup.
# .env — Neocarta CLI configuration
# Copy this file, fill in the values you need, and save it as .env
# in the directory where you run neocarta commands.
# ── Neo4j (required for all commands) ─────────────────────────────────────────
NEO4J_URI=bolt://localhost:7687
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=your-neo4j-password
NEO4J_DATABASE=neo4j
# ── Embeddings ─────────────────────────────────────────────────────────────────
# Set the provider API key for your chosen embedding model.
OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=...
# COHERE_API_KEY=...
# AZURE_API_KEY=...
# AZURE_API_BASE=https://your-resource.openai.azure.com/
# AZURE_API_VERSION=2024-02-01
EMBEDDING_MODEL=text-embedding-3-small
# EMBEDDING_DIMENSIONS=1536 # optional: request truncation on supported models
# EMBEDDING_BATCH_SIZE=100 # optional: tune ingest throughput
# ── Google Cloud / BigQuery ────────────────────────────────────────────────────
GCP_PROJECT_ID=my-gcp-project
GCP_PROJECT_NUMBER=123456789
BIGQUERY_DATASET_ID=my_dataset
# BIGQUERY_REGION=region-us # default: region-us
# ── Dataplex ───────────────────────────────────────────────────────────────────
# DATAPLEX_LOCATION=us
# ── JDBC ───────────────────────────────────────────────────────────────────────
# JDBC_URL=jdbc:postgresql://localhost:5432/mydb
# JDBC_DRIVER=org.postgresql.Driver
# JDBC_DRIVER_JAR=/path/to/postgresql.jar
# SCHEMACRAWLER_JAR=/path/to/schemacrawler/lib/*
# JDBC_USER=analytics
# JDBC_PASSWORD=your-db-password # env-only, never a flag
# JDBC_SOURCE_DATABASE_NAME= # needed for Oracle SID / SQL Server
# JDBC_PLATFORM= # e.g. AWS_RDS
# JDBC_SERVICE= # defaults to SchemaCrawler-reported product
# JDBC_TIMEOUT=120
# ── Databricks ─────────────────────────────────────────────────────────────────
# DATABRICKS_HOST=https://dbc-xxxx.cloud.databricks.com
# DATABRICKS_TOKEN=dapi... # env-only, never a flag
# ── Connector-specific ─────────────────────────────────────────────────────────
# CSV_DIRECTORY=./datasets/csv
# OSI_SPEC_SOURCE=./datasets/osi/acme_semantic_model.yaml
# OSI_SEMANTIC_MODEL_NAME=acme_corp_model
# QUERY_LOG_FILE=./query_logs.json
The .env file is loaded by python-dotenv before environment variables are read. Shell environment variables take precedence over .env values, which in turn take precedence over built-in defaults. CLI flags override everything except NEO4J_PASSWORD and JDBC_PASSWORD, which are always env-only.
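The layering of shell variables over .env values matches python-dotenv's default behaviour, where load_dotenv does not override variables that are already set. The stdlib-only Python sketch below imitates that merge to make the precedence concrete. parse_env_lines and merge_env are hypothetical helpers, and the parser is deliberately simplified (no quoting or inline-comment handling).

```python
def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and # comments (simplified)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(dotenv_values, shell_env):
    """Start from .env values, then let shell variables take precedence."""
    merged = dict(dotenv_values)
    merged.update(shell_env)
    return merged


if __name__ == "__main__":
    dotenv_values = parse_env_lines([
        "# Neo4j",
        "NEO4J_URI=bolt://localhost:7687",
        "NEO4J_DATABASE=neo4j",
    ])
    shell_env = {"NEO4J_DATABASE": "analytics"}
    # NEO4J_DATABASE comes from the shell; NEO4J_URI comes from .env.
    print(merge_env(dotenv_values, shell_env))
```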
Quick Reference by Command
| Command | Required variables |
|---|---|
| Any connector or tool | NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD |
| neocarta bigquery schema | + GCP_PROJECT_ID, BIGQUERY_DATASET_ID |
| neocarta bigquery logs | + GCP_PROJECT_ID, BIGQUERY_DATASET_ID, BIGQUERY_REGION (optional, defaults to region-us) |
| neocarta dataplex schema | + GCP_PROJECT_ID, GCP_PROJECT_NUMBER, DATAPLEX_LOCATION, BIGQUERY_DATASET_ID |
| neocarta dataplex glossary | + GCP_PROJECT_ID, GCP_PROJECT_NUMBER, DATAPLEX_LOCATION |
| neocarta jdbc schema | + JDBC_URL, JDBC_DRIVER, JDBC_DRIVER_JAR, SCHEMACRAWLER_JAR |
| neocarta csv ingest | + CSV_DIRECTORY |
| neocarta osi ingest | + OSI_SPEC_SOURCE |
| neocarta osi export | + OSI_SEMANTIC_MODEL_NAME |
| neocarta query-log ingest | + QUERY_LOG_FILE |
| neocarta databricks tags | + DATABRICKS_HOST, DATABRICKS_TOKEN |
| Any command with --embeddings | + provider key (e.g. OPENAI_API_KEY) |
| Vector / hybrid tool commands | + EMBEDDING_MODEL + provider key |
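A table like this lends itself to a preflight check run before invoking a command. The Python sketch below is one possible approach, not something the CLI provides: missing_variables and the REQUIRED_BY_COMMAND mapping are hypothetical and cover only a few of the commands listed above.

```python
import os

# Variables every connector or tool command needs.
NEO4J_VARS = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD"]

# Hypothetical subset of the quick-reference table.
REQUIRED_BY_COMMAND = {
    "bigquery schema": ["GCP_PROJECT_ID", "BIGQUERY_DATASET_ID"],
    "jdbc schema": ["JDBC_URL", "JDBC_DRIVER", "JDBC_DRIVER_JAR", "SCHEMACRAWLER_JAR"],
    "csv ingest": ["CSV_DIRECTORY"],
    "databricks tags": ["DATABRICKS_HOST", "DATABRICKS_TOKEN"],
}


def missing_variables(command, environ=None):
    """Return required variable names that are unset or empty for a command."""
    env = os.environ if environ is None else environ
    needed = NEO4J_VARS + REQUIRED_BY_COMMAND[command]
    return [name for name in needed if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables("csv ingest")
    if missing:
        print("Missing variables:", ", ".join(missing))
    else:
        print("All required variables are set.")
```

Because the check only inspects names, it never prints secret values.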