CSV Connector: Load Metadata from CSV Files

The CSV connector loads metadata into the Neo4j semantic graph from a directory of structured CSV files. It is useful when your data source doesn’t have a dedicated Neocarta connector, when you want to manually curate or augment metadata, or when migrating metadata from another tool. A sample e-commerce dataset is included in datasets/csv/ so you can test the connector immediately.

Import

from neocarta.connectors.csv import CSVConnector

Parameters

Parameter	Type	Required / Default	Description
`csv_directory`	`str`	required	Filesystem path to the directory containing CSV files.
`neo4j_driver`	`neo4j.Driver`	required	Connected Neo4j driver instance.
`database_name`	`str`	default: `"neo4j"`	Target Neo4j database name.
`csv_file_map`	`dict[str, str]`	optional	Mapping from NodeLabel / RelationshipType enum members (or their string values) to custom CSV filenames. Merges with the default filename map, allowing partial overrides. See Custom File Mapping below.

`ingest()` Parameters

Parameter	Type	Description
`include_nodes`	`list[NodeLabel]`	Node types to load. When omitted, all available CSV files are loaded. Accepts NodeLabel enum members (recommended) or their exact string values, e.g. "Database".
`include_relationships`	`list[RelationshipType]`	Relationship types to load. When omitted, all available CSV files are loaded. Accepts RelationshipType enum members (recommended) or their exact string values, e.g. "HAS_SCHEMA".

CSV files that don’t exist on disk are skipped with a warning rather than raising an error, so you can start with a minimal set and add files incrementally.

Default CSV Filenames

The connector expects CSV files with the following default names. Any file can be renamed using csv_file_map.

CSV File	Node / Relationship
`database_info.csv`	`Database`
`schema_info.csv`	`Schema`
`table_info.csv`	`Table`
`column_info.csv`	`Column`
`column_references_info.csv`	`REFERENCES`
`value_info.csv`	`Value`
`query_info.csv`	`Query`
`query_table_info.csv`	`USES_TABLE`
`query_column_info.csv`	`USES_COLUMN`
`glossary_info.csv`	`Glossary`
`category_info.csv`	`Category`
`business_term_info.csv`	`BusinessTerm`
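
Because missing files are skipped rather than treated as errors, it can help to check which default files a directory actually contains before ingesting. The following is a minimal sketch using only the standard library. The helper name `split_present_and_missing` is hypothetical and not part of Neocarta, and the filename map is copied from the table above rather than read from the library.

```python
from pathlib import Path

# Default filenames copied from the table above (not imported from Neocarta).
DEFAULT_CSV_FILES = {
    "Database": "database_info.csv",
    "Schema": "schema_info.csv",
    "Table": "table_info.csv",
    "Column": "column_info.csv",
    "REFERENCES": "column_references_info.csv",
    "Value": "value_info.csv",
    "Query": "query_info.csv",
    "USES_TABLE": "query_table_info.csv",
    "USES_COLUMN": "query_column_info.csv",
    "Glossary": "glossary_info.csv",
    "Category": "category_info.csv",
    "BusinessTerm": "business_term_info.csv",
}


def split_present_and_missing(csv_directory: str) -> tuple[list[str], list[str]]:
    """Return (present, missing) default filenames for a directory (hypothetical helper)."""
    root = Path(csv_directory)
    present: list[str] = []
    missing: list[str] = []
    for filename in DEFAULT_CSV_FILES.values():
        # A file counts as present only if it exists and is a regular file.
        (present if (root / filename).is_file() else missing).append(filename)
    return present, missing


if __name__ == "__main__":
    present, missing = split_present_and_missing("datasets/csv")
    print("present:", present)
    print("missing (would be skipped):", missing)
```

If you rename files with csv_file_map, the check would need the same overrides applied to the map before it is useful.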

ID Strategy

Entity IDs are computed from name columns using a dot-separated hierarchy. Choose one strategy and apply it consistently across all files.

Auto-generated (recommended)

Omit all *_id columns. IDs are built automatically:

Entity	Required columns	Generated ID
Database	`database_name`	`{database_name}`
Schema	`database_name`, `schema_name`	`{database_name}.{schema_name}`
Table	`database_name`, `schema_name`, `table_name`	`{database_name}.{schema_name}.{table_name}`
Column	`database_name`, `schema_name`, `table_name`, `column_name`	`{database_name}.{schema_name}.{table_name}.{column_name}`
Glossary	`glossary_name`	`{glossary_name}`
Category	`glossary_name`, `category_name`	`{glossary_name}.{category_name}`
BusinessTerm	`glossary_name`, `category_name`, `term_name`	`{glossary_name}.{category_name}.{term_name}`

Explicit IDs

Supply an *_id column (database_id, schema_id, table_id, column_id, glossary_id, category_id, business_term_id) in every CSV file in the hierarchy. IDs are used as-is and must be consistent across files.

Mixing explicit and auto-generated IDs across files in the same hierarchy is not supported and will produce inconsistent node references. If loading glossary data from both the CSV connector and the Dataplex connector into the same graph, you must supply explicit IDs in the CSV that match the Dataplex resource paths.
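
The dot-separated scheme for auto-generated IDs can be illustrated with a short sketch. This is not the connector's internal code: `build_id` is a hypothetical helper, the column lists are copied from the auto-generated ID table, and the row values (`shop`, `sales`, `orders`) are made-up sample names.

```python
# Name columns per entity, copied from the auto-generated ID table.
ID_COLUMNS = {
    "Database": ["database_name"],
    "Schema": ["database_name", "schema_name"],
    "Table": ["database_name", "schema_name", "table_name"],
    "Column": ["database_name", "schema_name", "table_name", "column_name"],
    "Glossary": ["glossary_name"],
    "Category": ["glossary_name", "category_name"],
    "BusinessTerm": ["glossary_name", "category_name", "term_name"],
}


def build_id(entity: str, row: dict[str, str]) -> str:
    """Join an entity's name columns with dots (hypothetical helper)."""
    columns = ID_COLUMNS[entity]
    # Refuse to build a partial ID when a name column is absent or empty.
    missing = [c for c in columns if not row.get(c)]
    if missing:
        raise ValueError(f"{entity} row is missing required columns: {missing}")
    return ".".join(row[c] for c in columns)


if __name__ == "__main__":
    row = {"database_name": "shop", "schema_name": "sales", "table_name": "orders"}
    print(build_id("Schema", row))  # shop.sales
    print(build_id("Table", row))   # shop.sales.orders
```

A helper like this is handy for predicting the ID a row will receive, for example when checking that a foreign-key file points at columns that exist.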

Code Example

import os
from dotenv import load_dotenv
from neo4j import GraphDatabase
from neocarta import NodeLabel, RelationshipType
from neocarta.connectors.csv import CSVConnector

load_dotenv()

neo4j_driver = GraphDatabase.driver(
    uri=os.getenv("NEO4J_URI"),
    auth=(os.getenv("NEO4J_USERNAME"), os.getenv("NEO4J_PASSWORD")),
)
neo4j_database = os.getenv("NEO4J_DATABASE", "neo4j")

connector = CSVConnector(
    csv_directory="datasets/csv",
    neo4j_driver=neo4j_driver,
    database_name=neo4j_database,
)

# Load everything (all available CSV files)
connector.ingest()

Selective Ingest

Use include_nodes and include_relationships to load only a subset of the available data:

from neocarta import NodeLabel as nl, RelationshipType as rt

# Load core schema + query usage, skip glossary
connector.ingest(
    include_nodes=[
        nl.DATABASE,
        nl.SCHEMA,
        nl.TABLE,
        nl.COLUMN,
        nl.VALUE,
        nl.QUERY,
    ],
    include_relationships=[
        rt.HAS_SCHEMA,
        rt.HAS_TABLE,
        rt.HAS_COLUMN,
        rt.HAS_VALUE,
        rt.REFERENCES,
        rt.USES_TABLE,
        rt.USES_COLUMN,
    ],
)

neo4j_driver.close()
print("Connector completed successfully!")

Custom File Mapping

Configure custom filenames at construction time using csv_file_map. The map merges with the defaults, so you only need to specify the files you want to rename:

from neocarta import NodeLabel

custom_file_map = {
    NodeLabel.DATABASE: "my_databases.csv",
    NodeLabel.TABLE:    "my_tables.csv",
    NodeLabel.COLUMN:   "my_columns.csv",
}

connector = CSVConnector(
    csv_directory="path/to/my/data",
    neo4j_driver=neo4j_driver,
    database_name="neo4j",
    csv_file_map=custom_file_map,
)

connector.ingest()
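
The partial-override behavior described in this section can be pictured as a plain dictionary merge. The sketch below only illustrates that idea and is an assumption about the semantics, not Neocarta's implementation. `merge_file_map` is a hypothetical helper, and the keys are the string values of the labels rather than the enum members.

```python
from typing import Optional

# A subset of the default filename map, keyed by label string values.
DEFAULT_FILE_MAP = {
    "Database": "database_info.csv",
    "Schema": "schema_info.csv",
    "Table": "table_info.csv",
    "Column": "column_info.csv",
}


def merge_file_map(
    defaults: dict[str, str], overrides: Optional[dict[str, str]]
) -> dict[str, str]:
    """Overlay custom filenames on the defaults (hypothetical helper)."""
    # Later keys win, so overrides replace defaults and untouched keys remain.
    return {**defaults, **(overrides or {})}


if __name__ == "__main__":
    merged = merge_file_map(DEFAULT_FILE_MAP, {"Database": "my_databases.csv"})
    print(merged["Database"])  # my_databases.csv
    print(merged["Schema"])    # schema_info.csv
```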

CLI

pip install "neocarta[cli]"

neocarta csv ingest --csv-directory ./datasets/csv

Required Environment Variables

Variable	Purpose
`NEO4J_URI`	Neo4j connection URI
`NEO4J_USERNAME`	Neo4j username
`NEO4J_PASSWORD`	Neo4j password
`NEO4J_DATABASE`	Target Neo4j database (default: `neo4j`)
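
A small pre-flight check can catch unset variables before the driver or CLI tries to connect. This is a sketch using only the standard library. `missing_env_vars` is a hypothetical helper, and it treats `NEO4J_DATABASE` as optional because the table lists a default for it.

```python
import os
from typing import Mapping, Optional

# Variables from the table above that have no documented default.
REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


def missing_env_vars(environ: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit(f"Set these variables first: {', '.join(missing)}")
    print("Neo4j environment looks complete.")
```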

A ready-to-use sample e-commerce dataset lives in datasets/csv/. It demonstrates the full CSV structure (database, schema, tables, columns, foreign keys, sample values, query logs, and a business glossary) and can be used to test the connector without any external data source.
