

Neo4j is the graph database that stores the entity and relationship knowledge graph built from medical documents. It powers the relational channel of the agentic pipeline, enabling SPO (Subject–Predicate–Object) triple-based retrieval over structured clinical knowledge. By persisting named entities and their pairwise connections, Neo4j gives the system a dedicated store for structured medical reasoning — separate from, but complementary to, the dense vector index used by the semantic channel.

Role in the System

The agentic pipeline interacts with Neo4j at two distinct stages of the data lifecycle: during document indexing and during multi-hop retrieval.

During indexing, an LLM processes each document chunk and extracts named entities alongside their pairwise relationships. Those entities and relationships are written directly into Neo4j, building up a structured knowledge graph that grows with every new document ingested.

During retrieval, the relational channel issues SPO triple queries against Neo4j. Each triple encodes a (Subject, Predicate, Object) pattern that the graph database resolves by traversing stored nodes and edges. The resulting triplets are then summarised and combined with the semantic channel’s output before the final answer is synthesised.
Only LightRAG uses Neo4j as its graph storage backend. MiniRAG, PathRAG, and HyperGraphRAG store their entity and relationship graphs locally in the working directory and do not require a running Neo4j instance.
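As a rough illustration of how an SPO pattern can become a graph query, the sketch below builds a parameterised Cypher statement from an optional subject, predicate, and object. The `Entity` label, the `RELATED` relationship type, the `name` and `predicate` properties, and the `triple_to_cypher` helper are all assumptions made for this example; the schema that LightRAG actually writes may differ.

```python
from typing import Any, Dict, Optional, Tuple


def triple_to_cypher(
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
    limit: int = 25,
) -> Tuple[str, Dict[str, Any]]:
    """Translate an SPO pattern into a parameterised Cypher query.

    A position left as None acts as a wildcard.
    NOTE: the labels and property names are hypothetical, not LightRAG's schema.
    """
    conditions = []
    params: Dict[str, Any] = {"limit": limit}

    if subject is not None:
        conditions.append("s.name = $subject")
        params["subject"] = subject
    if predicate is not None:
        conditions.append("r.predicate = $predicate")
        params["predicate"] = predicate
    if obj is not None:
        conditions.append("o.name = $object")
        params["object"] = obj

    # Only emit a WHERE clause when at least one position is bound.
    where = f"WHERE {' AND '.join(conditions)} " if conditions else ""
    query = (
        "MATCH (s:Entity)-[r:RELATED]->(o:Entity) "
        f"{where}"
        "RETURN s.name AS subject, r.predicate AS predicate, o.name AS object "
        "LIMIT $limit"
    )
    return query, params
```

The returned pair can be passed to the official Python driver as `session.run(query, params)`. Supplying values as parameters, rather than formatting them into the query string, keeps user-supplied terms from being interpreted as Cypher.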

Setup

The quickest way to get Neo4j running locally is with Docker.
1. Start the Neo4j container

Run the following command to pull and start a Neo4j instance with the initial username and password set through NEO4J_AUTH. It publishes the standard Neo4j ports: 7474 (HTTP, used by the Neo4j Browser) and 7687 (Bolt protocol):
docker run --name neo4j \
  -p 7474:7474 -p 7687:7687 \
  -e NEO4J_AUTH=neo4j/password \
  neo4j:latest
2. Verify the instance

Once the container is running, you can access the Neo4j Browser at:
http://localhost:7474
The Bolt protocol — used by the application driver — is exposed on:
bolt://localhost:7687
Log in with the credentials you set in NEO4J_AUTH (neo4j / password in the example above).
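Reachability of the Bolt port can also be checked from a script. The sketch below uses only the Python standard library; the helper names `parse_bolt_uri` and `bolt_port_open` are hypothetical. It confirms only that something is listening on the port and does not authenticate, so it complements rather than replaces logging in through the Browser. The official `neo4j` Python driver's `verify_connectivity()` method performs a fuller check.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse

DEFAULT_BOLT_PORT = 7687


def parse_bolt_uri(uri: str) -> Tuple[str, int]:
    """Extract (host, port) from a URI such as bolt://localhost:7687."""
    parsed = urlparse(uri)
    if not parsed.hostname:
        raise ValueError(f"No host found in URI: {uri!r}")
    # Fall back to the standard Bolt port when the URI omits one.
    return parsed.hostname, parsed.port or DEFAULT_BOLT_PORT


def bolt_port_open(uri: str = "bolt://localhost:7687", timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to the Bolt port succeeds."""
    host, port = parse_bolt_uri(uri)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    print("Bolt port reachable:", bolt_port_open())
```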
For production deployments, consider using the official Neo4j Helm chart for Kubernetes or the managed Neo4j AuraDB cloud service.

Configuration

The pipeline connects to Neo4j using environment variables for the Bolt URI, username, and password. The table below shows typical configuration variable names. Set these in your shell or in a .env file that is excluded from version control.
Variable       | Description                                | Example
NEO4J_URI      | Bolt connection URI for the Neo4j instance | bolt://localhost:7687
NEO4J_USERNAME | Neo4j database username                    | neo4j
NEO4J_PASSWORD | Neo4j database password                    | password
Do not hardcode Neo4j credentials in your source code. Always supply them through environment variables or a dedicated secrets manager such as AWS Secrets Manager or HashiCorp Vault.
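One way to read these settings in application code is sketched below. It assumes the three variable names from the table; the `Neo4jConfig` class and the `load_neo4j_config` function are hypothetical names for this example and are not part of the pipeline's API.

```python
import os
from dataclasses import dataclass, field
from typing import Mapping

REQUIRED_VARS = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")


@dataclass(frozen=True)
class Neo4jConfig:
    uri: str
    username: str
    # repr=False keeps the password out of logs and tracebacks.
    password: str = field(repr=False)


def load_neo4j_config(env: Mapping[str, str] = os.environ) -> Neo4jConfig:
    """Build a config object from environment variables, failing fast if any are unset."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Neo4j settings: " + ", ".join(missing))
    return Neo4jConfig(
        uri=env["NEO4J_URI"],
        username=env["NEO4J_USERNAME"],
        password=env["NEO4J_PASSWORD"],
    )
```

Failing at startup when a variable is missing tends to be easier to diagnose than a connection error raised later during indexing or retrieval.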

Data Model

The Neo4j graph used by the agentic pipeline follows a straightforward entity–relationship schema derived directly from the LLM extraction step.

Nodes: represent named entities extracted from medical documents by the LLM during indexing. Each node corresponds to a named medical concept identified in the source text, such as a disease, drug, symptom, or biomarker.

Relationships: represent pairwise connections between entities, the edges that make the graph navigable. The LLM extracts these relationships directly from document text, encoding the clinical or biological link between two named entities.

Vector index: alongside the graph structure, LightRAG stores dense embeddings in a vector index. These embeddings sit next to the raw document chunks and are used by the semantic retrieval channel during hybrid search, so that both structured and unstructured knowledge are produced by the same indexing pass.
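The node and relationship model described above can be sketched as two small record types plus the Cypher statements that would upsert them. Everything named here (`Entity`, `Relationship`, `build_upserts`, the `Entity` label, the `RELATED` relationship type, and the `name`, `kind`, and `predicate` properties) is an assumption for illustration, not the schema LightRAG actually writes. The metformin triple in the usage example is illustrative data only.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # e.g. "disease", "drug", "symptom", "biomarker"


@dataclass(frozen=True)
class Relationship:
    source: str
    predicate: str
    target: str


# MERGE makes both statements idempotent: re-indexing a chunk does not duplicate data.
UPSERT_ENTITY = "MERGE (e:Entity {name: $name}) SET e.kind = $kind"
UPSERT_RELATIONSHIP = (
    "MATCH (s:Entity {name: $source}), (o:Entity {name: $target}) "
    "MERGE (s)-[r:RELATED {predicate: $predicate}]->(o)"
)

Statement = Tuple[str, Dict[str, str]]


def build_upserts(
    entities: Iterable[Entity], relationships: Iterable[Relationship]
) -> List[Statement]:
    """Return (query, params) pairs; entities come first so edges can match their endpoints."""
    statements: List[Statement] = [
        (UPSERT_ENTITY, {"name": e.name, "kind": e.kind}) for e in entities
    ]
    statements += [
        (
            UPSERT_RELATIONSHIP,
            {"source": r.source, "predicate": r.predicate, "target": r.target},
        )
        for r in relationships
    ]
    return statements


if __name__ == "__main__":
    for query, params in build_upserts(
        [Entity("metformin", "drug"), Entity("type 2 diabetes", "disease")],
        [Relationship("metformin", "TREATS", "type 2 diabetes")],
    ):
        print(query, params)
```

Each pair can be executed with the official Python driver's `session.run(query, params)`.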
If you are using a backend other than LightRAG (i.e., MiniRAG, PathRAG, or HyperGraphRAG), Neo4j is not required. Those backends persist their graphs locally and have no external graph database dependency.
