LightRAG builds a dual-level knowledge graph (a local graph of fine-grained entity mentions and a global graph of high-level concept clusters) and, in hybrid mode, retrieves from both levels at once. It is the most comprehensive backend in this system and the only one that requires an external Neo4j instance for graph storage.

Indexing

During ingestion, LightRAG processes each document through an LLM-driven entity and relationship extraction pipeline before storing the resulting graph in Neo4j alongside dense embeddings in a vector index.

1. Chunking

Source documents are split into overlapping chunks (400 tokens, 50-token overlap) to ensure that entity mentions near chunk boundaries are not lost.
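
The windowing described above can be sketched as follows. This is a minimal illustration, not LightRAG's own chunker: the function name `chunk_tokens` is hypothetical, and a list of placeholder strings stands in for the output of a real tokenizer.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 400, overlap: int = 50) -> List[List[str]]:
    """Split a token list into windows of `chunk_size` that share `overlap` tokens."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap  # with the defaults, each window starts 350 tokens later
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Placeholder tokens; a real pipeline would use the model's tokenizer (assumption).
tokens = [f"tok{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
```

Because consecutive windows share 50 tokens, an entity mention that straddles the end of one chunk appears whole at the start of the next.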

2. Entity & relationship extraction

Each chunk is passed to an LLM that identifies named entities and extracts pairwise relationships between them. This step produces the raw triples that populate the knowledge graph.

3. Graph storage (Neo4j)

Extracted entities and relationships are written to Neo4j, forming both the local (entity-mention) and global (concept-cluster) layers of the dual-level KG.
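
As a rough sketch of what writing one extracted triple to Neo4j could look like, the snippet below builds a parameterized Cypher `MERGE` statement. The schema is an assumption made for illustration only: the `Entity` label, the `RELATED` relationship type, the `description` property, and the helper `triple_to_cypher` are hypothetical and are not taken from LightRAG's actual storage layer.

```python
from typing import Dict, Tuple

# Hypothetical schema: every node is labelled `Entity`, every edge is typed
# `RELATED`, and the relation text is stored as a property on the edge.
MERGE_TRIPLE = (
    "MERGE (s:Entity {name: $source}) "
    "MERGE (t:Entity {name: $target}) "
    "MERGE (s)-[r:RELATED {description: $description}]->(t)"
)


def triple_to_cypher(source: str, description: str, target: str) -> Tuple[str, Dict[str, str]]:
    """Return a parameterized Cypher statement and its parameters for one triple."""
    params = {"source": source, "target": target, "description": description}
    return MERGE_TRIPLE, params


query, params = triple_to_cypher("metformin", "treats", "type 2 diabetes")
```

Using `MERGE` rather than `CREATE` keeps ingestion idempotent: re-inserting the same triple matches the existing nodes and edge instead of duplicating them. With the official Neo4j Python driver, the pair could be passed to `session.run(query, params)`.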

4. Embedding & vector index

Dense embeddings of entities, relationships, and raw document chunks are stored in a vector index alongside the Neo4j graph, enabling hybrid retrieval in the query phase.
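
To make the role of the vector index concrete, here is a toy in-memory version that ranks stored items by cosine similarity. It is a sketch under stated assumptions: the three-dimensional vectors, the key naming (`chunk:`, `entity:`, `relationship:`), and the `top_k` helper are invented for illustration, and a real deployment would use an embedding model and a dedicated vector store.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: List[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k stored items most similar to the query vector."""
    scored = [(key, cosine(query, vec)) for key, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


# Toy index holding one chunk, one entity, and one relationship embedding.
index = {
    "chunk:0": [1.0, 0.0, 0.0],
    "entity:metformin": [0.9, 0.1, 0.0],
    "relationship:metformin->diabetes": [0.0, 1.0, 0.0],
}
hits = top_k([1.0, 0.0, 0.0], index, k=2)
```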

5. Parallel async ingestion

Parallel async inserts and async LLM calls accelerate large-scale ingestion, keeping wall-clock time manageable even for large medical corpora.
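
The concurrency pattern can be sketched with `asyncio`. This is an illustration of bounded parallelism in general, not LightRAG's ingestion code: `extract` is a hypothetical stand-in for an async LLM call, and the concurrency limit of 4 is an arbitrary assumption.

```python
import asyncio
from typing import List


async def extract(chunk: str, limit: asyncio.Semaphore) -> str:
    """Hypothetical placeholder for an async LLM extraction call."""
    async with limit:  # at most `max_concurrency` calls are in flight at once
        await asyncio.sleep(0.01)  # simulates network latency
        return chunk.upper()


async def ingest(chunks: List[str], max_concurrency: int = 4) -> List[str]:
    """Process all chunks concurrently, bounded by a semaphore."""
    limit = asyncio.Semaphore(max_concurrency)
    # gather returns results in the same order as the inputs
    return await asyncio.gather(*(extract(c, limit) for c in chunks))


results = asyncio.run(ingest(["chunk a", "chunk b", "chunk c"]))
```

The semaphore keeps the number of simultaneous requests under whatever rate limit the LLM provider imposes, while still overlapping the waiting time of independent calls.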

Retrieval (Hybrid Mode)

LightRAG’s hybrid retrieval combines keyword-based knowledge graph traversal with dense vector search over document chunks, surfacing both relational and semantic evidence for every query. The retrieval returns a structured JSON context with four sections:

| Section | Content |
| --- | --- |
| Knowledge Graph Data (Entity) | Entities matched by KG traversal |
| Knowledge Graph Data (Relationship) | Relationships between matched entities |
| Document Chunks | Raw text chunks surfaced by dense vector search |
| Reference Document List | Source document references for the returned chunks |

The system’s context_filter splits this JSON response across the two retrieval channels:
  • Semantic channel receives Document Chunks + Reference Document List
  • Relational channel receives Knowledge Graph Data (Entity) + Knowledge Graph Data (Relationship)
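
The split above can be sketched as a small function. The real `context_filter` signature is not shown on this page, so `split_context` is a hypothetical name, and using the four section titles as JSON keys is an assumption about the response shape.

```python
from typing import Any, Dict, Tuple

# Assumed key names: the four section titles listed for the hybrid response.
SEMANTIC_KEYS = ("Document Chunks", "Reference Document List")
RELATIONAL_KEYS = ("Knowledge Graph Data (Entity)", "Knowledge Graph Data (Relationship)")


def split_context(context: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Route each section of the retrieval response to its channel."""
    semantic = {k: context[k] for k in SEMANTIC_KEYS if k in context}
    relational = {k: context[k] for k in RELATIONAL_KEYS if k in context}
    return semantic, relational


# Illustrative response with made-up values.
response = {
    "Knowledge Graph Data (Entity)": [{"entity": "metformin"}],
    "Knowledge Graph Data (Relationship)": [{"source": "metformin", "target": "type 2 diabetes"}],
    "Document Chunks": [{"content": "Metformin is a first-line therapy."}],
    "Reference Document List": [{"reference_id": "1"}],
}
semantic, relational = split_context(response)
```

Sections missing from a response are skipped rather than raising, so a query that matches no graph entities still yields a usable semantic channel.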

Storage Requirements

LightRAG requires two storage backends:

| Storage | Purpose |
| --- | --- |
| Neo4j | Stores the entity/relationship knowledge graph |
| Vector index | Stores dense embeddings for semantic retrieval over document chunks |

LightRAG is the only backend in this system that requires Neo4j. MiniRAG, PathRAG, and HyperGraphRAG all store their graphs locally in the working directory and have no external graph-database dependency.
