Session 23 of Going Meta, broadcast on December 6, 2023, builds on the foundational RAG pipeline introduced in Session 22 and covers advanced retrieval strategies. Jesus Barrasa demonstrates two complete worked examples: a legislation assistant built over UK parliamentary data, and a Streamlit-powered art gallery assistant using the Tate collection dataset. Both showcase patterns such as parent-child chunking, multi-hop graph traversal, and context enrichment that go beyond naive vector lookup.

What You’ll Learn

  • How parent-child chunking improves retrieval precision and context quality
  • How multi-hop graph traversal surfaces related knowledge not reachable by vector similarity alone
  • How to build an end-to-end Streamlit chatbot using Neo4j as the knowledge backend
  • How graph context enriches LLM prompts to produce more grounded answers
  • The trade-offs between the legislation and art gallery retrieval patterns

Two Worked Examples

Legislation Assistant

Built on UK parliamentary legislation data loaded from a Neo4j database dump. Demonstrates chunk-level vector retrieval enriched with document-level graph context through parent-child relationships.

Art Gallery Assistant (Tate)

Built on the Tate collection dataset, structured as a knowledge graph. Demonstrates a Streamlit chatbot that uses multi-hop graph traversal to answer questions about artworks, artists, and movements.

Setup Instructions

Legislation Example

1. Load the legislation database

Restore the pre-built Neo4j database dump provided in the session repository:

./neo4j-admin database load --from-path=/path/to/file/ legislation

2. Run the legislation notebook

Open and execute GM23_legislation_example.ipynb in the session folder. The notebook implements the RAG pipeline over the loaded legislation graph.

Art Gallery Example

1. Build the Tate collection graph

Run the graph creation script to populate Neo4j with artworks, artists, movements, and their relationships from the Tate collection:

// Run the contents of art-graph-creation.cypher in Neo4j Browser
// to populate the Tate collection knowledge graph.

2. Run the art gallery assistant notebook

Open and execute GM23_ArtGalleryAssistant.ipynb. This notebook builds the Streamlit interface and RAG chain that power the art gallery assistant.

Advanced RAG Patterns

Parent-Child Chunking

In simple RAG, you store and retrieve individual text chunks. With parent-child chunking, child chunks (paragraphs or sentences) are stored for precise retrieval, but the parent document (full section or article) is passed to the LLM for fuller context.
[ Parent Document ]
    ├── [ Child Chunk 1 ]  ← vector index retrieval targets here
    ├── [ Child Chunk 2 ]
    └── [ Child Chunk 3 ]  ← matched chunk → return Parent for LLM context
Returning the parent node’s full text to the LLM, rather than just the matched chunk, reduces the chance of missing critical context that spans chunk boundaries.
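The retrieve-child, return-parent idea can be sketched in a few lines of plain Python. This is a minimal illustration rather than the session's notebook code: the sample sections, the `embed` stand-in (a bag-of-words count instead of a real embedding model), and the `retrieve_parent` helper are all assumptions made for the example, and an in-memory list takes the place of a Neo4j vector index.

```python
import math
import re
from collections import Counter

# Hypothetical toy data: each parent section is split into child chunks.
PARENTS = {
    "s1": "A licence is required to operate a vehicle. Licences expire after ten years.",
    "s2": "Appeals must be lodged within 28 days. The tribunal may extend this period.",
}
CHILDREN = [
    {"parent": "s1", "text": "A licence is required to operate a vehicle."},
    {"parent": "s1", "text": "Licences expire after ten years."},
    {"parent": "s2", "text": "Appeals must be lodged within 28 days."},
    {"parent": "s2", "text": "The tribunal may extend this period."},
]

def embed(text):
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve_parent(question):
    """Match on the small child chunk, then hand back the whole parent."""
    q = embed(question)
    best = max(CHILDREN, key=lambda c: cosine(q, embed(c["text"])))
    return best["text"], PARENTS[best["parent"]]

chunk, parent = retrieve_parent("When do appeals have to be lodged?")
print("matched chunk:", chunk)
print("context sent to the LLM:", parent)
```

The matched chunk mentions only the deadline, while the parent text also carries the sentence about extensions, which is the kind of adjacent context a chunk-only lookup would drop.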

Multi-Hop Graph Traversal

Instead of stopping at directly linked nodes, multi-hop traversal follows relationship chains to surface indirectly related knowledge. For the Tate collection, this means an answer about a painting can draw on the artist, their associated movement, contemporaries, and exhibition history — all connected through graph relationships.
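As a rough sketch of what following relationship chains means, the snippet below runs a breadth-first traversal over a tiny in-memory edge list. The node names and relationship types (`CREATED_BY`, `MEMBER_OF`) are placeholders invented for illustration and are not taken from the session's Tate graph; in Neo4j the same idea would typically be expressed as a variable-length pattern in Cypher.

```python
from collections import deque

# Hypothetical mini-graph; names and relationship types are illustrative only.
EDGES = [
    ("Artwork A", "CREATED_BY", "Artist X"),
    ("Artist X", "MEMBER_OF", "Movement M"),
    ("Artist Y", "MEMBER_OF", "Movement M"),
    ("Artwork B", "CREATED_BY", "Artist Y"),
]

def neighbours(node):
    """Yield (relationship, other_node) pairs, ignoring edge direction."""
    for source, rel, target in EDGES:
        if source == node:
            yield rel, target
        elif target == node:
            yield rel, source

def traverse(start, max_hops):
    """Breadth-first search returning {node: hop distance} within max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, other in neighbours(node):
            if other not in seen:
                seen[other] = seen[node] + 1
                queue.append(other)
    return seen

print(traverse("Artwork A", 1))  # direct links only
print(traverse("Artwork A", 3))  # also reaches the movement and a fellow member
```

With one hop the search stops at the artist; with three hops it also reaches the movement and another artist in it, which vector similarity on the artwork's text alone would not necessarily surface.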

Graph-Enriched Context

The graph structure around retrieved nodes is serialised into natural-language fragments and appended to the LLM prompt. This grounds the generated answer in verifiable facts from the knowledge graph rather than relying solely on the LLM’s parametric knowledge.
Graph-enriched context is especially powerful for knowledge domains where precision matters — legislation, medicine, compliance — because the graph captures structured relationships that embeddings may not reliably encode.
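One possible way to serialise graph structure into prompt text is sketched below. The triple format, the `serialise` and `build_prompt` helpers, and the prompt wording are assumptions made for this example; the session's notebooks may assemble the prompt differently.

```python
def serialise(triples):
    """Turn (subject, RELATIONSHIP, object) triples into short sentences."""
    return [
        f"{subj} {rel.replace('_', ' ').lower()} {obj}."
        for subj, rel, obj in triples
    ]

def build_prompt(question, chunks, triples):
    """Append serialised graph facts to the retrieved text passages."""
    passages = "\n\n".join(chunks)
    facts = "\n".join(f"- {line}" for line in serialise(triples))
    return (
        "Answer using only the context below.\n\n"
        f"Passages:\n{passages}\n\n"
        f"Graph facts:\n{facts}\n\n"
        f"Question: {question}"
    )

# Hypothetical inputs standing in for retrieved chunks and their graph neighbourhood.
prompt = build_prompt(
    question="Which act does Section 2 belong to?",
    chunks=["Appeals must be lodged within 28 days."],
    triples=[("Section 2", "PART_OF", "Act A"), ("Act A", "AMENDED_BY", "Act B")],
)
print(prompt)
```

Keeping the graph facts in their own labelled block makes it easier to check a generated answer against the specific relationships it was given.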

Key Architectural Decisions

Chunk Granularity

Smaller chunks improve retrieval precision; larger context windows improve answer quality. Parent-child chunking resolves this tension by decoupling retrieval granularity from generation context size.

Graph as Context Enricher

The knowledge graph is not just a retrieval index — it provides structured context (relationships, properties, connected entities) that complements raw text chunks.

Streamlit for Rapid Prototyping

The Streamlit art gallery assistant shows how quickly an interactive RAG application can be built when the retrieval and generation layers are cleanly separated.
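The separation of layers can be pictured as a small glue function that takes the retriever and the generator as plain callables. This is a hedged sketch, not the session's app: `answer`, `fake_retrieve`, and `fake_generate` are hypothetical names, and the stubs stand in for a Neo4j-backed retriever and an LLM call. A UI layer such as Streamlit would then only need to call `answer` with the user's input.

```python
from typing import Callable, List

def answer(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Glue layer: fetch context, build a prompt, delegate to the model."""
    context = "\n".join(retrieve(question))
    prompt = f"Context:\n{context}\n\nQuestion: {question}"
    return generate(prompt)

# Stubs standing in for a graph retriever and an LLM call (assumptions).
def fake_retrieve(question: str) -> List[str]:
    return ["Artwork A was created by Artist X."]

def fake_generate(prompt: str) -> str:
    return f"(model output for a prompt of {len(prompt)} characters)"

print(answer("Who made Artwork A?", fake_retrieve, fake_generate))
```

Because neither layer knows about the other, the retriever can be swapped (for example, from chunk-only lookup to a multi-hop traversal) without touching the interface code.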

Domain-Specific Datasets

Both the UK legislation data and the Tate collection are rich, structured datasets that demonstrate RAG patterns in realistic, non-trivial settings.

Resources

Watch the Recording

Full session recording on YouTube — December 6, 2023.

Session Code

Notebooks, Cypher scripts, and Streamlit app on GitHub.
