Session 23 of Going Meta, broadcast on December 6, 2023, builds on the foundational RAG pipeline introduced in Session 22 and moves into advanced retrieval strategies. Jesus Barrasa demonstrates two complete worked examples: a legislation assistant built over UK parliamentary data, and a Streamlit-powered art gallery assistant using the Tate collection dataset. Both showcase patterns such as parent-child chunking, multi-hop graph traversal, and context enrichment that go well beyond naive vector lookup.
What You’ll Learn
- How parent-child chunking improves retrieval precision and context quality
- How multi-hop graph traversal surfaces related knowledge not reachable by vector similarity alone
- How to build an end-to-end Streamlit chatbot using Neo4j as the knowledge backend
- How graph context enriches LLM prompts to produce more grounded answers
- The trade-offs between the legislation and art gallery retrieval patterns
Two Worked Examples
Legislation Assistant
Built on UK parliamentary legislation data loaded from a Neo4j database dump. Demonstrates chunk-level vector retrieval enriched with document-level graph context through parent-child relationships.
Art Gallery Assistant (Tate)
Built on the Tate collection dataset, structured as a knowledge graph. Demonstrates a Streamlit chatbot that uses multi-hop graph traversal to answer questions about artworks, artists, and movements.
Setup Instructions
Legislation Example
Load the legislation database
Restore the pre-built Neo4j database dump provided in the session repository:
Art Gallery Assistant
Build the Tate collection graph
Run the graph creation script to populate Neo4j with artworks, artists, movements, and their relationships from the Tate collection:
Advanced RAG Patterns
Parent-Child Chunking
In simple RAG, you store and retrieve individual text chunks. With parent-child chunking, child chunks (paragraphs or sentences) are stored for precise retrieval, but the parent document (full section or article) is passed to the LLM for fuller context.
Returning the parent node’s full text to the LLM, rather than just the matched chunk, reduces the chance of missing critical context that spans chunk boundaries.
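The retrieval side of this pattern can be sketched in a few lines of plain Python. This is a minimal in-memory illustration, not the session's code: the `Child` class, the `retrieve_parents` helper, the two-dimensional toy vectors, and the sample section texts are all assumptions made for the example. In the session the chunks and their parents live in Neo4j and the vectors come from an embedding model.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Child:
    text: str
    embedding: list[float]  # toy vector; a real pipeline would use an embedding model
    parent_id: str


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_parents(query_vec, children, parents, k=2):
    """Rank child chunks by similarity, then return the distinct parent texts."""
    ranked = sorted(children, key=lambda c: cosine(query_vec, c.embedding), reverse=True)
    seen, context = set(), []
    for child in ranked[:k]:
        if child.parent_id not in seen:  # several hits in one section share a parent
            seen.add(child.parent_id)
            context.append(parents[child.parent_id])
    return context


# Hypothetical sample data, invented for this sketch.
parents = {
    "s1": "Section 1. Full text on licensing, including exemptions and penalties.",
    "s2": "Section 2. Full text on appeals and time limits.",
}
children = [
    Child("licensing exemptions", [1.0, 0.0], "s1"),
    Child("licensing penalties", [0.9, 0.1], "s1"),
    Child("appeal time limits", [0.0, 1.0], "s2"),
]

context = retrieve_parents([1.0, 0.05], children, parents, k=2)
```

Here both top-ranked children belong to the same section, so the LLM would receive that section once rather than two overlapping fragments.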
Multi-Hop Graph Traversal
Instead of stopping at directly linked nodes, multi-hop traversal follows relationship chains to surface indirectly related knowledge. For the Tate collection, this means an answer about a painting can draw on the artist, their associated movement, contemporaries, and exhibition history, all connected through graph relationships.
Graph-Enriched Context
The graph structure around retrieved nodes is serialised into natural-language fragments and appended to the LLM prompt. This grounds the generated answer in verifiable facts from the knowledge graph rather than relying solely on the LLM’s parametric knowledge.
Key Architectural Decisions
Chunk Granularity
Smaller chunks improve retrieval precision; larger context windows improve answer quality. Parent-child chunking resolves this tension by decoupling retrieval granularity from generation context size.
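On the indexing side, decoupling the two granularities means each small chunk only needs to remember which parent it came from. The sketch below is one possible way to do that, assuming a simple sentence-based splitter; the function name `split_into_children`, the `max_chars` parameter, and the sample text are hypothetical and not taken from the session notebooks.

```python
import re


def split_into_children(parent_id: str, parent_text: str, max_chars: int = 80):
    """Pack whole sentences into child chunks, aiming for max_chars each.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", parent_text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the current chunk and start a new one
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Each child records its parent, so retrieval can hop back to the full section.
    return [{"parent_id": parent_id, "seq": i, "text": t} for i, t in enumerate(chunks)]


# Hypothetical sample text, invented for this sketch.
section = (
    "A licence is required. Exemptions apply to charities. "
    "Penalties are set by regulation."
)
fine = split_into_children("s1", section, max_chars=50)
coarse = split_into_children("s1", section, max_chars=60)
```

Changing `max_chars` alters only what is embedded and searched; the parent text handed to the LLM stays the same.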
Graph as Context Enricher
The knowledge graph is not just a retrieval index — it provides structured context (relationships, properties, connected entities) that complements raw text chunks.
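To make the enrichment step concrete, here is a minimal sketch that walks a small graph for a bounded number of hops and turns the edges it finds into sentences for the prompt. Everything in it is an assumption for illustration: the triples, the relationship names, and the `neighbourhood` and `serialise` helpers are invented and do not reflect the actual Tate graph model. Against Neo4j, the traversal would more likely be a variable-length Cypher pattern such as `(a)-[*1..2]-(n)` than a Python loop.

```python
from collections import deque

# Hypothetical (subject, relationship, object) triples; not taken from the Tate data.
TRIPLES = [
    ("Artwork A", "CREATED_BY", "Artist X"),
    ("Artist X", "MEMBER_OF", "Movement M"),
    ("Artist Y", "MEMBER_OF", "Movement M"),
    ("Artwork B", "CREATED_BY", "Artist Y"),
]


def neighbourhood(start, triples, max_hops=2):
    """Breadth-first walk, ignoring edge direction, collecting edges within max_hops."""
    depth = {start: 0}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # do not expand past the hop limit
        for triple in triples:
            subj, _, obj = triple
            if node not in (subj, obj):
                continue
            if triple not in found:
                found.append(triple)
            other = obj if node == subj else subj
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return found


def serialise(triples):
    """Render each edge as a short natural-language fragment."""
    return [f"{s} {rel.replace('_', ' ').lower()} {o}." for s, rel, o in triples]


facts = serialise(neighbourhood("Artwork A", TRIPLES, max_hops=2))
prompt = (
    "Answer using only these facts:\n"
    + "\n".join(facts)
    + "\n\nQuestion: Which movement is Artwork A associated with?"
)
```

With `max_hops=1` only the artist would be included; the second hop is what brings the movement into the prompt.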
Streamlit for Rapid Prototyping
The Streamlit art gallery assistant shows how quickly an interactive RAG application can be built when the retrieval and generation layers are cleanly separated.
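One way to get that separation is to keep retrieval and generation behind plain function interfaces, so the UI layer only ever calls a single entry point. The sketch below assumes this design; the `answer` and `build_prompt` functions and the stand-in retriever and generator are hypothetical and are not the session's Streamlit app.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]  # question -> context fragments
Generator = Callable[[str], str]        # prompt -> answer text


def build_prompt(question: str, context: list[str]) -> str:
    """Combine retrieved fragments and the question into one prompt string."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


def answer(question: str, retrieve: Retriever, generate: Generator) -> str:
    """Single entry point for the UI: retrieve, assemble the prompt, generate."""
    return generate(build_prompt(question, retrieve(question)))


# Stand-ins so the sketch runs without a database or an LLM.
def fake_retrieve(question: str) -> list[str]:
    return ["Artwork A created by Artist X."]


def echo_generate(prompt: str) -> str:
    return prompt  # a real generator would call an LLM here


reply = answer("Who made Artwork A?", fake_retrieve, echo_generate)
```

A Streamlit script could then pass the text from `st.chat_input` to `answer` and render the result inside `st.chat_message`, with the Neo4j-backed retriever and the LLM call swapped in for the stand-ins.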
Domain-Specific Datasets
Both the UK legislation data and the Tate collection are rich, structured datasets that demonstrate RAG patterns in realistic, non-trivial settings.
Resources
Watch the Recording
Full session recording on YouTube — December 6, 2023.
Session Code
Notebooks, Cypher scripts, and Streamlit app on GitHub.