
Overview

The AgenticRetriever class selects document sections strategically, using the output of query decomposition. It combines intent-based targeting, entity matching, and query term scoring to rank sections, and returns the 3-5 most relevant ones (or the single top-scoring section when none meets the relevance threshold).

Class Definition

class AgenticRetriever:
    INTENT_SECTION_MAP = {
        "penalty": ["Late Payment Penalties", "Payment Terms"],
        "payment_terms": ["Payment Terms", "Late Payment Penalties"],
        "intellectual_property": ["Intellectual Property Rights"],
        "indemnification": ["Indemnification"],
        "termination": ["Termination for Convenience"],
        "confidentiality": ["Confidentiality"],
        "scope_of_services": ["Scope of Services"],
    }
    
    def __init__(self, doc_store: MockDocumentStore)

Parameters

doc_store
MockDocumentStore
required
Document store instance for accessing document sections

Methods

retrieve

Retrieves the most relevant document sections based on query decomposition.
async def retrieve(self, query: str, decomposition: Dict) -> List[Dict]
query
str
required
The original user query
decomposition
Dict
required
Query decomposition from QueryDecomposer.decompose()
intent
str
Detected intent category
entities
List[str]
Extracted entities
constraints
Dict
Extracted constraints
sections
List[Dict]
Return value. List of the 3-5 most relevant sections (a single section in the fallback case), each containing:
title
str
Section title from the document
content
str
Section content text
page_num
int
Page number where the section appears
_relevance_score
float
Computed relevance score (higher is better)
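
The input and output shapes listed above can be written down as typed dictionaries. This is only a sketch for readers who want static checking: the names Decomposition and RetrievedSection are hypothetical (the documented signature uses plain Dict), and only the fields documented above are included.

```python
from typing import Dict, List, TypedDict


class Decomposition(TypedDict):
    """Hypothetical type for the decomposition argument (documented fields only)."""
    intent: str
    entities: List[str]
    constraints: Dict


class RetrievedSection(TypedDict):
    """Hypothetical type for one element of the returned list."""
    title: str
    content: str
    page_num: int
    _relevance_score: float


example: RetrievedSection = {
    "title": "Payment Terms",
    "content": "Invoices are due within 30 days.",
    "page_num": 2,
    "_relevance_score": 5.0,
}
```

TypedDict adds no runtime validation; the values remain ordinary dicts, so they should be usable wherever a Dict is expected.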

Example

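# Run inside an async function (or a notebook that supports top-level await).
# doc_store is an existing MockDocumentStore instance.
retriever = AgenticRetriever(doc_store)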

query = "What is the late payment penalty?"
decomposition = {
    "intent": "penalty",
    "entities": ["late", "payment", "penalty"],
    "constraints": {},
    "temporals": []
}

sections = await retriever.retrieve(query, decomposition)

for section in sections:
    print(f"{section['title']} (page {section['page_num']})")
    print(f"Relevance: {section['_relevance_score']}")
    print(section['content'][:200])
    print()

Intent-Section Mapping

The retriever uses a predefined mapping to target specific sections based on the detected intent. The first section in each list is the primary target and receives a +7.0 score boost; any remaining sections are secondary targets and receive +5.0.
INTENT_SECTION_MAP = {
    "penalty": ["Late Payment Penalties", "Payment Terms"],
    "payment_terms": ["Payment Terms", "Late Payment Penalties"],
    "intellectual_property": ["Intellectual Property Rights"],
    "indemnification": ["Indemnification"],
    "termination": ["Termination for Convenience"],
    "confidentiality": ["Confidentiality"],
    "scope_of_services": ["Scope of Services"],
}
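
A minimal sketch of how the boost could be looked up from this mapping. The helper name intent_boost is hypothetical, and exact title equality is an assumption; the page does not say how titles are compared.

```python
from typing import Dict, List

# Copied from the class definition.
INTENT_SECTION_MAP: Dict[str, List[str]] = {
    "penalty": ["Late Payment Penalties", "Payment Terms"],
    "payment_terms": ["Payment Terms", "Late Payment Penalties"],
    "intellectual_property": ["Intellectual Property Rights"],
    "indemnification": ["Indemnification"],
    "termination": ["Termination for Convenience"],
    "confidentiality": ["Confidentiality"],
    "scope_of_services": ["Scope of Services"],
}

PRIMARY_BOOST = 7.0    # 5.0 base + 2.0 primary bonus
SECONDARY_BOOST = 5.0


def intent_boost(intent: str, section_title: str) -> float:
    """Hypothetical helper: boost for a section title under a given intent.

    Assumes exact title equality, which may not match the real comparison.
    """
    targets = INTENT_SECTION_MAP.get(intent, [])
    if section_title not in targets:
        return 0.0  # unknown intent or untargeted section
    return PRIMARY_BOOST if section_title == targets[0] else SECONDARY_BOOST
```

Because "penalty" and "payment_terms" list the same two sections in opposite order, the same section is primary under one intent and secondary under the other.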

Relevance Scoring Algorithm

_score_section

Computes a multi-factor relevance score for each section.
def _score_section(self, section: Dict, query: str, decomposition: Dict) -> float
Scoring Weights:
  1. Intent-Based Scoring (Highest Priority)
    • Primary intent match: +7.0 (5.0 + 2.0 bonus)
    • Secondary intent match: +5.0
  2. Entity Matching
    • Entity in content: +1.0 per entity
    • Entity in title: +1.5 per entity
  3. Query Term Matching
    • Term in content (>3 chars): +0.5 per term
  4. Text Search Boost
    • Applied from section["_text_boost"] if present
Example Scoring:
# Query: "What is the late payment penalty?"
# Intent: "penalty"
# Entities: ["late", "payment", "penalty"]

# Section: "Late Payment Penalties"
score = 7.0  # Primary intent match
score += 1.5 * 3  # 3 entities in title
score += 0.5 * 4  # 4 query terms in content
# Total: ~13.5
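
The weights above can be sketched as a function that reports each factor separately, which helps when checking why a section ranked where it did. This is an illustration under stated assumptions, not the library's code: score_breakdown and its intent_targets parameter are hypothetical, and matching is assumed to be case-insensitive substring matching.

```python
import re
from typing import Dict, List


def score_breakdown(section: Dict, query: str, decomposition: Dict,
                    intent_targets: List[str]) -> Dict[str, float]:
    """Hypothetical re-creation of the documented weights, one entry per factor.

    intent_targets is the INTENT_SECTION_MAP list for the detected intent.
    """
    title = section["title"].lower()
    content = section["content"].lower()

    # 1. Intent: the first target is primary (+7.0), the rest secondary (+5.0).
    intent = 0.0
    if section["title"] in intent_targets:
        intent = 7.0 if section["title"] == intent_targets[0] else 5.0

    # 2. Entities: +1.0 per entity found in content, +1.5 per entity in title.
    entities = [e.lower() for e in decomposition.get("entities", [])]
    entity = sum(1.0 for e in entities if e in content)
    entity += sum(1.5 for e in entities if e in title)

    # 3. Query terms longer than 3 characters: +0.5 per term found in content.
    terms = {t for t in re.findall(r"\w+", query.lower()) if len(t) > 3}
    term = sum(0.5 for t in terms if t in content)

    # 4. Optional boost carried over from the text search step.
    boost = float(section.get("_text_boost", 0.0))

    parts = {"intent": intent, "entity": entity, "term": term, "text_boost": boost}
    parts["total"] = sum(parts.values())
    return parts
```

Note that under plain substring matching the entity "penalty" does not occur in the title "Late Payment Penalties", so this sketch can produce totals that differ from the approximate figure in the worked example; the real result depends on how the retriever normalizes words.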

Private Helper Methods

_filter_irrelevant

Removes sections below the relevance threshold.
def _filter_irrelevant(self, sections: List[Dict], threshold: float = 1.0) -> List[Dict]
sections
List[Dict]
required
Sections with _relevance_score field
threshold
float
default:"1.0"
Minimum relevance score to include. The signature default is 1.0, but retrieve() passes 2.0.
Threshold in retrieve(): 2.0. Sections must score at least 2.0 to be considered relevant. If no section meets this threshold, the top-scoring section is returned as a fallback.
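
A minimal sketch of the threshold filter together with the fallback described above. filter_irrelevant mirrors the documented signature as a plain function; select_sections is a hypothetical wrapper, and treating a missing _relevance_score as 0.0 is an assumption.

```python
from typing import Dict, List


def filter_irrelevant(sections: List[Dict], threshold: float = 1.0) -> List[Dict]:
    """Keep sections whose _relevance_score is at least threshold."""
    return [s for s in sections if s.get("_relevance_score", 0.0) >= threshold]


def select_sections(scored: List[Dict], threshold: float = 2.0,
                    limit: int = 5) -> List[Dict]:
    """Hypothetical wrapper: rank, filter, and fall back to the best section."""
    ranked = sorted(scored, key=lambda s: s.get("_relevance_score", 0.0),
                    reverse=True)
    relevant = filter_irrelevant(ranked, threshold)
    if relevant:
        return relevant[:limit]
    # Fallback: the top-scoring section, or an empty list if nothing was scored.
    return ranked[:1]
```

The fallback means a caller receives at least one section whenever the document store returned anything, even if that section scored poorly.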

Retrieval Strategy

  1. Full-text search: Query the document store
  2. Fallback: If no results, retrieve all sections from the sample contract
  3. Scoring: Apply multi-factor scoring to all sections
  4. Filtering: Keep only sections with score ≥ 2.0
  5. Selection: Return top 5 relevant sections (or top 1 if none meet threshold)
  6. Logging: Log retrieval results with intent and section count
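
The six steps above can be sketched end to end as follows. Everything here is an assumption made for illustration: StubStore stands in for MockDocumentStore, its search and get_all_sections methods are invented names, and the scoring function is passed in rather than taken from the class.

```python
import asyncio
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("agentic_retriever_sketch")


class StubStore:
    """Hypothetical stand-in for MockDocumentStore; method names are assumptions."""

    def __init__(self, sections: List[Dict]):
        self._sections = sections

    async def search(self, query: str) -> List[Dict]:
        words = query.lower().split()
        return [s for s in self._sections
                if any(w in s["content"].lower() for w in words)]

    async def get_all_sections(self) -> List[Dict]:
        return list(self._sections)


async def retrieve_sketch(store: StubStore, query: str, decomposition: Dict,
                          score: Callable[[Dict, str, Dict], float],
                          threshold: float = 2.0, limit: int = 5) -> List[Dict]:
    candidates = await store.search(query)             # 1. full-text search
    if not candidates:
        candidates = await store.get_all_sections()    # 2. fallback to all sections
    scored = [{**s, "_relevance_score": score(s, query, decomposition)}
              for s in candidates]                     # 3. scoring
    scored.sort(key=lambda s: s["_relevance_score"], reverse=True)
    relevant = [s for s in scored
                if s["_relevance_score"] >= threshold]  # 4. filtering
    result = relevant[:limit] if relevant else scored[:1]  # 5. selection
    logger.info("intent=%s sections=%d",
                decomposition.get("intent"), len(result))  # 6. logging
    return result


def count_entities(section: Dict, query: str, decomposition: Dict) -> float:
    """Toy scorer for the demo: one point per entity found in the content."""
    text = section["content"].lower()
    return float(sum(e in text for e in decomposition.get("entities", [])))


if __name__ == "__main__":
    store = StubStore([
        {"title": "Payment Terms", "content": "Invoices are due in 30 days.", "page_num": 2},
        {"title": "Late Payment Penalties", "content": "Late payment incurs a penalty.", "page_num": 3},
    ])
    decomposition = {"intent": "penalty", "entities": ["late", "payment", "penalty"]}
    found = asyncio.run(retrieve_sketch(store, "late penalty", decomposition, count_entities))
    print([s["title"] for s in found])
```

Passing the scorer as a parameter keeps the sketch independent of the real _score_section; in the actual class the scoring is a private method.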

Usage Example

import asyncio

from components import QueryDecomposer, AgenticRetriever
from mock_data import MockDocumentStore


async def main() -> None:
    # Initialize components
    decomposer = QueryDecomposer()
    doc_store = MockDocumentStore()
    retriever = AgenticRetriever(doc_store)

    # Process query
    query = "What intellectual property rights does the client retain?"
    decomposition = await decomposer.decompose(query)

    # Retrieve sections
    sections = await retriever.retrieve(query, decomposition)

    print(f"Found {len(sections)} relevant sections")
    for section in sections:
        print(f"\n{section['title']} (Score: {section['_relevance_score']:.1f})")
        print(f"Page {section['page_num']}")


# await is only valid inside a coroutine, so run main() on an event loop
asyncio.run(main())

Performance Characteristics

  • Target sections: 3-5 per query
  • Minimum threshold: 2.0 relevance score
  • Fallback behavior: Returns 1 section if none meet threshold
  • Primary intent boost: 7.0 points
  • Entity title match: 1.5 points each

Integration

The retriever sits between the QueryDecomposer and the response generation pipeline:
# Full pipeline
decomposition = await decomposer.decompose(query)
sections = await retriever.retrieve(query, decomposition)  # <- AgenticRetriever
response = generator.generate(sections)
verdict = await judge.evaluate(response, sections)
