What is RAG?
Retrieval-Augmented Generation combines three steps:
- Retrieval: Search for relevant information in your knowledge base
- Augmentation: Add retrieved context to the user’s query
- Generation: Generate responses using both the query and retrieved context
The result is responses that are:
- Grounded in your specific data
- More accurate and factual
- Verifiable with source citations
RAG helps reduce hallucinations by providing the model with relevant facts before generating responses.
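The retrieve-augment-generate flow above can be sketched in a few lines. Everything here is a placeholder for illustration: the keyword "retrieval" stands in for real vector search, and the knowledge base is a plain dict, not the actual Bedrock Chat implementation.

```python
# Minimal sketch of the RAG flow. The retriever and knowledge base are
# stand-ins (keyword match over a dict), not the real vector search.

def retrieve(query: str, knowledge_base: dict, top_k: int = 2) -> list:
    """Naive keyword scoring standing in for semantic vector search."""
    scored = [
        (sum(word in text.lower() for word in query.lower().split()), text)
        for text in knowledge_base.values()
    ]
    scored.sort(reverse=True, key=lambda pair: pair[0])
    return [text for score, text in scored[:top_k] if score > 0]

def augment(query: str, chunks: list) -> str:
    """Prepend retrieved context to the user's query."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

kb = {
    "doc1": "Bedrock Chat supports PDF, TXT, and CSV uploads.",
    "doc2": "Multi-tenant knowledge bases share a single vector store.",
}
prompt = augment("What uploads are supported?", retrieve("supported uploads", kb))
# The augmented prompt is then sent to the model for generation.
```

In production the retrieval step is a vector similarity search over embeddings, as described in the architecture section below.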
Supported Knowledge Sources
Bedrock Chat supports multiple knowledge source types:
File Uploads
PDF, TXT, MD, CSV, XLSX, DOCX, and more. Files are automatically parsed and embedded.
Web URLs
Individual web pages are crawled, parsed, and indexed for retrieval.
Sitemaps
Provide a sitemap URL to automatically index all pages in a website.
S3 URLs
Reference files stored in S3 buckets (requires appropriate IAM permissions).
Knowledge Base Architecture
Knowledge bases are powered by Amazon Bedrock Knowledge Bases with OpenSearch Serverless.
Components
- Amazon Bedrock Knowledge Bases: Managed RAG service
- OpenSearch Serverless: Vector database for semantic search
- Step Functions: Orchestrates document ingestion
- Amazon Titan Embeddings: Converts text to vectors
Knowledge Base Types
Bedrock Chat offers two deployment models:
Dedicated Knowledge Base
Each bot gets its own Knowledge Base:
- Isolated data per bot
- Dedicated resources
- Higher quota consumption (default limit: 100 Knowledge Bases per account)
Multi-Tenant Knowledge Base (Recommended)
Multiple bots share a common Knowledge Base with data isolation:
- Single Knowledge Base across multiple bots
- Data filtered by Bot ID metadata
- Significantly reduced account quota consumption
- Default for new bots
Multi-tenant mode is the default for new bots. To migrate existing bots, change the bot’s knowledge settings to “Create a tenant in a shared Knowledge Base.”
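Per-bot data isolation in a shared Knowledge Base can be sketched as a metadata filter applied at retrieval time. The filter shape below follows the Bedrock `Retrieve` API's `retrievalConfiguration`; the metadata key name `bot_id` is an assumption for illustration, not confirmed from the source.

```python
# Hedged sketch: enforcing per-bot isolation with a metadata filter at
# query time. The "bot_id" metadata key is an assumption.

def build_retrieval_config(bot_id: str, top_k: int = 5) -> dict:
    """Vector search config that only returns chunks tagged with this bot."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": {"equals": {"key": "bot_id", "value": bot_id}},
        }
    }

# Usage against the shared Knowledge Base (IDs are placeholders):
#   client = boto3.client("bedrock-agent-runtime")
#   client.retrieve(
#       knowledgeBaseId="SHARED_KB_ID",
#       retrievalQuery={"text": "refund policy"},
#       retrievalConfiguration=build_retrieval_config("bot-123"),
#   )
```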
Bulk Migration to Multi-Tenant
To migrate multiple bots to multi-tenant mode:
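A bulk migration could look like the loop below. Both `list_bots` and `update_bot_knowledge` are hypothetical placeholders for the project's real admin API; check the actual migration tooling before relying on this shape.

```python
# Hypothetical bulk-migration sketch: switch every bot that still uses a
# dedicated Knowledge Base over to the shared (multi-tenant) one.
# list_bots / update_bot_knowledge are placeholders, not real API calls.

def migrate_all(list_bots, update_bot_knowledge) -> list:
    migrated = []
    for bot in list_bots():
        if bot.get("kb_mode") != "multi_tenant":
            update_bot_knowledge(bot["id"], kb_mode="multi_tenant")
            migrated.append(bot["id"])
    return migrated

# Dry run against canned data:
bots = [{"id": "a", "kb_mode": "dedicated"}, {"id": "b", "kb_mode": "multi_tenant"}]
calls = []
result = migrate_all(lambda: bots, lambda bot_id, kb_mode: calls.append(bot_id))
# Only bot "a" needs migration.
```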
Adding Knowledge to Bots
Via UI
- Create or edit a bot
- Navigate to the Knowledge section
- Add your knowledge sources:
- Upload files directly
- Enter web URLs
- Provide sitemap URLs
- Reference S3 URLs
Via API
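The same knowledge sources can be attached programmatically. The endpoint path and field names below are assumptions chosen for illustration; consult the project's published API reference for the real schema.

```python
# Hypothetical sketch of the knowledge section of a bot create/update
# request. Field names are assumptions, not the confirmed API schema.
import json

def build_knowledge_payload(filenames, urls=(), sitemap_urls=(), s3_urls=()):
    """Assemble the knowledge portion of a bot create/update body."""
    return {
        "knowledge": {
            "filenames": list(filenames),
            "source_urls": list(urls),
            "sitemap_urls": list(sitemap_urls),
            "s3_urls": list(s3_urls),
        }
    }

payload = build_knowledge_payload(
    ["handbook.pdf"], urls=["https://example.com/docs"]
)
body = json.dumps(payload)
# POST this body to the bot create/update endpoint (hypothetical path):
#   POST {api_base}/bot
```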
Ingestion Pipeline
When you add knowledge sources:
- Queue: Bot sync status is set to QUEUED
- Download: Step Functions downloads/fetches content
- Parse: Documents are parsed and chunked
- Embed: Text chunks are converted to vectors
- Index: Vectors are stored in OpenSearch Serverless
- Complete: Sync status is set to SUCCEEDED
Ingestion time varies based on document size and quantity. Monitor the bot’s sync status to know when it’s ready.
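Monitoring the pipeline above can be sketched as a simple polling loop. `get_bot_status` is a placeholder for the real status endpoint, and the intermediate `RUNNING` state is an assumption; only `QUEUED`, `SUCCEEDED`, and `FAILED` appear in this page.

```python
# Sketch of polling a bot's sync status during ingestion.
# get_bot_status is a placeholder; "RUNNING" is an assumed interim state.
import time

def wait_for_sync(get_bot_status, poll_seconds: float = 0.0, max_polls: int = 100) -> str:
    """Poll until the sync leaves the in-progress states."""
    for _ in range(max_polls):
        status = get_bot_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("sync did not finish")

# Simulate an ingestion run with a canned status sequence:
statuses = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
result = wait_for_sync(lambda: next(statuses))
```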
Chunking Strategies
Control how documents are split for embedding.
Fixed-Size Chunking (Default)
Split documents into fixed-size token chunks with configurable overlap.
No Chunking
Keep documents as single chunks (for small documents).
Semantic Chunking
Split based on semantic boundaries.
Hierarchical Chunking
Create parent-child chunk relationships.
Choosing a Chunking Strategy
- Fixed-size: Good default for most documents
- No chunking: Small documents, structured data
- Semantic: Long-form content where context matters
- Hierarchical: Complex documents with nested sections
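The four strategies above map onto the `chunkingConfiguration` shape used by Amazon Bedrock Knowledge Bases (under `vectorIngestionConfiguration`). The token values below are illustrative, not recommendations; check the service documentation for current limits.

```python
# Chunking configurations in the shape accepted by Amazon Bedrock
# Knowledge Bases. All token counts here are illustrative examples.

fixed_size = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
}

no_chunking = {"chunkingStrategy": "NONE"}  # whole document = one chunk

semantic = {
    "chunkingStrategy": "SEMANTic".upper(),
    "semanticChunkingConfiguration": {
        "maxTokens": 300,
        "bufferSize": 1,                      # surrounding sentences of context
        "breakpointPercentileThreshold": 95,  # split at large semantic jumps
    },
}

hierarchical = {
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        # Parent chunks hold broad context; child chunks are retrieved.
        "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
        "overlapTokens": 60,
    },
}
```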
Advanced Parsing
Enable foundation model parsing for better extraction:
- Better handling of complex layouts
- Improved table and chart extraction
- Enhanced multi-column processing
Advanced parsing incurs additional costs but significantly improves extraction quality for complex documents.
Importing Existing Knowledge Bases
Connect to an existing Amazon Bedrock Knowledge Base:
- Reuse existing Knowledge Bases
- Share knowledge across applications
- Use externally managed data sources
Retrieval at Query Time
When a user sends a message:
- The query is embedded using Amazon Titan
- Vector search finds similar chunks
- Top-k chunks (default: 5) retrieved
- Chunks added to prompt context
- Model generates response
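The retrieval step returns chunks with their source locations, which can be folded into the prompt and surfaced as citations. The field names below (`content.text`, `location.s3Location.uri`) follow the Bedrock `Retrieve` response shape; the sample data is fabricated for illustration.

```python
# Sketch of turning retrieved chunks into a numbered context block with
# source citations. The sample response data is fabricated.

def format_context(retrieval_results: list) -> str:
    """Number each chunk and append its source URI for citation display."""
    lines = []
    for i, result in enumerate(retrieval_results, start=1):
        text = result["content"]["text"]
        source = result.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        lines.append(f"[{i}] {text} (source: {source})")
    return "\n".join(lines)

sample = [
    {"content": {"text": "Refunds are issued within 14 days."},
     "location": {"s3Location": {"uri": "s3://kb/policies.pdf"}}},
]
context = format_context(sample)
```

The same numbered sources can be shown to users alongside the response, as described in the next section.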
Displaying Retrieved Chunks
Show users which sources were used in the response.
Contextual Grounding with Guardrails
Reduce hallucinations by enabling Bedrock Guardrails contextual grounding checks.
OpenSearch Serverless Configuration
Replicas
Control availability and cost with replicas:
- Enabled: 2 OCUs minimum, higher availability
- Disabled: 1 OCU minimum, lower cost
As of June 2024, OpenSearch Serverless supports 0.5 OCU, reducing entry costs. It automatically scales based on workload.
Collection Language
Optimize text analysis for your content language.
Updating Knowledge
Modify knowledge sources anytime:
- Edit the bot
- Add/remove knowledge sources
- Save changes
- Sync status → QUEUED
- Old knowledge remains available during sync
- Sync status → SUCCEEDED when complete
Performance Optimization
Chunk Size
Use smaller chunks (200-300 tokens) for precise retrieval and larger chunks (500-1000 tokens) for more context.
Overlap
Use 10-20% overlap to avoid losing context at chunk boundaries.
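The effect of overlap is easiest to see in a toy chunker. The sketch below counts words rather than tokens to stay dependency-free; real chunking operates on tokenizer output.

```python
# Illustration of fixed-size chunking with overlap. Word counts stand in
# for token counts to keep the sketch dependency-free.

def chunk_words(text: str, size: int = 300, overlap: int = 30) -> list:
    """Split text into word chunks of `size`, repeating `overlap` words."""
    words = text.split()
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

chunks = chunk_words("one two three four five six", size=4, overlap=1)
# → [['one', 'two', 'three', 'four'], ['four', 'five', 'six']]
```

Note how "four" appears in both chunks: a sentence straddling the boundary is never split away from all of its context.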
Document Quality
Clean, well-structured documents improve retrieval accuracy. Remove boilerplate and noise.
Query Optimization
Encourage users to be specific in queries for better retrieval results.
Troubleshooting
Sync Status: FAILED
Check sync_status_reason for error details. Common issues:
- Invalid URLs or file formats
- Permission errors for S3 access
- Parsing failures for complex documents
Poor Retrieval Results
- Try different chunking strategies
- Enable advanced parsing for complex docs
- Increase chunk overlap
- Improve document structure and formatting
High Costs
- Use multi-tenant Knowledge Bases
- Disable replicas for dev environments
- Reduce chunk count by using larger chunks
- Enable prompt caching
Example Configurations
Documentation Bot
Research Assistant
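The two example setups could be expressed roughly as follows. Every field name here mirrors concepts on this page (chunking, advanced parsing, top-k) but is a hypothetical illustration, not the project's actual configuration schema.

```python
# Hypothetical illustrations of the two example bots. Field names are
# assumptions modeled on the concepts in this page, not a real schema.

documentation_bot = {
    "knowledge": {"sitemap_urls": ["https://docs.example.com/sitemap.xml"]},
    "chunking": {"strategy": "FIXED_SIZE", "max_tokens": 300, "overlap_percentage": 20},
    "advanced_parsing": False,  # docs pages are simple layouts
    "top_k": 5,
}

research_assistant = {
    "knowledge": {"filenames": ["paper1.pdf", "paper2.pdf"]},
    "chunking": {"strategy": "HIERARCHICAL"},  # nested sections in papers
    "advanced_parsing": True,   # complex layouts, tables, charts
    "top_k": 10,
}
```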
Next Steps
Create Custom Bot
Build a bot with knowledge integration
Enable Agents
Combine knowledge with tool usage
Configure Guardrails
Add content filters and grounding checks
Bot Store
Share knowledge-powered bots