What is RAG?
Retrieval-Augmented Generation combines three steps:
- Retrieval: Search for relevant information in your knowledge base
- Augmentation: Add retrieved context to the user’s query
- Generation: Generate responses using both the query and retrieved context
The result is responses that are:
- Grounded in your specific data
- More accurate and factual
- Verifiable with source citations
RAG helps reduce hallucinations by providing the model with relevant facts before generating responses.
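The retrieve-augment-generate flow above can be sketched in a few lines. Everything here is a placeholder for illustration: the keyword "retrieval" stands in for real vector search, and the knowledge base is a plain dict, not the actual Bedrock Chat implementation.

```python
# Minimal sketch of the RAG flow. The retriever and knowledge base are
# stand-ins (keyword match over a dict), not the real vector search.

def retrieve(query: str, knowledge_base: dict, top_k: int = 2) -> list:
    """Naive keyword scoring standing in for semantic vector search."""
    scored = [
        (sum(word in text.lower() for word in query.lower().split()), text)
        for text in knowledge_base.values()
    ]
    scored.sort(reverse=True, key=lambda pair: pair[0])
    return [text for score, text in scored[:top_k] if score > 0]

def augment(query: str, chunks: list) -> str:
    """Prepend retrieved context to the user's query."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"

kb = {
    "doc1": "Bedrock Chat supports PDF, TXT, and CSV uploads.",
    "doc2": "Multi-tenant knowledge bases share a single vector store.",
}
prompt = augment("What uploads are supported?", retrieve("supported uploads", kb))
# The augmented prompt is then sent to the model for generation.
```

In production the retrieval step is a vector similarity search over embeddings, as described in the architecture section below.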
Supported Knowledge Sources
Bedrock Chat supports multiple knowledge source types:
File Uploads
PDF, TXT, MD, CSV, XLSX, DOCX, and more. Files are automatically parsed and embedded.
Web URLs
Individual web pages are crawled, parsed, and indexed for retrieval.
Sitemaps
Provide a sitemap URL to automatically index all pages in a website.
S3 URLs
Reference files stored in S3 buckets (requires appropriate IAM permissions).
Knowledge Base Architecture
Knowledge bases are powered by Amazon Bedrock Knowledge Bases with OpenSearch Serverless.
Components
- Amazon Bedrock Knowledge Bases: Managed RAG service
- OpenSearch Serverless: Vector database for semantic search
- Step Functions: Orchestrates document ingestion
- Amazon Titan Embeddings: Converts text to vectors
Knowledge Base Types
Bedrock Chat offers two deployment models:
Dedicated Knowledge Base
Each bot gets its own Knowledge Base:
- Isolated data per bot
- Dedicated resources
- Higher quota consumption (default limit: 100 Knowledge Bases per account)
Multi-Tenant Knowledge Base (Recommended)
Multiple bots share a common Knowledge Base with data isolation:
- Single Knowledge Base across multiple bots
- Data filtered by Bot ID metadata
- Significantly reduced account quota consumption
- Default for new bots
Multi-tenant mode is the default for new bots. To migrate existing bots, change the bot’s knowledge settings to “Create a tenant in a shared Knowledge Base.”
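Per-bot data isolation in a shared Knowledge Base can be sketched as a metadata filter applied at retrieval time. The filter shape below follows the Bedrock `Retrieve` API's `retrievalConfiguration`; the metadata key name `bot_id` is an assumption for illustration, not confirmed from the source.

```python
# Hedged sketch: enforcing per-bot isolation with a metadata filter at
# query time. The "bot_id" metadata key is an assumption.

def build_retrieval_config(bot_id: str, top_k: int = 5) -> dict:
    """Vector search config that only returns chunks tagged with this bot."""
    return {
        "vectorSearchConfiguration": {
            "numberOfResults": top_k,
            "filter": {"equals": {"key": "bot_id", "value": bot_id}},
        }
    }

# Usage against the shared Knowledge Base (IDs are placeholders):
#   client = boto3.client("bedrock-agent-runtime")
#   client.retrieve(
#       knowledgeBaseId="SHARED_KB_ID",
#       retrievalQuery={"text": "refund policy"},
#       retrievalConfiguration=build_retrieval_config("bot-123"),
#   )
```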
Bulk Migration to Multi-Tenant
To migrate multiple bots to multi-tenant mode:
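A bulk migration could look like the loop below. Both `list_bots` and `update_bot_knowledge` are hypothetical placeholders for the project's real admin API; check the actual migration tooling before relying on this shape.

```python
# Hypothetical bulk-migration sketch: switch every bot that still uses a
# dedicated Knowledge Base over to the shared (multi-tenant) one.
# list_bots / update_bot_knowledge are placeholders, not real API calls.

def migrate_all(list_bots, update_bot_knowledge) -> list:
    migrated = []
    for bot in list_bots():
        if bot.get("kb_mode") != "multi_tenant":
            update_bot_knowledge(bot["id"], kb_mode="multi_tenant")
            migrated.append(bot["id"])
    return migrated

# Dry run against canned data:
bots = [{"id": "a", "kb_mode": "dedicated"}, {"id": "b", "kb_mode": "multi_tenant"}]
calls = []
result = migrate_all(lambda: bots, lambda bot_id, kb_mode: calls.append(bot_id))
# Only bot "a" needs migration.
```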
Adding Knowledge to Bots
Via UI
- Create or edit a bot
- Navigate to the Knowledge section
- Add your knowledge sources:
- Upload files directly
- Enter web URLs
- Provide sitemap URLs
- Reference S3 URLs
Via API
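The same knowledge sources can be attached programmatically. The endpoint path and field names below are assumptions chosen for illustration; consult the project's published API reference for the real schema.

```python
# Hypothetical sketch of the knowledge section of a bot create/update
# request. Field names are assumptions, not the confirmed API schema.
import json

def build_knowledge_payload(filenames, urls=(), sitemap_urls=(), s3_urls=()):
    """Assemble the knowledge portion of a bot create/update body."""
    return {
        "knowledge": {
            "filenames": list(filenames),
            "source_urls": list(urls),
            "sitemap_urls": list(sitemap_urls),
            "s3_urls": list(s3_urls),
        }
    }

payload = build_knowledge_payload(
    ["handbook.pdf"], urls=["https://example.com/docs"]
)
body = json.dumps(payload)
# POST this body to the bot create/update endpoint (hypothetical path):
#   POST {api_base}/bot
```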
Ingestion Pipeline
When you add knowledge sources:
- Queue: Bot sync status is set to QUEUED
- Download: Step Functions downloads/fetches content
- Parse: Documents are parsed and chunked
- Embed: Text chunks are converted to vectors
- Index: Vectors are stored in OpenSearch Serverless
- Complete: Sync status is set to SUCCEEDED
Ingestion time varies based on document size and quantity. Monitor the bot’s sync status to know when it’s ready.
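Monitoring the pipeline above can be sketched as a simple polling loop. `get_bot_status` is a placeholder for the real status endpoint, and the intermediate `RUNNING` state is an assumption; only `QUEUED`, `SUCCEEDED`, and `FAILED` appear in this page.

```python
# Sketch of polling a bot's sync status during ingestion.
# get_bot_status is a placeholder; "RUNNING" is an assumed interim state.
import time

def wait_for_sync(get_bot_status, poll_seconds: float = 0.0, max_polls: int = 100) -> str:
    """Poll until the sync leaves the in-progress states."""
    for _ in range(max_polls):
        status = get_bot_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status
        time.sleep(poll_seconds)
    raise TimeoutError("sync did not finish")

# Simulate an ingestion run with a canned status sequence:
statuses = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
result = wait_for_sync(lambda: next(statuses))
```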
Chunking Strategies
Control how documents are split for embedding.
Fixed-Size Chunking (Default)
Split documents into fixed-size token chunks with configurable overlap.
No Chunking
Keep documents as single chunks (for small documents).
Semantic Chunking
Split based on semantic boundaries.
Hierarchical Chunking
Create parent-child chunk relationships.
Choosing a Chunking Strategy
- Fixed-size: Good default for most documents
- No chunking: Small documents, structured data
- Semantic: Long-form content where context matters
- Hierarchical: Complex documents with nested sections
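The four strategies above map onto the `chunkingConfiguration` shape used by Amazon Bedrock Knowledge Bases (under `vectorIngestionConfiguration`). The token values below are illustrative, not recommendations; check the service documentation for current limits.

```python
# Chunking configurations in the shape accepted by Amazon Bedrock
# Knowledge Bases. All token counts here are illustrative examples.

fixed_size = {
    "chunkingStrategy": "FIXED_SIZE",
    "fixedSizeChunkingConfiguration": {"maxTokens": 300, "overlapPercentage": 20},
}

no_chunking = {"chunkingStrategy": "NONE"}  # whole document = one chunk

semantic = {
    "chunkingStrategy": "SEMANTic".upper(),
    "semanticChunkingConfiguration": {
        "maxTokens": 300,
        "bufferSize": 1,                      # surrounding sentences of context
        "breakpointPercentileThreshold": 95,  # split at large semantic jumps
    },
}

hierarchical = {
    "chunkingStrategy": "HIERARCHICAL",
    "hierarchicalChunkingConfiguration": {
        # Parent chunks hold broad context; child chunks are retrieved.
        "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
        "overlapTokens": 60,
    },
}
```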
Advanced Parsing
Enable foundation model parsing for better extraction:
- Better handling of complex layouts
- Improved table and chart extraction
- Enhanced multi-column processing
Advanced parsing incurs additional costs but significantly improves extraction quality for complex documents.
Importing Existing Knowledge Bases
Connect to an existing Amazon Bedrock Knowledge Base:
- Reuse existing Knowledge Bases
- Share knowledge across applications
- Use externally managed data sources
Retrieval at Query Time
When a user sends a message:
- The query is embedded using Amazon Titan
- Vector search finds similar chunks
- Top-k chunks (default: 5) retrieved
- Chunks added to prompt context
- Model generates response
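The retrieval step returns chunks with their source locations, which can be folded into the prompt and surfaced as citations. The field names below (`content.text`, `location.s3Location.uri`) follow the Bedrock `Retrieve` response shape; the sample data is fabricated for illustration.

```python
# Sketch of turning retrieved chunks into a numbered context block with
# source citations. The sample response data is fabricated.

def format_context(retrieval_results: list) -> str:
    """Number each chunk and append its source URI for citation display."""
    lines = []
    for i, result in enumerate(retrieval_results, start=1):
        text = result["content"]["text"]
        source = result.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        lines.append(f"[{i}] {text} (source: {source})")
    return "\n".join(lines)

sample = [
    {"content": {"text": "Refunds are issued within 14 days."},
     "location": {"s3Location": {"uri": "s3://kb/policies.pdf"}}},
]
context = format_context(sample)
```

The same numbered sources can be shown to users alongside the response, as described in the next section.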
Displaying Retrieved Chunks
Show users which sources were used in the response.
Contextual Grounding with Guardrails
Reduce hallucinations by enabling Bedrock Guardrails contextual grounding checks.
OpenSearch Serverless Configuration
Replicas
Control availability and cost with replicas:
- Enabled: 2 OCUs minimum, higher availability
- Disabled: 1 OCU minimum, lower cost
As of June 2024, OpenSearch Serverless supports 0.5 OCU, reducing entry costs. It automatically scales based on workload.
Collection Language
Optimize text analysis for your content language.
Updating Knowledge
Modify knowledge sources anytime:
- Edit the bot
- Add/remove knowledge sources
- Save changes
- Sync status → QUEUED
- Old knowledge remains available during sync
- Sync status → SUCCEEDED when complete
Performance Optimization
Chunk Size
Use smaller chunks (200-300 tokens) for precise retrieval and larger chunks (500-1000 tokens) for more context.
Overlap
Use 10-20% overlap to avoid losing context at chunk boundaries.
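The effect of overlap is easiest to see in a toy chunker. The sketch below counts words rather than tokens to stay dependency-free; real chunking operates on tokenizer output.

```python
# Illustration of fixed-size chunking with overlap. Word counts stand in
# for token counts to keep the sketch dependency-free.

def chunk_words(text: str, size: int = 300, overlap: int = 30) -> list:
    """Split text into word chunks of `size`, repeating `overlap` words."""
    words = text.split()
    step = size - overlap
    return [words[i:i + size] for i in range(0, max(len(words) - overlap, 1), step)]

chunks = chunk_words("one two three four five six", size=4, overlap=1)
# → [['one', 'two', 'three', 'four'], ['four', 'five', 'six']]
```

Note how "four" appears in both chunks: a sentence straddling the boundary is never split away from all of its context.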
Document Quality
Clean, well-structured documents improve retrieval accuracy. Remove boilerplate and noise.
Query Optimization
Encourage users to be specific in queries for better retrieval results.
Troubleshooting
Sync Status: FAILED
Check sync_status_reason for error details. Common issues:
- Invalid URLs or file formats
- Permission errors for S3 access
- Parsing failures for complex documents
Poor Retrieval Results
- Try different chunking strategies
- Enable advanced parsing for complex docs
- Increase chunk overlap
- Improve document structure and formatting
High Costs
- Use multi-tenant Knowledge Bases
- Disable replicas for dev environments
- Reduce chunk count by using larger chunks
- Enable prompt caching
Example Configurations
Documentation Bot
Research Assistant
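The two example setups could be expressed roughly as follows. Every field name here mirrors concepts on this page (chunking, advanced parsing, top-k) but is a hypothetical illustration, not the project's actual configuration schema.

```python
# Hypothetical illustrations of the two example bots. Field names are
# assumptions modeled on the concepts in this page, not a real schema.

documentation_bot = {
    "knowledge": {"sitemap_urls": ["https://docs.example.com/sitemap.xml"]},
    "chunking": {"strategy": "FIXED_SIZE", "max_tokens": 300, "overlap_percentage": 20},
    "advanced_parsing": False,  # docs pages are simple layouts
    "top_k": 5,
}

research_assistant = {
    "knowledge": {"filenames": ["paper1.pdf", "paper2.pdf"]},
    "chunking": {"strategy": "HIERARCHICAL"},  # nested sections in papers
    "advanced_parsing": True,   # complex layouts, tables, charts
    "top_k": 10,
}
```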
Next Steps
Create Custom Bot
Build a bot with knowledge integration
Enable Agents
Combine knowledge with tool usage
Configure Guardrails
Add content filters and grounding checks
Bot Store
Share knowledge-powered bots