System Architecture - Bedrock Chat

Architecture Overview

Bedrock Chat is built on a serverless, fully managed AWS architecture that eliminates infrastructure management while providing scalability, reliability, and security. All AI model interactions happen through Amazon Bedrock within your AWS account—no data leaves AWS. Bedrock Chat Architecture Diagram

Core AWS Services

Bedrock Chat leverages the following AWS managed services:

Frontend Layer

Amazon CloudFront

Purpose: Global content delivery network (CDN) for the frontend applicationFeatures:

Automatic HTTPS encryption
Global edge locations for low latency
Custom domain support with Route 53
Geo-restriction capabilities
IPv6 support (configurable)

Usage in Bedrock Chat: Delivers the React frontend application globally with caching and edge acceleration.

Amazon S3

Purpose: Static asset storage and document hostingBuckets:

Frontend bucket: Stores compiled React application (Vite build output)
Document bucket: Stores uploaded documents for knowledge bases
Access logs bucket: Stores CloudFront and S3 access logs
Large message bucket: Stores oversized WebSocket messages
Source bucket: Stores CDK source code for CodeBuild

Security: All buckets use S3-managed encryption, block public access, and enforce SSL.

AWS WAF

Purpose: Web Application Firewall for traffic filteringWebACLs:

Frontend WAF: Protects CloudFront distribution (us-east-1)
Cognito WAF: Protects Cognito endpoints (regional)
Published API WAF: Protects published bot APIs (regional)

Rules:

IP allowlist/denylist (IPv4/IPv6)
Geographic restrictions (country-based)
Rate limiting
Common web exploits protection

Authentication & Authorization

Amazon Cognito

Purpose: User authentication and authorizationComponents:

User Pool: Manages user identities, registration, and sign-in
User Pool Client: Configures authentication flows
User Groups:
- Admin - Administrative access
- CreatingBotAllowed - Can create custom bots
- PublishAllowed - Can publish bot APIs

Features:

Self-registration (configurable)
Email domain restrictions
External identity providers (Google, OIDC)
MFA support
Token-based authentication with configurable expiration

Integration: Uses AWS Amplify in frontend for authentication flows.

Backend API Layer

Amazon API Gateway

Purpose: RESTful API endpoints for backend servicesAPIs:

Backend API: Main REST API for application logic
WebSocket API: Real-time streaming for chat responses
Published APIs: Dynamically created APIs for published bots

Features:

Cognito authorizer integration
Request/response validation
CORS configuration
CloudWatch logging
Throttling and quotas

AWS Lambda

Purpose: Serverless compute for API handlersFunctions:

API Handler: FastAPI application via Lambda Web Adapter
WebSocket Handler: Handles WebSocket connections and streaming
Custom resource handlers: CDK custom resources

Runtime: Python with AWS Lambda Web Adapter for FastAPIOptimizations:

Lambda SnapStart for Python (configurable)
Provisioned concurrency for hot starts
Environment variables for configuration
VPC integration for OpenSearch access

Dependencies:

FastAPI for REST endpoints
Boto3 for AWS SDK
LangChain for agent orchestration

Data Layer

Amazon DynamoDB

Purpose: NoSQL database for application dataTables:

Conversation Table: Stores chat conversations and messages
Bot Table: Stores custom bot configurations and metadata
Export Table: Staging table for analytics exports

Features:

On-demand capacity mode (auto-scaling)
Point-in-time recovery (PITR) enabled
DynamoDB Streams for event processing
Single-table design with composite keys

Data Model:

PK: USER#{userId} or BOT#{botId}
SK: CONVERSATION#{conversationId} or BOT#{botId}

Amazon OpenSearch Serverless

Purpose: Vector database and full-text search for knowledge basesCollections:

Knowledge Base Collections: Per-bot or multi-tenant collections for RAG
Bot Store Collection: Search and discovery for bot marketplace

Features:

Vector search with k-NN
Full-text search with BM25
Automatic scaling (OCU-based)
Data access policies (IAM-based)
Optional standby replicas for high availability

Cost Optimization:

Multi-tenant mode shares collections across bots
Configurable replicas (enableRagReplicas, enableBotStoreReplicas)
Starts at 0.5 OCU for small workloads

AI & Machine Learning

Amazon Bedrock

Purpose: Managed foundation models APISupported Models:

Claude: v4-opus, v4.5-opus, v3.7-sonnet, v3.5-sonnet, v3.5-haiku
Amazon Nova: nova-pro, nova-lite, nova-micro
Llama: llama3-3-70b-instruct, llama3-2-90b-instruct
Mistral: mistral-large-2, mixtral-8x7b-instruct
DeepSeek: deepseek-r1

Features:

Streaming responses
Cross-region inference (configurable)
Global inference profiles
Usage metrics and logging
Model access controls per region

API Integration: Uses Boto3 bedrock-runtime client with converse API.

Amazon Bedrock Knowledge Bases

Purpose: Managed RAG (Retrieval-Augmented Generation) serviceComponents:

Data Source: S3 bucket with document ingestion
Embedding Model: Amazon Titan Embeddings
Vector Store: OpenSearch Serverless
Ingestion Pipeline: Step Functions orchestration

Features:

Automatic document parsing (PDF, TXT, HTML, DOCX, etc.)
Chunking and embedding
Metadata filtering (multi-tenant support)
Synchronization jobs
Import existing Knowledge Bases

Multi-Tenant Mode: Filters documents using Bot ID metadata to share Knowledge Bases across multiple bots.

Orchestration & Automation

AWS Step Functions

Purpose: Orchestrate multi-step workflowsState Machines:

Embedding State Machine: Orchestrates document ingestion pipeline
1. Trigger Knowledge Base sync
2. Monitor sync status
3. Update DynamoDB with completion status
4. Handle failures and retries

Integration: Invoked by Lambda functions when users upload documents or create bots.

Amazon EventBridge Pipes

Purpose: Event-driven automationPipes:

Bot Deletion Pipe:
- Source: DynamoDB Streams (Bot Table)
- Filter: Deletion events
- Target: Lambda to delete CloudFormation stack

Use Case: Automatically clean up Knowledge Base CloudFormation stacks when bots are deleted.

AWS CodeBuild

Purpose: CI/CD for dynamic infrastructureProjects:

API Publish CodeBuild: Creates CloudFormation stacks for published bot APIs
Bedrock Custom Bot CodeBuild: Creates CloudFormation stacks for bot Knowledge Bases
Bedrock Shared KB CodeBuild: Creates shared multi-tenant Knowledge Base stacks

Workflow:

Lambda triggers CodeBuild with parameters
CodeBuild synthesizes CDK stack
CDK deploys CloudFormation stack
Outputs returned to Lambda

IAM: CodeBuild has permissions to create nested stacks with limited scope.

Analytics & Monitoring

Amazon Athena

Purpose: SQL query service for analyticsData Sources:

DynamoDB exports to S3 (via PITR)
CloudFront access logs
API Gateway logs

Queries:

Bot usage statistics
Conversation analytics
User activity metrics
Cost analysis

Tables: Managed by AWS Glue Data Catalog with partitioning.

AWS Glue

Purpose: Data catalog and ETLComponents:

Glue Database: Metadata catalog for DynamoDB exports
Glue Tables: Schema definitions for analytics queries
Glue Crawlers: (Optional) Discover new data partitions

Integration: Athena uses Glue catalog to query exported DynamoDB data.

Amazon CloudWatch

Purpose: Monitoring, logging, and alarmsLogs:

Lambda function logs
API Gateway access logs
CodeBuild build logs
Step Functions execution logs

Metrics:

Lambda invocations, duration, errors
API Gateway requests, latency
DynamoDB read/write units
OpenSearch OCU usage

Retention: Configurable log retention (default: 3 months)

Architecture Patterns

Serverless Architecture

Zero Infrastructure Management

All components are fully managed AWS services:

No EC2 instances to patch or maintain
Automatic scaling based on demand
Pay-per-use pricing model
Built-in high availability

Event-Driven Design

EventBridge Pipes react to DynamoDB changes to trigger cleanup workflows.

Infrastructure as Code

All infrastructure is defined using AWS CDK (TypeScript):

// Example: CDK Stack Definition
const database = new Database(this, "Database", {
  pointInTimeRecovery: true,
});

const backendApi = new Api(this, "BackendApi", {
  database,
  bedrockRegion: props.bedrockRegion,
  enableLambdaSnapStart: true,
});

Benefits:

Version control for infrastructure
Repeatable deployments
Multi-environment support (parameter.ts)
Type-safe configuration

Multi-Tenant Knowledge Bases

Metadata filtering isolates data while sharing infrastructure:

# Query with metadata filter
response = bedrock_agent_runtime.retrieve(
    knowledgeBaseId=shared_kb_id,
    retrievalQuery={"text": query},
    retrievalConfiguration={
        "vectorSearchConfiguration": {
            "filter": {"equals": {"key": "botId", "value": bot_id}}
        }
    }
)

Dynamic Stack Creation

CodeBuild creates nested CloudFormation stacks on-demand: Stacks:

BrChatKbStack-{botId}: Knowledge Base per bot
ApiPublishmentStack-{apiId}: Published bot API
BedrockSharedKbStack: Multi-tenant Knowledge Base

Security Architecture

Defense in Depth

Security Layers:

WAF: IP filtering, geo-restrictions, rate limiting
HTTPS: TLS encryption in transit
Cognito: User authentication and authorization
IAM: Least-privilege service permissions
Encryption: At-rest encryption for all storage services

IAM Roles & Policies

Lambda Execution Role

Permissions:

DynamoDB read/write (specific tables)
S3 read/write (specific buckets)
Bedrock InvokeModel
OpenSearch data access (via data access policies)
CloudWatch Logs write
Step Functions start execution

CodeBuild Service Role

Permissions:

CloudFormation create/update/delete stack
IAM create/update role (scoped to specific paths)
OpenSearch create collection
Bedrock create Knowledge Base
S3 read source bucket

DynamoDB Stream Role

Permissions:

DynamoDB Streams read
Lambda invoke (for EventBridge Pipes target)

Data Isolation

Per-User Data: DynamoDB partition key includes user ID
Per-Bot Data: OpenSearch metadata filtering by bot ID
Cognito Groups: Control access to bot creation and publishing
API Gateway Authorizer: Validates JWT tokens from Cognito

Scalability & Performance

Auto-Scaling Components

Service	Scaling Method	Limits
Lambda	Concurrent executions	1000 (default), request increase
DynamoDB	On-demand capacity	Unlimited
API Gateway	Automatic	10,000 RPS (default)
OpenSearch	OCU-based	2-100 OCU per collection
CloudFront	Global edge network	Unlimited

Performance Optimizations

Lambda SnapStart

Reduces cold start times by 50-90% for Python functions

CloudFront Caching

Caches static assets at edge locations globally

DynamoDB Streams

Processes changes asynchronously for analytics

WebSocket Streaming

Real-time token-by-token response streaming

Cost Optimization

Cost Drivers

Amazon Bedrock: Pay-per-token (input/output)
OpenSearch Serverless: OCU-hours
Lambda: Invocations + GB-seconds
DynamoDB: Read/write capacity (on-demand)
CloudFront: Data transfer out

Cost Reduction Strategies

Multi-Tenant Knowledge Bases: Share OpenSearch collections across bots to reduce OCU costs from 2 OCU per bot to 2 OCU for all bots.

Disable Replicas: Set enableRagReplicas: false in cdk.json for dev/test environments to use 1 OCU instead of 2.

Use Cheaper Models: Configure titleModel: "claude-v3-haiku" or "amazon-nova-lite" for conversation title generation.

Lambda SnapStart: Trade cache storage costs for reduced execution time (faster = cheaper).

High Availability

Regional Resilience

Multi-AZ: All services (Lambda, DynamoDB, API Gateway) run across multiple Availability Zones
OpenSearch Replicas: Optional standby replicas for Knowledge Bases
CloudFront: Global edge network with automatic failover
S3: 99.999999999% durability with cross-AZ replication

Disaster Recovery

DynamoDB PITR: Point-in-time recovery within last 35 days
S3 Versioning: Can be enabled for document buckets
CloudFormation Stacks: IaC enables rapid redeployment
Cross-Region: Deploy to multiple regions with separate stacks

Deployment Architecture

CloudShell Deployment Flow

CDK Deployment Flow

Tech Stack Summary

Frontend

Framework: React 18
Build Tool: Vite
Styling: Tailwind CSS
State Management: Zustand, XState
Authentication: AWS Amplify
API Client: Axios, SWR
Markdown: React Markdown with syntax highlighting

Backend

Runtime: Python 3.12
Framework: FastAPI
Adapter: AWS Lambda Web Adapter
AWS SDK: Boto3
Agent Framework: LangChain
Database: DynamoDB (Boto3)

Infrastructure

IaC: AWS CDK (TypeScript)
Language: Node.js 20
Package Manager: npm
Testing: Jest

Next Steps

Deploy Bedrock Chat

Follow the quickstart guide to deploy the architecture

CDK Deep Dive

Learn about advanced CDK deployment options

Configuration

Configure architecture parameters and optimizations

Custom Development

Set up local development environment

Get Started

Deployment

Core Features

Configuration

Administration

Development

Migration & Support

Documentation Index

​Architecture Overview

​Core AWS Services

​Frontend Layer

​Authentication & Authorization

​Backend API Layer

​Data Layer

​AI & Machine Learning

​Orchestration & Automation

​Analytics & Monitoring

​Architecture Patterns

​Serverless Architecture

Zero Infrastructure Management

​Event-Driven Design

​Infrastructure as Code

​Multi-Tenant Knowledge Bases

​Dynamic Stack Creation

​Security Architecture

​Defense in Depth

​IAM Roles & Policies

​Data Isolation

​Scalability & Performance

​Auto-Scaling Components

​Performance Optimizations

Lambda SnapStart

CloudFront Caching

DynamoDB Streams

WebSocket Streaming

​Cost Optimization

​Cost Drivers

​Cost Reduction Strategies

​High Availability

​Regional Resilience

​Disaster Recovery

​Deployment Architecture

​CloudShell Deployment Flow

​CDK Deployment Flow

​Tech Stack Summary

​Frontend

​Backend

​Infrastructure

​Next Steps

Deploy Bedrock Chat

CDK Deep Dive

Configuration

Custom Development

Build docs developers (and LLMs) love

Architecture Overview

Core AWS Services

Frontend Layer

Authentication & Authorization

Backend API Layer

Data Layer

AI & Machine Learning

Orchestration & Automation

Analytics & Monitoring

Architecture Patterns

Serverless Architecture

Event-Driven Design

Infrastructure as Code

Multi-Tenant Knowledge Bases

Dynamic Stack Creation

Security Architecture

Defense in Depth

IAM Roles & Policies

Data Isolation

Scalability & Performance

Auto-Scaling Components

Performance Optimizations

Cost Optimization

Cost Drivers

Cost Reduction Strategies

High Availability

Regional Resilience

Disaster Recovery

Deployment Architecture

CloudShell Deployment Flow

CDK Deployment Flow

Tech Stack Summary

Frontend

Backend

Infrastructure

Next Steps