Supported Models
Bedrock Chat supports a wide range of foundation models available through Amazon Bedrock:

Claude Models
- Claude Opus 4/4.1/4.5
- Claude Sonnet 4/4.5
- Claude Sonnet 3.5/3.5 v2/3.7
- Claude Haiku 3/3.5/4.5
Amazon Nova
- Nova Pro
- Nova Lite
- Nova Micro
Meta Llama
- Llama 3.3 70B Instruct
- Llama 3.2 (1B, 3B, 11B, 90B)
Other Models
- Mistral 7B/Large/Large 2
- Mixtral 8x7B
- DeepSeek R1
- OpenAI GPT-OSS 20B/120B
Key Features
Model Selection
Switch between models during your conversation to compare responses or leverage specific model strengths:
- Claude Sonnet 3.7: Extended thinking capabilities with up to 64K tokens
- Amazon Nova Pro: Cost-effective general-purpose model
- Claude Haiku: Fast responses for simple queries
- DeepSeek R1: Advanced reasoning capabilities
The default model for new chats is claude-v3.7-sonnet, but administrators can configure a different default model in the deployment settings.

Conversation Management
- Persistent History: All conversations are automatically saved to Amazon DynamoDB
- Conversation Titles: Automatically generated using AI (default: claude-v3-haiku)
- Search: Find past conversations quickly
- Organization: Star important conversations for easy access
Generation Parameters
Fine-tune model behavior with adjustable parameters:

Parameter Details
- Max Tokens: Controls response length (1-64,000). Claude 3.7 with extended thinking supports up to 64K tokens.
- Temperature: Higher values (0.8-1.0) make output more creative; lower values (0.1-0.3) make it more focused.
- Top P: Nucleus sampling threshold. Lower values make output more deterministic.
- Top K: Limits the model to consider only top K tokens. Set to 0 to disable.
- Budget Tokens: For extended thinking models, controls reasoning depth (min: 1024, max: 64,000).
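As a sketch of how these parameters could map onto an Amazon Bedrock Converse API request (the helper name and defaults below are illustrative assumptions, not Bedrock Chat's actual code):

```python
# Illustrative helper: assembles the generation parameters above into a
# Bedrock Converse API request. Max Tokens, Temperature, and Top P map to
# inferenceConfig; Top K and Budget Tokens are model-specific fields that
# go under additionalModelRequestFields.
def build_request(model_id, prompt, max_tokens=4096, temperature=0.7,
                  top_p=0.9, top_k=0, budget_tokens=None):
    request = {
        "modelId": model_id,
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        "inferenceConfig": {
            "maxTokens": max_tokens,     # response length (1-64,000)
            "temperature": temperature,  # 0.1-0.3 focused, 0.8-1.0 creative
            "topP": top_p,               # nucleus sampling threshold
        },
    }
    extra = {}
    if top_k > 0:                        # Top K of 0 disables the filter
        extra["top_k"] = top_k           # Anthropic-specific field
    if budget_tokens is not None:        # extended-thinking models only
        extra["thinking"] = {"type": "enabled", "budget_tokens": budget_tokens}
    if extra:
        request["additionalModelRequestFields"] = extra
    return request

# The resulting dict can be passed to boto3 (requires AWS credentials):
#   boto3.client("bedrock-runtime").converse(**build_request(...))
```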
Cross-Region and Global Inference
Bedrock Chat supports both Cross-Region and Global Inference for enhanced throughput and resilience:
- Global Inference: Routes requests to optimal regions worldwide based on latency and availability
- Cross-Region Inference: Routes requests across AWS regions within the same geography (e.g., within the US)
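In Bedrock, this routing is typically selected by invoking an inference profile whose ID prefixes the base model ID with a geography code. The helper and its prefix table below are a sketch; the inference profiles actually available vary by account and model, so check yours before relying on a given prefix.

```python
# Sketch: Bedrock cross-region and global routing via inference-profile
# IDs, which prepend a geography prefix to the base model ID. The prefix
# table here is illustrative, not exhaustive.
def to_inference_profile(model_id: str, scope: str) -> str:
    prefixes = {"us": "us.", "eu": "eu.", "apac": "apac.", "global": "global."}
    if scope not in prefixes:
        raise ValueError(f"unknown scope: {scope}")
    return prefixes[scope] + model_id
```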
Streaming Responses
All model interactions use streaming for real-time response generation:
- Responses appear token-by-token as they're generated
- Cancel generation at any time
- Visual indicators for agent thinking and tool usage
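With the Converse streaming API, each event is a dict keyed by event type, and text arrives under `contentBlockDelta`. The generator below is an illustrative helper for extracting those deltas, not Bedrock Chat's actual code:

```python
# Sketch: pull text deltas out of a Converse streaming response, yielding
# tokens as they arrive so the UI can render them incrementally.
def iter_text_deltas(stream):
    for event in stream:
        delta = event.get("contentBlockDelta", {}).get("delta", {})
        if "text" in delta:
            yield delta["text"]  # one token-sized chunk at a time

# Usage with boto3 (requires AWS credentials):
#   resp = client.converse_stream(modelId=..., messages=...)
#   for chunk in iter_text_deltas(resp["stream"]):
#       print(chunk, end="", flush=True)
```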
Prompt Caching
For custom bots with instructions or knowledge bases, prompt caching reduces costs and latency by reusing processed prompts.

Prompt caching is enabled by default for custom bots but can be disabled if needed.
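A sketch of how a cache point can be placed after a bot's (potentially long) system instructions in a Converse request, so the processed prefix is reused across turns. The shape follows Bedrock's `cachePoint` content block; the helper itself is illustrative, and model support for caching should be verified.

```python
# Sketch: append a cachePoint block after the system instructions so
# everything before it can be cached and reused on subsequent turns.
def build_system_blocks(instructions: str, cache: bool = True):
    blocks = [{"text": instructions}]
    if cache:  # caching on by default, mirroring the behavior described above
        blocks.append({"cachePoint": {"type": "default"}})
    return blocks
```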
Multi-Tenancy and Permissions
Conversations are isolated by user:
- Each user has their own conversation history
- Conversations are not shared between users
- User authentication via Amazon Cognito
- Optional IP address restrictions via AWS WAF
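One way such isolation is commonly enforced in DynamoDB is a single-table layout with the user ID as the partition key, so a query can only ever see one user's rows. Bedrock Chat's real schema is not shown here; the helper below is a hypothetical illustration of the access pattern.

```python
# Hypothetical key design for per-user isolation: partition key = user ID,
# sort key = conversation ID. A Query scoped to one partition can never
# return another user's conversations.
def user_conversations_query(table_name: str, user_id: str) -> dict:
    return {
        "TableName": table_name,
        "KeyConditionExpression": "PK = :uid",
        "ExpressionAttributeValues": {":uid": {"S": f"USER#{user_id}"}},
    }

# Usage with boto3 (requires AWS credentials):
#   boto3.client("dynamodb").query(**user_conversations_query("Conversations", uid))
```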
Architecture
The chat interface leverages serverless AWS services:
- Frontend: React + Tailwind CSS served via CloudFront
- Backend: FastAPI on Lambda with WebSocket support
- Storage: DynamoDB for conversation history
- AI: Amazon Bedrock for model inference
Usage Tips
Model Selection
Choose faster models (Haiku, Nova Lite) for simple queries and more powerful models (Opus, Sonnet 3.7) for complex tasks.
Context Management
Long conversations consume more tokens. Start a new conversation for different topics to optimize costs.
Temperature Tuning
Lower temperature (0.1-0.3) for factual responses, higher (0.7-1.0) for creative content.
Stop Sequences
Use custom stop sequences to control when the model stops generating (e.g., specific markers or delimiters).
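In the Converse API, stop sequences are passed through `inferenceConfig`; generation halts as soon as the model emits any listed marker. The markers below are illustrative placeholders:

```python
# Sketch: custom stop sequences in a Converse inferenceConfig. The model
# stops generating when it produces any of the listed markers.
inference_config = {
    "maxTokens": 512,
    "stopSequences": ["###END###", "</answer>"],  # illustrative markers
}

# Passed alongside the rest of the request:
#   client.converse(modelId=..., messages=..., inferenceConfig=inference_config)
```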
Next Steps
Create Custom Bots
Build bots with custom instructions and personality
Add Knowledge
Enhance bots with RAG and custom knowledge
Enable Agents
Give bots tools to perform complex tasks
Share in Store
Publish your bots for others to use