The AWS plugin provides access to Amazon Bedrock models including Claude, Qwen, and Nova Sonic for speech-to-speech.
Installation
uv add vision-agents[aws]
Authentication
Set your AWS credentials in the environment:
export AWS_ACCESS_KEY_ID=your_aws_access_key
export AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export AWS_REGION=us-east-1
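A minimal sketch for checking that the expected variables are set before starting an agent (the helper name is illustrative, not part of the plugin):

```python
import os

# Environment variables the plugin expects (see the export lines above).
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")

def missing_aws_vars() -> list[str]:
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not os.environ.get(name)]

# Typical usage at startup:
# if missing_aws_vars():
#     raise RuntimeError(f"Set these AWS variables: {missing_aws_vars()}")
```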
Components
LLM - Standard Text Models
Access Bedrock models including Qwen and Claude:
```python
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
from vision_agents.core import Agent, User
llm = aws.LLM(
model="qwen.qwen3-32b-v1:0",
region_name="us-east-1"
)
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="AI Assistant"),
instructions="Be helpful and concise.",
llm=llm,
tts=cartesia.TTS(),
stt=deepgram.STT(),
turn_detection=smart_turn.TurnDetection(
buffer_duration=2.0,
confidence_threshold=0.5
)
)
Bedrock model IDs:
qwen.qwen3-32b-v1:0 - Qwen text model
anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku
anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet
anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus
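The plugin takes the full Bedrock model ID via the `model` argument. A hypothetical convenience map over the IDs listed above (the aliases and helper are illustrative, not part of the plugin):

```python
# Short aliases for the Bedrock model IDs documented above.
BEDROCK_MODEL_IDS = {
    "qwen3-32b": "qwen.qwen3-32b-v1:0",
    "claude-haiku": "anthropic.claude-3-haiku-20240307-v1:0",
    "claude-sonnet": "anthropic.claude-3-5-sonnet-20241022-v2:0",
    "claude-opus": "anthropic.claude-opus-4-1-20250805-v1:0",
}

def model_id(alias: str) -> str:
    """Resolve a short alias to its full Bedrock model ID."""
    try:
        return BEDROCK_MODEL_IDS[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias!r}") from None
```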
region_name (string, default: "us-east-1"): AWS region name.
Optional AWS access key ID. Defaults to the AWS_ACCESS_KEY_ID environment variable.
Optional AWS secret access key. Defaults to the AWS_SECRET_ACCESS_KEY environment variable.
Vision Models (Claude)
Use Claude models on Bedrock for vision capabilities:
from vision_agents.plugins import aws
llm = aws.LLM(
model="anthropic.claude-3-haiku-20240307-v1:0",
region_name="us-east-1"
)
# Send image with text
response = await llm.converse(
messages=[{
"role": "user",
"content": [
{"image": {"format": "png", "source": {"bytes": image_bytes}}},
{"text": "What do you see in this image?"}
]
}]
)
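The image-plus-text message above can be built with a small helper (an assumption mirroring the Converse message shape, not part of the plugin API):

```python
def image_message(image_bytes: bytes, prompt: str, fmt: str = "png") -> dict:
    """Build a Bedrock Converse-style user message pairing an image with text.

    `fmt` is the image format ("png", "jpeg", ...), matching the
    `image.format` field shown above.
    """
    return {
        "role": "user",
        "content": [
            {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }

# response = await llm.converse(
#     messages=[image_message(image_bytes, "What do you see in this image?")]
# )
```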
Realtime - Nova Sonic Speech-to-Speech
Use AWS Nova 2 Sonic for realtime audio interactions:
from vision_agents.plugins import aws, getstream
from vision_agents.core import Agent, User
realtime = aws.Realtime(
model="amazon.nova-2-sonic-v1:0",
region_name="us-east-1",
voice_id="matthew"
)
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="Voice Agent"),
instructions="Tell engaging stories.",
llm=realtime
)
model (string, default: "amazon.nova-2-sonic-v1:0"): Nova Sonic model ID.
The Realtime implementation includes automatic reconnection after silence periods or connection time limits.
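The reconnection behavior is built in. As an illustration of the general pattern only (not the plugin's actual implementation), a capped exponential backoff schedule for reconnect attempts might look like:

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 5) -> list[float]:
    """Exponentially increasing reconnect delays in seconds, capped at `cap`."""
    return [min(cap, base * (2 ** n)) for n in range(attempts)]

# backoff_delays() -> [1.0, 2.0, 4.0, 8.0, 16.0]
```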
TTS - AWS Polly
Convert text to speech using AWS Polly:
from vision_agents.plugins import aws
tts = aws.TTS(
region_name="us-east-1",
voice_id="Joanna",
engine="neural",
text_type="text",
language_code="en-US"
)
agent = Agent(
llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
tts=tts,
# ... other components
)
language_code (string): Language code for synthesis.
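Polly also accepts SSML input when `text_type="ssml"`, in which case the input must be wrapped in a `<speak>` root element. A sketch of preparing plain text for SSML mode (the helper and the commented call are illustrative, not part of the plugin):

```python
from xml.sax.saxutils import escape

def to_ssml(text: str) -> str:
    """Wrap plain text in the <speak> root element Polly's SSML mode
    expects, escaping characters that are significant in XML."""
    return f"<speak>{escape(text)}</speak>"

# tts = aws.TTS(text_type="ssml", ...)
# tts would then be given to_ssml("Hello & welcome") instead of raw text.
```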
Function Calling
Standard LLM
Fully supports function calling:
from vision_agents.plugins import aws
llm = aws.LLM(
model="qwen.qwen3-32b-v1:0",
region_name="us-east-1"
)
@llm.register_function(
name="get_weather",
description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
return {
"city": city,
"temperature": 72,
"condition": "Sunny"
}
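Under the hood, the model returns a tool-use request naming the function and its arguments, and the registered callable is invoked with them. A sketch of that dispatch step (the registry and payload shape here are illustrative, not the plugin's internals):

```python
import json

# Illustrative registry mapping function names to callables, as
# register_function would populate it.
FUNCTIONS = {
    "get_weather": lambda city: {"city": city, "temperature": 72, "condition": "Sunny"},
}

def dispatch(tool_call: dict) -> str:
    """Run the named function with the model-supplied arguments and
    serialize the result for the tool-result message."""
    fn = FUNCTIONS[tool_call["name"]]
    result = fn(**tool_call["arguments"])
    return json.dumps(result)
```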
Realtime (Nova Sonic)
Fully supports function calling in realtime:
from vision_agents.plugins import aws
realtime = aws.Realtime(
model="amazon.nova-2-sonic-v1:0",
region_name="us-east-1",
voice_id="matthew"
)
@realtime.register_function(
name="get_weather",
description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
return {
"city": city,
"temperature": 72,
"condition": "Sunny"
}
See example/aws_realtime_function_calling_example.py for a complete example.
Configuration
Environment Variables
# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
AWS_BEDROCK_API_KEY=optional_session_token
# Stream API (for video calls)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret
# Optional: Other services
CARTESIA_API_KEY=your_cartesia_key
DEEPGRAM_API_KEY=your_deepgram_key
Available Models
Text Models
qwen.qwen3-32b-v1:0 - Qwen 3 32B
anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku (vision capable)
anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet (vision capable)
anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus (vision capable)
Realtime Models
amazon.nova-2-sonic-v1:0 - Nova 2 Sonic (speech-to-speech)
Examples
Standard LLM Example
from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
agent = Agent(
edge=getstream.Edge(),
agent_user=User(name="AWS Agent"),
instructions="Be helpful",
llm=aws.LLM(
model="qwen.qwen3-32b-v1:0",
region_name="us-east-1"
),
tts=cartesia.TTS(),
stt=deepgram.STT(),
turn_detection=smart_turn.TurnDetection()
)
See example/aws_realtime_nova_example.py for realtime usage.
References