The AWS plugin provides access to Amazon Bedrock models, including Claude and Qwen for text and Nova Sonic for speech-to-speech.

Installation

uv add "vision-agents[aws]"

Authentication

Set your AWS credentials in the environment:
export AWS_ACCESS_KEY_ID=your_aws_access_key
export AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export AWS_REGION=us-east-1
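It can be useful to fail fast when these variables are missing rather than hit an authentication error mid-call. A minimal sketch (the helper name is ours, not part of the plugin):

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")

def missing_aws_vars(env=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        raise SystemExit(f"Missing AWS credentials: {', '.join(missing)}")
```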

Components

LLM - Standard Text Models

Access Bedrock models including Qwen and Claude:
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
from vision_agents.core import Agent, User

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant"),
    instructions="Be helpful and concise.",
    llm=llm,
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(
        buffer_duration=2.0,
        confidence_threshold=0.5
    )
)
Parameters:
  • model (string, required) - Bedrock model ID:
      • qwen.qwen3-32b-v1:0 - Qwen text model
      • anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku
      • anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet
      • anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus
  • region_name (string, default: "us-east-1") - AWS region name
  • aws_access_key_id (string, optional) - AWS access key ID. Defaults to the AWS_ACCESS_KEY_ID environment variable
  • aws_secret_access_key (string, optional) - AWS secret access key. Defaults to the AWS_SECRET_ACCESS_KEY environment variable

Vision Models (Claude)

Use Claude models on Bedrock for vision capabilities:
from vision_agents.plugins import aws

llm = aws.LLM(
    model="anthropic.claude-3-haiku-20240307-v1:0",
    region_name="us-east-1"
)

# Send image with text
response = await llm.converse(
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What do you see in this image?"}
        ]
    }]
)
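The content blocks above follow the Bedrock Converse message shape: a list of content items, each an image or text block. A small helper that builds the same payload (the function is illustrative, not part of the plugin):

```python
def build_image_message(image_bytes: bytes, prompt: str, fmt: str = "png") -> dict:
    """Build a Bedrock Converse user message pairing an image with a text prompt."""
    if fmt not in ("png", "jpeg", "gif", "webp"):
        raise ValueError(f"unsupported image format: {fmt}")
    return {
        "role": "user",
        "content": [
            {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }
```

You could then call `await llm.converse(messages=[build_image_message(image_bytes, "What do you see?")])` as in the example above.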

Realtime - Nova Sonic Speech-to-Speech

Use AWS Nova 2 Sonic for realtime audio interactions:
from vision_agents.plugins import aws, getstream
from vision_agents.core import Agent, User

realtime = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Voice Agent"),
    instructions="Tell engaging stories.",
    llm=realtime
)
Parameters:
  • model (string, default: "amazon.nova-2-sonic-v1:0") - Nova Sonic model ID
  • voice_id (string) - AWS Polly voice ID for synthesis. See the AWS Nova documentation for available voices
The Realtime implementation includes automatic reconnection after silence periods or connection time limits.
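The plugin handles reconnection for you. For context, reconnection loops of this kind typically retry with capped exponential backoff; a generic sketch, not the plugin's actual implementation:

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 5):
    """Yield capped exponential backoff delays: base, 2*base, 4*base, ... up to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2
```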

TTS - AWS Polly

Convert text to speech using AWS Polly:
from vision_agents.plugins import aws

tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",
    engine="neural",
    text_type="text",
    language_code="en-US"
)

agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    tts=tts,
    # ... other components
)
Parameters:
  • voice_id (string, default: "Joanna") - AWS Polly voice ID
  • engine (string, default: "neural") - "standard" or "neural"
  • text_type (string, default: "text") - "text" or "ssml"
  • language_code (string, default: "en-US") - Language code for synthesis
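With text_type set to "ssml", Polly expects input wrapped in a <speak> element. A small helper that escapes plain text and optionally adds a leading pause, assuming standard Polly SSML (the helper itself is illustrative):

```python
from xml.sax.saxutils import escape

def to_ssml(text: str, pause_ms: int = 0) -> str:
    """Wrap plain text in an SSML <speak> element, optionally with a leading pause."""
    body = escape(text)
    if pause_ms > 0:
        body = f'<break time="{pause_ms}ms"/>' + body
    return f"<speak>{body}</speak>"
```

Escaping matters because characters like < and & in the source text would otherwise break the SSML document.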

Function Calling

Standard LLM

Fully supports function calling:
from vision_agents.plugins import aws

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
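Under the hood, function-calling backends need a JSON-schema description of each tool. A rough sketch of deriving one from Python type hints (illustrative only; the plugin builds its own spec from the register_function metadata):

```python
import inspect

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_spec(func, name: str, description: str) -> dict:
    """Build a minimal JSON-schema tool spec from a function's type hints."""
    props = {}
    required = []
    for pname, param in inspect.signature(func).parameters.items():
        props[pname] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(pname)
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": props, "required": required},
    }
```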

Realtime (Nova Sonic)

Fully supports function calling in realtime:
from vision_agents.plugins import aws

realtime = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)

@realtime.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
See example/aws_realtime_function_calling_example.py for a complete example.

Configuration

Environment Variables

# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
AWS_BEDROCK_API_KEY=optional_session_token

# Stream API (for video calls)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret

# Optional: Other services
CARTESIA_API_KEY=your_cartesia_key
DEEPGRAM_API_KEY=your_deepgram_key

Available Models

Text Models

  • qwen.qwen3-32b-v1:0 - Qwen 3 32B
  • anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku (vision capable)
  • anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet (vision capable)
  • anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus (vision capable)

Realtime Models

  • amazon.nova-2-sonic-v1:0 - Nova 2 Sonic (speech-to-speech)

Examples

Standard LLM Example

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AWS Agent"),
    instructions="Be helpful",
    llm=aws.LLM(
        model="qwen.qwen3-32b-v1:0",
        region_name="us-east-1"
    ),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection()
)
See example/aws_realtime_nova_example.py for realtime usage.
