The AWS plugin provides access to Amazon Bedrock models, including Claude and Qwen for text and Nova Sonic for speech-to-speech.

Installation

uv add "vision-agents[aws]"

Authentication

Set your AWS credentials in the environment:
export AWS_ACCESS_KEY_ID=your_aws_access_key
export AWS_SECRET_ACCESS_KEY=your_aws_secret_key
export AWS_REGION=us-east-1
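It can be useful to fail fast when these variables are missing rather than hit an authentication error mid-call. A minimal sketch (the helper name is ours, not part of the plugin):

```python
import os

REQUIRED_AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION")

def missing_aws_vars(env=os.environ):
    """Return the names of required AWS variables that are unset or empty."""
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]

if __name__ == "__main__":
    missing = missing_aws_vars()
    if missing:
        raise SystemExit(f"Missing AWS credentials: {', '.join(missing)}")
```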

Components

LLM - Standard Text Models

Access Bedrock models including Qwen and Claude:
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn
from vision_agents.core import Agent, User

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant"),
    instructions="Be helpful and concise.",
    llm=llm,
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(
        buffer_duration=2.0,
        confidence_threshold=0.5
    )
)
Parameters:
  • model (string, required) - Bedrock model ID:
      • qwen.qwen3-32b-v1:0 - Qwen text model
      • anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku
      • anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet
      • anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus
  • region_name (string, default: "us-east-1") - AWS region name
  • aws_access_key_id (string, optional) - AWS access key ID. Defaults to the AWS_ACCESS_KEY_ID environment variable
  • aws_secret_access_key (string, optional) - AWS secret access key. Defaults to the AWS_SECRET_ACCESS_KEY environment variable

Vision Models (Claude)

Use Claude models on Bedrock for vision capabilities:
from vision_agents.plugins import aws

llm = aws.LLM(
    model="anthropic.claude-3-haiku-20240307-v1:0",
    region_name="us-east-1"
)

# Send image with text
response = await llm.converse(
    messages=[{
        "role": "user",
        "content": [
            {"image": {"format": "png", "source": {"bytes": image_bytes}}},
            {"text": "What do you see in this image?"}
        ]
    }]
)
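The content blocks above follow the Bedrock Converse message shape: a list of content items, each an image or text block. A small helper that builds the same payload (the function is illustrative, not part of the plugin):

```python
def build_image_message(image_bytes: bytes, prompt: str, fmt: str = "png") -> dict:
    """Build a Bedrock Converse user message pairing an image with a text prompt."""
    if fmt not in ("png", "jpeg", "gif", "webp"):
        raise ValueError(f"unsupported image format: {fmt}")
    return {
        "role": "user",
        "content": [
            {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
            {"text": prompt},
        ],
    }
```

You could then call `await llm.converse(messages=[build_image_message(image_bytes, "What do you see?")])` as in the example above.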

Realtime - Nova Sonic Speech-to-Speech

Use AWS Nova 2 Sonic for realtime audio interactions:
from vision_agents.plugins import aws, getstream
from vision_agents.core import Agent, User

realtime = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Voice Agent"),
    instructions="Tell engaging stories.",
    llm=realtime
)
Parameters:
  • model (string, default: "amazon.nova-2-sonic-v1:0") - Nova Sonic model ID
  • voice_id (string) - AWS Polly voice ID for synthesis. See the AWS Nova documentation for available voices
The Realtime implementation includes automatic reconnection after silence periods or connection time limits.
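The plugin handles reconnection for you. For context, reconnection loops of this kind typically retry with capped exponential backoff; a generic sketch, not the plugin's actual implementation:

```python
def backoff_delays(base: float = 1.0, cap: float = 30.0, attempts: int = 5):
    """Yield capped exponential backoff delays: base, 2*base, 4*base, ... up to cap."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2
```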

TTS - AWS Polly

Convert text to speech using AWS Polly:
from vision_agents.plugins import aws

tts = aws.TTS(
    region_name="us-east-1",
    voice_id="Joanna",
    engine="neural",
    text_type="text",
    language_code="en-US"
)

agent = Agent(
    llm=aws.LLM(model="qwen.qwen3-32b-v1:0"),
    tts=tts,
    # ... other components
)
Parameters:
  • voice_id (string, default: "Joanna") - AWS Polly voice ID
  • engine (string, default: "neural") - "standard" or "neural"
  • text_type (string, default: "text") - "text" or "ssml"
  • language_code (string, default: "en-US") - Language code for synthesis
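With text_type set to "ssml", Polly expects input wrapped in a <speak> element. A small helper that escapes plain text and optionally adds a leading pause, assuming standard Polly SSML (the helper itself is illustrative):

```python
from xml.sax.saxutils import escape

def to_ssml(text: str, pause_ms: int = 0) -> str:
    """Wrap plain text in an SSML <speak> element, optionally with a leading pause."""
    body = escape(text)
    if pause_ms > 0:
        body = f'<break time="{pause_ms}ms"/>' + body
    return f"<speak>{body}</speak>"
```

Escaping matters because characters like < and & in the source text would otherwise break the SSML document.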

Function Calling

Standard LLM

Fully supports function calling:
from vision_agents.plugins import aws

llm = aws.LLM(
    model="qwen.qwen3-32b-v1:0",
    region_name="us-east-1"
)

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
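Under the hood, function-calling backends need a JSON-schema description of each tool. A rough sketch of deriving one from Python type hints (illustrative only; the plugin builds its own spec from the register_function metadata):

```python
import inspect

PY_TO_JSON = {str: "string", int: "integer", float: "number", bool: "boolean"}

def tool_spec(func, name: str, description: str) -> dict:
    """Build a minimal JSON-schema tool spec from a function's type hints."""
    props = {}
    required = []
    for pname, param in inspect.signature(func).parameters.items():
        props[pname] = {"type": PY_TO_JSON.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(pname)
    return {
        "name": name,
        "description": description,
        "inputSchema": {"type": "object", "properties": props, "required": required},
    }
```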

Realtime (Nova Sonic)

Fully supports function calling in realtime:
from vision_agents.plugins import aws

realtime = aws.Realtime(
    model="amazon.nova-2-sonic-v1:0",
    region_name="us-east-1",
    voice_id="matthew"
)

@realtime.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
See example/aws_realtime_function_calling_example.py for a complete example.

Configuration

Environment Variables

# AWS Credentials
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
AWS_REGION=us-east-1
AWS_BEDROCK_API_KEY=optional_session_token

# Stream API (for video calls)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret

# Optional: Other services
CARTESIA_API_KEY=your_cartesia_key
DEEPGRAM_API_KEY=your_deepgram_key

Available Models

Text Models

  • qwen.qwen3-32b-v1:0 - Qwen 3 32B
  • anthropic.claude-3-haiku-20240307-v1:0 - Claude Haiku (vision capable)
  • anthropic.claude-3-5-sonnet-20241022-v2:0 - Claude Sonnet (vision capable)
  • anthropic.claude-opus-4-1-20250805-v1:0 - Claude Opus (vision capable)

Realtime Models

  • amazon.nova-2-sonic-v1:0 - Nova 2 Sonic (speech-to-speech)

Examples

Standard LLM Example

from vision_agents.core import Agent, User
from vision_agents.plugins import aws, getstream, cartesia, deepgram, smart_turn

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AWS Agent"),
    instructions="Be helpful",
    llm=aws.LLM(
        model="qwen.qwen3-32b-v1:0",
        region_name="us-east-1"
    ),
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection()
)
See example/aws_realtime_nova_example.py for realtime usage.
