Installation

Requirements

Before installing Vision Agents, ensure you have:

Python 3.10 or higher (Python 3.12 recommended)
uv package manager installed

Vision Agents uses uv for fast, reliable dependency management. If you don’t have uv installed, follow the uv installation guide.
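
If you want to confirm the Python requirement from a script before going further, the check can be sketched as below. This is a minimal illustration; the helper name `meets_minimum` is hypothetical and not part of Vision Agents.

```python
import sys

# Minimum version taken from the requirements list above.
MINIMUM = (3, 10)


def meets_minimum(version=None, minimum=MINIMUM):
    """Return True if the (major, minor) version is at least `minimum`."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Run it with the interpreter you plan to use for the project, since several Python versions can be installed side by side.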

Install uv

On macOS and Linux, install uv with the standalone installer script:

curl -LsSf https://astral.sh/uv/install.sh | sh

Verify the installation:

uv --version

Installation Methods

Basic Installation

Install the core Vision Agents package:

uv add vision-agents

This installs the base framework without any plugin integrations. You’ll need to install plugins separately based on your needs.

Install with Plugins

Most applications need at least a few plugins. Install Vision Agents with the integrations you’ll use:

# Recommended for voice AI agents
uv add "vision-agents[getstream,gemini,elevenlabs,deepgram]"

Performance note: Installing the all-plugins extra downloads many dependencies and can take several minutes. Install only the plugins you need for a faster install and a smaller project.

Available Plugin Extras

Vision Agents supports 35+ integrations via optional plugin extras:

LLM Providers

Install language model providers:

uv add "vision-agents[openai]"

Speech-to-Text (STT)

Install speech recognition providers:

uv add "vision-agents[deepgram]"

Text-to-Speech (TTS)

Install voice synthesis providers:

uv add "vision-agents[elevenlabs]"

Vision & Video Processing

Install computer vision and video processing tools:

uv add "vision-agents[ultralytics]"

Edge Networks

Install video/audio edge network providers:

Stream

uv add "vision-agents[getstream]"

Stream provides ultra-low-latency WebRTC infrastructure with SDKs for React, iOS, Android, Flutter, React Native, and Unity. Free tier includes 333,000 participant minutes per month.

Specialized Services

Install additional capabilities:

# Advanced turn detection
uv add "vision-agents[smart-turn]"

# Neural turn detection
uv add "vision-agents[vogent]"

Combining Multiple Plugins

Install multiple plugins at once by listing them in brackets:

uv add "vision-agents[getstream,openai,gemini,ultralytics,roboflow,deepgram,elevenlabs]"
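
If your plugin list comes from a config file or a script, the bracketed requirement string can be assembled programmatically. The sketch below is illustrative only: `build_requirement` is a hypothetical helper, not part of Vision Agents, and it prints the command instead of running it.

```python
def build_requirement(extras, package="vision-agents"):
    """Build a requirement string such as 'vision-agents[getstream,openai]'."""
    # Drop blanks and duplicates while keeping the caller's order.
    seen = []
    for extra in extras:
        name = extra.strip()
        if name and name not in seen:
            seen.append(name)
    if not seen:
        return package
    return f"{package}[{','.join(seen)}]"


if __name__ == "__main__":
    requirement = build_requirement(["getstream", "openai", "deepgram"])
    # Print the command rather than running it, so nothing is installed by accident.
    print(f'uv add "{requirement}"')
```

The quotes around the requirement matter in most shells, because square brackets are otherwise treated as glob patterns.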

Environment Setup

API Keys

Vision Agents uses environment variables for API credentials. Create a .env file in your project root:

.env

# Stream (Edge Network)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret

# LLM Providers
GEMINI_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key

# Speech Services
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Vision Services (optional)
ROBOFLOW_API_KEY=your_roboflow_api_key

Vision Agents automatically loads .env files using python-dotenv. Make sure to add .env to your .gitignore to avoid committing secrets.
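
A small pre-flight check can catch missing credentials before an agent starts. The sketch below assumes the variables are already present in the process environment (for example, after the .env file has been loaded); `missing_keys` and `REQUIRED_KEYS` are hypothetical names, not part of Vision Agents.

```python
import os

# Variable names taken from the example .env above; trim this list to the
# plugins you actually installed.
REQUIRED_KEYS = ["STREAM_API_KEY", "STREAM_API_SECRET", "GEMINI_API_KEY"]


def missing_keys(required=REQUIRED_KEYS, env=None):
    """Return the names in `required` that are unset or empty in `env`."""
    if env is None:
        env = os.environ
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing credentials: " + ", ".join(missing))
    else:
        print("All required credentials are set.")
```

The check reports only variable names, never values, so its output is safe to paste into a bug report.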

Getting API Keys

Here’s where to get credentials for popular services:

Service	Free Tier	Get API Key
Stream	333,000 minutes/month	getstream.io
Gemini	Generous free tier	ai.google.dev
OpenAI	Pay-as-you-go	platform.openai.com
Deepgram	$200 free credits	deepgram.com
ElevenLabs	10,000 chars/month	elevenlabs.io
Roboflow	Free tier available	roboflow.com
Anthropic	Pay-as-you-go	console.anthropic.com

Verify Installation

Verify your installation by running a simple test:

test_install.py

from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini

print("Vision Agents installed successfully!")
print(f"Agent class: {Agent}")
print(f"Stream Edge: {getstream.Edge}")
print(f"Gemini LLM: {gemini.LLM}")

Run the test:

uv run python test_install.py

You should see output confirming the imports:

Vision Agents installed successfully!
Agent class: <class 'vision_agents.core.agents.agents.Agent'>
Stream Edge: <class 'vision_agents.plugins.getstream.edge.Edge'>
Gemini LLM: <class 'vision_agents.plugins.gemini.llm.LLM'>
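
If the test script fails with an import error, it can help to see which plugin modules are present without stopping at the first missing one. The sketch below uses only the standard library. The `vision_agents.plugins.<name>` paths for deepgram and elevenlabs are an assumption extrapolated from the imports in test_install.py, and `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located by the import system."""
    try:
        # For dotted names, find_spec imports the parent packages
        # but not the target module itself.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


# Assumed module layout: vision_agents.plugins.<plugin name>.
PLUGINS = ["getstream", "gemini", "deepgram", "elevenlabs"]

if __name__ == "__main__":
    for plugin in PLUGINS:
        name = f"vision_agents.plugins.{plugin}"
        print(f"{plugin}: {'installed' if is_importable(name) else 'missing'}")
```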

Project Structure

Here’s a recommended project structure for Vision Agents applications:

my-vision-agent/
├── .env                 # API keys (DO NOT COMMIT)
├── .gitignore          # Git ignore file
├── pyproject.toml      # Project dependencies
├── my_agent.py         # Main agent code
├── instructions.md     # Agent instructions
└── processors/         # Custom processors
    └── __init__.py

Initialize a New Project

Create a new Vision Agents project:

Create Project Directory

mkdir my-vision-agent
cd my-vision-agent

Initialize uv Project

uv init

Install Dependencies

uv add "vision-agents[getstream,gemini,elevenlabs,deepgram]"

Create .env File

touch .env
# Add your API keys to .env

Add to .gitignore

.gitignore

.env
__pycache__/
*.pyc
.pytest_cache/

Development Installation

To contribute to Vision Agents or run examples from the repository:

Clone the Repository

git clone https://github.com/GetStream/Vision-Agents.git
cd Vision-Agents

Install Development Dependencies

uv sync --all-extras --dev

This installs:

All plugin extras
Development tools (ruff, mypy, pytest)
Pre-commit hooks

Run Tests

Verify your development setup:

uv run pytest -m "not integration"

See DEVELOPMENT.md in the repository for detailed contribution guidelines.

Updating Vision Agents

Keep Vision Agents up to date:

# Update to latest version
uv sync --upgrade-package vision-agents

# Update all dependencies
uv sync --upgrade

Check your installed version:

uv run python -c "import vision_agents; print(vision_agents.__version__)"
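
The installed version can also be read from package metadata, which works even when a package does not expose a version attribute. The sketch below uses the standard library's `importlib.metadata`. The distribution name `vision-agents` is taken from the install command, and `installed_version` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="vision-agents"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "vision-agents is not installed in this environment")
```

Run it with `uv run python` so the lookup happens inside the project environment rather than the system interpreter.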

Troubleshooting

Installation fails with dependency conflicts

Solution: Vision Agents requires numpy<2.0 for compatibility. If you see numpy conflicts, clear the uv cache and reinstall:

# Clear uv cache and reinstall
uv cache clean
uv sync --reinstall
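
To see whether the numpy in your environment is the source of the conflict, you can compare its version against the numpy<2.0 constraint mentioned above. This is a minimal sketch using package metadata; `numpy_major` and `numpy_status` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(version_string):
    """Extract the major version number from a string such as '1.26.4'."""
    return int(version_string.split(".")[0])


def numpy_status():
    """Describe whether the installed numpy satisfies the numpy<2.0 constraint."""
    try:
        installed = version("numpy")
    except PackageNotFoundError:
        return "numpy is not installed"
    if numpy_major(installed) < 2:
        return f"numpy {installed} satisfies numpy<2.0"
    return f"numpy {installed} conflicts with numpy<2.0"


if __name__ == "__main__":
    print(numpy_status())
```

If the script reports a conflict, another dependency in your project is likely requesting numpy 2.x, and that requirement needs to be relaxed or pinned.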

Import errors after installation

Check:

Verify the plugin is installed:
```
uv pip list | grep vision-agents
```
Install missing plugins:
```
uv add "vision-agents[plugin-name]"
```
Restart your Python interpreter

uv command not found

Solution: If uv isn’t in your PATH after installation:

Restart your terminal
Or manually add uv’s install directory to PATH (recent installers use ~/.local/bin; older releases used ~/.cargo/bin):
```
export PATH="$HOME/.local/bin:$PATH"
```
Verify:
```
which uv
```
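
If `which uv` prints nothing, the binary may still exist in a directory that is not on your PATH. The sketch below looks in two commonly used install locations; both directories are assumptions, and `locate_uv` is a hypothetical helper.

```python
import shutil
from pathlib import Path

# Assumed install locations; adjust if your installer reported a different path.
CANDIDATE_DIRS = [Path.home() / ".local" / "bin", Path.home() / ".cargo" / "bin"]


def locate_uv(candidates=CANDIDATE_DIRS):
    """Return (path, on_path): where uv was found and whether PATH already covers it."""
    found = shutil.which("uv")
    if found:
        return found, True
    for directory in candidates:
        binary = Path(directory) / "uv"
        if binary.is_file():
            return str(binary), False
    return None, False


if __name__ == "__main__":
    path, on_path = locate_uv()
    if path is None:
        print("uv was not found; try reinstalling it.")
    elif on_path:
        print(f"uv is on PATH at {path}")
    else:
        print(f"uv exists at {path}; add {Path(path).parent} to PATH")
```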

GPU acceleration for YOLO/vision models

For CUDA GPU support: Install PyTorch with CUDA support before Vision Agents:

# CUDA 11.8
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Then install Vision Agents
uv add "vision-agents[ultralytics]"

For Apple Silicon (MPS): PyTorch with MPS support is included by default on macOS.
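
After installing, you can confirm which accelerator PyTorch actually sees. The sketch below uses standard PyTorch calls and falls back to "cpu" when PyTorch is not installed; `pick_device` is a hypothetical helper, and how Vision Agents plugins select a device is not covered here.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only the CPU label applies.
        return "cpu"
    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps is available in PyTorch 1.12 and later.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Detected device: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch wheel is probably a CPU-only build.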

Environment variables not loading

Check:

.env file is in your project root

You’re calling load_dotenv() in your code:

from dotenv import load_dotenv
load_dotenv()  # Load .env file

Variable names match exactly (case-sensitive)
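
Case mistakes in variable names are easy to miss by eye. The sketch below reports expected names that exist in the environment only under a different capitalization; `case_mismatches` is a hypothetical helper, not part of Vision Agents or python-dotenv.

```python
import os


def case_mismatches(expected, env=None):
    """Map each expected name to a differently-cased variable found in `env`."""
    if env is None:
        env = os.environ
    by_lower = {name.lower(): name for name in env}
    mismatches = {}
    for name in expected:
        if name in env:
            continue  # Exact match, nothing to report.
        actual = by_lower.get(name.lower())
        if actual is not None:
            mismatches[name] = actual
    return mismatches


if __name__ == "__main__":
    expected = ["STREAM_API_KEY", "STREAM_API_SECRET", "GEMINI_API_KEY"]
    for wanted, actual in case_mismatches(expected).items():
        print(f"Expected {wanted} but found {actual}")
```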

Next Steps

Now that you have Vision Agents installed:

Quickstart Guide

Build your first agent in 5 minutes

Core Concepts

Learn about agents, processors, and architecture

Browse Examples

Explore real-world examples and use cases

Integration Guides

Configure LLMs, STT, TTS, and vision models

Getting Help

Need help with installation?

Discord - Join our community
GitHub Issues - Report installation problems
Documentation - Browse all guides
