
Requirements

Before installing Vision Agents, ensure you have:
  • Python 3.10 or higher (Python 3.12 recommended)
  • uv package manager installed
Vision Agents uses uv for fast, reliable dependency management. If you don’t have uv installed, follow the uv installation guide.
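If you want to check both requirements from a script, the following is a minimal sketch using only the Python standard library. The helper name `check_prerequisites` is hypothetical and not part of Vision Agents; it only reports whether the interpreter meets the 3.10 minimum and whether a `uv` executable is on your PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in the requirements above


def check_prerequisites() -> dict:
    """Report which prerequisites are satisfied (hypothetical helper)."""
    return {
        # Tuple comparison: (3, 12, ...) >= (3, 10) is True
        "python_ok": sys.version_info >= MIN_PYTHON,
        # shutil.which returns None when the executable is not on PATH
        "uv_path": shutil.which("uv"),
    }


status = check_prerequisites()
print(f"Python {sys.version_info.major}.{sys.version_info.minor} meets minimum: {status['python_ok']}")
print(f"uv found at: {status['uv_path']}")
```

If `uv found at` prints `None`, install uv as described in the next section.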

Install uv

If you haven’t installed uv yet, install it with:
curl -LsSf https://astral.sh/uv/install.sh | sh
Verify the installation:
uv --version

Installation Methods

Basic Installation

Install the core Vision Agents package:
uv add vision-agents
This installs the base framework without any plugin integrations. You’ll need to install plugins separately based on your needs.

Install with Plugins

Most applications need at least a few plugins. Install Vision Agents with the integrations you’ll use:
# Recommended for voice AI agents
uv add "vision-agents[getstream,gemini,elevenlabs,deepgram]"
Performance Note: Installing every plugin at once (for example, with the all-plugins extra) downloads many dependencies and can take several minutes. Install only the plugins you need to keep installation fast and your project small.

Available Plugin Extras

Vision Agents supports 35+ integrations via optional plugin extras:

LLM Providers

Install language model providers:
uv add "vision-agents[openai]"

Speech-to-Text (STT)

Install speech recognition providers:
uv add "vision-agents[deepgram]"

Text-to-Speech (TTS)

Install voice synthesis providers:
uv add "vision-agents[elevenlabs]"

Vision & Video Processing

Install computer vision and video processing tools:
uv add "vision-agents[ultralytics]"

Edge Networks

Install video/audio edge network providers:
Stream
uv add "vision-agents[getstream]"
Stream provides ultra-low-latency WebRTC infrastructure with SDKs for React, iOS, Android, Flutter, React Native, and Unity. Free tier includes 333,000 participant minutes per month.

Specialized Services

Install additional capabilities:
# Advanced turn detection
uv add "vision-agents[smart-turn]"

# Neural turn detection
uv add "vision-agents[vogent]"

Combining Multiple Plugins

Install multiple plugins at once by listing them in brackets:
uv add "vision-agents[getstream,openai,gemini,ultralytics,roboflow,deepgram,elevenlabs]"
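The bracket syntax is the standard Python "extras" notation: the package name followed by a comma-separated list of extras with no spaces. As a small illustration, assuming you keep your plugin list in a script, the hypothetical helper below builds the string you would pass to `uv add`. It does not call uv or Vision Agents.

```python
PACKAGE = "vision-agents"


def requirement_with_extras(extras: list[str]) -> str:
    """Build a requirement string such as 'vision-agents[a,b]' (hypothetical helper)."""
    # Strip whitespace, drop empty entries, and remove duplicates while keeping order
    unique = list(dict.fromkeys(e.strip() for e in extras if e.strip()))
    if not unique:
        return PACKAGE
    return f"{PACKAGE}[{','.join(unique)}]"


print(requirement_with_extras(["getstream", "openai", "deepgram"]))
```

Remember to quote the result in your shell, as in the command above, because brackets are special characters in some shells.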

Environment Setup

API Keys

Vision Agents uses environment variables for API credentials. Create a .env file in your project root:
.env
# Stream (Edge Network)
STREAM_API_KEY=your_stream_api_key
STREAM_API_SECRET=your_stream_api_secret

# LLM Providers
GEMINI_API_KEY=your_gemini_api_key
OPENAI_API_KEY=your_openai_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key

# Speech Services
DEEPGRAM_API_KEY=your_deepgram_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# Vision Services (optional)
ROBOFLOW_API_KEY=your_roboflow_api_key
Vision Agents automatically loads .env files using python-dotenv. Make sure to add .env to your .gitignore to avoid committing secrets.
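To catch a missing credential before your agent starts, you can check the environment up front. This is a minimal sketch using only the standard library; `missing_keys` and the `REQUIRED_KEYS` selection are hypothetical and should be adjusted to the plugins you actually installed. It reads from `os.environ`, so it assumes the variables have already been loaded or exported.

```python
import os

# Hypothetical selection: list only the keys your chosen plugins need
REQUIRED_KEYS = ["STREAM_API_KEY", "STREAM_API_SECRET", "GEMINI_API_KEY"]


def missing_keys(required, environ=None):
    """Return the names in `required` that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


missing = missing_keys(REQUIRED_KEYS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```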

Getting API Keys

Here’s where to get credentials for popular services:
Service     Free Tier              Get API Key
Stream      333,000 minutes/month  getstream.io
Gemini      Generous free tier     ai.google.dev
OpenAI      Pay-as-you-go          platform.openai.com
Deepgram    $200 free credits      deepgram.com
ElevenLabs  10,000 chars/month     elevenlabs.io
Roboflow    Free tier available    roboflow.com
Anthropic   Pay-as-you-go          console.anthropic.com

Verify Installation

Verify your installation by running a simple test:
test_install.py
from vision_agents.core import Agent, User
from vision_agents.plugins import getstream, gemini

print("Vision Agents installed successfully!")
print(f"Agent class: {Agent}")
print(f"Stream Edge: {getstream.Edge}")
print(f"Gemini LLM: {gemini.LLM}")
Run the test:
uv run python test_install.py
You should see output confirming the imports:
Vision Agents installed successfully!
Agent class: <class 'vision_agents.core.agents.agents.Agent'>
Stream Edge: <class 'vision_agents.plugins.getstream.edge.Edge'>
Gemini LLM: <class 'vision_agents.plugins.gemini.llm.LLM'>
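The test above stops at the first failed import. If you would rather see the status of each module in one pass, here is a minimal sketch. The module paths are taken from the import statements in test_install.py; the helper `try_import` is hypothetical.

```python
import importlib


def try_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # also covers ModuleNotFoundError
        return False
    return True


MODULES = [
    "vision_agents.core",
    "vision_agents.plugins.getstream",
    "vision_agents.plugins.gemini",
]

for name in MODULES:
    print(f"{name}: {'ok' if try_import(name) else 'missing'}")
```

A `missing` line usually means the matching plugin extra was not installed.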

Project Structure

Here’s a recommended project structure for Vision Agents applications:
my-vision-agent/
├── .env                # API keys (DO NOT COMMIT)
├── .gitignore          # Git ignore file
├── pyproject.toml      # Project dependencies
├── my_agent.py         # Main agent code
├── instructions.md     # Agent instructions
└── processors/         # Custom processors
    └── __init__.py
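If you prefer to create this layout from a script, the following is a minimal sketch using `pathlib`. The `scaffold` helper is hypothetical; it creates empty files only, skips anything that already exists, and leaves pyproject.toml to `uv init`. The demo runs in a temporary directory so it does not touch your working tree.

```python
import tempfile
from pathlib import Path

# Files from the recommended layout; pyproject.toml is created by `uv init`
LAYOUT = [
    ".env",
    ".gitignore",
    "my_agent.py",
    "instructions.md",
    "processors/__init__.py",
]


def scaffold(root: Path) -> list[Path]:
    """Create empty placeholder files under `root`; return the ones created."""
    created = []
    for relative in LAYOUT:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing files
            path.touch()
            created.append(path)
    return created


# Demo in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    created = scaffold(Path(tmp) / "my-vision-agent")
    print(sorted(p.name for p in created))
```

To use it for real, call `scaffold(Path("my-vision-agent"))` from the parent directory of your project.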

Initialize a New Project

Create a new Vision Agents project:
1. Create Project Directory

mkdir my-vision-agent
cd my-vision-agent
2. Initialize uv Project

uv init
3. Install Dependencies

uv add "vision-agents[getstream,gemini,elevenlabs,deepgram]"
4. Create .env File

touch .env
# Add your API keys to .env
5. Add to .gitignore

.gitignore
.env
__pycache__/
*.pyc
.pytest_cache/
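Because a committed .env file leaks secrets, it can be worth confirming that the entry is really there. This is a minimal sketch with a hypothetical helper, `lists_entry`. It only looks for an exact line match and does not evaluate Git's glob or negation patterns, so treat a `False` result as a prompt to look at the file, not as proof.

```python
from pathlib import Path


def lists_entry(gitignore_text: str, entry: str = ".env") -> bool:
    """Return True if `entry` appears on its own line in the .gitignore text."""
    lines = (line.strip() for line in gitignore_text.splitlines())
    return any(line == entry for line in lines)


path = Path(".gitignore")
text = path.read_text(encoding="utf-8") if path.is_file() else ""
print(f".env listed in .gitignore: {lists_entry(text)}")
```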

Development Installation

To contribute to Vision Agents or run examples from the repository:
1. Clone the Repository

git clone https://github.com/GetStream/Vision-Agents.git
cd Vision-Agents
2. Install Development Dependencies

uv sync --all-extras --dev
This installs:
  • All plugin extras
  • Development tools (ruff, mypy, pytest)
  • Pre-commit hooks
3. Run Tests

Verify your development setup:
uv run pytest -m "not integration"
See DEVELOPMENT.md in the repository for detailed contribution guidelines.

Updating Vision Agents

Keep Vision Agents up to date:
# Update to latest version
uv sync --upgrade-package vision-agents

# Update all dependencies
uv sync --upgrade
Check your installed version:
uv run python -c "import vision_agents; print(vision_agents.__version__)"
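An alternative that does not rely on the package exposing a `__version__` attribute is to read the installed distribution's metadata. This is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution name `vision-agents` is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "vision-agents"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


print(installed_version() or "vision-agents is not installed in this environment")
```

Run it with `uv run python` so that it inspects your project's environment and not the system interpreter.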

Troubleshooting

Solution: Vision Agents requires numpy<2.0 for compatibility. If you see numpy version conflicts:
# Clear uv cache and reinstall
uv cache clean
uv sync --reinstall
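To confirm which numpy version ended up in your environment after reinstalling, you can read it from the package metadata. This is a minimal sketch; `is_pre_2` is a hypothetical helper that only inspects the major version number.

```python
from importlib.metadata import PackageNotFoundError, version


def is_pre_2(version_string: str) -> bool:
    """Return True if the version's major component is below 2."""
    major = int(version_string.split(".")[0])
    return major < 2


try:
    numpy_version = version("numpy")
    print(f"numpy {numpy_version} satisfies numpy<2.0: {is_pre_2(numpy_version)}")
except PackageNotFoundError:
    print("numpy is not installed in this environment")
```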
If a plugin fails to import, check:
  1. Verify the plugin is installed:
    uv pip list | grep vision-agents
    
  2. Install missing plugins:
    uv add "vision-agents[plugin-name]"
    
  3. Restart your Python interpreter
Solution: If uv isn’t in your PATH after installation:
  1. Restart your terminal
  2. Or manually add to PATH:
    # Recent uv installers place the binary in ~/.local/bin; older ones used ~/.cargo/bin
    export PATH="$HOME/.local/bin:$PATH"
    
  3. Verify:
    which uv
    
For CUDA GPU Support: Install PyTorch with CUDA support before Vision Agents:
# CUDA 11.8
uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

# Then install Vision Agents
uv add "vision-agents[ultralytics]"
For Apple Silicon (MPS): PyTorch with MPS support is included by default on macOS.
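To see which accelerator PyTorch can actually use, you can query it directly. This is a minimal sketch; `pick_device` is a hypothetical helper, while `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are PyTorch's own checks. It falls back to `"cpu"` when PyTorch is not installed or no accelerator is detected.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment

    if torch.cuda.is_available():
        return "cuda"

    # Older PyTorch builds may not have the mps backend module at all
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


print(f"Selected device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only.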
If environment variables are not loading, check:
  1. .env file is in your project root
  2. You’re calling load_dotenv() in your code:
    from dotenv import load_dotenv
    load_dotenv()  # Load .env file
    
  3. Variable names match exactly (case-sensitive)
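Case mismatches from the last item are easy to miss by eye. The sketch below parses the variable names out of a .env file's text and reports any that match an expected name only when case is ignored. Both helpers are hypothetical and use a deliberately simple parser (one `NAME=value` per line, `#` comments, optional `export ` prefix); this is not python-dotenv's parser.

```python
def parse_env_names(text: str) -> list[str]:
    """Extract variable names from simple NAME=value lines."""
    names = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name.startswith("export "):
            name = name[len("export "):].strip()
        names.append(name)
    return names


def case_mismatches(expected, found):
    """Return (expected, actual) pairs that differ only by letter case."""
    by_upper = {name.upper(): name for name in found}
    return [
        (want, by_upper[want.upper()])
        for want in expected
        if want not in found and want.upper() in by_upper
    ]


sample = "# Stream\nStream_Api_Key=abc\nGEMINI_API_KEY=xyz\n"
names = parse_env_names(sample)
print(case_mismatches(["STREAM_API_KEY", "GEMINI_API_KEY"], names))
```

In the sample, `Stream_Api_Key` is flagged because the expected name is `STREAM_API_KEY`.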

Next Steps

Now that you have Vision Agents installed:

  • Quickstart Guide: Build your first agent in 5 minutes
  • Core Concepts: Learn about agents, processors, and architecture
  • Browse Examples: Explore real-world examples and use cases
  • Integration Guides: Configure LLMs, STT, TTS, and vision models

Getting Help

Need help with installation?
