Requirements
Before installing Vision Agents, ensure you have:
- Python 3.10 or higher (Python 3.12 recommended)
- uv package manager installed
Vision Agents uses uv for fast, reliable dependency management. If you don’t have uv installed, follow the uv installation guide.
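Both requirements can be checked from Python before installing anything. The following is a minimal sketch using only the standard library; the function name `check_requirements` is illustrative and not part of Vision Agents.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the requirements


def check_requirements(version_info=sys.version_info, which=shutil.which):
    """Return a list of problems; an empty list means both requirements are met."""
    problems = []
    if tuple(version_info[:2]) < MIN_PYTHON:
        found = ".".join(str(part) for part in version_info[:2])
        problems.append(f"Python {found} found, but 3.10 or higher is required")
    if which("uv") is None:
        problems.append("uv was not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "Python and uv look good")
```

The two parameters exist only so the check can be exercised with fake values; calling `check_requirements()` with no arguments inspects the current interpreter and PATH.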
Install uv
If you haven’t installed uv yet, install it with:
Installation Methods
Basic Installation
Install the core Vision Agents package:
Install with Plugins
Most applications need at least a few plugins. Install Vision Agents with the integrations you’ll use:
Available Plugin Extras
Vision Agents supports 35+ integrations via optional plugin extras:
LLM Providers
Install language model providers:
Speech-to-Text (STT)
Install speech recognition providers:
Text-to-Speech (TTS)
Install voice synthesis providers:
Vision & Video Processing
Install computer vision and video processing tools:
Edge Networks
Install video/audio edge network providers:
Stream
Stream provides ultra-low-latency WebRTC infrastructure with SDKs for React, iOS, Android, Flutter, React Native, and Unity. Free tier includes 333,000 participant minutes per month.
Specialized Services
Install additional capabilities:
Combining Multiple Plugins
Install multiple plugins at once by listing them in brackets:
Environment Setup
API Keys
Vision Agents uses environment variables for API credentials. Create a .env file in your project root:
.env
Vision Agents automatically loads .env files using python-dotenv. Make sure to add .env to your .gitignore to avoid committing secrets.
Getting API Keys
Here’s where to get credentials for popular services:
| Service | Free Tier | Get API Key |
|---|---|---|
| Stream | 333,000 minutes/month | getstream.io |
| Gemini | Generous free tier | ai.google.dev |
| OpenAI | Pay-as-you-go | platform.openai.com |
| Deepgram | $200 free credits | deepgram.com |
| ElevenLabs | 10,000 chars/month | elevenlabs.io |
| Roboflow | Free tier available | roboflow.com |
| Anthropic | Pay-as-you-go | console.anthropic.com |
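Once the keys are in place, it can help to confirm that the process actually sees them. This is a minimal sketch; the variable names in `REQUIRED_KEYS` are hypothetical placeholders, so substitute the names your chosen plugins expect.

```python
import os

# Hypothetical variable names for illustration; use the names your plugins expect.
REQUIRED_KEYS = ["STREAM_API_KEY", "OPENAI_API_KEY"]


def missing_keys(required, env=None):
    """Return the required variable names that are unset or empty in env."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys(REQUIRED_KEYS)
    print("Missing: " + ", ".join(missing) if missing else "All keys are set")
```

Note that this reads the process environment only; values defined in a .env file appear there only after the file has been loaded.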
Verify Installation
Verify your installation by running a simple test:
test_install.py
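A minimal sketch of such a check, assuming the core package is importable as `vision_agents` (an assumption about the import name; adjust it if your installation differs). It looks up the module without importing it, so it has no side effects.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "vision_agents" is the assumed import name of the core package.
    if is_importable("vision_agents"):
        print("Vision Agents is installed")
    else:
        print("Vision Agents was not found in this environment")
```

Run it with the same interpreter your project uses, since a package installed in one virtual environment is invisible to another.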
Project Structure
Here’s a recommended project structure for Vision Agents applications:
Initialize a New Project
Create a new Vision Agents project:
Development Installation
To contribute to Vision Agents or run examples from the repository:
Install Development Dependencies
The development install includes:
- All plugin extras
- Development tools (ruff, mypy, pytest)
- Pre-commit hooks
See DEVELOPMENT.md in the repository for detailed contribution guidelines.
Updating Vision Agents
Keep Vision Agents up to date:
Troubleshooting
Installation fails with dependency conflicts
Solution: Vision Agents uses numpy<2.0 due to compatibility requirements. If you see numpy conflicts:
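To see which NumPy version is actually installed, a check along these lines may help. It is a sketch using `importlib.metadata` from the standard library; the helper names are illustrative.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def satisfies_numpy_pin(version):
    """Return True if a version string such as '1.26.4' has a major version below 2."""
    return int(version.split(".")[0]) < 2


if __name__ == "__main__":
    version = installed_version("numpy")
    if version is None:
        print("numpy is not installed")
    elif satisfies_numpy_pin(version):
        print(f"numpy {version} satisfies numpy<2.0")
    else:
        print(f"numpy {version} conflicts with numpy<2.0")
```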
Import errors after installation
Check:
- Verify the plugin is installed:
- Install missing plugins:
- Restart your Python interpreter
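One way to see which related packages are present is to list installed distributions by name prefix. This sketch assumes plugin distributions share a `vision-agents` prefix, which is an assumption about naming rather than something stated here; change the prefix if your packages are named differently.

```python
from importlib import metadata


def installed_with_prefix(prefix):
    """Return sorted names of installed distributions that start with prefix."""
    wanted = prefix.lower().replace("_", "-")
    names = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        # Treat "_" and "-" as equivalent, since distribution names vary in style.
        if name and name.lower().replace("_", "-").startswith(wanted):
            names.add(name)
    return sorted(names)


if __name__ == "__main__":
    # Assumed prefix; adjust to match the distribution names you installed.
    for name in installed_with_prefix("vision-agents"):
        print(name)
```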
uv command not found
Solution: If uv isn’t in your PATH after installation:
- Restart your terminal
- Or manually add to PATH:
- Verify:
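If the command is still not found, the binary may exist in a directory that is missing from PATH. The sketch below looks for that situation. The directories in `CANDIDATE_DIRS` are assumptions about where the installer may place uv on POSIX systems; the actual location can differ by version and platform.

```python
import os
from pathlib import Path

# Assumed candidate locations; the installer's target directory may differ.
CANDIDATE_DIRS = [Path.home() / ".local" / "bin", Path.home() / ".cargo" / "bin"]


def find_off_path(binary, candidates, path_value):
    """Return candidate directories that contain binary but are not in path_value."""
    on_path = {Path(entry).resolve() for entry in path_value.split(os.pathsep) if entry}
    return [
        Path(directory)
        for directory in candidates
        if (Path(directory) / binary).is_file()
        and Path(directory).resolve() not in on_path
    ]


if __name__ == "__main__":
    for directory in find_off_path("uv", CANDIDATE_DIRS, os.environ.get("PATH", "")):
        print(f"uv exists in {directory}, but that directory is not on PATH")
```

Any directory it reports is one to add to PATH in your shell configuration.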
GPU acceleration for YOLO/vision models
For CUDA GPU Support:
Install PyTorch with CUDA support before Vision Agents:
For Apple Silicon (MPS):
PyTorch with MPS support is included by default on macOS.
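To confirm which accelerator PyTorch can actually use, a short check like the following may help. It is a sketch: `pick_device` is an illustrative helper, and it falls back to "cpu" when PyTorch is not installed.

```python
def pick_device():
    """Return 'cuda', 'mps', or 'cpu' depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so no accelerator can be used
    if torch.cuda.is_available():
        return "cuda"
    # Older PyTorch builds have no MPS backend, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build is likely a CPU-only one.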
Environment variables not loading
Check:
- .env file is in your project root
- You’re calling load_dotenv() in your code:
- Variable names match exactly (case-sensitive)
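Two of the checks above, the file location and the case-sensitive names, can be automated with the standard library. This is a minimal sketch; the helper names are illustrative, and the variable names in the usage example are hypothetical.

```python
from pathlib import Path


def find_env_file(start):
    """Return the nearest .env file in start or its parent directories, else None."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        candidate = directory / ".env"
        if candidate.is_file():
            return candidate
    return None


def case_mismatches(expected, defined):
    """Return (expected, defined) pairs that differ only by letter case."""
    by_lower = {name.lower(): name for name in defined}
    return [
        (name, by_lower[name.lower()])
        for name in expected
        if name not in defined and name.lower() in by_lower
    ]


if __name__ == "__main__":
    print(find_env_file(Path.cwd()))
    # Hypothetical names: the code expects upper case, the file uses lower case.
    print(case_mismatches(["OPENAI_API_KEY"], ["openai_api_key"]))
```

If `find_env_file` returns a path outside your project root, the file is being picked up from a parent directory rather than from the project itself.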
Next Steps
Now that you have Vision Agents installed:
Quickstart Guide
Build your first agent in 5 minutes
Core Concepts
Learn about agents, processors, and architecture
Browse Examples
Explore real-world examples and use cases
Integration Guides
Configure LLMs, STT, TTS, and vision models
Getting Help
Need help with installation?
- Discord - Join our community
- GitHub Issues - Report installation problems
- Documentation - Browse all guides