Installation
Authentication
Set your API key in the environment:
Components
LLM - Text Generation
Use Gemini models for text-based conversations:
- The Gemini model to use. Options include gemini-3-pro-preview and gemini-3-flash-preview.
- Optional API key. Defaults to the GOOGLE_API_KEY or GEMINI_API_KEY environment variable.
- Optional thinking level for Gemini 3. Use ThinkingLevel.LOW, or ThinkingLevel.HIGH for complex reasoning.
- Resolution for multimodal processing:
  - MEDIA_RESOLUTION_LOW - Fast processing
  - MEDIA_RESOLUTION_MEDIUM - Balanced (recommended for PDFs)
  - MEDIA_RESOLUTION_HIGH - Best quality (recommended for images)
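The API key fallback and the resolution guidance above can be sketched in plain Python. This is a minimal illustration, not the plugin's implementation: `resolve_api_key` and `pick_media_resolution` are hypothetical helpers, checking `GOOGLE_API_KEY` before `GEMINI_API_KEY` is an assumption about precedence, and falling back to `MEDIA_RESOLUTION_LOW` for other content types is a choice made only for this example.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(explicit: Optional[str] = None,
                    env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return an explicit key if given, else fall back to the environment.

    Assumption: GOOGLE_API_KEY is checked before GEMINI_API_KEY.
    """
    env = os.environ if env is None else env
    if explicit:
        return explicit
    return env.get("GOOGLE_API_KEY") or env.get("GEMINI_API_KEY")


def pick_media_resolution(content_type: str) -> str:
    """Map a content type to one of the resolution names listed above."""
    kind = content_type.lower()
    if kind == "pdf":
        return "MEDIA_RESOLUTION_MEDIUM"  # balanced, recommended for PDFs
    if kind == "image":
        return "MEDIA_RESOLUTION_HIGH"    # best quality, recommended for images
    return "MEDIA_RESOLUTION_LOW"         # assumption: fast default for the rest


if __name__ == "__main__":
    fake_env = {"GEMINI_API_KEY": "from-env"}  # stand-in for os.environ
    print(resolve_api_key(env=fake_env))
    print(pick_media_resolution("PDF"))
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.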
VLM - Vision Language Model
Use Gemini’s vision capabilities to analyze video frames:
- Vision model to use
- Frame rate for video processing (frames per second)
- Number of seconds of video frames to buffer
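The frame rate and buffer length together bound how many frames are held at once: roughly frames per second times seconds buffered. The sketch below illustrates that relationship with a fixed-size buffer. `FrameBuffer`, `fps`, and `buffer_seconds` are hypothetical names chosen for this example, and the plugin's own buffering may work differently.

```python
from collections import deque
from typing import Deque, List, Tuple


class FrameBuffer:
    """Keep only the most recent `buffer_seconds` of frames sampled at `fps`."""

    def __init__(self, fps: int = 1, buffer_seconds: int = 10) -> None:
        if fps <= 0 or buffer_seconds <= 0:
            raise ValueError("fps and buffer_seconds must be positive")
        self.capacity = fps * buffer_seconds
        # A deque with maxlen drops the oldest frame automatically when full.
        self._frames: Deque[Tuple[float, bytes]] = deque(maxlen=self.capacity)

    def add(self, timestamp: float, frame: bytes) -> None:
        """Append one frame; the oldest is evicted once capacity is reached."""
        self._frames.append((timestamp, frame))

    def snapshot(self) -> List[Tuple[float, bytes]]:
        """Return the buffered frames, oldest first."""
        return list(self._frames)


if __name__ == "__main__":
    buf = FrameBuffer(fps=2, buffer_seconds=3)  # holds at most 6 frames
    for i in range(10):
        buf.add(i * 0.5, b"frame-bytes")
    print(len(buf.snapshot()))
```

A higher frame rate or a longer buffer gives the model more visual context per request, at the cost of more data to process.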
Realtime - Speech-to-Speech
Direct speech-to-speech with Gemini Live:
Built-in Tools
Gemini supports several built-in tools via the tools parameter:
Available Built-in Tools
- tools.FileSearch(store) - RAG over your documents
- tools.GoogleSearch() - Ground responses with web data
- tools.CodeExecution() - Run Python code
- tools.URLContext() - Read specific web pages
- tools.GoogleMaps() - Location-aware queries (Preview)
- tools.ComputerUse() - Browser automation (Preview)
Function Calling
Register custom functions for the model to call:
Migration from Gemini 2.5
When upgrading to Gemini 3:
- Thinking: Replace complex prompts with thinking_level="high"
- Temperature: Remove explicit low temperature settings to avoid looping
- PDFs: Test with media_resolution="high" for dense documents
- Token Usage: May increase for PDFs but decrease for video
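The checklist above can be expressed as a small settings transform. This is a sketch under stated assumptions: the settings are modeled as a plain dictionary, `migrate_settings` is a hypothetical helper, and the 0.5 cutoff for what counts as a "low" temperature is an arbitrary value picked for illustration. The keys `thinking_level` and `media_resolution` mirror the names used in the list.

```python
from typing import Any, Dict


def migrate_settings(old: Dict[str, Any],
                     has_dense_pdfs: bool = False,
                     low_temperature_cutoff: float = 0.5) -> Dict[str, Any]:
    """Return a copy of `old` adjusted per the migration checklist."""
    new = dict(old)  # shallow copy so the caller's settings stay untouched

    # Thinking: rely on the thinking level rather than elaborate prompting.
    new.setdefault("thinking_level", "high")

    # Temperature: drop explicit low values (cutoff is an assumption).
    temperature = new.get("temperature")
    if temperature is not None and temperature < low_temperature_cutoff:
        del new["temperature"]

    # PDFs: try the higher resolution when documents are dense.
    if has_dense_pdfs:
        new["media_resolution"] = "high"

    return new


if __name__ == "__main__":
    before = {"temperature": 0.1}
    print(migrate_settings(before, has_dense_pdfs=True))
```

Because token usage may rise for PDFs and fall for video, it is worth comparing usage before and after applying a change like this.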
References
- Gemini API Documentation
- Plugin Source: plugins/gemini/vision_agents/plugins/gemini/__init__.py