The OpenAI plugin provides access to GPT models such as GPT-4 and GPT-4.1, as well as realtime models for voice interactions.
## Installation

```bash
uv add "vision-agents[openai]"
```

(The quotes keep shells like zsh from interpreting the brackets as a glob.)
## Authentication

Set your API key in the environment:

```bash
export OPENAI_API_KEY=your_openai_api_key
```
## Components
### LLM - Text Generation (Responses API)

Use OpenAI's modern Responses API for GPT-4.1 and newer models:
```python
from vision_agents.plugins import openai, deepgram, cartesia, getstream, smart_turn
from vision_agents.core import Agent, User

llm = openai.LLM(model="gpt-4.1")

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant"),
    instructions="Be helpful and concise.",
    llm=llm,
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection(),
)
```
Parameters:

- `model`: The OpenAI model to use (e.g., `gpt-4.1`, `gpt-4`, `gpt-4-turbo`)
- `api_key`: Optional API key. Defaults to the `OPENAI_API_KEY` environment variable
- `base_url`: Optional base URL for the API endpoint
- Maximum number of function-calling rounds
### Realtime - Speech-to-Speech

Use OpenAI's realtime API for direct audio-to-audio interactions:
```python
from vision_agents.plugins import openai, getstream
from vision_agents.core import Agent, User

realtime = openai.Realtime()

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Voice Assistant"),
    instructions="Speak naturally and be friendly.",
    llm=realtime,
)
```
Realtime mode handles audio processing directly, so no separate TTS or STT components are needed.
### TTS - Text-to-Speech

Use OpenAI's TTS for voice synthesis:
```python
from vision_agents.plugins import openai
from vision_agents.core import Agent

tts = openai.TTS()

agent = Agent(
    llm=your_llm,  # any LLM instance
    tts=tts,
    # ... other config
)
```
## Chat Completions Models

For compatibility with vLLM, TGI, Ollama, or the legacy Chat Completions API:
### ChatCompletionsLLM

For text-only models:
```python
from vision_agents.plugins import openai

llm = openai.ChatCompletionsLLM(
    model="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your_key",
)
```
### ChatCompletionsVLM

For vision models (including third-party models such as Qwen):
```python
from vision_agents.plugins import openai, deepgram, elevenlabs, getstream, vogent
from vision_agents.core import Agent, User

llm = openai.ChatCompletionsVLM(
    model="qwen3vl",
    base_url="https://model-xyz.api.baseten.co/production/predict",
    api_key="your_baseten_key",
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Video Assistant"),
    instructions="Analyze video frames and answer questions.",
    llm=llm,
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection(),
)
```
Parameters:

- `base_url`: API endpoint URL for third-party providers
- `api_key`: API key for authentication
## Function Calling

Register custom functions for the model to invoke:
```python
from vision_agents.plugins import openai

llm = openai.LLM("gpt-4.1")
# Or use openai.Realtime() for the realtime model

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny",
    }
```
The function will be automatically called when the model decides to use it.
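Under the hood, function-calling decorators like this typically derive a JSON-schema tool definition from the function's signature and docstring, which is what the model actually sees when deciding whether to call the tool. A rough sketch of that derivation (the `build_tool_schema` helper and type mapping are illustrative, not the plugin's actual internals):

```python
import inspect

# Illustrative mapping from Python annotations to JSON-schema types
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def build_tool_schema(fn, name=None, description=""):
    """Derive a JSON-schema-style tool definition from a function signature."""
    props = {}
    for pname, param in inspect.signature(fn).parameters.items():
        props[pname] = {"type": TYPE_MAP.get(param.annotation, "string")}
    return {
        "name": name or fn.__name__,
        "description": description or (fn.__doc__ or ""),
        "parameters": {
            "type": "object",
            "properties": props,
            "required": list(props),
        },
    }

def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {"city": city, "temperature": 72, "condition": "Sunny"}

schema = build_tool_schema(
    get_weather, description="Get the current weather for a given city"
)
```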
## Configuration Examples

### With Turn Detection
```python
from vision_agents.plugins import openai, smart_turn
from vision_agents.core import Agent

agent = Agent(
    llm=openai.LLM("gpt-4.1"),
    turn_detection=smart_turn.TurnDetection(
        buffer_duration=2.0,
        confidence_threshold=0.5,
    ),
    # ... other config
)
```
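To build intuition for the two settings above: turn detectors of this kind generally buffer a window of recent audio (`buffer_duration` seconds) and only declare end-of-turn once the detector's confidence clears `confidence_threshold`. A toy decision function capturing that interplay (illustrative only, not the `smart_turn` implementation):

```python
def is_end_of_turn(buffered_seconds: float, confidence: float,
                   buffer_duration: float = 2.0,
                   confidence_threshold: float = 0.5) -> bool:
    """Declare end-of-turn once enough audio has been buffered and the
    detector is sufficiently confident the speaker has finished."""
    return (buffered_seconds >= buffer_duration
            and confidence >= confidence_threshold)
```

Raising `confidence_threshold` makes the agent less likely to interrupt; raising `buffer_duration` trades responsiveness for more context per decision.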
### With Multiple Modalities

```python
from vision_agents.plugins import openai, deepgram, elevenlabs
from vision_agents.core import Agent

agent = Agent(
    llm=openai.LLM("gpt-4.1"),
    stt=deepgram.STT(model="flux-general-en"),
    tts=elevenlabs.TTS(),
    # ... other config
)
```
## Environment Variables

```bash
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=model_to_use
OPENAI_REALTIME_MODEL=gpt-4o-realtime-preview
```
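If you drive model selection from these variables in your own code, a small helper can centralize the lookup. A hedged sketch (the `env_config` helper and the `gpt-4.1` default are illustrative, not plugin behavior):

```python
import os

def env_config() -> dict:
    """Read OpenAI settings from the environment, with illustrative defaults."""
    return {
        "api_key": os.environ.get("OPENAI_API_KEY"),
        "model": os.environ.get("OPENAI_MODEL", "gpt-4.1"),
        "realtime_model": os.environ.get(
            "OPENAI_REALTIME_MODEL", "gpt-4o-realtime-preview"
        ),
    }
```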
## References