The OpenAI plugin provides access to GPT models, including GPT-4 and GPT-4.1, as well as realtime models for speech-to-speech voice interactions.

Installation

uv add "vision-agents[openai]"

Authentication

Set your API key in the environment:
export OPENAI_API_KEY=your_openai_api_key

Components

LLM - Text Generation (Responses API)

Use OpenAI’s modern Responses API for GPT-4.1 and newer models:
from vision_agents.plugins import openai, deepgram, cartesia, getstream, smart_turn
from vision_agents.core import Agent, User

llm = openai.LLM(model="gpt-4.1")

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="AI Assistant"),
    instructions="Be helpful and concise.",
    llm=llm,
    tts=cartesia.TTS(),
    stt=deepgram.STT(),
    turn_detection=smart_turn.TurnDetection()
)
- model (string, required): The OpenAI model to use (e.g., gpt-4.1, gpt-4, gpt-4-turbo)
- api_key (string, optional): API key. Defaults to the OPENAI_API_KEY environment variable
- base_url (string, optional): Base URL for the API endpoint
- max_tool_rounds (int, default: 3): Maximum number of function-calling rounds
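The max_tool_rounds cap bounds how many consecutive tool invocations the model can trigger before a final answer is forced. As an illustrative sketch only (not the plugin's actual internals; the callables here are hypothetical stand-ins):

```python
def run_with_tool_cap(call_model, execute_tool, max_tool_rounds=3):
    """Keep resolving tool calls until the model returns plain text
    or the round cap is reached. Illustrative only."""
    response = call_model(None)  # initial request, no tool result yet
    rounds = 0
    while response.get("tool_call") and rounds < max_tool_rounds:
        result = execute_tool(response["tool_call"])
        response = call_model(result)  # feed the tool result back in
        rounds += 1
    return response, rounds
```

Once the cap is hit, the loop returns whatever the model last produced, even if it still wanted another tool call.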

Realtime - Speech-to-Speech

Use OpenAI’s realtime API for direct audio-to-audio interactions:
from vision_agents.plugins import openai, getstream
from vision_agents.core import Agent, User

realtime = openai.Realtime()

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Voice Assistant"),
    instructions="Speak naturally and be friendly.",
    llm=realtime
)
Realtime mode handles audio processing directly, so no separate TTS or STT component is needed.

TTS - Text-to-Speech

Use OpenAI’s TTS for voice synthesis:
from vision_agents.plugins import openai
from vision_agents.core import Agent

tts = openai.TTS()

agent = Agent(
    llm=your_llm,
    tts=tts,
    # ... other config
)

Chat Completions Models

For compatibility with vLLM, TGI, Ollama, or the legacy Chat Completions API:

ChatCompletionsLLM

For text-only models:
from vision_agents.plugins import openai

llm = openai.ChatCompletionsLLM(
    model="gpt-4",
    base_url="https://api.openai.com/v1",
    api_key="your_key"
)

ChatCompletionsVLM

For vision models (including third-party models such as Qwen):
from vision_agents.plugins import openai, deepgram, elevenlabs, getstream, vogent
from vision_agents.core import Agent, User

llm = openai.ChatCompletionsVLM(
    model="qwen3vl",
    base_url="https://model-xyz.api.baseten.co/production/predict",
    api_key="your_baseten_key"
)

agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Video Assistant"),
    instructions="Analyze video frames and answer questions.",
    llm=llm,
    stt=deepgram.STT(),
    tts=elevenlabs.TTS(),
    turn_detection=vogent.TurnDetection()
)
- model (string, required): Model identifier
- base_url (string): API endpoint URL for third-party providers
- api_key (string): API key for authentication
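Any OpenAI-compatible endpoint expects the standard Chat Completions request shape, which is what makes third-party servers interchangeable here. A sketch of the payload such a server receives (field names follow the public Chat Completions API; the helper functions are hypothetical, and the plugin's internal construction may differ):

```python
def build_chat_payload(model, messages, stream=True):
    """Minimal Chat Completions request body for an OpenAI-compatible server."""
    return {
        "model": model,
        "messages": messages,  # [{"role": "user", "content": "..."}]
        "stream": stream,
    }

def image_message(text, image_url):
    """A user message carrying text plus an image, in the
    Chat Completions vision content-part format."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

A vision request is just a regular payload whose messages include image content parts, which is why the same base_url/api_key configuration works for both text and vision models.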

Function Calling

Register custom functions for the model to invoke:
from vision_agents.plugins import openai

llm = openai.LLM("gpt-4.1")
# Or use openai.Realtime() for realtime model

@llm.register_function(
    name="get_weather",
    description="Get the current weather for a given city"
)
def get_weather(city: str) -> dict:
    """Get weather information for a city."""
    return {
        "city": city,
        "temperature": 72,
        "condition": "Sunny"
    }
The function will be automatically called when the model decides to use it.
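Conceptually, invoking a registered function amounts to mapping the model's requested function name and JSON-encoded arguments onto the matching Python callable. An illustrative sketch of that dispatch step (not the plugin's internals; the registry and decorator here are hypothetical):

```python
import json

registry = {}

def register_function(name):
    """Record a callable under the name the model will request."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap

@register_function("get_weather")
def get_weather(city: str) -> dict:
    return {"city": city, "temperature": 72, "condition": "Sunny"}

def dispatch(tool_call):
    """Look up the requested function and call it with decoded JSON args."""
    fn = registry[tool_call["name"]]
    kwargs = json.loads(tool_call["arguments"])
    return fn(**kwargs)
```

The model only ever sees the function's name, description, and argument schema; the actual execution happens in your process, and the return value is serialized back into the conversation.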

Configuration Examples

With Turn Detection

from vision_agents.plugins import openai, smart_turn
from vision_agents.core import Agent

agent = Agent(
    llm=openai.LLM("gpt-4.1"),
    turn_detection=smart_turn.TurnDetection(
        buffer_duration=2.0,
        confidence_threshold=0.5
    ),
    # ... other config
)

With Multiple Modalities

from vision_agents.plugins import openai, deepgram, elevenlabs
from vision_agents.core import Agent

agent = Agent(
    llm=openai.LLM("gpt-4.1"),
    stt=deepgram.STT(model="flux-general-en"),
    tts=elevenlabs.TTS(),
    # ... other config
)

Environment Variables

OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=model_to_use
OPENAI_REALTIME_MODEL=gpt-4o-realtime-preview
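A small sketch of reading these settings from the environment (variable names from the list above; the loader function and its fallback defaults are illustrative, not part of the plugin's API):

```python
import os

def load_config():
    """Read OpenAI plugin settings from the environment.

    OPENAI_API_KEY is required; the other two fall back to
    illustrative defaults when unset.
    """
    return {
        "api_key": os.environ["OPENAI_API_KEY"],  # raises KeyError if missing
        "model": os.environ.get("OPENAI_MODEL", "gpt-4.1"),
        "realtime_model": os.environ.get(
            "OPENAI_REALTIME_MODEL", "gpt-4o-realtime-preview"
        ),
    }
```

Failing fast on a missing API key surfaces misconfiguration at startup rather than on the first model call.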
