Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/getzep/graphiti/llms.txt

Use this file to discover all available pages before exploring further.

OpenAIClient

The primary client for OpenAI’s GPT models with support for structured outputs using the responses.parse API.

Installation

pip install graphiti-core
The OpenAI SDK is included by default.

Basic Usage

from graphiti_core.llm_client import OpenAIClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message
from pydantic import BaseModel

# Initialize client
client = OpenAIClient(
    config=LLMConfig(
        api_key="sk-...",
        model="gpt-4.1-mini",
        temperature=1.0,
        max_tokens=16384
    )
)

# Define response structure
class ExtractedInfo(BaseModel):
    name: str
    age: int
    occupation: str

# Generate structured response
messages = [
    Message(role="system", content="Extract person information from text."),
    Message(role="user", content="John is a 30 year old software engineer.")
]

response = await client.generate_response(
    messages=messages,
    response_model=ExtractedInfo
)

print(response)  # {'name': 'John', 'age': 30, 'occupation': 'software engineer'}

Constructor

config
LLMConfig | None
default:"None"
Configuration object. If None, creates default config.
cache
bool
default:"False"
Enable response caching (not currently implemented, raises NotImplementedError if True)
client
Any | None
default:"None"
Optional pre-configured AsyncOpenAI client instance. If not provided, creates one from config.
max_tokens
int
default:"16384"
Maximum output tokens. Defaults to 16384 for compatibility.
reasoning
str
default:"'minimal'"
Reasoning effort level for reasoning models (GPT-5, o1, o3). Options: 'minimal', 'low', 'medium', 'high'
verbosity
str
default:"'low'"
Verbosity level for reasoning models. Options: 'low', 'medium', 'high'

Supported Models

Reasoning Models (via responses.parse API):
  • gpt-5-* series
  • o1-* series
  • o3-* series
Standard Models (via chat.completions.create):
  • gpt-4.1-mini (recommended)
  • gpt-4.1-nano
  • gpt-4o
  • gpt-4-turbo
  • All other GPT models
Reasoning models (GPT-5, o1, o3) do not support temperature settings. The client automatically omits temperature for these models.

Reasoning Model Configuration

For GPT-5 and o-series models, configure reasoning depth:
client = OpenAIClient(
    config=LLMConfig(
        api_key="sk-...",
        model="gpt-5-preview"
    ),
    reasoning="high",      # More thorough reasoning
    verbosity="medium"     # Detailed output
)

Custom Base URL

Use OpenAI-compatible endpoints:
client = OpenAIClient(
    config=LLMConfig(
        api_key="your-key",
        base_url="https://api.your-provider.com/v1"
    )
)

Response Format

The client uses different APIs based on model capabilities: Reasoning Models (responses.parse):
response = await client.responses.parse(
    model="gpt-5-preview",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'}
)
Standard Models (chat.completions.create):
response = await client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages,
    temperature=1.0,
    max_tokens=max_tokens,
    response_format={'type': 'json_object'}
)

OpenAIGenericClient

A simplified OpenAI client designed for local and third-party OpenAI-compatible models. Does not support caching or the responses.parse API.

When to Use

  • Local models (e.g., Ollama, LM Studio)
  • Third-party OpenAI-compatible APIs
  • Models with higher token limits
  • Simpler integration requirements

Basic Usage

from graphiti_core.llm_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig

# For local Ollama instance
client = OpenAIGenericClient(
    config=LLMConfig(
        base_url="http://localhost:11434/v1",
        model="llama3",
        api_key="not-needed"  # Ollama doesn't require key
    ),
    max_tokens=32000  # Higher limit for local models
)

Constructor

config
LLMConfig | None
default:"None"
Configuration object. If None, creates default config.
cache
bool
default:"False"
Caching is not supported. Raises NotImplementedError if True.
client
Any | None
default:"None"
Optional pre-configured AsyncOpenAI client instance.
max_tokens
int
default:"16384"
Maximum output tokens. Default increased to 16384 for better local model compatibility.

Key Differences from OpenAIClient

FeatureOpenAIClientOpenAIGenericClient
CachingSupported (not implemented)Not supported
responses.parse APIYes (reasoning models)No
Structured outputsVia responses.parseVia json_schema
Max retries2 (configurable)2 (fixed)
Default max_tokens1638416384
Reasoning/verbosityYesNo

Structured Output Handling

Uses json_schema in response format:
response_format = {
    'type': 'json_schema',
    'json_schema': {
        'name': 'structured_response',
        'schema': response_model.model_json_schema()
    }
}

Error Handling

Implements custom retry logic:
  • Max 2 retries on validation/parsing errors
  • No retry for rate limits or refusals
  • Automatic retry for OpenAI client errors (timeout, connection, server errors)
  • Appends error context to messages for model self-correction

Example: Local Model

from graphiti_core.llm_client import OpenAIGenericClient
from graphiti_core.llm_client.config import LLMConfig
from pydantic import BaseModel

class Summary(BaseModel):
    title: str
    key_points: list[str]

client = OpenAIGenericClient(
    config=LLMConfig(
        base_url="http://localhost:11434/v1",
        model="llama3:70b"
    ),
    max_tokens=8192
)

messages = [
    Message(role="system", content="Summarize the following text."),
    Message(role="user", content="Long article text...")
]

summary = await client.generate_response(
    messages=messages,
    response_model=Summary
)

Compatibility Notes

  • Works with any OpenAI-compatible API
  • Does not use provider-specific features
  • JSON schema support required for structured outputs
  • Temperature and max_tokens always included in requests

Build docs developers (and LLMs) love