Overview
The AzureOpenAILLMClient provides integration with OpenAI models hosted on Azure, supporting both the native Azure OpenAI SDK and OpenAI’s v1 API compatibility endpoint.
Installation
```bash
pip install graphiti-core
```
The OpenAI Python SDK, which includes Azure support, is installed as a dependency of graphiti-core.
Basic Usage
```python
import asyncio

from openai import AsyncAzureOpenAI
from pydantic import BaseModel

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig
from graphiti_core.prompts.models import Message


# Define the response structure
class Analysis(BaseModel):
    summary: str
    sentiment: str
    key_points: list[str]


async def main() -> None:
    # Create the Azure OpenAI SDK client
    azure_client = AsyncAzureOpenAI(
        api_key="your-azure-key",
        api_version="2024-02-15-preview",
        azure_endpoint="https://your-resource.openai.azure.com",
    )

    # Initialize the Graphiti LLM client
    client = AzureOpenAILLMClient(
        azure_client=azure_client,
        config=LLMConfig(
            model="gpt-4o",  # Your Azure deployment name
            temperature=1.0,
        ),
        max_tokens=16384,
    )

    messages = [
        Message(role="system", content="Analyze the following text."),
        Message(role="user", content="Product review text..."),
    ]

    # Generate a structured response (returned as a dict)
    response = await client.generate_response(
        messages=messages,
        response_model=Analysis,
    )
    print(response)


asyncio.run(main())
```
Constructor
- azure_client (AsyncAzureOpenAI | AsyncOpenAI, required): A pre-configured Azure OpenAI client. It must be one of:
  - AsyncAzureOpenAI, for the native Azure SDK
  - AsyncOpenAI, pointed at the Azure v1 API endpoint
- config (LLMConfig | None, default: None): The configuration object. If None, a default config is created.
- max_tokens: The maximum number of output tokens per response.
- reasoning: The reasoning effort for reasoning models (GPT-5, o1, o3). Options: 'minimal', 'low', 'medium', 'high'.
- verbosity: The output verbosity for reasoning models. Options: 'low', 'medium', 'high'.

Caching is not supported. The base class's cache parameter is always False.
Azure SDK Setup
Option 1: AsyncAzureOpenAI (Recommended)
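```python
from openai import AsyncAzureOpenAI
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

azure_client = AsyncAzureOpenAI(
    api_key="your-azure-api-key",
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",
)

client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment"),  # Your deployment name
)
```
Choose an api_version that supports the features you use. Azure first added structured outputs in API version 2024-08-01-preview, so structured responses may be rejected with older versions such as the 2024-02-15-preview value shown here.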
Option 2: AsyncOpenAI with Azure v1 Endpoint
```python
from openai import AsyncOpenAI
from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig

# Azure's OpenAI v1 endpoint: the base URL ends in /openai/v1/,
# and the deployment name is passed as the model
openai_client = AsyncOpenAI(
    api_key="your-azure-api-key",
    base_url="https://your-resource.openai.azure.com/openai/v1/",
)

client = AzureOpenAILLMClient(
    azure_client=openai_client,
    config=LLMConfig(model="your-deployment"),  # Your deployment name
)
```
Supported Models
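All OpenAI models available on Azure are supported. The client routes each request according to the model family:
Reasoning Models (via responses.parse):
- gpt-5-* deployments
- o1-* deployments
- o3-* deployments
Standard Models (via chat.completions or beta.chat.completions.parse):
- gpt-4o deployments
- gpt-4-turbo deployments
- gpt-4 deployments
- gpt-3.5-turbo deployments
Pass your Azure deployment name as the model parameter, not the base model name. Reasoning models are detected by the prefix of this name (see Model Detection). Name reasoning deployments with an o1, o3, or gpt-5 prefix, or they will be handled as standard models.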
Structured Output Handling
The client automatically selects the appropriate API based on model type:
Reasoning Models (GPT-5, o1, o3)
Uses responses.parse API:
```python
# Call issued to the underlying Azure SDK client (illustrative excerpt)
response = await azure_client.responses.parse(
    model="gpt-5-deployment",
    input=messages,
    max_output_tokens=max_tokens,
    text_format=response_model,
    reasoning={'effort': 'minimal'},
    text={'verbosity': 'low'},
)
```
Standard Models (GPT-4o, etc.)
Uses beta.chat.completions.parse API:
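```python
# Call issued to the underlying Azure SDK client (illustrative excerpt)
response = await azure_client.beta.chat.completions.parse(
    model="gpt-4o-deployment",
    messages=messages,
    max_tokens=max_tokens,
    temperature=temperature,
    response_format=response_model,  # Pydantic model for structured output
)
```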
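The routing above can be summarized as a pure function. It decides which API to call and which keyword arguments to send. This is a minimal sketch that assumes the prefix rule described under Model Detection. `build_request` and `REASONING_PREFIXES` are hypothetical names, not part of Graphiti's API, and the real client may assemble its parameters differently.

```python
from typing import Any, Optional

# Hypothetical names for illustration; not part of graphiti_core.
REASONING_PREFIXES = ("o1", "o3", "gpt-5")


def build_request(
    model: str,
    messages: list[dict[str, str]],
    max_tokens: int,
    temperature: float,
    response_model: Optional[type] = None,
    reasoning: Optional[str] = None,
    verbosity: Optional[str] = None,
) -> tuple[str, dict[str, Any]]:
    """Return (api_name, kwargs) for a structured-output call."""
    if model.startswith(REASONING_PREFIXES):
        # Reasoning models: Responses API, temperature deliberately omitted
        kwargs: dict[str, Any] = {
            "model": model,
            "input": messages,
            "max_output_tokens": max_tokens,
            "text_format": response_model,
        }
        if reasoning:
            kwargs["reasoning"] = {"effort": reasoning}
        if verbosity:
            kwargs["text"] = {"verbosity": verbosity}
        return "responses.parse", kwargs
    # Standard models: Chat Completions structured-output helper
    return "beta.chat.completions.parse", {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
        "response_format": response_model,
    }
```

A caller would then dispatch on the returned name. For example, it would call `await azure_client.responses.parse(**kwargs)` when the name is "responses.parse". Keeping the decision in one function makes it easy to see that temperature is dropped only on the reasoning path.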
Response Parsing
The client handles different response formats:
ParsedChatCompletion (Standard Models)
```python
# From beta.chat.completions.parse (excerpt of the client's response handling)
if hasattr(message, 'parsed') and message.parsed:
    return message.parsed.model_dump()  # Convert the parsed Pydantic model to a dict
elif hasattr(message, 'refusal') and message.refusal:
    raise RefusalError(message.refusal)
```
Responses.parse (Reasoning Models)
```python
# From responses.parse (excerpt of the client's response handling)
if hasattr(response, 'output_text'):
    return json.loads(response.output_text)
elif hasattr(response, 'refusal') and response.refusal:
    raise RefusalError(response.refusal)
```
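Both branches follow the same contract: return a plain dict, or raise RefusalError. The sketch below reproduces that logic so it can run without an API call, under a few assumptions:
- SimpleNamespace objects stand in for the SDK response types.
- RefusalError is a local stand-in for graphiti_core.llm_client.errors.RefusalError.
- The final ValueError fallbacks are an addition of this sketch, not documented behavior.
- It checks that output_text is non-empty before decoding, which differs slightly from the excerpt above.

```python
import json
from types import SimpleNamespace
from typing import Any


class RefusalError(Exception):
    """Local stand-in for graphiti_core.llm_client.errors.RefusalError."""


def parse_chat_message(message: Any) -> dict[str, Any]:
    # Standard models: `parsed` holds a Pydantic model instance
    if getattr(message, "parsed", None):
        return message.parsed.model_dump()
    if getattr(message, "refusal", None):
        raise RefusalError(message.refusal)
    raise ValueError("response contained neither parsed content nor a refusal")


def parse_responses_output(response: Any) -> dict[str, Any]:
    # Reasoning models: `output_text` holds the JSON document as a string
    if getattr(response, "output_text", None):
        return json.loads(response.output_text)
    if getattr(response, "refusal", None):
        raise RefusalError(response.refusal)
    raise ValueError("response contained neither output text nor a refusal")


# Fake responses standing in for real SDK objects
chat_message = SimpleNamespace(
    parsed=SimpleNamespace(model_dump=lambda: {"sentiment": "positive"}),
    refusal=None,
)
reasoning_response = SimpleNamespace(output_text='{"sentiment": "positive"}')
print(parse_chat_message(chat_message))  # {'sentiment': 'positive'}
print(parse_responses_output(reasoning_response))  # {'sentiment': 'positive'}
```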
Reasoning Model Configuration
For GPT-5 and o-series deployments:
```python
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="o1-deployment"),
    reasoning="high",  # More thorough reasoning
    verbosity="medium",  # More detailed output
)
```
Reasoning parameters:
- reasoning: 'minimal', 'low', 'medium', 'high'
- verbosity: 'low', 'medium', 'high'
Reasoning models do not support temperature. The client automatically omits temperature for these models.
Error Handling
Refusals
```python
from graphiti_core.llm_client.errors import RefusalError

try:
    response = await client.generate_response(messages=messages)
except RefusalError as e:
    print(f"Model refused to respond: {e}")
    # Not retried: the model rejected the request
```
Rate Limits
```python
from graphiti_core.llm_client.errors import RateLimitError

try:
    response = await client.generate_response(messages=messages)
except RateLimitError as e:
    print(f"Rate limited: {e}")
    # Retry later, e.g. with exponential backoff
```
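A rate-limit error leaves backoff to the caller. One common pattern is exponential backoff with jitter, sketched below as a generic helper that is not part of Graphiti. `with_backoff` and its parameters are hypothetical names. The exception type to retry is passed in, so in practice you would supply RateLimitError. Do not pass RefusalError, because a refusal will not change on retry.

```python
import asyncio
import random
from typing import Awaitable, Callable, Type, TypeVar

T = TypeVar("T")


async def with_backoff(
    call: Callable[[], Awaitable[T]],
    retry_on: Type[BaseException],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Await call(), retrying on retry_on with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error
            # Delay doubles each attempt (1s, 2s, 4s, ...), capped at max_delay,
            # with random jitter so concurrent callers do not retry in lockstep
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")
```

Inside an async function, a call might look like `await with_backoff(lambda: client.generate_response(messages=messages), retry_on=RateLimitError)`. The lambda creates a fresh coroutine for every attempt, because an awaited coroutine cannot be awaited again.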
Automatic Retries
The client retries up to 2 times for:
- Validation errors
- JSON parsing errors
- Transient API failures
Before each retry, the client appends an error message to the conversation so the model can correct its output:
```python
error_context = (
    f'The previous response attempt was invalid. '
    f'Error type: {e.__class__.__name__}. '
    f'Please try again with a valid response.'
)
messages.append(Message(role='user', content=error_context))
```
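The retry behavior can be sketched as a loop. It parses the reply, and on failure appends the error context and tries again. That gives at most three attempts in total: one call plus two retries. This is a simplified, synchronous sketch with these assumptions:
- `call_model` is a hypothetical stand-in for the API call.
- `Message` is a local dataclass rather than graphiti_core.prompts.models.Message.
- Only JSON decoding errors are retried.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable

MAX_RETRIES = 2  # Matches the "up to 2 retries" described above


@dataclass
class Message:
    """Local stand-in for graphiti_core.prompts.models.Message."""
    role: str
    content: str


def generate_with_retries(
    call_model: Callable[[list[Message]], str],
    messages: list[Message],
) -> dict[str, Any]:
    retry_count = 0
    while True:
        try:
            # An invalid JSON reply raises JSONDecodeError and triggers a retry
            return json.loads(call_model(messages))
        except json.JSONDecodeError as e:
            if retry_count >= MAX_RETRIES:
                raise  # Retries exhausted
            retry_count += 1
            error_context = (
                'The previous response attempt was invalid. '
                f'Error type: {e.__class__.__name__}. '
                'Please try again with a valid response.'
            )
            messages.append(Message(role='user', content=error_context))
```

Appending the error to the conversation instead of resending the original prompt unchanged gives the model a signal about what went wrong.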
Token Usage Tracking
Track token consumption across requests:
```python
client = AzureOpenAILLMClient(
    azure_client=azure_client,
    config=LLMConfig(model="gpt-4o-deployment"),
)

# Inside an async function
response = await client.generate_response(
    messages=messages,
    prompt_name="entity_extraction",
)

# Check overall usage
usage = client.token_tracker.get_usage()
print(f"Input tokens: {usage['input_tokens']}")
print(f"Output tokens: {usage['output_tokens']}")
print(f"Total tokens: {usage['total_tokens']}")

# Usage for a single prompt name
usage = client.token_tracker.get_usage_by_prompt("entity_extraction")
```
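To make the bookkeeping concrete, here is a minimal hypothetical tracker. It accumulates counts per prompt name and sums them on demand. It is not Graphiti's token_tracker, whose internals and return types may differ: the documented get_usage() returns a dict, while this sketch returns a small dataclass. The input and output counts would come from the SDK's usage fields. These are prompt_tokens and completion_tokens on Chat Completions, or input_tokens and output_tokens on the Responses API.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Usage:
    input_tokens: int = 0
    output_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


class SimpleTokenTracker:
    """Hypothetical per-prompt token accounting; not Graphiti's tracker."""

    def __init__(self) -> None:
        self._by_prompt: defaultdict[str, Usage] = defaultdict(Usage)

    def record(self, prompt_name: str, input_tokens: int, output_tokens: int) -> None:
        usage = self._by_prompt[prompt_name]
        usage.input_tokens += input_tokens
        usage.output_tokens += output_tokens

    def get_usage(self) -> Usage:
        # Sum the counts across all prompt names
        total = Usage()
        for usage in self._by_prompt.values():
            total.input_tokens += usage.input_tokens
            total.output_tokens += usage.output_tokens
        return total

    def get_usage_by_prompt(self, prompt_name: str) -> Usage:
        # .get() avoids creating an empty entry for unknown prompt names
        return self._by_prompt.get(prompt_name, Usage())
```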
Model Detection
The client automatically detects reasoning models:
```python
# Static method on AzureOpenAILLMClient
@staticmethod
def _supports_reasoning_features(model: str) -> bool:
    """Return True when the Azure model supports reasoning/verbosity options."""
    reasoning_prefixes = ('o1', 'o3', 'gpt-5')
    return model.startswith(reasoning_prefixes)
```
Behavior changes for reasoning models:
- Uses responses.parse instead of beta.chat.completions.parse
- Omits the temperature parameter
- Includes the reasoning and verbosity options
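Detection is a plain str.startswith check on the model string, which is your deployment name, so the deployment name decides which code path runs. The standalone copy below uses a module-level function instead of the static method. It shows the result for a few sample deployment names: the match is case-sensitive and only looks at the start of the name.

```python
REASONING_PREFIXES = ("o1", "o3", "gpt-5")


def supports_reasoning_features(model: str) -> bool:
    # str.startswith accepts a tuple and returns True if any prefix matches
    return model.startswith(REASONING_PREFIXES)


for name in ["o3-mini-prod", "gpt-5-mini", "gpt-4o-deployment", "prod-o1", "GPT-5"]:
    print(f"{name}: {supports_reasoning_features(name)}")
# o3-mini-prod: True
# gpt-5-mini: True
# gpt-4o-deployment: False
# prod-o1: False
# GPT-5: False
```

A reasoning deployment named "prod-o1" or "GPT-5" would therefore be sent through the standard Chat Completions path, with temperature included and reasoning options omitted.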
Example: Complete Integration
```python
import asyncio
import os

from openai import AsyncAzureOpenAI
from pydantic import BaseModel

from graphiti_core.llm_client.azure_openai_client import AzureOpenAILLMClient
from graphiti_core.llm_client.config import LLMConfig, ModelSize
from graphiti_core.prompts.models import Message


# Define the output schema
class ExtractedEntities(BaseModel):
    people: list[str]
    organizations: list[str]
    locations: list[str]


async def main() -> None:
    # Set up the Azure SDK client from environment variables
    azure_client = AsyncAzureOpenAI(
        api_key=os.getenv("AZURE_OPENAI_API_KEY"),
        api_version="2024-02-15-preview",
        azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    )

    # Create the Graphiti LLM client
    client = AzureOpenAILLMClient(
        azure_client=azure_client,
        config=LLMConfig(
            model="gpt-4o",  # Your deployment name
            small_model="gpt-4o-mini",  # Smaller deployment
            temperature=0.7,
        ),
        max_tokens=8192,
    )

    # Extract entities
    messages = [
        Message(role="system", content="Extract named entities from the text."),
        Message(
            role="user",
            content="Apple CEO Tim Cook announced a new facility in Cupertino.",
        ),
    ]

    result = await client.generate_response(
        messages=messages,
        response_model=ExtractedEntities,
        prompt_name="entity_extraction",
    )
    print(result)
    # Example output (exact values depend on the model):
    # {
    #     'people': ['Tim Cook'],
    #     'organizations': ['Apple'],
    #     'locations': ['Cupertino']
    # }

    # Check token usage
    usage = client.token_tracker.get_usage()
    print(f"Tokens used: {usage['total_tokens']}")


asyncio.run(main())
```
Deployment tips:
- Use appropriate deployment sizes: deploy both a large and a small model.
- Set a reasonable max_tokens: Azure bills per token.
- Monitor quotas: Azure enforces rate limits per deployment.
- Use the model_size parameter: let Graphiti pick the matching deployment.
```python
from graphiti_core.llm_client.config import ModelSize

# Automatic deployment selection (inside an async function)
response = await client.generate_response(
    messages=messages,
    model_size=ModelSize.small,  # Uses the small_model deployment
)
```
Differences from OpenAIClient
| Feature | OpenAIClient | AzureOpenAILLMClient |
|---|---|---|
| Client type | AsyncOpenAI | AsyncAzureOpenAI or AsyncOpenAI |
| Model parameter | Base model name | Azure deployment name |
| API version | Latest | Configurable |
| Endpoint | api.openai.com | Azure resource endpoint |
| Caching | Not implemented | Not supported |
| Structured outputs | responses.parse | responses.parse + beta.chat.completions.parse |
Troubleshooting
Authentication Errors
```python
from openai import AsyncAzureOpenAI

# Ensure the API key and endpoint are correct
azure_client = AsyncAzureOpenAI(
    api_key="your-key",  # From the Azure portal
    api_version="2024-02-15-preview",
    azure_endpoint="https://your-resource.openai.azure.com",  # Full resource URL
)
```
Deployment Not Found
```python
from graphiti_core.llm_client.config import LLMConfig

# Use the deployment name, not the base model name
config = LLMConfig(
    model="my-gpt4o-deployment",  # Your custom deployment name
)
```
Rate Limiting
```python
# Azure has per-deployment quotas
# Check the Azure portal for:
# - Tokens per minute (TPM)
# - Requests per minute (RPM)
# Implement backoff or use multiple deployments
```
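Using multiple deployments can be as simple as rotating through them and moving on when one is throttled. The pool below is a hypothetical sketch and not a Graphiti feature. `DeploymentPool` and its `call` method are illustrative names, and `fn` is assumed to send the request to the named deployment. For example, `fn` could look up a separate AzureOpenAILLMClient configured with that deployment name. You would pass RateLimitError as `retry_on`.

```python
import itertools
from typing import Awaitable, Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


class DeploymentPool:
    """Hypothetical round-robin pool with failover across deployments."""

    def __init__(self, deployments: Sequence[str]) -> None:
        if not deployments:
            raise ValueError("at least one deployment is required")
        self._deployments = list(deployments)
        self._cycle = itertools.cycle(self._deployments)

    async def call(
        self,
        fn: Callable[[str], Awaitable[T]],
        retry_on: Type[BaseException],
    ) -> T:
        # Try each deployment at most once, starting from the next in rotation
        last_error: Optional[BaseException] = None
        for _ in range(len(self._deployments)):
            deployment = next(self._cycle)
            try:
                return await fn(deployment)
            except retry_on as e:
                last_error = e  # This deployment is throttled; try the next one
        assert last_error is not None
        raise last_error  # Every deployment was throttled
```

Rotation spreads normal traffic across the deployments' TPM and RPM quotas. Failover covers the case where one deployment is exhausted. If every deployment is throttled, the pool can be combined with backoff.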