Google Gemini provides state-of-the-art multimodal AI models with strong reasoning, structured output, and embedding capabilities.
Installation
Install Graphiti with Gemini support:
```bash
pip install "graphiti-core[google-genai]"
```
Configuration
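Environment Variables
Set the GOOGLE_API_KEY environment variable to your Google API key. The examples on this page read it with os.environ["GOOGLE_API_KEY"].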
Complete Setup
Gemini can be used for LLM inference, embeddings, and cross-encoding:
```python
import os

from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient

# Configure API key
api_key = os.environ["GOOGLE_API_KEY"]

# Initialize Graphiti with Gemini for all components
graphiti = Graphiti(
    "bolt://localhost:7687",
    "neo4j",
    "password",
    llm_client=GeminiClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.0-flash",
        )
    ),
    embedder=GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=api_key,
            embedding_model="text-embedding-001",
        )
    ),
    cross_encoder=GeminiRerankerClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.5-flash-lite",
        )
    ),
)
```
Supported Models
Language Models
Gemini 3 (Preview)
- gemini-3-pro-preview: Most capable, 64K output tokens
- gemini-3-flash-preview (recommended): Fast, efficient, 64K output tokens
Gemini 2.5
- gemini-2.5-pro: Advanced reasoning, 64K output tokens
- gemini-2.5-flash: Balanced performance, 64K output tokens
- gemini-2.5-flash-lite: Fast, cost-effective, 64K output tokens
Gemini 2.0
- gemini-2.0-flash: Fast multimodal, 8K output tokens
- gemini-2.0-flash-lite: Ultra-fast, 8K output tokens
Gemini 1.5
- gemini-1.5-pro: Extended context (2M tokens), 8K output
- gemini-1.5-flash: Fast, 8K output tokens
- gemini-1.5-flash-8b: Smallest, 8K output tokens
Embedding Models
- text-embedding-001 (recommended): General-purpose embeddings
- text-embedding-005: Latest embedding model
- gemini-embedding-001: Gemini text embedding model (multilingual)
Reranking Models
- gemini-2.5-flash-lite (recommended): Optimized for classification
- Any Gemini model with log probabilities support
LLM Configuration
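```python
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig

llm_client = GeminiClient(
    config=LLMConfig(
        api_key="AIza...",
        model="gemini-2.5-flash",
        small_model="gemini-2.5-flash-lite",
        temperature=0.7,
    ),
    # Override the default output-token limit; keep it within the model's maximum
    # (gemini-2.0 models cap output at 8K tokens, so 16384 needs a 2.5+ model)
    max_tokens=16384,
)
```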
LLM Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | From env | Google API key |
| model | str | "gemini-3-flash-preview" | Primary LLM model |
| small_model | str | "gemini-2.5-flash-lite" | Model for simpler tasks |
| temperature | float | 0.7 | Sampling temperature (0-2) |
| max_tokens | int | Model-specific | Maximum output tokens |
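The model/small_model split and the temperature range in the table can be pictured with a small stand-in config. This is a hypothetical sketch, not Graphiti's LLMConfig. The `SketchLLMConfig` class, the `TaskSize` enum, and `model_for` are illustrative names. The defaults simply mirror the table.

```python
from dataclasses import dataclass
from enum import Enum


class TaskSize(Enum):
    """Hypothetical task-size label used to pick a model."""
    SMALL = "small"
    MEDIUM = "medium"


@dataclass(frozen=True)
class SketchLLMConfig:
    """Illustrative stand-in for the table's options (not Graphiti's LLMConfig)."""
    model: str = "gemini-3-flash-preview"
    small_model: str = "gemini-2.5-flash-lite"
    temperature: float = 0.7

    def __post_init__(self) -> None:
        # Gemini accepts sampling temperatures from 0 to 2
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError(f"temperature must be in [0, 2], got {self.temperature}")

    def model_for(self, size: TaskSize) -> str:
        # Simpler tasks go to the cheaper small model; everything else uses the primary model
        return self.small_model if size is TaskSize.SMALL else self.model


cfg = SketchLLMConfig(model="gemini-2.5-flash")
print(cfg.model_for(TaskSize.SMALL))   # gemini-2.5-flash-lite
print(cfg.model_for(TaskSize.MEDIUM))  # gemini-2.5-flash
```

Routing simple tasks to the smaller model is the main lever the small_model option provides for reducing cost and latency.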
Embeddings Configuration
```python
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="AIza...",
        embedding_model="text-embedding-001",
        embedding_dim=768,  # Default dimension
    ),
    batch_size=100,  # Process 100 texts per batch
)
```
Embedder Configuration Options
| Parameter | Type | Default | Description |
|---|---|---|---|
| api_key | str | From env | Google API key |
| embedding_model | str | "text-embedding-001" | Embedding model |
| embedding_dim | int | 768 | Output dimension |
| batch_size | int | 100 | Texts per embed_content call (passed to GeminiEmbedder, not GeminiEmbedderConfig) |
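With batch_size=100, input texts are grouped into chunks before each embed_content request. The helper below is a hypothetical sketch of that chunking and shows how many requests a given input size implies. The `chunk_texts` name is illustrative, not part of Graphiti.

```python
from typing import Iterator, Sequence


def chunk_texts(texts: Sequence[str], batch_size: int = 100) -> Iterator[list[str]]:
    """Yield consecutive batches of at most `batch_size` texts (illustrative helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield list(texts[start:start + batch_size])


texts = [f"fact {i}" for i in range(250)]
batches = list(chunk_texts(texts, batch_size=100))
print([len(b) for b in batches])  # [100, 100, 50]
```

In this example, 250 texts become three requests instead of 250. That reduction is why larger batches help with throughput and rate limits.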
Reranking Configuration
Gemini’s reranker uses log probabilities for relevance scoring:
```python
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient
from graphiti_core.llm_client.config import LLMConfig

reranker = GeminiRerankerClient(
    config=LLMConfig(
        api_key="AIza...",
        model="gemini-2.5-flash-lite",  # Optimized for classification
    )
)
```
The reranker uses boolean classification with log probabilities to rank passage relevance, similar to the OpenAI reranker approach.
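To make the log-probability idea concrete: if the model answers a True/False relevance question for each passage, the probability of its answer can be turned into a relevance score and used to sort passages. The sketch below assumes you already have each passage's top answer token and that token's log probability. `relevance_score` and `rank_passages` are hypothetical helpers, and the inputs are made-up illustrations rather than real model output.

```python
import math


def relevance_score(top_token: str, logprob: float) -> float:
    """Convert a True/False answer and its log probability into P(relevant)."""
    p = math.exp(logprob)  # probability of the token the model actually chose
    # If the model said "True", p is P(relevant); if it said "False", P(relevant) is 1 - p
    return p if top_token.strip().lower() == "true" else 1.0 - p


def rank_passages(
    passages: list[str], answers: list[tuple[str, float]]
) -> list[tuple[str, float]]:
    """Pair each passage with its score and sort from most to least relevant."""
    scored = [
        (passage, relevance_score(token, logprob))
        for passage, (token, logprob) in zip(passages, answers)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


ranked = rank_passages(
    ["Paris is the capital of France.", "Bananas are yellow.", "France's capital city is Paris."],
    [("True", math.log(0.9)), ("False", math.log(0.95)), ("True", math.log(0.6))],
)
for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

A confident "False" yields a score near 0 rather than near 1. This keeps True and False answers on the same probability-of-relevance scale.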
Thinking Configuration (Gemini 2.5+)
For models that support thinking (Gemini 2.5 and later), pass a ThinkingConfig to limit how many tokens the model may spend reasoning before it answers:
```python
from google.genai import types

from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig

llm_client = GeminiClient(
    config=LLMConfig(model="gemini-2.5-pro"),
    thinking_config=types.ThinkingConfig(
        thinking_budget=2048,  # Cap thinking at 2,048 tokens
    ),
)
```
Structured Output Support
Gemini supports native structured output via JSON schema. For structured responses, Graphiti automatically:
1. Converts Pydantic models to JSON schema
2. Sets response_mime_type to "application/json"
3. Validates responses against the schema
4. Handles truncation and salvages partial JSON
Benefits:
- Native JSON mode with schema validation
- Automatic partial JSON salvaging
- Retry logic for malformed responses
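Partial JSON salvaging can be illustrated with a stdlib-only sketch. It keeps everything up to the last fully closed object or array, then closes any brackets that are still open. This is not Graphiti's own salvage logic, and `salvage_truncated_json` is a hypothetical name. The function returns None when the output contains no complete object or array to keep.

```python
import json
from typing import Any, Optional

_CLOSERS = {"{": "}", "[": "]"}


def salvage_truncated_json(raw: str) -> Optional[Any]:
    """Recover the longest well-formed prefix of truncated JSON, or return None."""
    try:
        return json.loads(raw)  # already valid: nothing to salvage
    except json.JSONDecodeError:
        pass

    stack: list[str] = []  # currently open brackets
    in_string = False
    escaped = False
    last_cut: Optional[tuple[int, list[str]]] = None  # (end index, brackets still open)

    for i, ch in enumerate(raw):
        if in_string:
            # Inside a string, only track escapes and the closing quote
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in _CLOSERS:
            stack.append(ch)
        elif ch in "}]":
            if not stack:
                return None
            stack.pop()
            # A complete object/array just ended: remember this as a safe cut point
            last_cut = (i + 1, list(stack))

    if last_cut is None:
        return None
    end, still_open = last_cut
    candidate = raw[:end] + "".join(_CLOSERS[b] for b in reversed(still_open))
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None


truncated = '{"entities": [{"name": "Alice"}, {"name": "Bob"}, {"na'
print(salvage_truncated_json(truncated))
# {'entities': [{'name': 'Alice'}, {'name': 'Bob'}]}
```

Cutting only at closed brackets means a half-written trailing item is dropped rather than guessed. This is usually the safer trade-off for extraction output.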
Complete Example
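```python
import asyncio
import os
from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.llm_client.gemini_client import GeminiClient, LLMConfig
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from graphiti_core.cross_encoder.gemini_reranker_client import GeminiRerankerClient
from graphiti_core.nodes import EpisodeType


async def main():
    api_key = os.environ["GOOGLE_API_KEY"]

    # Configure Gemini LLM
    llm_client = GeminiClient(
        config=LLMConfig(
            api_key=api_key,
            model="gemini-2.0-flash",
            temperature=0.7,
        )
    )

    # Configure Gemini embeddings
    embedder = GeminiEmbedder(
        config=GeminiEmbedderConfig(
            api_key=api_key,
            embedding_model="text-embedding-001",
        )
    )

    # Configure the Gemini reranker; without it, Graphiti falls back to its
    # default OpenAI-based reranker, which requires an OpenAI API key
    cross_encoder = GeminiRerankerClient(
        config=LLMConfig(api_key=api_key, model="gemini-2.5-flash-lite")
    )

    # Initialize Graphiti
    graphiti = Graphiti(
        "bolt://localhost:7687",
        "neo4j",
        "password",
        llm_client=llm_client,
        embedder=embedder,
        cross_encoder=cross_encoder,
    )

    try:
        # Create indices and constraints (needed once for a new database)
        await graphiti.build_indices_and_constraints()

        # Add an episode (source_description is a required argument)
        await graphiti.add_episode(
            name="AI News 1",
            episode_body="Google announced Gemini 3.0 with enhanced multimodal capabilities.",
            source=EpisodeType.text,
            source_description="news article",
            reference_time=datetime.now(timezone.utc),
        )

        # Search the graph; each result is an edge carrying a fact string
        results = await graphiti.search("What are Gemini 3.0's features?")
        for result in results:
            print(f"Fact: {result.fact}")
    finally:
        await graphiti.close()


if __name__ == "__main__":
    asyncio.run(main())
```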
Error Handling
Graphiti handles the following Gemini failure modes:
- Rate Limit Errors: retried with exponential backoff
- Safety Blocks: responses filtered by Gemini's safety settings raise an error that names the blocked category
- Prompt Blocks: prompts that Gemini blocks before processing raise an error
- Truncation: partial JSON is salvaged from truncated responses
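With exponential backoff, each retry waits roughly twice as long as the previous one. Random jitter is usually added so that concurrent clients do not retry in lockstep. The asyncio sketch below shows the pattern but is not Graphiti's retry code. `RateLimitError` is a local stand-in class, and `retry_with_backoff`, its parameters, and the delay values are illustrative assumptions.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in for a provider rate-limit (HTTP 429) error."""


async def retry_with_backoff(
    call: Callable[[], Awaitable[T]],
    max_retries: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await `call`, retrying on RateLimitError with exponentially growing delays."""
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller
            # Delays of 1s, 2s, 4s, ... capped at max_delay, plus up to 50% random jitter
            delay = min(max_delay, base_delay * 2**attempt)
            await sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")


async def _demo() -> None:
    calls = 0

    async def flaky() -> str:
        nonlocal calls
        calls += 1
        if calls < 3:  # fail twice, then succeed
            raise RateLimitError("429 Too Many Requests")
        return "ok"

    async def no_sleep(_: float) -> None:  # skip real waiting in the demo
        return None

    print(await retry_with_backoff(flaky, sleep=no_sleep))  # ok


asyncio.run(_demo())
```

Injecting `sleep` keeps the helper testable without real delays. Only rate-limit errors are retried, which mirrors the idea that safety or prompt blocks should not be retried blindly.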
Safety Settings
Gemini has built-in safety filters. If content is blocked, the raised exception indicates which safety category was triggered:
- HARM_CATEGORY_HARASSMENT
- HARM_CATEGORY_HATE_SPEECH
- HARM_CATEGORY_SEXUALLY_EXPLICIT
- HARM_CATEGORY_DANGEROUS_CONTENT
Maximum Output Tokens
| Model Family | Max Output Tokens |
|---|---|
| Gemini 3 | 65,536 (64K) |
| Gemini 2.5 | 65,536 (64K) |
| Gemini 2.0 | 8,192 (8K) |
| Gemini 1.5 | 8,192 (8K) |
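A model cannot return more output tokens than its family's limit, so it can help to clamp a requested max_tokens override to the values in the table. The helper below is a hypothetical sketch: `MAX_OUTPUT_TOKENS` and `clamp_max_tokens` are illustrative names, not Graphiti APIs. It matches model names by family prefix. Models outside these families fall back to 8,192 tokens, which is an assumed conservative default.

```python
# Output-token ceilings per model family, taken from the table above
MAX_OUTPUT_TOKENS = {
    "gemini-3": 65_536,
    "gemini-2.5": 65_536,
    "gemini-2.0": 8_192,
    "gemini-1.5": 8_192,
}
DEFAULT_LIMIT = 8_192  # assumed fallback for unrecognized models


def family_limit(model: str) -> int:
    """Return the output-token ceiling for a name such as 'gemini-2.5-flash'."""
    for prefix, limit in MAX_OUTPUT_TOKENS.items():
        # Require a '-' after the prefix so 'gemini-3' does not match e.g. 'gemini-30'
        if model == prefix or model.startswith(prefix + "-"):
            return limit
    return DEFAULT_LIMIT


def clamp_max_tokens(model: str, requested: int) -> int:
    """Cap a requested max_tokens value at the model family's ceiling."""
    if requested < 1:
        raise ValueError("requested must be positive")
    return min(requested, family_limit(model))


print(clamp_max_tokens("gemini-2.0-flash", 16_384))        # 8192
print(clamp_max_tokens("gemini-2.5-flash", 16_384))        # 16384
print(clamp_max_tokens("gemini-3-pro-preview", 100_000))   # 65536
```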
When to Use Gemini
Choose Gemini if you:
- Need multimodal capabilities (image, video, audio)
- Want extended context windows (1-2M tokens)
- Prefer Google’s safety and content filtering
- Need native JSON schema support
- Want to use Google Cloud infrastructure
Choose OpenAI if you:
- Need GPT-5 reasoning models
- Want faster response times
- Prefer OpenAI’s ecosystem
Cost Optimization
- Use Flash Models: Gemini Flash is fast and cost-effective
- Batch Embeddings: Raise batch_size on GeminiEmbedder so each embed_content request carries more texts
- Adjust Thinking Tokens: Set a lower thinking_budget for thinking models (Gemini 2.5+)
- Monitor Usage: Track API usage via Google Cloud Console