Overview
The GeminiEmbedder provides embeddings using Google’s Gemini embedding models, supporting configurable batch sizes and automatic batch processing.
Installation
pip install "graphiti-core[google-genai]"
Basic Usage
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
# Initialize embedder
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-google-api-key",
        embedding_model="text-embedding-001",
        embedding_dim=1024
    )
)
# Single embedding
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024
# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3
Configuration
GeminiEmbedderConfig
embedding_model
str
default:"'text-embedding-001'"
Gemini embedding model to use. Options:
- text-embedding-001 (default)
- text-embedding-005
- gemini-embedding-001
embedding_dim
int
Output embedding dimensionality. Passed to the API via output_dimensionality.
api_key
str | None
default:"None"
Google API key. If not provided, the GOOGLE_API_KEY environment variable is used.
Constructor
config
GeminiEmbedderConfig | None
default:"None"
Configuration object. If None, creates default config.
client
genai.Client | None
default:"None"
Optional pre-configured genai.Client instance. If not provided, creates one from config.
batch_size
int | None
default:"None"
Batch size for API requests. Defaults when not set:
- 1 for gemini-embedding-001 (API limitation)
- 100 for other models
Supported Models
text-embedding-001 (Default)
- Dimensions: 768 (native)
- Best for: General purpose embeddings
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-001",
    embedding_dim=768
)
text-embedding-005
- Dimensions: 768 (native)
- Best for: Latest improvements
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-005",
    embedding_dim=768
)
gemini-embedding-001
- Dimensions: 768 (native)
- Best for: Backwards compatibility
- Limitation: Batch size of 1 only
config = GeminiEmbedderConfig(
    embedding_model="gemini-embedding-001",
    embedding_dim=768
)
embedder = GeminiEmbedder(
    config=config
    # batch_size automatically set to 1
)
The gemini-embedding-001 model has a strict API limit of 1 instance per request. The embedder automatically sets batch_size=1 for this model.
Methods
create()
Generate a single embedding vector.
vector = await embedder.create("Your text here")
print(len(vector)) # embedding_dim
Parameters:
input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]]): Input to embed
Returns: list[float] - Embedding vector
Raises:
ValueError: If no embeddings returned from API
create_batch()
Generate embeddings for multiple texts with automatic batching.
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
print(len(vectors)) # 3
Parameters:
input_data_list (list[str]): List of texts to embed
Returns: list[list[float]] - List of embedding vectors
Raises:
ValueError: If embeddings are empty or invalid
Exception: If batch processing fails
Batch Size Configuration
The embedder picks a batch size based on the model unless you pass one explicitly:
# Default: 100 for most models
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001")
)
print(embedder.batch_size)  # 100
# Automatic: 1 for gemini-embedding-001
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="gemini-embedding-001")
)
print(embedder.batch_size)  # 1
# Custom: Override default
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001"),
    batch_size=50  # Process 50 at a time
)
Logic:
if batch_size is None and self.config.embedding_model == 'gemini-embedding-001':
    self.batch_size = 1  # API limitation
elif batch_size is None:
    self.batch_size = 100  # Default
else:
    self.batch_size = batch_size  # User-specified
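To make the effect on request volume concrete, here is a minimal standalone sketch. `resolve_batch_size` and `request_count` are hypothetical helpers written for illustration and are not part of graphiti_core; the first mirrors the selection logic above, the second counts how many batch requests a list of inputs would need.

```python
from __future__ import annotations

import math

# Values taken from the logic above
SINGLE_INSTANCE_MODEL = "gemini-embedding-001"
DEFAULT_BATCH_SIZE = 100


def resolve_batch_size(embedding_model: str, batch_size: int | None = None) -> int:
    """Hypothetical helper mirroring the batch size selection shown above."""
    if batch_size is not None:
        return batch_size  # a user-specified value always wins
    if embedding_model == SINGLE_INSTANCE_MODEL:
        return 1
    return DEFAULT_BATCH_SIZE


def request_count(num_inputs: int, batch_size: int) -> int:
    """Number of batch requests needed to embed num_inputs texts."""
    return math.ceil(num_inputs / batch_size)


default_requests = request_count(1000, resolve_batch_size("text-embedding-001"))
single_requests = request_count(1000, resolve_batch_size("gemini-embedding-001"))
custom_requests = request_count(1000, resolve_batch_size("text-embedding-001", batch_size=50))
print(default_requests, single_requests, custom_requests)  # 10 1000 20
```

With a batch size of 1, the same 1,000 inputs take 1,000 requests, which is worth keeping in mind when planning around rate limits.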
Dimension Configuration
The embedder uses Gemini’s output_dimensionality parameter:
from google.genai import types
config = types.EmbedContentConfig(
    output_dimensionality=self.config.embedding_dim
)
result = await self.client.aio.models.embed_content(
    model=self.config.embedding_model,
    contents=[input_data],
    config=config
)
This allows flexible dimension sizes:
# 768 dimensions (native for text-embedding-001)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=768)
)
# 512 dimensions (reduced)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=512)
)
# 1024 dimensions (if supported by model)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=1024)
)
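Reduced-dimension vectors are not guaranteed to have unit length, so code that compares embeddings with dot products may want to normalize them first. The sketch below is a plain-Python illustration. `l2_normalize` is a hypothetical helper, not part of graphiti_core, and whether normalization is needed for a given model is an assumption to verify against Google's documentation.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit length; return a copy unchanged if it is all zeros."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


truncated = [3.0, 4.0]  # toy stand-in for a reduced-dimension embedding
unit = l2_normalize(truncated)
print(unit)  # [0.6, 0.8]
```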
Batch Processing with Fallback
The embedder sends inputs in batches of batch_size. If a batch request fails, it retries each item in that batch individually:
all_embeddings = []
# Process in batches of batch_size
for i in range(0, len(input_data_list), batch_size):
    batch = input_data_list[i : i + batch_size]
    try:
        # Try batch processing
        result = await self.client.aio.models.embed_content(
            model=self.config.embedding_model, contents=batch, config=config
        )
        all_embeddings.extend(emb.values for emb in result.embeddings)
    except Exception:
        # Fall back to individual processing
        for item in batch:
            result = await self.client.aio.models.embed_content(
                model=self.config.embedding_model, contents=[item], config=config
            )
            all_embeddings.append(result.embeddings[0].values)
This ensures reliability even when batch requests fail:
texts = ["Text 1", "Text 2", "Text 3", "Problematic text", "Text 5"]
# If batch fails, processes individually with logging
vectors = await embedder.create_batch(texts)
# Warning: "Batch embedding failed for batch 1, falling back to individual processing"
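The fallback behaviour can be tried without network access using a stand-in for the API. The following is a minimal sketch, not the library's implementation: `embed_with_fallback` and `flaky_embed` are hypothetical names, and the fake API simply rejects any request with more than two items.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def embed_with_fallback(texts, embed_fn, batch_size=100):
    """Embed texts in batches; on a batch failure, retry its items one at a time.

    embed_fn is a hypothetical async callable: list[str] -> list[list[float]].
    """
    all_embeddings = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        try:
            all_embeddings.extend(await embed_fn(batch))
        except Exception as exc:
            logger.warning("Batch %d failed (%s); retrying items individually",
                           i // batch_size + 1, exc)
            for item in batch:
                # A failure here is not caught and propagates to the caller
                all_embeddings.extend(await embed_fn([item]))
    return all_embeddings


calls = []  # records the size of every request sent to the fake API


async def flaky_embed(batch):
    """Stand-in for the API: rejects requests with more than two items."""
    calls.append(len(batch))
    if len(batch) > 2:
        raise RuntimeError("too many instances")
    return [[float(len(text))] for text in batch]


vectors = asyncio.run(
    embed_with_fallback(["a", "bb", "ccc", "dddd", "e"], flaky_embed, batch_size=3)
)
print(calls)    # [3, 1, 1, 1, 2]
print(vectors)  # [[1.0], [2.0], [3.0], [4.0], [1.0]]
```

The first batch of three fails and is retried as three single-item requests; the second batch of two succeeds in one request, and output order matches input order.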
Error Handling
Empty Embeddings
try:
    vector = await embedder.create("")
except ValueError as e:
    print(f"Error: {e}")
    # "No embeddings returned from Gemini API in create()"
Batch Processing Errors
import logging
logger = logging.getLogger(__name__)
try:
    vectors = await embedder.create_batch(texts)
except ValueError as e:
    # Individual item failed
    logger.error(f"Embedding error: {e}")
except Exception as e:
    # Batch processing failed entirely
    logger.error(f"Batch embedding failed: {e}")
Validation
The embedder validates API responses:
# Single embedding
if not result.embeddings or len(result.embeddings) == 0 or not result.embeddings[0].values:
    raise ValueError('No embeddings returned from Gemini API in create()')
return result.embeddings[0].values
# Batch embedding
if not result.embeddings or len(result.embeddings) == 0:
    raise Exception('No embeddings returned')
for embedding in result.embeddings:
    if not embedding.values:
        raise ValueError('Empty embedding values returned')
    all_embeddings.append(embedding.values)
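To see which check fires for which response shape, the batch checks can be exercised against stub objects. This is a sketch under assumptions: `extract_batch_values` is a hypothetical helper, and the `SimpleNamespace` stubs only imitate the two attributes (`embeddings`, `values`) used above rather than real API responses.

```python
from types import SimpleNamespace


def extract_batch_values(result) -> list[list[float]]:
    """Hypothetical helper applying the batch checks above to a response-like object."""
    if not result.embeddings:
        raise Exception("No embeddings returned")
    vectors = []
    for embedding in result.embeddings:
        if not embedding.values:
            raise ValueError("Empty embedding values returned")
        vectors.append(embedding.values)
    return vectors


# Stub responses exposing only the attributes the checks read
ok = SimpleNamespace(embeddings=[SimpleNamespace(values=[0.1, 0.2])])
no_embeddings = SimpleNamespace(embeddings=[])
empty_values = SimpleNamespace(embeddings=[SimpleNamespace(values=[])])

print(extract_batch_values(ok))  # [[0.1, 0.2]]

for bad in (no_embeddings, empty_values):
    try:
        extract_batch_values(bad)
    except Exception as exc:
        print(type(exc).__name__, exc)
```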
Example: Large Dataset Processing
import logging
from tqdm.asyncio import tqdm
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
logging.basicConfig(level=logging.INFO)
# Initialize with custom batch size
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    ),
    batch_size=100
)
# Large dataset
large_dataset = [f"Document {i}" for i in range(1000)]
# Process in batches with progress tracking
async def embed_with_progress(texts, chunk_size=100):
    all_vectors = []
    # One chunk per progress step; create_batch splits a chunk further if needed
    for i in tqdm(range(0, len(texts), chunk_size), desc="Embedding"):
        chunk = texts[i:i + chunk_size]
        vectors = await embedder.create_batch(chunk)
        all_vectors.extend(vectors)
    return all_vectors
vectors = await embed_with_progress(large_dataset)
print(f"Generated {len(vectors)} embeddings")
Example: Semantic Search
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
# Initialize embedder
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    )
)
# Corpus
documents = [
    "Python is a programming language.",
    "Machine learning uses neural networks.",
    "Data science involves statistical analysis.",
    "Cloud computing provides scalable infrastructure."
]
# Generate embeddings
doc_vectors = await embedder.create_batch(documents)
# Query
query = "What is machine learning?"
query_vector = await embedder.create(query)
# Compute similarities between the query and every document
similarities = cosine_similarity([query_vector], doc_vectors)[0]
# Find the three best matches, highest similarity first
top_indices = np.argsort(similarities)[::-1][:3]
print(f"Query: {query}\n")
for idx in top_indices:
    print(f"Similarity: {similarities[idx]:.4f}")
    print(f"Document: {documents[idx]}\n")
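The ranking step does not depend on scikit-learn. As a rough sketch of what the similarity computation does, the version below uses only the standard library; `cosine` and `top_k` are hypothetical helpers, and the 2-D vectors are toy stand-ins for real embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k=3):
    """Return (index, score) pairs for the k most similar documents, best first."""
    scored = [(cosine(query_vector, vec), idx) for idx, vec in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]


# Toy 2-D vectors standing in for real embeddings
toy_docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking = top_k([1.0, 0.0], toy_docs, k=2)
print([idx for idx, _ in ranking])  # [0, 2]
```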
Use with Graphiti
from datetime import datetime, timezone
from graphiti_core import Graphiti
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-google-api-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    ),
    batch_size=100
)
graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)
# Gemini embeddings used automatically
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1",
    reference_time=datetime.now(timezone.utc)
)
Best Practices
- Use appropriate batch size: Balance between efficiency and API limits
- Choose text-embedding-001 for general use: Good balance of quality and speed
- Set dimensions to native 768: Avoid unnecessary computation
- Monitor API quotas: Gemini has rate limits
- Use batch processing: Always prefer create_batch() for multiple inputs
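One common way to stay within rate limits is to retry failed calls with exponential backoff. The sketch below is a generic pattern, not something GeminiEmbedder is documented to do: `with_backoff` is a hypothetical helper, it retries on any exception for simplicity, and the demo swaps in a recording function for `asyncio.sleep` so it runs instantly.

```python
import asyncio
import random


async def with_backoff(call, max_attempts=5, base_delay=1.0, sleep=asyncio.sleep):
    """Retry an async callable with exponential backoff plus jitter.

    Retries on any Exception for simplicity; real code should catch only the
    rate-limit error type raised by the client library in use.
    """
    for attempt in range(max_attempts):
        try:
            return await call()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the last error
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            await sleep(delay)


state = {"attempts": 0}
delays = []


async def flaky_call():
    """Stand-in for an API call that fails twice before succeeding."""
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


async def record_sleep(delay):
    """Record the requested delay instead of actually sleeping."""
    delays.append(delay)


outcome = asyncio.run(with_backoff(flaky_call, base_delay=0.1, sleep=record_sleep))
print(outcome, state["attempts"])  # ok 3
```

In real use the wrapped call would be something like `lambda: embedder.create_batch(texts)`, awaited from inside an async function.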
Model Comparison
| Model | Dims | Batch Size | Best For |
|---|---|---|---|
| text-embedding-001 | 768 | 100 | General purpose |
| text-embedding-005 | 768 | 100 | Latest quality |
| gemini-embedding-001 | 768 | 1 | Legacy support |
Troubleshooting
Batch Size Too Large
# Error: Batch size exceeds API limit
# Solution: Reduce batch_size
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001"),
    batch_size=50  # Reduce from 100
)
gemini-embedding-001 Batch Errors
# Model only supports batch_size=1
# Solution: Use automatic configuration
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="gemini-embedding-001")
    # batch_size automatically set to 1
)
Empty Embeddings
# Check input validity
if not input_data or input_data.strip() == "":
    print("Error: Empty input")
# Ensure model is available
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-001"  # Use valid model
)
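For batch calls, blank inputs can be filtered out up front while keeping track of where each remaining text came from. This is a minimal sketch with hypothetical helpers (`split_blank_inputs`, `restore_positions`); the vectors are placeholders for what `embedder.create_batch(cleaned)` would return.

```python
def split_blank_inputs(texts: list[str]) -> tuple[list[int], list[str]]:
    """Return the positions and values of inputs that contain non-whitespace text."""
    kept = [(i, t) for i, t in enumerate(texts) if t and t.strip()]
    return [i for i, _ in kept], [t for _, t in kept]


def restore_positions(total: int, indices: list[int], vectors: list) -> list:
    """Place each vector back at its original position; skipped inputs stay None."""
    aligned = [None] * total
    for i, vec in zip(indices, vectors):
        aligned[i] = vec
    return aligned


texts = ["First document", "", "   ", "Second document"]
indices, cleaned = split_blank_inputs(texts)
print(indices, cleaned)  # [0, 3] ['First document', 'Second document']

# Placeholder vectors; in real use these would come from embedder.create_batch(cleaned)
fake_vectors = [[0.1], [0.2]]
aligned = restore_positions(len(texts), indices, fake_vectors)
print(aligned)  # [[0.1], None, None, [0.2]]
```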
API Reference
class GeminiEmbedder(EmbedderClient):
    """Google Gemini Embedder Client"""
    def __init__(
        self,
        config: GeminiEmbedderConfig | None = None,
        client: genai.Client | None = None,
        batch_size: int | None = None
    ):
        """
        Initialize the GeminiEmbedder.
        Args:
            config: Configuration with api_key, model, and dimensions
            client: Optional pre-configured genai.Client
            batch_size: Optional batch size override
        """
        ...
    async def create(
        self,
        input_data: str | list[str] | Iterable[int] | Iterable[Iterable[int]]
    ) -> list[float]:
        """
        Create embeddings for input data.
        Args:
            input_data: Text or token sequence to embed
        Returns:
            Embedding vector
        Raises:
            ValueError: If no embeddings returned
        """
        ...
    async def create_batch(
        self,
        input_data_list: list[str]
    ) -> list[list[float]]:
        """
        Create embeddings for multiple inputs with automatic batching.
        Args:
            input_data_list: List of texts to embed
        Returns:
            List of embedding vectors
        Raises:
            ValueError: If embeddings are empty
            Exception: If batch processing fails
        """
        ...