Overview

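The GeminiEmbedder generates embeddings with Google's Gemini embedding models. It supports configurable output dimensionality and batch size, splits large inputs into batches automatically, and falls back to per-item requests when a batch request fails.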

Installation

pip install graphiti-core[google-genai]

Basic Usage

from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

# Initialize embedder
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-google-api-key",
        embedding_model="text-embedding-001",
        embedding_dim=1024
    )
)

# Single embedding (run these calls inside an async function or an async-aware REPL)
vector = await embedder.create("Hello, world!")
print(len(vector))  # 1024

# Batch embeddings
texts = [
    "First document",
    "Second document",
    "Third document"
]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3

Configuration

GeminiEmbedderConfig

embedding_model (str, default: "text-embedding-001")
Gemini embedding model to use. Options:
  • text-embedding-001 (default)
  • text-embedding-005
  • gemini-embedding-001

embedding_dim (int, default: 1024)
Output embedding dimensionality, passed to the API as output_dimensionality.

api_key (str | None, default: None)
Google API key. If not provided, the GOOGLE_API_KEY environment variable is used.

Constructor

config (GeminiEmbedderConfig | None, default: None)
Configuration object. If None, a default GeminiEmbedderConfig is created.

client (genai.Client | None, default: None)
Optional pre-configured genai.Client instance. If not provided, one is created from config.api_key.

batch_size (int | None, default: None)
Maximum number of inputs per API request. If None, defaults to:
  • 1 for gemini-embedding-001 (API limitation)
  • 100 for all other models

Supported Models

text-embedding-001 (Default)

  • Dimensions: 768 (native)
  • Best for: General purpose embeddings
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-001",
    embedding_dim=768
)

text-embedding-005

  • Dimensions: 768 (native)
  • Best for: Latest improvements
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-005",
    embedding_dim=768
)

gemini-embedding-001

  • Dimensions: 3072 (native); smaller outputs such as 768 via embedding_dim
  • Best for: Backwards compatibility
  • Limitation: Batch size of 1 only
config = GeminiEmbedderConfig(
    embedding_model="gemini-embedding-001",
    embedding_dim=768
)

embedder = GeminiEmbedder(
    config=config
    # batch_size automatically set to 1
)
The gemini-embedding-001 model has a strict API limit of 1 instance per request. The embedder automatically sets batch_size=1 for this model.
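Because each request carries at most batch_size inputs, create_batch() makes ceil(n / batch_size) embed_content calls for n texts when no fallback occurs. The hypothetical helper below (not part of graphiti_core) makes the cost of batch_size=1 concrete; it assumes every batch succeeds, since a failed batch adds one extra request per item.

```python
import math


def expected_request_count(num_texts: int, batch_size: int) -> int:
    """Hypothetical helper: embed_content calls needed for num_texts inputs,
    assuming every batch succeeds (no per-item fallback)."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return math.ceil(num_texts / batch_size)


# 1,000 documents with the default batch size of 100
print(expected_request_count(1000, 100))  # 10
# The same documents with gemini-embedding-001 (batch_size=1)
print(expected_request_count(1000, 1))  # 1000
```

For large corpora this is the main practical difference between gemini-embedding-001 and the other models: the same workload needs 100 times as many requests, which matters under rate limits.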

Methods

create()

Generate a single embedding vector.
vector = await embedder.create("Your text here")
print(len(vector))  # embedding_dim
Parameters:
  • input_data (str | list[str] | Iterable[int] | Iterable[Iterable[int]]): Input to embed
Returns: list[float], the embedding vector
Raises:
  • ValueError: If no embeddings are returned from the API

create_batch()

Generate embeddings for multiple texts with automatic batching.
texts = ["Text 1", "Text 2", "Text 3"]
vectors = await embedder.create_batch(texts)
print(len(vectors))  # 3
Parameters:
  • input_data_list (list[str]): List of texts to embed
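Returns: list[list[float]], one embedding vector per input, in input order ([] for an empty list)
Raises:
  • ValueError: If an item embedded individually after a failed batch comes back empty
  • Exception: If an individual request made during fallback fails (a failed batch on its own triggers per-item fallback rather than raising)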
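create_batch() walks the input list in consecutive slices of batch_size and appends results in the same order, so vectors[i] corresponds to texts[i]. The slicing can be sketched as a standalone generator; iter_batches is a hypothetical name used only for illustration.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start : start + batch_size]


texts = [f"Text {i}" for i in range(1, 6)]
for batch in iter_batches(texts, 2):
    print(batch)
# ['Text 1', 'Text 2']
# ['Text 3', 'Text 4']
# ['Text 5']
```

Only the last slice can be shorter than batch_size, and concatenating the slices reproduces the input exactly, which is what keeps the returned vectors aligned with their texts.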

Batch Size Configuration

The batch size is resolved when the embedder is constructed:
# Default: 100 for most models
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001")
)
print(embedder.batch_size)  # 100

# Automatic: 1 for gemini-embedding-001
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="gemini-embedding-001")
)
print(embedder.batch_size)  # 1

# Custom: Override default
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001"),
    batch_size=50  # Process 50 at a time
)
Resolution logic in the constructor (excerpt):
if batch_size is None and self.config.embedding_model == 'gemini-embedding-001':
    self.batch_size = 1  # API limitation
elif batch_size is None:
    self.batch_size = 100  # Default
else:
    self.batch_size = batch_size  # User-specified

Dimension Configuration

Each request passes config.embedding_dim to Gemini as output_dimensionality (excerpt from create()):
from google.genai import types

config = types.EmbedContentConfig(
    output_dimensionality=self.config.embedding_dim
)

result = await self.client.aio.models.embed_content(
    model=self.config.embedding_model,
    contents=[input_data],
    config=config
)
This allows flexible dimension sizes:
# 768 dimensions (native for text-embedding-001)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=768)
)

# 512 dimensions (reduced)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=512)
)

# 1024 dimensions (if supported by model)
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_dim=1024)
)
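When you request fewer dimensions than a model's native size, the API returns a truncated vector that, per Google's embeddings documentation, may not be unit-length. If your vector store assumes normalized vectors (for example, dot-product similarity), you can L2-normalize them yourself. A minimal pure-Python sketch; l2_normalize is a hypothetical helper, not part of graphiti_core.

```python
import math


def l2_normalize(vector: list[float]) -> list[float]:
    """Scale a vector to unit L2 norm; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return list(vector)
    return [x / norm for x in vector]


# e.g. applied to a reduced-dimension vector returned by embedder.create()
unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Cosine similarity is unaffected by this scaling, so normalization only matters when the downstream comparison uses raw dot products or Euclidean distance.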

Batch Processing with Fallback

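create_batch() splits the input into batches of batch_size and falls back to one request per item when a batch request fails (simplified excerpt):
# Process in batches of batch_size
for i in range(0, len(input_data_list), batch_size):
    batch = input_data_list[i : i + batch_size]

    try:
        # Try the whole batch in a single request
        result = await self.client.aio.models.embed_content(
            model=self.config.embedding_model, contents=batch, config=config
        )
        all_embeddings.extend(embedding.values for embedding in result.embeddings)
    except Exception as e:
        # Fall back to one request per item; failures here propagate to the caller
        logger.warning(f'Batch embedding failed for batch {i // batch_size + 1}, falling back to individual processing: {e}')
        for item in batch:
            result = await self.client.aio.models.embed_content(
                model=self.config.embedding_model, contents=[item], config=config
            )
            all_embeddings.append(result.embeddings[0].values)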
A failed batch request therefore costs extra requests but does not abort the call, as long as every item succeeds on its own:
texts = ["Text 1", "Text 2", "Text 3", "Problematic text", "Text 5"]

# If the batch request fails, each item is retried individually and a warning is logged
vectors = await embedder.create_batch(texts)
# Warning: "Batch embedding failed for batch 1, falling back to individual processing: <error>"
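To observe the fallback behaviour without calling the API, the loop can be reproduced against a stand-in client. This is a self-contained sketch: FakeEmbedClient and embed_with_fallback are hypothetical names, and the fake returns one-dimensional "vectors" derived from text length purely for illustration.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakeEmbedClient:
    """Hypothetical stand-in for the Gemini client. It rejects any multi-item
    batch containing the word "bad" and records every request it receives."""

    def __init__(self) -> None:
        self.calls: list[list[str]] = []

    async def embed(self, contents: list[str]) -> list[list[float]]:
        self.calls.append(list(contents))
        if len(contents) > 1 and any("bad" in text for text in contents):
            raise RuntimeError("simulated batch failure")
        # One 1-D "vector" per input, derived from the text length
        return [[float(len(text))] for text in contents]


async def embed_with_fallback(
    client: FakeEmbedClient, texts: list[str], batch_size: int
) -> list[list[float]]:
    """Mirror of the batch-then-fallback loop: results stay in input order."""
    all_embeddings: list[list[float]] = []
    for i in range(0, len(texts), batch_size):
        batch = texts[i : i + batch_size]
        try:
            all_embeddings.extend(await client.embed(batch))
        except Exception as e:
            logger.warning(
                "Batch %d failed, falling back to individual processing: %s",
                i // batch_size + 1,
                e,
            )
            for item in batch:
                # An error here is not caught, so it propagates to the caller
                all_embeddings.extend(await client.embed([item]))
    return all_embeddings


client = FakeEmbedClient()
texts = ["Text 1", "Text 2", "a bad text", "Text 4", "Text 5"]
vectors = asyncio.run(embed_with_fallback(client, texts, batch_size=2))
print(vectors)  # [[6.0], [6.0], [10.0], [6.0], [6.0]]
print(len(client.calls))  # 5 requests: 3 batches + 2 individual retries
```

In a notebook that already runs an event loop, replace asyncio.run(...) with await.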

Error Handling

Empty Embeddings

try:
    vector = await embedder.create("")
except ValueError as e:
    print(f"Error: {e}")
    # "No embeddings returned from Gemini API in create()"

Batch Processing Errors

import logging

logger = logging.getLogger(__name__)

try:
    vectors = await embedder.create_batch(texts)
except ValueError as e:
    # An item retried individually after a failed batch returned no embedding values
    logger.error(f"Embedding error: {e}")
except Exception as e:
    # An individual fallback request failed (for example, an API or network error)
    logger.error(f"Batch embedding failed: {e}")

Validation

The embedder validates API responses before returning them (excerpts):
# In create(): single embedding
if not result.embeddings or len(result.embeddings) == 0 or not result.embeddings[0].values:
    raise ValueError('No embeddings returned from Gemini API in create()')

return result.embeddings[0].values

# In create_batch(): these checks run inside the per-batch try block,
# so a failure here triggers the per-item fallback
if not result.embeddings or len(result.embeddings) == 0:
    raise Exception('No embeddings returned')

for embedding in result.embeddings:
    if not embedding.values:
        raise ValueError('Empty embedding values returned')
    all_embeddings.append(embedding.values)

Example: Large Dataset Processing

from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
from tqdm.asyncio import tqdm
import logging

logging.basicConfig(level=logging.INFO)

# Initialize with custom batch size
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    ),
    batch_size=100
)

# Large dataset
large_dataset = [f"Document {i}" for i in range(1000)]

# Process in batches with progress tracking

async def embed_with_progress(texts):
    all_vectors = []
    
    # Process in chunks of 100
    for i in tqdm(range(0, len(texts), 100), desc="Embedding"):
        chunk = texts[i:i + 100]
        vectors = await embedder.create_batch(chunk)
        all_vectors.extend(vectors)
    
    return all_vectors

vectors = await embed_with_progress(large_dataset)
print(f"Generated {len(vectors)} embeddings")

Example: Semantic Search

from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Initialize embedder
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    )
)

# Corpus
documents = [
    "Python is a programming language.",
    "Machine learning uses neural networks.",
    "Data science involves statistical analysis.",
    "Cloud computing provides scalable infrastructure."
]

# Generate embeddings
doc_vectors = await embedder.create_batch(documents)

# Query
query = "What is machine learning?"
query_vector = await embedder.create(query)

# Compute similarities
similarities = cosine_similarity([query_vector], doc_vectors)[0]

# Find top matches
top_indices = np.argsort(similarities)[::-1][:3]

print(f"Query: {query}\n")
for idx in top_indices:
    print(f"Similarity: {similarities[idx]:.4f}")
    print(f"Document: {documents[idx]}\n")
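If you would rather not depend on scikit-learn and NumPy for a handful of vectors, the same ranking can be computed in pure Python. A minimal sketch; cosine_similarity_py and top_k are hypothetical helpers, and the toy 2-D vectors stand in for real embeddings.

```python
import math


def cosine_similarity_py(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vector: list[float], doc_vectors: list[list[float]], k: int = 3
) -> list[tuple[int, float]]:
    """Return (index, similarity) pairs for the k most similar documents, best first."""
    scored = [(i, cosine_similarity_py(query_vector, v)) for i, v in enumerate(doc_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors; in practice pass query_vector and doc_vectors from the example above
docs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(top_k([1.0, 0.1], docs, k=2))
```

This scales linearly with the number of documents, so it suits small corpora; for larger ones, the vector index in your graph database (as used by Graphiti) is the better fit.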

Use with Graphiti

from datetime import datetime, timezone

from graphiti_core import Graphiti
from graphiti_core.embedder.gemini import GeminiEmbedder, GeminiEmbedderConfig

embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(
        api_key="your-google-api-key",
        embedding_model="text-embedding-001",
        embedding_dim=768
    ),
    batch_size=100
)

# Only the embedder is overridden here; the LLM client and cross-encoder use Graphiti's defaults
graphiti = Graphiti(
    uri="neo4j://localhost:7687",
    user="neo4j",
    password="password",
    embedder=embedder
)

# Gemini embeddings are used automatically when episodes are ingested
await graphiti.add_episode(
    name="episode1",
    episode_body="Your text here...",
    source_description="source1",
    reference_time=datetime.now(timezone.utc)  # add_episode requires a reference timestamp
)

Performance Tips

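  1. Tune batch size: larger batches mean fewer requests, but each request must stay within the API's per-request limits
  2. Choose text-embedding-001 for general use: it is the default, general-purpose model
  3. Match embedding_dim to your needs: 768 is the native size of text-embedding-001 and text-embedding-005, and requesting fewer dimensions reduces storage and similarity-search cost
  4. Monitor API quotas: the Gemini API enforces rate limits
  5. Use batch processing: prefer create_batch() over repeated create() calls for multiple inputs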
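Beyond its per-item fallback, create_batch() does not retry, so a rate-limited fallback request surfaces as an exception. One way to absorb transient quota errors is to wrap the call in retries with exponential backoff. This is a sketch under assumptions: retry_with_backoff is a hypothetical helper, it retries on any Exception (narrow this to the error types your google-genai version raises for quota errors), and the flaky coroutine only simulates failures.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> T:
    """Await operation(), retrying failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ... by default
            await asyncio.sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# With the embedder: vectors = await retry_with_backoff(lambda: embedder.create_batch(texts))
# Offline demo: an operation that fails twice before succeeding
attempts = 0


async def flaky_operation() -> str:
    global attempts
    attempts += 1
    if attempts < 3:
        raise RuntimeError("simulated rate-limit error")
    return "ok"


print(asyncio.run(retry_with_backoff(flaky_operation, base_delay=0.01)))  # ok
```

The operation is passed as a zero-argument callable rather than a coroutine object because a coroutine can only be awaited once; each retry needs a fresh one.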

Model Comparison

| Model | Native dims | Default batch size | Best for |
|---|---|---|---|
| text-embedding-001 | 768 | 100 | General purpose |
| text-embedding-005 | 768 | 100 | Latest quality |
| gemini-embedding-001 | 3072 | 1 | Legacy support |

Troubleshooting

Batch Size Too Large

# Error: Batch size exceeds API limit
# Solution: Reduce batch_size
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="text-embedding-001"),
    batch_size=50  # Reduce from 100
)

gemini-embedding-001 Batch Errors

# Model only supports batch_size=1
# Solution: Use automatic configuration
embedder = GeminiEmbedder(
    config=GeminiEmbedderConfig(embedding_model="gemini-embedding-001")
    # batch_size automatically set to 1
)

Empty Embeddings

# Check input validity before calling the embedder
input_data = "   "  # example input
if not input_data.strip():
    print("Error: Empty input")
    
# Ensure model is available
config = GeminiEmbedderConfig(
    embedding_model="text-embedding-001"  # Use valid model
)
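If an input list may contain empty or whitespace-only strings, one option is to embed only the non-blank texts and map the results back to their original positions. A sketch with hypothetical names: embed_non_blank returns None for blank entries, and embed_fn stands in for embedder.create_batch (fake_embed is used here so the example runs offline).

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def embed_non_blank(
    texts: list[str],
    embed_fn: Callable[[list[str]], Awaitable[list[list[float]]]],
) -> list[Optional[list[float]]]:
    """Embed only non-blank texts; return results aligned with texts (None for blanks)."""
    positions = [i for i, text in enumerate(texts) if text.strip()]
    vectors = await embed_fn([texts[i] for i in positions]) if positions else []
    if len(vectors) != len(positions):
        raise ValueError("embed_fn returned a different number of vectors than inputs")
    aligned: list[Optional[list[float]]] = [None] * len(texts)
    for position, vector in zip(positions, vectors):
        aligned[position] = vector
    return aligned


async def fake_embed(batch: list[str]) -> list[list[float]]:
    # Stand-in for embedder.create_batch(): one 1-D vector per input
    return [[float(len(text))] for text in batch]


# With the real embedder: aligned = await embed_non_blank(texts, embedder.create_batch)
result = asyncio.run(embed_non_blank(["alpha", "", "  ", "beta"], fake_embed))
print(result)  # [[5.0], None, None, [4.0]]
```

Keeping the output aligned with the input means callers can zip results with their original records without tracking which entries were skipped.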

API Reference

class GeminiEmbedder(EmbedderClient):
    """Google Gemini Embedder Client"""
    
    def __init__(
        self,
        config: GeminiEmbedderConfig | None = None,
        client: genai.Client | None = None,
        batch_size: int | None = None
    ):
        """
        Initialize the GeminiEmbedder.
        
        Args:
            config: Configuration with api_key, model, and dimensions
            client: Optional pre-configured genai.Client
            batch_size: Optional batch size override
        """
        ...
    
    async def create(
        self,
        input_data: str | list[str] | Iterable[int] | Iterable[Iterable[int]]
    ) -> list[float]:
        """
        Create embeddings for input data.
        
        Args:
            input_data: Text or token sequence to embed
            
        Returns:
            Embedding vector
            
        Raises:
            ValueError: If no embeddings returned
        """
        ...
    
    async def create_batch(
        self,
        input_data_list: list[str]
    ) -> list[list[float]]:
        """
        Create embeddings for multiple inputs with automatic batching.
        
        Args:
            input_data_list: List of texts to embed
            
        Returns:
            List of embedding vectors
            
        Raises:
            ValueError: If embeddings are empty
            Exception: If batch processing fails
        """
        ...
