The TurboPuffer plugin provides hybrid search capabilities combining vector (semantic) and BM25 (keyword) search for high-quality RAG.
Installation
uv add "vision-agents[turbopuffer]"
Authentication
Set your API keys in the environment:
export TURBO_PUFFER_KEY=your_turbopuffer_api_key
export GOOGLE_API_KEY=your_google_api_key # For Gemini embeddings
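Since a missing key only surfaces later as an authentication error, it can be worth checking the environment up front. A minimal sketch (the variable names match the export lines above; the helper name is hypothetical):

```python
import os

def missing_env(names=("TURBO_PUFFER_KEY", "GOOGLE_API_KEY")):
    """Return the required environment variables that are not set."""
    return [name for name in names if not os.environ.get(name)]

missing = missing_env()
if missing:
    print("Set these before running:", ", ".join(missing))
```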
Components
TurboPufferRAG
Hybrid search RAG implementation:
from vision_agents.plugins import turbopuffer

# Initialize RAG
rag = turbopuffer.TurboPufferRAG(namespace="my-knowledge")

# Add documents from directory
await rag.add_directory("./knowledge")

# Search with hybrid mode (default)
results = await rag.search("How does the chat API work?")

# Vector-only search
results = await rag.search(
    "How does the chat API work?",
    mode="vector"
)

# BM25-only search
results = await rag.search(
    "chat API pricing",
    mode="bm25"
)
Parameters

namespace
string
TurboPuffer namespace for storing vectors. Used to organize different knowledge bases.

embedding_model
string
default:"models/gemini-embedding-001"
Gemini embedding model for vector generation.

chunk_size
int
Size of text chunks for splitting documents.

chunk_overlap
int
Overlap between consecutive chunks for context continuity.

region
string
default:"gcp-us-central1"
TurboPuffer region for data storage.
Usage Examples
Quick Start
from vision_agents.plugins import turbopuffer
# Create RAG and add knowledge
rag = await turbopuffer.create_rag(
    namespace="product-knowledge",
    knowledge_dir="./knowledge"
)

# Search
results = await rag.search("What features are available?")
for result in results:
    print(result.content)
With Custom Configuration
from vision_agents.plugins import turbopuffer
rag = turbopuffer.TurboPufferRAG(
    namespace="support-docs",
    embedding_model="models/gemini-embedding-001",
    chunk_size=5000,
    chunk_overlap=100,
    region="gcp-us-central1"
)

await rag.add_directory("./docs")
Different Search Modes
# Hybrid search (best for most queries)
results = await rag.search(
    "How do I authenticate?",
    mode="hybrid"  # default
)

# Vector search (for semantic similarity)
results = await rag.search(
    "authentication methods",
    mode="vector"
)

# BM25 search (for keyword matching)
results = await rag.search(
    "API key authentication",
    mode="bm25"
)
Use with Agent
from vision_agents.core import Agent, User
from vision_agents.plugins import turbopuffer, openai, getstream
# Create RAG
rag = await turbopuffer.create_rag(
    namespace="docs",
    knowledge_dir="./knowledge"
)

# Create agent with RAG
agent = Agent(
    edge=getstream.Edge(),
    agent_user=User(name="Support Agent"),
    instructions="Answer questions using the knowledge base.",
    llm=openai.LLM("gpt-4.1"),
    rag=rag
)
Search Modes
Hybrid (Default)
Combines vector and BM25 search using Reciprocal Rank Fusion:
- Best for general queries
- Balances semantic and keyword matching
- Recommended for most use cases
results = await rag.search("query", mode="hybrid")
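The fusion step can be sketched in a few lines. This is an illustrative Reciprocal Rank Fusion implementation, not the plugin's internal code; k=60 is the conventional constant from the RRF literature, and the plugin's actual fusion parameters are not documented here:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one fused ranking."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for every document it ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # ranked by semantic similarity
bm25_hits = ["doc_b", "doc_d", "doc_a"]    # ranked by keyword score
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# → ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents ranked highly by both strategies (doc_b here) rise to the top, which is why hybrid mode balances semantic and keyword matching.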
Vector Only
Semantic similarity search:
- Best for conceptual queries
- Finds related content even with different wording
- Good for “fuzzy” searches
results = await rag.search("query", mode="vector")
BM25 Only
Keyword-based search:
- Best for exact term matching
- Good for technical queries with specific terms
- Fast and precise for known keywords
results = await rag.search("query", mode="bm25")
Adding Documents
From Directory
# Add all text files from directory
await rag.add_directory("./knowledge")
# Supports: .txt, .md, .pdf, and more
From Individual Files
await rag.add_file("./docs/getting-started.md")
await rag.add_file("./docs/api-reference.pdf")
From Text Directly
await rag.add_text(
    "API keys can be generated from the dashboard.",
    metadata={"source": "manual_entry"}
)
Cache Warming
For low-latency queries, warm the cache:
# Warm cache for common queries
await rag.warm_cache([
    "How do I get started?",
    "What are the pricing plans?",
    "How do I authenticate?"
])
Configuration Options
Chunk Size
Controls document splitting:
# Smaller chunks (more precise, more vectors)
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    chunk_size=2000,
    chunk_overlap=200
)

# Larger chunks (more context, fewer vectors)
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    chunk_size=10000,
    chunk_overlap=500
)
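The effect of the two settings is easiest to see with a toy splitter. This character-based sketch only illustrates how size and overlap interact; the plugin itself delegates chunking to langchain-text-splitters:

```python
def chunk_text(text, chunk_size, chunk_overlap):
    """Split text into chunk_size-character pieces sharing chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "abcdefghij" * 10  # 100 characters
chunks = chunk_text(text, chunk_size=40, chunk_overlap=10)
print(len(chunks))                        # → 4
print(chunks[0][-10:] == chunks[1][:10])  # → True (the shared overlap)
```

Larger overlap preserves context across chunk boundaries at the cost of storing and embedding more redundant text.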
Embedding Model
Use different Gemini embedding models:
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    embedding_model="models/gemini-embedding-001"  # Default
)
Region Selection
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    region="gcp-us-central1"  # Default
)
Environment Variables
# TurboPuffer
TURBO_PUFFER_KEY=your_turbopuffer_api_key_here
# Google API (for embeddings)
GOOGLE_API_KEY=your_google_api_key_here
Features
- Hybrid Search: Combines vector and BM25 for best results
- Reciprocal Rank Fusion: Intelligently merges results from multiple strategies
- Gemini Embeddings: High-quality semantic vectors
- Fast Queries: Low-latency search with cache warming
- Automatic Chunking: Smart document splitting with overlap
- Multiple Formats: Supports text, markdown, PDF, and more
For Speed
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    chunk_size=5000,  # Fewer chunks
    chunk_overlap=100
)

# Use BM25 for fast keyword search
results = await rag.search("query", mode="bm25")
For Quality
rag = turbopuffer.TurboPufferRAG(
    namespace="docs",
    chunk_size=10000,  # More context
    chunk_overlap=500
)

# Use hybrid search
results = await rag.search("query", mode="hybrid")
Dependencies
- turbopuffer - TurboPuffer vector database client
- langchain-google-genai - Gemini embeddings
- langchain-text-splitters - Text chunking utilities