ThinkEx uses Google’s Gemini models via the Vercel AI SDK Gateway for intelligent assistance and content processing.
Primary Models
Gemini 2.5 Flash
Model ID: google/gemini-2.5-flash (default)
Google’s fast general-purpose model, optimized for speed and efficiency.
Characteristics:
- Speed: Very fast response times
- Context Window: 1M tokens
- Multimodal: Text, images, video, audio
- Thinking: Dynamic reasoning budget
- Grounding: Google Search integration
Best For:
- General chat conversations
- Quick content analysis
- Real-time assistance
- Web search synthesis
Configuration:
const result = await streamText({
  model: gateway("google/gemini-2.5-flash"),
  temperature: 1.0,
  providerOptions: {
    google: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  },
});
Gemini 2.5 Flash Lite
Model ID: google/gemini-2.5-flash-lite
Lightweight version optimized for simple tasks.
Characteristics:
- Speed: Fastest response times
- Context Window: 1M tokens
- Cost: Most economical
- Multimodal: Text, images, video, audio
Best For:
- File processing and analysis
- Web search queries
- Simple content extraction
- Background processing tasks
Usage in ThinkEx:
// Web search tool
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: `Search for: ${query}`,
});

// File analysis
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Analyze this file..." },
      { type: "file", data: fileUrl, mediaType: "application/pdf" },
    ],
  }],
});
Gemini 3 Flash Preview
Model ID: google/gemini-3-flash-preview
Next-generation Gemini model with enhanced reasoning.
Characteristics:
- Thinking: Explicit thinking levels (minimal, standard, deep)
- Context Window: 1M+ tokens
- Reasoning: Enhanced multi-step reasoning
- Multimodal: Advanced vision and audio understanding
Best For:
- Complex problem solving
- Multi-step reasoning tasks
- Advanced content analysis
- Research and synthesis
Configuration:
providerOptions: {
  google: {
    thinkingConfig: {
      includeThoughts: true,
      thinkingLevel: "minimal", // "minimal" | "standard" | "deep"
    },
  },
}
Thinking Levels:
- minimal: Quick reasoning for simple tasks
- standard: Balanced reasoning for most tasks
- deep: Extended reasoning for complex problems
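The three-level scheme above can be illustrated with a small helper that maps a rough task-complexity label to a thinkingLevel value. This is a hypothetical sketch (the name pickThinkingLevel and the task labels are not part of ThinkEx); how ThinkEx actually selects a level may differ:

```typescript
// Hypothetical sketch: choose a thinkingLevel for Gemini 3 Flash Preview
// from a rough task-complexity label. Names are illustrative only.
type ThinkingLevel = "minimal" | "standard" | "deep";

function pickThinkingLevel(task: "simple" | "general" | "complex"): ThinkingLevel {
  switch (task) {
    case "simple":
      return "minimal"; // quick reasoning for simple tasks
    case "general":
      return "standard"; // balanced default for most tasks
    case "complex":
      return "deep"; // extended reasoning for complex problems
  }
}
```

The returned value would be passed as `thinkingLevel` inside `providerOptions.google.thinkingConfig`.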
Model Selection
The chat API accepts a modelId parameter:
POST /api/chat
{
  "modelId": "google/gemini-2.5-flash",
  "messages": [...],
  ...
}
Auto-Prefixing:
If you provide a model ID without a provider prefix (e.g., gemini-2.5-flash), it’s automatically prefixed with google/:
// These are equivalent:
"gemini-2.5-flash" → "google/gemini-2.5-flash"
Default Model:
If no modelId is specified, the default is google/gemini-2.5-flash.
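The auto-prefixing and default-model rules can be sketched as a small helper. The function name normalizeModelId is hypothetical; ThinkEx's actual implementation may differ:

```typescript
// Hypothetical sketch of the model-ID normalization described above:
// missing IDs fall back to the default, unprefixed IDs get "google/",
// and provider-qualified IDs pass through unchanged.
const DEFAULT_MODEL_ID = "google/gemini-2.5-flash";

function normalizeModelId(modelId?: string): string {
  if (!modelId) return DEFAULT_MODEL_ID; // no modelId → default model
  if (modelId.includes("/")) return modelId; // already provider-qualified
  return `google/${modelId}`; // auto-prefix with google/
}
```

For example, `normalizeModelId("gemini-2.5-flash")` yields `"google/gemini-2.5-flash"`.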
Model Capabilities
Multimodal Support
All Gemini models support multiple content types:
Text:
{ type: "text", text: "Analyze this content..." }
Images:
{
  type: "file",
  data: imageUrl, // or base64 data URL
  mediaType: "image/jpeg",
  filename: "photo.jpg",
}
Videos:
{
  type: "file",
  data: "https://youtube.com/watch?v=...",
  mediaType: "video/mp4",
}
PDFs:
{
  type: "file",
  data: pdfUrl,
  mediaType: "application/pdf",
  filename: "document.pdf",
}
Audio:
{
  type: "file",
  data: audioUrl,
  mediaType: "audio/mpeg",
  filename: "audio.mp3",
}
All models support function calling:
tools: {
  createNote: tool({
    description: "Create a note card",
    inputSchema: z.object({
      title: z.string(),
      content: z.string(),
    }),
    execute: async ({ title, content }) => {
      // Implementation
    },
  }),
}
Grounding
Gemini models support web grounding:
providerOptions: {
  google: {
    grounding: {
      // Google Search integration
    },
  },
}
ThinkEx uses an explicit webSearch tool instead of automatic grounding, for better control and source attribution.
Provider Configuration
Google AI Studio
Setup:
- Get API key from Google AI Studio
- Add to environment:
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
Rate Limits:
- Free tier: 15 requests/minute
- Paid tier: Higher limits based on plan
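To stay under the free tier's 15 requests/minute, a client-side throttle can space out calls. This is a minimal sketch under that assumption; the function name createThrottle is hypothetical and not part of ThinkEx or the AI SDK:

```typescript
// Minimal sketch: enforce a minimum interval between model calls,
// e.g. 60s / 15 = 4s apart for the free tier. Illustrative only.
function createThrottle(minIntervalMs: number) {
  let nextAllowedAt = 0;
  return async function throttled<T>(fn: () => Promise<T>): Promise<T> {
    const now = Date.now();
    const wait = Math.max(0, nextAllowedAt - now); // time until our slot
    nextAllowedAt = Math.max(now, nextAllowedAt) + minIntervalMs; // reserve next slot
    if (wait > 0) await new Promise((resolve) => setTimeout(resolve, wait));
    return fn();
  };
}

// 15 requests/minute → at least 4 seconds between calls
const throttledCall = createThrottle(60_000 / 15);
```

Each model request would then be wrapped as `throttledCall(() => generateText({ ... }))`.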
AI Gateway
Optional: Use Vercel AI Gateway for enhanced routing:
AI_GATEWAY_API_KEY=your-gateway-key
Benefits:
- Automatic failover between providers
- Load balancing across models
- Centralized logging and monitoring
- Cost optimization
Web Search
// src/lib/ai/tools/web-search.ts
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: query,
});
File Processing
// src/lib/ai/tools/process-files.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: batchPrompt },
      ...fileInfos.map(f => ({
        type: "file",
        data: f.fileUrl,
        mediaType: f.mediaType,
        filename: f.filename,
      })),
    ],
  }],
});
URL Processing
// src/lib/ai/tools/process-urls.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash"),
  prompt: `Analyze content from: ${url}...`,
});
Context Caching
Long context is automatically cached:
onFinish: ({ usage }) => {
  console.log({
    cachedInputTokens: usage?.cachedInputTokens,
    inputTokens: usage?.inputTokens,
  });
}
Message Pruning
Reduce token usage by pruning old messages:
const prunedMessages = pruneMessages({
  messages: convertedMessages,
  reasoning: "before-last-message",
  toolCalls: "before-last-5-messages",
  emptyMessages: "remove",
});
Streaming
Use streaming for better perceived performance:
const result = streamText({
  model,
  messages,
  experimental_transform: smoothStream({
    chunking: "word",
    delayInMs: 15,
  }),
});
Token Usage Tracking
Per-Step Tracking
onStepFinish: (result) => {
  const { usage, finishReason } = result;
  console.log({
    stepType: result.stepType,
    inputTokens: usage?.inputTokens,
    outputTokens: usage?.outputTokens,
    reasoningTokens: usage?.reasoningTokens,
  });
}
Final Usage
onFinish: ({ usage, finishReason }) => {
  console.log({
    totalTokens: usage?.totalTokens,
    cachedInputTokens: usage?.cachedInputTokens,
    finishReason,
  });
}
Experimental Features
Claude Support (Experimental)
ThinkEx has experimental support for Anthropic’s Claude:
// Special mapping: Claude Sonnet 4.5 → Gemini 3 Flash Preview
if (modelId === "anthropic/claude-sonnet-4.5") {
  modelId = "google/gemini-3-flash-preview";
}
Claude support is experimental and not fully tested; stick with Gemini models for production use.
Cost Optimization
Model Selection Strategy
- Simple tasks → gemini-2.5-flash-lite
  - File analysis
  - Web search
  - Content extraction
- General chat → gemini-2.5-flash
  - User conversations
  - Content generation
  - Tool orchestration
- Complex reasoning → gemini-3-flash-preview
  - Multi-step problems
  - Research synthesis
  - Advanced analysis
Caching Strategy
- PDFs: Cache OCR results after first extraction
- Messages: Use context caching for long conversations
- Files: Store processed results in database
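The "store processed results" idea above can be sketched with a simple in-memory cache keyed by file URL. A Map stands in for ThinkEx's database here, and the function name processFileCached is hypothetical:

```typescript
// Sketch: memoize processed-file results by URL so repeat requests
// skip re-analysis. ThinkEx stores results in its database; a Map
// is used here only for illustration.
const fileResultCache = new Map<string, string>();

async function processFileCached(
  fileUrl: string,
  process: (url: string) => Promise<string>,
): Promise<string> {
  const hit = fileResultCache.get(fileUrl);
  if (hit !== undefined) return hit; // cache hit: no model call
  const result = await process(fileUrl); // cache miss: analyze once
  fileResultCache.set(fileUrl, result);
  return result;
}
```

The same pattern applies to cached OCR output for PDFs: key by a stable identifier, write once, read thereafter.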
Error Handling
Rate Limit Errors
try {
  const result = await streamText({ model, ... });
} catch (error) {
  if (error.status === 429) {
    // Rate limit exceeded
    // Implement exponential backoff
  }
}
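The exponential backoff mentioned in the comment could look like the following generic retry wrapper. This is a sketch, not ThinkEx's actual code; the name withBackoff and the delay values are illustrative:

```typescript
// Generic exponential-backoff sketch for 429 (rate limit) errors:
// retry with doubling delays, rethrow anything else immediately.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 4,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const isRateLimit = error?.status === 429;
      if (!isRateLimit || attempt >= maxAttempts - 1) throw error;
      const delay = baseDelayMs * 2 ** attempt; // 500ms, 1s, 2s, ...
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

A model call would then be wrapped as `withBackoff(() => streamText({ ... }))`.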
Timeout Protection
const result = await streamText({
  model,
  messages,
  stopWhen: stepCountIs(25), // Prevent infinite loops
});
Next Steps
AI Overview
Learn about AI architecture and features
AI Tools
Explore available AI tools