ThinkEx uses Google’s Gemini models via the Vercel AI SDK Gateway for intelligent assistance and content processing.

Primary Models

Gemini 2.5 Flash

Model ID: google/gemini-2.5-flash (default)

Google's latest fast model, optimized for speed and efficiency.

Characteristics:
  • Speed: Very fast response times
  • Context Window: 1M tokens
  • Multimodal: Text, images, video, audio
  • Thinking: Dynamic reasoning budget
  • Grounding: Google Search integration
Best For:
  • General chat conversations
  • Quick content analysis
  • Real-time assistance
  • Web search synthesis
Configuration:
const result = await streamText({
  model: gateway("google/gemini-2.5-flash"),
  temperature: 1.0,
  providerOptions: {
    google: {
      thinkingConfig: {
        includeThoughts: true,
      },
    },
  },
});

Gemini 2.5 Flash Lite

Model ID: google/gemini-2.5-flash-lite

A lightweight version optimized for simple tasks.

Characteristics:
  • Speed: Fastest response times
  • Context Window: 1M tokens
  • Cost: Most economical
  • Multimodal: Text, images, video, audio
Best For:
  • File processing and analysis
  • Web search queries
  • Simple content extraction
  • Background processing tasks
Usage in ThinkEx:
// Web search tool
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: `Search for: ${query}`,
});

// File analysis
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: "Analyze this file..." },
      { type: "file", data: fileUrl, mediaType: "application/pdf" },
    ],
  }],
});

Gemini 3 Flash Preview

Model ID: google/gemini-3-flash-preview

A next-generation Gemini model with enhanced reasoning.

Characteristics:
  • Thinking: Explicit thinking levels (minimal, standard, deep)
  • Context Window: 1M+ tokens
  • Reasoning: Enhanced multi-step reasoning
  • Multimodal: Advanced vision and audio understanding
Best For:
  • Complex problem solving
  • Multi-step reasoning tasks
  • Advanced content analysis
  • Research and synthesis
Configuration:
providerOptions: {
  google: {
    thinkingConfig: {
      includeThoughts: true,
      thinkingLevel: "minimal", // "minimal" | "standard" | "deep"
    },
  },
}
Thinking Levels:
  • minimal: Quick reasoning for simple tasks
  • standard: Balanced reasoning for most tasks
  • deep: Extended reasoning for complex problems
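
The mapping from task type to thinking level can be sketched as a small helper. This is purely illustrative: the function name and task categories below are assumptions, not part of the ThinkEx API.

```typescript
// Hypothetical helper: pick a thinking level for gemini-3-flash-preview.
// The task categories are illustrative; ThinkEx does not ship this function.
type ThinkingLevel = "minimal" | "standard" | "deep";

function pickThinkingLevel(task: "extraction" | "chat" | "research"): ThinkingLevel {
  switch (task) {
    case "extraction":
      return "minimal"; // quick reasoning for simple, single-step work
    case "chat":
      return "standard"; // balanced default for most tasks
    case "research":
      return "deep"; // extended reasoning for complex problems
  }
}
```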

Model Selection

The chat API accepts a modelId parameter:
POST /api/chat
{
  "modelId": "google/gemini-2.5-flash",
  "messages": [...],
  ...
}
Auto-Prefixing: If you provide a model ID without a provider prefix (e.g., gemini-2.5-flash), it’s automatically prefixed with google/:
// These are equivalent:
"gemini-2.5-flash" → "google/gemini-2.5-flash"
Default Model: If no modelId is specified, the default is google/gemini-2.5-flash.
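
The prefixing and default rules above can be sketched as one small pure function. The helper name `normalizeModelId` is an assumption for illustration; the actual logic lives inside the chat route.

```typescript
// Sketch of the auto-prefixing rule described above.
// normalizeModelId is a hypothetical name, not an actual ThinkEx export.
const DEFAULT_MODEL_ID = "google/gemini-2.5-flash";

function normalizeModelId(modelId?: string): string {
  if (!modelId) return DEFAULT_MODEL_ID; // fall back to the default model
  // IDs without a provider prefix get "google/" prepended.
  return modelId.includes("/") ? modelId : `google/${modelId}`;
}
```

Note that IDs that already carry a provider prefix (e.g. `anthropic/claude-sonnet-4.5`) pass through unchanged.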

Model Capabilities

Multimodal Support

All Gemini models support multiple content types: Text:
{ type: "text", text: "Analyze this content..." }
Images:
{
  type: "file",
  data: imageUrl, // or base64 data URL
  mediaType: "image/jpeg",
  filename: "photo.jpg",
}
Videos:
{
  type: "file",
  data: "https://youtube.com/watch?v=...",
  mediaType: "video/mp4",
}
PDFs:
{
  type: "file",
  data: pdfUrl,
  mediaType: "application/pdf",
  filename: "document.pdf",
}
Audio:
{
  type: "file",
  data: audioUrl,
  mediaType: "audio/mpeg",
  filename: "audio.mp3",
}

Tool Calling

All models support function calling:
tools: {
  createNote: tool({
    description: "Create a note card",
    inputSchema: z.object({
      title: z.string(),
      content: z.string(),
    }),
    execute: async ({ title, content }) => {
      // Implementation
    },
  }),
}

Grounding

Gemini models support web grounding:
providerOptions: {
  google: {
    grounding: {
      // Google Search integration
    },
  },
}
ThinkEx uses an explicit webSearch tool instead of automatic grounding, which gives better control and source attribution.

Provider Configuration

Google AI Studio

Setup:
  1. Get API key from Google AI Studio
  2. Add to environment:
GOOGLE_GENERATIVE_AI_API_KEY=AIza...
Rate Limits:
  • Free tier: 15 requests/minute
  • Paid tier: Higher limits based on plan

AI Gateway

Optional: Use Vercel AI Gateway for enhanced routing:
AI_GATEWAY_API_KEY=your-gateway-key
Benefits:
  • Automatic failover between providers
  • Load balancing across models
  • Centralized logging and monitoring
  • Cost optimization

Model Usage in Tools

// src/lib/ai/tools/web-search.ts
const { text } = await generateText({
  model: google('gemini-2.5-flash-lite'),
  tools: {
    googleSearch: google.tools.googleSearch({ mode: 'MODE_UNSPECIFIED' }),
  },
  prompt: query,
});

File Processing

// src/lib/ai/tools/process-files.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash-lite"),
  messages: [{
    role: "user",
    content: [
      { type: "text", text: batchPrompt },
      ...fileInfos.map(f => ({
        type: "file",
        data: f.fileUrl,
        mediaType: f.mediaType,
        filename: f.filename,
      })),
    ],
  }],
});

URL Processing

// src/lib/ai/tools/process-urls.ts
const { text } = await generateText({
  model: google("gemini-2.5-flash"),
  prompt: `Analyze content from: ${url}...`,
});

Performance Optimization

Context Caching

Long context is automatically cached:
onFinish: ({ usage }) => {
  console.log({
    cachedInputTokens: usage?.cachedInputTokens,
    inputTokens: usage?.inputTokens,
  });
}

Message Pruning

Reduce token usage by pruning old messages:
const prunedMessages = pruneMessages({
  messages: convertedMessages,
  reasoning: "before-last-message",
  toolCalls: "before-last-5-messages",
  emptyMessages: "remove",
});

Streaming

Use streaming for better perceived performance:
const result = streamText({
  model,
  messages,
  experimental_transform: smoothStream({
    chunking: "word",
    delayInMs: 15,
  }),
});

Token Usage Tracking

Per-Step Tracking

onStepFinish: (result) => {
  const { usage, finishReason } = result;
  console.log({
    stepType: result.stepType,
    inputTokens: usage?.inputTokens,
    outputTokens: usage?.outputTokens,
    reasoningTokens: usage?.reasoningTokens,
  });
}

Final Usage

onFinish: ({ usage, finishReason }) => {
  console.log({
    totalTokens: usage?.totalTokens,
    cachedInputTokens: usage?.cachedInputTokens,
    finishReason,
  });
}

Experimental Features

Claude Support (Experimental)

ThinkEx has experimental support for Anthropic’s Claude:
// Special mapping: Claude Sonnet 4.5 → Gemini 3 Flash Preview
if (modelId === "anthropic/claude-sonnet-4.5") {
  modelId = "google/gemini-3-flash-preview";
}
Claude support is experimental and not fully tested. Stick with Gemini models for production use.

Cost Optimization

Model Selection Strategy

  1. Simple tasks → gemini-2.5-flash-lite
    • File analysis
    • Web search
    • Content extraction
  2. General chat → gemini-2.5-flash
    • User conversations
    • Content generation
    • Tool orchestration
  3. Complex reasoning → gemini-3-flash-preview
    • Multi-step problems
    • Research synthesis
    • Advanced analysis
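
The strategy above can be sketched as a routing helper. The function and category names here are illustrative assumptions, not part of ThinkEx.

```typescript
// Hypothetical router implementing the cost strategy above.
type TaskKind = "simple" | "chat" | "reasoning";

function modelFor(task: TaskKind): string {
  switch (task) {
    case "simple":
      return "google/gemini-2.5-flash-lite"; // file analysis, search, extraction
    case "chat":
      return "google/gemini-2.5-flash"; // conversations, tool orchestration
    case "reasoning":
      return "google/gemini-3-flash-preview"; // multi-step problems, synthesis
  }
}
```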

Caching Strategy

  • PDFs: Cache OCR results after first extraction
  • Messages: Use context caching for long conversations
  • Files: Store processed results in database

Error Handling

Rate Limit Errors

try {
  const result = await streamText({ model, ... });
} catch (error) {
  if (error.status === 429) {
    // Rate limit exceeded
    // Implement exponential backoff
  }
}
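
The exponential backoff mentioned in the comment could look like the following. This is a generic sketch, not ThinkEx's actual handler; it wraps any request function and retries only on the 429 status checked above.

```typescript
// Generic exponential-backoff retry for rate-limited (429) requests.
// Not ThinkEx code; any function returning a Promise can be wrapped.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error: any) {
      const rateLimited = error?.status === 429;
      if (!rateLimited || attempt >= maxRetries) throw error;
      // Wait baseDelayMs, 2x, 4x, ... before retrying.
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
}
```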

Timeout Protection

const result = await streamText({
  model,
  messages,
  stopWhen: stepCountIs(25), // Prevent infinite loops
});

Next Steps

AI Overview

Learn about AI architecture and features

AI Tools

Explore available AI tools
