Google provides the Gemini family of models with native multimodal capabilities (vision, audio, video) and industry-leading context windows up to 1 million tokens.

Installation

The Google client is included in the core Koog library. No additional dependencies required.
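If you are adding Koog to a project from scratch, a minimal Gradle setup might look like the following. The artifact coordinate `ai.koog:koog-agents` and the version placeholder are assumptions; verify both against the Koog documentation.

```kotlin
// build.gradle.kts — hypothetical coordinates; check the Koog docs for the current ones
dependencies {
    implementation("ai.koog:koog-agents:LATEST_VERSION") // replace with a real version
}
```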

Quick Start

import ai.koog.prompt.executor.clients.google.*
import ai.koog.agents.core.*

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val agent = AIAgent(
    executor = executor,
    tools = toolRegistry {
        // Your tools here
    }
) {
    // Define your agent strategy
}

val result = agent.execute("Analyze this video...")

Authentication

API Key Setup

Get your API key from Google AI Studio.
export GOOGLE_API_KEY=AIza...
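Before constructing an executor, it can help to fail fast when the key is missing. A small sketch (`requireApiKey` is a hypothetical helper, not part of Koog):

```kotlin
// Hypothetical helper: validate the API key before building an executor.
fun requireApiKey(env: Map<String, String> = System.getenv()): String {
    val key = env["GOOGLE_API_KEY"]
    require(!key.isNullOrBlank()) { "GOOGLE_API_KEY is not set; export it or pass apiKey explicitly" }
    return key
}
```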

Programmatic Configuration

val client = GoogleLLMClient(
    apiKey = "AIza...",
    settings = GoogleClientSettings(
        baseUrl = "https://generativelanguage.googleapis.com",
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 120_000
        )
    )
)

Available Models

Gemini 2.5 Pro (Most Capable)

Best model for complex tasks requiring advanced reasoning.
GoogleModels.Gemini2_5Pro         // 1M context, 65K output
Capabilities:
  • 1 million token context window
  • Native multimodal (audio, image, video, text)
  • Function calling
  • Structured outputs (JSON schema)
  • Extended thinking
Gemini 2.5 Flash

Best balance of speed and capability.
GoogleModels.Gemini2_5Flash       // 1M context, fast, multimodal
Use cases:
  • General-purpose applications
  • Real-time interactions
  • Cost-effective multimodal tasks
  • High-throughput scenarios

Gemini 2.5 Flash Lite

Ultra-fast and cost-efficient.
GoogleModels.Gemini2_5FlashLite   // 1M context, ultra-fast

Gemini 2.0 Flash

Fast, efficient model for various tasks.
GoogleModels.Gemini2_0Flash       // 1M context, 8K output
GoogleModels.Gemini2_0Flash001    // Specific version

Gemini 2.0 Flash Lite

Smallest, most efficient Gemini 2.0 model.
GoogleModels.Gemini2_0FlashLite   // Low-latency applications

Gemini 3 Pro Preview

Advanced reasoning with thinking capability.
GoogleModels.Gemini3_Pro_Preview  // Latest generation, thinking_level

Embedding Models

GoogleModels.Embeddings.GeminiEmbedding001  // 2048 token context

Code Examples

Basic Chat Completion

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val result = executor.execute(
    prompt = prompt {
        system("You are a helpful assistant.")
        user("Explain machine learning simply.")
    }
)

println(result.first().content)

Function Calling

data class CalculatorArgs(val expression: String)

val calculatorTool = tool<CalculatorArgs, Double>(
    name = "calculate",
    description = "Evaluate a mathematical expression"
) { args ->
    // Your calculation logic
    42.0
}

val agent = AIAgent(
    executor = simpleGoogleExecutor(
        apiKey = System.getenv("GOOGLE_API_KEY"),
        model = GoogleModels.Gemini2_5Flash
    ),
    tools = toolRegistry { tool(calculatorTool) }
) {
    defineGraph<String, String>("calculator-agent") {
        val response = callLLM()
        finish(response)
    }
}

val result = agent.execute("What is 25 * 17?")

Vision - Image Analysis

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Describe this image in detail")
            image(
                bytes = File("image.jpg").readBytes(),
                mimeType = "image/jpeg"
            )
        }
    }
)

Video Processing

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Summarize what happens in this video")
            video(
                bytes = File("video.mp4").readBytes(),
                mimeType = "video/mp4"
            )
        }
    }
)

Audio Processing

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Transcribe and summarize this audio")
            audio(
                bytes = File("audio.mp3").readBytes(),
                mimeType = "audio/mpeg"
            )
        }
    }
)

Structured Output

@Serializable
data class Product(
    val name: String,
    val price: Double,
    val category: String
)

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        schema = LLMParams.Schema.JSON.Standard(
            name = "Product",
            schema = /* JSON schema */
        )
    )
)

val result = executor.execute(
    prompt = prompt {
        user("Extract product info: iPhone 15 Pro, $999, Electronics")
    }
)

val product = Json.decodeFromString<Product>(result.first().content)
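The `/* JSON schema */` placeholder above stands for a standard JSON Schema document. For the `Product` class it might look like the hand-written sketch below; this is an illustration of the schema shape, not the only way to supply one.

```kotlin
// A hand-written JSON Schema for Product, kept as a raw string.
val productSchema = """
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "price": { "type": "number" },
    "category": { "type": "string" }
  },
  "required": ["name", "price", "category"]
}
""".trimIndent()
```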

Extended Thinking (Gemini 3)

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini3_Pro_Preview,
    params = GoogleParams(
        thinkingConfig = GoogleParams.ThinkingConfig(
            thinkingLevel = "high" // low, medium, high
        )
    )
)

val result = executor.execute(
    prompt = prompt {
        user("Solve this complex problem: ...")
    }
)

// Access reasoning traces
result.forEach { message ->
    when (message) {
        is Message.Reasoning -> println("Thought: ${message.content}")
        is Message.Assistant -> println("Answer: ${message.content}")
    }
}

Streaming Responses

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

executor.executeStreaming(
    prompt = prompt { user("Write a long essay about AI") }
).collect { frame ->
    when (frame) {
        is StreamFrame.TextDelta -> print(frame.text)
        is StreamFrame.End -> println("\nDone!")
        else -> {}
    }
}

Long Context Processing

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro // 1M token context
)

// Can process extremely long documents
val massiveDocument = File("very_large_doc.txt").readText()

val result = executor.execute(
    prompt = prompt {
        user("Document: $massiveDocument\n\nQuestion: Analyze key patterns")
    }
)
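Even with a 1M token window, it is worth guarding against oversized inputs before sending them. A rough pre-check using the common ~4 characters per token heuristic (both the estimate and the helper are illustrative, not a Koog API):

```kotlin
// Rough token estimate: ~4 characters per token for English text.
fun estimateTokens(text: String): Int = (text.length + 3) / 4

// Leave headroom for the model's output when checking against the context window.
fun fitsInContext(
    text: String,
    contextWindow: Int = 1_000_000,
    reserveForOutput: Int = 65_000
): Boolean = estimateTokens(text) <= contextWindow - reserveForOutput
```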

Multiple Choices

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        numberOfChoices = 3 // Generate 3 different responses
    )
)

val choices = executor.executeMultipleChoices(
    prompt = prompt { user("Write a creative story opening") }
)

choices.forEach { choice ->
    println("Option: ${choice.first().content}")
}

Embeddings

val client = GoogleLLMClient(
    apiKey = System.getenv("GOOGLE_API_KEY")
)

val embedding = client.embed(
    text = "Machine learning is a subset of artificial intelligence",
    model = GoogleModels.Embeddings.GeminiEmbedding001
)

println("Embedding dimensions: ${embedding.size}")
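Embedding vectors are typically compared with cosine similarity. A small sketch over two vectors, in plain Kotlin and independent of Koog:

```kotlin
import kotlin.math.sqrt

// Cosine similarity between two embedding vectors of equal length.
fun cosineSimilarity(a: DoubleArray, b: DoubleArray): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    return dot / (sqrt(normA) * sqrt(normB))
}
```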

Advanced Configuration

Custom Parameters

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        temperature = 0.7,
        maxTokens = 8192,
        topP = 0.95,
        topK = 40,
        thinkingConfig = GoogleParams.ThinkingConfig(
            thinkingLevel = "medium"
        )
    )
)

Tool Choice Control

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        toolChoice = LLMParams.ToolChoice.Required // Auto, None, Required
    )
)

Model Capabilities

Model              Context  Output  Vision  Audio  Video  Tools  Thinking
Gemini 2.5 Pro     1M       65K     ✓       ✓      ✓      ✓      ✓
Gemini 2.5 Flash   1M       65K     ✓       ✓      ✓      ✓      ✓
Gemini 2.0 Flash   1M       8K      ✓       ✓      ✓      ✓      —
Gemini 3 Pro       1M       65K     ✓       ✓      ✓      ✓      ✓

Pricing

Pricing varies by model. See Google AI Pricing for current rates. Example costs (per 1M tokens):
  • Gemini 2.5 Flash: $0.15-$1.00 (input) / $0.60-$3.50 (output)
  • Gemini 2.5 Pro: $1.25-$2.50 (input) / $10.00-$15.00 (output)
  • Gemini 2.0 Flash: $0.10-$0.70 (input) / $0.40 (output)
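Per-request cost works out to tokens / 1,000,000 × the per-million rate for each direction. A small estimator (the rates passed in are illustrative; always check the pricing page):

```kotlin
// Estimate request cost in USD from token counts and per-1M-token rates.
fun estimateCostUsd(
    inputTokens: Long,
    outputTokens: Long,
    inputRatePer1M: Double,
    outputRatePer1M: Double
): Double = inputTokens / 1_000_000.0 * inputRatePer1M +
            outputTokens / 1_000_000.0 * outputRatePer1M
```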

Best Practices

  1. Use Gemini 2.5 Flash for most tasks - best balance of speed and capability
  2. Use Gemini 2.5 Pro for complex reasoning and long documents
  3. Leverage 1M context for document-heavy applications
  4. Use multimodal inputs - images, audio, video natively supported
  5. Enable streaming for long responses
  6. Batch API calls to stay within rate limits
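
For item 6, a minimal client-side throttle can space requests out to stay under a rate limit. A simple fixed-interval limiter sketch (plain Kotlin, not a Koog feature):

```kotlin
// Minimal fixed-interval rate limiter: at most one permit per intervalMillis.
class SimpleRateLimiter(private val intervalMillis: Long) {
    private var nextAllowedAt = 0L

    @Synchronized
    fun acquire() {
        val now = System.currentTimeMillis()
        if (now < nextAllowedAt) Thread.sleep(nextAllowedAt - now)
        nextAllowedAt = maxOf(now, nextAllowedAt) + intervalMillis
    }
}
```

Call `acquire()` before each `executor.execute(...)` to enforce the spacing.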

Limitations

  • No moderation API: Implement custom content filtering
  • Rate limits vary by region and tier
  • Video processing can be slow for long videos
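
Since there is no built-in moderation endpoint, any content filtering has to live in application code. A deliberately naive keyword-based sketch; real moderation needs a dedicated classifier:

```kotlin
// Naive blocklist filter — a placeholder for a real moderation pipeline.
class KeywordFilter(blockedTerms: Collection<String>) {
    private val blocked = blockedTerms.map { it.lowercase() }

    fun isAllowed(text: String): Boolean {
        val lower = text.lowercase()
        return blocked.none { it in lower }
    }
}
```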

Troubleshooting

Rate Limits

val client = GoogleLLMClient(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    settings = GoogleClientSettings(
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 300_000 // 5 minutes for long videos
        )
    )
)
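
For transient quota errors, retrying with exponential backoff usually helps more than a longer timeout alone. A generic sketch (the helper is not part of Koog; wrap `executor.execute(...)` calls in it):

```kotlin
// Retry a block with exponential backoff; rethrows after maxAttempts failures.
fun <T> withBackoff(
    maxAttempts: Int = 3,
    initialDelayMillis: Long = 500,
    block: () -> T
): T {
    var delay = initialDelayMillis
    repeat(maxAttempts - 1) {
        try {
            return block()
        } catch (e: Exception) {
            Thread.sleep(delay)
            delay *= 2
        }
    }
    return block() // final attempt; let any exception propagate
}
```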

Error Handling

try {
    val result = executor.execute(prompt { user("Hello") })
} catch (e: LLMClientException) {
    when {
        e.message?.contains("quota") == true -> {
            // Handle quota exceeded
        }
        e.message?.contains("invalid_argument") == true -> {
            // Check request format
        }
        else -> throw e
    }
}

Empty Response Handling

Gemini models occasionally return a response with an empty parts field:
val result = try {
    executor.execute(prompt { user("Hello") })
} catch (e: LLMClientException) {
    if (e.message?.contains("parts field is missing") == true) {
        // Retry with different prompt or parameters
        executor.execute(prompt { 
            user("Please respond: Hello") 
        })
    } else throw e
}
