Google's Gemini family of models accepts text, image, audio, and video input natively and offers context windows of up to 1 million tokens.
Installation
The Google client is included in the core Koog library. No additional dependencies are required.
Quick Start
import ai.koog.prompt.executor.clients.google.*
import ai.koog.agents.core.*

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val agent = AIAgent(
    executor = executor,
    tools = toolRegistry {
        // Your tools here
    }
) {
    // Define your agent strategy
}

val result = agent.execute("Analyze this video...")
Authentication
API Key Setup
Get your API key from Google AI Studio.
export GOOGLE_API_KEY=AIza...
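The examples on this page pass `System.getenv("GOOGLE_API_KEY")` straight to the client, which yields `null` when the variable is unset. A minimal sketch of a guard that fails early with a readable message follows; `requireApiKey` and `API_KEY_VAR` are hypothetical helper names, not part of Koog.

```kotlin
/** Name of the environment variable used throughout this page. */
const val API_KEY_VAR = "GOOGLE_API_KEY"

/**
 * Returns the API key found in [env], or throws IllegalArgumentException.
 * [env] defaults to the process environment; it is a parameter only so the
 * function can be exercised without touching real environment variables.
 * (Hypothetical helper, not a Koog API.)
 */
fun requireApiKey(env: Map<String, String> = System.getenv()): String {
    val key = env[API_KEY_VAR]?.trim().orEmpty()
    require(key.isNotEmpty()) {
        "$API_KEY_VAR is not set; export it before starting the application"
    }
    return key
}

fun main() {
    // Demo with an in-memory map instead of the real environment.
    val key = requireApiKey(mapOf(API_KEY_VAR to "AIza-example"))
    println("Loaded a key of length ${key.length}")
}
```

The returned value can then be passed wherever the examples use `System.getenv("GOOGLE_API_KEY")`.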
Programmatic Configuration
val client = GoogleLLMClient(
    apiKey = "AIza...",
    settings = GoogleClientSettings(
        baseUrl = "https://generativelanguage.googleapis.com",
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 120_000
        )
    )
)
Available Models
Gemini 2.5 Pro (Most Capable)
Best model for complex tasks requiring advanced reasoning.
GoogleModels.Gemini2_5Pro // 1M context, 65K output
Capabilities:
- 1 million token context window
- Native multimodal (audio, image, video, text)
- Function calling
- Structured outputs (JSON schema)
- Extended thinking
Gemini 2.5 Flash (Recommended)
Best balance of speed and capability.
GoogleModels.Gemini2_5Flash // 1M context, fast, multimodal
Use cases:
- General-purpose applications
- Real-time interactions
- Cost-effective multimodal tasks
- High-throughput scenarios
Gemini 2.5 Flash Lite
Ultra-fast and cost-efficient.
GoogleModels.Gemini2_5FlashLite // 1M context, ultra-fast
Gemini 2.0 Flash
Fast, efficient model for various tasks.
GoogleModels.Gemini2_0Flash // 1M context, 8K output
GoogleModels.Gemini2_0Flash001 // Specific version
Gemini 2.0 Flash Lite
Smallest, most efficient Gemini 2.0 model.
GoogleModels.Gemini2_0FlashLite // Low-latency applications
Gemini 3 Pro Preview
Advanced reasoning with thinking capability.
GoogleModels.Gemini3_Pro_Preview // Latest generation, thinking_level
Embedding Models
GoogleModels.Embeddings.GeminiEmbedding001 // 2048 token context
Code Examples
Basic Chat Completion
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val result = executor.execute(
    prompt = prompt {
        system("You are a helpful assistant.")
        user("Explain machine learning simply.")
    }
)

println(result.first().content)
Function Calling
data class CalculatorArgs(val expression: String)

val calculatorTool = tool<CalculatorArgs, Double>(
    name = "calculate",
    description = "Evaluate a mathematical expression"
) { args ->
    // Your calculation logic
    42.0
}

val agent = AIAgent(
    executor = simpleGoogleExecutor(
        apiKey = System.getenv("GOOGLE_API_KEY"),
        model = GoogleModels.Gemini2_5Flash
    ),
    tools = toolRegistry { tool(calculatorTool) }
) {
    defineGraph<String, String>("calculator-agent") {
        val response = callLLM()
        finish(response)
    }
}

val result = agent.execute("What is 25 * 17?")
Vision - Image Analysis
import java.io.File

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Describe this image in detail")
            // Image is sent inline as raw bytes
            image(
                bytes = File("image.jpg").readBytes(),
                mimeType = "image/jpeg"
            )
        }
    }
)
Video Processing
import java.io.File

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Summarize what happens in this video")
            // Video is sent inline as raw bytes
            video(
                bytes = File("video.mp4").readBytes(),
                mimeType = "video/mp4"
            )
        }
    }
)
Audio Processing
import java.io.File

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro
)

val result = executor.execute(
    prompt = prompt {
        user {
            text("Transcribe and summarize this audio")
            // Audio is sent inline as raw bytes
            audio(
                bytes = File("audio.mp3").readBytes(),
                mimeType = "audio/mpeg"
            )
        }
    }
)
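The image, video, and audio examples above hard-code `mimeType` next to each file name. As a sketch, the MIME type could instead be derived from the file extension. `mimeTypeFor` and the lookup table are hypothetical helpers, and the table is illustrative rather than a list of the formats Gemini accepts.

```kotlin
import java.io.File

/** Illustrative extension-to-MIME table; extend it for the formats you actually send. */
val mimeTypesByExtension: Map<String, String> = mapOf(
    "jpg" to "image/jpeg",
    "jpeg" to "image/jpeg",
    "png" to "image/png",
    "mp4" to "video/mp4",
    "mp3" to "audio/mpeg",
    "wav" to "audio/wav",
)

/** Looks up the MIME type by case-insensitive extension, failing loudly on unknown types. */
fun mimeTypeFor(file: File): String {
    val extension = file.extension.lowercase()
    return mimeTypesByExtension[extension]
        ?: throw IllegalArgumentException("No MIME type known for '.$extension' (${file.name})")
}

fun main() {
    // The files do not need to exist; only the names are inspected.
    listOf("image.jpg", "video.mp4", "audio.mp3").forEach { name ->
        println("$name -> ${mimeTypeFor(File(name))}")
    }
}
```

Failing on an unknown extension, rather than falling back to a generic type, keeps a mislabeled upload from reaching the API unnoticed.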
Structured Output
import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class Product(
    val name: String,
    val price: Double,
    val category: String
)

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        schema = LLMParams.Schema.JSON.Standard(
            name = "Product",
            schema = /* JSON schema */
        )
    )
)

val result = executor.execute(
    prompt = prompt {
        user("Extract product info: iPhone 15 Pro, $999, Electronics")
    }
)

// Decode the model's JSON reply into the Product class
val product = Json.decodeFromString<Product>(result.first().content)
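The example above leaves the schema itself as a placeholder. For illustration only, a JSON Schema matching the `Product` class might look like the following; the schema text is an assumption about what such a definition could contain, and how it is passed to `LLMParams.Schema.JSON.Standard` depends on the parameter type that Koog expects.

```kotlin
/**
 * A JSON Schema describing the Product class: three required fields,
 * with `price` as a JSON number. Illustrative only.
 */
val productSchemaJson: String = """
    {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "price": { "type": "number" },
        "category": { "type": "string" }
      },
      "required": ["name", "price", "category"]
    }
""".trimIndent()

fun main() {
    println(productSchemaJson)
}
```

If the `schema` parameter expects a parsed JSON object rather than text, `Json.parseToJsonElement(productSchemaJson).jsonObject` from kotlinx.serialization converts the string.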
Extended Thinking (Gemini 3)
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini3_Pro_Preview,
    params = GoogleParams(
        thinkingConfig = GoogleParams.ThinkingConfig(
            thinkingLevel = "high" // low, medium, high
        )
    )
)

val result = executor.execute(
    prompt = prompt {
        user("Solve this complex problem: ...")
    }
)

// Access reasoning traces
result.forEach { message ->
    when (message) {
        is Message.Reasoning -> println("Thought: ${message.content}")
        is Message.Assistant -> println("Answer: ${message.content}")
        else -> {} // Ignore any other message types
    }
}
Streaming Responses
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash
)

executor.executeStreaming(
    prompt = prompt { user("Write a long essay about AI") }
).collect { frame ->
    when (frame) {
        is StreamFrame.TextDelta -> print(frame.text)
        is StreamFrame.End -> println("\nDone!")
        else -> {}
    }
}
Long Context Processing
import java.io.File

val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Pro // 1M token context
)

// Can process extremely long documents
val massiveDocument = File("very_large_doc.txt").readText()

val result = executor.execute(
    prompt = prompt {
        user("Document: $massiveDocument\n\nQuestion: Analyze key patterns")
    }
)
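Even with a 1M-token window, it can help to check roughly how large a document is before sending it. The sketch below assumes about four characters per token, which is a coarse heuristic for English text and not Gemini's actual tokenizer. `estimateTokens` and `chunkByEstimatedTokens` are hypothetical helpers.

```kotlin
/** Rough heuristic only: about four characters per token for English text. */
const val APPROX_CHARS_PER_TOKEN = 4

/** Estimates the token count by rounding the character count up to whole tokens. */
fun estimateTokens(text: String): Int =
    (text.length + APPROX_CHARS_PER_TOKEN - 1) / APPROX_CHARS_PER_TOKEN

/**
 * Splits [text] into pieces whose estimated size is at most [maxTokens].
 * Splitting is by character count, so a piece may end mid-sentence.
 */
fun chunkByEstimatedTokens(text: String, maxTokens: Int): List<String> {
    require(maxTokens > 0) { "maxTokens must be positive" }
    return text.chunked(maxTokens * APPROX_CHARS_PER_TOKEN)
}

fun main() {
    val document = "lorem ipsum ".repeat(1_000) // 12,000 characters
    val budget = 1_000 // deliberately small budget so the demo has to split
    println("Estimated tokens: ${estimateTokens(document)}")
    println("Chunks needed: ${chunkByEstimatedTokens(document, budget).size}")
}
```

For a real budget check, leave headroom for the system prompt, the question, and the model's output, since the estimate can be off in either direction.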
Multiple Choices
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        numberOfChoices = 3 // Generate 3 different responses
    )
)

val choices = executor.executeMultipleChoices(
    prompt = prompt { user("Write a creative story opening") }
)

choices.forEach { choice ->
    println("Option: ${choice.first().content}")
}
Embeddings
val client = GoogleLLMClient(
    apiKey = System.getenv("GOOGLE_API_KEY")
)

val embedding = client.embed(
    text = "Machine learning is a subset of artificial intelligence",
    model = GoogleModels.Embeddings.GeminiEmbedding001
)

println("Embedding dimensions: ${embedding.size}")
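Embeddings are usually compared rather than inspected directly. As a sketch, cosine similarity between two vectors can be computed as below. It assumes the embedding is available as a `List<Double>`; if `client.embed` returns another numeric collection, convert it first. `cosineSimilarity` is a hypothetical helper, not a Koog function.

```kotlin
import kotlin.math.sqrt

/**
 * Cosine similarity of two equal-length vectors: 1.0 for the same direction,
 * 0.0 for orthogonal vectors, -1.0 for opposite directions.
 */
fun cosineSimilarity(a: List<Double>, b: List<Double>): Double {
    require(a.size == b.size) { "Vectors must have the same dimension" }
    var dot = 0.0
    var normA = 0.0
    var normB = 0.0
    for (i in a.indices) {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    require(normA > 0.0 && normB > 0.0) { "Cosine similarity is undefined for zero vectors" }
    return dot / (sqrt(normA) * sqrt(normB))
}

fun main() {
    // Tiny hand-made vectors stand in for real embeddings.
    val query = listOf(1.0, 2.0, 3.0)
    val sameDirection = listOf(2.0, 4.0, 6.0)
    val different = listOf(3.0, 0.0, -1.0)
    println(cosineSimilarity(query, sameDirection))
    println(cosineSimilarity(query, different))
}
```

Ranking documents by this score against a query embedding is the usual basis for semantic search.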
Advanced Configuration
Custom Parameters
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        temperature = 0.7,
        maxTokens = 8192,
        topP = 0.95,
        topK = 40,
        thinkingConfig = GoogleParams.ThinkingConfig(
            thinkingLevel = "medium"
        )
    )
)
Tool Choice
val executor = simpleGoogleExecutor(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    model = GoogleModels.Gemini2_5Flash,
    params = GoogleParams(
        toolChoice = LLMParams.ToolChoice.Required // Auto, None, Required
    )
)
Model Capabilities
| Model | Context | Output | Vision | Audio | Video | Tools | Thinking |
|---|---|---|---|---|---|---|---|
| Gemini 2.5 Pro | 1M | 65K | ✅ | ✅ | ✅ | ✅ | ✅ |
| Gemini 2.5 Flash | 1M | 65K | ✅ | ✅ | ✅ | ✅ | ❌ |
| Gemini 2.0 Flash | 1M | 8K | ✅ | ✅ | ✅ | ✅ | ❌ |
| Gemini 3 Pro | 1M | 65K | ✅ | ✅ | ✅ | ✅ | ✅ |
Pricing
Pricing varies by model. See Google AI Pricing for current rates.
Example costs (per 1M tokens):
- Gemini 2.5 Flash: $0.15–$1.00 (input) / $0.60–$3.50 (output)
- Gemini 2.5 Pro: $1.25–$2.50 (input) / $10.00–$15.00 (output)
- Gemini 2.0 Flash: $0.10–$0.70 (input) / $0.40 (output)
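Because the rates are quoted per 1M tokens, the cost of a single request is the token count divided by one million, times the rate, summed over input and output. A small sketch of that arithmetic follows, using the low end of the Gemini 2.5 Flash range listed above as sample numbers. `TokenRate` and `estimateCostUsd` are hypothetical names, and actual billing may differ.

```kotlin
/** Price per one million tokens, in US dollars. */
data class TokenRate(val inputPerMillionUsd: Double, val outputPerMillionUsd: Double)

/** Estimated cost of one request: tokens / 1M * rate, summed over input and output. */
fun estimateCostUsd(inputTokens: Long, outputTokens: Long, rate: TokenRate): Double =
    inputTokens / 1_000_000.0 * rate.inputPerMillionUsd +
        outputTokens / 1_000_000.0 * rate.outputPerMillionUsd

fun main() {
    // Sample rates taken from the example list above; check current pricing before relying on them.
    val flashLowEnd = TokenRate(inputPerMillionUsd = 0.15, outputPerMillionUsd = 0.60)
    // 200,000 input tokens and 10,000 output tokens:
    // 0.2 * 0.15 + 0.01 * 0.60 = 0.03 + 0.006 = 0.036 USD (up to floating-point rounding).
    val cost = estimateCostUsd(inputTokens = 200_000, outputTokens = 10_000, rate = flashLowEnd)
    println("Estimated cost in USD: $cost")
}
```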
Best Practices
- Use Gemini 2.5 Flash for most tasks - best balance of speed and capability
- Use Gemini 2.5 Pro for complex reasoning and long documents
- Leverage 1M context for document-heavy applications
- Use multimodal inputs - images, audio, video natively supported
- Enable streaming for long responses
- Batch API calls to stay within rate limits
Limitations
- No moderation API: Implement custom content filtering
- Rate limits vary by region and tier
- Video processing can be slow for long videos
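Since there is no moderation API, filtering has to happen in application code. The sketch below shows one very simple form of custom content filtering, a case-insensitive whole-word blocklist applied to text before it is sent or displayed. `BlocklistFilter` and `FilterResult` are hypothetical types, and a production filter would likely need much more than keyword matching.

```kotlin
/** Outcome of a filter pass: whether the text may proceed and which terms matched. */
data class FilterResult(val allowed: Boolean, val matchedTerms: List<String>)

/** Flags text that contains any blocked term as a whole word, ignoring case. */
class BlocklistFilter(blockedTerms: Collection<String>) {
    // Each term is escaped so it is matched literally, then wrapped in word boundaries.
    private val patterns: List<Pair<String, Regex>> = blockedTerms.map { term ->
        term to Regex("\\b${Regex.escape(term)}\\b", RegexOption.IGNORE_CASE)
    }

    fun evaluate(text: String): FilterResult {
        val matched = patterns
            .filter { (_, regex) -> regex.containsMatchIn(text) }
            .map { (term, _) -> term }
        return FilterResult(allowed = matched.isEmpty(), matchedTerms = matched)
    }
}

fun main() {
    val filter = BlocklistFilter(listOf("password", "secret"))
    println(filter.evaluate("Please share the SECRET plan"))
    println(filter.evaluate("Ask the secretary for the schedule"))
}
```

Whole-word matching avoids flagging harmless words that merely contain a blocked term, at the cost of missing deliberate misspellings.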
Troubleshooting
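Timeouts
Long-running requests, such as analysis of long videos, can exceed the request timeout. Raise requestTimeoutMillis in the client settings: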
val client = GoogleLLMClient(
    apiKey = System.getenv("GOOGLE_API_KEY"),
    settings = GoogleClientSettings(
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 300_000 // 5 minutes for long videos
        )
    )
)
Error Handling
try {
    val result = executor.execute(prompt { user("Hello") })
} catch (e: LLMClientException) {
    when {
        e.message?.contains("quota") == true -> {
            // Handle quota exceeded
        }
        e.message?.contains("invalid_argument") == true -> {
            // Check request format
        }
        else -> throw e
    }
}
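One common way to fill in the quota branch of the error handling example above is to retry with exponential backoff. The following is a minimal blocking sketch under stated assumptions: `retryWithBackoff` is a hypothetical helper, not a Koog API, the delays are arbitrary defaults, and inside a coroutine a suspending `delay` would be preferable to `Thread.sleep`.

```kotlin
/**
 * Runs [block], retrying when [shouldRetry] accepts the exception.
 * The wait starts at [initialDelayMillis] and is multiplied by [factor] after each failure.
 * [sleep] is injectable so the function can be demonstrated without real waiting.
 */
fun <T> retryWithBackoff(
    maxAttempts: Int = 4,
    initialDelayMillis: Long = 500,
    factor: Double = 2.0,
    shouldRetry: (Exception) -> Boolean,
    sleep: (Long) -> Unit = { Thread.sleep(it) },
    block: () -> T,
): T {
    require(maxAttempts >= 1) { "maxAttempts must be at least 1" }
    var delayMillis = initialDelayMillis
    for (attempt in 1 until maxAttempts) {
        try {
            return block()
        } catch (e: Exception) {
            if (!shouldRetry(e)) throw e
            sleep(delayMillis)
            delayMillis = (delayMillis * factor).toLong()
        }
    }
    return block() // Final attempt: any exception propagates to the caller
}

fun main() {
    var calls = 0
    val answer = retryWithBackoff(
        shouldRetry = { it.message?.contains("quota") == true },
        sleep = { /* skip real waiting in the demo */ },
    ) {
        calls++
        // Simulated call that hits a quota error twice before succeeding.
        if (calls < 3) throw IllegalStateException("quota exceeded") else "ok"
    }
    println("$answer after $calls calls") // prints "ok after 3 calls"
}
```

Capping the number of attempts matters here: a quota that is exhausted for the day will not recover within a few seconds, so the final failure should reach the caller.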
Empty Response Handling
Gemini models occasionally return a response whose parts field is missing, which surfaces as an LLMClientException. One option is to catch it and retry:
val result = try {
    executor.execute(prompt { user("Hello") })
} catch (e: LLMClientException) {
    if (e.message?.contains("parts field is missing") == true) {
        // Retry with different prompt or parameters
        executor.execute(prompt {
            user("Please respond: Hello")
        })
    } else throw e
}
Resources