Mistral AI provides frontier-class language models with strong performance in reasoning, coding, and multimodal tasks. The lineup includes models with vision support, document processing, and specialized coding capabilities.

Installation

The Mistral AI client is included in the core Koog library. No additional dependencies required.
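If Koog is not yet part of your project, adding the core library is enough. The Maven coordinate and version placeholder below are assumptions; verify them against the Koog documentation:

```kotlin
// build.gradle.kts — a minimal sketch. The artifact coordinate is an
// assumption; check the Koog documentation for the exact name and version.
dependencies {
    implementation("ai.koog:koog-agents:LATEST_VERSION")
}
```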

Quick Start

import ai.koog.prompt.executor.clients.mistralai.*
import ai.koog.agents.core.*

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val agent = AIAgent(
    executor = executor,
    llmModel = MistralAIModels.Chat.MistralLarge21,
    tools = toolRegistry {
        // Your tools here
    }
) {
    // Define your agent strategy
}

val result = agent.execute("Analyze this codebase...")

Authentication

API Key Setup

Get your API key from Mistral AI Platform.
export MISTRAL_API_KEY=...

Programmatic Configuration

val client = MistralAILLMClient(
    apiKey = "...",
    settings = MistralAIClientSettings(
        baseUrl = "https://api.mistral.ai",
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 120_000
        )
    )
)

Available Models

Chat Models

General-purpose conversation and reasoning models.
MistralAIModels.Chat.MistralMedium31     // 128K context, multimodal
MistralAIModels.Chat.MistralLarge21      // 128K context, top-tier reasoning
MistralAIModels.Chat.MistralSmall2       // 32K context, efficient
Mistral Medium 3.1 (Premier)
  • 128K context window
  • Multimodal: vision, images, documents
  • Tools and function calling
  • Structured JSON outputs
Mistral Large 2.1 (Premier)
  • 128K context window
  • Most capable for complex reasoning
  • Tools and function calling
  • Structured JSON outputs
Mistral Small 2 (Premier)
  • 32K context window
  • Efficient for standard tasks
  • Tools and function calling

Reasoning Models

Advanced models with extended reasoning capabilities.
MistralAIModels.Chat.MagistralMedium12   // 128K context, frontier reasoning + vision
Magistral Medium 1.2 (Premier)
  • 128K context window
  • Advanced reasoning with speculation
  • Vision and document processing
  • Structured JSON outputs

Coding Models

Specialized models optimized for software engineering tasks.
MistralAIModels.Chat.Codestral           // 256K context, low-latency coding
MistralAIModels.Chat.DevstralMedium      // 128K context, enterprise coding
Codestral 2508 (Premier)
  • 256K context window
  • Fill-in-the-middle, code completion
  • Code correction and test generation
  • Low-latency responses
Devstral Medium (Premier)
  • 128K context window
  • Codebase exploration and editing
  • Multi-file editing
  • Software engineering agents

Embedding Models

Vector embeddings for semantic search and RAG.
MistralAIModels.Embeddings.MistralEmbed      // 8K context, text embeddings
MistralAIModels.Embeddings.CodestralEmbed    // 8K context, code embeddings

Moderation Model

Content safety and harmful content detection.
MistralAIModels.Moderation.MistralModeration // 8K context, safety checks

Code Examples

Basic Chat Completion

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val result = executor.execute(
    model = MistralAIModels.Chat.MistralLarge21,
    prompt = prompt {
        system("You are a helpful AI assistant.")
        user("Explain quantum entanglement.")
    }
)

println(result.first().content)

Function Calling

data class CodeAnalysisArgs(val code: String, val language: String)

val analysisTool = tool<CodeAnalysisArgs, String>(
    name = "analyze_code",
    description = "Analyze code quality and suggest improvements"
) { args ->
    "Analysis for ${args.language}: ${args.code.length} chars"
}

val agent = AIAgent(
    executor = simpleMistralAIExecutor(
        apiKey = System.getenv("MISTRAL_API_KEY")
    ),
    llmModel = MistralAIModels.Chat.Codestral,
    tools = toolRegistry { tool(analysisTool) }
) {
    defineGraph<String, String>("code-agent") {
        val response = callLLM()
        finish(response)
    }
}

val result = agent.execute("Analyze this Python function: def add(a, b): return a + b")

Vision - Image Analysis

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val result = executor.execute(
    model = MistralAIModels.Chat.MistralMedium31,
    prompt = prompt {
        user {
            text("What's in this image?")
            image(
                url = "https://example.com/diagram.png"
                // or: bytes = imageBytes
            )
        }
    }
)

Document Processing

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val result = executor.execute(
    model = MistralAIModels.Chat.MagistralMedium12,
    prompt = prompt {
        user {
            text("Summarize this document")
            file(
                bytes = pdfBytes,
                mimeType = "application/pdf",
                fileName = "report.pdf"
            )
        }
    }
)

Structured Output

@Serializable
data class CodeReview(
    val issues: List<String>,
    val suggestions: List<String>,
    val rating: Int
)

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val result = executor.execute(
    model = MistralAIModels.Chat.Codestral,
    params = MistralAIParams(
        schema = LLMParams.Schema.JSON.Standard(
            name = "CodeReview",
            schema = /* JSON schema */
        )
    ),
    prompt = prompt {
        user("Review this code: fun add(a: Int, b: Int) = a + b")
    }
)

val review = Json.decodeFromString<CodeReview>(result.first().content)
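The schema placeholder in the example above takes a plain JSON Schema document. The hand-written schema below is a sketch matching the `CodeReview` data class; the exact schema dialect Mistral accepts should be verified against its documentation:

```kotlin
// A hand-written JSON Schema mirroring the CodeReview data class.
// Field names match the @Serializable class so decoding succeeds.
val codeReviewSchema = """
{
  "type": "object",
  "properties": {
    "issues": { "type": "array", "items": { "type": "string" } },
    "suggestions": { "type": "array", "items": { "type": "string" } },
    "rating": { "type": "integer", "minimum": 1, "maximum": 10 }
  },
  "required": ["issues", "suggestions", "rating"]
}
""".trimIndent()
```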

Embeddings

val client = MistralAILLMClient(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val embedding = client.embed(
    text = "Kotlin is a modern programming language",
    model = MistralAIModels.Embeddings.MistralEmbed
)

println("Embedding dimensions: ${embedding.size}") // 1024

Code Embeddings

val client = MistralAILLMClient(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val codeEmbedding = client.embed(
    text = "fun factorial(n: Int): Int = if (n <= 1) 1 else n * factorial(n - 1)",
    model = MistralAIModels.Embeddings.CodestralEmbed
)

Content Moderation

val client = MistralAILLMClient(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

val result = client.moderate(
    prompt = prompt { user("Some potentially harmful content") },
    model = MistralAIModels.Moderation.MistralModeration
)

if (result.isHarmful) {
    println("Content flagged: ${result.categories}")
}

Streaming Responses

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

executor.executeStreaming(
    model = MistralAIModels.Chat.MistralLarge21,
    prompt = prompt { user("Write a detailed technical explanation...") }
).collect { frame ->
    when (frame) {
        is StreamFrame.TextDelta -> print(frame.text)
        is StreamFrame.End -> println("\nComplete")
        else -> {}
    }
}

Advanced Configuration

Custom Parameters

val executor = simpleMistralAIExecutor(
    apiKey = System.getenv("MISTRAL_API_KEY")
)

executor.execute(
    model = MistralAIModels.Chat.MistralLarge21,
    params = MistralAIParams(
        temperature = 0.7,
        maxTokens = 4096,
        topP = 0.9,
        randomSeed = 42,
        presencePenalty = 0.5,
        frequencyPenalty = 0.5,
        safePrompt = true, // Enable content safety
        parallelToolCalls = true
    ),
    prompt = prompt { user("Generate creative content") }
)

Speculation (Predictive Decoding)

Use speculation for faster generation when output is predictable:
executor.execute(
    model = MistralAIModels.Chat.MagistralMedium12,
    params = MistralAIParams(
        speculation = "Expected pattern to predict"
    ),
    prompt = prompt { user("Complete this sentence: The capital of France is") }
)

Tool Choice Control

executor.execute(
    model = MistralAIModels.Chat.MistralLarge21,
    params = MistralAIParams(
        toolChoice = LLMParams.ToolChoice.Required // Auto, None, Required, Named
    ),
    prompt = prompt { user("Search for information") }
)

Model Capabilities

Model                  Context  Vision  Docs  Tools  Embeddings  Moderation
Mistral Medium 3.1     128K     ✓       ✓     ✓
Mistral Large 2.1      128K                   ✓
Mistral Small 2        32K                    ✓
Magistral Medium 1.2   128K     ✓       ✓
Codestral              256K                   ✓
Devstral Medium        128K                   ✓
Mistral Embed          8K                            ✓
Codestral Embed        8K                            ✓
Mistral Moderation     8K                                        ✓

Pricing

Pricing varies by model. See Mistral AI Pricing for current rates.

Model Tiers:
  • Premier Models: Mistral Medium 3.1, Large 2.1, Small 2, Magistral Medium 1.2, Codestral, Devstral Medium
  • All premier models require API access and have usage-based pricing

Best Practices

  1. Use Mistral Large 2.1 for complex reasoning and high-stakes tasks
  2. Use Codestral for code completion, fill-in-the-middle, and low-latency coding
  3. Use Devstral Medium for codebase exploration and multi-file editing
  4. Use Mistral Medium 3.1 for multimodal tasks with vision
  5. Enable safe prompt for user-facing applications
  6. Use Codestral Embed for code search and semantic code analysis
  7. Leverage 256K context in Codestral for large codebase analysis

Use Cases

Use Devstral Medium or Codestral for building agents that explore codebases, edit multiple files, and assist with software development tasks.
Use Codestral for low-latency code completion, fill-in-the-middle, test generation, and code corrections.
Use Mistral Medium 3.1 or Magistral Medium 1.2 for processing images, documents, and mixed-content analysis.
Use Mistral Large 2.1 or Magistral Medium 1.2 for tasks requiring advanced reasoning and problem-solving.
Use Codestral Embed for semantic code search, code similarity, and RAG applications with code.
Use Mistral Moderation to detect and filter harmful content in user inputs.

Troubleshooting

Rate Limits

If large requests time out or you run into rate limits, raise the request timeout (and consider client-side retries for transient errors):

val client = MistralAILLMClient(
    apiKey = System.getenv("MISTRAL_API_KEY"),
    settings = MistralAIClientSettings(
        timeoutConfig = ConnectionTimeoutConfig(
            requestTimeoutMillis = 180_000 // 3 minutes for large requests
        )
    )
)
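Transient rate-limit failures can also be handled with a small retry helper. This is a generic sketch, not a Koog API; the predicate deciding which errors are retryable is an assumption based on the error-message matching shown in the Error Handling section below:

```kotlin
// Generic retry with exponential backoff. retryIf decides which failures
// are transient (e.g. rate-limit errors); all others are rethrown immediately.
fun <T> retryWithBackoff(
    maxAttempts: Int = 3,
    initialDelayMillis: Long = 1_000,
    retryIf: (Throwable) -> Boolean = { it.message?.contains("rate_limit") == true },
    block: () -> T
): T {
    var delayMillis = initialDelayMillis
    repeat(maxAttempts - 1) {
        try {
            return block()
        } catch (e: Throwable) {
            if (!retryIf(e)) throw e
            Thread.sleep(delayMillis)
            delayMillis *= 2 // double the wait after each failed attempt
        }
    }
    return block() // final attempt; any failure propagates to the caller
}
```

Wrap calls such as `executor.execute(...)` in `retryWithBackoff { ... }` to smooth over occasional 429 responses.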

Error Handling

try {
    val result = executor.execute(
        model = MistralAIModels.Chat.MistralLarge21,
        prompt = prompt { user("Hello") }
    )
} catch (e: LLMClientException) {
    when {
        e.message?.contains("rate_limit") == true -> {
            // Handle rate limiting
        }
        e.message?.contains("invalid_request") == true -> {
            // Check request format
        }
        else -> throw e
    }
}
