LLM Clients

Clients in BAML define which LLM provider and model to use for your functions. BAML supports all major LLM providers and can work with any OpenAI-compatible API.

Quick Start: Shorthand Syntax

The fastest way to use a client is with the shorthand syntax:

function MakeHaiku(topic: string) -> string {
  client "openai/gpt-4o"
  prompt #"
    Write a haiku about {{ topic }}.
  "#
}

Format: "<provider>/<model>". This assumes the matching API key is set in your environment:

OPENAI_API_KEY for OpenAI
ANTHROPIC_API_KEY for Anthropic
GOOGLE_API_KEY for Google AI
etc.
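
The shorthand convention can be illustrated outside BAML. The following Python sketch is not BAML's parser: `split_shorthand` and the `KEY_VARS` table are hypothetical names, and the table covers only the three providers listed above. It splits on the first slash only, on the assumption that a model name may itself contain slashes.

```python
from typing import Optional, Tuple

# Hypothetical lookup table, limited to the providers named in this section.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google-ai": "GOOGLE_API_KEY",
}


def split_shorthand(shorthand: str) -> Tuple[str, str, Optional[str]]:
    """Return (provider, model, API key variable or None if unknown)."""
    # partition() splits on the first "/" only, so the model keeps any later slashes.
    provider, sep, model = shorthand.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {shorthand!r}")
    return provider, model, KEY_VARS.get(provider)


print(split_shorthand("openai/gpt-4o"))  # ('openai', 'gpt-4o', 'OPENAI_API_KEY')
```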

Common Shorthand Examples

client "openai/gpt-4o"                       // OpenAI GPT-4o
client "openai/gpt-4o-mini"                  // OpenAI GPT-4o Mini
client "anthropic/claude-sonnet-4-20250514"  // Anthropic Claude Sonnet 4
client "google-ai/gemini-2.0-flash"          // Google Gemini

Named Client Configuration

For more control, define named clients:

client<llm> GPT4o {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    temperature 0.7
    max_tokens 1000
  }
}

function Summarize(text: string) -> string {
  client GPT4o
  prompt #"
    Summarize: {{ text }}
  "#
}

Client Anatomy

Declaration: client<llm> ClientName
Provider: Which API provider to use
Options: Model, credentials, and parameters

Supported Providers

The examples below show a typical configuration for each commonly used provider:

OpenAI

client<llm> GPT4 {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    temperature 0.0
    max_tokens 2000
  }
}

Anthropic

client<llm> Claude {
  provider "anthropic"
  options {
    model "claude-sonnet-4-20250514"
    api_key env.ANTHROPIC_API_KEY
    max_tokens 1000
    temperature 1.0
  }
}

Google AI (Gemini)

client<llm> Gemini {
  provider "google-ai"
  options {
    model "gemini-2.0-flash"
    api_key env.GOOGLE_API_KEY
  }
}

AWS Bedrock

client<llm> BedrockClaude {
  provider "aws-bedrock"
  options {
    model "anthropic.claude-3-sonnet-20240229-v1:0"
    region "us-west-2"
  }
}

Azure OpenAI

client<llm> AzureGPT {
  provider "azure-openai"
  options {
    resource_name "my-resource"
    deployment_id "gpt-4-deployment"
    api_key env.AZURE_OPENAI_KEY
  }
}

OpenAI-Compatible (Ollama, OpenRouter, etc.)

client<llm> Ollama {
  provider "openai-generic"
  options {
    base_url "http://localhost:11434/v1"
    model "llama2"
    api_key "ollama"  // Ollama doesn't require a real key
  }
}

client<llm> OpenRouter {
  provider "openai-generic"
  options {
    base_url "https://openrouter.ai/api/v1"
    model "anthropic/claude-3-opus"
    api_key env.OPENROUTER_API_KEY
  }
}

See the Provider Reference for all supported providers.

Common Options

These options work across most providers:

client<llm> MyClient {
  provider "openai"
  options {
    model "gpt-4o"              // Required: which model to use
    api_key env.MY_API_KEY      // API key from environment
    temperature 0.7             // Sampling temperature (0-2)
    max_tokens 1000             // Max tokens to generate
    top_p 0.9                   // Nucleus sampling
    
    // Custom headers (the name and value here are placeholders)
    headers {
      "X-Request-Source" "my-app"
    }
  }
}

Environment Variables

Access environment variables with env.VARIABLE_NAME:

options {
  api_key env.OPENAI_API_KEY
  base_url env.CUSTOM_ENDPOINT
}
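
BAML resolves `env.VARIABLE_NAME` itself. If the application should fail fast when a variable is absent, a preflight check on the host side is one option. The sketch below is a hypothetical helper (`missing_env_vars` is not part of BAML) that reports which of the expected variables are unset or empty.

```python
import os
from typing import Iterable, List, Mapping


def missing_env_vars(
    names: Iterable[str], environ: Mapping[str, str] = os.environ
) -> List[str]:
    """Return the names that are unset or empty in `environ`."""
    return [name for name in names if not environ.get(name)]


# Example with an explicit mapping instead of the real process environment.
fake_env = {"OPENAI_API_KEY": "sk-test", "CUSTOM_ENDPOINT": ""}
print(missing_env_vars(["OPENAI_API_KEY", "CUSTOM_ENDPOINT"], fake_env))
# ['CUSTOM_ENDPOINT']
```

Calling it once at startup with the variable names used in your client definitions surfaces configuration mistakes before the first LLM call.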

Custom Headers

Add custom headers for beta features or authentication:

options {
  model "claude-3-opus-20240229"
  api_key env.ANTHROPIC_API_KEY
  headers {
    "anthropic-beta" "prompt-caching-2024-07-31"
    "anthropic-version" "2023-06-01"
  }
}

Retry Policies

Add automatic retries for transient failures:

retry_policy CustomRetry {
  max_retries 3
}

client<llm> ResilientGPT {
  provider "openai"
  retry_policy CustomRetry
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

Advanced retry options:

retry_policy AggressiveRetry {
  max_retries 5
  strategy {
    type exponential_backoff
    delay_ms 1000
    max_delay_ms 10000
  }
}

See Retry Policy Reference for details.
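
To make the effect of exponential backoff concrete, the Python sketch below computes the wait before each retry for a 1000 ms starting delay capped at 10000 ms. It is an illustration rather than BAML's implementation: the function name is hypothetical, and the doubling multiplier is an assumption, so check the Retry Policy Reference for the actual default and for any jitter.

```python
from typing import List


def backoff_delays_ms(
    max_retries: int,
    initial_delay_ms: int,
    max_delay_ms: int,
    multiplier: float = 2.0,  # assumed growth factor, not taken from BAML
) -> List[int]:
    """Delay before each retry: initial * multiplier**attempt, capped at the maximum."""
    return [
        min(int(initial_delay_ms * multiplier ** attempt), max_delay_ms)
        for attempt in range(max_retries)
    ]


print(backoff_delays_ms(5, 1000, 10000))  # [1000, 2000, 4000, 8000, 10000]
```

Under these assumptions, five retries add at most 25 seconds of waiting in total, which is worth weighing against your request timeout.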

Fallback Clients

Automatically fall back to another model if the primary fails:

client<llm> GPT4WithFallback {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
  }
}

client<llm> ClaudeFallback {
  provider "anthropic"
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
  }
}

client<llm> ResilientClient {
  provider fallback
  options {
    strategy [GPT4WithFallback, ClaudeFallback]
  }
}

function Extract(text: string) -> Data {
  client ResilientClient  // Tries GPT-4o first, falls back to Claude
  prompt #"..."
}

See Fallback Strategy for more.
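
The fallback behavior can be pictured as trying each client in order and returning the first successful result. The Python sketch below shows that control flow with stand-in functions. It is not BAML's implementation, and `call_with_fallback`, `flaky_primary`, and `healthy_backup` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

Client = Tuple[str, Callable[[str], str]]


def call_with_fallback(clients: Sequence[Client], prompt: str) -> Tuple[str, str]:
    """Try each (name, call) pair in order; return (name, result) of the first success."""
    errors: List[str] = []
    for name, call in clients:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all clients failed: " + "; ".join(errors))


def flaky_primary(prompt: str) -> str:
    raise TimeoutError("simulated outage")


def healthy_backup(prompt: str) -> str:
    return f"backup answered: {prompt}"


print(call_with_fallback([("primary", flaky_primary), ("backup", healthy_backup)], "hi"))
# ('backup', 'backup answered: hi')
```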

Round Robin

Distribute requests across multiple models:

client<llm> LoadBalanced {
  provider round-robin
  options {
    strategy [GPT4o, Claude, Gemini]
  }
}

Each request rotates through the client list. Useful for:

Load distribution
Cost optimization
A/B testing different models

See Round Robin Strategy.
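
As an illustration of the rotation, the Python sketch below cycles through the three client names from the example. It only models the ordering: `round_robin` is a hypothetical helper, and whether BAML starts at the first client or at some other offset is not stated here.

```python
import itertools
from typing import Iterator, Sequence


def round_robin(clients: Sequence[str]) -> Iterator[str]:
    """Yield client names in order, wrapping around indefinitely."""
    if not clients:
        raise ValueError("at least one client is required")
    return itertools.cycle(clients)


picker = round_robin(["GPT4o", "Claude", "Gemini"])
print([next(picker) for _ in range(5)])
# ['GPT4o', 'Claude', 'Gemini', 'GPT4o', 'Claude']
```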

Runtime Client Selection

Choose the client dynamically at runtime using the Client Registry:

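import os

from baml_client import b
from baml_py import ClientRegistry

resume_text = "..."  # the resume to parse

# Register a client at runtime and make it the primary for this call
cr = ClientRegistry()
cr.add_llm_client(
    name="RuntimeClient",
    provider="openai",
    options={"model": "gpt-4o-mini", "api_key": os.environ.get("OPENAI_API_KEY")},
)
cr.set_primary("RuntimeClient")

result = b.ExtractResume(resume_text, baml_options={"client_registry": cr})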

This is useful for:

Feature flags (send 10% to a new model)
User-based routing (premium users get better models)
Dynamic cost optimization

See Client Registry for details.
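
The routing decisions listed above usually reduce to a small, deterministic function in application code. The Python sketch below is one way to write it, under stated assumptions: `rollout_bucket` and `choose_client` are hypothetical helpers, the model assignments are arbitrary, and how the chosen name is handed to BAML is covered by the Client Registry documentation rather than shown here.

```python
import hashlib


def rollout_bucket(user_id: str) -> int:
    """Map a user id to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_client(user_id: str, is_premium: bool, rollout_percent: int = 10) -> str:
    """Pick a shorthand client string for this user."""
    if is_premium:
        return "openai/gpt-4o"  # premium users get the larger model
    if rollout_bucket(user_id) < rollout_percent:
        return "google-ai/gemini-2.0-flash"  # share of traffic trying a new model
    return "openai/gpt-4o-mini"  # default for everyone else


print(choose_client("user-123", is_premium=True))  # openai/gpt-4o
```

Hashing the user id, rather than drawing a random number per request, keeps each user on the same model across calls, which makes comparisons between the two groups easier to interpret.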

Switching Models

Switching models is as simple as changing one line:

function Extract(text: string) -> Data {
-  client "openai/gpt-4o"
+  client "anthropic/claude-sonnet-4-20250514"
  prompt #"..."
}

BAML handles all the differences in:

API formats
Authentication
Response parsing
Structured output support

Schema-Aligned Parsing (SAP)

BAML’s SAP algorithm works with any model, even those without native structured output support:

Works on day one of new model releases
Handles models without tool calling (like OpenAI o1 or DeepSeek R1)
Parses markdown-wrapped JSON
Accepts chain-of-thought before JSON
Tolerates minor formatting issues

This means you can use BAML with:

Brand new models before official API support
Open-source models
Fine-tuned models
Models without structured output APIs
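
To give a feel for the kind of input a lenient parser must handle, the Python sketch below pulls a JSON value out of a reply that contains reasoning text and a markdown fence. It is far simpler than SAP and is not BAML's algorithm: `extract_json` is a hypothetical helper, it does not repair malformed JSON, and it ignores the target schema entirely.

```python
import json
import re
from typing import Any

FENCE_MARK = "`" * 3  # markdown code fence, built here to keep this listing readable
FENCED = re.compile(FENCE_MARK + r"(?:json)?\s*(.*?)" + FENCE_MARK, re.DOTALL)


def extract_json(text: str) -> Any:
    """Return the first JSON object or array found in a model reply."""
    decoder = json.JSONDecoder()
    # Prefer the contents of fenced blocks, then fall back to the whole reply.
    candidates = FENCED.findall(text) + [text]
    for candidate in candidates:
        for match in re.finditer(r"[{\[]", candidate):
            try:
                value, _end = decoder.raw_decode(candidate, match.start())
                return value
            except json.JSONDecodeError:
                continue  # not valid JSON from this position; try the next one
    raise ValueError("no JSON value found")


reply = (
    "Let me think about the fields first.\n"
    + FENCE_MARK + "json\n"
    + '{"name": "Ada", "age": 36}\n'
    + FENCE_MARK
)
print(extract_json(reply))  # {'name': 'Ada', 'age': 36}
```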

Provider-Specific Features

Some providers have unique capabilities:

Anthropic Prompt Caching

client<llm> CachedClaude {
  provider "anthropic"
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
    headers {
      "anthropic-beta" "prompt-caching-2024-07-31"
    }
  }
}

See Prompt Caching.

OpenAI Response Format

client<llm> StructuredGPT {
  provider "openai"
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    response_format {
      type "json_object"
    }
  }
}

Testing with Different Clients

Test the same function with multiple models:

function Classify(text: string) -> Category {
  client GPT4o
  prompt #"..."
}

test TestWithGPT {
  functions [Classify]
  args { text "Sample input" }
}

test TestWithClaude {
  functions [Classify]
  override {
    client "anthropic/claude-sonnet-4-20250514"
  }
  args { text "Sample input" }
}

The VS Code playground lets you run the same test against different models and compare:

Accuracy
Latency
Cost
Output quality

Best Practices

Use named clients for configuration: Easier to maintain than inline options
Store API keys in environment variables: Never hardcode credentials
Add retry policies: Handle transient failures gracefully
Use fallbacks for critical paths: Ensure high availability
Test with multiple models: Find the best model for your use case
Monitor costs: Different models have different pricing
Use round robin for load balancing: Distribute load across providers

Example: Production-Ready Configuration

Here’s a complete example with retries and fallbacks:

// Retry policy for transient failures
retry_policy StandardRetry {
  max_retries 3
}

// Primary client
client<llm> PrimaryGPT {
  provider "openai"
  retry_policy StandardRetry
  options {
    model "gpt-4o"
    api_key env.OPENAI_API_KEY
    temperature 0.0
    max_tokens 2000
  }
}

// Fallback client
client<llm> FallbackClaude {
  provider "anthropic"
  retry_policy StandardRetry
  options {
    model "claude-3-5-sonnet-20241022"
    api_key env.ANTHROPIC_API_KEY
    max_tokens 2000
  }
}

// Combined resilient client
client<llm> Production {
  provider fallback
  options {
    strategy [PrimaryGPT, FallbackClaude]
  }
}

// Use in functions
function ExtractData(text: string) -> Data {
  client Production
  prompt #"
    Extract structured data:
    {{ text }}
    {{ ctx.output_format }}
  "#
}

Next Steps

Functions: Use clients in BAML functions
Testing: Test with different clients
Provider Reference: Complete provider documentation
Client Registry: Runtime client selection
