
Overview

Chunkr supports any OpenAI-compatible API for LLM processing. You can configure multiple models with different providers, set rate limits, and specify default and fallback models.

Configuration File

LLM models are configured in a models.yaml file. Copy models.example.yaml to get started:
cp models.example.yaml models.yaml
Then point Chunkr at your configuration file with the LLM__MODELS_PATH environment variable:
LLM__MODELS_PATH=./models.yaml
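
If you run Chunkr with Docker Compose, you can mount the file and set the variable on the container instead. A minimal sketch, assuming a service named task handles LLM processing (match the service name and paths to your deployment):

docker-compose.override.yaml
services:
  task:
    environment:
      # Path as seen inside the container
      - LLM__MODELS_PATH=/app/models.yaml
    volumes:
      # Mount the host file read-only into the container
      - ./models.yaml:/app/models.yaml:ro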

Model Configuration Structure

Each model in the configuration accepts the following fields:

id (string, required)
Unique identifier for the model. Use this ID in POST and PATCH requests to reference the model.

model (string, required)
The model name/identifier used by the provider (e.g., gpt-4o, gemini-2.0-flash-lite).

provider_url (string, required)
The API endpoint URL for the provider’s chat completions service.

api_key (string, required)
Your API key for authentication with the provider.

default (boolean)
Marks one model as the default. This model is used when no specific model is requested.

fallback (boolean)
Marks one model as the fallback. It is used when FallbackStrategy::Default is configured.

rate-limit (integer)
Optional rate limit in requests per minute for this model.
You must configure exactly one default model and one fallback model (they can be the same model).
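For example, a minimal configuration in which a single model serves as both the default and the fallback:

models.yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    fallback: true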

Provider Examples

models.yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    rate-limit: 200
Rate limits help prevent API quota exhaustion. OpenAI has different rate limits based on your tier.
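
Any OpenAI-compatible endpoint is configured the same way. As another sketch, a local Ollama server, which exposes an OpenAI-compatible API on port 11434 (the model name assumes you have pulled llama3.1):

models.yaml
models:
  - id: local-llama
    model: llama3.1
    provider_url: http://localhost:11434/v1/chat/completions
    api_key: ""  # Ollama does not check the key; the field can stay empty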

Complete Example

Here’s a complete models.yaml with multiple providers:
models.yaml
models:
  # Primary model (OpenAI)
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-..."
    default: true
    rate-limit: 200

  # Fallback model (Google AI)
  - id: gemini-2.0-flash-lite
    model: gemini-2.0-flash-lite
    provider_url: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    api_key: "AIza..."
    fallback: true

  # Additional model via OpenRouter
  - id: gemini-pro-1.5
    model: google/gemini-pro-1.5
    provider_url: https://openrouter.ai/api/v1/chat/completions
    api_key: "sk-or-..."

  # Self-hosted model
  - id: local-llm
    model: mistral-7b
    provider_url: http://localhost:8000/v1/chat/completions
    api_key: ""

Using Models in Requests

Specifying a Model

Reference your configured model by its id in the llm_processing configuration:
{
  "llm_processing": {
    "model_id": "gpt-4o",
    "temperature": 0.0,
    "max_completion_tokens": 4096
  }
}
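
As an illustration, this is how the block rides along in a task creation call. The endpoint path, auth header, and file field below are assumptions for the sketch; check the API reference for your deployment:

curl -X POST https://your-chunkr-host/api/v1/task/parse \
  -H "Authorization: YOUR_CHUNKR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/document.pdf",
    "llm_processing": {
      "model_id": "gpt-4o",
      "temperature": 0.0,
      "max_completion_tokens": 4096
    }
  }'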

Default Model

If you don’t specify a model_id, the model marked with default: true is used automatically:
{
  "llm_processing": {
    "temperature": 0.0
  }
}

Fallback Strategy

Configure how Chunkr handles LLM failures:
fallback_strategy (enum, default: "Default")
  • None: no fallback; the task fails on LLM error
  • Default: use the model marked with fallback: true
  • Model("model-id"): use a specific configured model as the fallback
{
  "llm_processing": {
    "model_id": "gpt-4o",
    "fallback_strategy": "None"
  }
}
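
To pin a specific fallback model, use the Model variant. A sketch, assuming the enum’s externally tagged JSON representation (verify the exact wire format against your Chunkr version):

{
  "llm_processing": {
    "model_id": "gpt-4o",
    "fallback_strategy": { "Model": "gemini-2.0-flash-lite" }
  }
}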

Rate Limiting

Rate limits prevent exceeding provider quotas:
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-..."
    rate-limit: 200  # Max 200 requests per minute
Chunkr automatically queues requests when approaching the limit.

Best Practices

Never commit API keys to version control. Use environment variables or secure secret management.
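One simple safeguard is to commit only the example file and ignore the real one:

.gitignore
# Keep real credentials out of the repository; commit models.example.yaml instead
models.yaml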
  1. Use different models for different purposes
    • Fast model (e.g., GPT-4o-mini, Gemini Flash) for simple segments
    • High-quality model (e.g., GPT-4o) for complex tables and formulas
  2. Configure appropriate rate limits
    • Check your provider’s rate limits
    • Set conservative limits to avoid throttling
  3. Always configure a fallback
    • Ensures processing continues if the primary model fails
    • Use a reliable, fast model as the fallback
  4. Test your configuration
    # Verify your models.yaml is valid
    chunkr validate-models
    

Troubleshooting

Model Not Found

If you get a “model not found” error:
  1. Verify the model id exists in your models.yaml
  2. Check that LLM__MODELS_PATH points to the correct file (see the quick check below)
  3. Restart the Chunkr service after updating models.yaml
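
A quick check for steps 2 and 3:

# Confirm which file the service is pointed at
echo $LLM__MODELS_PATH
# Confirm the file exists and lists the model IDs you expect
grep "id:" "$LLM__MODELS_PATH"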

Authentication Errors

  1. Verify your API key is correct and not expired
  2. Check that the API key has the necessary permissions
  3. For self-hosted models, verify the endpoint is accessible (see the check below)
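
For a self-hosted endpoint, the fastest check is a direct request (the URL and model name reuse the earlier local-llm example):

# Send a minimal chat completion to confirm the endpoint responds
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "ping"}]}'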

Rate Limit Errors

  1. Lower the rate-limit value in your configuration
  2. Upgrade your provider tier for higher limits
  3. Configure a fallback model with higher limits
