
Overview

Chunkr supports any OpenAI-compatible API for LLM processing. You can configure multiple models with different providers, set rate limits, and specify default and fallback models.

Configuration File

LLM models are configured in a models.yaml file. Copy models.example.yaml to get started:
cp models.example.yaml models.yaml
Then point Chunkr at your configuration file with the LLM__MODELS_PATH environment variable:
LLM__MODELS_PATH=./models.yaml
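
If you run Chunkr with Docker Compose, you can mount the file and set the variable on the container instead. A minimal sketch, assuming a service named task handles LLM processing (match the service name and paths to your deployment):

docker-compose.override.yaml
services:
  task:
    environment:
      # Path as seen inside the container
      - LLM__MODELS_PATH=/app/models.yaml
    volumes:
      # Mount the host file read-only into the container
      - ./models.yaml:/app/models.yaml:ro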

Model Configuration Structure

Each model in the configuration accepts the following fields:

id (string, required)
Unique identifier for the model. Use this ID in POST and PATCH requests to reference the model.

model (string, required)
The model name/identifier used by the provider (e.g., gpt-4o, gemini-2.0-flash-lite).

provider_url (string, required)
The API endpoint URL for the provider’s chat completions service.

api_key (string, required)
Your API key for authentication with the provider.

default (boolean)
Marks one model as the default. This model is used when no specific model is requested.

fallback (boolean)
Marks one model as the fallback. It is used when FallbackStrategy::Default is configured.

rate-limit (integer)
Optional rate limit in requests per minute for this model.
You must configure exactly one default model and one fallback model (they can be the same model).
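For example, a minimal configuration in which a single model serves as both the default and the fallback:

models.yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    fallback: true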

Provider Examples

models.yaml
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "your_openai_api_key_here"
    default: true
    rate-limit: 200
Rate limits help prevent API quota exhaustion. OpenAI has different rate limits based on your tier.
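
Any OpenAI-compatible endpoint is configured the same way. As another sketch, a local Ollama server, which exposes an OpenAI-compatible API on port 11434 (the model name assumes you have pulled llama3.1):

models.yaml
models:
  - id: local-llama
    model: llama3.1
    provider_url: http://localhost:11434/v1/chat/completions
    api_key: ""  # Ollama does not check the key; the field can stay empty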

Complete Example

Here’s a complete models.yaml with multiple providers:
models.yaml
models:
  # Primary model (OpenAI)
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-..."
    default: true
    rate-limit: 200

  # Fallback model (Google AI)
  - id: gemini-2.0-flash-lite
    model: gemini-2.0-flash-lite
    provider_url: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    api_key: "AIza..."
    fallback: true

  # Additional model via OpenRouter
  - id: gemini-pro-1.5
    model: google/gemini-pro-1.5
    provider_url: https://openrouter.ai/api/v1/chat/completions
    api_key: "sk-or-..."

  # Self-hosted model
  - id: local-llm
    model: mistral-7b
    provider_url: http://localhost:8000/v1/chat/completions
    api_key: ""

Using Models in Requests

Specifying a Model

Reference your configured model by its id in the llm_processing configuration:
{
  "llm_processing": {
    "model_id": "gpt-4o",
    "temperature": 0.0,
    "max_completion_tokens": 4096
  }
}
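
As an illustration, this is how the block rides along in a task creation call. The endpoint path, auth header, and file field below are assumptions for the sketch; check the API reference for your deployment:

curl -X POST https://your-chunkr-host/api/v1/task/parse \
  -H "Authorization: YOUR_CHUNKR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "file": "https://example.com/document.pdf",
    "llm_processing": {
      "model_id": "gpt-4o",
      "temperature": 0.0,
      "max_completion_tokens": 4096
    }
  }'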

Default Model

If you don’t specify a model_id, the model marked with default: true is used automatically:
{
  "llm_processing": {
    "temperature": 0.0
  }
}

Fallback Strategy

Configure how Chunkr handles LLM failures:
fallback_strategy (enum, default: "Default")
  • None: no fallback; the task fails on LLM error
  • Default: use the model marked with fallback: true
  • Model("model-id"): use a specific configured model as the fallback
{
  "llm_processing": {
    "model_id": "gpt-4o",
    "fallback_strategy": "None"
  }
}
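
To pin a specific fallback model, use the Model variant. A sketch, assuming the enum’s externally tagged JSON representation (verify the exact wire format against your Chunkr version):

{
  "llm_processing": {
    "model_id": "gpt-4o",
    "fallback_strategy": { "Model": "gemini-2.0-flash-lite" }
  }
}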

Rate Limiting

Rate limits prevent exceeding provider quotas:
models:
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-..."
    rate-limit: 200  # Max 200 requests per minute
Chunkr automatically queues requests when approaching the limit.

Best Practices

Never commit API keys to version control. Use environment variables or secure secret management.
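One simple safeguard is to commit only the example file and ignore the real one:

.gitignore
# Keep real credentials out of the repository; commit models.example.yaml instead
models.yaml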
  1. Use different models for different purposes
    • Fast model (e.g., GPT-4o-mini, Gemini Flash) for simple segments
    • High-quality model (e.g., GPT-4o) for complex tables and formulas
  2. Configure appropriate rate limits
    • Check your provider’s rate limits
    • Set conservative limits to avoid throttling
  3. Always configure a fallback
    • Ensures processing continues if the primary model fails
    • Use a reliable, fast model as the fallback
  4. Test your configuration
    # Verify your models.yaml is valid
    chunkr validate-models
    

Troubleshooting

Model Not Found

If you get a “model not found” error:
  1. Verify the model id exists in your models.yaml
  2. Check that LLM__MODELS_PATH points to the correct file (see the quick check below)
  3. Restart the Chunkr service after updating models.yaml
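
A quick check for steps 2 and 3:

# Confirm which file the service is pointed at
echo $LLM__MODELS_PATH
# Confirm the file exists and lists the model IDs you expect
grep "id:" "$LLM__MODELS_PATH"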

Authentication Errors

  1. Verify your API key is correct and not expired
  2. Check that the API key has the necessary permissions
  3. For self-hosted models, verify the endpoint is accessible (see the check below)
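
For a self-hosted endpoint, the fastest check is a direct request (the URL and model name reuse the earlier local-llm example):

# Send a minimal chat completion to confirm the endpoint responds
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "mistral-7b", "messages": [{"role": "user", "content": "ping"}]}'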

Rate Limit Errors

  1. Lower the rate-limit value in your configuration
  2. Upgrade your provider tier for higher limits
  3. Configure a fallback model with higher limits
