
Installation Guide

Chunkr runs as a collection of Docker services orchestrated with Docker Compose. This guide covers installation for GPU-accelerated deployments, CPU-only systems, and Mac ARM devices.

Prerequisites

1. Install Docker

Install Docker Desktop or Docker Engine, then verify the installation:
docker --version
docker compose version
2. Install NVIDIA Container Toolkit (GPU Only)

For GPU acceleration, install the NVIDIA Container Toolkit:
Skip this step if you’re using CPU-only or Mac ARM deployment.
# Add the NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
See NVIDIA's full installation guide for other distributions and installation methods.
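Depending on your setup, you may also need to register the NVIDIA runtime with Docker (part of NVIDIA's documented flow), then restart Docker as above:

# Register the NVIDIA runtime in Docker's configuration
sudo nvidia-ctk runtime configure --runtime=docker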
3. Verify GPU Access (GPU Only)

Test that Docker can access your GPU:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
You should see your GPU information.

Quick Installation

1. Clone the Repository

git clone https://github.com/lumina-ai-inc/chunkr.git
cd chunkr
2. Set Up Environment

# Copy environment template
cp .env.example .env

# Copy LLM models template
cp models.example.yaml models.yaml
3. Configure LLM Models

Edit models.yaml with your LLM configuration. See LLM Configuration below.
4. Start Services

# GPU deployment (recommended): uses NVIDIA GPUs for faster processing
docker compose up -d
First startup downloads several GB of models and may take 10-15 minutes.
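On CPU-only or Mac ARM machines, start the stack with the matching compose overrides instead (the same file combinations used for shutdown later in this guide):

# CPU-only deployment
docker compose -f compose.yaml -f compose.cpu.yaml up -d

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d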
5. Verify Installation

Check that all services are running:
docker compose ps
All services should show “Up” status. You can then access the web UI at http://localhost:5173 and the API at http://localhost:8000 (see Port Mappings below).
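As a quick reachability check (assuming the default port mappings listed below), confirm that the API and web UI respond:

curl -I http://localhost:8000
curl -I http://localhost:5173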

LLM Configuration

Chunkr requires at least one LLM for vision-language model processing. You can configure multiple models with fallbacks. The models.yaml file supports multiple LLM providers with advanced options:
models.yaml
models:
  # OpenAI Configuration
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-your-openai-key-here"
    default: true
    rate-limit: 200  # requests per minute (optional)

  # Google AI Studio Configuration
  - id: gemini-2.0-flash-lite
    model: gemini-2.0-flash-lite
    provider_url: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    api_key: "your-google-ai-key-here"
    fallback: true

  # OpenRouter Configuration
  - id: gemini-pro-1.5
    model: google/gemini-pro-1.5
    provider_url: https://openrouter.ai/api/v1/chat/completions
    api_key: "your-openrouter-key-here"

  # Self-hosted LLM (Ollama, vLLM, etc.)
  - id: local-llm
    model: mistral-7b
    provider_url: http://localhost:11434/v1/chat/completions
    api_key: ""  # Leave empty if not required
  • Exactly one model must have default: true
  • Exactly one model must have fallback: true (can be the same as default)
  • Use id to reference models in API requests
  • rate-limit is optional and sets requests per minute cap

Using Environment Variables (Basic)

For simple single-LLM setups, use environment variables in .env:
.env
LLM__KEY=sk-your-api-key-here
LLM__MODEL=gpt-4o
LLM__URL=https://api.openai.com/v1/chat/completions
Environment variables are overridden by models.yaml. If you use models.yaml, remove or comment out the LLM__* variables.

Common LLM Providers

For example, an OpenAI entry looks like this:
- id: gpt-4o
  model: gpt-4o
  provider_url: https://api.openai.com/v1/chat/completions
  api_key: "sk-your-key-here"
  default: true

Service Architecture

Chunkr consists of multiple containerized services:
  • server: Main API server (Rust/Actix-Web) on port 8000
  • task: Background worker pool (30 replicas for GPU, 10 for CPU)
  • web: React-based UI on port 5173
  • postgres: Database for metadata and task state
  • redis: Queue and cache for job processing
  • minio: S3-compatible object storage for files
  • segmentation: YOLO-based layout detection (6 replicas)
    • GPU: Uses NVIDIA GPU acceleration
    • CPU: Optimized for multi-core processing
  • ocr: DocTR OCR engine (3 replicas)
    • GPU: CUDA-accelerated inference
    • CPU: Uses smaller model variant
  • keycloak: Authentication and user management (port 8080)
  • adminer: Database admin UI (port 8082)
  • nginx: Load balancer for processing services
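You can inspect any of these services individually with standard Compose commands (service names follow compose.yaml, e.g. server, task, segmentation-backend, ocr-backend):

docker compose ps task
docker compose logs -f server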

Port Mappings

Service          Port   Description
Web UI           5173   React application
API              8000   REST API endpoint
Segmentation     8001   Layout detection service
OCR              8002   Text recognition service
Keycloak         8080   Authentication
Adminer          8082   Database UI
PostgreSQL       5432   Database
Redis            6379   Cache/Queue
MinIO            9000   Object storage
MinIO Console    9001   Storage admin UI

GPU vs CPU Performance

Performance comparison for a typical 10-page PDF:
Configuration   Processing Time   Hardware Requirements
GPU             ~20-30 seconds    NVIDIA GPU with 8GB+ VRAM
CPU             ~60-120 seconds   8+ CPU cores, 16GB+ RAM
Mac ARM         ~45-90 seconds    M1/M2/M3 with 16GB+ RAM
GPU acceleration provides 3-4x speedup for segmentation and OCR operations.

Scaling Configuration

Adjusting Worker Replicas

Edit compose.yaml to scale processing:
compose.yaml
services:
  task:
    deploy:
      replicas: 30  # Reduce for less memory usage
  
  segmentation-backend:
    deploy:
      replicas: 6   # Scale based on GPU count
  
  ocr-backend:
    deploy:
      replicas: 3   # Scale based on available resources
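If you prefer not to edit compose.yaml, Docker Compose can also override a service's replica count at runtime with the --scale flag (a quick sketch; the override lasts only for that invocation):

# Run 10 task workers instead of the configured 30
docker compose up -d --scale task=10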

Resource Limits

For production, add resource constraints:
services:
  task:
    deploy:
      replicas: 30
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
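After changing limits, re-create the services and check actual consumption with standard Docker commands:

docker compose up -d
docker stats --no-stream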

Stopping and Managing Services

# GPU deployment
docker compose down

# CPU deployment
docker compose -f compose.yaml -f compose.cpu.yaml down

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml down
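To also delete persisted data (PostgreSQL, MinIO, and other volumes), add the -v flag; this is destructive and cannot be undone:

docker compose down -v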

Troubleshooting

Services won't start

Check the Docker daemon:
sudo systemctl status docker
View startup errors:
docker compose logs
Common issues:
  • Port conflicts (8000, 5173, etc. already in use)
  • Insufficient memory (the full stack requires 16GB+)
  • Missing .env or models.yaml files

GPU not detected

Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Check the NVIDIA Container Toolkit:
nvidia-ctk --version
Restart Docker after installing the toolkit:
sudo systemctl restart docker

High memory or GPU usage

Reduce worker replicas in compose.yaml:
task:
  deploy:
    replicas: 10  # Down from 30
Use the CPU deployment if GPU memory is limited:
docker compose -f compose.yaml -f compose.cpu.yaml up -d
Monitor resource usage:
docker stats

LLM processing errors

Test the LLM endpoint manually:
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"test"}]}'
Check models.yaml syntax:
# Validate YAML
python -c "import yaml; yaml.safe_load(open('models.yaml'))"
View server logs:
docker compose logs -f server

Mac ARM issues

Make sure you are using the Mac compose override:
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d
Reduce concurrent tasks:
  • Decrease replicas for task, segmentation-backend, ocr-backend
  • Process documents sequentially instead of in parallel
Allocate more resources to Docker Desktop:
  • Open Docker Desktop → Settings → Resources
  • Increase CPUs to 8+ and Memory to 16GB+

Production Deployment

The default configuration is designed for development. For production:
  1. Enable authentication: Configure Keycloak properly
  2. Use HTTPS: Set up reverse proxy with SSL/TLS
  3. Secure secrets: Use Docker secrets or environment encryption (see the sketch after this list)
  4. Configure backups: Back up PostgreSQL and MinIO data
  5. Monitor resources: Set up alerts for CPU, memory, disk usage
  6. Rate limiting: Configure per-model rate limits in models.yaml
  7. Task expiration: Set appropriate expires_in values
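For item 3, a minimal sketch for generating strong credentials to replace the defaults in .env (standard openssl; which variables you rotate depends on your setup):

# Generate a random 32-byte hex secret, e.g. for AWS__SECRET_KEY
openssl rand -hex 32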

Environment Variables Reference

Key configuration options in .env:
# Database
PG__URL=postgresql://postgres:postgres@postgres:5432/chunkr

# Redis
REDIS__URL=redis://redis:6379

# Object Storage
AWS__ENDPOINT=http://minio:9000
AWS__ACCESS_KEY=minioadmin
AWS__SECRET_KEY=minioadmin

# LLM Configuration Path
LLM__MODELS_PATH=./models.yaml

# Worker URLs
WORKER__GENERAL_OCR_URL=http://ocr:8000
WORKER__SEGMENTATION_URL=http://segmentation:8000
WORKER__SERVER_URL=http://localhost:8000

# Authentication
AUTH__KEYCLOAK_URL=http://keycloak:8080

Next Steps

  • Quickstart: Make your first API request
  • API Reference: Explore the complete API
  • Configuration: Advanced configuration options
  • Examples: Code examples and use cases
