
Installation Guide

Chunkr runs as a collection of Docker services orchestrated with Docker Compose. This guide covers installation for GPU-accelerated deployments, CPU-only systems, and Mac ARM devices.

Prerequisites

1. Install Docker

Install Docker Desktop or Docker Engine, then verify the installation:
docker --version
docker compose version
2. Install NVIDIA Container Toolkit (GPU Only)

For GPU acceleration, install the NVIDIA Container Toolkit:
Skip this step if you’re using CPU-only or Mac ARM deployment.
# Add the NVIDIA repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install nvidia-container-toolkit
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit

# Restart Docker
sudo systemctl restart docker
See NVIDIA's full installation guide for other distributions and installation methods.
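Depending on your setup, you may also need to register the NVIDIA runtime with Docker (part of NVIDIA's documented flow), then restart Docker as above:

# Register the NVIDIA runtime in Docker's configuration
sudo nvidia-ctk runtime configure --runtime=docker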
3. Verify GPU Access (GPU Only)

Test that Docker can access your GPU:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
You should see your GPU information.

Quick Installation

1. Clone the Repository

git clone https://github.com/lumina-ai-inc/chunkr.git
cd chunkr
2. Set Up Environment

# Copy environment template
cp .env.example .env

# Copy LLM models template
cp models.example.yaml models.yaml
3. Configure LLM Models

Edit models.yaml with your LLM configuration. See LLM Configuration below.
4. Start Services

# GPU deployment (recommended): uses NVIDIA GPUs for faster processing
docker compose up -d
First startup downloads several GB of models and may take 10-15 minutes.
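On CPU-only or Mac ARM machines, start the stack with the matching compose overrides instead (the same file combinations used for shutdown later in this guide):

# CPU-only deployment
docker compose -f compose.yaml -f compose.cpu.yaml up -d

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d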
5. Verify Installation

Check that all services are running:
docker compose ps
All services should show “Up” status. You can then access the web UI at http://localhost:5173 and the API at http://localhost:8000 (see Port Mappings below).
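As a quick reachability check (assuming the default port mappings listed below), confirm that the API and web UI respond:

curl -I http://localhost:8000
curl -I http://localhost:5173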

LLM Configuration

Chunkr requires at least one LLM for vision-language model processing. You can configure multiple models with fallbacks. The models.yaml file supports multiple LLM providers with advanced options:
models.yaml
models:
  # OpenAI Configuration
  - id: gpt-4o
    model: gpt-4o
    provider_url: https://api.openai.com/v1/chat/completions
    api_key: "sk-your-openai-key-here"
    default: true
    rate-limit: 200  # requests per minute (optional)

  # Google AI Studio Configuration
  - id: gemini-2.0-flash-lite
    model: gemini-2.0-flash-lite
    provider_url: https://generativelanguage.googleapis.com/v1beta/openai/chat/completions
    api_key: "your-google-ai-key-here"
    fallback: true

  # OpenRouter Configuration
  - id: gemini-pro-1.5
    model: google/gemini-pro-1.5
    provider_url: https://openrouter.ai/api/v1/chat/completions
    api_key: "your-openrouter-key-here"

  # Self-hosted LLM (Ollama, vLLM, etc.)
  - id: local-llm
    model: mistral-7b
    provider_url: http://localhost:11434/v1/chat/completions
    api_key: ""  # Leave empty if not required
  • Exactly one model must have default: true
  • Exactly one model must have fallback: true (can be the same as default)
  • Use id to reference models in API requests
  • rate-limit is optional and sets requests per minute cap

Using Environment Variables (Basic)

For simple single-LLM setups, use environment variables in .env:
.env
LLM__KEY=sk-your-api-key-here
LLM__MODEL=gpt-4o
LLM__URL=https://api.openai.com/v1/chat/completions
Environment variables are overridden by models.yaml. If you use models.yaml, remove or comment out the LLM__* variables.

Common LLM Providers

For example, an OpenAI entry looks like this:
- id: gpt-4o
  model: gpt-4o
  provider_url: https://api.openai.com/v1/chat/completions
  api_key: "sk-your-key-here"
  default: true

Service Architecture

Chunkr consists of multiple containerized services:
  • server: Main API server (Rust/Actix-Web) on port 8000
  • task: Background worker pool (30 replicas for GPU, 10 for CPU)
  • web: React-based UI on port 5173
  • postgres: Database for metadata and task state
  • redis: Queue and cache for job processing
  • minio: S3-compatible object storage for files
  • segmentation: YOLO-based layout detection (6 replicas)
    • GPU: Uses NVIDIA GPU acceleration
    • CPU: Optimized for multi-core processing
  • ocr: DocTR OCR engine (3 replicas)
    • GPU: CUDA-accelerated inference
    • CPU: Uses smaller model variant
  • keycloak: Authentication and user management (port 8080)
  • adminer: Database admin UI (port 8082)
  • nginx: Load balancer for processing services
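You can inspect any of these services individually with standard Compose commands (service names follow compose.yaml, e.g. server, task, segmentation-backend, ocr-backend):

docker compose ps task
docker compose logs -f server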

Port Mappings

Service          Port   Description
Web UI           5173   React application
API              8000   REST API endpoint
Segmentation     8001   Layout detection service
OCR              8002   Text recognition service
Keycloak         8080   Authentication
Adminer          8082   Database UI
PostgreSQL       5432   Database
Redis            6379   Cache/Queue
MinIO            9000   Object storage
MinIO Console    9001   Storage admin UI

GPU vs CPU Performance

Performance comparison for a typical 10-page PDF:
Configuration   Processing Time   Hardware Requirements
GPU             ~20-30 seconds    NVIDIA GPU with 8GB+ VRAM
CPU             ~60-120 seconds   8+ CPU cores, 16GB+ RAM
Mac ARM         ~45-90 seconds    M1/M2/M3 with 16GB+ RAM
GPU acceleration provides 3-4x speedup for segmentation and OCR operations.

Scaling Configuration

Adjusting Worker Replicas

Edit compose.yaml to scale processing:
compose.yaml
services:
  task:
    deploy:
      replicas: 30  # Reduce for less memory usage
  
  segmentation-backend:
    deploy:
      replicas: 6   # Scale based on GPU count
  
  ocr-backend:
    deploy:
      replicas: 3   # Scale based on available resources
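If you prefer not to edit compose.yaml, Docker Compose can also override a service's replica count at runtime with the --scale flag (a quick sketch; the override lasts only for that invocation):

# Run 10 task workers instead of the configured 30
docker compose up -d --scale task=10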

Resource Limits

For production, add resource constraints:
services:
  task:
    deploy:
      replicas: 30
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
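After changing limits, re-create the services and check actual consumption with standard Docker commands:

docker compose up -d
docker stats --no-stream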

Stopping and Managing Services

# GPU deployment
docker compose down

# CPU deployment
docker compose -f compose.yaml -f compose.cpu.yaml down

# Mac ARM deployment
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml down
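To also delete persisted data (PostgreSQL, MinIO, and other volumes), add the -v flag; this is destructive and cannot be undone:

docker compose down -v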

Troubleshooting

Services won't start

Check the Docker daemon:
sudo systemctl status docker
View startup errors:
docker compose logs
Common issues:
  • Port conflicts (8000, 5173, etc. already in use)
  • Insufficient memory (the full stack requires 16GB+)
  • Missing .env or models.yaml files

GPU not detected

Verify GPU access:
docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
Check the NVIDIA Container Toolkit:
nvidia-ctk --version
Restart Docker after installing the toolkit:
sudo systemctl restart docker

High memory or GPU usage

Reduce worker replicas in compose.yaml:
task:
  deploy:
    replicas: 10  # Down from 30
Use the CPU deployment if GPU memory is limited:
docker compose -f compose.yaml -f compose.cpu.yaml up -d
Monitor resource usage:
docker stats

LLM processing errors

Test the LLM endpoint manually:
curl -X POST https://api.openai.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-4o","messages":[{"role":"user","content":"test"}]}'
Check models.yaml syntax:
# Validate YAML
python -c "import yaml; yaml.safe_load(open('models.yaml'))"
View server logs:
docker compose logs -f server

Mac ARM issues

Make sure you are using the Mac compose override:
docker compose -f compose.yaml -f compose.cpu.yaml -f compose.mac.yaml up -d
Reduce concurrent tasks:
  • Decrease replicas for task, segmentation-backend, ocr-backend
  • Process documents sequentially instead of in parallel
Allocate more resources to Docker Desktop:
  • Open Docker Desktop → Settings → Resources
  • Increase CPUs to 8+ and Memory to 16GB+

Production Deployment

The default configuration is designed for development. For production:
  1. Enable authentication: Configure Keycloak properly
  2. Use HTTPS: Set up reverse proxy with SSL/TLS
  3. Secure secrets: Use Docker secrets or environment encryption (see the sketch after this list)
  4. Configure backups: Back up PostgreSQL and MinIO data
  5. Monitor resources: Set up alerts for CPU, memory, disk usage
  6. Rate limiting: Configure per-model rate limits in models.yaml
  7. Task expiration: Set appropriate expires_in values
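For item 3, a minimal sketch for generating strong credentials to replace the defaults in .env (standard openssl; which variables you rotate depends on your setup):

# Generate a random 32-byte hex secret, e.g. for AWS__SECRET_KEY
openssl rand -hex 32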

Environment Variables Reference

Key configuration options in .env:
# Database
PG__URL=postgresql://postgres:postgres@postgres:5432/chunkr

# Redis
REDIS__URL=redis://redis:6379

# Object Storage
AWS__ENDPOINT=http://minio:9000
AWS__ACCESS_KEY=minioadmin
AWS__SECRET_KEY=minioadmin

# LLM Configuration Path
LLM__MODELS_PATH=./models.yaml

# Worker URLs
WORKER__GENERAL_OCR_URL=http://ocr:8000
WORKER__SEGMENTATION_URL=http://segmentation:8000
WORKER__SERVER_URL=http://localhost:8000

# Authentication
AUTH__KEYCLOAK_URL=http://keycloak:8080

Next Steps

  • Quickstart: Make your first API request
  • API Reference: Explore the complete API
  • Configuration: Advanced configuration options
  • Examples: Code examples and use cases
