Documentation Index
Fetch the complete documentation index at: https://mintlify.com/lumina-ai-inc/chunkr/llms.txt
Use this file to discover all available pages before exploring further.
Installation Guide
Chunkr runs as a collection of Docker services orchestrated with Docker Compose. This guide covers installation for GPU-accelerated deployments, CPU-only systems, and Mac ARM devices.
Prerequisites
Install Docker
Install Docker Desktop or Docker Engine:
- Docker Desktop: Download here
- Docker Engine: For Linux servers
Install NVIDIA Container Toolkit (GPU Only)
For GPU acceleration, install the NVIDIA Container Toolkit (see the full installation guide).
Skip this step if you’re using CPU-only or Mac ARM deployment.
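After installing the toolkit, Docker must be told about the NVIDIA runtime. The commands below are the standard NVIDIA Container Toolkit setup steps; the sketch only prints them unless explicitly enabled (`CHUNKR_GPU_SETUP` is a hypothetical opt-in guard, not a Chunkr variable):

```shell
# Standard NVIDIA Container Toolkit commands to register the runtime with Docker.
# Set CHUNKR_GPU_SETUP=1 (hypothetical guard) to execute; otherwise they are printed.
setup='sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker'
if [ "${CHUNKR_GPU_SETUP:-0}" = "1" ]; then
  eval "$setup"
else
  printf '%s\n' "$setup"
fi
```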
Quick Installation
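The quick install reduces to clone, configure, and start. The repository URL and the `models.example.yaml` file name below are assumptions inferred from this guide, not verified paths; check the project README for the real names:

```shell
# Sketch of the quick-install flow; set CHUNKR_INSTALL=1 to execute,
# otherwise the steps are only printed. The repo URL and example-file
# name are assumptions, not verified paths.
steps='git clone https://github.com/lumina-ai-inc/chunkr.git
cd chunkr
cp models.example.yaml models.yaml
docker compose up -d'
if [ "${CHUNKR_INSTALL:-0}" = "1" ]; then
  eval "$steps"
else
  printf '%s\n' "$steps"
fi
```

After the containers start, edit `models.yaml` as described in the next step.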
Configure LLM Models
Edit models.yaml with your LLM configuration. See LLM Configuration below.
Verify Installation
Check that all services are running; all services should show an "Up" status. Then access:
- Web UI: http://localhost:5173
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
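A quick way to confirm the stack is healthy (assumes you are in the directory containing compose.yaml):

```shell
# List service status; every Chunkr service should report "Up".
if command -v docker >/dev/null 2>&1; then
  docker compose ps
  checked="yes"
else
  echo "docker not found"
  checked="no"
fi
```

A service stuck in a restart loop can be inspected with `docker compose logs <service>`.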
LLM Configuration
Chunkr requires at least one LLM for vision-language model processing. You can configure multiple models with fallbacks.
Using models.yaml (Recommended)
The models.yaml file supports multiple LLM providers with advanced options:
models.yaml
- Exactly one model must have default: true
- Exactly one model must have fallback: true (can be the same as the default)
- Use id to reference models in API requests
- rate-limit is optional and caps requests per minute
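Putting those rules together, a minimal models.yaml might look like the sketch below. Only id, default, fallback, and rate-limit come from the rules above; the other per-model fields are illustrative, since the full schema is not reproduced on this page:

```yaml
models:
  - id: primary-gpt            # referenced by id in API requests
    model: gpt-4o              # illustrative provider/model fields
    api_key: ${OPENAI_API_KEY}
    default: true              # exactly one model is the default
  - id: backup-gemini
    model: gemini-2.0-flash
    api_key: ${GEMINI_API_KEY}
    fallback: true             # exactly one fallback (may equal the default)
    rate-limit: 200            # optional, requests per minute
```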
Using Environment Variables (Basic)
For simple single-LLM setups, use environment variables in .env:
.env
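A sketch of what a single-provider .env might contain. The variable names below are placeholders, not the project's documented keys; consult the repository's example .env for the real ones:

```env
# Placeholder variable names — check the project's example .env for the actual keys
LLM_API_KEY=your-api-key
LLM_MODEL=gpt-4o
LLM_BASE_URL=https://api.openai.com/v1
```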
Common LLM Providers
- OpenAI
- Google AI Studio
- OpenRouter
- Ollama (Local)
- vLLM (Self-hosted)
Service Architecture
Chunkr consists of multiple containerized services.
Core Services
- server: Main API server (Rust/Actix-Web) on port 8000
- task: Background worker pool (30 replicas for GPU, 10 for CPU)
- web: React-based UI on port 5173
- postgres: Database for metadata and task state
- redis: Queue and cache for job processing
- minio: S3-compatible object storage for files
Processing Services
- segmentation: YOLO-based layout detection (6 replicas)
  - GPU: Uses NVIDIA GPU acceleration
  - CPU: Optimized for multi-core processing
- ocr: DocTR OCR engine (3 replicas)
  - GPU: CUDA-accelerated inference
  - CPU: Uses a smaller model variant
Supporting Services
- keycloak: Authentication and user management (port 8080)
- adminer: Database admin UI (port 8082)
- nginx: Load balancer for processing services
Port Mappings
| Service | Port | Description |
|---|---|---|
| Web UI | 5173 | React application |
| API | 8000 | REST API endpoint |
| Segmentation | 8001 | Layout detection service |
| OCR | 8002 | Text recognition service |
| Keycloak | 8080 | Authentication |
| Adminer | 8082 | Database UI |
| PostgreSQL | 5432 | Database |
| Redis | 6379 | Cache/Queue |
| MinIO | 9000 | Object storage |
| MinIO Console | 9001 | Storage admin UI |
GPU vs CPU Performance
Performance comparison for a typical 10-page PDF:
| Configuration | Processing Time | Hardware Requirements |
|---|---|---|
| GPU | ~20-30 seconds | NVIDIA GPU with 8GB+ VRAM |
| CPU | ~60-120 seconds | 8+ CPU cores, 16GB+ RAM |
| Mac ARM | ~45-90 seconds | M1/M2/M3 with 16GB+ RAM |
GPU acceleration provides 3-4x speedup for segmentation and OCR operations.
Scaling Configuration
Adjusting Worker Replicas
Edit compose.yaml to scale processing:
compose.yaml
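For example, a sketch of lowering the worker pools for a smaller machine. The service names and default replica counts are taken from the architecture section above; the surrounding keys in the real compose.yaml may differ:

```yaml
services:
  task:
    deploy:
      replicas: 10        # default is 30 on GPU deployments
  segmentation:
    deploy:
      replicas: 3         # default is 6
```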
Resource Limits
For production, add resource constraints.
Stopping and Managing Services
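The stack is managed with the standard Docker Compose lifecycle commands; a sketch:

```shell
# Standard Compose lifecycle commands for the Chunkr stack.
if command -v docker >/dev/null 2>&1; then
  docker compose stop      # stop containers, keep volumes and state
  docker compose down      # stop and remove containers and networks
  # docker compose down -v # additionally remove volumes (deletes all data)
  managed="yes"
else
  echo "docker not found"
  managed="no"
fi
```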
Troubleshooting
Services won't start
Check that the Docker daemon is running, then view the startup errors. Common issues:
- Port conflicts (8000, 5173, etc. already in use)
- Insufficient memory (requires 16GB+ for full stack)
- Missing .env or models.yaml files
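The daemon and log checks above can be sketched as:

```shell
# Diagnose startup failures: check the daemon, then read recent service logs.
if command -v docker >/dev/null 2>&1; then
  docker info >/dev/null 2>&1 && echo "daemon: running" || echo "daemon: not reachable"
  docker compose logs --tail 50
  probe="done"
else
  probe="no-docker"
fi
```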
GPU not detected
Verify GPU access, check that the NVIDIA Container Toolkit is installed, and restart Docker after installing the toolkit.
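A common smoke test is running nvidia-smi inside a throwaway CUDA container. The sketch below only prints the command unless `CHUNKR_GPU_CHECK=1` (a hypothetical guard) is set, since the image pull is large:

```shell
# Verify containers can see the GPU (requires the NVIDIA Container Toolkit).
# Set CHUNKR_GPU_CHECK=1 (hypothetical guard) to actually run the container.
cmd='docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi'
if [ "${CHUNKR_GPU_CHECK:-0}" = "1" ]; then
  eval "$cmd" || echo "GPU not visible to Docker; reinstall the toolkit and restart Docker"
else
  printf 'run: %s\n' "$cmd"
fi
```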
Out of memory errors
Reduce worker replicas in compose.yaml, or switch to the CPU deployment if GPU memory is limited. Monitor resource usage:
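Resource usage while tuning replicas can be checked with a one-shot docker stats snapshot:

```shell
# One-shot snapshot of per-container CPU and memory usage (no streaming).
if command -v docker >/dev/null 2>&1; then
  docker stats --no-stream
  stats="ran"
else
  echo "docker not found"
  stats="no-docker"
fi
```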
LLM connection failures
Test the LLM endpoint manually, check the models.yaml syntax, and view the server logs.
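A manual endpoint probe can be sketched against an OpenAI-compatible API. `LLM_BASE_URL` and `LLM_API_KEY` below are placeholder names, not Chunkr variables; substitute whatever your models.yaml actually uses:

```shell
# Probe an OpenAI-compatible /models route (placeholder env var names).
if command -v curl >/dev/null 2>&1; then
  curl -sf --max-time 5 \
    -H "Authorization: Bearer ${LLM_API_KEY:-}" \
    "${LLM_BASE_URL:-https://api.openai.com/v1}/models" >/dev/null \
    && echo "endpoint reachable" || echo "endpoint check failed"
  llm_probe="done"
else
  llm_probe="no-curl"
fi
```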
Slow processing on Mac ARM
Ensure you are using the Mac compose override, and reduce concurrent tasks:
- Decrease replicas for task, segmentation-backend, ocr-backend
- Process documents sequentially instead of in parallel
- Open Docker Desktop → Settings → Resources
- Increase CPUs to 8+ and Memory to 16GB+
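Assuming the repository ships a Mac-specific override file (the name compose.mac.yaml below is a guess; check the repo for the actual file), startup would look like:

```shell
# Hypothetical Mac ARM startup; compose.mac.yaml is an assumed file name.
if [ -f compose.yaml ] && command -v docker >/dev/null 2>&1; then
  docker compose -f compose.yaml -f compose.mac.yaml up -d
  mac_run="attempted"
else
  echo "run this from the chunkr checkout with Docker installed"
  mac_run="skipped"
fi
```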
Production Deployment
Environment Variables Reference
Key configuration options in .env:
Next Steps
Quickstart
Make your first API request
API Reference
Explore the complete API
Configuration
Advanced configuration options
Examples
Code examples and use cases