

Chunkr uses GPU acceleration to significantly improve document processing performance. This guide covers GPU configuration for NVIDIA GPUs.

Prerequisites

Hardware Requirements

  • NVIDIA GPU with CUDA support (compute capability 6.0+)
  • At least 6GB GPU memory (12GB+ recommended for production)
  • Multiple GPUs supported for increased throughput
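To confirm that a GPU meets these requirements, you can query its name, memory, and compute capability with nvidia-smi (the compute_cap field is only available on fairly recent drivers):

# Query GPU model, total memory, and compute capability
nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv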

Software Requirements

  • NVIDIA GPU drivers (version 470.x or later)
  • NVIDIA Container Toolkit
  • Docker Engine 19.03 or later
  • Docker Compose V2
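A quick way to check your installed versions against these requirements:

# Driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Docker Engine and Compose versions
docker --version
docker compose version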

Installing NVIDIA Container Toolkit

1. Add NVIDIA package repository

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
2. Install nvidia-container-toolkit

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
3. Configure Docker daemon

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
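
You can confirm that the nvidia runtime was registered with the Docker daemon (the exact output format varies between Docker versions):

# The list of runtimes should now include "nvidia"
docker info | grep -i runtimes
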
4. Verify installation

Test GPU access from Docker:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
You should see your GPU(s) listed in the output.

GPU Configuration in Docker Compose

Chunkr’s default compose.yaml configures GPU access for ML services:

Segmentation Backend

segmentation-backend:
  build:
    context: .
    dockerfile: docker/segmentation/Dockerfile
  deploy:
    replicas: 6
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  volumes:
    - /dev/shm:/dev/shm
  environment:
    - MAX_BATCH_SIZE=4
    - BATCH_WAIT_TIME=0.2
    - OVERLAP_THRESHOLD=0.025
    - SCORE_THRESHOLD=0.2
Configuration details:
  • replicas: 6 - Six worker processes share available GPUs
  • count: all - All GPUs are available to workers
  • capabilities: [gpu] - Enables GPU support
  • /dev/shm - Shared memory for faster data transfer
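
If you need different values, Docker Compose automatically merges a compose.override.yaml placed next to compose.yaml. The snippet below is a hypothetical override for tuning workers without editing compose.yaml, not part of Chunkr's shipped configuration:

# compose.override.yaml (hypothetical)
services:
  segmentation-backend:
    deploy:
      replicas: 4
    environment:
      - MAX_BATCH_SIZE=8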

OCR Backend

ocr-backend:
  build:
    context: .
    dockerfile: docker/doctr/Dockerfile
  deploy:
    replicas: 3
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  volumes:
    - /dev/shm:/dev/shm
Configuration details:
  • replicas: 3 - Three OCR workers for parallel processing
  • Full GPU access for text recognition
The /dev/shm volume mount provides shared memory for fast inter-process data transfer and is critical for performance.
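
As an alternative to bind-mounting the host's /dev/shm, Compose can allocate a dedicated shared-memory size for the container; the value below is an illustrative starting point, not a Chunkr default:

segmentation-backend:
  shm_size: '8gb'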

Performance Tuning

Batch Size Configuration

Adjust batch sizes based on your GPU memory:
environment:
  - MAX_BATCH_SIZE=4        # Increase for GPUs with >8GB memory
  - BATCH_WAIT_TIME=0.2     # Time to wait for batch to fill
Recommended batch sizes:
  • 6GB GPU: MAX_BATCH_SIZE=2
  • 8GB GPU: MAX_BATCH_SIZE=4 (default)
  • 12GB+ GPU: MAX_BATCH_SIZE=8

Replica Count Optimization

Adjust worker replicas based on GPU count and memory.

Single GPU (8GB+):
segmentation-backend:
  deploy:
    replicas: 4

ocr-backend:
  deploy:
    replicas: 2
Multiple GPUs:
segmentation-backend:
  deploy:
    replicas: 6  # 3 per GPU for 2 GPUs

ocr-backend:
  deploy:
    replicas: 4  # 2 per GPU for 2 GPUs
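
A simple way to size these values is to count the GPUs visible to the driver and multiply by the per-GPU worker counts shown above:

# Number of GPUs visible to the driver
nvidia-smi -L | wc -l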

Model Parameters

Fine-tune model inference parameters:
environment:
  - OVERLAP_THRESHOLD=0.025  # Lower = stricter duplicate detection
  - SCORE_THRESHOLD=0.2      # Lower = more detections, higher recall

Multi-GPU Configuration

To specify exact GPU allocation:
segmentation-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0', '1']  # Use first two GPUs
            capabilities: [gpu]
Or limit to a specific count:
devices:
  - driver: nvidia
    count: 2  # Use exactly 2 GPUs
    capabilities: [gpu]
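
One possible layout for a two-GPU host is to pin each service to its own device so the workloads do not compete for memory. This is an illustrative sketch, not Chunkr's default configuration:

segmentation-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]

ocr-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['1']
            capabilities: [gpu]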

Monitoring GPU Usage

Real-time Monitoring

Monitor GPU utilization in real-time:
watch -n 1 nvidia-smi

Per-Container GPU Stats

# List running containers and their status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Monitor specific container
docker stats <container_name>
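
Note that docker stats reports CPU and RAM but not GPU memory. To see which processes currently hold GPU memory, you can query the compute applications directly:

# Processes currently using the GPU, with PID and memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv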

GPU Memory Usage

nvidia-smi --query-gpu=memory.used,memory.total --format=csv

Switching to CPU-Only Mode

If GPUs are unavailable or for testing, use CPU mode:
docker compose -f compose.yaml -f compose.cpu.yaml up -d
The CPU configuration removes GPU requirements and adjusts settings:
segmentation-backend:
  deploy:
    replicas: 6
    resources: {}  # No GPU reservation
  environment:
    - MAX_BATCH_SIZE=64
    - OMP_NUM_THREADS=12
    - MKL_NUM_THREADS=12
    - NUMEXPR_NUM_THREADS=12
CPU mode is significantly slower. Expect 5-10x longer processing times compared to GPU acceleration.
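
The thread counts above assume a host with roughly 12 cores; if your machine differs, match them to the number of available cores, which you can check with:

# Number of CPU cores available to the current process
nproc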

Troubleshooting

GPU not detected

Check NVIDIA driver:
nvidia-smi
If this fails, reinstall NVIDIA drivers. Verify Docker GPU access:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Check container toolkit:
nvidia-ctk --version

Out of memory errors

  1. Reduce batch size:
    environment:
      - MAX_BATCH_SIZE=2
    
  2. Decrease replica count:
    deploy:
      replicas: 2  # Fewer workers per GPU
    
  3. Monitor GPU memory:
    nvidia-smi dmon -s mu
    

Performance issues

  1. Check GPU utilization - Should be >70% during processing
  2. Verify shared memory - Ensure /dev/shm is mounted
  3. Review batch settings - Optimize MAX_BATCH_SIZE and BATCH_WAIT_TIME
  4. Check for GPU throttling - Monitor temperature with nvidia-smi
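
For items 1 and 4, nvidia-smi can sample utilization, temperature, and power draw in a single query:

# Sustained power.draw near power.limit or temperatures above ~80°C suggest throttling
nvidia-smi --query-gpu=utilization.gpu,temperature.gpu,power.draw,power.limit --format=csv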

Docker Compose GPU errors

Error: “could not select device driver”
# Reconfigure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Error: “failed to initialize NVML”
# Restart NVIDIA persistence daemon
sudo systemctl restart nvidia-persistenced

Best Practices

  1. Monitor GPU temperature - Keep below 80°C for optimal performance
  2. Use appropriate batch sizes - Balance throughput vs. memory usage
  3. Scale replicas carefully - More replicas isn’t always faster
  4. Regular driver updates - Keep NVIDIA drivers current
  5. Shared memory mounting - Always include /dev/shm volume
