

Chunkr uses GPU acceleration to significantly improve document processing performance. This guide covers GPU configuration for NVIDIA GPUs.

Prerequisites

Hardware Requirements

  • NVIDIA GPU with CUDA support (compute capability 6.0+)
  • At least 6GB GPU memory (12GB+ recommended for production)
  • Multiple GPUs supported for increased throughput
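To confirm that a GPU meets these requirements, you can query its name, memory, and compute capability with nvidia-smi (the compute_cap field is only available on fairly recent drivers):

# Query GPU model, total memory, and compute capability
nvidia-smi --query-gpu=name,memory.total,compute_cap --format=csv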

Software Requirements

  • NVIDIA GPU drivers (version 470.x or later)
  • NVIDIA Container Toolkit
  • Docker Engine 19.03 or later
  • Docker Compose V2
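A quick way to check your installed versions against these requirements:

# Driver version
nvidia-smi --query-gpu=driver_version --format=csv,noheader

# Docker Engine and Compose versions
docker --version
docker compose version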

Installing NVIDIA Container Toolkit

1. Add NVIDIA package repository

distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
  sudo tee /etc/apt/sources.list.d/nvidia-docker.list
2. Install nvidia-container-toolkit

sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
3. Configure Docker daemon

sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
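
You can confirm that the nvidia runtime was registered with the Docker daemon (the exact output format varies between Docker versions):

# The list of runtimes should now include "nvidia"
docker info | grep -i runtimes
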
4. Verify installation

Test GPU access from Docker:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
You should see your GPU(s) listed in the output.

GPU Configuration in Docker Compose

Chunkr’s default compose.yaml configures GPU access for ML services:

Segmentation Backend

segmentation-backend:
  build:
    context: .
    dockerfile: docker/segmentation/Dockerfile
  deploy:
    replicas: 6
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  volumes:
    - /dev/shm:/dev/shm
  environment:
    - MAX_BATCH_SIZE=4
    - BATCH_WAIT_TIME=0.2
    - OVERLAP_THRESHOLD=0.025
    - SCORE_THRESHOLD=0.2
Configuration details:
  • replicas: 6 - Six worker processes share available GPUs
  • count: all - All GPUs are available to workers
  • capabilities: [gpu] - Enables GPU support
  • /dev/shm - Shared memory for faster data transfer
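
If you need different values, Docker Compose automatically merges a compose.override.yaml placed next to compose.yaml. The snippet below is a hypothetical override for tuning workers without editing compose.yaml, not part of Chunkr's shipped configuration:

# compose.override.yaml (hypothetical)
services:
  segmentation-backend:
    deploy:
      replicas: 4
    environment:
      - MAX_BATCH_SIZE=8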

OCR Backend

ocr-backend:
  build:
    context: .
    dockerfile: docker/doctr/Dockerfile
  deploy:
    replicas: 3
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]
  volumes:
    - /dev/shm:/dev/shm
Configuration details:
  • replicas: 3 - Three OCR workers for parallel processing
  • Full GPU access for text recognition
The /dev/shm volume mount provides shared memory for fast inter-process data transfer and is critical for performance.
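
As an alternative to bind-mounting the host's /dev/shm, Compose can allocate a dedicated shared-memory size for the container; the value below is an illustrative starting point, not a Chunkr default:

segmentation-backend:
  shm_size: '8gb'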

Performance Tuning

Batch Size Configuration

Adjust batch sizes based on your GPU memory:
environment:
  - MAX_BATCH_SIZE=4        # Increase for GPUs with >8GB memory
  - BATCH_WAIT_TIME=0.2     # Time to wait for batch to fill
Recommended batch sizes:
  • 6GB GPU: MAX_BATCH_SIZE=2
  • 8GB GPU: MAX_BATCH_SIZE=4 (default)
  • 12GB+ GPU: MAX_BATCH_SIZE=8

Replica Count Optimization

Adjust worker replicas based on GPU count and memory.

Single GPU (8GB+):
segmentation-backend:
  deploy:
    replicas: 4

ocr-backend:
  deploy:
    replicas: 2
Multiple GPUs:
segmentation-backend:
  deploy:
    replicas: 6  # 3 per GPU for 2 GPUs

ocr-backend:
  deploy:
    replicas: 4  # 2 per GPU for 2 GPUs
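
A simple way to size these values is to count the GPUs visible to the driver and multiply by the per-GPU worker counts shown above:

# Number of GPUs visible to the driver
nvidia-smi -L | wc -l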

Model Parameters

Fine-tune model inference parameters:
environment:
  - OVERLAP_THRESHOLD=0.025  # Lower = stricter duplicate detection
  - SCORE_THRESHOLD=0.2      # Lower = more detections, higher recall

Multi-GPU Configuration

To specify exact GPU allocation:
segmentation-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0', '1']  # Use first two GPUs
            capabilities: [gpu]
Or limit to a specific count:
devices:
  - driver: nvidia
    count: 2  # Use exactly 2 GPUs
    capabilities: [gpu]
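
One possible layout for a two-GPU host is to pin each service to its own device so the workloads do not compete for memory. This is an illustrative sketch, not Chunkr's default configuration:

segmentation-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['0']
            capabilities: [gpu]

ocr-backend:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            device_ids: ['1']
            capabilities: [gpu]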

Monitoring GPU Usage

Real-time Monitoring

Monitor GPU utilization in real-time:
watch -n 1 nvidia-smi

Per-Container GPU Stats

# List running containers and their status
docker ps --format "table {{.Names}}\t{{.Status}}"

# Monitor specific container
docker stats <container_name>
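
Note that docker stats reports CPU and RAM but not GPU memory. To see which processes currently hold GPU memory, you can query the compute applications directly:

# Processes currently using the GPU, with PID and memory
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv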

GPU Memory Usage

nvidia-smi --query-gpu=memory.used,memory.total --format=csv

Switching to CPU-Only Mode

If GPUs are unavailable or for testing, use CPU mode:
docker compose -f compose.yaml -f compose.cpu.yaml up -d
The CPU configuration removes GPU requirements and adjusts settings:
segmentation-backend:
  deploy:
    replicas: 6
    resources: {}  # No GPU reservation
  environment:
    - MAX_BATCH_SIZE=64
    - OMP_NUM_THREADS=12
    - MKL_NUM_THREADS=12
    - NUMEXPR_NUM_THREADS=12
CPU mode is significantly slower. Expect 5-10x longer processing times compared to GPU acceleration.
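
The thread counts above assume a host with roughly 12 cores; if your machine differs, match them to the number of available cores, which you can check with:

# Number of CPU cores available to the current process
nproc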

Troubleshooting

GPU not detected

Check NVIDIA driver:
nvidia-smi
If this fails, reinstall NVIDIA drivers. Verify Docker GPU access:
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
Check container toolkit:
nvidia-ctk --version

Out of memory errors

  1. Reduce batch size:
    environment:
      - MAX_BATCH_SIZE=2
    
  2. Decrease replica count:
    deploy:
      replicas: 2  # Fewer workers per GPU
    
  3. Monitor GPU memory:
    nvidia-smi dmon -s mu
    

Performance issues

  1. Check GPU utilization - Should be >70% during processing
  2. Verify shared memory - Ensure /dev/shm is mounted
  3. Review batch settings - Optimize MAX_BATCH_SIZE and BATCH_WAIT_TIME
  4. Check for GPU throttling - Monitor temperature with nvidia-smi
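
For items 1 and 4, nvidia-smi can sample utilization, temperature, and power draw in a single query:

# Sustained power.draw near power.limit or temperatures above ~80°C suggest throttling
nvidia-smi --query-gpu=utilization.gpu,temperature.gpu,power.draw,power.limit --format=csv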

Docker Compose GPU errors

Error: “could not select device driver”
# Reconfigure Docker runtime
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
Error: “failed to initialize NVML”
# Restart NVIDIA persistence daemon
sudo systemctl restart nvidia-persistenced

Best Practices

  1. Monitor GPU temperature - Keep below 80°C for optimal performance
  2. Use appropriate batch sizes - Balance throughput vs. memory usage
  3. Scale replicas carefully - More replicas isn’t always faster
  4. Regular driver updates - Keep NVIDIA drivers current
  5. Shared memory mounting - Always include /dev/shm volume
