Deep-Live-Cam ships with sensible defaults that work on most machines, but getting the best throughput requires matching the execution provider, thread count, and memory limit to your actual hardware. This guide explains exactly how those defaults are derived, when to override them, and what platform-specific constraints to watch for.
Execution providers
The execution provider controls which hardware ONNX Runtime uses to run the face models. Deep-Live-Cam picks a provider automatically at startup using suggest_default_execution_provider in modules/core.py, which probes the list returned by onnxruntime.get_available_providers() and selects the first match in priority order.
Auto-detection order: cuda → rocm → coreml → dml → cpu
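The priority scan described above can be sketched as follows. This is a minimal illustration, not the project's actual suggest_default_execution_provider: the function name pick_provider and the PRIORITY table are hypothetical, and only the ONNX Runtime provider strings and the ordering come from the text.

```python
# ONNX Runtime provider names, listed in the priority order stated above.
# PRIORITY and pick_provider are hypothetical names used for illustration.
PRIORITY = [
    ("cuda", "CUDAExecutionProvider"),
    ("rocm", "ROCMExecutionProvider"),
    ("coreml", "CoreMLExecutionProvider"),
    ("dml", "DmlExecutionProvider"),
]


def pick_provider(available):
    """Return the short name of the first provider in PRIORITY that is available."""
    for short_name, ort_name in PRIORITY:
        if ort_name in available:
            return short_name
    # CPU is the last entry in the detection order, so it acts as the fallback.
    return "cpu"
```

With ONNX Runtime installed, the input would be the list returned by onnxruntime.get_available_providers(), for example pick_provider(onnxruntime.get_available_providers()).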
- NVIDIA (CUDA)
- Apple Silicon (CoreML)
- AMD (ROCm)
- DirectML (Windows AMD/Intel)
- CPU
Requires onnxruntime-gpu and CUDA Toolkit 12.8.0 with cuDNN v8.9.7.
On CUDA, the in-memory FFmpeg pipeline also promotes libx264 to h264_nvenc and libx265 to hevc_nvenc automatically, offloading encoding to the NVENC hardware unit.
When any --execution-provider value is given on the command line, OMP_NUM_THREADS is set to 6 before PyTorch is imported. This is done at module-load time in modules/core.py because OpenMP threads must be pinned before the first import torch.
Thread count
The --execution-threads flag controls the ThreadPoolExecutor worker count in multi_process_frame. The default is chosen by suggest_execution_threads based on the active provider:
| Provider | Default threads | Rationale |
|---|---|---|
| DML | 1 | DirectML serializes ONNX sessions; extra threads stall on the DML lock. |
| ROCm | 1 | Same serialization behavior as DML. |
| CUDA | 2 | Two threads keep the GPU fed while I/O occurs on the other thread. |
| CPU | max(4, min(cpu_count − 2, 16)) | Uses most available cores, reserving 2 for the OS and FFmpeg. |
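The table can be read as a small lookup plus one formula. The sketch below restates it in code under stated assumptions: default_threads is a hypothetical name (not the project's suggest_execution_threads), the short provider keys are chosen for illustration, and providers the table does not list are rejected rather than guessed.

```python
import os


def default_threads(provider, cpu_count=None):
    """Return the default worker count from the table for a short provider name."""
    if provider in ("dml", "rocm"):
        return 1  # sessions are serialized, so extra threads only wait
    if provider == "cuda":
        return 2  # one thread feeds the GPU while the other handles I/O
    if provider == "cpu":
        cores = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        # Reserve 2 cores, cap at 16, and never go below 4 workers.
        return max(4, min(cores - 2, 16))
    # The table gives no value for other providers, so none is assumed here.
    raise ValueError(f"no default listed for provider {provider!r}")
```

For the CPU row, an 8-core machine gets max(4, min(6, 16)) = 6 workers, a 2-core machine is raised to the floor of 4, and a 64-core machine is capped at 16.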
Memory limits
--max-memory sets a hard RAM ceiling enforced by limit_resources in modules/core.py after argument parsing completes.
Platform defaults:
- macOS: 4 GB — conservative default to avoid pressure on unified memory shared with the GPU.
- All other platforms: 16 GB.
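As a rough sketch of how a platform default and a RAM ceiling could fit together, the snippet below picks the default from the list above and applies it with the POSIX resource module. The function names are hypothetical, the use of RLIMIT_DATA and binary gigabytes are assumptions, and the source does not describe the mechanism limit_resources actually uses.

```python
import platform


def default_max_memory_gb(system=None):
    """Return the platform default: 4 GB on macOS, 16 GB elsewhere."""
    system = system or platform.system()
    return 4 if system == "Darwin" else 16


def gb_to_bytes(gb):
    """Convert gigabytes to bytes (assumes binary units, 1 GB = 1024**3 bytes)."""
    return gb * 1024 ** 3


def apply_limit(max_memory_gb):
    """Apply the ceiling to the current process (POSIX only, illustrative)."""
    import resource  # not available on Windows

    limit = gb_to_bytes(max_memory_gb)
    # Raises ValueError if the requested limit exceeds the existing hard limit.
    resource.setrlimit(resource.RLIMIT_DATA, (limit, limit))
```

Lowering a hard limit cannot be undone by an unprivileged process, so a sketch like this would be called once, after argument parsing, as the text describes.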
TensorFlow GPU memory growth
If TensorFlow is installed, limit_resources also enables memory growth for every visible GPU device.
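A minimal sketch of the memory-growth step, using TensorFlow's public configuration API, might look like this. The function name enable_tf_memory_growth is hypothetical, and the import guard is an assumption added so the snippet also runs where TensorFlow is absent.

```python
def enable_tf_memory_growth():
    """Enable memory growth on every visible GPU; return how many were configured."""
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed, so there is nothing to configure

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Must run before the GPU is initialized, otherwise TensorFlow
        # raises RuntimeError.
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

With memory growth enabled, TensorFlow allocates GPU memory as needed instead of reserving nearly all of it at startup, which leaves room for the ONNX Runtime and PyTorch allocations.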
CUDA cache management
After each frame processor finishes its pass, release_resources in modules/core.py clears the PyTorch CUDA allocator cache. This frees cached VRAM between processors (e.g., face_swapper and face_enhancer), keeping peak VRAM usage predictable across a long video.
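The cache-clearing step can be sketched with PyTorch's public API as shown below. The function name clear_cuda_cache is hypothetical, and the guards for a missing PyTorch install or a machine without CUDA are assumptions added for portability.

```python
def clear_cuda_cache():
    """Release cached CUDA allocator blocks; return True if a clear was performed."""
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed

    if not torch.cuda.is_available():
        return False  # no CUDA device, so there is no cache to clear

    # Returns unused cached blocks to the driver; live tensors are unaffected.
    torch.cuda.empty_cache()
    return True
```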
In-memory vs. disk-based pipeline
By default, Deep-Live-Cam avoids writing per-frame PNG files to disk. Instead it pipes raw BGR24 frames directly from an FFmpeg decoder into Python, processes them, and pushes them into a second FFmpeg encoder process. This eliminates the largest I/O bottleneck in earlier versions.
The disk-based fallback is used automatically when:
- --map-faces is active (multi-face mapping requires random frame access).
- The FFmpeg pipe pipeline fails (e.g., the hardware encoder is unavailable).
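To make the two-process pipe layout concrete, the sketch below builds the decoder and encoder command lines and computes the size of one raw frame. All function names are hypothetical, the exact FFmpeg arguments the project passes are not given in the source, and the NVENC mapping reflects the CUDA encoder promotion this guide describes.

```python
# Encoder promotion applied on CUDA, per this guide.
NVENC_PROMOTION = {"libx264": "h264_nvenc", "libx265": "hevc_nvenc"}


def frame_bytes(width, height):
    """Size of one raw BGR24 frame: 3 bytes per pixel."""
    return width * height * 3


def decoder_cmd(source):
    """FFmpeg command that writes raw BGR24 frames to stdout."""
    return ["ffmpeg", "-i", source, "-f", "rawvideo", "-pix_fmt", "bgr24", "-"]


def encoder_cmd(target, width, height, fps, encoder="libx264", cuda=False):
    """FFmpeg command that reads raw BGR24 frames from stdin and encodes them."""
    if cuda:
        encoder = NVENC_PROMOTION.get(encoder, encoder)
    return [
        "ffmpeg", "-y",
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-s", f"{width}x{height}", "-r", str(fps),
        "-i", "-",
        "-c:v", encoder,
        target,
    ]
```

In use, each command would be started with subprocess.Popen (stdout=PIPE for the decoder, stdin=PIPE for the encoder); the Python side reads frame_bytes(width, height) bytes at a time, processes the frame, and writes the result to the encoder's stdin.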
Tuning recommendations by hardware tier
High-end NVIDIA GPU (RTX 3080+, 10 GB+ VRAM)
Enable face_enhancer for maximum quality. The NVENC hardware encoder handles output encoding in parallel with ONNX inference.
Mid-range NVIDIA GPU (RTX 3060, 6–8 GB VRAM)
Use face_enhancer_gpen256 instead of GFPGAN to stay within VRAM budget while still improving output quality.
Apple Silicon (M1/M2/M3)