
System Requirements

Before installing QualiVision, ensure your system meets these requirements:

Python Version

Python 3.8 or higher is required. Verify with:
python --version

GPU Support

CUDA-capable GPU recommended
  • DOVER++: ~12GB VRAM
  • V-JEPA2: ~16GB VRAM
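As a quick sanity check, you can compare your GPU's total memory against these figures. The sketch below hard-codes the VRAM numbers above (the `models_that_fit` helper is ours, not part of QualiVision), and the PyTorch usage is guarded so it also runs on machines without torch or a GPU:

```python
# Approximate VRAM needed per model (GB), taken from the figures above.
MODEL_VRAM_GB = {"DOVER++": 12, "V-JEPA2": 16}

def models_that_fit(total_vram_gb: float) -> list:
    """Return the models whose approximate VRAM requirement fits in total_vram_gb."""
    return [name for name, need in MODEL_VRAM_GB.items() if total_vram_gb >= need]

if __name__ == "__main__":
    try:
        import torch
        if torch.cuda.is_available():
            gb = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"GPU VRAM: {gb:.1f} GB -> fits: {models_that_fit(gb)}")
        else:
            print("No CUDA GPU detected; CPU-only mode will be slow.")
    except ImportError:
        print("PyTorch is not installed yet; run the install steps below first.")
```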

Storage

~10GB for models and dependencies. Allow additional space for datasets.

Operating System

Linux, macOS, or Windows. Linux is recommended for best performance.
While CPU-only execution is supported, GPU acceleration is strongly recommended for practical evaluation and training. CPU inference can be 10-50x slower.
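In practice this means scripts should select the device at runtime rather than assume a GPU. A minimal sketch (the `pick_device` helper name is ours, not part of QualiVision's API):

```python
def pick_device() -> str:
    """Return "cuda" when a usable CUDA GPU is present, otherwise fall back to "cpu"."""
    try:
        import torch
        return "cuda" if torch.cuda.is_available() else "cpu"
    except ImportError:
        # PyTorch is not installed yet, so only CPU execution is meaningful.
        return "cpu"

print(f"Running on: {pick_device()}")
```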

Installation Methods

1. Clone the Repository

First, clone the QualiVision repository from GitHub:
git clone https://github.com/RITIK-12/QualiVision.git
cd QualiVision
This will download the complete framework including:
  • Model implementations
  • Training and evaluation scripts
  • Configuration files
  • Example notebooks
2. Install Dependencies

Install all dependencies using pip:
pip install -r requirements.txt
This is the recommended method for most users.
3. Verify Installation

Verify that QualiVision is installed correctly:
python -c "import torch; print(f'PyTorch: {torch.__version__}')"
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}')"
python -c "import transformers; print(f'Transformers: {transformers.__version__}')"
Expected output (exact versions will vary with your environment):
PyTorch: 2.0.0+cu118
CUDA Available: True
Transformers: 4.30.0

Dependencies

QualiVision requires the following core dependencies:
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
PyTorch and related libraries for model training and inference.
transformers>=4.30.0
sentence-transformers>=2.2.0
For text encoding using BGE-Large and other language models.
timm>=0.9.0
opencv-python>=4.7.0
decord>=0.6.0
Pillow>=9.0.0
Image and video processing libraries.
scipy>=1.10.0
scikit-learn>=1.2.0
numpy>=1.24.0
pandas>=1.5.0
Numerical computing and data processing.
accelerate>=0.20.0
flash-attn>=2.0.0
xformers>=0.0.20
einops>=0.6.0
Memory optimization and training acceleration.
tqdm>=4.65.0
wandb>=0.15.0
matplotlib>=3.6.0
seaborn>=0.11.0
pyyaml>=6.0
Progress bars, experiment tracking, and visualization.
jupyterlab>=4.0.0
ipywidgets>=8.0.0
datasets>=2.12.0
For interactive development and data loading.
From requirements.txt in the source repository:
torch>=2.0.0
torchvision>=0.15.0
torchaudio>=2.0.0
transformers>=4.30.0
sentence-transformers>=2.2.0
timm>=0.9.0
scipy>=1.10.0
scikit-learn>=1.2.0
opencv-python>=4.7.0
decord>=0.6.0
pandas>=1.5.0
numpy>=1.24.0
Pillow>=9.0.0
tqdm>=4.65.0
wandb>=0.15.0
accelerate>=0.20.0
datasets>=2.12.0
einops>=0.6.0
flash-attn>=2.0.0
xformers>=0.0.20
pyyaml>=6.0
argparse
matplotlib>=3.6.0
seaborn>=0.11.0
jupyterlab>=4.0.0
ipywidgets>=8.0.0
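Version floors like these can be checked without importing each package, via the standard library's `importlib.metadata`. The comparison below is deliberately naive (it strips local tags like `+cu118` and compares the first three numeric parts), which is enough for the pins above; the helper names are ours:

```python
from importlib.metadata import version, PackageNotFoundError

def meets_minimum(installed: str, minimum: str) -> bool:
    """Naive compare: drop local tags (e.g. +cu118), compare first 3 numeric parts."""
    def norm(v):
        parts = v.split("+")[0].split(".")[:3]
        return tuple(int("".join(ch for ch in p if ch.isdigit()) or 0) for p in parts)
    return norm(installed) >= norm(minimum)

def check(package: str, minimum: str) -> bool:
    """Print and return whether `package` is installed at >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"MISSING  {package} (need >={minimum})")
        return False
    ok = meets_minimum(installed, minimum)
    print(f"{'OK' if ok else 'TOO OLD'}  {package} {installed} (need >={minimum})")
    return ok

if __name__ == "__main__":
    pins = {"torch": "2.0.0", "transformers": "4.30.0", "numpy": "1.24.0"}
    results = [check(pkg, floor) for pkg, floor in pins.items()]
    print("All pins satisfied." if all(results) else "Some pins are unmet.")
```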

GPU Setup

For NVIDIA GPUs, ensure CUDA is properly installed:
1. Check CUDA Version

nvidia-smi
This shows your GPU and CUDA driver version.
2. Install PyTorch with CUDA

Visit the PyTorch Get Started page and select your CUDA version:
# For CUDA 11.8
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# For CUDA 12.1
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
3. Verify CUDA

import torch

print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    # These calls raise on CPU-only machines, so guard them.
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"Device Count: {torch.cuda.device_count()}")
    print(f"Current Device: {torch.cuda.current_device()}")
    print(f"Device Name: {torch.cuda.get_device_name(0)}")

Troubleshooting

If you encounter OOM errors:
  1. Reduce batch size:
    python scripts/evaluate.py --model dover --batch-size 1 --data data/test
    
  2. Use gradient checkpointing (for training)
  3. Close other GPU applications
  4. Use CPU if necessary:
    python scripts/evaluate.py --model dover --device cpu --data data/test
    
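The usual pattern behind step 1 is to probe batch sizes from large to small until one fits. A generic sketch of that loop (the `run_step` callable stands in for your actual forward pass; PyTorch surfaces GPU OOM as a `RuntimeError` whose message contains "out of memory"):

```python
def find_max_batch_size(run_step, candidates=(8, 4, 2, 1)):
    """Try batch sizes from largest to smallest; return the first that avoids OOM."""
    for bs in candidates:
        try:
            run_step(bs)
            return bs
        except RuntimeError as err:
            if "out of memory" not in str(err).lower():
                raise  # a real error, not OOM -- do not mask it
            # On a real GPU you would also call torch.cuda.empty_cache() here.
    return None  # nothing fit; consider falling back to CPU
```

For example, with a step that only fits at batch size 2 or below, `find_max_batch_size` returns 2.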
If flash-attn fails to install:
  1. Skip flash attention (optional dependency):
    pip install -r requirements.txt --no-deps
    pip install <packages except flash-attn>
    
  2. Or install from source:
    pip install flash-attn --no-build-isolation
    
  3. Flash attention is optional for inference; it mainly benefits training speed
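If you do skip flash-attn, code can probe for it at runtime and fall back to standard attention. A minimal sketch (the helper name is ours; "flash_attention_2" and "eager" are values recent transformers versions accept for `attn_implementation`):

```python
def flash_attn_available() -> bool:
    """True when the optional flash-attn package is importable."""
    try:
        import flash_attn  # noqa: F401
        return True
    except ImportError:
        return False

# Pick an attention backend based on what is installed.
attn_impl = "flash_attention_2" if flash_attn_available() else "eager"
print(f"Using attention implementation: {attn_impl}")
```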
If video loading fails:
  1. Ensure FFmpeg is installed:
    # Ubuntu/Debian
    sudo apt-get install ffmpeg
    
    # macOS
    brew install ffmpeg
    
    # Windows
    # Download from https://ffmpeg.org/download.html
    
  2. Test decord:
    import decord
    vr = decord.VideoReader('path/to/video.mp4')
    print(f"Frames: {len(vr)}")
    
If model downloads are slow or fail:
  1. Use Hugging Face mirror (China):
    export HF_ENDPOINT=https://hf-mirror.com
    
  2. Pre-download models:
    from transformers import AutoModel
    AutoModel.from_pretrained("BAAI/bge-large-en-v1.5")
    
  3. Set cache directory:
    export HF_HOME=/path/to/cache
    
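Note that these environment variables are generally read at import time, so they must be set before `transformers` (or `huggingface_hub`) is imported. A small helper capturing that ordering (the function name is ours; the cache path is a placeholder as above):

```python
import os

def configure_hf(cache_dir, endpoint=None):
    """Set the Hugging Face cache (HF_HOME) and optional mirror (HF_ENDPOINT).

    Call this BEFORE importing transformers/huggingface_hub, since they read
    these variables when first imported.
    """
    os.environ["HF_HOME"] = cache_dir
    if endpoint:
        os.environ["HF_ENDPOINT"] = endpoint

configure_hf("/path/to/cache", endpoint="https://hf-mirror.com")
print(os.environ["HF_HOME"])
```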

Verify Installation

Run this comprehensive verification script:
import sys
import torch
import transformers
import torchvision
import decord
import cv2
import numpy as np
import pandas as pd

print("QualiVision Installation Check")
print("=" * 50)

# Python version
print(f"Python: {sys.version}")

# PyTorch
print(f"\nPyTorch: {torch.__version__}")
print(f"CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA Version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

# Transformers
print(f"\nTransformers: {transformers.__version__}")

# Other key libraries
print(f"\nTorchvision: {torchvision.__version__}")
print(f"OpenCV: {cv2.__version__}")
print(f"NumPy: {np.__version__}")
print(f"Pandas: {pd.__version__}")
print(f"Decord: {decord.__version__}")

print("\n✓ All dependencies installed successfully!")
Save as check_install.py and run:
python check_install.py

Next Steps

  • Quick Start: Run your first evaluation with pre-trained models
  • Data Preparation: Learn how to structure your dataset
  • Model Configuration: Customize model settings for your use case
  • Training Guide: Fine-tune models on custom datasets
Getting Help: If you encounter issues not covered here, please open an issue on the QualiVision GitHub repository.
