
Get Started in 5 Minutes

This guide walks you through running your first video quality assessment with QualiVision's pre-trained models.
Prerequisites: Python 3.8+ is required; CUDA is optional but recommended for GPU acceleration.

Setup

1. Clone the Repository

git clone https://github.com/RITIK-12/QualiVision.git
cd QualiVision
2. Install Dependencies

Install all required packages using pip:
pip install -r requirements.txt
This will install PyTorch, transformers, and other necessary libraries. See the installation guide for detailed setup options.
3. Prepare Your Data

Organize your test videos in the following structure:
data/
└── test/
    ├── test_labels.csv
    └── videos/
        ├── video001.mp4
        ├── video002.mp4
        └── ...
Your test_labels.csv should include:
video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
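A quick way to catch schema mistakes before launching a long evaluation run is to check the labels file for the required columns. The sketch below uses only the standard library; the helper name `validate_labels` is our own, not part of QualiVision:

```python
import csv
import io

REQUIRED_COLUMNS = [
    "video_name", "Prompt", "Traditional_MOS", "Alignment_MOS",
    "Aesthetic_MOS", "Temporal_MOS", "Overall_MOS",
]

def validate_labels(csv_file):
    """Return the rows of a labels CSV, raising if required columns are missing."""
    reader = csv.DictReader(csv_file)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"labels CSV is missing columns: {missing}")
    return list(reader)

# Check the sample row from above (an in-memory file stands in for data/test/test_labels.csv)
sample = io.StringIO(
    "video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS\n"
    'video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65\n'
)
rows = validate_labels(sample)
```

Running this on a file with a missing or misspelled column raises immediately, which is much cheaper than discovering the problem mid-evaluation.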

Run Your First Evaluation

Evaluate videos using the DOVER++ model with ConvNeXt 3D architecture:
python scripts/evaluate.py \
  --model dover \
  --checkpoint models/dover_best.pt \
  --data path/to/test/data
Model weights will be automatically downloaded on first use if not present locally.

Understanding the Output

After evaluation completes, QualiVision generates comprehensive results:
1. Predictions CSV

Contains quality scores for each video across all dimensions:
video_name,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,3.15,4.08,3.82,3.47,3.63
video002.mp4,4.52,4.19,4.76,4.13,4.40
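If you also have ground truth, it can be useful to join the predictions CSV against the labels CSV and inspect per-video errors. A minimal sketch using only the standard library (the `overall_errors` helper is illustrative, not a QualiVision API):

```python
import csv
import io

def overall_errors(pred_file, label_file):
    """Join predictions to labels on video_name; return per-video |error| on Overall_MOS."""
    preds = {r["video_name"]: float(r["Overall_MOS"]) for r in csv.DictReader(pred_file)}
    labels = {r["video_name"]: float(r["Overall_MOS"]) for r in csv.DictReader(label_file)}
    return {name: abs(preds[name] - labels[name]) for name in preds if name in labels}

# In-memory files stand in for the generated predictions CSV and test_labels.csv
pred_csv = io.StringIO(
    "video_name,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS\n"
    "video001.mp4,3.15,4.08,3.82,3.47,3.63\n"
)
label_csv = io.StringIO(
    "video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS\n"
    'video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65\n'
)
errors = overall_errors(pred_csv, label_csv)
```

Sorting the resulting dict by error is a quick way to surface the videos the model predicts worst.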
2. Metrics Report

If ground truth labels are provided, you’ll see performance metrics:
Evaluation Results:
------------------
  SROCC: 0.8542
  PLCC: 0.8731
  VQualA Score: 0.8637
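The reported VQualA Score is consistent with the mean of SROCC and PLCC ((0.8542 + 0.8731) / 2 ≈ 0.8637). Here is a dependency-free sketch of the two correlations; it skips tie correction in the rank step, so treat it as illustrative rather than a drop-in replacement for a library implementation:

```python
import math

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def srocc(x, y):
    """Spearman rank correlation: Pearson on ranks (no tie handling in this sketch)."""
    rank = lambda v: [sorted(v).index(e) for e in v]
    return plcc(rank(x), rank(y))

# Toy predictions vs. ground truth (same rank order, slightly different values)
preds  = [3.15, 4.08, 3.82, 3.47, 4.40]
labels = [3.20, 4.10, 3.80, 3.50, 4.40]
score = 0.5 * (srocc(preds, labels) + plcc(preds, labels))
```

SROCC rewards getting the ranking right, PLCC rewards a linear fit to the actual values; averaging the two penalizes models that do well on only one.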
3. Summary Report

A detailed text report with model configuration and prediction statistics:
QualiVision Model Evaluation Report
===================================
Model: DOVER
Checkpoint: models/dover_best.pt
Samples: 150

Prediction Statistics:
---------------------
  Min: 2.1234
  Max: 4.8765
  Mean: 3.6542
  Std: 0.7234

Example: Complete Evaluation Workflow

Here’s a complete example showing how to evaluate a test dataset:
# Evaluate DOVER++ model with custom output directory
python scripts/evaluate.py \
  --model dover \
  --checkpoint models/dover_best.pt \
  --data data/test \
  --output results/dover \
  --batch-size 4
Expected output:
QualiVision Model Evaluation
============================
Model: DOVER
Checkpoint: models/dover_best.pt
Test CSV: data/test/test_labels.csv
Test videos: data/test/videos
Output: results/dover
Device: cuda

Initializing DOVER Model Evaluator
Checkpoint: models/dover_best.pt
Device: cuda
Loading DOVER checkpoint: models/dover_best.pt
✓ DOVER checkpoint loaded
✓ Model loaded successfully
GPU Memory: 2.3 GB / 24.0 GB

Evaluating on test dataset:
  CSV: data/test/test_labels.csv
  Videos: data/test/videos
  Batch size: 4

Generating predictions...
Predicting: 100%|████████████████| 38/38 [02:15<00:00,  3.56s/it]
✓ Generated predictions for 150 samples

✓ Ground truth labels found, computing metrics

Evaluation Results:
------------------
  SROCC: 0.8542
  PLCC: 0.8731
  RMSE: 0.3421
  VQualA Score: 0.8637

✓ Predictions saved:
  CSV: results/dover/predictions_DOVER_20250304_143022.csv
  Excel: results/dover/predictions_DOVER_20250304_143022.xlsx
✓ Results saved: results/dover/results_DOVER_20250304_143022.json
✓ Summary report saved: results/dover/report_DOVER_20250304_143022.txt

✓ Evaluation completed successfully!
Final VQualA Score: 0.8637

Code Example: Using the Evaluator Class

You can also use QualiVision programmatically in your Python code:
import sys
from pathlib import Path

# Add the repository root to the path so the `src` package is importable
sys.path.insert(0, str(Path(__file__).parent))

import torch
from src.models.dover_model import DOVERModel

# Initialize model
model = DOVERModel(
    dover_weights_path="models/DOVER_plus_plus.pth",
    text_encoder_name="BAAI/bge-large-en-v1.5",
    device="cuda"
)

# Load the fine-tuned checkpoint
checkpoint = torch.load("models/dover_best.pt", map_location="cuda")
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Run inference (video_tensor and text_prompts are your preprocessed inputs)
with torch.no_grad():
    outputs = model(video_tensor, text_prompts)
    quality_scores = outputs.cpu().numpy()
Here’s the core evaluation logic from scripts/evaluate.py:
def _predict_on_loader(self, test_loader) -> Dict[str, Any]:
    """Run predictions on data loader."""
    predictions = []
    targets = []
    video_names = []
    
    print("\nGenerating predictions...")
    
    with torch.no_grad():
        for i, batch in enumerate(tqdm(test_loader, desc="Predicting")):
            try:
                # Forward pass
                if self.model_type == 'dover':
                    outputs = self.model(batch['pixel_values_videos'], batch['prompts'])
                else:  # vjepa
                    outputs = self.model(batch['pixel_values_videos'], batch['text_emb'])
                
                # Extract predictions
                batch_predictions = outputs.cpu().numpy()
                batch_targets = batch['labels'].cpu().numpy()
                
                predictions.append(batch_predictions)
                targets.append(batch_targets)
                video_names.extend(batch['video_names'])
                
                # Memory cleanup
                del outputs
                if i % 10 == 0:
                    ultra_memory_cleanup()
            
            except Exception as e:
                print(f"⚠ Error processing batch {i}: {e}")
    
    # Concatenate all predictions
    predictions = np.concatenate(predictions, axis=0)
    targets = np.concatenate(targets, axis=0)
    
    return {
        'predictions': predictions,
        'targets': targets,
        'video_names': video_names
    }
From: scripts/evaluate.py:188-239

Advanced Options

Adjust batch size based on your GPU memory:
# Smaller batch for limited memory
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
  --data data/test --batch-size 2

# Larger batch for more memory
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
  --data data/test --batch-size 8
If you don’t have a GPU, you can run on CPU (slower):
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
  --data data/test --device cpu
If your data uses different file names:
python scripts/evaluate.py --model dover --checkpoint models/dover_best.pt \
  --data data/test \
  --csv-name my_labels.csv \
  --video-dir my_videos

Next Steps

Train Custom Models: Fine-tune QualiVision on your own dataset
Model Architecture: Deep dive into the DOVER++ and V-JEPA2 architectures
API Reference: Explore the complete API documentation
Memory Optimization: Optimize memory usage for your hardware
Memory Requirements: DOVER++ requires ~12GB GPU memory, V-JEPA2 requires ~16GB. For limited memory, reduce batch size or use gradient accumulation.
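A back-of-the-envelope way to pick a batch size is to model memory as a fixed base footprint plus a per-sample activation cost. The numbers below are placeholders for illustration, not measured values for these models; profile your own setup before relying on them:

```python
def fits_in_memory(batch_size, free_gb, base_gb=12.0, per_sample_gb=0.5):
    """Rough, illustrative check: fixed model footprint plus a hypothetical
    per-sample activation cost (both defaults are placeholders, not measured)."""
    return base_gb + batch_size * per_sample_gb <= free_gb

# Under these assumptions, batch size 8 fits on a 24 GB card,
# while even batch size 2 exceeds a 12 GB card.
```

If the check fails, reduce `--batch-size` or fall back to `--device cpu`.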
