Welcome to QualiVision

QualiVision is a state-of-the-art framework for video quality assessment specifically designed for AI-generated content. Built for the VQualA 2025 Challenge, it provides comprehensive quality evaluation across four critical dimensions.

Quick Start

Get started with QualiVision in minutes using pre-trained models

Installation

Set up your environment and install dependencies

Models Overview

Learn about DOVER++ and V-JEPA2 architectures

Quality Assessment Dimensions

QualiVision evaluates AI-generated videos across four critical quality dimensions:
  • Temporal quality: measures the coherence and smoothness of motion across video frames, checking that objects and scenes maintain logical continuity throughout the sequence.
  • Traditional (technical) quality: evaluates sharpness, clarity, and absence of artifacts, covering both individual frames and overall video rendering.
  • Aesthetic quality: analyzes composition, color harmony, and overall visual appeal, going beyond technical quality to evaluate subjective beauty.
  • Text-video alignment: determines how well the video content corresponds to the input text prompt, which is critical for ensuring AI-generated videos match user intentions.

Two Powerful Models

QualiVision provides two complementary state-of-the-art architectures:

DOVER++

ConvNeXt 3D-based Architecture
  • Cross-modal attention between video and text
  • Quality-aware fusion mechanism
  • 640×640 resolution, 64 frames
  • ~120M parameters, ~12GB memory
  • Robust aesthetic/technical quality separation

V-JEPA2

Vision-JEPA2 ViT-Giant Architecture
  • Strategic layer freezing (85% frozen)
  • Discriminative learning rates
  • 384×384 resolution, 64 frames
  • ~1.1B parameters, ~16GB memory
  • Strong video representation learning

Key Features

Multi-Modal Fusion

Cross-modal attention combines video features with text prompt embeddings produced by the BGE-Large text encoder

Hybrid Loss Function

Combines a smooth L1 term, a ranking term, and scale-aware components for robust training
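As a rough illustration of how a regression term and a ranking term can be combined, here is a minimal pure-Python sketch. It is not QualiVision's actual loss: the function names, the pairwise hinge formulation, and the weights `w_reg` and `w_rank` are assumptions made for this example, and the scale-aware component is left out because its definition is not given here.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean smooth L1 loss: quadratic for errors below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def pairwise_ranking(pred, target, margin=0.0):
    """Mean hinge penalty over pairs whose predicted order contradicts the target order."""
    total, pairs = 0.0, 0
    for i in range(len(pred)):
        for j in range(i + 1, len(pred)):
            if target[i] == target[j]:
                continue  # tied targets carry no ordering information
            sign = 1.0 if target[i] > target[j] else -1.0
            total += max(0.0, margin - sign * (pred[i] - pred[j]))
            pairs += 1
    return total / pairs if pairs else 0.0


def hybrid_loss(pred, target, w_reg=1.0, w_rank=0.5):
    """Weighted sum of the two terms; the weights are illustrative, not tuned values."""
    return w_reg * smooth_l1(pred, target) + w_rank * pairwise_ranking(pred, target)


perfect = hybrid_loss([3.0, 4.0], [3.0, 4.0])   # predictions match the targets
swapped = hybrid_loss([4.0, 3.0], [3.0, 4.0])   # predictions in the wrong order
```

The ranking term only penalizes pairs that are ordered incorrectly, so it complements the regression term, which only looks at per-video error.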

Pre-trained Models

Ready-to-use checkpoints trained on VQualA 2025 Challenge dataset for immediate evaluation

Efficient Training

Strategic model freezing and memory optimization techniques enable training on consumer GPUs

Real-World Application

VQualA 2025 Challenge

QualiVision is our submission for the VQualA 2025 Challenge at ICCV 2025 Workshops. The framework is designed to handle the TaobaoVD-GC dataset containing thousands of AI-generated videos with comprehensive quality annotations.

Data Format

QualiVision works with structured video datasets:
video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
video002.mp4,"Sunset over mountains",4.5,4.2,4.8,4.1,4.4
Each video is evaluated across multiple Mean Opinion Score (MOS) dimensions on a 1-5 scale.
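A file in this format can be read with Python's standard `csv` module. The sketch below is one possible loader, not part of QualiVision: the `load_annotations` helper and the lower-case `prompt` key are hypothetical names chosen for this example, and the range check simply enforces the 1-5 scale described above.

```python
import csv
import io

# Sample rows copied from the format shown above.
SAMPLE = '''video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
video002.mp4,"Sunset over mountains",4.5,4.2,4.8,4.1,4.4
'''

MOS_COLUMNS = [
    "Traditional_MOS",
    "Alignment_MOS",
    "Aesthetic_MOS",
    "Temporal_MOS",
    "Overall_MOS",
]


def load_annotations(text):
    """Parse annotation CSV text into a list of dicts with float MOS values."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {"video_name": record["video_name"], "prompt": record["Prompt"]}
        for col in MOS_COLUMNS:
            score = float(record[col])
            if not 1.0 <= score <= 5.0:
                raise ValueError(
                    f"{col}={score} is outside the 1-5 scale for {record['video_name']}"
                )
            row[col] = score
        rows.append(row)
    return rows


annotations = load_annotations(SAMPLE)
```

In the two sample rows, `Overall_MOS` equals the arithmetic mean of the other four scores; whether that holds for the full dataset is not stated here.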

Technical Innovations

1. Quality-Aware Fusion

Dynamic attention weighting that adapts based on text content, allowing the model to focus on relevant quality aspects for different types of videos.

2. Strategic Layer Freezing

Freeze 85% of V-JEPA2 layers to maintain pre-trained knowledge while enabling efficient fine-tuning on domain-specific data.
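To make the 85% figure concrete, the sketch below splits a list of layer names into frozen and trainable groups. It assumes the earliest layers are the ones frozen and uses a hypothetical 40-block backbone with made-up names such as `block_0`; neither detail is confirmed by this page. In PyTorch, freezing a layer usually means setting `requires_grad = False` on its parameters.

```python
def split_frozen_trainable(layer_names, frozen_percent=85):
    """Return (frozen, trainable), freezing the earliest layers.

    Integer arithmetic keeps the split deterministic for any layer count.
    """
    if not 0 <= frozen_percent <= 100:
        raise ValueError("frozen_percent must be between 0 and 100")
    n_frozen = len(layer_names) * frozen_percent // 100
    return layer_names[:n_frozen], layer_names[n_frozen:]


# Hypothetical backbone depth, used only for illustration.
layers = [f"block_{i}" for i in range(40)]
frozen, trainable = split_frozen_trainable(layers)
```

With these assumed numbers, 34 blocks stay fixed and the last 6 are fine-tuned, which is why the trainable parameter count, and with it the gradient and optimizer-state memory, drops sharply.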

3. Adaptive Loss Weighting

Dynamically adjusts loss component weights during training to balance different quality objectives.

4. Memory Optimization

Gradient checkpointing and efficient frame sampling enable training large models on limited hardware.
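One common way to keep frame counts fixed is uniform temporal sampling, sketched below. This is an assumption about what frame sampling could look like, not QualiVision's documented sampler; `sample_frame_indices` is a hypothetical helper, and the default of 64 matches the frame count listed for both models.

```python
def sample_frame_indices(total_frames, num_samples=64):
    """Pick num_samples evenly spaced frame indices from a clip.

    Clips shorter than num_samples repeat some indices rather than failing.
    """
    if total_frames <= 0 or num_samples <= 0:
        raise ValueError("total_frames and num_samples must be positive")
    if num_samples == 1:
        return [(total_frames - 1) // 2]  # single sample: take the middle frame
    last = total_frames - 1
    # Integer division keeps indices in range and non-decreasing.
    return [(i * last) // (num_samples - 1) for i in range(num_samples)]


indices = sample_frame_indices(300)      # a 300-frame clip
short_clip = sample_frame_indices(10)    # fewer frames than samples
```

Decoding only the selected indices, instead of every frame, keeps the input tensor the same size regardless of clip length.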

Performance Metrics

Our models are evaluated using industry-standard video quality metrics:
  • SROCC: Spearman Rank Order Correlation Coefficient
  • PLCC: Pearson Linear Correlation Coefficient
  • VQualA Score: Custom challenge metric combining multiple quality dimensions
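The two correlation metrics can be computed with the standard library alone, as in the sketch below. The function names and the example scores are made up for illustration; the VQualA Score is not computed because its formula is not given here.

```python
import math


def pearson(x, y):
    """PLCC: Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """SROCC: Pearson correlation computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up predictions and ground-truth MOS values, for illustration only.
predicted = [3.1, 4.4, 2.0, 3.9]
mos = [3.2, 4.5, 2.5, 3.5]
srocc = spearman(predicted, mos)
plcc = pearson(predicted, mos)
```

SROCC depends only on the ordering of the predictions, while PLCC also reflects how linearly the predicted values track the ground-truth scores, so the two are usually reported together.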
Model     Parameters  Memory  Resolution
DOVER++   ~120M       ~12GB   640×640
V-JEPA2   ~1.1B       ~16GB   384×384

Next Steps

Try the Quickstart

Run your first evaluation in under 5 minutes

Install QualiVision

Set up your development environment
QualiVision requires Python 3.8+ and is optimized for CUDA-enabled GPUs. CPU evaluation is supported but significantly slower.
