QualiVision - AI Video Quality Assessment

Welcome to QualiVision

QualiVision is a video quality assessment framework designed specifically for AI-generated content. Built for the VQualA 2025 Challenge, it evaluates each video along four quality dimensions.

Quick Start

Get started with QualiVision in minutes using pre-trained models

Installation

Set up your environment and install dependencies

Models Overview

Learn about DOVER++ and V-JEPA2 architectures

Quality Assessment Dimensions

QualiVision evaluates AI-generated videos across four critical quality dimensions:

Temporal Consistency

Measures the coherence and smoothness of motion across video frames, checking that objects and scenes remain logically continuous throughout the sequence.

Image Fidelity

Evaluates visual quality including sharpness, clarity, and absence of artifacts. Assesses the technical quality of individual frames and overall video rendering.

Aesthetic Appeal

Analyzes artistic attractiveness, including composition and color harmony. Goes beyond technical quality to evaluate subjective beauty.

Text-Video Alignment

Determines how well the video content corresponds to the input text prompt. Critical for ensuring AI-generated videos match user intentions.

Two Powerful Models

QualiVision provides two complementary state-of-the-art architectures:

DOVER++

ConvNeXt 3D-based Architecture

Cross-modal attention between video and text
Quality-aware fusion mechanism
640×640 resolution, 64 frames
~120M parameters, ~12GB memory
Robust aesthetic/technical quality separation

V-JEPA2

V-JEPA2 (Video Joint-Embedding Predictive Architecture) ViT-Giant Architecture

Strategic layer freezing (85% frozen)
Discriminative learning rates
384×384 resolution, 64 frames
~1.1B parameters, ~16GB memory
Strong video representation learning
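The V-JEPA2 recipe pairs layer freezing with discriminative learning rates. Below is a minimal sketch of how such a schedule could be planned. It assumes a 40-block backbone, freezes the lowest blocks first, and applies a per-block learning-rate decay. The function name and every learning-rate value are hypothetical and do not reflect QualiVision's actual configuration.

```python
from typing import Dict, List


def plan_vjepa2_finetuning(
    num_blocks: int = 40,         # assumption: depth of the ViT-Giant backbone
    freeze_ratio: float = 0.85,   # fraction of blocks kept frozen (85%, from the docs)
    head_lr: float = 1e-4,        # hypothetical learning rate for the regression head
    top_block_lr: float = 1e-5,   # hypothetical learning rate for the highest backbone block
    decay: float = 0.8,           # hypothetical per-block decay toward the input
) -> Dict[str, object]:
    """Decide which backbone blocks to freeze and assign per-block learning rates."""
    num_frozen = round(num_blocks * freeze_ratio)
    frozen: List[int] = list(range(num_frozen))  # the earliest blocks stay frozen
    trainable = range(num_frozen, num_blocks)

    # Discriminative learning rates: the top block trains at top_block_lr and each
    # lower trainable block is scaled by `decay` once more per step toward the input.
    block_lrs = {i: top_block_lr * decay ** (num_blocks - 1 - i) for i in trainable}
    return {"frozen": frozen, "block_lrs": block_lrs, "head_lr": head_lr}


plan = plan_vjepa2_finetuning()
print(len(plan["frozen"]), "frozen blocks; trainable:", sorted(plan["block_lrs"]))
```

In PyTorch, the frozen blocks' parameters would get `requires_grad = False`. Each trainable block would then become its own parameter group (a dict with `params` and `lr`) passed to an optimizer such as `torch.optim.AdamW`.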

Key Features

Multi-Modal Fusion

Cross-modal attention that combines video features with text prompt embeddings from the BGE-Large encoder

Hybrid Loss Function

A loss that combines smooth L1 regression, a ranking loss, and scale-aware components for robust training

Pre-trained Models

Ready-to-use checkpoints trained on the VQualA 2025 Challenge dataset for immediate evaluation

Efficient Training

Strategic model freezing and memory optimization techniques enable training on consumer GPUs
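To illustrate the hybrid loss, here is a minimal plain-Python sketch of two of its terms: smooth L1 regression on MOS values and a pairwise hinge ranking loss. The weights, margin, and function names are hypothetical. The scale-aware component is omitted because its form is not specified here.

```python
from itertools import combinations
from typing import Sequence


def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean smooth L1 loss: quadratic for errors below beta, linear above."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        total += 0.5 * diff * diff / beta if diff < beta else diff - 0.5 * beta
    return total / len(pred)


def pairwise_ranking(pred: Sequence[float], target: Sequence[float], margin: float = 0.1) -> float:
    """Mean hinge loss over pairs whose ground-truth scores differ.

    For a pair where target[i] > target[j], the loss is zero only if
    pred[i] exceeds pred[j] by at least `margin` (hypothetical value).
    """
    losses = []
    for i, j in combinations(range(len(pred)), 2):
        if target[i] == target[j]:
            continue  # tied labels carry no ordering information
        sign = 1.0 if target[i] > target[j] else -1.0
        losses.append(max(0.0, margin - sign * (pred[i] - pred[j])))
    return sum(losses) / len(losses) if losses else 0.0


def hybrid_loss(pred, target, w_reg: float = 1.0, w_rank: float = 0.5) -> float:
    """Weighted sum of the regression and ranking terms (weights are hypothetical)."""
    return w_reg * smooth_l1(pred, target) + w_rank * pairwise_ranking(pred, target)


targets = [3.65, 4.4]                   # Overall_MOS of the two sample videos
print(hybrid_loss([3.9, 4.1], targets))  # correct ordering: ranking term is zero
print(hybrid_loss([4.2, 4.0], targets))  # wrong ordering: ranking term adds a penalty
```

The ranking term depends only on the order of predictions, which is what SROCC rewards. The smooth L1 term keeps predictions on the MOS scale.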

Real-World Application

VQualA 2025 Challenge

QualiVision is our submission to the VQualA 2025 Challenge at the ICCV 2025 Workshops. The framework is designed for the TaobaoVD-GC dataset, which contains thousands of AI-generated videos annotated for quality across multiple dimensions.

Data Format

QualiVision works with structured video datasets:

video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
video002.mp4,"Sunset over mountains",4.5,4.2,4.8,4.1,4.4

Each video is annotated with a Mean Opinion Score (MOS) on a 1-5 scale for each quality dimension, along with an Overall_MOS.
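Here is a minimal sketch of loading annotations in this format with Python's standard `csv` module. It assumes the file uses exactly the header shown in the sample, and the function name is hypothetical. In both sample rows, Overall_MOS equals the mean of the four per-dimension scores. The loop prints both values for comparison; this is an observation about the sample, not a documented rule.

```python
import csv
import io
from statistics import mean

# Per-dimension columns, taken from the sample header.
DIMENSIONS = ["Traditional_MOS", "Alignment_MOS", "Aesthetic_MOS", "Temporal_MOS"]

SAMPLE = """video_name,Prompt,Traditional_MOS,Alignment_MOS,Aesthetic_MOS,Temporal_MOS,Overall_MOS
video001.mp4,"A cat playing piano",3.2,4.1,3.8,3.5,3.65
video002.mp4,"Sunset over mountains",4.5,4.2,4.8,4.1,4.4
"""


def load_annotations(handle):
    """Parse rows into dicts with float MOS values; quoted prompts may contain commas."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"video_name": record["video_name"], "prompt": record["Prompt"]}
        for column in DIMENSIONS + ["Overall_MOS"]:
            score = float(record[column])
            if not 1.0 <= score <= 5.0:
                raise ValueError(f"{column}={score} is outside the 1-5 MOS scale")
            row[column] = score
        rows.append(row)
    return rows


rows = load_annotations(io.StringIO(SAMPLE))
for row in rows:
    per_dim_mean = mean(row[c] for c in DIMENSIONS)
    print(row["video_name"], row["prompt"], round(per_dim_mean, 2), row["Overall_MOS"])
```

For a file on disk, pass `open(path, newline="")` instead of the `StringIO` buffer, as the `csv` module documentation recommends.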

Technical Innovations

Quality-Aware Fusion

Dynamic attention weighting that adapts to the text content, allowing the model to focus on the quality aspects most relevant to each type of video.

Strategic Layer Freezing

Freezes 85% of the V-JEPA2 layers to preserve pre-trained knowledge while enabling efficient fine-tuning on domain-specific data.

Adaptive Loss Weighting

Dynamically adjusts loss component weights during training to balance different quality objectives.

Memory Optimization

Advanced techniques including gradient checkpointing and efficient frame sampling enable training large models on limited hardware.
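To make the fusion idea concrete, here is a minimal pure-Python sketch of cross-modal attention followed by a gated blend. It assumes video and text tokens have already been projected to the same dimension. It omits learned query/key/value projections and multiple heads, and takes the gate logits as input. Under text-adaptive weighting, those logits would be predicted from the prompt embedding. All names are hypothetical, and this is not the DOVER++ implementation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(video_tokens: List[Vector], text_tokens: List[Vector]) -> List[Vector]:
    """Each video token (query) attends over the text tokens (used as keys and values)."""
    dim = len(text_tokens[0])
    scale = 1.0 / math.sqrt(dim)  # scaled dot-product attention
    fused = []
    for query in video_tokens:
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in text_tokens]
        weights = softmax(scores)
        # Weighted sum of text tokens: the text context seen by this video token.
        fused.append([sum(w * tok[d] for w, tok in zip(weights, text_tokens)) for d in range(dim)])
    return fused


def gated_fusion(video_tokens: List[Vector], text_context: List[Vector], gate_logits: Vector) -> List[Vector]:
    """Per-channel sigmoid gate g: output = g * video + (1 - g) * text context."""
    gates = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    return [
        [g * v + (1.0 - g) * t for g, v, t in zip(gates, vid, ctx)]
        for vid, ctx in zip(video_tokens, text_context)
    ]


video = [[1.0, 0.0], [0.0, 1.0]]  # two toy video tokens
text = [[1.0, 0.0], [0.5, 0.5]]   # two toy prompt tokens
context = cross_attend(video, text)
print(gated_fusion(video, context, gate_logits=[0.0, 2.0]))
```

A gate logit of 0 weights video and text context equally. Larger logits favor the video features in that channel.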

Performance Metrics

Our models are evaluated with standard video quality assessment metrics:

SROCC: Spearman Rank Order Correlation Coefficient
PLCC: Pearson Linear Correlation Coefficient
VQualA Score: Custom challenge metric combining multiple quality dimensions
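For reference, SROCC and PLCC can be computed as in the dependency-free sketch below. `scipy.stats.spearmanr` and `scipy.stats.pearsonr` provide the same statistics. Many VQA benchmarks fit a nonlinear logistic mapping to predictions before computing PLCC. Whether the VQualA evaluation does so is not stated here, so the sketch uses raw scores. The VQualA Score itself is not reproduced because its formula is not given.

```python
import math
from typing import List, Sequence


def plcc(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # sorted positions i..j are tied
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def srocc(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return plcc(ranks(x), ranks(y))


predicted = [3.9, 4.1, 2.8, 3.3]      # hypothetical model outputs
ground_truth = [3.65, 4.4, 3.0, 3.1]  # hypothetical MOS labels
print(f"SROCC={srocc(predicted, ground_truth):.3f}  PLCC={plcc(predicted, ground_truth):.3f}")
```

SROCC measures only whether the predicted ordering matches the ground truth. PLCC also measures how linearly the predicted values track the MOS labels.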

Model	Parameters	Memory	Resolution
DOVER++	~120M	~12GB	640×640
V-JEPA2	~1.1B	~16GB	384×384

Next Steps

Try the Quickstart

Run your first evaluation in under 5 minutes

Install QualiVision

Set up your development environment

QualiVision requires Python 3.8+ and is optimized for CUDA-enabled GPUs. CPU evaluation is supported but significantly slower.
