QualiVision uses a centralized configuration system defined in src/config/config.py. This page documents all available configuration dictionaries and their parameters.
## Model Configurations

### DOVER_CONFIG

Configuration for the DOVER++ model.

```python
from src.config.config import DOVER_CONFIG

# Access configuration
model_name = DOVER_CONFIG["model_name"]
video_res = DOVER_CONFIG["video_resolution"]
```
**Parameters**

- `model_name`: Name of the DOVER model variant
- `video_resolution` (`tuple[int, int]`, default: `(640, 640)`): Input video resolution as (width, height) in pixels
- Number of frames to sample from each video
- `text_encoder` (`string`, default: `"BAAI/bge-large-en-v1.5"`): HuggingFace model identifier for the text encoder
- Dimensionality of DOVER feature embeddings
- Dimensionality of text embeddings
- Hidden layer dimensionality for fusion layers
- URL to download pretrained DOVER++ weights
- `batch_size`: Number of videos per batch during training
- Initial learning rate for optimization
- Total number of training epochs
- `gradient_accumulation_steps`: Number of steps to accumulate gradients before updating weights
- Effective batch size (`batch_size` × `gradient_accumulation_steps`)
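The relationship between `batch_size` and `gradient_accumulation_steps` can be sketched in plain Python; the numeric values below are hypothetical, not the actual defaults:

```python
# Hypothetical values; the real defaults live in DOVER_CONFIG.
batch_size = 4                    # videos per forward/backward pass
gradient_accumulation_steps = 8   # mini-batches per optimizer update

# Effective batch size seen by the optimizer
effective_batch_size = batch_size * gradient_accumulation_steps  # 32

# Accumulation loop sketch: the optimizer steps only every
# `gradient_accumulation_steps` mini-batches.
optimizer_steps = 0
for step in range(1, 65):  # 64 mini-batches
    # loss.backward() would accumulate gradients here
    if step % gradient_accumulation_steps == 0:
        optimizer_steps += 1  # optimizer.step(); optimizer.zero_grad()
```

This is why a small per-step batch (needed to fit videos in GPU memory) can still yield a large effective batch for stable optimization.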
### VJEPA_CONFIG

Configuration for the V-JEPA2 model with discriminative learning rates.

```python
from src.config.config import VJEPA_CONFIG

# Access configuration
model_name = VJEPA_CONFIG["model_name"]
freeze_ratio = VJEPA_CONFIG["freeze_ratio"]
discriminative_lr = VJEPA_CONFIG["discriminative_lr"]
```
**Parameters**

- `model_name`: Name of the V-JEPA model variant
- `video_resolution` (`tuple[int, int]`, default: `(384, 384)`): Input video resolution as (width, height) in pixels
- Number of frames to sample from each video
- `text_encoder` (`string`, default: `"BAAI/bge-large-en-v1.5"`): HuggingFace model identifier for the text encoder
- `video_encoder` (`string`, default: `"facebook/vjepa-vit-giant-p16"`): HuggingFace model identifier for the V-JEPA video encoder
- `freeze_ratio`: Proportion of bottom layers to freeze (0.85 = freeze bottom 85% of layers)
- Dimensionality of V-JEPA video embeddings
- Dimensionality of text embeddings
- Hidden layer dimensionality for fusion layers
- `batch_size`: Number of videos per batch during training
- `learning_rate`: Base learning rate for optimization
- `epochs`: Total number of training epochs
- `gradient_accumulation_steps`: Number of steps to accumulate gradients before updating weights
- Effective batch size (`batch_size` × `gradient_accumulation_steps`)
- `discriminative_lr` (`dict`, default: `{"text": 0.1, "video": 0.5, "head": 2.0}`): Learning rate multipliers for different model components:
  - `text`: 0.1 (10% of base LR for text encoder)
  - `video`: 0.5 (50% of base LR for video encoder)
  - `head`: 2.0 (200% of base LR for prediction head)
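As a quick illustration of `freeze_ratio`, the arithmetic below shows how a freeze proportion maps to a layer count. The encoder depth is hypothetical, and the `requires_grad` handling is only indicated in comments:

```python
freeze_ratio = 0.85   # from VJEPA_CONFIG
num_layers = 40       # hypothetical encoder depth

# Freeze the bottom 85% of layers; only the top layers stay trainable.
n_frozen = int(num_layers * freeze_ratio)
trainable_layers = num_layers - n_frozen
# for layer in encoder.layers[:n_frozen]:
#     layer.requires_grad_(False)
```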
## Training Configuration

Global training parameters that apply to all models.

```python
from src.config.config import TRAINING_CONFIG

# Example: Configure training
device = TRAINING_CONFIG["device"]
mixed_precision = TRAINING_CONFIG["mixed_precision"]
```
**Parameters**

- `device`: Device to use for training ('cuda' or 'cpu')
- `mixed_precision`: Enable automatic mixed precision (AMP) training for faster computation
- `gradient_clipping`: Maximum gradient norm for gradient clipping
- `warmup_steps`: Number of warmup steps for the learning rate scheduler
- Log training metrics every N steps
- Save a model checkpoint every N steps
- Run evaluation every N steps
- Maximum gradient norm for clipping (same as `gradient_clipping`)
- Weight decay (L2 regularization) coefficient
- Epsilon parameter for the Adam optimizer
- `adam_betas` (`tuple[float, float]`, default: `(0.9, 0.999)`): Beta parameters for the Adam optimizer (beta1, beta2)
- Learning rate scheduler type ('cosine', 'linear', etc.)
- `num_workers`: Number of worker processes for data loading
- Pin memory in the DataLoader for faster CPU-to-GPU transfer
- Keep worker processes alive between epochs
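A common interpretation of `warmup_steps` combined with a 'cosine' scheduler is linear warmup followed by cosine decay. The sketch below illustrates that shape; it is an assumption about the schedule, not the project's exact implementation:

```python
import math

def lr_at(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay toward zero."""
    if step < warmup_steps:
        # Ramp linearly from base_lr/warmup_steps up to base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the post-warmup schedule completed so far.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

In practice this is what e.g. HuggingFace's cosine-with-warmup schedules compute per optimizer step.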
## Loss Configuration

Configuration for the custom loss function combining smooth L1, ranking, and scale-aware components.

```python
from src.config.config import LOSS_CONFIG

# Example: Configure loss weights
loss_weights = LOSS_CONFIG["loss_weights"]
alpha = loss_weights["alpha"]  # 0.7
beta = loss_weights["beta"]    # 0.3
gamma = loss_weights["gamma"]  # 0.1
```
**Parameters**

- `smooth_l1_beta`: Beta parameter for smooth L1 loss (transition point between L1 and L2)
- `ranking_margin`: Margin for pairwise ranking loss
- `scale_weights`: Weights for different quality ranges:
  - `low_quality`: 1.5 (for MOS < 2.5)
  - `high_quality`: 1.5 (for MOS > 4.0)
  - `normal`: 1.0 (for 2.5 ≤ MOS ≤ 4.0)
- `loss_weights` (`dict`, default: `{"alpha": 0.7, "beta": 0.3, "gamma": 0.1}`): Weights for combining different loss components:
  - `alpha`: 0.7 (smooth L1 loss weight)
  - `beta`: 0.3 (ranking loss weight)
  - `gamma`: 0.1 (scale-aware weight)
- `adaptive_weighting`: Enable adaptive loss weighting during training
- Rate at which loss weights adapt (only used if `adaptive_weighting` is `True`)
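The `scale_weights` ranges translate directly into a per-sample weight. A minimal sketch (the function name and signature are illustrative, not from the codebase):

```python
def scale_weight(mos, weights=None):
    """Per-sample weight following the scale_weights quality ranges."""
    w = weights or {"low_quality": 1.5, "high_quality": 1.5, "normal": 1.0}
    if mos < 2.5:
        return w["low_quality"]   # up-weight poorly rated videos
    if mos > 4.0:
        return w["high_quality"]  # up-weight excellent videos
    return w["normal"]
```

Up-weighting both tails pushes the model to fit the extremes of the MOS scale, where regression models otherwise tend to compress predictions toward the mean.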
## Dataset Configuration

Configuration for dataset loading and preprocessing.

```python
from src.config.config import DATASET_CONFIG

# Example: Access dataset columns
mos_columns = DATASET_CONFIG["mos_columns"]
text_column = DATASET_CONFIG["text_column"]
video_column = DATASET_CONFIG["video_column"]
```
**Parameters**

- `mos_columns`: List of MOS (Mean Opinion Score) column names in the dataset
- `text_column`: Name of the column containing text prompts
- `video_column` (`string`, default: `"video_name"`): Name of the column containing video file names
- `train_split`: Proportion of data to use for training (0.0 to 1.0)
- Proportion of data to use for validation (0.0 to 1.0)
- `seed`: Random seed for reproducible train/val splits
- Maximum length of text prompts in tokens
- `video_extensions` (`list[str]`, default: `[".mp4", ".avi", ".mov", ".mkv"]`): Supported video file extensions
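How `train_split` and `seed` might combine into a reproducible split, as a standalone sketch (the project may split differently, e.g. via pandas or scikit-learn):

```python
import random

def split_indices(n, train_split=0.8, seed=42):
    """Shuffle indices with a fixed seed, then cut at the train proportion."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded shuffle => same split every run
    cut = int(n * train_split)
    return idx[:cut], idx[cut:]
```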
## Evaluation Configuration

Configuration for model evaluation and inference.

```python
from src.config.config import EVAL_CONFIG

# Example: Run evaluation
metrics = EVAL_CONFIG["metrics"]
batch_size = EVAL_CONFIG["batch_size"]
```
**Parameters**

- `metrics` (`list[str]`, default: `["spearman", "pearson"]`): Correlation metrics to compute during evaluation
- `batch_size`: Batch size for evaluation (typically 1 for video quality assessment)
- Number of worker processes for evaluation data loading
- Save prediction results to file
- `output_format` (`list[str]`, default: `["csv", "xlsx"]`): Output formats for saving predictions
- Generate a detailed evaluation report
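For reference, the two default metrics reduce to the sketch below (tie handling omitted for brevity; production code would typically use `scipy.stats.pearsonr` / `spearmanr` instead):

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson computed on ranks (no tie correction)."""
    def ranks(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    return pearson(ranks(x), ranks(y))
```

Spearman rewards getting the quality *ordering* of videos right, while Pearson rewards linear agreement with the raw MOS values; quality-assessment papers usually report both.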
## GPU Configuration

Configuration for GPU memory management and optimization.

```python
from src.config.config import GPU_CONFIG

# Example: Configure GPU settings
memory_fraction = GPU_CONFIG["memory_fraction"]
mixed_precision = GPU_CONFIG["mixed_precision"]
```
**Parameters**

- `memory_fraction`: Fraction of GPU memory to allocate (0.0 to 1.0)
- Allow GPU memory to grow dynamically as needed
- `mixed_precision`: Enable mixed precision (FP16/FP32) for faster computation
- Enable gradient checkpointing to save memory (disabled for V-JEPA2)
- Pin memory in the DataLoader for faster data transfer
- Clean the GPU memory cache every N batches
## Usage Examples

### Basic Configuration Access

```python
from src.config.config import (
    DOVER_CONFIG,
    VJEPA_CONFIG,
    TRAINING_CONFIG,
    LOSS_CONFIG,
    DATASET_CONFIG,
    EVAL_CONFIG,
    GPU_CONFIG,
)

# Access DOVER configuration
model_name = DOVER_CONFIG["model_name"]
video_resolution = DOVER_CONFIG["video_resolution"]
batch_size = DOVER_CONFIG["batch_size"]

# Access training configuration
device = TRAINING_CONFIG["device"]
learning_rate = TRAINING_CONFIG["learning_rate"]
num_workers = TRAINING_CONFIG["num_workers"]
```
### Custom Configuration Override

```python
import copy

from src.config.config import VJEPA_CONFIG, TRAINING_CONFIG

# Create a custom configuration (deep copy so the shared dicts stay untouched)
custom_config = copy.deepcopy(VJEPA_CONFIG)
custom_config["batch_size"] = 8
custom_config["learning_rate"] = 1e-4
custom_config["epochs"] = 20

# Override training settings
custom_training = copy.deepcopy(TRAINING_CONFIG)
custom_training["gradient_clipping"] = 0.5
custom_training["warmup_steps"] = 200
```
### Using Discriminative Learning Rates

```python
from src.config.config import VJEPA_CONFIG

# Get discriminative learning rate multipliers
base_lr = VJEPA_CONFIG["learning_rate"]          # 2e-4
discriminative_lr = VJEPA_CONFIG["discriminative_lr"]

# Calculate actual learning rates for each component
text_lr = base_lr * discriminative_lr["text"]    # 2e-5 (10%)
video_lr = base_lr * discriminative_lr["video"]  # 1e-4 (50%)
head_lr = base_lr * discriminative_lr["head"]    # 4e-4 (200%)

print(f"Text encoder LR: {text_lr}")
print(f"Video encoder LR: {video_lr}")
print(f"Prediction head LR: {head_lr}")
```
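These per-component rates would typically feed optimizer parameter groups. A sketch using plain dicts in the shape that torch-style optimizers accept; how model parameters are assigned to each component is an assumption left as an empty list here:

```python
base_lr = 2e-4
multipliers = {"text": 0.1, "video": 0.5, "head": 2.0}

# One param group per component; "params" would hold that component's
# actual model parameters (e.g. model.text_encoder.parameters()).
param_groups = [
    {"name": component, "params": [], "lr": base_lr * mult}
    for component, mult in multipliers.items()
]
lrs = {g["name"]: g["lr"] for g in param_groups}
```

The intuition: pretrained encoders need only gentle fine-tuning (small multipliers), while the randomly initialized prediction head trains from scratch and benefits from a larger rate.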
### Configuring the Loss Function

```python
from src.config.config import LOSS_CONFIG

# Extract loss configuration
smooth_l1_beta = LOSS_CONFIG["smooth_l1_beta"]
ranking_margin = LOSS_CONFIG["ranking_margin"]
scale_weights = LOSS_CONFIG["scale_weights"]
loss_weights = LOSS_CONFIG["loss_weights"]

# Use in loss computation (the individual loss terms below are
# computed per batch by the loss module)
alpha = loss_weights["alpha"]  # 0.7 for smooth L1
beta = loss_weights["beta"]    # 0.3 for ranking
gamma = loss_weights["gamma"]  # 0.1 for scale-aware
total_loss = (
    alpha * smooth_l1_loss
    + beta * ranking_loss
    + gamma * scale_aware_loss
)
```
### Dataset Configuration

```python
import pandas as pd

from src.config.config import DATASET_CONFIG

# Load dataset configuration
mos_columns = DATASET_CONFIG["mos_columns"]
text_column = DATASET_CONFIG["text_column"]
video_column = DATASET_CONFIG["video_column"]
train_split = DATASET_CONFIG["train_split"]
seed = DATASET_CONFIG["seed"]

# Use in dataset loading
df = pd.read_csv("data/annotations.csv")
text_prompts = df[text_column]
video_files = df[video_column]
mos_scores = df[mos_columns]
```
## Path Constants

In addition to configuration dictionaries, the config module also defines useful path constants:

```python
from src.config.config import (
    PROJECT_ROOT,
    DATA_DIR,
    MODELS_DIR,
    NOTEBOOKS_DIR,
    DOCS_DIR,
    TRAIN_DATA_PATH,
    VAL_DATA_PATH,
    TEST_DATA_PATH,
)

# Example usage
print(f"Project root: {PROJECT_ROOT}")
print(f"Training data: {TRAIN_DATA_PATH}")
print(f"Models directory: {MODELS_DIR}")
```
## See Also