Overview

AugmentationCfg defines data augmentation parameters for training image transforms. It controls random augmentations like resized cropping, color jitter, and random erasing.

Class Definition

@dataclass
class AugmentationCfg:
    scale: Tuple[float, float] = (0.9, 1.0)
    ratio: Optional[Tuple[float, float]] = None
    color_jitter: Optional[Union[float, Tuple[float, float, float], Tuple[float, float, float, float]]] = None
    re_prob: Optional[float] = None
    re_count: Optional[int] = None
    use_timm: bool = False
    color_jitter_prob: float = None
    gray_scale_prob: float = None
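Most fields default to None, which reads as "not set". The sketch below shows one way to inspect which fields a given configuration actually sets. It uses a local stand-in dataclass (`AugmentationCfgMirror`, a hypothetical name that copies the fields above) so it runs without open_clip installed; `explicitly_set` is likewise an illustrative helper, not part of the library.

```python
from dataclasses import asdict, dataclass
from typing import Optional, Tuple, Union

ColorJitter = Union[float, Tuple[float, float, float], Tuple[float, float, float, float]]


@dataclass
class AugmentationCfgMirror:
    """Local stand-in with the same fields as AugmentationCfg (not the real class)."""
    scale: Tuple[float, float] = (0.9, 1.0)
    ratio: Optional[Tuple[float, float]] = None
    color_jitter: Optional[ColorJitter] = None
    re_prob: Optional[float] = None
    re_count: Optional[int] = None
    use_timm: bool = False
    color_jitter_prob: Optional[float] = None
    gray_scale_prob: Optional[float] = None


def explicitly_set(cfg) -> dict:
    """Return only the fields whose value is not None."""
    return {name: value for name, value in asdict(cfg).items() if value is not None}


defaults = explicitly_set(AugmentationCfgMirror())
custom = explicitly_set(AugmentationCfgMirror(scale=(0.08, 1.0), gray_scale_prob=0.2))
print(defaults)
print(custom)
```

With the defaults, only `scale` and `use_timm` carry a value; every other augmentation stays off until it is set.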

Fields

scale
Tuple[float, float]
default:"(0.9, 1.0)"
Range of the random crop's area, expressed as a fraction of the original image area. Used in RandomResizedCrop.
  • First value: minimum crop scale (e.g., 0.08 = the crop can cover as little as 8% of the original area)
  • Second value: maximum crop scale (e.g., 1.0 = the crop can cover up to 100% of the original area)
Common values:
  • (0.08, 1.0): Standard ImageNet training
  • (0.9, 1.0): Light augmentation
ratio
Tuple[float, float]
default:"None"
Range of aspect ratios (width / height) for the random crop. Used in RandomResizedCrop.
  • First value: minimum aspect ratio (e.g., 0.75 = 3:4)
  • Second value: maximum aspect ratio (e.g., 1.33 = 4:3)
If None, defaults to (3/4, 4/3) in torchvision.
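To make the interaction between scale and ratio concrete, here is a small sketch of the crop geometry: an area fraction and an aspect ratio (width / height) together determine the crop's pixel size. The `crop_size` helper is hypothetical and written for illustration; it follows the usual area/ratio arithmetic and skips the random sampling and retry logic that torchvision performs.

```python
import math


def crop_size(width: int, height: int, scale: float, ratio: float):
    """Crop (w, h) in pixels for one area fraction and one aspect ratio (w / h)."""
    target_area = scale * width * height
    crop_w = round(math.sqrt(target_area * ratio))
    crop_h = round(math.sqrt(target_area / ratio))
    return crop_w, crop_h


# Walk through a few (scale, ratio) pairs for a 640x480 image.
for scale in (0.08, 0.9, 1.0):
    for ratio in (3 / 4, 1.0, 4 / 3):
        w, h = crop_size(640, 480, scale, ratio)
        fits = w <= 640 and h <= 480
        print(f"scale={scale:<4} ratio={ratio:.2f} -> {w}x{h} (fits: {fits})")
```

Some combinations produce a crop larger than the image in one dimension (for example, a full-area square crop of a 4:3 image). Such candidates cannot be used as-is, which is why large scale values combined with a wide ratio range yield less variety than the ranges suggest.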
color_jitter
Union[float, Tuple[float, ...]]
default:"None"
Color jitter augmentation strength. Can be specified as:
  • float: Applied to brightness, contrast, saturation (e.g., 0.4)
  • Tuple[float, float, float]: (brightness, contrast, saturation)
  • Tuple[float, float, float, float]: (brightness, contrast, saturation, hue)
Values are typically in the range [0, 1], except hue, which torchvision limits to [0, 0.5]. Higher values = stronger augmentation.
Example: (0.4, 0.4, 0.4, 0.1) = moderate jitter with slight hue variation
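The three accepted forms can be brought to a single 4-tuple shape before use. The helper below is a hypothetical sketch (`normalize_color_jitter` is not an open_clip function); it assumes, following the field description above, that a single float applies to brightness, contrast, and saturation and leaves hue untouched.

```python
from typing import Sequence, Tuple, Union


def normalize_color_jitter(
    value: Union[float, Sequence[float]],
) -> Tuple[float, float, float, float]:
    """Expand a float, 3-tuple, or 4-tuple to (brightness, contrast, saturation, hue)."""
    if isinstance(value, (int, float)):
        strength = float(value)
        # Assumption: a scalar does not affect hue.
        return (strength, strength, strength, 0.0)
    values = tuple(float(v) for v in value)
    if len(values) == 3:
        return values + (0.0,)
    if len(values) == 4:
        return values
    raise ValueError(f"color_jitter needs 1, 3, or 4 values, got {len(values)}")


print(normalize_color_jitter(0.4))
print(normalize_color_jitter((0.4, 0.3, 0.2)))
print(normalize_color_jitter((0.4, 0.4, 0.4, 0.1)))
```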
re_prob
float
default:"None"
Probability of applying random erasing to an image.
  • 0.0: No random erasing
  • 0.25: 25% chance of erasing per image
  • 1.0: Always apply random erasing
Requires use_timm=True.
re_count
int
default:"None"
Number of random erasing operations per image when random erasing is applied.
Requires use_timm=True.
use_timm
bool
default:"False"
Whether to use timm (PyTorch Image Models) augmentation transforms.
When True, enables advanced augmentations from timm:
  • RandAugment
  • Random erasing
  • More sophisticated augmentation pipelines
When False, uses simple torchvision-based augmentations.
color_jitter_prob
float
default:"None"
Probability of applying color jitter when use_timm=False.
  • 0.0: Never apply color jitter
  • 0.8: Apply color jitter 80% of the time (common default)
  • 1.0: Always apply color jitter
Only used when use_timm=False.
gray_scale_prob
float
default:"None"
Probability of converting the image to grayscale (keeping 3 channels) when use_timm=False.
  • 0.0: Never grayscale
  • 0.2: 20% chance of grayscale (common default)
  • 1.0: Always grayscale
Only used when use_timm=False.
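Both color_jitter_prob and gray_scale_prob describe the same pattern: a transform that fires on a random subset of samples. A minimal sketch of that pattern follows, using only the standard library. `ApplyWithProbability` and `applied_fraction` are hypothetical names, and a string stands in for the image so the example stays dependency-free.

```python
import random


class ApplyWithProbability:
    """Apply `transform` to a sample with probability `p`, else pass it through."""

    def __init__(self, transform, p, rng=None):
        if not 0.0 <= p <= 1.0:
            raise ValueError("p must be in [0, 1]")
        self.transform = transform
        self.p = p
        self.rng = rng if rng is not None else random.Random()

    def __call__(self, sample):
        # random() returns a value in [0, 1), so p=0.0 never fires and p=1.0 always does.
        if self.rng.random() < self.p:
            return self.transform(sample)
        return sample


def applied_fraction(p, trials=10_000, seed=0):
    """Measure how often the wrapped transform fires for a given p."""
    gate = ApplyWithProbability(str.upper, p, rng=random.Random(seed))
    hits = sum(gate("img") == "IMG" for _ in range(trials))
    return hits / trials


for p in (0.0, 0.2, 0.8, 1.0):
    print(f"p={p}: applied to {applied_fraction(p):.1%} of samples")
```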

Examples

Standard ImageNet augmentation

from open_clip import AugmentationCfg, PreprocessCfg, image_transform_v2

# ImageNet-style training augmentation
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    ratio=(0.75, 1.33),
    color_jitter=(0.4, 0.4, 0.4, 0.1),
    color_jitter_prob=0.8,
    gray_scale_prob=0.2
)

preprocess_cfg = PreprocessCfg(size=224)
train_transform = image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg
)

Light augmentation

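# Minimal augmentation for fine-tuning
aug_cfg = AugmentationCfg(
    scale=(0.9, 1.0),                   # Crops keep 90-100% of the image area
    color_jitter=(0.2, 0.2, 0.2, 0.0),  # Light jitter as (brightness, contrast, saturation, hue)
    color_jitter_prob=0.5               # Jitter half of the images
)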

Strong augmentation with timm

# Advanced augmentation with timm
aug_cfg = AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=0.4,
    re_prob=0.25,      # Random erasing
    re_count=1,
    use_timm=True      # Enable timm augmentations
)

No augmentation

# Training without augmentation (only random crop)
aug_cfg = AugmentationCfg(
    scale=(1.0, 1.0),  # No scale variation
    ratio=None,
    color_jitter=None
)

Custom aspect ratio range

# Allow more extreme aspect ratios
aug_cfg = AugmentationCfg(
    scale=(0.5, 1.0),
    ratio=(0.5, 2.0),  # From 1:2 to 2:1
    color_jitter=0.3
)

Grayscale augmentation

# High grayscale probability for robustness
aug_cfg = AugmentationCfg(
    scale=(0.8, 1.0),
    gray_scale_prob=0.5  # 50% chance of grayscale
)

Usage with image_transform_v2

import open_clip

preprocess_cfg = open_clip.PreprocessCfg(size=224)

# Create augmentation config
aug_cfg = open_clip.AugmentationCfg(
    scale=(0.08, 1.0),
    color_jitter=(0.4, 0.4, 0.4, 0.0),  # (brightness, contrast, saturation, hue)
    color_jitter_prob=0.8
)

# Create training transform
train_transform = open_clip.image_transform_v2(
    cfg=preprocess_cfg,
    is_train=True,
    aug_cfg=aug_cfg
)

# Use with dataset
from torchvision.datasets import ImageFolder
train_dataset = ImageFolder('data/train', transform=train_transform)

Augmentation Strategy Guide

Light Augmentation

  • scale: (0.9, 1.0)
  • color_jitter: 0.2
  • Best for: Fine-tuning, small datasets

Standard Augmentation

  • scale: (0.08, 1.0)
  • color_jitter: 0.4
  • Best for: Training from scratch

Strong Augmentation

  • use_timm: True
  • re_prob: 0.25
  • Best for: Large-scale training

Minimal Augmentation

  • scale: (0.95, 1.0)
  • No color jitter
  • Best for: High-quality datasets
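The four strategies above can be kept as named presets and expanded into keyword arguments. This is only a sketch: `PRESETS` and `preset_kwargs` are hypothetical names, the values are copied from the guide, and the result is a plain dict intended to be passed on as `AugmentationCfg(**kwargs)`.

```python
# Values copied from the strategy guide; fields not listed keep their defaults.
PRESETS = {
    "light": {"scale": (0.9, 1.0), "color_jitter": 0.2},
    "standard": {"scale": (0.08, 1.0), "color_jitter": 0.4},
    "strong": {"use_timm": True, "re_prob": 0.25},
    "minimal": {"scale": (0.95, 1.0), "color_jitter": None},
}


def preset_kwargs(name: str, **overrides) -> dict:
    """Return a copy of the named preset with any overrides applied on top."""
    if name not in PRESETS:
        raise KeyError(f"unknown preset {name!r}; choose from {sorted(PRESETS)}")
    kwargs = dict(PRESETS[name])
    kwargs.update(overrides)
    return kwargs


kwargs = preset_kwargs("strong", re_count=1)
print(kwargs)
# Intended use (requires open_clip): aug_cfg = AugmentationCfg(**kwargs)
```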

Notes

  • AugmentationCfg is only used when is_train=True in image_transform_v2()
  • color_jitter_prob and gray_scale_prob are ignored when use_timm=True
  • Random erasing (re_prob, re_count) requires use_timm=True
  • Default values provide minimal augmentation; increase for stronger regularization
  • For contrastive learning, stronger augmentation typically improves performance
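The compatibility rules in these notes are easy to check before training starts. The sketch below encodes them in a hypothetical helper, `ignored_fields`, which reports fields that are set but would have no effect for the chosen `use_timm` value. It accepts any object with the relevant attributes, so a `SimpleNamespace` stands in for a real config here.

```python
from types import SimpleNamespace

TIMM_ONLY = ("re_prob", "re_count")
NON_TIMM_ONLY = ("color_jitter_prob", "gray_scale_prob")


def ignored_fields(cfg) -> list:
    """List fields that are set but, per the notes above, would have no effect."""
    use_timm = bool(getattr(cfg, "use_timm", False))
    candidates = NON_TIMM_ONLY if use_timm else TIMM_ONLY
    return [name for name in candidates if getattr(cfg, name, None) is not None]


cfg = SimpleNamespace(
    use_timm=False,
    re_prob=0.25,          # set, but random erasing needs use_timm=True
    re_count=None,
    color_jitter_prob=0.8,
    gray_scale_prob=None,
)
print(ignored_fields(cfg))

cfg.use_timm = True
print(ignored_fields(cfg))
```

Running a check like this at startup surfaces a silently ignored setting early, instead of after a training run that used weaker augmentation than intended.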
