Medical image augmentation for mammography training

Mammography datasets are small by deep-learning standards, and the visual appearance of cancer lesions varies significantly with tissue density, scanner model, and imaging angle. Without augmentation, a detector trained on a few hundred images tends to memorise the specific appearance of training cases and fails to generalise to new patients or institutions. MammoMix uses an albumentations pipeline that applies a diverse set of geometric and photometric transforms at training time to artificially expand the effective dataset size and expose the model to realistic imaging variation.

Full augmentation pipeline

The pipeline is defined in BreastCancerDataset.get_transforms in loader.py. It is applied only when split == 'train'; validation and test splits receive A.NoOp().

loader.py

def get_transforms(self):
    if self.split == 'train': # Apply augmentation if training
        return A.Compose([
            # Geometric transformations
            A.ElasticTransform(alpha=50, sigma=5, approximate=False, p=0.5), # Elastic deformation to simulate tissue variability
            A.Perspective(scale=(0.05, 0.1), p=0.5), # Perspective distortion to simulate different angles
            A.HorizontalFlip(p=0.5), # Mirror image
            A.Rotate(limit=10, p=0.5), # Small angles to avoid disrupting anatomical structure
            A.RandomScale(scale_limit=0.2, p=0.5), # Random scaling to simulate different distances
            A.Affine(
                scale=(0.9, 1.1), translate_percent=(0.1, 0.1), rotate=(-10, 10), shear=(-5, 5),
                interpolation=1, p=0.5 # Affine transformation to simulate different angles and scales
            ),

            # Color and intensity transformations
            A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5),
            A.GaussNoise(std_range=(0.05, 0.05), mean_range=(0.0, 0.0), per_channel=True, p=0.5),
            A.GaussianBlur(p=0.5),
        ], bbox_params=A.BboxParams(
            format='pascal_voc', # [x_min, y_min, x_max, y_max]
            label_fields=['labels'], # Labels for bounding boxes
            min_area=25, # Drop boxes smaller than 25 pixels after augmentation
            min_visibility=0.1, # Discard boxes with less than 10% visibility after augmentation
            clip=True # Clip bounding boxes to image boundaries
        ))
    return A.Compose([A.NoOp()], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'], clip=True))
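
Because every transform has p=0.5 and is sampled independently, the number of transforms actually applied varies from image to image. The short calculation below is a sketch of that arithmetic, assuming the nine transforms listed above and the default A.Compose probability of 1.0; the variable names are illustrative only.

```python
# Each of the 9 transforms fires independently with probability 0.5.
N_GEOMETRIC = 6    # ElasticTransform, Perspective, HorizontalFlip, Rotate, RandomScale, Affine
N_PHOTOMETRIC = 3  # RandomBrightnessContrast, GaussNoise, GaussianBlur
P = 0.5

n_total = N_GEOMETRIC + N_PHOTOMETRIC
expected_applied = n_total * P                # mean number of transforms per image
p_untouched = (1 - P) ** n_total              # no transform fires at all
p_any_geometric = 1 - (1 - P) ** N_GEOMETRIC  # at least one geometric transform fires

print(f"expected transforms per image: {expected_applied}")           # 4.5
print(f"P(image passes through unchanged): {p_untouched:.4f}")         # 0.0020
print(f"P(at least one geometric transform): {p_any_geometric:.4f}")   # 0.9844
```

Roughly 98% of training images therefore receive at least one geometric change, so the bounding-box handling matters for almost every sample.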

Transform explanations

ElasticTransform

A.ElasticTransform(alpha=50, sigma=5, approximate=False, p=0.5)

Applies a smooth, spatially varying displacement field to the image, simulating the non-rigid deformation of breast tissue under compression. alpha=50 scales the displacement magnitude, and sigma=5 sets the width of the Gaussian filter that smooths the displacement field, so a larger sigma gives a smoother deformation. Elastic deformation is widely used for medical image segmentation and detection because real anatomical structures deform non-rigidly.

Perspective

A.Perspective(scale=(0.05, 0.1), p=0.5)

Applies a random four-point perspective warp. This simulates the effect of the X-ray source or detector not being perfectly orthogonal to the breast, which produces projective distortion in practice. scale=(0.05, 0.1) keeps the warp subtle enough to preserve anatomical integrity.

HorizontalFlip

A.HorizontalFlip(p=0.5)

Randomly mirrors the image left-to-right. Because mammograms are acquired from both the left and right breast, a horizontally flipped left-breast image has the same orientation as a right-breast image of the same view. Each training image can therefore be seen in either orientation, at no extra labelling cost.
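
To make the effect on annotations concrete, the following minimal sketch mirrors a pascal_voc box under a horizontal flip. The helper name hflip_box is hypothetical and this is not albumentations' internal code; it assumes continuous pixel coordinates in [0, W].

```python
def hflip_box(box, image_width):
    """Mirror a pascal_voc box [x_min, y_min, x_max, y_max] left-to-right."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return [image_width - x_max, y_min, image_width - x_min, y_max]


box = [100, 50, 300, 200]
flipped = hflip_box(box, image_width=1000)
print(flipped)                               # [700, 50, 900, 200]
print(hflip_box(flipped, image_width=1000))  # [100, 50, 300, 200]
```

Flipping twice returns the original box, and the box's width, height, and label are unchanged.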

Rotate

A.Rotate(limit=10, p=0.5)

Rotates the image by a uniformly sampled angle in [-10°, +10°]. The small limit is intentional: large rotations would make the image anatomically implausible (a mammogram rotated 45° no longer looks like a clinical acquisition). Small rotations simulate slight patient positioning variation.
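
A small limit also keeps the bounding boxes tight. An axis-aligned box that encloses a rotated rectangle grows with the angle, as the sketch below illustrates. It assumes the box is rotated about the image centre and replaced by the smallest axis-aligned box around its four rotated corners; rotate_box is a hypothetical helper, and albumentations may compute the rotated box differently.

```python
import math


def rotate_box(box, angle_deg, image_width, image_height):
    """Rotate a pascal_voc box about the image centre and return its enclosing box."""
    x_min, y_min, x_max, y_max = box
    cx, cy = image_width / 2, image_height / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    rotated = [
        (cx + (x - cx) * cos_t - (y - cy) * sin_t,
         cy + (x - cx) * sin_t + (y - cy) * cos_t)
        for x, y in corners
    ]
    xs = [point[0] for point in rotated]
    ys = [point[1] for point in rotated]
    return [min(xs), min(ys), max(xs), max(ys)]


box = [450, 450, 550, 550]  # 100 x 100 box at the centre of a 1000 x 1000 image
for angle in (0, 10, 45):
    x_min, _, x_max, _ = rotate_box(box, angle, 1000, 1000)
    print(angle, round(x_max - x_min, 1))  # widths: 100.0, 115.8, 141.4
```

At 10° the enclosing box is about 16% wider than the original; at 45° it is about 41% wider, so it covers far more background than lesion.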

RandomScale

A.RandomScale(scale_limit=0.2, p=0.5)

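Rescales the image by a random factor in [0.8, 1.2] (scale_limit=0.2 is interpreted as 1 ± 0.2). This simulates acquiring mammograms at slightly different distances from the X-ray source, which changes the apparent size of anatomical structures and lesions. Unlike the other transforms in the pipeline, RandomScale changes the dimensions of the output image rather than resampling into the original frame, so downstream code should not assume a fixed image size.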

Affine

A.Affine(
    scale=(0.9, 1.1), translate_percent=(0.1, 0.1), rotate=(-10, 10), shear=(-5, 5),
    interpolation=1, p=0.5
)

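Applies a combined affine transformation with independent control over scale, translation, rotation, and shear. It overlaps with Rotate and RandomScale above but adds translation and shear, which those transforms do not cover. interpolation=1 selects bilinear interpolation (cv2.INTER_LINEAR) to keep edge quality reasonable. Note that albumentations reads a tuple as a sampling range, so translate_percent=(0.1, 0.1) yields a fixed 10% shift rather than a random one; a range such as (-0.1, 0.1) would be needed for a symmetric random shift.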

RandomBrightnessContrast

A.RandomBrightnessContrast(brightness_limit=0.2, contrast_limit=0.2, p=0.5)

Randomly shifts pixel intensity (brightness) and the intensity range (contrast) by up to ±20%. Different mammography systems and exposure settings produce images with substantially different brightness and contrast profiles, so this transform helps the model generalise across scanners and acquisition protocols.

GaussNoise

A.GaussNoise(std_range=(0.05, 0.05), mean_range=(0.0, 0.0), per_channel=True, p=0.5)

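Adds zero-mean Gaussian noise independently to each colour channel, simulating the electronic sensor noise and quantum mottle that appear in low-dose or high-sensitivity mammography acquisitions. std_range is expressed as a fraction of the maximum pixel value, so std_range=(0.05, 0.05) fixes the standard deviation at 5% of full scale, and mean_range=(0.0, 0.0) keeps the noise centred on zero. This level is visible but does not swamp lesion detail.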
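
The sketch below shows what a standard deviation of 0.05 means for 8-bit data, assuming the value is a fraction of the maximum pixel value (255). It works on a flat list of pixel values using only the standard library; add_gauss_noise is a hypothetical helper, not the albumentations implementation.

```python
import random


def add_gauss_noise(pixels, std_fraction=0.05, max_value=255, seed=None):
    """Add zero-mean Gaussian noise to a flat list of pixel values, then clip."""
    rng = random.Random(seed)
    sigma = std_fraction * max_value  # 0.05 * 255 = 12.75 intensity levels
    noisy = []
    for value in pixels:
        perturbed = value + rng.gauss(0.0, sigma)
        # Round to an integer level and clip to the valid range [0, max_value].
        noisy.append(min(max_value, max(0, round(perturbed))))
    return noisy


row = [0, 64, 128, 192, 255]
print(add_gauss_noise(row, seed=0))
```

Values near 0 or 255 are clipped after the noise is added, so the noise is no longer symmetric in very dark or very bright regions.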

GaussianBlur

A.GaussianBlur(p=0.5)

Applies Gaussian smoothing with the library's default kernel settings. The blur is isotropic, so it only approximates the loss of sharpness caused by patient movement during exposure or by other sources of unsharpness in the imaging chain. It encourages the model to detect cancer regions based on shape and location rather than on high-frequency texture that may be absent in blurry acquisitions.

Bounding box parameters

The bbox_params argument propagates transforms to the bounding boxes alongside the image:

loader.py

bbox_params=A.BboxParams(
    format='pascal_voc',       # [x_min, y_min, x_max, y_max]
    label_fields=['labels'],   # Labels for bounding boxes
    min_area=25,               # Drop boxes smaller than 25 pixels after augmentation
    min_visibility=0.1,        # Discard boxes with less than 10% visibility after augmentation
    clip=True                  # Clip bounding boxes to image boundaries
)

format='pascal_voc': boxes are expressed as absolute pixel coordinates [x_min, y_min, x_max, y_max], matching the output of parse_voc_xml.
clip=True: after geometric transforms, boxes are clamped to the image boundary so no coordinate falls outside [0, W] or [0, H].
min_area=25: any box whose area after augmentation is smaller than 25 pixels is dropped. This prevents degenerate near-zero-area annotations from entering the loss computation.
min_visibility=0.1: any box that has less than 10% of its original area visible after augmentation (e.g. because it was cropped to the image edge) is dropped.
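
The three filtering rules can be sketched in plain Python as follows. This is an illustration of the semantics described above, not albumentations' code: clip_and_filter is a hypothetical helper, and it simplifies min_visibility by comparing the clipped area with the unclipped area of the same transformed box.

```python
def clip_and_filter(box, image_width, image_height, min_area=25, min_visibility=0.1):
    """Clip a pascal_voc box to the image; return None if it should be dropped."""
    x_min, y_min, x_max, y_max = box
    full_area = (x_max - x_min) * (y_max - y_min)
    if full_area <= 0:
        return None  # degenerate box, nothing to keep

    # clip=True: clamp every coordinate to the image boundary.
    cx_min = min(max(x_min, 0), image_width)
    cy_min = min(max(y_min, 0), image_height)
    cx_max = min(max(x_max, 0), image_width)
    cy_max = min(max(y_max, 0), image_height)
    visible_area = (cx_max - cx_min) * (cy_max - cy_min)

    if visible_area < min_area:
        return None  # min_area: too small to be a useful target
    if visible_area / full_area < min_visibility:
        return None  # min_visibility: almost entirely outside the frame
    return [cx_min, cy_min, cx_max, cy_max]


print(clip_and_filter([100, 100, 200, 200], 1000, 1000))  # [100, 100, 200, 200]
print(clip_and_filter([-95, 10, 5, 110], 1000, 1000))     # None (5% visible)
print(clip_and_filter([10, 10, 14, 14], 1000, 1000))      # None (area 16 < 25)
print(clip_and_filter([-50, 10, 50, 110], 1000, 1000))    # [0, 10, 50, 110]
```

The last case shows clipping without dropping: half of the box is still inside the frame, so it is kept with its left edge moved to x = 0.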

Retry logic

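If the augmentation pipeline drops every bounding box (for example, when a combination of scaling, translation, and rotation pushes every annotation out of the frame), __getitem__ calls itself again with the same index. Because the transforms are random, the retry draws fresh augmentation parameters for the same image: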

loader.py

transformed = self.transforms(image=image, bboxes=bboxes, labels=labels)
labels = np.array(transformed['labels'], dtype=np.int64)
if len(transformed['labels']) <= 0: return self.__getitem__(idx)  # Retry if no valid boxes after augmentation

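This ensures that every sample returned by the dataset contains at least one valid annotation, so the model never receives supervision-free examples. The recursion has no retry limit, however: it relies on a later random draw keeping at least one box. Unless the case is handled earlier in __getitem__, an image with no annotations at all would recurse until Python's recursion limit is reached, and the same applies to a validation or test image whose boxes are all invalid, since that pipeline is deterministic.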
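
A bounded variant of the retry could look like the following sketch. It is not the MammoMix implementation: augment_with_retry, the max_retries parameter, and the dictionary layout are all assumptions made for illustration, and the fallback of returning the un-augmented sample is one possible design choice among several.

```python
import itertools


def augment_with_retry(augment, sample, max_retries=10):
    """Call augment(sample) until it keeps at least one box, up to max_retries times."""
    for _ in range(max_retries):
        result = augment(sample)
        if len(result["bboxes"]) > 0:
            return result
    # Every attempt lost all boxes: fall back to the un-augmented sample.
    return sample


# Hypothetical augmentation that loses every box on its first two calls.
calls = itertools.count()


def flaky_augment(sample):
    attempt = next(calls)
    kept = sample["bboxes"] if attempt >= 2 else []
    return {"bboxes": kept, "attempt": attempt}


sample = {"bboxes": [[100, 100, 200, 200]]}
print(augment_with_retry(flaky_augment, sample)["attempt"])  # 2
```

A loop with an explicit limit terminates even for an image that can never produce a valid box, at the cost of occasionally returning an un-augmented sample.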

Validation and test behaviour

Validation and test splits bypass all augmentation:

loader.py

return A.Compose([A.NoOp()], bbox_params=A.BboxParams(format='pascal_voc', label_fields=['labels'], clip=True))

A.NoOp() is a no-operation transform; the image and boxes pass through unchanged. clip=True is still applied, which is a safe no-op for well-formed annotations but guards against any annotation that marginally exceeds the image boundary.

The min_area=25 and min_visibility=0.1 thresholds deserve careful tuning for your dataset. If your annotations include very small lesions (e.g. micro-calcifications spanning only a few pixels), a min_area of 25 may silently discard real cancer regions after aggressive scaling. Conversely, setting min_area too low keeps near-invisible annotations that add noise to the loss. Check the distribution of annotation areas in your training set before adjusting these values.
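
One way to run that check is sketched below. It assumes the annotations are already available as a list of pascal_voc boxes; area_report is a hypothetical helper. The default worst_case_scale of 0.72 is the product of the smallest RandomScale factor (0.8) and the smallest Affine scale (0.9), which can both fire on the same image; shrinkage from Perspective or ElasticTransform is ignored.

```python
def area_report(boxes, min_area=25, worst_case_scale=0.72):
    """Summarise pascal_voc box areas and count boxes that could fall below min_area."""
    if not boxes:
        raise ValueError("boxes must not be empty")
    areas = sorted(
        (x_max - x_min) * (y_max - y_min) for x_min, y_min, x_max, y_max in boxes
    )
    # Area scales with the square of the linear scale factor.
    area_factor = worst_case_scale ** 2
    at_risk = sum(1 for area in areas if area * area_factor < min_area)
    return {
        "count": len(areas),
        "smallest": areas[0],
        "median": areas[len(areas) // 2],
        "at_risk": at_risk,
    }


boxes = [[0, 0, 5, 5], [0, 0, 7, 7], [0, 0, 100, 100]]
print(area_report(boxes))
# {'count': 3, 'smallest': 25, 'median': 49, 'at_risk': 1}
```

Under these assumptions, any annotation smaller than about 48 square pixels (25 / 0.72²) can be dropped by scaling alone, so a non-zero at_risk count is a signal to lower min_area or narrow the scale ranges.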
