
loader.py provides the PyTorch Dataset implementation for mammography data and the collate function used with DataLoader. It handles split-based file discovery, VOC XML annotation parsing, Albumentations augmentation, and DETR-compatible encoding.

BreastCancerDataset

A torch.utils.data.Dataset that loads mammography images and bounding-box annotations for DETR-based object detection models. Augmentation is applied automatically for the train split.
from loader import BreastCancerDataset
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained(
    "hustvl/yolos-base",
    do_resize=True,
    do_pad=True,
    use_fast=True,
    size={"max_height": 640, "max_width": 640},
    pad_size={"height": 640, "width": 640},
)

dataset = BreastCancerDataset(
    split="train",
    splits_dir="AJCAI25/splits",
    dataset_name="CSAW",
    image_processor=image_processor,
)

Constructor parameters

split
string
required
Dataset split to load. Must be one of "train", "val", or "test". Raises ValueError for any other value. The "train" split activates the full Albumentations augmentation pipeline; "val" and "test" apply a no-op identity transform.
splits_dir
string
required
Path to the root directory that contains per-dataset split files. The constructor expects a file at {splits_dir}/{dataset_name}/{split}.txt. Each line in that file is a relative path to one image. Raises FileNotFoundError if the file does not exist.
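The discovery logic described above can be sketched as follows (the function name and internals are illustrative, not the module's actual implementation):

```python
from pathlib import Path

def load_split_paths(splits_dir: str, dataset_name: str, split: str) -> list[str]:
    # Validate the split name, mirroring the ValueError described above.
    if split not in {"train", "val", "test"}:
        raise ValueError(f"split must be 'train', 'val', or 'test', got {split!r}")
    split_file = Path(splits_dir) / dataset_name / f"{split}.txt"
    # A missing split file surfaces as FileNotFoundError, as documented.
    if not split_file.exists():
        raise FileNotFoundError(f"Split file not found: {split_file}")
    # Each non-empty line is a relative path to one image.
    return [line.strip() for line in split_file.read_text().splitlines() if line.strip()]
```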
dataset_name
string
required
Name of the dataset subdirectory inside splits_dir. Supported values used in MammoMix are "CSAW", "DMID", and "DDSM".
image_processor
AutoImageProcessor
required
A HuggingFace AutoImageProcessor instance (e.g. from hustvl/yolos-base or a DETR checkpoint). Used to resize, pad, and normalise images, and to encode COCO-format annotations into the tensors expected by DETR.

Return value — __getitem__

Each call to dataset[idx] returns a Python dict with the following fields.
pixel_values
torch.Tensor
Preprocessed image tensor of shape (3, H, W) after resizing, padding, and normalisation. The batch dimension from the image processor is squeezed out.
labels
dict
DETR-compatible annotation dict produced by the image processor. Contains at minimum class_labels (integer class indices, one per box) and boxes (normalised (cx, cy, w, h) coordinates); processors typically also include image_id, area, iscrowd, orig_size, and size.
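For orientation, a single labels dict has roughly the following shape. This is an illustrative sketch: plain lists stand in for the torch tensors the image processor actually returns, the sample values are invented, and the exact key set can vary by processor version.

```python
# Illustrative labels dict for one image with two boxes (plain lists stand in
# for the torch tensors produced by the image processor).
labels = {
    "class_labels": [0, 0],                  # one class index per box
    "boxes": [
        [0.42, 0.37, 0.10, 0.08],            # normalised (cx, cy, w, h)
        [0.61, 0.55, 0.05, 0.04],
    ],
    "image_id": [7],
    "area": [3150.0, 920.0],                 # box areas in pixels
    "iscrowd": [0, 0],
    "orig_size": [2800, 2082],               # (height, width) before resizing
    "size": [640, 640],                      # (height, width) after resizing
}
```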

Training augmentation pipeline

When split="train", get_transforms() returns an albumentations.Compose with the following transforms applied to both the image and bounding boxes:
Transform: key parameters
ElasticTransform: alpha=50, sigma=5, p=0.5
Perspective: scale=(0.05, 0.1), p=0.5
HorizontalFlip: p=0.5
Rotate: limit=10, p=0.5
RandomScale: scale_limit=0.2, p=0.5
Affine: scale, translate, rotate, shear, p=0.5
RandomBrightnessContrast: brightness_limit=0.2, contrast_limit=0.2, p=0.5
GaussNoise: std_range=(0.05, 0.05), p=0.5
GaussianBlur: p=0.5
Bounding-box params: format pascal_voc, min_area=25, min_visibility=0.1, clip=True. If all boxes are removed by augmentation, the item is retried automatically.
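The retry behaviour can be sketched as below. The function and parameter names are illustrative; the real pipeline invokes the Albumentations Compose described above, which may drop boxes that fall under min_area or min_visibility.

```python
def augment_with_retry(image, boxes, transform, max_tries: int = 10):
    """Re-run the augmentation until at least one bounding box survives.

    `transform` mimics an albumentations.Compose call: it accepts
    image/bboxes keyword arguments and may drop boxes during augmentation.
    """
    for _ in range(max_tries):
        out = transform(image=image, bboxes=boxes)
        if out["bboxes"]:  # at least one box survived this attempt
            return out
    # Fall back to the unaugmented sample if every attempt removed all boxes.
    return {"image": image, "bboxes": boxes}
```

The cap on attempts guards against pathological samples where augmentation almost always removes every box.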

collate_fn

Collates a list of dataset samples into a batch suitable for a DataLoader.
from torch.utils.data import DataLoader
from loader import BreastCancerDataset, collate_fn

loader = DataLoader(
    dataset,
    batch_size=16,
    shuffle=True,
    num_workers=4,
    pin_memory=True,
    collate_fn=collate_fn,
)

Parameters

batch
list[dict]
required
A list of sample dicts as returned by BreastCancerDataset.__getitem__. Each dict must contain pixel_values and labels, and may optionally contain pixel_mask.

Return value

pixel_values
torch.Tensor
Stacked image tensor of shape (B, 3, H, W) produced by torch.stack.
labels
list[dict]
List of per-image label dicts (length B). Kept as a Python list because each image may have a different number of bounding boxes and DETR expects this structure directly.
pixel_mask
torch.Tensor
Stacked attention mask of shape (B, H, W), present only when pixel_mask exists in the first sample. Each value is 1 for real pixels and 0 for padding.
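A simplified collate function with the same output structure might look like the following sketch, which uses plain Python lists where the real implementation calls torch.stack on the image tensors and masks:

```python
def collate_fn(batch):
    # Simplified sketch: the real code stacks these into one tensor with
    # torch.stack([s["pixel_values"] for s in batch]).
    collated = {
        "pixel_values": [s["pixel_values"] for s in batch],
        # Labels stay a Python list: each image can have a different
        # number of boxes, and DETR consumes the list directly.
        "labels": [s["labels"] for s in batch],
    }
    # pixel_mask is optional; include it only when the samples carry one.
    if "pixel_mask" in batch[0]:
        collated["pixel_mask"] = [s["pixel_mask"] for s in batch]
    return collated
```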
