

MammoMix is a two-stage breast cancer detection system built on transformer-based object detectors. It trains separate YOLOS and Deformable DETR models on three mammography datasets — CSAW, DMID, and DDSM — then combines their predictions at inference time using MoCaE (Mixture of Calibrated Experts), a post-processing ensemble that applies score calibration, Soft-NMS, and Score Voting to produce a single refined set of detections per image.

Detection pipeline

Each model follows the same end-to-end pipeline from raw image to evaluation metric:
1. Raw mammography image

DICOM-derived images are stored as JPEG or PNG files. Each image has a paired Pascal VOC XML annotation file containing the bounding box coordinates of any cancer region.
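Reading a Pascal VOC annotation needs only the standard library. A minimal sketch — the XML snippet below is a hypothetical example, not taken from the datasets:

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal Pascal VOC annotation for one cancer region.
VOC_XML = """<annotation>
  <size><width>3328</width><height>4096</height><depth>1</depth></size>
  <object>
    <name>cancer</name>
    <bndbox>
      <xmin>1200</xmin><ymin>1850</ymin><xmax>1560</xmax><ymax>2210</ymax>
    </bndbox>
  </object>
</annotation>"""

def parse_voc(xml_text):
    """Return a list of (label, [xmin, ymin, xmax, ymax]) boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bb = obj.find("bndbox")
        coords = [int(bb.find(k).text) for k in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.find("name").text, coords))
    return boxes

print(parse_voc(VOC_XML))  # [('cancer', [1200, 1850, 1560, 2210])]
```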
2. Augmentation

During training, the BreastCancerDataset.__getitem__ method applies an albumentations pipeline (elastic deformation, perspective distortion, flips, noise, and blur). Validation and test images use A.NoOp() — no augmentation. If augmentation removes all bounding boxes, the sample is retried automatically.
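The retry behaviour can be illustrated with a toy transform that sometimes discards every box. This is a sketch of the pattern only, not the actual BreastCancerDataset code:

```python
import random

def augment_with_retry(sample, transform, max_tries=10):
    """Re-apply a stochastic transform until at least one bounding box
    survives; fall back to the un-augmented sample if all tries fail."""
    for _ in range(max_tries):
        out = transform(dict(sample))  # shallow copy so retries start fresh
        if out["bboxes"]:
            return out
    return sample

def drop_boxes_sometimes(sample):
    # Toy stand-in for an albumentations pipeline: an aggressive crop
    # may discard every box, which is exactly the case the retry handles.
    if random.random() < 0.7:
        sample["bboxes"] = []
    return sample

random.seed(0)
sample = {"image": "raw-pixels", "bboxes": [[10, 10, 50, 50]]}
out = augment_with_retry(sample, drop_boxes_sometimes)
print(len(out["bboxes"]))
```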
3. Image processor

AutoImageProcessor resizes and pads the image to a fixed square (max_size × max_size) and normalises pixel values. The processor is loaded from the HuggingFace model hub and is the same object used during training and inference.
utils.py
def get_image_processor(model_name, max_size):
    return AutoImageProcessor.from_pretrained(
        model_name,
        do_resize=True, do_pad=True, use_fast=True,
        size={'max_height': max_size, 'max_width': max_size},
        pad_size={'height': max_size, 'width': max_size},
    )
4. Model forward pass

The processed batch is passed to AutoModelForObjectDetection. Both YOLOS and Deformable DETR produce a set of predicted boxes and logits. The single detection class is cancer (id=0).
5. Post-processing

image_processor.post_process_object_detection converts raw logits and normalised box coordinates into absolute-pixel boxes filtered by a confidence threshold (default 0.5).
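What this conversion does can be sketched in plain NumPy. This is a simplification — the actual HuggingFace post-processing also handles the model-specific score activation and batch dimensions:

```python
import numpy as np

def postprocess(logits, boxes_cxcywh, img_h, img_w, threshold=0.5):
    """Sketch of detector post-processing for a single-class model:
    score the box logits, convert normalised (cx, cy, w, h) boxes to
    absolute (xmin, ymin, xmax, ymax) pixels, and keep detections
    above the confidence threshold."""
    scores = 1.0 / (1.0 + np.exp(-logits[:, 0]))  # sigmoid over the cancer logit
    cx, cy, w, h = boxes_cxcywh.T
    xyxy = np.stack([(cx - w / 2) * img_w, (cy - h / 2) * img_h,
                     (cx + w / 2) * img_w, (cy + h / 2) * img_h], axis=1)
    keep = scores >= threshold
    return xyxy[keep], scores[keep]

logits = np.array([[2.0], [-3.0]])  # two object queries, one class
boxes = np.array([[0.5, 0.5, 0.2, 0.2], [0.1, 0.1, 0.05, 0.05]])
kept_boxes, kept_scores = postprocess(logits, boxes, img_h=640, img_w=640)
print(kept_boxes, kept_scores)  # only the confident query survives
```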
6. mAP evaluation

The compute_metrics function in evaluation.py converts predictions and targets to Pascal VOC format and computes mAP, mAP@50, mAP@75, and size-stratified variants using torchmetrics.
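The overlap criterion behind these metrics is box IoU: a prediction counts as a true positive at mAP@50 when its IoU with a ground-truth box is at least 0.5, and at mAP@75 when it is at least 0.75. A minimal illustration (not the torchmetrics implementation):

```python
import numpy as np

def iou_xyxy(a, b):
    """IoU between two Pascal VOC (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

pred = [100, 100, 200, 200]
gt   = [110, 110, 210, 210]
print(round(iou_xyxy(pred, gt), 3))  # 0.681: a TP at mAP@50, a FP at mAP@75
```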

YOLOS

YOLOS (You Only Look at One Sequence) reformulates object detection as a sequence-to-sequence task on top of a Vision Transformer. MammoMix uses the hustvl/yolos-base checkpoint from HuggingFace, loaded via AutoModelForObjectDetection:
train.py
model = AutoModelForObjectDetection.from_pretrained(
    MODEL_NAME,
    id2label={0: 'cancer'},
    label2id={'cancer': 0},
    ignore_mismatched_sizes=True,
)
The model is configured with a single label (cancer, id=0) and runs at a maximum resolution of 640 × 640. Training uses the HuggingFace Trainer API with a cosine_with_restarts scheduler, gradient accumulation of 2 steps (effective batch size 16), and fp16 mixed precision when a GPU is available.
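These hyperparameters can be collected into a TrainingArguments-style sketch. The argument names follow the HuggingFace TrainingArguments API, but the values here only restate the documented configuration — this is not the actual train.py code:

```python
# Hypothetical sketch of the documented YOLOS training setup.
yolos_args = dict(
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,   # effective batch size 16
    learning_rate=1e-4,
    weight_decay=5e-4,
    lr_scheduler_type="cosine_with_restarts",
    fp16=True,                       # only when a GPU is available
)
effective_batch = (yolos_args["per_device_train_batch_size"]
                   * yolos_args["gradient_accumulation_steps"])
print(effective_batch)  # 16
```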

Deformable DETR

Deformable DETR extends DETR with multi-scale deformable attention, which reduces the quadratic complexity of standard attention and improves detection of small objects. MammoMix uses the SenseTime/deformable-detr checkpoint at a maximum resolution of 800 × 800. Because Deformable DETR is more memory-intensive than YOLOS, train_detrd.py hard-codes a physical batch size of 1 with gradient accumulation of 32, producing an effective batch size of 32 while keeping GPU memory usage manageable:
train_detrd.py
batch_size = 1   # Deformable DETR is memory hungry, use 1 for safety
grad_accum = 32  # Effective batch size = 32
learning_rate = 0.0005
The model is also trained with a higher gradient-clipping norm (max_grad_norm=5.0) and a simpler cosine scheduler (no restarts) to improve training stability.
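For comparison, the Deformable DETR setup can be sketched the same way. Argument names follow the HuggingFace TrainingArguments API; the values restate the documented configuration and are not copied from train_detrd.py:

```python
# Hypothetical sketch of the documented Deformable DETR training setup.
detr_args = dict(
    per_device_train_batch_size=1,   # memory-hungry model
    gradient_accumulation_steps=32,  # effective batch size 32
    learning_rate=5e-4,
    weight_decay=1e-5,
    max_grad_norm=5.0,               # looser gradient clipping for stability
    lr_scheduler_type="cosine",      # plain cosine, no restarts
    fp16=False,
)
effective_batch = (detr_args["per_device_train_batch_size"]
                   * detr_args["gradient_accumulation_steps"])
print(effective_batch)  # 32
```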

Model comparison

Property              | YOLOS                     | Deformable DETR
HuggingFace ID        | hustvl/yolos-base         | SenseTime/deformable-detr
Max input size        | 640 × 640                 | 800 × 800
Physical batch size   | 8                         | 1
Gradient accumulation | 2                         | 32
Effective batch size  | 16                        | 32
Learning rate         | 1e-4                      | 5e-4
Weight decay          | 5e-4                      | 1e-5
LR scheduler          | cosine_with_restarts      | cosine
fp16                  | Yes (when GPU available)  | No
Num object queries    | Default (100)             | 300
Config file           | configs/config_yolos.yaml | configs/config_d_detr.yaml

MoCaE ensemble

MoCaE (Mixture of Calibrated Experts) combines the predictions of all three per-dataset YOLOS models at inference time. It has three components.

ResNet-18 feature extractor

A pretrained ResNet-18 with its classification head replaced by an identity layer extracts a 512-dimensional image embedding for each image in the batch. These embeddings capture visual context independently of the detector output.
mocae.py
feature_extractor = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
feature_extractor.fc = torch.nn.Identity()  # Remove classification head

RandomForest calibrator

For each expert model, a RandomForestRegressor (300 trees) is trained to predict the IoU between a predicted box and the nearest ground-truth box. The input to the calibrator is the concatenation of the 512-dimensional image embedding and the raw detector confidence score (513 features in total). At inference time, the calibrator replaces the raw confidence with a calibrated score that reflects the predicted localisation quality.
mocae.py
calibrator = RandomForestRegressor(n_estimators=300, n_jobs=-1)
calibrator.fit(inputs_val, ious_val)
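The 513-feature layout can be sketched as follows — the function name and values are hypothetical, only the embedding-plus-score concatenation matches the description above:

```python
import numpy as np

def calibration_features(embedding, raw_scores):
    """Build one 513-dim feature row per predicted box: the image's
    512-dim embedding concatenated with that box's raw confidence."""
    rows = [np.concatenate([embedding, [s]]) for s in raw_scores]
    return np.stack(rows)

embedding = np.zeros(512)        # stand-in for a real ResNet-18 embedding
raw_scores = [0.91, 0.42, 0.07]  # raw confidences from one expert's boxes
X = calibration_features(embedding, raw_scores)
print(X.shape)  # (3, 513)
```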

Soft-NMS and Score Voting

After calibration, boxes from all three experts are pooled and deduplicated with Soft-NMS (Gaussian decay, sigma=0.08, iou_thresh=0.65). Score Voting then refines each surviving box by computing a weighted average of nearby boxes, where each weight is the product of the calibrated score and a Gaussian IoU similarity, with self-influence removed:
mocae.py
combined_boxes, combined_scores = soft_nms(
    torch.cat(combined_boxes, dim=0),
    torch.cat(combined_scores, dim=0),
    sigma_nms=sigma_nms, iou_nms=iou_nms,
    score_thresh=score_thresh, method=method
)
combined_boxes, combined_scores = score_voting(
    combined_boxes, combined_scores, sigma_sv=sigma_nms
)
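For reference, Gaussian Soft-NMS can be sketched in a few lines of NumPy. This is a simplified single-class version using the same decay rule; the actual mocae.py implementation operates on PyTorch tensors and its signature may differ:

```python
import numpy as np

def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union

def soft_nms_gaussian(boxes, scores, sigma=0.08, score_thresh=0.1):
    """Gaussian Soft-NMS sketch: overlapping boxes are not deleted
    outright; their scores decay by exp(-iou**2 / sigma)."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    kept_boxes, kept_scores = [], []
    while scores.size and scores.max() >= score_thresh:
        i = int(scores.argmax())
        kept_boxes.append(boxes[i])
        kept_scores.append(scores[i])
        boxes = np.delete(boxes, i, axis=0)
        scores = np.delete(scores, i)
        # Decay remaining scores by their overlap with the kept box.
        for j in range(len(boxes)):
            scores[j] *= np.exp(-iou(boxes[j], kept_boxes[-1]) ** 2 / sigma)
    return np.array(kept_boxes), np.array(kept_scores)

boxes = [[0, 0, 100, 100], [5, 5, 105, 105], [300, 300, 400, 400]]
scores = [0.9, 0.8, 0.7]
kept_b, kept_s = soft_nms_gaussian(boxes, scores)
print(kept_s)  # the near-duplicate of the top box is suppressed
```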

Training YOLOS

Run YOLOS training with train.py and config_yolos.yaml.

Training Deformable DETR

Run Deformable DETR training with train_detrd.py and config_d_detr.yaml.

MoCaE ensemble inference

Combine per-dataset experts using score calibration and Soft-NMS.

Evaluation and metrics

Compute mAP, mAP@50, and mAP@75 on test splits.
