
Deformable DETR (SenseTime/deformable-detr) extends the original DETR architecture with multi-scale deformable attention, making it especially effective for detecting small, diffuse lesions in high-resolution mammograms. Because the model is significantly more memory-intensive than YOLOS, the train_detrd.py script uses a dedicated training configuration: batch_size=1 with gradient_accumulation_steps=32 for an effective batch size of 32, disabled mixed precision (fp16=False) for numerical stability, and gradient norm clipping at 5.0 to keep exploding gradients in check.

Quickstart

python train_detrd.py --config configs/config_d_detr.yaml --dataset CSAW

Overriding the epoch count

python train_detrd.py --config configs/config_d_detr.yaml --dataset CSAW --epoch 100
Deformable DETR requires at least 16 GB of VRAM. Training with the default settings (max_size=800, batch_size=1, gradient_accumulation_steps=32) was validated on a single A100-80 GB. On smaller GPUs, reduce dataset.max_size to 640 and verify that CUDA does not OOM during the first evaluation step.

Memory optimization strategy

The pipeline explicitly trades per-step throughput for memory headroom:
| Parameter | Value | Effect |
| --- | --- | --- |
| `per_device_train_batch_size` | 1 | Minimum VRAM per step |
| `gradient_accumulation_steps` | 32 | Effective batch size = 32 |
| `fp16` | False | Avoids NaN losses in deformable attention |
| `gradient_checkpointing` | False | Disabled to avoid recomputation overhead |
| `max_grad_norm` | 5.0 | Clips exploding gradients |
| `dataloader_num_workers` | 0 | Prevents shared-memory conflicts with large images |
Gradient checkpointing is intentionally disabled. Deformable DETR’s multi-scale attention layers are memory-heavy, but recomputing them during the backward pass introduces enough latency that training wall-clock time can increase by 40–60 % with only modest VRAM savings. The batch_size=1 + accumulation strategy achieves a better trade-off.
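The memory-optimized settings can be sketched as a plain Python dict (illustrative only; the real values come from configs/config_d_detr.yaml and are passed to the Hugging Face `TrainingArguments` in train_detrd.py):

```python
# Sketch of the memory-optimized training settings described above.
# These mirror the table; the authoritative source is the YAML config.
training_kwargs = {
    "per_device_train_batch_size": 1,   # minimum VRAM per step
    "gradient_accumulation_steps": 32,  # accumulate to an effective batch of 32
    "fp16": False,                      # full precision for numerical stability
    "gradient_checkpointing": False,    # avoid backward-pass recomputation overhead
    "max_grad_norm": 5.0,               # clip exploding gradients
    "dataloader_num_workers": 0,        # avoid shared-memory conflicts
}

# Effective batch size = per-device batch size * accumulation steps.
effective_batch_size = (
    training_kwargs["per_device_train_batch_size"]
    * training_kwargs["gradient_accumulation_steps"]
)
print(effective_batch_size)  # 32
```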

Key differences from YOLOS training

The table below summarizes where train_detrd.py diverges from the YOLOS pipeline in train.py:
| Setting | YOLOS (train.py) | Deformable DETR (train_detrd.py) |
| --- | --- | --- |
| Learning rate | 0.0001 | 0.0005 |
| Mixed precision | `fp16=True` (auto) | `fp16=False` |
| Best-model metric | `eval_map_50` | `eval_loss` |
| `greater_is_better` | True | False |
| Logging strategy | `epoch` | `steps` (every 10 steps) |
| `save_total_limit` | 1 | 2 |
The higher learning rate (0.0005) is intentional: Deformable DETR’s deformable attention modules need a larger gradient signal to adapt the sampling offsets from their ImageNet-pretrained initialization to the narrow distribution of mammography lesions. Using eval_loss as the best-model metric (rather than eval_map_50) is a practical choice. Because the Deformable DETR trainer does not attach the custom mAP compute_metrics function during training (the compute_metrics line is commented out in train_detrd.py:189), validation mAP is computed separately after training using calculate_custom_map_metrics. Tracking eval_loss ensures the best checkpoint is still selected automatically.
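As a minimal sketch of what this checkpoint selection amounts to with `metric_for_best_model=eval_loss` and `greater_is_better=False` (the checkpoint names and loss values here are hypothetical):

```python
# Hypothetical eval_loss recorded at each checkpoint save.
eval_losses = {
    "checkpoint-100": 0.42,
    "checkpoint-200": 0.35,
    "checkpoint-300": 0.39,
}

# With greater_is_better=False, the best checkpoint is the one that
# minimizes the tracked metric.
best_checkpoint = min(eval_losses, key=eval_losses.get)
print(best_checkpoint)  # checkpoint-200
```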

What happens during training

1. Dataset and processor loading

BreastCancerDataset is loaded for train and val splits, identical to the YOLOS pipeline. The image processor uses max_size=800 by default (configurable via dataset.max_size), which is the standard resolution for Deformable DETR.
2. Model loading

load_deformable_detr_model calls AutoModelForObjectDetection.from_pretrained('SenseTime/deformable-detr') with id2label={0: 'cancer'}. If a deformable_detr section is present in the config, num_queries is read from it (default 300). If loading fails with the custom config, the function falls back to a minimal configuration automatically.
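The fallback behaviour can be sketched generically. Here `load_fn` stands in for `AutoModelForObjectDetection.from_pretrained`, and the failing/minimal keyword sets are hypothetical; this only illustrates the try-then-fall-back pattern:

```python
# Sketch of "try the custom config, fall back to a minimal one on failure".
def load_with_fallback(load_fn, custom_kwargs, minimal_kwargs):
    try:
        return load_fn(**custom_kwargs)
    except Exception:
        # Custom config failed to load; retry with the minimal configuration.
        return load_fn(**minimal_kwargs)

# Stand-in loader that rejects an unsupported num_queries value.
def fake_loader(num_queries=300, **_):
    if num_queries > 300:
        raise ValueError("unsupported num_queries")
    return {"num_queries": num_queries}

model = load_with_fallback(
    fake_loader,
    {"num_queries": 900},   # hypothetical custom config that fails
    {"num_queries": 300},   # minimal fallback (the documented default)
)
print(model["num_queries"])  # 300
```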
3. Training

Trainer.train() runs with the memory-optimized arguments. Loss and learning rate are logged to W&B every 10 steps. Checkpoints are saved at the end of each epoch, keeping the best 2 by eval_loss.
4. Model saving

After training, the best checkpoint is saved to:
../deformable_detr_{DATASET_NAME}_{DDMMYY}
5. Custom mAP evaluation

calculate_custom_map_metrics runs inference over the entire test split and computes mAP using torchvision’s box_iou. Results are printed to stdout:
Custom Test mAP Results:
------------------------------
map: 0.381
map_50: 0.612
map_75: 0.344
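The matching step relies on pairwise box IoU (the pipeline uses torchvision's `box_iou` on tensors). A dependency-free sketch of the same computation for a single pair of `(x1, y1, x2, y2)` boxes:

```python
# Pure-Python IoU for two axis-aligned boxes in (x1, y1, x2, y2) format.
# Illustrative equivalent of torchvision.ops.box_iou for a single pair.
def box_iou(a, b):
    # Intersection rectangle (clamped to zero if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

iou = box_iou((0, 0, 10, 10), (5, 5, 15, 15))
print(round(iou, 4))  # 0.1429 (intersection 25 over union 175)
```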

Model output path

../deformable_detr_{DATASET_NAME}_{DDMMYY}
For example, a DMID run completed on 12 May 2026 saves to:
../deformable_detr_DMID_120526/
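The `{DDMMYY}` suffix matches `strftime("%d%m%y")`. A small sketch reproducing the naming scheme (the actual path construction in train_detrd.py may differ):

```python
from datetime import date

# Sketch of the output-path naming scheme ../deformable_detr_{DATASET_NAME}_{DDMMYY}.
def output_dir(dataset_name, run_date):
    return f"../deformable_detr_{dataset_name}_{run_date.strftime('%d%m%y')}"

print(output_dir("DMID", date(2026, 5, 12)))  # ../deformable_detr_DMID_120526
```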
