Deformable DETR (SenseTime/deformable-detr) extends the original DETR architecture with multi-scale deformable attention, making it especially effective for detecting small, diffuse lesions in high-resolution mammograms. Because the model is significantly more memory-intensive than YOLOS, the
train_detrd.py script uses a dedicated training configuration: batch_size=1 with gradient_accumulation_steps=32 to reach an effective batch size of 32, disabled mixed precision (fp16=False) for numerical stability, and gradient norm clipping at 5.0 to handle the long attention spans.
Memory optimization strategy
The pipeline explicitly trades per-step throughput for memory headroom (see the configuration sketch after the table):

| Parameter | Value | Effect |
|---|---|---|
| per_device_train_batch_size | 1 | Minimum VRAM per step |
| gradient_accumulation_steps | 32 | Effective batch size = 32 |
| fp16 | False | Avoids NaN losses in deformable attention |
| gradient_checkpointing | False | Disabled to avoid recomputation overhead |
| max_grad_norm | 5.0 | Clips exploding gradients |
| dataloader_num_workers | 0 | Prevents shared-memory conflicts with large images |
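The sketch below shows how these settings map onto Hugging Face TrainingArguments. It is illustrative rather than a copy of train_detrd.py: the output_dir value is a placeholder, and only the parameters documented above plus the learning rate from the next section are shown.

```python
from transformers import TrainingArguments

# Memory-optimized configuration mirroring the table above.
# output_dir is a placeholder; other values follow the documented settings.
training_args = TrainingArguments(
    output_dir="./outputs/deformable_detr",  # placeholder path
    per_device_train_batch_size=1,           # minimum VRAM per step
    gradient_accumulation_steps=32,          # effective batch size = 32
    fp16=False,                              # full precision avoids NaN losses
    gradient_checkpointing=False,            # skip recomputation overhead
    max_grad_norm=5.0,                       # clip exploding gradients
    dataloader_num_workers=0,                # avoid shared-memory conflicts
    learning_rate=5e-4,                      # see "Key differences" below
)
```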
Key differences from YOLOS training
The table below summarizes where train_detrd.py diverges from the YOLOS pipeline in train.py:
| Setting | YOLOS (train.py) | Deformable DETR (train_detrd.py) |
|---|---|---|
| Learning rate | 0.0001 | 0.0005 |
| Mixed precision | fp16=True (auto) | fp16=False |
| Best-model metric | eval_map_50 | eval_loss |
| greater_is_better | True | False |
| Logging strategy | epoch | steps (every 10 steps) |
| save_total_limit | 1 | 2 |
The higher learning rate (0.0005) is intentional: Deformable DETR's deformable attention modules need a larger gradient signal to adapt the sampling offsets from their ImageNet-pretrained initialization to the narrow distribution of mammography lesions.
Using eval_loss as the best-model metric (rather than eval_map_50) is a practical choice. Because the Deformable DETR trainer does not attach the custom mAP compute_metrics function during training (the compute_metrics line is commented out in train_detrd.py:189), validation mAP is computed separately after training using calculate_custom_map_metrics. Tracking eval_loss ensures the best checkpoint is still selected automatically.
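The following sketch shows how these checkpoint-selection and logging settings look as TrainingArguments, extending the memory-optimized sketch above; the exact arguments in train_detrd.py may differ, and the calculate_custom_map_metrics call signature shown in the comment is an assumption.

```python
from transformers import TrainingArguments

# Checkpoint selection tracks validation loss because compute_metrics is not
# attached during training (see train_detrd.py:189); the best checkpoint is
# therefore chosen by eval_loss rather than mAP.
training_args = TrainingArguments(
    output_dir="./outputs/deformable_detr",  # placeholder path
    eval_strategy="epoch",                   # `evaluation_strategy` on older transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",       # lower is better, hence:
    greater_is_better=False,
    save_total_limit=2,                      # keep 2 checkpoints
    logging_strategy="steps",
    logging_steps=10,                        # matches the W&B logging cadence
    report_to="wandb",
)

# Validation mAP is computed separately after training with the project's
# helper; the call below is illustrative and its signature is an assumption.
# val_map = calculate_custom_map_metrics(model, val_dataset, image_processor)
```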
What happens during training
Dataset and processor loading
BreastCancerDataset is loaded for the train and val splits, identical to the YOLOS pipeline. The image processor uses max_size=800 by default (configurable via dataset.max_size), which is the standard resolution for Deformable DETR.
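A minimal sketch of loading the processor at that resolution, assuming the Hugging Face AutoImageProcessor; the size dictionary follows the current transformers API and is an approximation of the script's max_size handling, not its exact code.

```python
from transformers import AutoImageProcessor

# Resize so that no image edge exceeds 800 px, approximating max_size=800.
# Recent transformers versions take a size dict; older releases used the
# integer (size=800, max_size=800) arguments instead.
image_processor = AutoImageProcessor.from_pretrained(
    "SenseTime/deformable-detr",
    size={"shortest_edge": 800, "longest_edge": 800},
)
```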
Model loading

load_deformable_detr_model calls AutoModelForObjectDetection.from_pretrained('SenseTime/deformable-detr') with id2label={0: 'cancer'}. If a deformable_detr section is present in the config, num_queries is read from it (default 300). If loading fails with the custom config, the function automatically falls back to a minimal configuration.
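The sketch below re-creates the loading-and-fallback behavior described above; apart from AutoModelForObjectDetection, the function name, config handling, and keyword choices are illustrative assumptions rather than the exact contents of load_deformable_detr_model.

```python
from transformers import AutoModelForObjectDetection

def load_deformable_detr_model_sketch(config: dict):
    """Illustrative re-creation of the loading behavior described above."""
    id2label = {0: "cancer"}
    label2id = {"cancer": 0}
    # num_queries is read from an optional deformable_detr config section.
    num_queries = config.get("deformable_detr", {}).get("num_queries", 300)
    try:
        return AutoModelForObjectDetection.from_pretrained(
            "SenseTime/deformable-detr",
            id2label=id2label,
            label2id=label2id,
            num_queries=num_queries,
            ignore_mismatched_sizes=True,  # COCO head (91 classes) -> 1 class
        )
    except Exception:
        # Fall back to a minimal configuration: only the relabeled head.
        return AutoModelForObjectDetection.from_pretrained(
            "SenseTime/deformable-detr",
            id2label=id2label,
            label2id=label2id,
            ignore_mismatched_sizes=True,
        )
```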
Training

Trainer.train() runs with the memory-optimized arguments. Loss and learning rate are logged to W&B every 10 steps. Checkpoints are saved at the end of each epoch, keeping the best 2 by eval_loss.
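A sketch of how the pieces above plug into the Hugging Face Trainer; model, training_args, and the datasets refer to the earlier sketches, and collate_fn is an illustrative stand-in for the project's own batching function.

```python
import torch
from transformers import Trainer

def collate_fn(batch):
    # Placeholder collator: stack pixel values and keep the per-image
    # label dicts produced by the image processor. With
    # per_device_train_batch_size=1, no padding is needed before stacking.
    pixel_values = torch.stack([item["pixel_values"] for item in batch])
    labels = [item["labels"] for item in batch]
    return {"pixel_values": pixel_values, "labels": labels}

trainer = Trainer(
    model=model,                  # from the model-loading sketch above
    args=training_args,           # memory-optimized arguments
    train_dataset=train_dataset,  # BreastCancerDataset train split
    eval_dataset=val_dataset,     # BreastCancerDataset val split
    data_collator=collate_fn,
    # compute_metrics is intentionally omitted; mAP is computed post-training.
)
trainer.train()
```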