
MammoMix uses torchmetrics to compute mean average precision (mAP) for breast cancer detection. All metrics are computed in Pascal VOC coordinate space after converting model outputs from YOLO format.

Metric definitions

map

The primary summary metric. map averages precision over 10 IoU thresholds, from 0.50 to 0.95 in steps of 0.05, then averages over all object sizes.
map = mean(AP @ IoU ∈ {0.50, 0.55, 0.60, ..., 0.95})
This is the standard COCO detection metric. It penalises imprecise localisation more than map_50 does, so improvements here indicate genuine advances in both detection and box quality.
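The averaging above can be sketched in a few lines of plain Python. The per-threshold AP values here are illustrative placeholders, not real model output:

```python
def coco_map(ap_at):
    """Average AP over the 10 COCO IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]
    return sum(ap_at[t] for t in thresholds) / len(thresholds)

# Illustrative per-threshold AP values: AP falls as the IoU threshold tightens.
ap = {round(0.50 + 0.05 * i, 2): 0.80 - 0.05 * i for i in range(10)}
coco_map(ap)  # ≈ 0.575
```

Because the later thresholds demand tight boxes, a model that detects lesions but localises them loosely scores high AP at 0.50 and low AP at 0.90, dragging the mean down.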
map_50

Average precision computed at a single IoU threshold of 0.50. A predicted box counts as a true positive when its intersection-over-union with the matched ground-truth box is at least 50%.

map_50 is used as metric_for_best_model in TrainingArguments, meaning the Trainer saves the checkpoint that maximises this value:
training_args = TrainingArguments(
    metric_for_best_model="eval_map_50",
    greater_is_better=True,
    # ...
)
It is more lenient than map and correlates well with clinical recall in mammography screening, where detecting the lesion at all matters more than tight localisation.
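The true-positive test behind map_50 is plain box IoU. A minimal sketch for Pascal VOC boxes (a hypothetical helper for illustration, not code from MammoMix; torchmetrics performs this matching internally):

```python
def box_iou(a, b):
    """IoU of two Pascal VOC boxes (x_min, y_min, x_max, y_max) in pixels."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

# Half-overlapping boxes: intersection 5,000 px², union 15,000 px², so
# IoU ≈ 0.33 and this prediction fails the 0.50 test despite finding the lesion.
box_iou((0, 0, 100, 100), (50, 0, 150, 100))  # ≈ 0.333
```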
map_75

Average precision at an IoU threshold of 0.75. Only tightly localised predictions count as true positives. A substantially lower map_75 relative to map_50 indicates the model detects lesions but localises them loosely.
map_small

Average precision for ground-truth boxes with an area smaller than 32 × 32 pixels (1,024 px²). Small lesions are the hardest to detect and most clinically significant in early-stage mammography.
map_medium

Average precision for ground-truth boxes with an area between 32² and 96² pixels (1,024–9,216 px²).
map_large

Average precision for ground-truth boxes with an area greater than 96 × 96 pixels (9,216 px²).
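The three size buckets can be reproduced with a small helper. This is a sketch built from the area boundaries above; the actual bucketing happens inside torchmetrics:

```python
def size_bucket(box):
    """Assign a Pascal VOC box (x_min, y_min, x_max, y_max) to the COCO-style
    size bucket used by map_small / map_medium / map_large."""
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    if area < 32 ** 2:        # < 1,024 px²
        return "small"
    if area <= 96 ** 2:       # 1,024–9,216 px²
        return "medium"
    return "large"            # > 9,216 px²

size_bucket((100, 100, 120, 125))  # → 'small' (20 × 25 = 500 px²)
size_bucket((0, 0, 200, 200))      # → 'large' (40,000 px²)
```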

Why map_per_class is removed

torchmetrics.functional.detection.map.mean_average_precision returns a map_per_class tensor by default. MammoMix removes it before returning metrics:
evaluation.py
metrics = mean_average_precision(post_processed_predictions, post_processed_targets)
metrics.pop('map_per_class')
return {k: v for k, v in metrics.items() if k.startswith('map')}
MammoMix is a single-class detector — the only object category is cancer (id2label={0: 'cancer'}). With a single class, map_per_class duplicates the top-level map value and adds noise to logging dashboards and Trainer checkpointing logic. Removing it keeps the returned dictionary clean and avoids confusing the metric_for_best_model selector.
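The cleanup amounts to a dictionary filter. A standalone sketch with illustrative scalar values standing in for the tensors torchmetrics returns:

```python
def clean_metrics(metrics):
    """Drop map_per_class and keep only the map* keys for Trainer logging."""
    metrics = dict(metrics)               # don't mutate the caller's dict
    metrics.pop('map_per_class', None)
    return {k: v for k, v in metrics.items() if k.startswith('map')}

raw = {'map': 0.41, 'map_50': 0.72, 'map_per_class': 0.41, 'mar_100': 0.55}
clean_metrics(raw)  # → {'map': 0.41, 'map_50': 0.72}
```

Note that the startswith filter also drops recall-oriented mar_* keys, so only the mAP family reaches the Trainer.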

ModelOutput dataclass

During evaluation MammoMix wraps raw model tensors in a minimal dataclass so that image_processor.post_process_object_detection can process them without importing the full YOLOS output class:
evaluation.py
from dataclasses import dataclass
import torch

@dataclass
class ModelOutput:
    logits: torch.Tensor      # shape: [batch_size, num_queries, num_classes + 1]
    pred_boxes: torch.Tensor  # shape: [batch_size, num_queries, 4], YOLO format
The image processor reads logits to extract class probabilities and pred_boxes to obtain box coordinates. It applies softmax over logits, filters by the threshold argument (0.5 by default), and converts pred_boxes to Pascal VOC format using the provided target_sizes.
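The scoring step can be sketched in plain Python for a single-class detector, where each query carries two logits (cancer, no-object). This is an illustrative reimplementation of the idea, not the transformers code:

```python
import math

def score_queries(logits_per_query, threshold=0.5):
    """Softmax each query's class logits (last index = no-object class) and
    keep queries whose best real-class score clears the threshold."""
    kept = []
    for q, logits in enumerate(logits_per_query):
        exps = [math.exp(l) for l in logits]
        total = sum(exps)
        scores = [e / total for e in exps][:-1]   # drop the no-object class
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] >= threshold:
            kept.append((q, best, scores[best]))
    return kept

# Query 0 is confidently cancer; query 1 favours no-object and is suppressed.
score_queries([[4.0, -2.0], [-3.0, 3.0]])  # → [(0, 0, 0.997...)]
```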

Metric computation flow

evaluation.py
# 1. Convert ground-truth boxes: YOLO → Pascal VOC
boxes = convert_bbox_yolo_to_pascal(boxes, [max_size, max_size])

# 2. Wrap model output and post-process predictions
output = ModelOutput(logits=torch.tensor(batch_logits), pred_boxes=batch_boxes_tensor)
post_processed_output = image_processor.post_process_object_detection(
    output, threshold=0.5, target_sizes=target_sizes
)

# 3. Accumulate and compute metrics
metrics = mean_average_precision(post_processed_predictions, post_processed_targets)
metrics.pop('map_per_class')
return {k: v for k, v in metrics.items() if k.startswith('map')}
Both target boxes and prediction boxes must be in Pascal VOC format (x_min, y_min, x_max, y_max) with absolute pixel coordinates before being passed to mean_average_precision. Passing normalised YOLO coordinates will silently produce incorrect mAP values near zero.
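The required conversion can be sketched as follows, assuming normalised (cx, cy, w, h) YOLO boxes and absolute output pixels; the real convert_bbox_yolo_to_pascal may operate on tensors rather than tuples:

```python
def yolo_to_pascal(box, image_height, image_width):
    """Convert one normalised YOLO box (cx, cy, w, h) to absolute
    Pascal VOC pixel coordinates (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return ((cx - w / 2) * image_width,  (cy - h / 2) * image_height,
            (cx + w / 2) * image_width,  (cy + h / 2) * image_height)

# A centred box spanning half of each side of a 640 × 640 image.
yolo_to_pascal((0.5, 0.5, 0.5, 0.5), 640, 640)  # → (160.0, 160.0, 480.0, 480.0)
```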

Metric summary table

Key           | IoU threshold        | Size filter       | Notes
map           | 0.50–0.95 (10 steps) | All               | Primary COCO metric
map_50        | 0.50                 | All               | Used as metric_for_best_model
map_75        | 0.75                 | All               | Strict localisation quality
map_small     | 0.50–0.95            | area < 1,024 px²  | Early-stage lesions
map_medium    | 0.50–0.95            | 1,024–9,216 px²   | Mid-size lesions
map_large     | 0.50–0.95            | area > 9,216 px²  | Large lesions
map_per_class | n/a                  | n/a               | Removed (single-class detector)
