

MammoMix supports two evaluation workflows: automatic evaluation hooked into the Hugging Face Trainer loop, and a standalone function for running inference and computing mAP on any test dataset.

Evaluation approaches

1. Automatic evaluation via Trainer

During training, pass the metrics function returned by get_eval_compute_metrics_fn to the Trainer as compute_metrics. The Trainer calls it at the end of each evaluation pass with an EvalPrediction object containing batched predictions and ground-truth labels.
evaluation.py
from transformers import Trainer, TrainingArguments
from evaluation import get_eval_compute_metrics_fn

compute_metrics = get_eval_compute_metrics_fn(image_processor)

training_args = TrainingArguments(
    output_dir="./output",
    eval_do_concat_batches=False,
    metric_for_best_model="eval_map_50",
    # ...other args
)

trainer = Trainer(
    model=model,
    args=training_args,
    compute_metrics=compute_metrics,
    # ...
)
trainer.train()
You must set eval_do_concat_batches=False in TrainingArguments. The compute_metrics function iterates over individual batches from evaluation_results.predictions and evaluation_results.label_ids. Concatenating batches before this step produces incorrect image-size tensors and breaks post-processing.
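With batch concatenation disabled, the EvalPrediction fields arrive as lists with one entry per evaluation batch rather than single concatenated arrays. A minimal sketch of the resulting iteration pattern (the exact contents of each batch entry are an assumption here, not the actual MammoMix implementation):

def compute_metrics(evaluation_results, image_processor, threshold, id2label):
    # .predictions and .label_ids are per-batch lists because
    # eval_do_concat_batches=False (structure assumed for illustration)
    for batch_preds, batch_labels in zip(
        evaluation_results.predictions, evaluation_results.label_ids
    ):
        ...  # post-process batch_preds per image, accumulate against batch_labels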
2. Standalone inference with mAP evaluation

Use run_model_inference_with_map to evaluate any trained model against a test dataset outside the Trainer loop. This is the recommended path for final benchmark runs.
from evaluation import run_model_inference_with_map

metrics = run_model_inference_with_map(
    model=model,
    test_dataset=test_dataset,
    image_processor=image_processor,
    device=device,
    batch_size=8,
)
print(metrics)
# {'map': 0.42, 'map_50': 0.71, 'map_75': 0.38, ...}
Signature
evaluation.py
def run_model_inference_with_map(
    model,           # Trained AutoModelForObjectDetection
    test_dataset,    # torch Dataset yielding pixel_values + labels
    image_processor, # AutoImageProcessor used during training
    device,          # torch.device
    batch_size=8,    # Images per forward pass
) -> dict[str, float]:
    ...
Internally, the function (see the sketch after this list):
  1. Wraps test_dataset in a DataLoader using collate_fn.
  2. Runs model.eval() and collects outputs under torch.no_grad().
  3. Delegates metric computation to calculate_custom_map_metrics.
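
A minimal sketch of those three steps, assuming collate_fn pads pixel_values into a batch dict and calculate_custom_map_metrics accepts the collected outputs and targets (both signatures are assumptions, not the actual implementation):

import torch
from torch.utils.data import DataLoader

def inference_with_map_sketch(model, test_dataset, image_processor,
                              device, batch_size=8):
    # collate_fn is the MammoMix helper referenced above (imported elsewhere)
    loader = DataLoader(test_dataset, batch_size=batch_size,
                        collate_fn=collate_fn)              # step 1
    model.eval()                                            # step 2
    all_outputs, all_targets = [], []
    with torch.no_grad():
        for batch in loader:
            pixel_values = batch["pixel_values"].to(device)
            all_outputs.append(model(pixel_values=pixel_values))
            all_targets.append(batch["labels"])
    return calculate_custom_map_metrics(                    # step 3
        all_outputs, all_targets, image_processor)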

get_eval_compute_metrics_fn

evaluation.py
from functools import partial

def get_eval_compute_metrics_fn(image_processor):
    # compute_metrics is defined elsewhere in evaluation.py
    return partial(
        compute_metrics, image_processor=image_processor,
        threshold=0.5, id2label={0: 'cancer'}
    )
The factory returns a partially applied version of compute_metrics with two fixed parameters:
Parameter   Value           Purpose
threshold   0.5             Confidence cutoff: boxes below this score are discarded before mAP accumulation
id2label    {0: 'cancer'}   Single-class mapping used by the image processor during post-processing
Pass the returned callable directly to Trainer(compute_metrics=...).

Bounding box conversion: YOLO → Pascal VOC

Ground-truth labels are stored and fed to YOLOS in YOLO format: (x_center, y_center, width, height) normalised to [0, 1]. Before computing IoU-based metrics, MammoMix converts all boxes to Pascal VOC format: (x_min, y_min, x_max, y_max) in absolute pixel coordinates.
evaluation.py
import torch
from transformers.image_transforms import center_to_corners_format

def convert_bbox_yolo_to_pascal(boxes, image_size):
    # (cx, cy, w, h) -> (x1, y1, x2, y2), still normalised to [0, 1]
    boxes = center_to_corners_format(boxes)
    height, width = image_size
    # scale normalised corners to absolute pixel coordinates
    boxes = boxes * torch.tensor([[width, height, width, height]])
    return boxes
The conversion runs for both targets and model predictions before they are passed to torchmetrics.
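
A quick worked example of the conversion on a 640×480 image (the box values are illustrative):

import torch

box = torch.tensor([[0.5, 0.5, 0.25, 0.5]])          # (cx, cy, w, h), normalised
print(convert_bbox_yolo_to_pascal(box, (480, 640)))  # image_size is (height, width)
# tensor([[240., 120., 400., 360.]])  -> (x_min, y_min, x_max, y_max) in pixels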

Output metrics

compute_metrics returns a dictionary filtered to keys that start with map:
{
    'map':        0.42,  # COCO-style mAP averaged over IoU thresholds 0.50–0.95
    'map_50':     0.71,  # mAP at IoU threshold 0.50
    'map_75':     0.38,  # mAP at IoU threshold 0.75
    'map_small':  0.09,  # mAP for objects with area < 32² px
    'map_medium': 0.35,  # mAP for objects with area 32²–96² px
    'map_large':  0.58,  # mAP for objects with area > 96² px
}
map_per_class is explicitly removed before returning because MammoMix is a single-class detector (cancer only). See Object detection metrics for a full explanation of each key.
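
A sketch of what this filtering step might look like, assuming the metrics come from torchmetrics' MeanAveragePrecision (the variable names are illustrative, not the actual implementation):

from torchmetrics.detection.mean_ap import MeanAveragePrecision

map_metric = MeanAveragePrecision(box_format="xyxy")
# ... map_metric.update(predictions, targets) once per batch ...
results = map_metric.compute()
results.pop("map_per_class", None)  # redundant for a single-class detector
metrics = {k: v.item() for k, v in results.items() if k.startswith("map")}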

ModelOutput dataclass

Post-processing via image_processor.post_process_object_detection requires a model output object with specific attributes. When running inference manually, MammoMix wraps raw tensors in a lightweight dataclass:
evaluation.py
from dataclasses import dataclass
import torch

@dataclass
class ModelOutput:
    logits: torch.Tensor    # [batch, num_queries, num_classes + 1]
    pred_boxes: torch.Tensor  # [batch, num_queries, 4] in YOLO format
This mirrors the shape of a real YolosObjectDetectionOutput and satisfies the image processor’s interface without importing the full model output class.
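
An illustrative call showing how the wrapper feeds into post-processing (tensor shapes and the target size are placeholders):

import torch

outputs = ModelOutput(
    logits=torch.randn(1, 100, 2),       # 1 class + "no object"
    pred_boxes=torch.rand(1, 100, 4),    # normalised (cx, cy, w, h)
)
results = image_processor.post_process_object_detection(
    outputs, threshold=0.5, target_sizes=torch.tensor([[1024, 768]])
)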
