
evaluation.py provides mAP computation, HuggingFace Trainer-compatible metric callbacks, and a standalone inference loop for DETR-based mammography models.

ModelOutput

A lightweight dataclass that mirrors the output structure expected by AutoImageProcessor.post_process_object_detection. Use it to wrap raw logits and predicted boxes when calling the image processor outside a HuggingFace model forward pass.
from evaluation import ModelOutput
import torch

output = ModelOutput(
    logits=torch.zeros(1, 100, 2),
    pred_boxes=torch.zeros(1, 100, 4),
)
logits
torch.Tensor
Class logits produced by the detection head. Shape (B, num_queries, num_classes + 1) where the last dimension includes the no-object class.
pred_boxes
torch.Tensor
Predicted bounding boxes in YOLO normalised format (cx, cy, w, h). Shape (B, num_queries, 4).
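To illustrate how post-processing reads these two fields, here is a hedged sketch of standard DETR-style decoding (not MammoMix's exact post-processing code): per-query scores come from a softmax over the class dimension with the trailing no-object column dropped.

```python
import torch

# Sketch of standard DETR-style decoding of these fields (illustrative,
# not the exact MammoMix post-processing code).
logits = torch.zeros(1, 100, 2)       # (B, num_queries, num_classes + 1)
pred_boxes = torch.zeros(1, 100, 4)   # (B, num_queries, 4), YOLO-normalised

probs = logits.softmax(-1)[..., :-1]  # drop the trailing no-object class
scores, labels = probs.max(-1)        # best foreground class per query
print(scores.shape)                   # torch.Size([1, 100])
```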

convert_bbox_yolo_to_pascal

Converts bounding boxes from normalised YOLO format to absolute Pascal VOC coordinates.
from evaluation import convert_bbox_yolo_to_pascal
import torch

yolo_boxes = torch.tensor([[0.5, 0.5, 0.4, 0.3]])  # (cx, cy, w, h)
pascal_boxes = convert_bbox_yolo_to_pascal(yolo_boxes, image_size=(640, 640))
# tensor([[192., 224., 448., 416.]])  — (x_min, y_min, x_max, y_max)

Parameters

boxes
torch.Tensor
required
Bounding boxes in YOLO format (cx, cy, w, h) with values normalised to [0, 1]. Shape (N, 4).
image_size
Tuple[int, int]
required
Target image dimensions as (height, width) used to scale the normalised coordinates to absolute pixel values.

Returns

boxes
torch.Tensor
Bounding boxes in Pascal VOC format (x_min, y_min, x_max, y_max) in absolute pixel coordinates. Shape (N, 4).
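The conversion is simple enough to sketch. Below is a hypothetical re-implementation (function name and body assumed, not taken from the source): centre/size coordinates become corners, which are then scaled by the image dimensions. It reproduces the example above.

```python
import torch

def yolo_to_pascal(boxes: torch.Tensor, image_size: tuple) -> torch.Tensor:
    """Hypothetical re-implementation of convert_bbox_yolo_to_pascal."""
    height, width = image_size
    cx, cy, w, h = boxes.unbind(-1)
    # centre/size -> corners, still normalised to [0, 1]
    x_min, y_min = cx - w / 2, cy - h / 2
    x_max, y_max = cx + w / 2, cy + h / 2
    # scale normalised coordinates to absolute pixel values
    return torch.stack(
        [x_min * width, y_min * height, x_max * width, y_max * height], dim=-1
    )

boxes = yolo_to_pascal(torch.tensor([[0.5, 0.5, 0.4, 0.3]]), (640, 640))
print(boxes)  # (x_min, y_min, x_max, y_max) in pixels, approx (192, 224, 448, 416)
```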

compute_metrics

Computes mean average precision (mAP) and related detection metrics from HuggingFace EvalPrediction objects. Decorated with @torch.no_grad().
from evaluation import compute_metrics

metrics = compute_metrics(
    evaluation_results=eval_prediction,
    image_processor=image_processor,
    threshold=0.5,
    id2label={0: "cancer"},
    max_size=640,
)
# {"map": 0.42, "map_50": 0.71, "map_75": 0.38, ...}

Parameters

evaluation_results
EvalPrediction
required
HuggingFace EvalPrediction object with .predictions (batched model outputs) and .label_ids (batched ground-truth annotations) as populated by Trainer.evaluate().
image_processor
AutoImageProcessor
required
The same image processor used during training. Called with post_process_object_detection to convert raw model outputs to scored, filtered bounding boxes.
threshold
float
default:"0.0"
Confidence threshold for filtering predicted boxes before metric computation. Boxes with scores below this value are discarded.
id2label
dict
default:"None"
Mapping from integer class id to string label name (e.g. {0: "cancer"}). Passed through to the image processor’s post-processing step.
max_size
int
default:"640"
The spatial dimension used as both height and width when constructing target_sizes for post-processing. Should match the pad_size used during preprocessing.

Returns

metrics
Mapping[str, float]
Dictionary of mAP metrics with keys prefixed by "map". Common keys include "map", "map_50", "map_75", "map_small", "map_medium", and "map_large".
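The effect of the threshold parameter can be illustrated in isolation (a hypothetical sketch, not the function's internals): boxes whose confidence score falls below the threshold are dropped before any metric is computed.

```python
import torch

# Hypothetical illustration of the threshold parameter: scores below the
# cut-off are masked out, and only the surviving boxes reach mAP computation.
scores = torch.tensor([0.91, 0.12, 0.55])
boxes = torch.tensor([
    [10., 10., 50., 50.],
    [20., 20., 60., 60.],
    [30., 30., 70., 70.],
])
threshold = 0.5
keep = scores >= threshold
kept_boxes = boxes[keep]
print(kept_boxes.shape[0])  # 2
```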

get_eval_compute_metrics_fn

Returns a functools.partial of compute_metrics pre-configured for MammoMix’s single-class cancer detection task. Pass the result directly as the compute_metrics argument to Trainer.
from evaluation import get_eval_compute_metrics_fn
from transformers import Trainer, TrainingArguments

compute_metrics_fn = get_eval_compute_metrics_fn(image_processor)

trainer = Trainer(
    model=model,
    args=TrainingArguments(...),
    compute_metrics=compute_metrics_fn,
)

Parameters

image_processor
AutoImageProcessor
required
Image processor used to post-process predictions inside compute_metrics.

Returns

fn
Callable[[EvalPrediction], Mapping[str, float]]
A partial of compute_metrics with threshold=0.5 and id2label={0: "cancer"} already bound. Accepts a single EvalPrediction argument.
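The factory's construction can be sketched with functools.partial (the compute_metrics stand-in below only mimics the signature documented above; the real binding may differ in detail): the single-class defaults are bound once, leaving only the EvalPrediction argument free for Trainer to supply.

```python
from functools import partial

def compute_metrics(evaluation_results, image_processor=None,
                    threshold=0.0, id2label=None, max_size=640):
    """Stand-in for evaluation.compute_metrics (signature assumed from above)."""
    ...

def get_eval_compute_metrics_fn(image_processor):
    # Likely construction: pre-bind the single-class cancer-detection defaults,
    # so the result is directly usable as Trainer's compute_metrics argument.
    return partial(
        compute_metrics,
        image_processor=image_processor,
        threshold=0.5,
        id2label={0: "cancer"},
    )

fn = get_eval_compute_metrics_fn(image_processor="processor-stub")
print(fn.keywords["threshold"], fn.keywords["id2label"])  # 0.5 {0: 'cancer'}
```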

calculate_custom_map_metrics

An alternative mAP implementation that processes raw model output objects directly, bypassing the EvalPrediction serialisation path used by compute_metrics. Useful when evaluating outside Trainer or when the standard path encounters tensor-shape issues.
from evaluation import calculate_custom_map_metrics

metrics = calculate_custom_map_metrics(
    predictions=all_predictions,
    targets=all_targets,
    image_processor=image_processor,
    device=device,
    max_size=640,
)
All tensors are moved to CPU before computing metrics for compatibility with torchmetrics. Returns zeroed metrics as a fallback if any exception is raised.

Parameters

predictions
list
required
List of model output objects. Each element must expose .logits and .pred_boxes attributes (e.g. instances of ModelOutput or native HuggingFace model outputs).
targets
list[dict]
required
List of ground-truth annotation dicts, each with keys "boxes" (YOLO format Tensor) and "class_labels" (Tensor).
image_processor
AutoImageProcessor
required
Image processor used to post-process predictions via post_process_object_detection with threshold=0.5.
device
torch.device
required
The device on which intermediate tensors are created. Final metric computation is performed on CPU.
max_size
int
default:"640"
Image spatial dimension used as target size during post-processing.

Returns

metrics
dict[str, float]
mAP metrics dict with the same keys as compute_metrics ("map", "map_50", "map_75", "map_small", "map_medium", "map_large"). All values are Python float. Returns all zeros on error.
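Predictions and targets are typically accumulated during a manual inference loop before calling the function. A hedged sketch of inputs with the expected shapes (ModelOutput redefined locally for illustration; one image, 100 queries, one foreground class plus the no-object channel):

```python
import torch
from dataclasses import dataclass

@dataclass
class ModelOutput:
    logits: torch.Tensor
    pred_boxes: torch.Tensor

# Hypothetical one-batch accumulation matching the shapes documented above.
all_predictions = [
    ModelOutput(logits=torch.randn(1, 100, 2), pred_boxes=torch.rand(1, 100, 4))
]
all_targets = [
    {"boxes": torch.tensor([[0.5, 0.5, 0.2, 0.2]]),  # YOLO-normalised (cx, cy, w, h)
     "class_labels": torch.tensor([0])}
]
```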

run_model_inference_with_map

End-to-end inference loop: iterates over a test dataset, collects model predictions, and returns mAP metrics. This is the primary entry point for evaluating a trained checkpoint on a held-out split.
from evaluation import run_model_inference_with_map

metrics = run_model_inference_with_map(
    model=model,
    test_dataset=test_dataset,
    image_processor=image_processor,
    device=device,
    batch_size=8,
)
print(metrics)
# {"map": 0.45, "map_50": 0.73, ...}

Parameters

model
nn.Module
required
A trained HuggingFace detection model (e.g. AutoModelForObjectDetection). The function calls model.eval() before running inference and wraps the loop in torch.no_grad().
test_dataset
BreastCancerDataset
required
A BreastCancerDataset instance initialised with split="test". Wrapped in a DataLoader internally.
image_processor
AutoImageProcessor
required
Image processor forwarded to calculate_custom_map_metrics for post-processing.
device
torch.device
required
Device to move pixel_values and label tensors to before the forward pass.
batch_size
int
default:"8"
Number of images per inference batch.

Returns

metrics
dict[str, float]
mAP metrics dict returned by calculate_custom_map_metrics, aggregated over the full test set.
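The shape of the loop can be sketched as follows. This is a hedged approximation of what the function likely does internally (the dataset and model here are stand-ins, and the final call to calculate_custom_map_metrics is omitted), not the source code itself.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

@torch.no_grad()
def inference_loop(model, dataset, device, batch_size=8):
    """Hedged sketch of run_model_inference_with_map's inner loop;
    metric computation via calculate_custom_map_metrics is omitted."""
    model.eval()
    loader = DataLoader(dataset, batch_size=batch_size)
    outputs = []
    for batch in loader:
        pixel_values = batch["pixel_values"].to(device)
        outputs.append(model(pixel_values))  # collect raw per-batch outputs
    return outputs

# Stand-in dataset and model, just to exercise the batching behaviour.
dataset = [{"pixel_values": torch.zeros(3, 32, 32)} for _ in range(4)]
outputs = inference_loop(nn.Identity(), dataset, torch.device("cpu"), batch_size=2)
print(len(outputs))  # 2 batches of 2 images each
```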
