MammoMix stores bounding box annotations as Pascal VOC XML files, one .xml file per image. These files are parsed by utils.py, cross-validated against bbox_annotations.csv, and converted to COCO format before being passed to the image processor in loader.py.
MammoMix is a single-class detection task. Every annotated object is a cancer lesion and is assigned class_id = 0.
Pascal VOC XML format
Each XML file lives in the labels/ directory alongside its corresponding image in images/. The filename stem must match (e.g. image_001.jpg → image_001.xml).
image_001.xml
A file can contain multiple <object> elements if more than one lesion is present. MammoMix silently drops any bounding box where xmin >= xmax or ymin >= ymax.
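To make the layout and the validity rule concrete, here is a minimal sketch using the standard Pascal VOC element names. The sample values, the object name "cancer", and the helper names is_valid_box and read_valid_boxes are illustrative assumptions, not code taken from MammoMix.

```python
import xml.etree.ElementTree as ET

# Illustrative Pascal VOC annotation with two <object> elements.
# The second box is degenerate (xmin == xmax) and should be dropped.
SAMPLE_XML = """<annotation>
  <filename>image_001.jpg</filename>
  <size><width>1024</width><height>2048</height><depth>3</depth></size>
  <object>
    <name>cancer</name>
    <bndbox><xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax></bndbox>
  </object>
  <object>
    <name>cancer</name>
    <bndbox><xmin>500</xmin><ymin>600</ymin><xmax>500</xmax><ymax>700</ymax></bndbox>
  </object>
</annotation>"""


def is_valid_box(xmin, ymin, xmax, ymax):
    """A box is kept only if it has positive width and positive height."""
    return xmin < xmax and ymin < ymax


def read_valid_boxes(xml_text):
    """Return (xmin, ymin, xmax, ymax) tuples, skipping degenerate boxes."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        box = tuple(
            int(float(bnd.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        if is_valid_box(*box):
            boxes.append(box)
    return boxes


valid_boxes = read_valid_boxes(SAMPLE_XML)
```

Because invalid boxes are dropped silently, a file with several <object> elements can yield fewer boxes than it declares, so it may be worth counting boxes before and after filtering when auditing a dataset.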
parse_voc_xml return structure
parse_voc_xml in utils.py parses a single XML file and returns a plain Python dictionary:
utils.py
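As a rough sketch of what such a dictionary could look like, the function below parses one file with the standard library. The key names filename, width, height, and bboxes, and the function name parse_voc_xml_sketch, are assumptions made for illustration; the actual keys returned by parse_voc_xml in utils.py may differ.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def parse_voc_xml_sketch(xml_path):
    """Parse one Pascal VOC file into a plain dict (hypothetical key names)."""
    root = ET.parse(str(xml_path)).getroot()
    size = root.find("size")
    bboxes = []
    for obj in root.iter("object"):
        bnd = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(float(bnd.findtext(tag)))
            for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # Skip degenerate boxes, mirroring the rule xmin < xmax and ymin < ymax.
        if xmin < xmax and ymin < ymax:
            bboxes.append([xmin, ymin, xmax, ymax])
    return {
        "filename": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "bboxes": bboxes,
    }


# Usage: write a small annotation to a temporary file and parse it.
with tempfile.TemporaryDirectory() as tmp:
    xml_path = Path(tmp) / "image_001.xml"
    xml_path.write_text(
        "<annotation><filename>image_001.jpg</filename>"
        "<size><width>1024</width><height>2048</height></size>"
        "<object><name>cancer</name><bndbox>"
        "<xmin>100</xmin><ymin>200</ymin><xmax>300</xmax><ymax>400</ymax>"
        "</bndbox></object></annotation>"
    )
    parsed = parse_voc_xml_sketch(xml_path)
```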
xml2dicts output structure
xml2dicts converts the raw parse_voc_xml bboxes list to a format suitable for DETR/YOLOS, assigning the hardcoded class_id = 0 to every object:
utils.py
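The conversion step can be pictured with the following sketch. It assumes a parsed dictionary with a bboxes list of [xmin, ymin, xmax, ymax] entries and emits one record per box; the output keys class_id and bbox and the name xml2dicts_sketch are hypothetical, chosen only to illustrate the hardcoded single class.

```python
# Single-class task: every lesion gets the same class id.
CANCER_CLASS_ID = 0


def xml2dicts_sketch(parsed):
    """Turn a parsed annotation into per-object records (hypothetical keys)."""
    return [
        {"class_id": CANCER_CLASS_ID, "bbox": list(box)}
        for box in parsed["bboxes"]
    ]


# Usage with an illustrative parsed annotation containing two lesions.
parsed_example = {
    "filename": "image_001.jpg",
    "width": 1024,
    "height": 2048,
    "bboxes": [[100, 200, 300, 400], [500, 600, 650, 700]],
}
objects = xml2dicts_sketch(parsed_example)
```

Hardcoding the class id keeps the conversion independent of whatever string appears in the <name> element of each object.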
bbox_annotations.csv cross-validation file
Every dataset directory must contain a bbox_annotations.csv file at its root. splitting.py reads this file to verify that the image dimensions recorded in each XML annotation match a trusted ground-truth source before the image is included in any split.
Expected columns:
| Column | Type | Description |
|---|---|---|
| name | str | Image filename (must match <filename> in the XML). |
| width | int | Expected image width in pixels. |
| height | int | Expected image height in pixels. |
If width or height in the XML does not match the CSV, the image is dropped from the split and an error is logged. If the image is absent from the CSV entirely, it is kept with a warning.
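The keep-or-drop rule above can be sketched with the standard csv and logging modules. The function names load_expected_sizes and keep_image, the logger name, and the sample CSV content are assumptions for illustration; splitting.py may structure this check differently.

```python
import csv
import io
import logging

logger = logging.getLogger("mammomix.sketch")  # hypothetical logger name


def load_expected_sizes(csv_file):
    """Map image name -> (width, height) from the name/width/height columns."""
    reader = csv.DictReader(csv_file)
    return {
        row["name"]: (int(row["width"]), int(row["height"]))
        for row in reader
    }


def keep_image(name, xml_width, xml_height, expected_sizes):
    """Return True if the image should stay in the split."""
    expected = expected_sizes.get(name)
    if expected is None:
        # Absent from the CSV: keep the image, but warn.
        logger.warning("%s not found in bbox_annotations.csv; keeping it", name)
        return True
    if expected != (xml_width, xml_height):
        # Size mismatch between XML and CSV: drop the image and log an error.
        logger.error(
            "%s: XML size %sx%s does not match CSV size %sx%s; dropping it",
            name, xml_width, xml_height, expected[0], expected[1],
        )
        return False
    return True


# Usage with an in-memory CSV standing in for bbox_annotations.csv.
CSV_TEXT = "name,width,height\nimage_001.jpg,1024,2048\n"
sizes = load_expected_sizes(io.StringIO(CSV_TEXT))
```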
COCO-style format for the image processor
loader.py converts the output of xml2dicts into the COCO annotation format expected by AutoImageProcessor before passing it to the model:
loader.py
- bbox uses [xmin, ymin, width, height] format, not the [xmin, ymin, xmax, ymax] format stored in the XML.
- area is computed as width × height of the bounding box.
- iscrowd is always 0; MammoMix does not use crowd annotations.
- category_id maps directly to class_id from xml2dicts (always 0).
The resulting annotations are passed to image_processor(images=image, annotations=annotations, ...), which handles resizing, padding, and normalisation before the tensor is fed to the model.
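The corner-to-COCO conversion described in this section can be sketched as follows. The function name to_coco_annotations and the input keys class_id and bbox are assumptions for illustration; only the output field names (image_id, category_id, bbox, area, iscrowd) follow the COCO detection convention.

```python
def to_coco_annotations(image_id, objects):
    """Convert [xmin, ymin, xmax, ymax] boxes into a COCO-style target dict."""
    annotations = []
    for obj in objects:
        xmin, ymin, xmax, ymax = obj["bbox"]
        width = xmax - xmin
        height = ymax - ymin
        annotations.append(
            {
                "image_id": image_id,
                "category_id": obj["class_id"],  # always 0 for this task
                "bbox": [xmin, ymin, width, height],  # COCO: top-left + size
                "area": width * height,
                "iscrowd": 0,  # crowd annotations are not used
            }
        )
    return {"image_id": image_id, "annotations": annotations}


# Usage with one illustrative lesion in corner format.
target = to_coco_annotations(0, [{"class_id": 0, "bbox": [100, 200, 300, 400]}])
```

A dictionary of this shape is what would be supplied as the annotations argument in the image_processor call, with the corner coordinates from the XML already converted to top-left plus size.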