
MammoMix stores bounding box annotations as Pascal VOC XML files — one .xml file per image. These files are parsed by utils.py, cross-validated against bbox_annotations.csv, and converted to COCO format before being passed to the image processor in loader.py.
MammoMix is a single-class detection task. Every annotated object is a cancer lesion and is assigned class_id = 0.

Pascal VOC XML format

Each XML file lives in the labels/ directory alongside its corresponding image in images/. The filename stem must match (e.g. image_001.jpg ↔ image_001.xml).
image_001.xml
<annotation>
  <filename>image_001.jpg</filename>
  <size>
    <width>1024</width>
    <height>768</height>
    <depth>3</depth>
  </size>
  <object>
    <name>cancer</name>
    <bndbox>
      <xmin>120</xmin>
      <ymin>200</ymin>
      <xmax>310</xmax>
      <ymax>390</ymax>
    </bndbox>
  </object>
</annotation>
An image may contain multiple <object> elements if more than one lesion is present. MammoMix silently drops any bounding box where xmin >= xmax or ymin >= ymax.

parse_voc_xml return structure

parse_voc_xml in utils.py parses a single XML file and returns a plain Python dictionary:
utils.py
import xml.etree.ElementTree as ET

def parse_voc_xml(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()
    image_name = root.find('filename').text
    size = root.find('size')
    width = int(size.find('width').text)
    height = int(size.find('height').text)
    bboxes = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        bbox = obj.find('bndbox')
        xmin = float(bbox.find('xmin').text)
        ymin = float(bbox.find('ymin').text)
        xmax = float(bbox.find('xmax').text)
        ymax = float(bbox.find('ymax').text)
        if xmin < xmax and ymin < ymax:  # drop degenerate boxes
            bboxes.append({
                'class': name,
                'xmin': xmin, 'ymin': ymin,
                'xmax': xmax, 'ymax': ymax
            })
    return {
        'image_name': image_name,
        'width': width,
        'height': height,
        'bboxes': bboxes
    }
The returned dictionary has the following shape:
{
    'image_name': 'image_001.jpg',   # str  — value of <filename>
    'width': 1024,                   # int  — image width in pixels
    'height': 768,                   # int  — image height in pixels
    'bboxes': [                      # list — one entry per valid <object>
        {
            'class': 'cancer',       # str   — value of <name>
            'xmin': 120.0,           # float — left edge
            'ymin': 200.0,           # float — top edge
            'xmax': 310.0,           # float — right edge
            'ymax': 390.0,           # float — bottom edge
        }
    ]
}
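The structure can be verified by round-tripping the sample file from the previous section. The sketch below inlines a condensed copy of the function so it runs without the repo on the path:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

SAMPLE = """<annotation>
  <filename>image_001.jpg</filename>
  <size><width>1024</width><height>768</height><depth>3</depth></size>
  <object>
    <name>cancer</name>
    <bndbox><xmin>120</xmin><ymin>200</ymin><xmax>310</xmax><ymax>390</ymax></bndbox>
  </object>
</annotation>"""

def parse_voc_xml(xml_path):
    # Condensed copy of the utils.py function shown above.
    root = ET.parse(xml_path).getroot()
    size = root.find('size')
    bboxes = []
    for obj in root.findall('object'):
        b = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (float(b.find(t).text)
                                  for t in ('xmin', 'ymin', 'xmax', 'ymax'))
        if xmin < xmax and ymin < ymax:  # same validity rule as utils.py
            bboxes.append({'class': obj.find('name').text,
                           'xmin': xmin, 'ymin': ymin,
                           'xmax': xmax, 'ymax': ymax})
    return {'image_name': root.find('filename').text,
            'width': int(size.find('width').text),
            'height': int(size.find('height').text),
            'bboxes': bboxes}

with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False) as f:
    f.write(SAMPLE)
    path = f.name
result = parse_voc_xml(path)
os.unlink(path)
```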

xml2dicts output structure

xml2dicts converts the raw parse_voc_xml bboxes list to a format suitable for DETR/YOLOS, assigning the hardcoded class_id = 0 to every object:
utils.py
def xml2dicts(bboxes, width, height):
    # width and height are accepted but not used in this conversion
    detr_bboxes = []
    for bbox in bboxes:
        class_id = 0  # Single class 'cancer'
        detr_bboxes.append({
            'class_id': class_id,
            'xmin': bbox['xmin'],
            'ymin': bbox['ymin'],
            'xmax': bbox['xmax'],
            'ymax': bbox['ymax'],
        })
    return detr_bboxes
Each element in the returned list has the following shape:
{
    'class_id': 0,       # int   — always 0 (cancer)
    'xmin': 120.0,       # float — left edge (pixels)
    'ymin': 200.0,       # float — top edge (pixels)
    'xmax': 310.0,       # float — right edge (pixels)
    'ymax': 390.0,       # float — bottom edge (pixels)
}
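The conversion can be exercised end-to-end with a small sketch. The function body below is a copy of the listing above, and the sample input mirrors the bboxes list that parse_voc_xml produces:

```python
def xml2dicts(bboxes, width, height):
    # Copy of the utils.py function shown above; width and height are unused.
    detr_bboxes = []
    for bbox in bboxes:
        detr_bboxes.append({'class_id': 0,  # single class: 'cancer'
                            'xmin': bbox['xmin'], 'ymin': bbox['ymin'],
                            'xmax': bbox['xmax'], 'ymax': bbox['ymax']})
    return detr_bboxes

# One valid bbox, in the shape produced by parse_voc_xml
parsed = [{'class': 'cancer', 'xmin': 120.0, 'ymin': 200.0,
           'xmax': 310.0, 'ymax': 390.0}]
detr = xml2dicts(parsed, 1024, 768)
```

Note that the class name string from the XML is discarded here; only the hardcoded class_id survives into the DETR/YOLOS pipeline.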

bbox_annotations.csv cross-validation file

Every dataset directory must contain a bbox_annotations.csv file at its root. splitting.py reads this file to verify that the image dimensions recorded in each XML annotation match a trusted ground-truth source before the image is included in any split. Expected columns:
Column   Type   Description
name     str    Image filename (must match <filename> in the XML).
width    int    Expected image width in pixels.
height   int    Expected image height in pixels.
If the width or height in the XML does not match the CSV, the image is dropped from the split and an error is logged. If the image is absent from the CSV entirely, it is kept and a warning is logged.
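The validation rule can be sketched as a small standalone function. Both the function name check_dimensions and the in-memory CSV are illustrative assumptions, not code from splitting.py:

```python
import csv
import io

def check_dimensions(parsed, csv_text):
    """Classify one parse_voc_xml result against bbox_annotations.csv.

    Returns 'ok', 'mismatch' (image dropped, error logged), or
    'missing' (image kept, warning logged). Hypothetical helper.
    """
    rows = {r['name']: r for r in csv.DictReader(io.StringIO(csv_text))}
    row = rows.get(parsed['image_name'])
    if row is None:
        return 'missing'
    if (int(row['width']) != parsed['width']
            or int(row['height']) != parsed['height']):
        return 'mismatch'
    return 'ok'
```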

COCO-style format for the image processor

loader.py converts the output of xml2dicts into the COCO annotation format expected by AutoImageProcessor before passing it to the model:
loader.py
annotations = {
    'image_id': idx,
    'annotations': [
        {
            'image_id': idx,
            'category_id': label,                            # class_id (0 = cancer)
            'bbox': [
                bbox[0],               # xmin
                bbox[1],               # ymin
                bbox[2] - bbox[0],     # width  (xmax - xmin)
                bbox[3] - bbox[1],     # height (ymax - ymin)
            ],
            'area': (bbox[2] - bbox[0]) * (bbox[3] - bbox[1]),
            'iscrowd': 0,
        }
        for bbox, label in zip(bboxes, labels)
    ]
}
Key points:
  • bbox uses [xmin, ymin, width, height] format — not the [xmin, ymin, xmax, ymax] format stored in the XML.
  • area is computed as width × height of the bounding box.
  • iscrowd is always 0; MammoMix does not use crowd annotations.
  • category_id maps directly to class_id from xml2dicts (always 0).
This dictionary is passed directly to image_processor(images=image, annotations=annotations, ...), which handles resizing, padding, and normalisation before the tensor is fed to the model.
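As a sanity check on the coordinate conversion, the dictionary comprehension above can be wrapped in a standalone helper. The name to_coco is hypothetical; loader.py builds the dictionary inline:

```python
def to_coco(idx, bboxes, labels):
    """Build a COCO-style annotation dict from [xmin, ymin, xmax, ymax] boxes."""
    return {
        'image_id': idx,
        'annotations': [
            {'image_id': idx,
             'category_id': label,                        # class_id (0 = cancer)
             'bbox': [b[0], b[1],
                      b[2] - b[0],                        # width  = xmax - xmin
                      b[3] - b[1]],                       # height = ymax - ymin
             'area': (b[2] - b[0]) * (b[3] - b[1]),
             'iscrowd': 0}
            for b, label in zip(bboxes, labels)
        ],
    }

coco = to_coco(0, [[120.0, 200.0, 310.0, 390.0]], [0])
```

For the sample lesion this yields a COCO bbox of [120.0, 200.0, 190.0, 190.0] with area 36100.0, confirming the corner-to-width/height conversion.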
