utils.py — configuration and annotation helpers

utils.py provides thin, stateless helper functions used across the MammoMix training and evaluation scripts. Functions cover YAML config loading, image processor initialisation, model architecture detection, and VOC XML annotation parsing.

`load_config`

Loads a YAML configuration file and returns its contents as a Python dictionary.

```python
from utils import load_config

config = load_config("configs/train_config.yaml")
print(config["model_name"])  # "hustvl/yolos-base"
```

Parameters

`config_path` (string, default: `"configs/train_config.yaml"`)

Path to the YAML file to load. It is passed directly to open(), so both relative and absolute paths are accepted; relative paths resolve against the current working directory.

Returns

`config` (dict)

Parsed YAML document as a Python dictionary. Keys and value types reflect the structure of the config file.
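Because `config_path` goes straight to open(), a relative path such as the default is resolved against the current working directory, so a script launched from outside the project root will not find `configs/train_config.yaml`. One workaround is to anchor relative paths to a known directory before calling load_config. The `resolve_config_path` helper below is hypothetical and not part of utils.py; it is a minimal sketch that assumes the caller supplies the root directory.

```python
from pathlib import Path
from typing import Union


def resolve_config_path(config_path: Union[str, Path], root: Union[str, Path]) -> Path:
    """Hypothetical helper (not in utils.py): anchor a relative path to `root`.

    Absolute paths are returned unchanged, matching how open() treats them;
    relative paths are joined onto `root` instead of the current working directory.
    """
    path = Path(config_path)
    if path.is_absolute():
        return path
    return Path(root) / path
```

For example, a training script could call `load_config(str(resolve_config_path("configs/train_config.yaml", Path(__file__).resolve().parent)))`, assuming the `configs` folder sits next to that script.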

`get_image_processor`

Loads an image processor through AutoImageProcessor and configures it for fixed-size resizing and padding, which the YOLOS and DETR models in MammoMix require.

```python
from utils import get_image_processor

image_processor = get_image_processor(
    model_name="hustvl/yolos-base",
    max_size=640,
)
```

Parameters

`model_name` (string, required)

HuggingFace model identifier or local path. Used as the source for the processor configuration. Supports any checkpoint compatible with AutoImageProcessor.from_pretrained.

`max_size` (int, required)

Maximum height and width in pixels. Applied to both size (for resizing) and pad_size (for padding), so all images are resized to fit within a max_size × max_size canvas and padded to exactly that resolution.

Returns

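`image_processor` (processor object)

The processor instance that AutoImageProcessor.from_pretrained resolves for `model_name` (AutoImageProcessor itself is only a factory), with the following flags set: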

| Flag | Value |
| --- | --- |
| `do_resize` | `True` |
| `do_pad` | `True` |
| `use_fast` | `True` |
| `size` | `{"max_height": max_size, "max_width": max_size}` |
| `pad_size` | `{"height": max_size, "width": max_size}` |
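To see what the `size` and `pad_size` settings imply for a given image, the output geometry can be worked out by hand. The function below is a minimal sketch, not part of utils.py. It assumes the aspect-preserving rule used by the Hugging Face DETR-family processors for `max_height`/`max_width` sizes (one scale factor chosen so the image fits the box, truncated to integers) and assumes padding is added on the bottom and right edges.

```python
def resize_and_pad_geometry(height: int, width: int, max_size: int) -> dict:
    """Sketch of the output geometry for size={"max_height": max_size,
    "max_width": max_size} and pad_size={"height": max_size, "width": max_size}.

    Assumption: mirrors the Hugging Face DETR-style rule; not part of utils.py.
    """
    # One scale factor for both axes keeps the aspect ratio. Taking the smaller
    # ratio makes the image fit; images smaller than max_size are scaled up.
    scale = min(max_size / height, max_size / width)
    new_height = int(height * scale)
    new_width = int(width * scale)
    # Padding then fills the canvas out to exactly max_size x max_size
    # (assumed to be added on the bottom and right edges).
    return {
        "resized_hw": (new_height, new_width),
        "pad_bottom": max_size - new_height,
        "pad_right": max_size - new_width,
    }


# A 1024-wide, 768-high image with max_size=640:
# scale = min(640/768, 640/1024) = 0.625 -> resized to 480 x 640, then 160 rows of padding.
print(resize_and_pad_geometry(768, 1024, 640))
# {'resized_hw': (480, 640), 'pad_bottom': 160, 'pad_right': 0}
```

Since every output is exactly `max_size` × `max_size`, all images in a batch share one shape.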

`get_model_type`

Infers the model architecture family from the model identifier string.

```python
from utils import get_model_type

get_model_type("hustvl/yolos-base")          # "yolos"
get_model_type("facebook/detr-resnet-50")    # "detr"
```

Parameters

`model_name` (string, required)

HuggingFace model identifier or local directory name. The function checks whether the string contains the substring "yolos" (case-sensitive).

Returns

`model_type` (string)

"yolos" if "yolos" is found in model_name; otherwise "detr".

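The documented behaviour amounts to a single case-sensitive substring test. A minimal sketch, assuming the real function does nothing beyond what is described above:

```python
def get_model_type(model_name: str) -> str:
    """Sketch of the documented rule; not copied from utils.py."""
    # Case-sensitive: only the lowercase substring "yolos" selects YOLOS.
    # Every other name, whatever its real architecture, falls back to "detr".
    return "yolos" if "yolos" in model_name else "detr"


print(get_model_type("hustvl/yolos-base"))        # yolos
print(get_model_type("facebook/detr-resnet-50"))  # detr
print(get_model_type("checkpoints/YOLOS-ft"))     # detr: uppercase does not match
```

The last call (a hypothetical local checkpoint name) shows the main pitfall: a directory named with uppercase "YOLOS" is classified as DETR, so local checkpoint paths should keep the lowercase substring.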
`parse_voc_xml`

Parses a Pascal VOC XML annotation file and extracts image metadata and bounding box coordinates.

```python
from utils import parse_voc_xml

data = parse_voc_xml("splits/CSAW/labels/img001.xml")
# {
#   "image_name": "img001.jpg",
#   "width": 1024,
#   "height": 768,
#   "bboxes": [
#     {"class": "cancer", "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}
#   ]
# }
```

Parameters

`xml_path` (string, required)

Absolute or relative path to a Pascal VOC XML file. The file must have a standard VOC structure with `<filename>`, `<size>`, and one or more `<object>` elements.

Returns

`annotation` (dict)

Parsed annotation data with the following fields:


- `image_name` (string): Value of the `<filename>` element in the XML.
- `width` (int): Image width in pixels, from the `<size>` element.
- `height` (int): Image height in pixels, from the `<size>` element.
- `bboxes` (list[dict]): Valid bounding boxes. Each dict has the keys `"class"` (string) and `"xmin"`, `"ymin"`, `"xmax"`, `"ymax"` (all float). Boxes where xmin >= xmax or ymin >= ymax are skipped, and a warning is printed to the console.
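A minimal sketch of the parsing logic using the standard-library `xml.etree.ElementTree` module, assuming the usual VOC layout in which each `<object>` holds a `<name>` and a `<bndbox>` with `<xmin>`, `<ymin>`, `<xmax>`, `<ymax>`. It reproduces the documented return shape; the warning text is an assumption, and the real utils.py may differ in detail.

```python
import xml.etree.ElementTree as ET


def parse_voc_xml(xml_path: str) -> dict:
    """Sketch of a VOC parser matching the documented return shape."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    annotation = {
        "image_name": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "bboxes": [],
    }
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        box = {"class": obj.findtext("name")}
        for key in ("xmin", "ymin", "xmax", "ymax"):
            box[key] = float(bndbox.findtext(key))
        # Degenerate boxes (zero or negative width/height) are dropped with a warning.
        if box["xmin"] >= box["xmax"] or box["ymin"] >= box["ymax"]:
            print(f"Warning: skipping invalid box {box} in {xml_path}")
            continue
        annotation["bboxes"].append(box)
    return annotation
```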

`xml2dicts`

Converts a list of raw VOC bounding-box dicts (as produced by parse_voc_xml) into the format expected by the DETR image processor.

```python
from utils import parse_voc_xml, xml2dicts

xml_data = parse_voc_xml("splits/CSAW/labels/img001.xml")
detr_boxes = xml2dicts(xml_data["bboxes"], xml_data["width"], xml_data["height"])
# [{"class_id": 0, "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}]
```

The `width` and `height` parameters are accepted for future normalisation use but are not applied in the current implementation, so all coordinates remain in absolute pixel space.
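The `width` and `height` parameters are exactly what a normalisation step would consume. DETR-family models typically represent boxes as (center_x, center_y, width, height) fractions of the image size, so one plausible future use is the conversion sketched below. The `to_normalised_cxcywh` helper is hypothetical and not part of utils.py.

```python
def to_normalised_cxcywh(box: dict, width: int, height: int) -> tuple:
    """Hypothetical helper (not in utils.py): absolute (xmin, ymin, xmax, ymax)
    pixels -> (center_x, center_y, box_width, box_height) as image fractions."""
    center_x = (box["xmin"] + box["xmax"]) / 2 / width
    center_y = (box["ymin"] + box["ymax"]) / 2 / height
    box_width = (box["xmax"] - box["xmin"]) / width
    box_height = (box["ymax"] - box["ymin"]) / height
    return center_x, center_y, box_width, box_height


box = {"class_id": 0, "xmin": 0.0, "ymin": 0.0, "xmax": 100.0, "ymax": 50.0}
print(to_normalised_cxcywh(box, 200, 100))
# (0.25, 0.25, 0.5, 0.5)
```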

Parameters

`bboxes` (list[dict], required)

List of bounding-box dicts as returned by parse_voc_xml. Each dict must contain "xmin", "ymin", "xmax", and "ymax" keys.

`width` (int, required)

Image width in pixels. Accepted for interface consistency; not used for coordinate normalisation.

`height` (int, required)

Image height in pixels. Accepted for interface consistency; not used for coordinate normalisation.

Returns

`detr_bboxes` (list[dict])

List of annotation dicts, one per input bounding box, with the following fields:


- `class_id` (int): Always 0 (the single cancer class used in MammoMix).
- `xmin` (float): Left edge of the bounding box in absolute pixels.
- `ymin` (float): Top edge of the bounding box in absolute pixels.
- `xmax` (float): Right edge of the bounding box in absolute pixels.
- `ymax` (float): Bottom edge of the bounding box in absolute pixels.
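The conversion described above is a field-by-field copy with a fixed class ID. A minimal sketch, assuming the real function does no other processing:

```python
def xml2dicts(bboxes: list, width: int, height: int) -> list:
    """Sketch of the documented conversion; not copied from utils.py."""
    # width and height are accepted but unused, matching the documented behaviour.
    return [
        {
            "class_id": 0,  # single cancer class
            "xmin": box["xmin"],
            "ymin": box["ymin"],
            "xmax": box["xmax"],
            "ymax": box["ymax"],
        }
        for box in bboxes
    ]


raw = [{"class": "cancer", "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}]
print(xml2dicts(raw, 1024, 768))
# [{'class_id': 0, 'xmin': 120.0, 'ymin': 80.0, 'xmax': 340.0, 'ymax': 260.0}]
```

The input's `"class"` string is discarded: every box becomes class 0 regardless of its VOC label.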
