utils.py provides thin, stateless helper functions used across the MammoMix training and evaluation scripts. Functions cover YAML config loading, image processor initialisation, model architecture detection, and VOC XML annotation parsing.

load_config

Loads a YAML configuration file and returns its contents as a Python dictionary.
from utils import load_config

config = load_config("configs/train_config.yaml")
print(config["model_name"])  # "hustvl/yolos-base"

Parameters

config_path
string
default:"configs/train_config.yaml"
Path to the YAML file to load. Passed directly to open(), so both relative and absolute paths are accepted.

Returns

config
dict
Parsed YAML document as a Python dictionary. Keys and value types reflect the structure of the config file.

get_image_processor

Creates and returns an AutoImageProcessor configured for fixed-size padding, which is required by YOLOS and DETR models in MammoMix.
from utils import get_image_processor

image_processor = get_image_processor(
    model_name="hustvl/yolos-base",
    max_size=640,
)

Parameters

model_name
string
required
Hugging Face model identifier or local path, used as the source of the processor configuration. Any checkpoint compatible with AutoImageProcessor.from_pretrained is supported.
max_size
int
required
Maximum height and width in pixels. Applied to both size (for resizing) and pad_size (for padding), so all images are resized to fit within a max_size × max_size canvas and padded to exactly that resolution.

Returns

image_processor
AutoImageProcessor
Processor instance with the following flags set:

| Flag | Value |
| --- | --- |
| do_resize | True |
| do_pad | True |
| use_fast | True |
| size | {"max_height": max_size, "max_width": max_size} |
| pad_size | {"height": max_size, "width": max_size} |
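
The effect of max_size on an image's dimensions can be illustrated with a little arithmetic. The sketch below is not part of utils.py: the function name fit_and_pad_dims is hypothetical, and it assumes the resize preserves the aspect ratio while fitting the image inside the max_size × max_size canvas, with padding making up the remainder. The processor's actual rounding may differ.

```python
def fit_and_pad_dims(width, height, max_size):
    """Estimate resized dimensions and total padding for a square canvas.

    Assumption: aspect-preserving resize to fit within max_size x max_size,
    then padding up to exactly max_size x max_size.
    """
    scale = min(max_size / width, max_size / height)
    new_w, new_h = int(width * scale), int(height * scale)
    return {
        "resized": (new_w, new_h),
        "pad_width": max_size - new_w,    # total columns of padding
        "pad_height": max_size - new_h,   # total rows of padding
    }


dims = fit_and_pad_dims(1024, 768, 640)
print(dims)  # {'resized': (640, 480), 'pad_width': 0, 'pad_height': 160}
```

Under these assumptions, a 1024 × 768 image is scaled to 640 × 480 and then padded by 160 rows to reach the 640 × 640 output.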

get_model_type

Infers the model architecture family from the model identifier string.
from utils import get_model_type

get_model_type("hustvl/yolos-base")          # "yolos"
get_model_type("facebook/detr-resnet-50")    # "detr"

Parameters

model_name
string
required
Hugging Face model identifier or local directory name. The function checks whether the string contains the substring "yolos"; the check is case-sensitive, so an identifier spelled "YOLOS" does not match.

Returns

model_type
string
"yolos" if "yolos" is found in model_name; otherwise "detr".

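The substring rule described above can be sketched in a few lines. This is a minimal illustration of the documented behaviour, not the actual source of utils.py; the name get_model_type_sketch is hypothetical, as is the third example identifier.

```python
def get_model_type_sketch(model_name: str) -> str:
    """Return "yolos" if the identifier contains "yolos", otherwise "detr"."""
    # Case-sensitive substring test: "YOLOS" does not match.
    return "yolos" if "yolos" in model_name else "detr"


for name in (
    "hustvl/yolos-base",
    "facebook/detr-resnet-50",
    "my-org/YOLOS-custom",  # hypothetical identifier, falls through to "detr"
):
    print(name, "->", get_model_type_sketch(name))
```

Because "detr" is the fallback rather than a positive match, any identifier that lacks the lowercase substring "yolos" is treated as a DETR model, including unrelated architectures.
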
parse_voc_xml

Parses a Pascal VOC XML annotation file and extracts image metadata and bounding box coordinates.
from utils import parse_voc_xml

data = parse_voc_xml("splits/CSAW/labels/img001.xml")
# {
#   "image_name": "img001.jpg",
#   "width": 1024,
#   "height": 768,
#   "bboxes": [
#     {"class": "cancer", "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}
#   ]
# }

Parameters

xml_path
string
required
Absolute or relative path to a Pascal VOC XML file. The file must have a standard VOC structure with <filename>, <size>, and one or more <object> elements.

Returns

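annotation
dict
Parsed annotation data. As in the example above, the dict holds the keys "image_name", "width", "height", and "bboxes", where "bboxes" is a list of dicts with "class", "xmin", "ymin", "xmax", and "ymax" entries.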
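
To make the expected file structure concrete, here is a rough sketch of how such a parser could be written with the standard library's xml.etree.ElementTree. It is an assumption-laden illustration rather than the real implementation: parse_voc_xml_sketch is a hypothetical name, the <bndbox> and <name> tags follow the usual Pascal VOC layout, and the int/float conversions simply mirror the example output above.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def parse_voc_xml_sketch(xml_path):
    """Hypothetical re-implementation; not the actual utils.parse_voc_xml."""
    root = ET.parse(xml_path).getroot()
    size = root.find("size")
    bboxes = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")  # standard VOC bounding-box container
        bboxes.append({
            "class": obj.findtext("name"),
            "xmin": float(box.findtext("xmin")),
            "ymin": float(box.findtext("ymin")),
            "xmax": float(box.findtext("xmax")),
            "ymax": float(box.findtext("ymax")),
        })
    return {
        "image_name": root.findtext("filename"),
        "width": int(size.findtext("width")),
        "height": int(size.findtext("height")),
        "bboxes": bboxes,
    }


# Self-contained demo: write a tiny VOC file to a temporary path and parse it.
SAMPLE_XML = """<annotation>
  <filename>img001.jpg</filename>
  <size><width>1024</width><height>768</height><depth>3</depth></size>
  <object>
    <name>cancer</name>
    <bndbox><xmin>120</xmin><ymin>80</ymin><xmax>340</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""

with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False) as handle:
    handle.write(SAMPLE_XML)
    sample_path = handle.name

sample = parse_voc_xml_sketch(sample_path)
os.remove(sample_path)
print(sample)
```

This sketch raises an AttributeError if <size> or <bndbox> is missing; how the real function handles malformed files is not stated here.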

xml2dicts

Converts a list of raw VOC bounding-box dicts (as produced by parse_voc_xml) into the format expected by the DETR image processor.
from utils import parse_voc_xml, xml2dicts

xml_data = parse_voc_xml("splits/CSAW/labels/img001.xml")
detr_boxes = xml2dicts(xml_data["bboxes"], xml_data["width"], xml_data["height"])
# [{"class_id": 0, "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}]

Note: width and height are accepted as parameters for future normalisation use, but are not applied in the current implementation. All coordinates remain in absolute pixel space.

Parameters

bboxes
list[dict]
required
List of bounding-box dicts as returned by parse_voc_xml. Each dict must contain "xmin", "ymin", "xmax", and "ymax" keys.
width
int
required
Image width in pixels. Accepted for interface consistency; not used for coordinate normalisation.
height
int
required
Image height in pixels. Accepted for interface consistency; not used for coordinate normalisation.

Returns

detr_bboxes
list[dict]
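List of annotation dicts, one per input bounding box. As in the example above, each dict carries a "class_id" together with the "xmin", "ymin", "xmax", and "ymax" coordinates in absolute pixels.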
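
The conversion can be pictured as a simple per-box mapping. The sketch below is a guess at the shape of the logic, not the code in utils.py: xml2dicts_sketch is a hypothetical name, and assigning class_id 0 to every box is an assumption drawn from the single-class example above.

```python
def xml2dicts_sketch(bboxes, width, height):
    """Hypothetical stand-in for utils.xml2dicts.

    width and height are accepted but deliberately unused, matching the
    documented behaviour that coordinates stay in absolute pixels.
    """
    return [
        {
            "class_id": 0,  # assumption: one class, always mapped to id 0
            "xmin": box["xmin"],
            "ymin": box["ymin"],
            "xmax": box["xmax"],
            "ymax": box["ymax"],
        }
        for box in bboxes
    ]


raw_boxes = [
    {"class": "cancer", "xmin": 120.0, "ymin": 80.0, "xmax": 340.0, "ymax": 260.0}
]
converted = xml2dicts_sketch(raw_boxes, 1024, 768)
print(converted)
# [{'class_id': 0, 'xmin': 120.0, 'ymin': 80.0, 'xmax': 340.0, 'ymax': 260.0}]
```

An empty input list yields an empty output list, which is convenient for images that have no annotated findings.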
