Training on a single mammography dataset risks producing models that overfit to that dataset’s imaging protocol, scanner characteristics, and patient demographics.
momo.py merges CSAW, DMID, and DDSM into one unified dataset so that a model can learn features that generalise across acquisition conditions.
merge_datasets function
| Parameter | Type | Description |
|---|---|---|
| input_dir | str | Parent directory that contains the CSAW, DMID, and DDSM split folders (output of splitting.py). |
| output_name | str | Name of the merged output folder, created inside input_dir. |
The list of datasets (['CSAW', 'DMID', 'DDSM']) is hardcoded inside merge_datasets. Any dataset folder not present in input_dir is skipped with a warning; it does not cause an error.
CLI usage
| Flag | Required | Description |
|---|---|---|
| --input_dir | Yes | Path to the directory containing the individual dataset split folders. |
| --name | Yes | Name for the merged output folder. |
With --name MammoMix_merged, for example, the merged dataset is written to input_dir/MammoMix_merged/ (i.e. alongside the source dataset folders).
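The two flags in the table map naturally onto an argparse parser. The block below is a minimal sketch, assuming momo.py parses its arguments with argparse; build_parser is a hypothetical helper name introduced here for illustration, and the example path data/splits is a placeholder, not a location the project prescribes.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical helper: mirrors the two required flags documented above.
    parser = argparse.ArgumentParser(
        description="Merge CSAW, DMID, and DDSM splits into one dataset."
    )
    parser.add_argument(
        "--input_dir",
        required=True,
        help="Directory containing the individual dataset split folders.",
    )
    parser.add_argument(
        "--name",
        required=True,
        help="Name for the merged output folder.",
    )
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run as-is; the values are placeholders.
args = build_parser().parse_args(
    ["--input_dir", "data/splits", "--name", "MammoMix_merged"]
)
print(f"input_dir={args.input_dir!r}, name={args.name!r}")
```

Because both flags are marked required, omitting either one makes argparse print a usage message and exit rather than continue with a default.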
What it does
Create output directory structure
Directories for the train, val, and test splits are created under input_dir/output_name/, each containing images/ and labels/ subdirectories.
Copy files with dataset prefix
Every image and label file is copied to the merged split directory with its source dataset name prepended to the filename, preventing collisions between datasets that share identical filenames. For example, image_001.jpg from CSAW becomes CSAW_image_001.jpg in the merged folder, while the same filename from DDSM becomes DDSM_image_001.jpg.
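The two steps above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code in momo.py: it assumes each source dataset is laid out as input_dir/<dataset>/<split>/<images|labels>/, that the warning for a missing dataset is raised through the warnings module, and that the function returns the output path. The actual script may differ on all three points.

```python
import shutil
import warnings
from pathlib import Path

DATASETS = ["CSAW", "DMID", "DDSM"]  # hardcoded list, as described above
SPLITS = ["train", "val", "test"]
SUBDIRS = ["images", "labels"]


def merge_datasets(input_dir: str, output_name: str) -> Path:
    """Sketch only; assumes a <dataset>/<split>/<images|labels>/ source layout."""
    root = Path(input_dir)
    out_root = root / output_name

    # Step 1: create train/val/test, each with images/ and labels/.
    for split in SPLITS:
        for sub in SUBDIRS:
            (out_root / split / sub).mkdir(parents=True, exist_ok=True)

    # Step 2: copy every file, prefixed with its source dataset name.
    for dataset in DATASETS:
        src_root = root / dataset
        if not src_root.is_dir():
            # A missing dataset is skipped with a warning, not an error.
            warnings.warn(f"{dataset} not found in {root}; skipping")
            continue
        for split in SPLITS:
            for sub in SUBDIRS:
                src_dir = src_root / split / sub
                if not src_dir.is_dir():
                    continue
                for src in sorted(src_dir.iterdir()):
                    if src.is_file():
                        dst = out_root / split / sub / f"{dataset}_{src.name}"
                        shutil.copy2(src, dst)
    return out_root
```

Prefixing at copy time keeps the merge stateless: no lookup table of renamed files is needed, and the source dataset of any merged file can be read back from its filename.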