momo.py merges three independent mammography datasets (CSAW, DMID, and DDSM) into a single YOLO-style directory tree. Each file is copied with a dataset-name prefix to avoid filename collisions, and the per-split .txt manifests are written so that they point to the merged paths.

merge_datasets

Copies all images and labels from each source dataset into a new unified directory, prefixing every filename with its source dataset name, and writes merged train.txt, val.txt, and test.txt manifests.
from momo import merge_datasets

merge_datasets(
    input_dir="/data/mammography",
    output_name="MammoMix",
)
# Creates /data/mammography/MammoMix/

Parameters

input_dir (string, required)
Path to the parent directory that contains the individual dataset folders (CSAW/, DMID/, DDSM/). The merged output directory is also created here.

output_name (string, required)
Name of the output folder to create inside input_dir. For example, passing "MammoMix" creates {input_dir}/MammoMix/.

Side effects

The function creates the following directory structure under {input_dir}/{output_name}/:
{output_name}/
├── train/
│   ├── images/      # prefixed image files from all datasets
│   └── labels/      # prefixed label files from all datasets
├── val/
│   ├── images/
│   └── labels/
├── test/
│   ├── images/
│   └── labels/
├── train.txt        # absolute paths to merged training images
├── val.txt
└── test.txt

File prefixing. Every copied file is renamed to {DATASET}_{original_filename}. For example, img001.jpg from CSAW becomes CSAW_img001.jpg. This applies to both image and label files.

Manifest lines. Each line in the merged .txt files is an absolute path of the form {output_root}/{split}/images/{DATASET}_{filename}.

Missing datasets. If a source dataset folder does not exist inside input_dir, that dataset is skipped with a console warning and processing continues.

Missing splits. If a specific {dataset}/{split}/images or {dataset}/{split}/labels directory is absent, that split for that dataset is skipped with a console warning.
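
The naming and skipping rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the code in momo.py: the helper names prefixed_name, manifest_line, and usable_splits are hypothetical, and the warning text is an assumption.

```python
from pathlib import Path

SPLITS = ("train", "val", "test")


def prefixed_name(dataset: str, filename: str) -> str:
    """Return the merged filename, e.g. CSAW + img001.jpg -> CSAW_img001.jpg."""
    return f"{dataset}_{filename}"


def manifest_line(output_root: str, split: str, dataset: str, filename: str) -> str:
    """Build one manifest entry: {output_root}/{split}/images/{DATASET}_{filename}."""
    return str(Path(output_root) / split / "images" / prefixed_name(dataset, filename))


def usable_splits(input_dir: str, dataset: str) -> list[str]:
    """List the splits of one dataset that have both images/ and labels/.

    A missing dataset folder yields an empty list; a split lacking either
    subdirectory is left out. Both cases print a warning (wording assumed).
    """
    root = Path(input_dir) / dataset
    if not root.is_dir():
        print(f"Warning: {root} not found, skipping dataset")
        return []
    found = []
    for split in SPLITS:
        if (root / split / "images").is_dir() and (root / split / "labels").is_dir():
            found.append(split)
        else:
            print(f"Warning: {root / split} is missing images/ or labels/, skipping")
    return found
```

Under these assumptions, manifest_line("/data/mammography/MammoMix", "train", "CSAW", "img001.jpg") yields the path /data/mammography/MammoMix/train/images/CSAW_img001.jpg on a POSIX system.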

Example — directory layout before merging

/data/mammography/
├── CSAW/
│   ├── train/images/
│   ├── train/labels/
│   ├── val/...
│   ├── test/...
│   ├── train.txt
│   ├── val.txt
│   └── test.txt
├── DMID/  (same structure)
└── DDSM/  (same structure)

Example — directory layout after merging

/data/mammography/
└── MammoMix/
    ├── train/
    │   ├── images/
    │   │   ├── CSAW_img001.jpg
    │   │   ├── DMID_scan042.png
    │   │   └── DDSM_case007.jpg
    │   └── labels/
    │       ├── CSAW_img001.xml
    │       ├── DMID_scan042.xml
    │       └── DDSM_case007.xml
    ├── val/  ...
    ├── test/ ...
    ├── train.txt
    ├── val.txt
    └── test.txt
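
After a merge, it can be useful to confirm that every manifest entry points at an existing, prefixed image. The following is a minimal sketch of such a check, assuming the manifest format described under Side effects. The function check_manifest is hypothetical and is not part of momo.py.

```python
from pathlib import Path

# Prefixes expected on every merged file, one per source dataset.
KNOWN_PREFIXES = ("CSAW_", "DMID_", "DDSM_")


def check_manifest(manifest_path: str) -> list[str]:
    """Return a list of human-readable problems found in one manifest file."""
    problems = []
    for line in Path(manifest_path).read_text().splitlines():
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        image = Path(entry)
        if not image.name.startswith(KNOWN_PREFIXES):
            problems.append(f"unexpected prefix: {entry}")
        if not image.is_file():
            problems.append(f"missing image: {entry}")
    return problems
```

Calling check_manifest on each of train.txt, val.txt, and test.txt in the merged folder should return an empty list if the merge completed as described.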

CLI usage

momo.py is also executable as a standalone script via argparse.
python momo.py --input_dir <path> --name <output_name>

Arguments

--input_dir (string, required)
Path to the parent directory containing CSAW/, DMID/, and DDSM/ subdirectories. Passed directly to merge_datasets(input_dir=...).

--name (string, required)
Name for the output merged dataset folder created inside input_dir. Passed directly to merge_datasets(output_name=...).
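
For readers wiring the merge into their own tooling, the mapping from flags to function arguments might look like the sketch below. Only the flag names and the merge_datasets keyword arguments come from this page. The build_parser and main helpers, the description, and the help strings are assumptions, not the actual contents of momo.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with the two required flags documented above."""
    parser = argparse.ArgumentParser(
        description="Merge CSAW, DMID, and DDSM into one dataset."
    )
    parser.add_argument(
        "--input_dir", required=True,
        help="Parent directory containing CSAW/, DMID/, and DDSM/",
    )
    parser.add_argument(
        "--name", required=True,
        help="Name of the merged output folder created inside input_dir",
    )
    return parser


def main(argv=None):
    """Parse the flags and forward them to merge_datasets."""
    args = build_parser().parse_args(argv)
    # Imported lazily so the parser can be exercised without momo.py on the path.
    from momo import merge_datasets
    merge_datasets(input_dir=args.input_dir, output_name=args.name)
```

Because both flags are declared with required=True, argparse exits with a usage error if either one is omitted.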

Example

python momo.py --input_dir /data/mammography --name MammoMix
This command creates /data/mammography/MammoMix/ with the merged directory structure described above.
