MammoMix uses a single YAML file to control every aspect of training: dataset location, model selection, HuggingFace
TrainingArguments, and Weights & Biases logging. Both train.py (YOLOS) and train_detrd.py (Deformable DETR) read the same YAML schema via utils.load_config. The config file is the single source of truth; CLI flags --config, --dataset, and --epoch can override individual values without editing the file.
CLI flags take precedence over YAML values.
--dataset overrides dataset.name and --epoch overrides training.epochs. All other fields must be changed in the YAML file directly.

Full example: config_yolos.yaml
dataset section
Controls which dataset is loaded and how images are pre-processed before entering the model.
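As a sketch, a dataset block consistent with the descriptions below might look like this (the splits_dir path and max_size value are illustrative):

```yaml
dataset:
  name: CSAW            # one of CSAW, DMID, DDSM; also names W&B runs and save directories
  splits_dir: ./splits  # illustrative path; must contain the train/val/test subdirectories
  max_size: 800         # longest image side after resize; use 800 for Deformable DETR
```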
Dataset identifier. Accepted values are CSAW, DMID, and DDSM. This value is passed as dataset_name to BreastCancerDataset and is also used to name W&B runs and model save directories. Overridden by the --dataset CLI flag.

Absolute or relative path to the directory that contains the split .txt files and the image subdirectories. train.py reads {splits_dir}/train, {splits_dir}/val, and {splits_dir}/test from this root.

Maximum image dimension (height and width) used by AutoImageProcessor when resizing and padding. Images are resized so that their longest side equals max_size, then zero-padded to a square of max_size × max_size. Use 800 for Deformable DETR to match its standard input resolution.

model section
HuggingFace Hub model ID loaded by AutoModelForObjectDetection.from_pretrained. Common values:

| Model | ID |
|---|---|
| YOLOS-base | hustvl/yolos-base |
| Deformable DETR | SenseTime/deformable-detr |
| DETR-ResNet-50 | facebook/detr-resnet-50 |
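As a sketch, assuming the Hub ID is stored under a name key (the actual key name is not quoted on this page), the model block could be:

```yaml
model:
  name: hustvl/yolos-base  # key name assumed; any AutoModelForObjectDetection-compatible Hub ID
```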
utils.get_model_type uses this string to determine whether the dataset loader should return YOLOS or DETR-style pixel value tensors.

data section
Controls subdirectory names and DataLoader settings. These values are relative to dataset.splits_dir.
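A hedged sketch of the data block; the split-directory, batch-size, and image-size key names are assumptions (only num_workers appears verbatim elsewhere on this page), and all values are illustrative:

```yaml
data:
  train_dir: train   # key names here are illustrative: split subdirectories under dataset.splits_dir
  val_dir: val
  test_dir: test
  batch_size: 8      # informational; training.batch_size is what reaches TrainingArguments
  num_workers: 4     # set to 0 for Deformable DETR to avoid shared-memory conflicts
  image_size: 800    # pre-resize applied before dataset.max_size
```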
Name of the training split subdirectory under splits_dir.

Name of the validation split subdirectory under splits_dir.

Name of the test split subdirectory under splits_dir. Used only for the final evaluation after training.

DataLoader batch size. Note that training.batch_size is the value actually passed to TrainingArguments; this field is informational and may be used by custom DataLoader construction code.

Number of DataLoader worker processes for the torch.utils.data.DataLoader constructed in train.py. Set to 0 when using Deformable DETR to avoid shared-memory conflicts.

Target image size used within the dataset loader prior to processor resizing. Acts as a pre-resize step before max_size is applied.

training section
All keys under training map directly to HuggingFace TrainingArguments parameters.
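For orientation, a partial training block consistent with the descriptions below. Key names not quoted on this page (e.g. output_dir, warmup_ratio, lr_scheduler_type, save_total_limit, metric_for_best_model) are assumptions based on the corresponding TrainingArguments parameters, and all values are illustrative:

```yaml
training:
  output_dir: ./checkpoints        # intermediate checkpoints; key name assumed from TrainingArguments
  epochs: 50                       # overridable with --epoch
  batch_size: 8
  learning_rate: 0.0001            # Deformable DETR overrides this to 0.0005 in code
  warmup_ratio: 0.05               # 5% of steps warm up to learning_rate; key name assumed
  lr_scheduler_type: cosine        # any HuggingFace SchedulerType string; key name assumed
  evaluation_strategy: epoch
  save_strategy: epoch             # must match evaluation_strategy with load_best_model_at_end
  save_total_limit: 2              # key name assumed
  load_best_model_at_end: true
  metric_for_best_model: eval_map_50   # use eval_loss for Deformable DETR; key name assumed
  greater_is_better: true              # false when ranking by eval_loss; key name assumed
  gradient_accumulation_steps: 4   # effective batch = batch_size × this
  remove_unused_columns: false     # required: dataset returns pixel_values / labels
```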
Directory where HuggingFace Trainer writes intermediate checkpoints. This is separate from the final model save path (../yolos_{DATASET}_{DDMMYY}).

Total number of training epochs. Overridden by the --epoch CLI flag.

Per-device training and evaluation batch size, passed to TrainingArguments as per_device_train_batch_size and per_device_eval_batch_size.

Peak learning rate for the optimizer. The Deformable DETR pipeline overrides this in code to 0.0005; set it explicitly in your config if you want a different value.

L2 regularization coefficient applied to all non-bias parameters.

Fraction of total training steps used for linear learning-rate warmup. 0.05 means 5% of all steps warm up from 0 to learning_rate.

Learning-rate schedule. Accepted values are any HuggingFace SchedulerType string, e.g. cosine, cosine_with_restarts, linear, constant.

Extra keyword arguments forwarded to the scheduler factory. For cosine_with_restarts, set num_cycles to control how many cosine cycles run over the training duration.

When True, evaluation batches are concatenated before metric computation. Set to False (the default) to compute metrics per batch and average, which is more memory efficient.

When to run evaluation. epoch evaluates after every epoch; steps evaluates every eval_steps steps.

When to save checkpoints. Must match evaluation_strategy when load_best_model_at_end=True.

Maximum number of checkpoints to keep on disk. Older checkpoints are deleted automatically. The Deformable DETR config uses 2 to retain the previous-best checkpoint as a safety net.

When to log training metrics. Use steps together with logging_steps for finer-grained W&B curves (the Deformable DETR pipeline logs every 10 steps).

When True, the trainer reloads the best checkpoint at the end of training before saving the final model and running the test evaluation.

Validation metric used to rank checkpoints. Use eval_map_50 for YOLOS (computed by the custom compute_metrics function) and eval_loss for Deformable DETR (since compute_metrics is not attached during training).

Set to True when the best model has the highest metric value (e.g. eval_map_50); set to False when the best model has the lowest value (e.g. eval_loss).

dataloader_num_workers passed to TrainingArguments. Controls the number of worker processes used by the Trainer's internal DataLoader (distinct from data.num_workers).

Number of forward passes over which gradients are accumulated before an optimizer step. Effective batch size = batch_size × gradient_accumulation_steps. The Deformable DETR pipeline hardcodes this to 32.

When False, HuggingFace Trainer passes all dataset columns to the model. Must be False for MammoMix because the dataset returns custom keys (pixel_values, labels) that are not automatically recognized.

logging section
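A sketch of what this section might contain; the key names here are assumptions, not taken from the repository, and values are illustrative:

```yaml
logging:
  use_wandb: true     # key name assumed; enables W&B by adding wandb to report_to
  project: MammoMix   # W&B project grouping all runs for this config; key name assumed
  log_interval: 10    # reference value only; the effective logging_steps is set under training
```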
Enable or disable Weights & Biases integration. When True, report_to is set to include wandb in the training arguments.

W&B project name. All runs for a given config are grouped under this project in the W&B dashboard.

Step interval for logging. Used as a reference value; the actual logging_steps in TrainingArguments is set independently in training.logging_strategy.

wandb section
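A one-key sketch of this section (the key name is an assumption):

```yaml
wandb:
  dir: ./wandb   # key name assumed; passed to TrainingArguments as logging_dir
```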
Local filesystem path where W&B stores run artifacts, offline logs, and sync cache. Passed to TrainingArguments as logging_dir. If omitted, W&B defaults to ./wandb in the working directory.

deformable_detr section
This section is read by train_detrd.py only when loading the model. Only num_queries is currently consumed from this section.
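An illustrative deformable_detr block; the values shown are assumptions (they mirror common Deformable DETR defaults), and only num_queries is actually consumed:

```yaml
deformable_detr:
  num_queries: 100        # the only key train_detrd.py reads; value illustrative
  # Present in config_d_detr.yaml but currently ignored by train_detrd.py:
  num_feature_levels: 4
  dec_n_points: 4
  enc_n_points: 4
  with_box_refine: false
  two_stage: false
```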
Number of object queries for Deformable DETR. Passed to AutoModelForObjectDetection.from_pretrained during model loading in train_detrd.py.

Other keys under deformable_detr (num_feature_levels, dec_n_points, enc_n_points, with_box_refine, two_stage) are present in config_d_detr.yaml but are not read by train_detrd.py. The training hyperparameters for Deformable DETR are hardcoded in create_deformable_training_args and override the training section values.

Top-level seed
Random seed documented in the config file for reference. Note: train.py does not currently read or apply this value; reproducibility of the data split is handled by the fixed random_state=42 in splitting.py.
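The seed key sits at the top level of the file, outside any section; the value below is illustrative:

```yaml
seed: 42   # illustrative; not read by train.py (splitting.py fixes random_state=42 independently)
```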