Paired dataset training requires that every image in clean/ has a corresponding file in degraded/ and that the spatial dimensions of each pair satisfy the expected scale relationship. When a generation run is interrupted (whether by a crash, a manual stop, or a Krita restart), some images may be written to one folder but not the other, or may be written with incorrect dimensions. The fix-images.mjs script scans both folders, identifies every mismatch, and optionally deletes the offending files before you start training.
Why Validation Matters
- Missing pairs: A clean image without a degraded counterpart (or vice versa) will cause most training data loaders to raise a file-not-found error or silently skip batches.
- Scale mismatches: For upscale models, the clean image must be exactly scale times the width and height of the degraded image. If the checkSizes: true option was not active during generation, dimension mismatches can slip through and corrupt a training batch.
- Safe resume: Running fix-images.mjs before re-enabling resume: true ensures the generator picks up from a consistent state rather than re-processing a partially written image.
The fix-images.mjs Script
The script reads the union of filenames from datasets/<name>/clean/ and datasets/<name>/degraded/, then for each filename:
- Checks whether the file exists in both folders. If either is missing, the pair is flagged.
- Reads the pixel dimensions of both files using sharp.
- Verifies that clean.width === degraded.width × scale and clean.height === degraded.height × scale. If either dimension check fails, the pair is flagged.
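The checks above can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the script's actual code: findFlaggedPairs, its parameters, and the reason strings are hypothetical names, and the dimensions are passed in as plain objects instead of being read from disk with sharp.

```javascript
// Hypothetical sketch of the pair checks; not the actual fix-images.mjs code.
// cleanDims and degradedDims map filename -> { width, height }. They are
// assumed to have been collected beforehand (for example via sharp's metadata()).
function findFlaggedPairs(cleanDims, degradedDims, scale = 1) {
  // Union of filenames from both folders.
  const names = new Set([...Object.keys(cleanDims), ...Object.keys(degradedDims)]);
  const flagged = [];

  for (const name of [...names].sort()) {
    const clean = cleanDims[name];
    const degraded = degradedDims[name];

    // Check 1: the file must exist on both sides.
    if (!clean || !degraded) {
      flagged.push({ name, reason: 'missing' });
      continue;
    }

    // Check 2: clean must be exactly `scale` times degraded in both dimensions.
    const widthOk = clean.width === degraded.width * scale;
    const heightOk = clean.height === degraded.height * scale;
    if (!widthOk || !heightOk) {
      flagged.push({ name, reason: 'size', clean, degraded });
    }
  }
  return flagged;
}
```

Keeping the comparison separate from file I/O makes the rule easy to test with hand-written dimensions before pointing it at a real dataset.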
Flags
| Flag | Required | Description |
|---|---|---|
| --dataset <name> | Yes | Name of the dataset folder under datasets/. The script looks for datasets/<name>/clean/ and datasets/<name>/degraded/. |
| --scale <n> | No | Scale factor to use for dimension checks. Auto-detected from the dataset name by matching <n>x or x<n> (e.g., opencomic-ai-upscale-2x → 2). Defaults to 1 if no match is found. |
| --print | No | Print the filename and dimensions of each size-mismatched pair. Does not delete anything on its own. |
| --delete | No | Delete both the clean and degraded files for every flagged pair. |
--print and --delete can be combined: mismatched filenames are printed and then deleted in the same run.
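The scale auto-detection described for --scale might look roughly like the sketch below. The function name detectScale, the exact regular expression, and the rule that an explicit value overrides detection are assumptions made for illustration; the real script's matching may differ.

```javascript
// Hypothetical helper; the real script's matching rules may differ.
function detectScale(datasetName, explicitScale) {
  // Assumption: an explicit --scale value takes precedence over auto-detection.
  if (explicitScale !== undefined) return Number(explicitScale);

  // Match "<n>x" (e.g. "2x") or "x<n>" (e.g. "x4") anywhere in the name.
  const match = datasetName.match(/(\d+)x|x(\d+)/i);
  if (!match) return 1; // documented default when nothing matches

  // Exactly one of the two capture groups is set, depending on which form matched.
  return Number(match[1] ?? match[2]);
}
```

A pattern this loose also matches an unrelated x that happens to sit next to a digit in the dataset name, so passing --scale explicitly is the safer choice whenever the name is ambiguous.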
Inspection-and-Delete Workflow
Run a print-only scan
First, inspect the dataset without deleting anything by running the script with --print. This shows you every size-mismatched filename along with the recorded clean and degraded dimensions.
Review the output
Examine each flagged file. Mismatches fall into two categories:
- Equal dimensions on both sides for an upscale dataset — the degraded image was not downscaled, likely because generation was interrupted mid-step.
- Off-by-one or larger dimension mismatch — the resize kernel produced a slightly different output size for the clean and degraded paths.
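When the printed list is long, sorting flagged pairs into these two categories by hand is tedious. The sketch below shows one way to do it from the printed dimensions. classifyMismatch and its return shape are hypothetical and are not part of fix-images.mjs.

```javascript
// Hypothetical triage helper for one flagged pair; names are illustrative only.
// clean and degraded are { width, height } objects; scale is the expected factor.
function classifyMismatch(clean, degraded, scale) {
  const sameSize =
    clean.width === degraded.width && clean.height === degraded.height;

  // How far each clean dimension is from the expected degraded * scale value.
  const widthError = clean.width - degraded.width * scale;
  const heightError = clean.height - degraded.height * scale;

  // For an upscale dataset (scale > 1), identical sizes mean the degraded
  // image was never downscaled.
  const kind = scale > 1 && sameSize ? 'not-downscaled' : 'dimension-mismatch';

  return { kind, widthError, heightError };
}
```

A widthError or heightError of 1 or -1 points to the off-by-one case, while errors close to a full image dimension usually point to a wrong scale value.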
Confirm that the --scale value is correct before proceeding.
Delete mismatched pairs
Once you are satisfied with the list, re-run with --delete to remove the flagged pairs. Each deleted filename is printed as it is removed.