Validate and Fix Paired Dataset Consistency Issues

Paired dataset training requires that every image in clean/ has a corresponding file in degraded/ and that the spatial dimensions of each pair satisfy the expected scale relationship. When a generation run is interrupted — whether by a crash, a manual stop, or a Krita restart — some images may be written to one folder but not the other, or may be written with incorrect dimensions. The fix-images.mjs script scans both folders, identifies every mismatch, and optionally deletes the offending files before you start training.

Why Validation Matters

Missing pairs: A clean image without a degraded counterpart (or vice versa) can make a training data loader raise a file-not-found error or silently skip the sample.
Scale mismatches: For upscale models, the clean image's width and height must each be exactly scale times the degraded image's width and height. If checkSizes: true was not active during generation, dimension mismatches can slip through and corrupt a training batch.
Safe resume: Running fix-images.mjs before re-enabling resume: true ensures the generator picks up from a consistent state rather than re-processing a partially written image.

The `fix-images.mjs` Script

The script reads the union of filenames from datasets/<name>/clean/ and datasets/<name>/degraded/, then for each filename:

Checks whether the file exists in both folders. If either is missing, the pair is flagged.
Reads the pixel dimensions of both files using sharp.
Verifies that clean.width === degraded.width × scale and clean.height === degraded.height × scale. If either dimension check fails, the pair is flagged.
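
The three checks above can be sketched as a pair of pure functions. This is an illustrative sketch, not the script's actual source: `checkPair` and `findFlagged` are hypothetical names, and the dimensions are passed in as plain objects rather than read from disk with sharp.

```javascript
// Hypothetical helper: classify one filename from the dimensions found in each folder.
// `clean` and `degraded` are { width, height } objects, or undefined when the file is absent.
function checkPair(clean, degraded, scale) {
  if (!clean || !degraded) return 'missing';
  const widthOk = clean.width === degraded.width * scale;
  const heightOk = clean.height === degraded.height * scale;
  return widthOk && heightOk ? 'ok' : 'size-mismatch';
}

// Walk the union of filenames from both folders and collect every flagged pair.
function findFlagged(cleanDims, degradedDims, scale) {
  const names = new Set([...cleanDims.keys(), ...degradedDims.keys()]);
  const flagged = [];
  for (const name of [...names].sort()) {
    const status = checkPair(cleanDims.get(name), degradedDims.get(name), scale);
    if (status !== 'ok') flagged.push({ name, status });
  }
  return flagged;
}

// Made-up input for a 2x upscale dataset.
const cleanDims = new Map([
  ['a.jpg', { width: 640, height: 480 }],
  ['b.jpg', { width: 500, height: 500 }],
  ['c.jpg', { width: 200, height: 100 }],
]);
const degradedDims = new Map([
  ['a.jpg', { width: 320, height: 240 }],
  ['b.jpg', { width: 249, height: 249 }],
  ['d.jpg', { width: 100, height: 50 }],
]);

const flagged = findFlagged(cleanDims, degradedDims, 2);
console.log(flagged);
```

Here `a.jpg` passes, `b.jpg` fails the size check because 249 × 2 is 498 rather than 500, and `c.jpg` and `d.jpg` are flagged because each exists in only one folder.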

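After scanning all files, it prints a summary line and a blank line, where N is the number of flagged files, M is the number of filenames scanned, and P is N as a percentage of M: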

Total mismatched files: N / M (P%)

Flags

| Flag | Required | Description |
| --- | --- | --- |
| `--dataset <name>` | Yes | Name of the dataset folder under `datasets/`. The script looks for `datasets/<name>/clean/` and `datasets/<name>/degraded/`. |
| `--scale <n>` | No | Scale factor to use for dimension checks. Auto-detected from the dataset name by matching `<n>x` or `x<n>` (e.g., `opencomic-ai-upscale-2x` → `2`). Defaults to `1` if no match is found. |
| `--print` | No | Print the filename and dimensions of each size-mismatched pair. Does not delete anything on its own. |
| `--delete` | No | Delete both the clean and degraded files for every flagged pair. |
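
The scale auto-detection described for `--scale` might look roughly like the sketch below. The function name `detectScale` and the exact regular expression are assumptions made for illustration; the script's real pattern may differ, for example in how it treats case or surrounding characters.

```javascript
// Hypothetical sketch of scale auto-detection from a dataset name.
// Matches "<n>x" (e.g. "2x") or "x<n>" (e.g. "x4"); falls back to 1 when neither appears.
function detectScale(datasetName) {
  const match = /(\d+)x|x(\d+)/.exec(datasetName);
  if (!match) return 1;
  // Exactly one of the two capture groups is set, depending on which form matched.
  return Number(match[1] ?? match[2]);
}

console.log(detectScale('opencomic-ai-upscale-2x'));        // 2
console.log(detectScale('opencomic-ai-artifact-removal'));  // 1
```

A pattern this loose would also match an unrelated digit next to an `x` elsewhere in a name, which is one reason to pass `--scale` explicitly when the dataset name is unusual.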

--print and --delete can be combined: mismatched filenames are printed and then deleted in the same run.

--delete is irreversible. The files are removed with fs.unlinkSync and cannot be recovered unless you have a backup. Run with --print alone first and review the list before passing --delete.
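
Because `--delete` removes both files of every flagged pair, it can help to list the affected paths first so they can be backed up. The sketch below is a hypothetical dry-run helper, not part of fix-images.mjs; `planDeletion` is an invented name and the paths simply follow the `datasets/<name>/clean/` and `datasets/<name>/degraded/` layout described above.

```javascript
// Hypothetical dry-run helper: list every path a --delete run would be expected to remove.
function planDeletion(datasetName, flaggedNames) {
  const base = `datasets/${datasetName}`;
  // Each flagged filename contributes two paths: the clean file and the degraded file.
  return flaggedNames.flatMap((name) => [
    `${base}/clean/${name}`,
    `${base}/degraded/${name}`,
  ]);
}

const paths = planDeletion('opencomic-ai-upscale-2x', ['00001423.jpg', '00001891.jpg']);
console.log(paths.join('\n'));
```

Copying the listed files elsewhere before the real run gives you the backup that the warning above calls for.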

Inspection-and-Delete Workflow

Run a print-only scan

First, inspect the dataset without deleting anything. This shows you every size-mismatched filename along with the recorded clean and degraded dimensions:

node fix-images.mjs --dataset opencomic-ai-upscale-2x --print

Example output:

Dataset : opencomic-ai-upscale-2x
Scale   : 2
Print   : true
Delete  : false

Mismatched size: 00001423.jpg (clean: 640x480, degraded: 640x480)
Mismatched size: 00001891.jpg (clean: 500x500, degraded: 249x249)

Total mismatched files: 2 / 98432 (0.00%)

Review the output

Examine each flagged file. Mismatches fall into two categories:

Equal dimensions on both sides for an upscale dataset: the degraded image was not downscaled, likely because generation was interrupted mid-step.
Off-by-one or larger dimension mismatch: the resize step produced clean and degraded sizes that do not satisfy the scale relationship exactly, as with 500x500 against 249x249 at scale 2 in the output above.
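
To make the two categories concrete, the sketch below applies the scale arithmetic to the two files from the example output. `describeMismatch` is a hypothetical helper written for this illustration; the script itself only prints the raw dimensions.

```javascript
// Hypothetical helper: explain why a pair fails the scale check.
function describeMismatch(clean, degraded, scale) {
  const expected = { width: degraded.width * scale, height: degraded.height * scale };
  if (clean.width === expected.width && clean.height === expected.height) return 'ok';
  const sameSize = clean.width === degraded.width && clean.height === degraded.height;
  // Category 1: an upscale pair whose degraded side was never downscaled.
  if (scale > 1 && sameSize) return 'degraded image was not downscaled';
  // Category 2: any other deviation from the expected clean size.
  return `expected clean ${expected.width}x${expected.height}, found ${clean.width}x${clean.height}`;
}

// 00001423.jpg from the example output
console.log(describeMismatch({ width: 640, height: 480 }, { width: 640, height: 480 }, 2));
// 00001891.jpg from the example output
console.log(describeMismatch({ width: 500, height: 500 }, { width: 249, height: 249 }, 2));
```

The second call reports an expected clean size of 498x498, which shows that the pair is two pixels off in each dimension.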

If the flagged files are a small fraction of the total (as in the example above), deleting them removes little training data. If a large percentage is flagged, check that the Scale value shown in the header matches the dataset, and pass --scale explicitly if it does not, before deleting anything.

Delete mismatched pairs

Once you are satisfied with the list, re-run with --delete to remove the flagged pairs:

node fix-images.mjs --dataset opencomic-ai-upscale-2x --scale 2 --delete

Each deleted filename is printed as it is removed:

Deleted: 00001423.jpg
Deleted: 00001891.jpg

Total mismatched files: 2 / 98432 (0.00%)

Verify the dataset is clean

Run the print-only scan one more time to confirm zero mismatches remain:

node fix-images.mjs --dataset opencomic-ai-upscale-2x --print

Total mismatched files: 0 / 98430 (0.00%)
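
If you want to automate this final check, one option is to parse the summary line and test the count directly. The sketch below assumes the summary always has the exact shape shown in this guide; `parseSummary` is a hypothetical helper, not something the script provides.

```javascript
// Hypothetical helper: extract the numbers from a "Total mismatched files" summary line.
function parseSummary(line) {
  const match = /^Total mismatched files: (\d+) \/ (\d+) \((\d+(?:\.\d+)?)%\)$/.exec(line.trim());
  if (!match) return null; // line does not look like a summary
  return {
    mismatched: Number(match[1]),
    total: Number(match[2]),
    percent: Number(match[3]),
  };
}

const before = parseSummary('Total mismatched files: 2 / 98432 (0.00%)');
const after = parseSummary('Total mismatched files: 0 / 98430 (0.00%)');

// Compare the count, not the percentage: both runs display 0.00%.
console.log(before.mismatched === 0); // false
console.log(after.mismatched === 0);  // true
```

The percentage is rounded to two decimals, so a handful of mismatches in a large dataset still displays as 0.00%. Only the count N tells you the dataset is clean.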

Example Commands

# Inspect an upscale-2x dataset (scale auto-detected from name)
node fix-images.mjs --dataset opencomic-ai-upscale-2x --print

# Delete mismatched pairs from an upscale-2x dataset with explicit scale
node fix-images.mjs --dataset opencomic-ai-upscale-2x --scale 2 --delete

# Inspect an artifact-removal dataset (scale 1 — clean and degraded are the same size)
node fix-images.mjs --dataset opencomic-ai-artifact-removal --scale 1 --print

# Inspect and delete in one pass for a descreen dataset
node fix-images.mjs --dataset opencomic-ai-descreen-hard --scale 1 --print --delete
