Paired dataset training requires that every image in clean/ has a corresponding file in degraded/ and that the spatial dimensions of each pair satisfy the expected scale relationship. When a generation run is interrupted — whether by a crash, a manual stop, or a Krita restart — some images may be written to one folder but not the other, or may be written with incorrect dimensions. The fix-images.mjs script scans both folders, identifies every mismatch, and optionally deletes the offending files before you start training.

Why Validation Matters

  • Missing pairs — A clean image without a degraded counterpart (or vice versa) will cause most training data loaders to raise a file-not-found error or silently skip batches.
  • Scale mismatches — For upscale models, the clean image must be exactly scale times the width and height of the degraded image. If checkSizes: true was not active during generation, dimension mismatches can slip through and corrupt a training batch.
  • Safe resume — Running fix-images.mjs before re-enabling resume: true ensures the generator picks up from a consistent state rather than re-processing a partially written image.
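The missing-pair case can be illustrated with a small sketch. This is not code from fix-images.mjs; `findUnpaired` is a hypothetical helper that assumes the filename lists of clean/ and degraded/ have already been read (for example with `fs.readdirSync`).

```javascript
// Hypothetical helper, not part of fix-images.mjs.
// Given the filenames found in clean/ and degraded/, report files without a partner.
function findUnpaired(cleanNames, degradedNames) {
  const clean = new Set(cleanNames);
  const degraded = new Set(degradedNames);
  return {
    // Present in clean/ but absent from degraded/
    missingDegraded: [...clean].filter((name) => !degraded.has(name)).sort(),
    // Present in degraded/ but absent from clean/
    missingClean: [...degraded].filter((name) => !clean.has(name)).sort(),
  };
}

// Illustrative filenames only.
const report = findUnpaired(
  ["00000001.jpg", "00000002.jpg", "00000003.jpg"],
  ["00000001.jpg", "00000003.jpg", "00000004.jpg"],
);
console.log(report);
```

Either list being non-empty means a data loader that expects strict pairing could fail or skip samples on those files.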

The fix-images.mjs Script

The script reads the union of filenames from datasets/<name>/clean/ and datasets/<name>/degraded/, then for each filename:
  1. Checks whether the file exists in both folders. If either is missing, the pair is flagged.
  2. Reads the pixel dimensions of both files using sharp.
  3. Verifies that clean.width === degraded.width × scale and clean.height === degraded.height × scale. If either dimension check fails, the pair is flagged.
After scanning all files, it prints a summary line and a blank line:
Total mismatched files: N / M (P%)
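The dimension check and summary line described above can be sketched as follows. This is an approximation rather than the script's actual source: `isSizeMismatch` and `formatSummary` are hypothetical names, the dimensions are assumed to have been read already (the script uses sharp for that), and the two-decimal percentage is inferred from the example output on this page.

```javascript
// Sketch only: function names are illustrative, not taken from fix-images.mjs.
// Each argument is an object of the form { width, height } in pixels.
function isSizeMismatch(clean, degraded, scale) {
  return (
    clean.width !== degraded.width * scale ||
    clean.height !== degraded.height * scale
  );
}

// Assumption: the percentage is printed with two decimals, as in the example output.
function formatSummary(mismatched, total) {
  const percent = total === 0 ? 0 : (mismatched / total) * 100;
  return `Total mismatched files: ${mismatched} / ${total} (${percent.toFixed(2)}%)`;
}

// Illustrative pairs for a 2x dataset.
const pairs = [
  { clean: { width: 1280, height: 960 }, degraded: { width: 640, height: 480 } },
  { clean: { width: 640, height: 480 }, degraded: { width: 640, height: 480 } },
  { clean: { width: 500, height: 500 }, degraded: { width: 249, height: 249 } },
];
const flagged = pairs.filter((p) => isSizeMismatch(p.clean, p.degraded, 2)).length;
console.log(formatSummary(flagged, pairs.length));
```

With scale 1 the same check reduces to requiring identical dimensions on both sides.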

Flags

| Flag | Required | Description |
| --- | --- | --- |
| --dataset <name> | Yes | Name of the dataset folder under datasets/. The script looks for datasets/<name>/clean/ and datasets/<name>/degraded/. |
| --scale <n> | No | Scale factor to use for dimension checks. If omitted, it is auto-detected from the dataset name by matching <n>x or x<n> (e.g., opencomic-ai-upscale-2x yields 2). Defaults to 1 if no match is found. |
| --print | No | Print the filename and dimensions of each size-mismatched pair. Does not delete anything on its own. |
| --delete | No | Delete both the clean and degraded files for every flagged pair. |
--print and --delete can be combined: mismatched filenames are printed and then deleted in the same run.
Warning: --delete is irreversible. The files are removed with fs.unlinkSync and cannot be recovered unless you have a backup. Always run --print first to review what will be deleted before passing --delete.
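The scale auto-detection described for --scale can be approximated with a regular expression. The exact pattern used by fix-images.mjs is not shown on this page, so the sketch below is an assumption: `detectScale` and `resolveScale` are hypothetical names, and the rule that an explicit --scale overrides the detected value is inferred from the flag description.

```javascript
// Sketch under assumptions: the real script's pattern may differ.
// Looks for "<n>x" or "x<n>" anywhere in the dataset name; falls back to 1.
function detectScale(datasetName) {
  const match = /(\d+)x|x(\d+)/i.exec(datasetName);
  if (!match) return 1;
  // Group 1 is set for "<n>x", group 2 for "x<n>".
  return Number(match[1] ?? match[2]);
}

// Assumption: an explicit --scale value takes precedence over auto-detection.
function resolveScale(datasetName, explicitScale) {
  return explicitScale === undefined ? detectScale(datasetName) : explicitScale;
}

console.log(detectScale("opencomic-ai-upscale-2x"));
console.log(detectScale("opencomic-ai-artifact-removal"));
console.log(resolveScale("opencomic-ai-descreen-hard", 1));
```

Because detection depends on the folder name, passing --scale explicitly is the safer choice for datasets whose names contain digits next to an "x" for unrelated reasons.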

Inspection-and-Delete Workflow

1. Run a print-only scan

First, inspect the dataset without deleting anything. This shows you every size-mismatched filename along with the recorded clean and degraded dimensions:
node fix-images.mjs --dataset opencomic-ai-upscale-2x --print
Example output:
Dataset : opencomic-ai-upscale-2x
Scale   : 2
Print   : true
Delete  : false

Mismatched size: 00001423.jpg (clean: 640x480, degraded: 640x480)
Mismatched size: 00001891.jpg (clean: 500x500, degraded: 249x249)

Total mismatched files: 2 / 98432 (0.00%)

2. Review the output

Examine each flagged file. Mismatches fall into two categories:
  • Equal dimensions on both sides for an upscale dataset — the degraded image was not downscaled, likely because generation was interrupted mid-step.
  • Off-by-one or larger dimension mismatch — the resize kernel produced a slightly different output size for the clean and degraded paths.
If the mismatch count is a small fraction of the total (as in the example above), deletion is safe. If a large percentage is flagged, investigate whether the --scale value is correct before proceeding.
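When the flagged list is long, sorting entries into the two categories above by hand gets tedious. The following is a minimal sketch of how that triage could be automated; `classifyMismatch` and its category labels are hypothetical and are not produced by fix-images.mjs.

```javascript
// Hypothetical triage helper; labels are illustrative, not script output.
// clean and degraded are { width, height } objects; scale is the expected factor.
function classifyMismatch(clean, degraded, scale) {
  const expectedWidth = degraded.width * scale;
  const expectedHeight = degraded.height * scale;
  if (clean.width === expectedWidth && clean.height === expectedHeight) {
    return "ok";
  }
  // Same size on both sides of an upscale dataset: degraded was never downscaled.
  if (
    scale !== 1 &&
    clean.width === degraded.width &&
    clean.height === degraded.height
  ) {
    return "not-downscaled";
  }
  // Anything else: the two paths produced sizes that do not line up.
  return "size-drift";
}

// Dimensions taken from the example output in step 1.
console.log(classifyMismatch({ width: 640, height: 480 }, { width: 640, height: 480 }, 2));
console.log(classifyMismatch({ width: 500, height: 500 }, { width: 249, height: 249 }, 2));
```

If nearly every flagged pair lands in the same category, that points to a wrong --scale value or a systematic generation problem rather than isolated interrupted writes.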

3. Delete mismatched pairs

Once you are satisfied with the list, re-run with --delete to remove the flagged pairs:
node fix-images.mjs --dataset opencomic-ai-upscale-2x --scale 2 --delete
Each deleted filename is printed as it is removed:
Deleted: 00001423.jpg
Deleted: 00001891.jpg

Total mismatched files: 2 / 98432 (0.00%)

4. Verify the dataset is clean

Run the print-only scan one more time to confirm zero mismatches remain:
node fix-images.mjs --dataset opencomic-ai-upscale-2x --print
Total mismatched files: 0 / 98430 (0.00%)
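If this verification step is part of an automated pipeline, the summary line can be parsed so that training only starts when the count is zero. The sketch below assumes the summary format shown in the example output; `parseSummary` is a hypothetical helper and not something the script provides.

```javascript
// Hypothetical parser for the summary line; assumes the format shown above.
function parseSummary(line) {
  const match = /^Total mismatched files: (\d+) \/ (\d+) \(([\d.]+)%\)$/.exec(line.trim());
  if (!match) return null; // Not a summary line
  return { mismatched: Number(match[1]), total: Number(match[2]) };
}

const summary = parseSummary("Total mismatched files: 0 / 98430 (0.00%)");
if (summary && summary.mismatched === 0) {
  console.log("Dataset is consistent; safe to start training.");
} else {
  console.log("Mismatches remain; re-run the scan before training.");
}
```

Checking the parsed count rather than the printed percentage avoids treating a small non-zero number of mismatches, which also rounds to 0.00%, as a clean dataset.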

Example Commands

# Inspect an upscale-2x dataset (scale auto-detected from name)
node fix-images.mjs --dataset opencomic-ai-upscale-2x --print

# Delete mismatched pairs from an upscale-2x dataset with explicit scale
node fix-images.mjs --dataset opencomic-ai-upscale-2x --scale 2 --delete

# Inspect an artifact-removal dataset (scale 1 — clean and degraded are the same size)
node fix-images.mjs --dataset opencomic-ai-artifact-removal --scale 1 --print

# Inspect and delete in one pass for a descreen dataset
node fix-images.mjs --dataset opencomic-ai-descreen-hard --scale 1 --print --delete
