OpenComic AI Training Generated Dataset Folder Structure

Every dataset generated by OpenComic AI Training follows a fixed layout under a named root folder: a clean/ directory, a degraded/ directory, and an optional options/ directory. The structure is deliberately minimal and compatible with any training framework that accepts paired image datasets, including traiNNer-redux, which is used to train the OpenComic AI models.

Directory Tree

datasets/
└── <dataset-name>/
    ├── clean/
    │   ├── 0000000001-0001.jpg
    │   ├── 0000000001-0002.jpg
    │   ├── 0000000002-0001.jpg
    │   └── ...
    ├── degraded/
    │   ├── 0000000001-0001.jpg
    │   ├── 0000000001-0002.jpg
    │   ├── 0000000002-0001.jpg
    │   └── ...
    └── options/
        ├── 0000000001-0001.json
        ├── 0000000001-0002.json
        ├── 0000000002-0001.json
        └── ...

Subdirectories

`clean/`

Contains the ground-truth images rendered at full resolution by Krita. For upscaling tasks these are the high-resolution targets that the model learns to produce. For artifact removal and descreening tasks the clean images share the same resolution as the degraded images — they are simply the version of the canvas before halftone or compression was applied.

`degraded/`

Contains the synthetically degraded input images. Depending on the options preset in use:

Upscaling: the width and height of the degraded image are those of the clean image multiplied by scale. For example, with scale: 0.5 (2× upscale), a 1000×1000 clean image is paired with a 500×500 degraded image.
Artifact removal / descreening: the degraded image has the same resolution as the clean image. Quality is reduced through JPEG/WebP/AVIF/JXL compression, blur, or halftone patterns rather than a spatial downscale.

`options/`

Contains one JSON file per image pair, capturing the fully resolved (randomized) options that were used to generate that specific pair. This enables exact reproduction of any individual training sample and is useful for debugging unexpected model behavior. The options/ directory is only written when output.options is set in the degradation block.

The options/ directory can grow large for datasets with many images and a high degradedImagesPerCleanImage count. If disk space is a concern, omit the output.options key from your options file.
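As a minimal sketch of how the recorded options for one pair might be looked up while debugging, the helper below maps an image filename to its JSON file in options/. The function name `load_pair_options` is hypothetical, and the JSON schema is not documented here, so the function simply returns whatever the file parses to.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_pair_options(dataset_root: str, image_name: str) -> Optional[Any]:
    """Return the parsed options for one image pair, or None if none were recorded.

    `image_name` is the filename shared by the clean and degraded images,
    for example "0000000001-0001.jpg". Hypothetical helper, not part of the tool.
    """
    stem = Path(image_name).stem  # "0000000001-0001"
    options_path = Path(dataset_root) / "options" / (stem + ".json")
    if not options_path.is_file():
        # options/ is absent when output.options was omitted from the options file.
        return None
    with options_path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

Returning None instead of raising keeps the helper usable on datasets that were generated without output.options.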

File Naming Convention

Every file uses a zero-padded two-part numeric name:

<image-number>-<degradation-index>.<format>

<image-number>: 10-digit zero-padded index of the clean Krita render (e.g., 0000000001).
<degradation-index>: 4-digit zero-padded index of the degraded variant derived from that render (e.g., 0001).
<format>: jpg or png for the images in clean/ and degraded/, as set by the top-level format key. Files in options/ use the same two-part name with a .json extension.
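The naming scheme can be sketched in a few lines of Python. The helper names `format_name` and `parse_name` are hypothetical and are not part of OpenComic AI Training. They only illustrate the 10-digit and 4-digit zero padding described above.

```python
import re
from typing import Tuple

# <image-number>-<degradation-index>.<format>, with 10 and 4 digits respectively.
NAME_PATTERN = re.compile(r"([0-9]{10})-([0-9]{4})\.([A-Za-z0-9]+)")


def format_name(image_number: int, degradation_index: int, fmt: str) -> str:
    """Build a dataset filename from its two numeric parts and an extension."""
    return f"{image_number:010d}-{degradation_index:04d}.{fmt}"


def parse_name(filename: str) -> Tuple[int, int, str]:
    """Split a dataset filename into (image_number, degradation_index, format)."""
    match = NAME_PATTERN.fullmatch(filename)
    if match is None:
        raise ValueError(f"not a dataset filename: {filename!r}")
    return int(match.group(1)), int(match.group(2)), match.group(3)


print(format_name(1, 2, "jpg"))           # 0000000001-0002.jpg
print(parse_name("0000000002-0001.jpg"))  # (2, 1, 'jpg')
```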

For example, image 7, degraded variant 3 → 0000000007-0003.jpg. The clean and degraded files for a given pair share the same name, so a training dataloader can pair them by filename without any manifest file.
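Pairing by filename could look roughly like the following. This is a sketch that assumes the default clean/ and degraded/ directory names under one dataset root. `list_pairs` is a hypothetical helper, not an API of the generator or of any training framework.

```python
from pathlib import Path
from typing import List, Tuple


def list_pairs(dataset_root: str) -> List[Tuple[Path, Path]]:
    """Return (clean_path, degraded_path) tuples for files present in both folders."""
    root = Path(dataset_root)
    clean_dir = root / "clean"
    degraded_dir = root / "degraded"
    pairs = []
    for clean_path in sorted(clean_dir.iterdir()):
        # The degraded counterpart has exactly the same filename.
        degraded_path = degraded_dir / clean_path.name
        if clean_path.is_file() and degraded_path.is_file():
            pairs.append((clean_path, degraded_path))
    return pairs
```

Files that exist in only one of the two folders are skipped silently here. A real loader may prefer to report them.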

Configuring Output Paths

Output paths are declared per-degradation inside the degradations list of the options file. Each degradation block has its own independent output sub-object, which means a single options file can write to multiple datasets simultaneously.

degradations:
  - name: opencomic-ai-artifact-removal
    output:
      clean:    datasets/opencomic-ai-artifact-removal/clean
      degraded: datasets/opencomic-ai-artifact-removal/degraded
      options:  datasets/opencomic-ai-artifact-removal/options
    inKrita:
      # ...
    inNode:
      # ...

All three output directories (clean, degraded, options) are created automatically if they do not already exist. You do not need to create them manually before running the generator.

Clean vs. Degraded Size Relationship

The spatial relationship between clean and degraded images depends on whether the resize step in inNode uses both: false (resize degraded only) or both: true (resize both):

| Task | `both` | Clean size | Degraded size |
| --- | --- | --- | --- |
| Upscale 2× | `false` | 1000×1000 | 500×500 |
| Artifact removal | `true` | 700×700 | 700×700 |
| Descreen | `true` | 700×700 | 700×700 |

For upscaling presets, enable the checkSizes flag to discard pairs whose clean and degraded dimensions do not satisfy the expected ratio. The ratio is taken from base.size.multiple: a pair passes only when clean width == degraded width × multiple, and likewise for height. This is useful when different resize kernels are used for the clean and degraded images, which can occasionally produce slightly different output sizes:

degradations:
  - name: opencomic-ai-upscale-2x
    output:
      clean:    datasets/opencomic-ai-upscale-2x/clean
      degraded: datasets/opencomic-ai-upscale-2x/degraded
    inNode:
      - type: resize
        prob: 1
        both: false      # Only the degraded image is downscaled
        scale: 0.5
        kernel: [linear, cubic, mitchell, lanczos2, lanczos3]
    checkSizes: true     # Discard pairs where clean != degraded × base.size.multiple
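The condition that checkSizes enforces, as described above, can be written as a small predicate. This is an illustrative sketch rather than the generator's own code. `sizes_match` is a hypothetical name, and the `multiple` argument stands in for base.size.multiple.

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height) in pixels


def sizes_match(clean: Size, degraded: Size, multiple: int) -> bool:
    """True when the clean size equals the degraded size times `multiple` on both axes."""
    clean_w, clean_h = clean
    degraded_w, degraded_h = degraded
    return clean_w == degraded_w * multiple and clean_h == degraded_h * multiple


print(sizes_match((1000, 1000), (500, 500), 2))  # True
print(sizes_match((1001, 1000), (500, 500), 2))  # False
```

The second call shows the kind of off-by-one width that would cause a pair to be discarded.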

Validating an Existing Dataset

Use the bundled fix-images.mjs script to verify that all paired images have the expected scale factor and that no files are missing from either the clean or degraded folder:

node fix-images.mjs --dataset opencomic-ai-upscale-2x --scale 2 --print

Pass --delete to automatically remove any unmatched files.
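For readers who want to see what the missing-file check amounts to, here is a rough Python sketch. It is not the implementation of fix-images.mjs. `find_unmatched` is a hypothetical helper that covers only the unmatched-filename check and leaves out the scale-factor check, which would need an image library to read dimensions.

```python
from pathlib import Path
from typing import Set, Tuple


def find_unmatched(dataset_root: str) -> Tuple[Set[str], Set[str]]:
    """Return (names only in clean/, names only in degraded/)."""
    root = Path(dataset_root)
    clean_names = {p.name for p in (root / "clean").iterdir() if p.is_file()}
    degraded_names = {p.name for p in (root / "degraded").iterdir() if p.is_file()}
    # Set difference leaves the files that have no counterpart in the other folder.
    return clean_names - degraded_names, degraded_names - clean_names
```

Both returned sets are empty for a dataset in which every clean image has a degraded counterpart and vice versa.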
