OpenComic AI Training: Ethical Dataset Generation Pipeline

OpenComic AI Training is a Node.js-based pipeline that procedurally generates paired training datasets inside Krita for training comic-focused AI enhancement models. Every image pair is synthesized from scratch at runtime, with no real comic pages and no scraped artwork, so the datasets are ethical and reproducible by design.

All training data is generated procedurally and synthetically inside Krita using randomized drawing routines. No real comics, scanned pages, or copyrighted artwork are ever used. This makes OpenComic AI Training safe for open-source model distribution and downstream commercial use.

What Is OpenComic AI Training?

OpenComic AI Training is the dataset generation layer of the OpenComic AI project. It drives Krita programmatically through the kra-remote plugin, instructing it to procedurally render synthetic comic-style artwork and then apply configurable degradations to produce matched clean / degraded image pairs. Those pairs can be fed directly into any paired-image training framework. The project ships ready-made options for traiNNer-redux, but the output format is framework-agnostic.

The generator is configured through YAML options files called presets. A single preset fully describes what kind of artwork to draw, what degradations to apply, how many image pairs to produce, and where to write the output.
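To make the four responsibilities of a preset concrete, here is a minimal TypeScript sketch of what a preset might look like after the YAML has been parsed. Every field name (`artwork`, `degradations`, `count`, `outputDir`) and the `validatePreset` helper are hypothetical, chosen only for illustration; the project's actual preset schema may differ.

```typescript
// Hypothetical shape of a parsed preset. Field names are illustrative
// assumptions, not the project's actual schema.
interface Preset {
  artwork: { style: string };                          // what kind of artwork to draw
  degradations: { name: string; strength: number }[];  // what degradations to apply
  count: number;                                       // how many image pairs to produce
  outputDir: string;                                   // where to write the output
}

// Returns a list of human-readable problems; an empty list means the
// preset passes these basic sanity checks.
function validatePreset(preset: Preset): string[] {
  const problems: string[] = [];
  if (preset.artwork.style.trim() === "") {
    problems.push("artwork.style is empty");
  }
  if (!Number.isInteger(preset.count) || preset.count <= 0) {
    problems.push("count must be a positive integer");
  }
  if (preset.outputDir.trim() === "") {
    problems.push("outputDir is empty");
  }
  for (const d of preset.degradations) {
    if (d.strength < 0 || d.strength > 1) {
      problems.push(`degradation "${d.name}" strength must be in [0, 1]`);
    }
  }
  return problems;
}

// Illustrative values only.
const examplePreset: Preset = {
  artwork: { style: "flat-color-panels" },
  degradations: [{ name: "jpeg", strength: 0.6 }],
  count: 100,
  outputDir: "datasets/example",
};

console.log(validatePreset(examplePreset));
```

Checking a preset before Krita is launched is a cheap way to fail early on a long generation run.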

The Three Supported Tasks

The pipeline ships presets for three distinct model tasks, each targeting a different class of comic image quality problem:

Artifact Removal

Trains models to clean up compression artifacts, banding, ringing, and other encoding noise commonly found in JPEG- or WebP-compressed comic pages. The degradation pipeline adds layered compression artifacts to clean synthetic panels.

Descreening (Halftone Removal)

Trains models to remove halftone dot patterns, the regular grids of colored or grayscale dots used in traditional print comics. Presets cover hard halftone patterns at fixed angles, hard patterns at any angle, and moiré-only scenarios. Removing halftone patterns without smearing edge detail or destroying linework is notoriously difficult; the synthetic pairs give the model an exact clean target for every halftoned input.

Upscaling (2×, 3×, 4×)

Trains super-resolution models to enlarge comic pages at 2×, 3×, or 4× scale while preserving linework sharpness, flat color regions, and halftone structure. Separate presets are provided for each scale factor, and the trained models chain naturally: the 3× and 4× models are pretrained from the 2× checkpoint.
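Paired super-resolution training generally expects the clean image to be exactly the scale factor times the degraded image in each dimension. The sketch below shows one way to derive matching sizes for the 2×, 3×, and 4× presets. The `pairSizes` helper and the crop-to-multiple strategy are assumptions made for illustration, not a description of how the generator itself sizes its canvases.

```typescript
type Scale = 2 | 3 | 4;

interface Size {
  width: number;
  height: number;
}

// Given a desired clean (high-resolution) size and a scale factor, return
// a clean size trimmed to a multiple of the scale and the matching
// degraded (low-resolution) size.
function pairSizes(clean: Size, scale: Scale): { clean: Size; degraded: Size } {
  const degraded: Size = {
    width: Math.floor(clean.width / scale),
    height: Math.floor(clean.height / scale),
  };
  return {
    clean: { width: degraded.width * scale, height: degraded.height * scale },
    degraded,
  };
}

// A 1000x1500 canvas at 3x is trimmed so both sides divide evenly by 3.
console.log(pairSizes({ width: 1000, height: 1500 }, 3));
```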

Pipeline Components

The full generation pipeline involves four moving parts working in sequence:

Component	Role
YAML preset file	Declares artwork style, degradation parameters, image count, and output paths
Node.js generator (`npm run generate`)	Reads the preset, drives Krita over D-Bus via kra-remote, and writes image pairs to disk
Krita + kra-remote plugin	Executes drawing and filter commands issued by the generator; renders and exports PNG frames
`datasets/` output directory	Receives the finished `clean/`, `degraded/`, and `options/` folders ready for training
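A paired dataset is only usable when every image in `clean/` has a counterpart in `degraded/`. The following sketch checks that property for two lists of filenames. It assumes that matched images share the same filename in both folders, which is a common convention for paired training data but is not confirmed here for this project; `findUnpaired` is a hypothetical helper.

```typescript
interface UnpairedReport {
  missingDegraded: string[]; // present in clean/, absent from degraded/
  missingClean: string[];    // present in degraded/, absent from clean/
}

// Compare two filename lists and report entries without a counterpart.
// Assumes (hypothetically) that paired images share the same filename.
function findUnpaired(cleanFiles: string[], degradedFiles: string[]): UnpairedReport {
  const clean = new Set(cleanFiles);
  const degraded = new Set(degradedFiles);
  return {
    missingDegraded: cleanFiles.filter((name) => !degraded.has(name)),
    missingClean: degradedFiles.filter((name) => !clean.has(name)),
  };
}

// Illustrative filenames only.
const report = findUnpaired(
  ["0001.png", "0002.png", "0003.png"],
  ["0001.png", "0003.png", "0004.png"],
);
console.log(report);
```

In practice the two lists could come from reading each directory with Node's `fs.readdirSync` before training starts.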

The generator is written in TypeScript and compiled to ESM + CJS via tsc and rollup. Krita is launched as a subprocess (AppImage or system install) and communicates with the generator through the kra-remote D-Bus plugin. Krita is automatically restarted every N images (default: 20) to guard against memory leaks during long runs.
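The periodic restart can be pictured as splitting a run into batches, with one Krita session per batch. The sketch below computes those batch sizes for a given image count. `planBatches` is a hypothetical helper written for this illustration; only the default of 20 images per session comes from the description above.

```typescript
// Split a run of `total` images into per-session batch sizes, starting a
// fresh Krita session after every `restartEvery` images.
function planBatches(total: number, restartEvery = 20): number[] {
  if (!Number.isInteger(total) || total < 0) {
    throw new RangeError("total must be a non-negative integer");
  }
  if (!Number.isInteger(restartEvery) || restartEvery <= 0) {
    throw new RangeError("restartEvery must be a positive integer");
  }
  const batches: number[] = [];
  for (let remaining = total; remaining > 0; remaining -= restartEvery) {
    batches.push(Math.min(remaining, restartEvery));
  }
  return batches;
}

// 45 images with the default interval need three sessions.
console.log(planBatches(45)); // [ 20, 20, 5 ]
```

A real generator would launch Krita at the start of each batch and shut it down at the end, so memory held by one session cannot accumulate into the next.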

Where to Go Next

Quickstart

Install dependencies, build the project, and generate your first dataset in under 10 minutes using the built-in upscale-2x preset.

System Requirements

Review OS, Krita version, plugin, and Node.js requirements before you begin.

Pipeline Overview

Understand how the generator, Krita, and the kra-remote plugin fit together end to end.

Models Overview

Explore the trained artifact removal, descreen, and upscale models produced from these datasets.
