Zap pipelines follow a creative grammar — each recipe is a directed, ordered sequence of typed steps that carry media from first frame to final artifact. Steps are not arbitrary scripts; they correspond to real provider capabilities (image generation, video animation, upscaling, audio synthesis, and composition). Because every step has an explicit kind, the Zap planner can quote costs before any provider call is made and route each step to the right adapter automatically.

Creative Grammar

The canonical pattern for a generative video recipe is:
InitialFrame -> InitialGen -> InitialGenReViz? -> ExtendGen × N -> Zap.mp4
  1. InitialFrame — generate or supply a reference image that anchors the visual identity.
  2. InitialGen — animate the frame into a base video clip.
  3. InitialGenReViz (optional) — revise or upscale the initial clip before extending.
  4. ExtendGen × N — chain one or more video.extend steps to grow duration.
  5. Zap.mp4 — a stitch step assembles all clips into the final artifact.

Step Kinds

Zap supports 11 step kinds, covering the full media production stack:
| Kind | Description |
| --- | --- |
| image.gen | Create a first frame, storyboard, character sheet, or any reference image from a text prompt or existing inputs. |
| image.edit | Transform an input image while preserving subject identity; useful for style transfer or inpainting. |
| video.gen | Animate image or prompt inputs into a video clip. |
| video.extend | Continue a clip forward from its last frame. Supports repeat to chain multiple extensions. |
| video.edit | Revise an existing clip using a prompt or composition layer. |
| video.upscale | Produce a higher-resolution version of a clip. |
| audio.tts | Generate voiceover narration from a text prompt. |
| audio.music | Generate a music track from a style or lyric prompt. |
| audio.sfx | Generate sound effects to layer into the video. |
| keyframes | Extract, score, or prepare frames for the next step in the pipeline. |
| stitch | Combine all upstream assets into the final Zap artifact (video + optional audio). |
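
Because the set of kinds is closed, a recipe loader could reject an unknown kind before any planning happens. The following is a minimal sketch of that check, not Zap's own validation code; `STEP_KINDS` and `validate_kind` are hypothetical names introduced here for illustration.

```python
# Hypothetical helper: the set mirrors the 11 step kinds listed above.
STEP_KINDS = frozenset({
    "image.gen", "image.edit",
    "video.gen", "video.extend", "video.edit", "video.upscale",
    "audio.tts", "audio.music", "audio.sfx",
    "keyframes", "stitch",
})


def validate_kind(kind: str) -> str:
    """Return kind unchanged, or raise ValueError if it is not a known step kind."""
    if kind not in STEP_KINDS:
        raise ValueError(f"unknown step kind: {kind!r}")
    return kind
```

For example, `validate_kind("video.extend")` returns the string unchanged, while `validate_kind("video.generate")` raises `ValueError`.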

Step Fields

id (string, required)
Unique identifier for this step within the recipe. Referenced by downstream steps in their inputs list. Must be at least one character. Example: initial_frame.

kind (string, required)
The step type. Must be one of the 11 values listed above.

provider (string)
The provider adapter to use for this step. Overrides defaults.provider. Common values: mock, gmi, fal. See Providers.

model (string)
The specific model to invoke on the provider. Examples: fal-ai/flux/dev, seedance-2-0-260128. The planner uses this value to look up per-request or per-second rates for cost estimation.

prompt (string)
Path to a Markdown prompt template relative to the recipe root. Example: prompts/initial-gen.md. The template may contain {INPUT_NAME} placeholders that are resolved at run time from the supplied inputs.
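
Placeholder resolution can be pictured as a plain text substitution. The sketch below is an assumption about the behavior, not Zap's actual resolver: `resolve_prompt` is a hypothetical name, placeholder names are assumed to be uppercase letters, digits, and underscores, and a missing input is assumed to be an error.

```python
import re

# Assumption: placeholders look like {SELFIE} or {PLAYER_NAME}.
_PLACEHOLDER = re.compile(r"\{([A-Z][A-Z0-9_]*)\}")


def resolve_prompt(template: str, inputs: dict[str, str]) -> str:
    """Substitute each {NAME} with inputs[NAME]; raise KeyError if a name is missing."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in inputs:
            raise KeyError(f"no input supplied for placeholder {{{name}}}")
        return inputs[name]

    return _PLACEHOLDER.sub(substitute, template)
```

With this sketch, `resolve_prompt("Stadium entrance for {PLAYER_NAME}.", {"PLAYER_NAME": "Alex"})` returns `"Stadium entrance for Alex."`.
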
inputs (array)
List of upstream step IDs whose outputs this step consumes. The Zap runtime resolves these references and passes the media assets to the provider adapter. Example: [initial_frame].

duration_s (number)
Target clip duration in seconds. Used by video generation and extension steps. Also used by the cost planner: cost = rate_per_second × duration_s.
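
The per-second formula can be sketched in a few lines. Everything named here is hypothetical: `Rate`, `HYPOTHETICAL_RATES`, and `estimate_step_cost` are illustrative, the model names are placeholders, and the dollar figures are made up rather than taken from any provider's price list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rate:
    """Hypothetical rate-card entry: a model is billed per request or per second."""
    per_request_usd: Optional[float] = None
    per_second_usd: Optional[float] = None


# Made-up numbers for illustration only; real rates come from the provider.
HYPOTHETICAL_RATES = {
    "example-image-model": Rate(per_request_usd=0.03),
    "example-video-model": Rate(per_second_usd=0.125),
}


def estimate_step_cost(step: dict, rates: dict[str, Rate]) -> float:
    """Estimate one step's cost in USD from its model and, if needed, duration_s."""
    rate = rates[step["model"]]
    if rate.per_second_usd is not None:
        if "duration_s" not in step:
            raise ValueError(f"step {step.get('id')!r} needs duration_s for per-second pricing")
        # cost = rate_per_second × duration_s
        return rate.per_second_usd * step["duration_s"]
    if rate.per_request_usd is not None:
        return rate.per_request_usd
    raise ValueError(f"no rate configured for model {step['model']!r}")
```

Under these made-up rates, a 5-second step on `example-video-model` is estimated at 0.125 × 5 = 0.625 USD, and no provider call is needed to compute it.
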
candidates (integer)
Number of candidate outputs to generate. Range: 1–16. When greater than 1, the best candidate is selected (optionally via RLHF scoring) before passing to the next step.

repeat (object)
Controls how many times a video.extend step is expanded at plan time. Contains three sub-fields:
  • min (integer, ≥ 0): minimum number of extensions, applied even if the requested extendCount is lower.
  • max (integer, 0–64): maximum number of extensions allowed. Defaults to 64.
  • default (integer, ≥ 0): the extension count used when the caller does not specify one.
At plan time, expandRepeatSteps expands the step into count = clamp(extendCount, min, max) concrete steps, each with a suffixed ID (extend_gen_1, extend_gen_2, …).
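
The clamp-and-suffix behavior described above can be sketched as follows. This is an illustrative Python approximation, not the source of expandRepeatSteps: the function name `expand_repeat_steps`, the fallback of min to 0, and the fallback of default to min are assumptions, and any rewiring of inputs between the expanded steps is deliberately left out.

```python
from typing import Optional


def expand_repeat_steps(step: dict, extend_count: Optional[int] = None) -> list[dict]:
    """Expand one step with a repeat block into `count` concrete steps."""
    repeat = step.get("repeat")
    if repeat is None:
        return [step]  # nothing to expand

    lo = repeat.get("min", 0)    # assumption: min falls back to 0
    hi = repeat.get("max", 64)   # documented default for max
    requested = extend_count if extend_count is not None else repeat.get("default", lo)

    # count = clamp(extendCount, min, max)
    count = max(lo, min(requested, hi))

    expanded = []
    for i in range(1, count + 1):
        clone = {k: v for k, v in step.items() if k != "repeat"}
        clone["id"] = f"{step['id']}_{i}"  # extend_gen_1, extend_gen_2, ...
        expanded.append(clone)
    return expanded
```

With `repeat: {min: 1, max: 4, default: 2}`, calling the sketch without a count yields `extend_gen_1` and `extend_gen_2`; a requested count of 10 is clamped to 4, and a requested count of 0 is raised to 1.
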
stitch (object)
Stitching configuration for stitch-kind steps. See Stitch Configuration below.

tier (string)
Processing tier. One of "draft" or "final". Signals to provider adapters whether to use faster, lower-quality rendering or full-quality rendering.

rlhf (boolean | string)
Enables reinforcement learning from human feedback (RLHF) scoring for candidate selection. Set to true, false, or "optional".

reference_images (array)
List of input image paths or upstream step IDs to pass to the provider as reference images. Used by image.edit and video.gen steps that support image-to-video conditioning.

first_frame (object)
Provider-specific configuration for the first-frame anchor. Passed as a free-form object to the adapter and interpreted per-provider. Used when the provider requires explicit first-frame parameters beyond the inputs reference.

extend (object)
Extension mode configuration for video.extend steps. Contains one sub-field:
  • mode (string, default: "chain"): how the extension attaches to the source clip. "chain" continues from the last frame of the previous clip; "anchored" holds the first frame of the original clip as a fixed anchor throughout the extension.

audio (object)
Provider-specific audio configuration passed as a free-form object to the adapter. Used on audio.tts, audio.music, and audio.sfx steps for model parameters not covered by top-level fields (e.g. voice ID, tempo, style tags).

keyframes (object)
Provider-specific keyframe configuration passed as a free-form object to the adapter. Used on keyframes-kind steps to control extraction, scoring, or preparation parameters.

judge (object)
Provider-specific judge configuration for automated candidate scoring. Passed as a free-form object to the adapter when candidates is greater than 1 and automated selection is preferred over RLHF.

shared (boolean)
When true, the output of this step is shareable across recipe instances (e.g. a common reference frame reused by multiple runs).

Wiring Steps with inputs

The inputs array on each step names the upstream step IDs whose outputs it depends on. The Zap runtime resolves these at execution time and passes the media assets forward:
steps:
  - id: initial_frame
    kind: image.gen
    provider: gmi
    model: fal-ai/flux/dev
    prompt: prompts/initial-frame.md

  - id: initial_gen
    kind: video.gen
    provider: gmi
    model: seedance-2-0-260128
    inputs: [initial_frame]        # consumes the image output of initial_frame
    duration_s: 5
    prompt: prompts/initial-gen.md

  - id: extend_gen
    kind: video.extend
    provider: gmi
    model: seedance-2-0-260128
    inputs: [initial_gen]          # extends the clip produced by initial_gen
    duration_s: 5
    repeat:
      min: 1
      max: 4
      default: 2

  - id: stitch
    kind: stitch
    inputs: [initial_gen, extend_gen]
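
Since inputs only ever name upstream steps, the steps form a dependency graph, and a valid execution order is a topological sort of it. The sketch below shows one way to derive that order with Python's standard graphlib module (Python 3.9+); `execution_order` is a hypothetical helper, and how the Zap runtime actually schedules steps is not specified here.

```python
from graphlib import TopologicalSorter


def execution_order(steps: list[dict]) -> list[str]:
    """Return step IDs ordered so every step runs after the steps named in its inputs.

    Raises ValueError for an unknown step ID in inputs; graphlib raises
    CycleError if the references form a cycle.
    """
    ids = {step["id"] for step in steps}
    graph = {}
    for step in steps:
        deps = step.get("inputs", [])
        missing = [dep for dep in deps if dep not in ids]
        if missing:
            raise ValueError(f"step {step['id']!r} references unknown steps: {missing}")
        graph[step["id"]] = set(deps)  # node -> its predecessors
    return list(TopologicalSorter(graph).static_order())
```

Applied to the four steps above, the sketch returns `["initial_frame", "initial_gen", "extend_gen", "stitch"]`, the only order that satisfies every inputs reference.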

Stitch Configuration

The stitch field on a stitch-kind step controls how the final video is assembled:
stitch.engine (string, default: "auto")
The composition engine. One of:
  • auto: Zap selects the best available engine automatically.
  • local: ffmpeg-based local stitching; no external service required.
  • hyperframes: HyperFrames cloud composition engine; required for HTML-layer compositions.

stitch.format (string, default: "mp4")
Output container format. "mp4" or "webm".

stitch.quality (string, default: "standard")
Render quality preset. One of "draft", "standard", or "high".

stitch.fps (integer)
Output frame rate. Range: 1–120. Omit to use the source clip's native frame rate.

HyperFrames is only needed when HTML-layer composition is required — for example, rendering lower-thirds, animated overlays, or browser-based visual effects on top of video. Using engine: hyperframes requires a DESIGN.md file in the recipe directory describing the HTML composition layers. If the HyperFrames CLI is unavailable at run time, Zap falls back to the local stitch path and records the fallback on the run step — the recipe will still complete.
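
The defaults, allowed values, and fallback rule above can be summarized in a short sketch. `normalize_stitch` and `pick_engine` are hypothetical helpers written for illustration, not Zap functions, and the availability of the HyperFrames CLI is passed in as a plain boolean because how Zap detects it is not covered here.

```python
from typing import Optional

ENGINES = {"auto", "local", "hyperframes"}
FORMATS = {"mp4", "webm"}
QUALITIES = {"draft", "standard", "high"}


def normalize_stitch(config: Optional[dict] = None) -> dict:
    """Apply the documented defaults and validate each stitch field."""
    merged = {"engine": "auto", "format": "mp4", "quality": "standard", **(config or {})}
    if merged["engine"] not in ENGINES:
        raise ValueError(f"invalid stitch.engine: {merged['engine']!r}")
    if merged["format"] not in FORMATS:
        raise ValueError(f"invalid stitch.format: {merged['format']!r}")
    if merged["quality"] not in QUALITIES:
        raise ValueError(f"invalid stitch.quality: {merged['quality']!r}")
    fps = merged.get("fps")  # omitted fps means "use the source frame rate"
    if fps is not None:
        if isinstance(fps, bool) or not isinstance(fps, int) or not 1 <= fps <= 120:
            raise ValueError(f"stitch.fps must be an integer in 1-120, got {fps!r}")
    return merged


def pick_engine(requested: str, hyperframes_available: bool) -> tuple[str, bool]:
    """Return (engine, fell_back); hyperframes falls back to local when unavailable."""
    if requested == "hyperframes" and not hyperframes_available:
        return "local", True
    return requested, False
```

In this sketch, `pick_engine("hyperframes", False)` returns `("local", True)`, which corresponds to the fallback that Zap records on the run step.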

Full Multi-Step Pipeline Example

The following recipe generates a sports entrance video from a selfie:
---
zap: world-cup-entrance
version: 1
description: Transform a selfie into a dramatic stadium entrance video.
budget:
  estimate_usd: 1.40
  cap_usd: 5
defaults:
  provider: gmi
  aspect: "9:16"
inputs:
  SELFIE:
    type: image
    label: Your Photo
    hint: Upload a clear front-facing photo.
    required: true
  PLAYER_NAME:
    type: string
    label: Player Name
    required: true
steps:
  - id: initial_frame
    kind: image.gen
    model: fal-ai/flux/dev
    prompt: prompts/initial-frame.md

  - id: initial_gen
    kind: video.gen
    model: seedance-2-0-260128
    inputs: [initial_frame]
    duration_s: 5
    prompt: prompts/initial-gen.md

  - id: extend_gen
    kind: video.extend
    model: seedance-2-0-260128
    inputs: [initial_gen]
    duration_s: 5
    repeat:
      min: 1
      max: 4
      default: 2

  - id: upscale
    kind: video.upscale
    model: seedance-2-0-260128-upscale
    inputs: [extend_gen]
    tier: final

  - id: stitch
    kind: stitch
    inputs: [upscale]
    stitch:
      engine: auto
      format: mp4
      quality: high
      fps: 30
output: Zap.mp4
---
