Zap pipelines follow a creative grammar: each recipe is a directed, ordered sequence of typed steps that carry media from first frame to final artifact. Steps are not arbitrary scripts; they correspond to real provider capabilities (image generation, video animation, upscaling, audio synthesis, and composition). Because every step has an explicit kind, the Zap planner can quote costs before any provider call is made and route each step to the right adapter automatically.
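The routing rule can be sketched in Python as a minimal illustration, assuming steps are plain dictionaries. The adapter registry, the `route_step` and `resolve_provider` helpers, and the dictionary keys `id`, `kind`, and `provider` are hypothetical; only the override behaviour (a step's provider takes precedence over `defaults.provider`) and the provider values mock, gmi, and fal come from the Step Fields description on this page.

```python
from typing import Callable, Dict

# Hypothetical adapter registry: provider name -> callable that would invoke
# the provider. Real Zap adapters are not shown in this document.
Adapter = Callable[[dict], str]

ADAPTERS: Dict[str, Adapter] = {
    "mock": lambda step: f"mock:{step['id']}",
    "gmi": lambda step: f"gmi:{step['id']}",
    "fal": lambda step: f"fal:{step['id']}",
}


def resolve_provider(step: dict, defaults: dict) -> str:
    """A step-level provider overrides defaults.provider."""
    provider = step.get("provider") or defaults.get("provider")
    if provider is None:
        raise ValueError(f"step {step['id']!r} has no provider and no default")
    if provider not in ADAPTERS:
        raise ValueError(f"unknown provider {provider!r}")
    return provider


def route_step(step: dict, defaults: dict) -> str:
    """Dispatch the step to the adapter chosen by resolve_provider."""
    return ADAPTERS[resolve_provider(step, defaults)](step)


defaults = {"provider": "mock"}
print(route_step({"id": "initial_frame", "kind": "image.gen"}, defaults))
print(route_step({"id": "initial_gen", "kind": "video.gen", "provider": "fal"}, defaults))
```

Because routing only needs the resolved provider name, the planner can make this decision for every step up front, before any adapter is actually called.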
Creative Grammar
The canonical pattern for a generative video recipe is:
- InitialFrame: generate or supply a reference image that anchors the visual identity.
- InitialGen: animate the frame into a base video clip.
- InitialGenReViz (optional): revise or upscale the initial clip before extending.
- ExtendGen × N: chain one or more video.extend steps to grow duration.
- Zap.mp4: a stitch step assembles all clips into the final artifact.
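The canonical pattern can be sketched as a Python function that builds the ordered chain for a chosen number of extensions. This is an illustrative model, not the recipe-file format: the `Step` class and `canonical_chain` helper are hypothetical, the step IDs other than initial_frame and the extend_gen_N suffix style are assumptions, and modelling the optional revision as video.upscale is one choice (video.edit would fit equally well). The real planner generates the extension steps from a single repeat step; this sketch unrolls them directly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Step:
    id: str
    kind: str
    inputs: List[str] = field(default_factory=list)


def canonical_chain(extensions: int, reviz: bool = False) -> List[Step]:
    """Build InitialFrame -> InitialGen -> [InitialGenReViz] -> ExtendGen x N -> stitch."""
    steps = [
        Step("initial_frame", "image.gen"),
        Step("initial_gen", "video.gen", ["initial_frame"]),
    ]
    if reviz:
        # Assumption: the optional revision is modelled as an upscale.
        steps.append(Step("initial_gen_reviz", "video.upscale", ["initial_gen"]))
    base_clip = steps[-1].id
    clips = [base_clip]
    previous = base_clip
    for n in range(1, extensions + 1):
        ext_id = f"extend_gen_{n}"
        # Each extension continues from the clip produced before it.
        steps.append(Step(ext_id, "video.extend", [previous]))
        clips.append(ext_id)
        previous = ext_id
    # The final stitch step consumes every clip to produce Zap.mp4.
    steps.append(Step("zap", "stitch", clips))
    return steps


for step in canonical_chain(extensions=2, reviz=True):
    print(step.id, step.kind, step.inputs)
```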
Step Kinds
Zap supports 11 step kinds, covering the full media production stack:

| Kind | Description |
|---|---|
| image.gen | Create a first frame, storyboard, character sheet, or any reference image from a text prompt or existing inputs. |
| image.edit | Transform an input image while preserving subject identity; useful for style transfer or inpainting. |
| video.gen | Animate image or prompt inputs into a video clip. |
| video.extend | Continue a clip forward from its last frame. Supports repeat to chain multiple extensions. |
| video.edit | Revise an existing clip using a prompt or composition layer. |
| video.upscale | Produce a higher-resolution version of a clip. |
| audio.tts | Generate voiceover narration from a text prompt. |
| audio.music | Generate a music track from a style or lyric prompt. |
| audio.sfx | Generate sound effects to layer into the video. |
| keyframes | Extract, score, or prepare frames for the next step in the pipeline. |
| stitch | Combine all upstream assets into the final Zap artifact (video + optional audio). |
Step Fields
Unique identifier for this step within the recipe. Downstream steps reference it in their inputs list. Must be at least one character. Example: initial_frame.

The step type. Must be one of the 11 values listed above.
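As a minimal sketch, the two constraints above (an ID of at least one character that is unique within the recipe, and a kind drawn from the 11 documented values) can be checked in Python. The `validate_steps` helper and the dictionary keys `id` and `kind` are hypothetical; the recipe-file key names for these two fields are not shown on this page.

```python
from typing import Iterable, List

# The 11 step kinds from the Step Kinds table.
STEP_KINDS = frozenset({
    "image.gen", "image.edit",
    "video.gen", "video.extend", "video.edit", "video.upscale",
    "audio.tts", "audio.music", "audio.sfx",
    "keyframes", "stitch",
})


def validate_steps(steps: Iterable[dict]) -> List[str]:
    """Return a list of problems; an empty list means every step passed."""
    problems: List[str] = []
    seen = set()
    for index, step in enumerate(steps):
        step_id = step.get("id", "")
        if not isinstance(step_id, str) or len(step_id) < 1:
            problems.append(f"step {index}: id must be at least one character")
        elif step_id in seen:
            problems.append(f"step {index}: duplicate id {step_id!r}")
        else:
            seen.add(step_id)
        kind = step.get("kind")
        if kind not in STEP_KINDS:
            problems.append(f"step {index}: unknown kind {kind!r}")
    return problems


print(validate_steps([
    {"id": "initial_frame", "kind": "image.gen"},
    {"id": "", "kind": "video.generate"},
]))
```

Collecting every problem instead of stopping at the first one lets a planner report all recipe errors in a single pass.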
The provider adapter to use for this step. Overrides defaults.provider. Common values: mock, gmi, fal. See Providers.

The specific model to invoke on the provider. Examples: fal-ai/flux/dev, seedance-2-0-260128. The planner uses this value to look up per-request or per-second rates for cost estimation.

Path to a Markdown prompt template, relative to the recipe root. Example: prompts/initial-gen.md. The template may contain {INPUT_NAME} placeholders that are resolved at run time from the supplied inputs.

List of upstream step IDs whose outputs this step consumes. The Zap runtime resolves these references and passes the media assets to the provider adapter. Example: [initial_frame].

Target clip duration in seconds. Used by video generation and extension steps, and by the cost planner: cost = rate_per_second × duration_s.

Number of candidate outputs to generate. Range: 1–16. When greater than 1, the best candidate is selected (optionally via RLHF scoring) before being passed to the next step.
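A minimal Python sketch of the cost quote: per-second models are charged rate_per_second × duration_s, and per-request models a flat rate. The rate table and its numbers are invented placeholders, not real fal or Seedance pricing; the `quote_step` and `quote_recipe` helpers and the `model` dictionary key are hypothetical. The sketch does not multiply by the candidate count, because this page does not say whether extra candidates are billed separately.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Rate:
    unit: str      # "request" or "second"
    amount: float  # price per unit (placeholder values below)


# Hypothetical rate table keyed by model; the numbers are made up.
RATES: Dict[str, Rate] = {
    "fal-ai/flux/dev": Rate("request", 0.03),
    "seedance-2-0-260128": Rate("second", 0.10),
}


def quote_step(model: str, duration_s: Optional[float] = None) -> float:
    """Quote one step from its model's rate, without calling the provider."""
    rate = RATES[model]
    if rate.unit == "second":
        if duration_s is None:
            raise ValueError(f"{model} is billed per second; duration_s is required")
        return rate.amount * duration_s  # cost = rate_per_second x duration_s
    return rate.amount


def quote_recipe(steps: List[dict]) -> float:
    """Sum per-step quotes before any provider call is made."""
    return sum(quote_step(s["model"], s.get("duration_s")) for s in steps)


steps = [
    {"model": "fal-ai/flux/dev"},
    {"model": "seedance-2-0-260128", "duration_s": 5},
    {"model": "seedance-2-0-260128", "duration_s": 5},
]
print(f"{quote_recipe(steps):.2f}")
```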
Controls how many times a video.extend step is expanded at plan time. Contains three sub-fields:
- min (integer, ≥ 0): minimum number of extensions, applied even if extendCount is lower.
- max (integer, 0–64): maximum number of extensions allowed. Defaults to 64.
- default (integer, ≥ 0): the extension count used when the caller does not specify one.

expandRepeatSteps expands the step into count = clamp(extendCount, min, max) concrete steps, each with a suffixed ID (extend_gen_1, extend_gen_2, …).

Stitching configuration for stitch-kind steps. See Stitch Configuration below.

Processing tier. One of "draft" or "final". Signals to provider adapters whether to use faster, lower-quality rendering or full-quality rendering.

Enables reinforcement learning from human feedback (RLHF) scoring for candidate selection. Set to true, false, or "optional".

List of input image paths or upstream step IDs to pass to the provider as reference images. Used by image.edit and video.gen steps that support image-to-video conditioning.

Provider-specific configuration for the first-frame anchor. Passed as a free-form object to the adapter and interpreted per provider. Used when the provider requires explicit first-frame parameters beyond the inputs reference.

Extension mode configuration for video.extend steps. Contains one sub-field:
- mode (string, default "chain"): how the extension attaches to the source clip. "chain" continues from the last frame of the previous clip; "anchored" holds the first frame of the original clip as a fixed anchor throughout the extension.
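The repeat expansion for video.extend steps can be sketched in Python. Only the clamp rule, the default max of 64, and the numbered ID suffixes come from the description above. The `expand_repeat_steps` name mirrors expandRepeatSteps, but its signature here is hypothetical, as are the fallback values for an omitted min or default (0). Wiring each copy to the previous one is an assumption that mirrors the default "chain" mode.

```python
from typing import List, Optional


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high]."""
    return max(low, min(value, high))


def expand_repeat_steps(step: dict, extend_count: Optional[int] = None) -> List[dict]:
    """Unroll one video.extend step into clamp(extendCount, min, max) concrete steps."""
    repeat = step.get("repeat", {})
    low = repeat.get("min", 0)           # assumption: 0 when min is omitted
    high = repeat.get("max", 64)         # documented default for max
    fallback = repeat.get("default", 0)  # assumption: 0 when default is omitted
    count = clamp(fallback if extend_count is None else extend_count, low, high)

    # Copy every field except the repeat block onto each concrete step.
    base = {key: value for key, value in step.items() if key != "repeat"}
    expanded: List[dict] = []
    previous = step["inputs"][0]  # the clip being extended
    for n in range(1, count + 1):
        new_id = f"{step['id']}_{n}"
        # Assumption: mirror "chain" mode by feeding each copy the previous clip.
        expanded.append({**base, "id": new_id, "inputs": [previous]})
        previous = new_id
    return expanded


extend = {
    "id": "extend_gen",
    "kind": "video.extend",
    "inputs": ["initial_gen"],
    "repeat": {"min": 1, "max": 4, "default": 2},
}
print([s["id"] for s in expand_repeat_steps(extend)])     # uses default
print(len(expand_repeat_steps(extend, extend_count=10)))  # capped at max
print(len(expand_repeat_steps(extend, extend_count=0)))   # raised to min
```

Expanding at plan time means the cost planner sees every concrete extension step, so the quote already reflects the final clip count.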
Provider-specific audio configuration, passed as a free-form object to the adapter. Used on audio.tts, audio.music, and audio.sfx steps for model parameters not covered by top-level fields (e.g. voice ID, tempo, style tags).

Provider-specific keyframe configuration, passed as a free-form object to the adapter. Used on keyframes-kind steps to control extraction, scoring, or preparation parameters.

Provider-specific judge configuration for automated candidate scoring. Passed as a free-form object to the adapter when candidates is greater than 1 and automated selection is preferred over RLHF.

When true, the output of this step is shareable across recipe instances (e.g. a common reference frame reused by multiple runs).

Wiring Steps with inputs
The inputs array on each step names the upstream step IDs whose outputs it depends on. The Zap runtime resolves these references at execution time and passes the media assets forward.
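One way to turn inputs references into an execution order is a topological sort. The Python sketch below uses Kahn's algorithm and rejects unknown references and cycles. This page does not say which ordering algorithm the Zap runtime uses, so treat this as an illustration of the dependency semantics rather than the runtime itself; the `execution_order` helper and the dictionary keys `id` and `inputs` are assumptions for the example.

```python
from collections import deque
from typing import Dict, List


def execution_order(steps: List[dict]) -> List[str]:
    """Return step IDs so that every step runs after the steps it lists in inputs."""
    ids = [s["id"] for s in steps]
    known = set(ids)
    indegree: Dict[str, int] = {i: 0 for i in ids}
    dependents: Dict[str, List[str]] = {i: [] for i in ids}
    for s in steps:
        for upstream in s.get("inputs", []):
            if upstream not in known:
                raise ValueError(f"{s['id']!r} references unknown step {upstream!r}")
            indegree[s["id"]] += 1
            dependents[upstream].append(s["id"])

    # Start with steps that have no upstream dependencies, in recipe order.
    ready = deque(i for i in ids if indegree[i] == 0)
    order: List[str] = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in dependents[current]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(ids):
        raise ValueError("inputs form a cycle")
    return order


# Steps deliberately listed out of order.
steps = [
    {"id": "zap", "kind": "stitch", "inputs": ["initial_gen", "extend_gen_1"]},
    {"id": "extend_gen_1", "kind": "video.extend", "inputs": ["initial_gen"]},
    {"id": "initial_gen", "kind": "video.gen", "inputs": ["initial_frame"]},
    {"id": "initial_frame", "kind": "image.gen"},
]
print(execution_order(steps))
```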
Stitch Configuration
The stitch field on a stitch-kind step controls how the final video is assembled:

The composition engine. One of:
- auto: Zap selects the best available engine automatically.
- local: ffmpeg-based local stitching; no external service required.
- hyperframes: HyperFrames cloud composition engine; required for HTML-layer compositions.

Output container format. "mp4" or "webm".

Render quality preset. One of "draft", "standard", or "high".

Output frame rate. Range: 1–120. Omit to use the source clip's native frame rate.
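The allowed stitch values can be enforced with a small Python validator, offered as a sketch. The value sets and the 1–120 frame-rate range come from the field descriptions on this page, and the key name engine appears in the HyperFrames note on this page. The key names format, quality, and fps, and the `validate_stitch` helper, are hypothetical stand-ins for the container, quality, and frame-rate fields.

```python
from typing import List

ENGINES = {"auto", "local", "hyperframes"}
FORMATS = {"mp4", "webm"}
QUALITIES = {"draft", "standard", "high"}


def validate_stitch(config: dict) -> List[str]:
    """Return a list of problems with a stitch configuration (empty if valid)."""
    problems: List[str] = []
    # Key names other than `engine` are hypothetical. Defaults are not
    # documented here, so missing keys are simply not checked.
    checks = [("engine", ENGINES), ("format", FORMATS), ("quality", QUALITIES)]
    for key, allowed in checks:
        if key in config and config[key] not in allowed:
            problems.append(f"{key} must be one of {sorted(allowed)}, got {config[key]!r}")
    fps = config.get("fps")  # omitted -> keep the source clip's native frame rate
    is_number = isinstance(fps, (int, float)) and not isinstance(fps, bool)
    if fps is not None and not (is_number and 1 <= fps <= 120):
        problems.append(f"fps must be a number from 1 to 120, got {fps!r}")
    return problems


print(validate_stitch({"engine": "local", "format": "mov", "fps": 240}))
```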
HyperFrames is only needed when HTML-layer composition is required — for example, rendering lower-thirds, animated overlays, or browser-based visual effects on top of video. Using
engine: hyperframes requires a DESIGN.md file in the recipe directory describing the HTML composition layers. If the HyperFrames CLI is unavailable at run time, Zap falls back to the local stitch path and records the fallback on the run step — the recipe will still complete.
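The fallback behaviour can be sketched in Python as a minimal illustration. The sketch checks for DESIGN.md in the recipe directory and downgrades hyperframes to local when the CLI is unavailable, returning a note that a runtime could record on the run step. This page does not say how Zap detects the CLI or what happens when DESIGN.md is missing, so the `cli_available` flag, the `resolve_engine` helper, and the error raised for a missing file are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def resolve_engine(engine: str, recipe_dir: Path, cli_available: bool) -> Tuple[str, Optional[str]]:
    """Return (engine_to_use, fallback_note); the note is None when no fallback happened."""
    if engine != "hyperframes":
        # "local" is used as-is; the "auto" selection policy is not modelled here.
        return engine, None
    design = recipe_dir / "DESIGN.md"
    if not design.is_file():
        # Assumption: treat a missing DESIGN.md as a configuration error.
        raise FileNotFoundError(f"engine: hyperframes requires {design}")
    if not cli_available:
        # Fall back to local stitching so the recipe still completes.
        return "local", "HyperFrames CLI unavailable; fell back to local stitch"
    return "hyperframes", None


with tempfile.TemporaryDirectory() as tmp:
    recipe = Path(tmp)
    (recipe / "DESIGN.md").write_text("# Composition layers\n")
    print(resolve_engine("hyperframes", recipe, cli_available=False))
```

Returning the note alongside the engine keeps the fallback visible to whatever records the run, rather than silently switching engines.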