WZRD Studio exposes a unified catalog of AI models spanning four media types: image, video, audio, and text. Every generation request — whether triggered from the Studio node canvas, the Timeline shot panel, the Editor, or project setup — flows through unifiedGenerationService, a single service layer that resolves model IDs, validates inputs, charges credits, and dispatches the request to the correct backend provider.

Architecture Overview

┌───────────────────────────────────────────────────────────┐
│                     Client Code                           │
│  (Project Setup · Studio Nodes · Editor · Storyboard)     │
└──────────────────────┬────────────────────────────────────┘
                       │
                       ▼
         ┌─────────────────────────┐
         │ unifiedGenerationService│   src/services/unifiedGenerationService.ts
         │   .generate(input)      │
         └────────────┬────────────┘
                      │  routes by model ID
        ┌─────────────┼───────────────────┐
        ▼             ▼                   ▼
   fal-stream    gemini-text         elevenlabs-*
  (fal.ai proxy)  (Groq/Gemini)     (TTS/SFX/Music)
        │             │                   │
        ▼             ▼                   ▼
  Supabase Edge Functions (cloud)

Key source files:

| File | Purpose |
| --- | --- |
| src/services/unifiedGenerationService.ts | Core service: input validation, routing, result normalization |
| src/lib/studio-model-constants.ts | Model catalog: IDs, names, credits, defaults, supported params |
| src/lib/constants/credits.ts | Credit cost helpers and pre-built lookup maps |
| src/lib/falModelNormalization.ts | Model alias resolution and canonical input building |

The GenerationInput Interface

Every generation call accepts a GenerationInput object:
interface GenerationInput {
  /** Model ID from the catalog or a provider-specific ID */
  model: string;
  /** Primary prompt / instruction text */
  prompt: string;
  /** Additional model-specific parameters */
  parameters?: Record<string, unknown>;
  /** Reference assets (input images, videos, audio) */
  referenceAssets?: ReferenceAsset[];
  /** Output configuration */
  outputConfig?: OutputConfig;
  /** Tracking metadata */
  metadata?: GenerationMetadata;
}

interface ReferenceAsset {
  url: string;
  type: 'image' | 'video' | 'audio' | 'text' | 'model';
  role?: string; // e.g. 'input_image', 'style_reference'
}

interface OutputConfig {
  format?: string;            // 'png', 'mp4', 'mp3', etc.
  count?: number;             // Number of outputs
  storageBucket?: string;     // Supabase Storage bucket
  storagePathPrefix?: string;
  autoStore?: boolean;        // Auto-upload to Supabase Storage (default: true)
}

interface GenerationMetadata {
  source?: 'project-setup' | 'studio' | 'editor' | 'storyboard' | 'timeline';
  projectId?: string;
  entityId?: string;          // Node / shot / clip ID
  custom?: Record<string, unknown>;
}
All calls return a GenerationResult with a url (Supabase Storage or provider URL), a status (pending | running | completed | failed), and a metadata block that includes credits consumed, the resolved model ID, and the raw provider response.
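
The result shape described above might be modelled roughly as follows. This is a sketch, not the service's actual type: `credits` and `raw` appear in the usage examples on this page, while `resolvedModel`, `GenerationResultSketch`, `isTerminal`, and `completedUrl` are hypothetical names introduced for illustration.

```typescript
// Sketch of the result shape described above. `resolvedModel` is an
// assumed field name; the real name is not shown on this page.
type GenerationStatus = 'pending' | 'running' | 'completed' | 'failed';

interface GenerationResultSketch {
  url?: string;
  status: GenerationStatus;
  metadata: {
    credits?: number;
    resolvedModel?: string; // hypothetical name
    raw?: unknown;
  };
}

// True once a result can no longer change state.
function isTerminal(result: GenerationResultSketch): boolean {
  return result.status === 'completed' || result.status === 'failed';
}

// Returns the output URL only when the generation has completed.
function completedUrl(result: GenerationResultSketch): string | undefined {
  return result.status === 'completed' ? result.url : undefined;
}
```

Checking `status` before reading `url` avoids treating a pending or failed result as a finished asset.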

Routing & Providers

The service selects a backend route from the model ID, matching most routes by prefix and the ElevenLabs routes by exact ID:

| Route | Triggered By | Backend |
| --- | --- | --- |
| fal-stream | All fal-ai/* model IDs | Supabase Edge Function → fal.ai |
| gmi-cloud | All gmi/* model IDs | Supabase Edge Function → GMI Cloud |
| gemini-text | google/gemini-* or openai/gpt-* | gemini-text-generation Edge Function |
| groq-text | groq/* or llama-* | groq-chat Edge Function |
| elevenlabs-tts | elevenlabs-tts | elevenlabs-tts Edge Function |
| elevenlabs-sfx | elevenlabs-sfx | elevenlabs-sfx Edge Function |
| elevenlabs-music | elevenlabs-music | elevenlabs-music Edge Function |

GMI Cloud is the default provider for new projects. GMI Cloud models carry provider: 'gmi-cloud' in the catalog and offer the best price-to-quality ratio for most generation workflows. fal.ai models provide a broader selection of specialized and cutting-edge models.
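
The routing rules in the table can be sketched as a small lookup. This is an illustration only, not the code in unifiedGenerationService.ts: `Route`, `EXACT_ROUTES`, `PREFIX_ROUTES`, and `resolveRoute` are hypothetical names, and the sketch assumes exact IDs are checked before prefixes.

```typescript
type Route =
  | 'fal-stream'
  | 'gmi-cloud'
  | 'gemini-text'
  | 'groq-text'
  | 'elevenlabs-tts'
  | 'elevenlabs-sfx'
  | 'elevenlabs-music';

// Routes triggered by an exact model ID (assumed to take priority).
const EXACT_ROUTES = new Map<string, Route>([
  ['elevenlabs-tts', 'elevenlabs-tts'],
  ['elevenlabs-sfx', 'elevenlabs-sfx'],
  ['elevenlabs-music', 'elevenlabs-music'],
]);

// Routes triggered by a model ID prefix, taken from the routing table.
const PREFIX_ROUTES: Array<[string, Route]> = [
  ['fal-ai/', 'fal-stream'],
  ['gmi/', 'gmi-cloud'],
  ['google/gemini-', 'gemini-text'],
  ['openai/gpt-', 'gemini-text'],
  ['groq/', 'groq-text'],
  ['llama-', 'groq-text'],
];

// Returns the route for a model ID, or undefined when no rule matches.
function resolveRoute(modelId: string): Route | undefined {
  const exact = EXACT_ROUTES.get(modelId);
  if (exact !== undefined) return exact;
  const match = PREFIX_ROUTES.find(([prefix]) => modelId.startsWith(prefix));
  return match ? match[1] : undefined;
}
```

Some catalog IDs on this page, such as those starting with `xai/` or `cassetteai/`, match no row of the routing table, so this sketch returns `undefined` for them. How the real service routes those IDs is not documented here.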

Model Aliases

Legacy and shorthand IDs are resolved to canonical catalog IDs before dispatch. Alias resolution lives in src/lib/falModelNormalization.ts:

| Alias | Resolves To |
| --- | --- |
| flux-schnell | fal-ai/flux/schnell |
| flux-dev | fal-ai/flux/dev |
| flux-pro | fal-ai/flux-pro/v1.1-ultra |
| kling-2-1 | fal-ai/kling-video/o3/standard/text-to-video |
| kling-pro-16 | fal-ai/kling-video/o3/pro/text-to-video |
| luma/dream-machine | fal-ai/kling-video/v3/pro/image-to-video |
| hailuo | fal-ai/kling-video/o3/pro/image-to-video |
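
Alias resolution of this kind can be sketched as a map lookup that falls back to the input. The entries below are copied from the alias table. `MODEL_ALIASES` and `resolveModelAlias` are hypothetical names and are not taken from falModelNormalization.ts.

```typescript
// Alias → canonical ID pairs, copied from the alias table.
const MODEL_ALIASES = new Map<string, string>([
  ['flux-schnell', 'fal-ai/flux/schnell'],
  ['flux-dev', 'fal-ai/flux/dev'],
  ['flux-pro', 'fal-ai/flux-pro/v1.1-ultra'],
  ['kling-2-1', 'fal-ai/kling-video/o3/standard/text-to-video'],
  ['kling-pro-16', 'fal-ai/kling-video/o3/pro/text-to-video'],
  ['luma/dream-machine', 'fal-ai/kling-video/v3/pro/image-to-video'],
  ['hailuo', 'fal-ai/kling-video/o3/pro/image-to-video'],
]);

// Returns the canonical ID for an alias; IDs that are not aliases pass through unchanged.
function resolveModelAlias(modelId: string): string {
  return MODEL_ALIASES.get(modelId) ?? modelId;
}
```

Passing canonical IDs through unchanged means callers can normalize every model ID before dispatch without checking whether it is an alias first.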

Image Models

WZRD Studio ships with an extensive image model catalog covering both generation (text-to-image) and advanced (image editing, upscaling, relighting, multi-angle) workflows.

Default: Nano Banana 2

fal-ai/nano-banana-2 — 4 credits, ~4s. The studio default for new projects. Fast text-to-image with aspect ratio and safety controls.

Premium: Flux 2 Max

fal-ai/flux-2-max — 10 credits, ~12s. Maximum quality FLUX 2 generation with 16:9 defaults.

GMI Default: Seedream 5.0

gmi/seedream-5.0 — 3 credits, ~8s. High-fidelity image generation by BytePlus, routed through GMI Cloud.

Typography: Ideogram V3

fal-ai/ideogram/v3 — 5 credits, ~8s. Best choice when the image must contain readable text or strong graphic design elements.

Generation models:

| Model | ID | Credits | Badge |
| --- | --- | --- | --- |
| FLUX Schnell | fal-ai/flux/schnell | 3 | Fast |
| AuraFlow | fal-ai/aura-flow | 3 | Fast |
| Nano Banana 2 | fal-ai/nano-banana-2 | 4 | Fast |
| Flux 2 Flash | fal-ai/flux-2/flash | 4 | Fast |
| Z-Image Turbo | fal-ai/z-image/turbo | 4 | Fast |
| FLUX Dev | fal-ai/flux/dev | 5 | Quality |
| Qwen Image 2 | fal-ai/qwen-image-2/text-to-image | 5 | |
| Ideogram V3 | fal-ai/ideogram/v3 | 5 | |
| Seedream 5 Lite | fal-ai/seedream/v5/lite/text-to-image | 5 | |
| Imagen 4 Fast | fal-ai/imagen4/preview/fast | 5 | Fast |
| Flux 2 Turbo | fal-ai/flux-2/turbo | 5 | Fast |
| Stable Diffusion 3.5 Large | fal-ai/stable-diffusion-v35-large | 4 | Quality |
| OmniGen V1 | fal-ai/omnigen-v1 | 5 | |
| Recraft V3 | fal-ai/recraft-v3 | 5 | Quality |
| Flux 2 | fal-ai/flux-2 | 6 | Quality |
| Qwen Image 2512 | fal-ai/qwen-image-2512 | 6 | |
| HiDream I1 | fal-ai/hidream-i1-full | 6 | Premium |
| Flux 2 Flex | fal-ai/flux-2-flex | 6 | |
| Nano Banana Pro | fal-ai/nano-banana-pro | 7 | Quality |
| Qwen Image 2 Pro | fal-ai/qwen-image-2/pro/text-to-image | 7 | Premium |
| FLUX Kontext Pro | fal-ai/flux-pro/kontext/text-to-image | 7 | Quality |
| Imagen 4 | fal-ai/imagen4/preview | 7 | Quality |
| Grok Imagine Image | xai/grok-imagine-image | 7 | |
| FLUX Pro Ultra | fal-ai/flux-pro/v1.1-ultra | 8 | Premium |
| Flux 2 Pro | fal-ai/flux-2-pro | 8 | Premium |
| GPT-Image 1.5 | fal-ai/gpt-image-1.5 | 8 | Premium |
| Flux 2 Max | fal-ai/flux-2-max | 10 | Premium |
| Imagen 4 Ultra | fal-ai/imagen4/preview/ultra | 10 | Premium |

Advanced models:

| Model | ID | Credits | Workflow |
| --- | --- | --- | --- |
| Nano Banana 2 Edit | fal-ai/nano-banana-2/edit | 5 | image-edit |
| IC-Light V2 (Relighting) | fal-ai/iclight-v2 | 5 | image-edit |
| Creative Upscaler | fal-ai/creative-upscaler | 4 | image-edit |
| Clarity Upscaler | fal-ai/clarity-upscaler | 4 | image-edit |
| Qwen Image 2 Edit | fal-ai/qwen-image-2/edit | 6 | image-edit |
| FLUX Dev Image-to-Image | fal-ai/flux/dev/image-to-image | 6 | image-to-image |
| Seedream 5 Lite Edit | fal-ai/seedream/v5/lite/edit | 6 | image-edit |
| Qwen Image Edit 2509 | fal-ai/qwen-image-edit-2509 | 7 | image-edit |
| Qwen Multiple Angles 2511 | fal-ai/qwen-image-edit-2511-multiple-angles | 7 | image-edit |
| Nano Banana Pro Edit | fal-ai/nano-banana-pro/edit | 8 | image-edit |
| Qwen Image 2 Pro Edit | fal-ai/qwen-image-2/pro/edit | 8 | image-edit |
| FLUX Pro Ultra Redux | fal-ai/flux-pro/v1.1-ultra/redux | 9 | image-to-image |

Video Models

Video models are divided into generation (text-to-video and image-to-video) and advanced (reference-to-video, video editing, video utilities). Most generation models support a generate_audio flag for automatic soundtrack creation.

Default T2V: Kling O3 Standard

fal-ai/kling-video/o3/standard/text-to-video — 20 credits, ~45s. Balanced Omni text-to-video with audio support.

Default I2V: Kling O3 Standard

fal-ai/kling-video/o3/standard/image-to-video — 24 credits, ~60s. Default for animating a shot image.

Premium: Sora 2 Pro

fal-ai/sora-2/text-to-video/pro — 50 credits, ~150s. OpenAI Sora 2 at maximum quality settings.

Fastest: LTX 2.3 Fast

fal-ai/ltx-2.3/text-to-video/fast — 16 credits, ~35s. Best choice when iteration speed matters more than fidelity.

Generation models (fal.ai):

| Model | ID | Credits | Workflow |
| --- | --- | --- | --- |
| LTX Video | fal-ai/ltx-video | 16 | T2V |
| LTX 2.3 Fast T2V | fal-ai/ltx-2.3/text-to-video/fast | 16 | T2V |
| Seedance Lite T2V | fal-ai/bytedance/seedance/v1/lite/text-to-video | 18 | T2V |
| Wan 2.1 T2V | fal-ai/wan/v2.1/1.3b/text-to-video | 18 | T2V |
| Kling O3 Standard T2V | fal-ai/kling-video/o3/standard/text-to-video | 20 | T2V |
| Kling O3 Standard I2V | fal-ai/kling-video/o3/standard/image-to-video | 24 | I2V |
| LTX 2 19B T2V | fal-ai/ltx-2-19b/text-to-video | 24 | T2V |
| MiniMax Video-01 Live | fal-ai/minimax/video-01-live | 25 | T2V |
| Veo 3 Fast | fal-ai/veo3/fast | 25 | T2V |
| Kling 2.5 Turbo Pro I2V | fal-ai/kling-video/v2.5-turbo/pro/image-to-video | 22 | I2V |
| LTX 2.3 Pro T2V | fal-ai/ltx-2.3/text-to-video | 22 | T2V |
| Seedance Pro T2V | fal-ai/bytedance/seedance/v1/pro/text-to-video | 30 | T2V |
| Kling O3 Pro T2V | fal-ai/kling-video/o3/pro/text-to-video | 30 | T2V |
| Kling 3.0 Pro I2V | fal-ai/kling-video/v3/pro/image-to-video | 30 | I2V |
| Veo 3.1 Fast | fal-ai/veo3.1/fast | 30 | T2V |
| Kling O3 Pro I2V | fal-ai/kling-video/o3/pro/image-to-video | 32 | I2V |
| Kling 3.0 Pro T2V | fal-ai/kling-video/v3/pro/text-to-video | 32 | T2V |
| Veo 3 | fal-ai/veo3 | 35 | T2V |
| Sora 2 | fal-ai/sora-2/text-to-video | 35 | T2V |
| Veo 3.1 | fal-ai/veo3.1 | 40 | T2V |
| Veo 3.1 I2V | fal-ai/veo3.1/image-to-video | 42 | I2V |
| Sora 2 Pro | fal-ai/sora-2/text-to-video/pro | 50 | T2V |

Generation models (GMI Cloud):

| Model | ID | Credits | Workflow |
| --- | --- | --- | --- |
| LTX-2 Fast I2V | gmi/ltx-fast-i2v | 5 | I2V |
| PixVerse V5 T2V | gmi/pixverse-v5-t2v | 16 | T2V |
| Wan 2.6 T2V | gmi/wan2.6-t2v | 18 | T2V |
| Google Veo 3 Fast | gmi/veo3-fast | 20 | T2V |
| Minimax Hailuo 2.3 | gmi/minimax-hailuo-2.3 | 22 | T2V |
| Kling I2V V2.1 Master | gmi/kling-i2v-v2.1-master | 24 | I2V |
| Kling T2V V2.1 Master | gmi/kling-t2v-v2.1-master | 24 | T2V |
| Kling V3 Omni | gmi/kling-v3-omni | 28 | T2V/I2V |
| Luma Ray 2 | gmi/luma-ray2 | 30 | T2V |
| Seedance 2.0 Fast | gmi/seedance-2.0-fast-t2v | 20 | T2V |
| Seedance 2.0 | gmi/seedance-2.0-t2v | 30 | T2V |
| Google Veo 3 | gmi/veo3 | 40 | T2V |

Advanced and utility models:

| Model | ID | Credits | Workflow |
| --- | --- | --- | --- |
| FFmpeg Metadata | fal-ai/ffmpeg-api/metadata | 4 | analysis |
| FFmpeg Extract Frame | fal-ai/ffmpeg-api/extract-frame | 6 | video-to-image |
| Trim Video | fal-ai/workflow-utilities/trim-video | 8 | video-to-video |
| Scale Video | fal-ai/workflow-utilities/scale-video | 8 | video-to-video |
| FFmpeg Merge Videos | fal-ai/ffmpeg-api/merge-videos | 10 | video-to-video |
| FFmpeg Merge Audio+Video | fal-ai/ffmpeg-api/merge-audio-video | 10 | video-to-video |
| LTX Extend Video | fal-ai/ltx-2-19b/distilled/extend-video | 22 | video-edit |
| Kling O3 Standard V2V Edit | fal-ai/kling-video/o3/standard/video-to-video/edit | 28 | video-edit |
| FFmpeg Compose (Director’s Cut) | fal-ai/ffmpeg-api/compose | 12 | video-compose |
| Kling O3 Pro V2V Edit | fal-ai/kling-video/o3/pro/video-to-video/edit | 40 | video-edit |
| Sora 2 Remix | fal-ai/sora-2/video-to-video/remix | 36 | video-edit |
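
As a worked example of budgeting with the listed costs, the sketch below totals credits for a planned batch. It assumes each call costs the flat amount shown in the tables, which may not hold if pricing varies with duration or resolution. `CREDIT_COSTS`, `PlannedCall`, and `estimateCredits` are hypothetical names and do not reproduce the helpers in src/lib/constants/credits.ts.

```typescript
// Credit costs copied from the image and video tables on this page.
const CREDIT_COSTS = new Map<string, number>([
  ['fal-ai/nano-banana-2', 4],
  ['fal-ai/kling-video/o3/standard/text-to-video', 20],
  ['fal-ai/kling-video/o3/standard/image-to-video', 24],
]);

interface PlannedCall {
  model: string;
  count: number;
}

// Sums cost × count for each planned call; throws on models with no listed cost.
function estimateCredits(plan: PlannedCall[]): number {
  return plan.reduce((total, { model, count }) => {
    const cost = CREDIT_COSTS.get(model);
    if (cost === undefined) {
      throw new Error(`No credit cost listed for ${model}`);
    }
    return total + cost * count;
  }, 0);
}

// Ten shots: generate one still per shot, then animate each still.
const storyboardCost = estimateCredits([
  { model: 'fal-ai/nano-banana-2', count: 10 },
  { model: 'fal-ai/kling-video/o3/standard/image-to-video', count: 10 },
]);
console.log(storyboardCost); // 280 (10 × 4 + 10 × 24)
```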

Audio Models

Audio models cover text-to-speech (TTS), voice cloning, voice design, music generation, sound effects (SFX), speech-to-text (STT), and audio utilities.

Default TTS: ElevenLabs Turbo

fal-ai/elevenlabs/tts/turbo-v2.5 — 4 credits. Premium natural-sounding TTS. Accepts a voice_id parameter for custom voices.

Music: Lyria 2

fal-ai/lyria2 — 6 credits. Google DeepMind’s music generation model. Supports prompt and duration_seconds.

SFX: CassetteAI

cassetteai/sound-effects-generator — 3 credits. Prompt-driven sound effect synthesis.

STT: Whisper

fal-ai/whisper — 2 credits. OpenAI Whisper for transcribing audio assets to text.

| Model | ID | Credits | Category |
| --- | --- | --- | --- |
| Chatterbox | fal-ai/chatterbox/text-to-speech | 2 | TTS |
| Qwen 3 TTS | fal-ai/qwen-3-tts/text-to-speech/1.7b | 2 | TTS |
| MiniMax Turbo | fal-ai/minimax/speech-02-turbo | 2 | TTS |
| MiniMax 2.8 Turbo | fal-ai/minimax/speech-2.8-turbo | 2 | TTS |
| Whisper STT | fal-ai/whisper | 2 | STT |
| MiniMax Speech HD | fal-ai/minimax/speech-02-hd | 3 | TTS |
| Kling TTS | fal-ai/kling-video/v1/tts | 3 | TTS |
| Index TTS 2 | fal-ai/index-tts-2/text-to-speech | 3 | TTS |
| Lux TTS | fal-ai/lux-tts | 3 | TTS |
| Dia TTS | fal-ai/dia-tts | 3 | TTS |
| Orpheus TTS | fal-ai/orpheus-tts | 3 | TTS |
| ElevenLabs STT | fal-ai/elevenlabs/speech-to-text | 3 | STT |
| CassetteAI SFX | cassetteai/sound-effects-generator | 3 | SFX |
| ElevenLabs TTS Turbo | fal-ai/elevenlabs/tts/turbo-v2.5 | 4 | TTS |
| VibeVoice 7B | fal-ai/vibevoice/7b | 4 | TTS |
| xAI TTS | xai/tts/v1 | 4 | TTS |
| Pixverse SFX | fal-ai/pixverse/sound-effects | 4 | SFX |
| Video SFX | cassetteai/video-sound-effects-generator | 4 | SFX |
| Maya1 TTS | fal-ai/maya | 4 | TTS |
| MiniMax Voice Clone | fal-ai/minimax/voice-clone | 5 | Voice Clone |
| CassetteAI Music | cassetteai/music-generator | 5 | Music |
| ACE-Step | fal-ai/ace-step/audio-to-audio | 5 | Music |
| YuE: Lyrics to Song | fal-ai/yue | 5 | Music |
| Lyria 2 | fal-ai/lyria2 | 6 | Music |

Text Models

Text models power storyline generation, shot descriptions, and any prompt-augmentation workflow. The default text model is DeepSeek R1 (gmi/deepseek-r1, 4 credits), routed through GMI Cloud.

| Model | ID | Credits | Provider |
| --- | --- | --- | --- |
| Gemini 3.1 Flash-Lite | gmi/gemini-3.1-flash-lite | 1 | GMI Cloud |
| Llama 3.3 70B Versatile | llama-3.3-70b-versatile | 1 | Groq |
| Llama 3.1 8B Instant | llama-3.1-8b-instant | 1 | Groq |
| GLM 5.1 | gmi/glm-5.1 | 2 | GMI Cloud |
| OpenAI o4 Mini | gmi/openai-o4-mini | 3 | GMI Cloud |
| DeepSeek R1 (default) | gmi/deepseek-r1 | 4 | GMI Cloud |
| Claude Opus 4.7 | gmi/claude-opus-4.7 | 5 | GMI Cloud |
| Gemini 2.5 Flash | google/gemini-2.5-flash | 1 | Gemini |
| Gemini 2.5 Pro | google/gemini-2.5-pro | 5 | Gemini |
| GPT-5 Mini | openai/gpt-5-mini | 3 | Gemini proxy |
| GPT-5 | openai/gpt-5 | 8 | Gemini proxy |

Querying the Model Catalog

The full live catalog is available via a Supabase Edge Function endpoint. This is also the data source for the list_models MCP tool.
GET https://<project>.supabase.co/functions/v1/model-catalog
Authorization: Bearer <supabase-jwt>   # optional — enables user-tier sorting
Response:
{
  "models": [
    {
      "id": "gmi/seedream-5.0-lite",
      "name": "Seedream 5 Lite",
      "credits": 2,
      "media_type": "image",
      "provider": "gmi-cloud"
    }
  ]
}
The list_models MCP tool wraps this endpoint. Call it from any MCP-compatible agent to enumerate available models with their credit costs before constructing a generation request.
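
A client for this endpoint could look roughly like the sketch below. The fetch implementation is passed in so the function can be exercised without a network. Filtering by `media_type` happens client-side here because no query parameters are documented on this page. `CatalogModel`, `FetchLike`, and `listModels` are hypothetical names.

```typescript
// Shape of one catalog entry, taken from the example response above.
interface CatalogModel {
  id: string;
  name: string;
  credits: number;
  media_type: string;
  provider: string;
}

// Minimal subset of the fetch API that this sketch relies on.
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Fetches the catalog and optionally keeps only one media type.
async function listModels(
  fetchImpl: FetchLike,
  baseUrl: string,
  options: { mediaType?: string; jwt?: string } = {},
): Promise<CatalogModel[]> {
  const headers: Record<string, string> = {};
  // The JWT is optional; per the text above it only enables user-tier sorting.
  if (options.jwt) headers.Authorization = `Bearer ${options.jwt}`;

  const response = await fetchImpl(`${baseUrl}/functions/v1/model-catalog`, { headers });
  if (!response.ok) {
    throw new Error(`model-catalog request failed with status ${response.status}`);
  }

  const body = (await response.json()) as { models?: CatalogModel[] };
  const models = body.models ?? [];
  return options.mediaType
    ? models.filter((model) => model.media_type === options.mediaType)
    : models;
}
```

In a browser or a recent Node runtime, the global `fetch` can be passed as `fetchImpl`, for example `listModels(fetch, 'https://<project>.supabase.co', { mediaType: 'image' })`.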

Feature Flags

Two environment variables control generation streaming behavior:

| Flag | Effect |
| --- | --- |
| VITE_ENABLE_SHOT_STREAM | Enables SSE (Server-Sent Events) streaming for shot generation. When true, progress events are pushed to the client in real time instead of the client polling for them. |
| VITE_ENABLE_STREAM_TELEMETRY | Enables telemetry collection for streaming generation events. Used to track latency and error rates in production. |

Set VITE_ENABLE_SHOT_STREAM=true in development to see real-time generation progress in the Timeline shot panel. The progress callback receives { percent, message } objects at key generation milestones (queued → generating → complete).
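
Vite exposes `VITE_`-prefixed variables to client code as strings, so a flag check has to compare against the string 'true'. The sketch below shows one way to read such a flag and format the progress objects described above. `isFlagEnabled`, `GenerationProgress`, and `formatProgress` are hypothetical names, and how WZRD Studio itself reads these flags is not shown on this page.

```typescript
// Env values arrive as strings (Vite also injects a few booleans such as DEV).
type EnvRecord = Record<string, string | boolean | undefined>;

// A flag counts as enabled only when it is exactly the string 'true'.
function isFlagEnabled(env: EnvRecord, name: string): boolean {
  return env[name] === 'true';
}

// Progress payload shape described in the text: { percent, message }.
interface GenerationProgress {
  percent: number;
  message: string;
}

// Renders a progress event as a short log line.
function formatProgress(progress: GenerationProgress): string {
  return `${Math.round(progress.percent)}% ${progress.message}`;
}

// In a Vite app the env object would be import.meta.env, for example:
//   isFlagEnabled(import.meta.env, 'VITE_ENABLE_SHOT_STREAM')
const exampleEnv: EnvRecord = { VITE_ENABLE_SHOT_STREAM: 'true' };
if (isFlagEnabled(exampleEnv, 'VITE_ENABLE_SHOT_STREAM')) {
  console.log(formatProgress({ percent: 50, message: 'generating' })); // 50% generating
}
```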

Usage Examples

Image Generation

import { unifiedGenerationService } from '@/services/unifiedGenerationService';

const result = await unifiedGenerationService.generateImage(
  'A cinematic wide shot of a futuristic city at sunset',
  {
    model: 'fal-ai/nano-banana-2',
    parameters: { aspect_ratio: '16:9', num_images: 1 },
    projectId: 'my-project-id',
    source: 'studio',
    autoStore: true,
  }
);

console.log(result.url);    // Supabase Storage URL
console.log(result.status); // 'completed'

Video Generation (with progress)

const result = await unifiedGenerationService.generate(
  {
    model: 'fal-ai/kling-video/o3/standard/text-to-video',
    prompt: 'A drone flyover of a tropical island, golden hour lighting',
    parameters: { duration: '5', aspect_ratio: '16:9', generate_audio: true },
    metadata: { source: 'timeline', projectId: 'my-project-id' },
  },
  (progress) => {
    console.log(`${progress.percent}% — ${progress.message}`);
  }
);

console.log(result.metadata.credits);         // 20
console.log(result.metadata.durationSeconds); // video duration

Image-to-Video

const result = await unifiedGenerationService.generate({
  model: 'fal-ai/kling-video/v3/pro/image-to-video',
  prompt: 'The camera slowly zooms in as leaves blow in the wind',
  referenceAssets: [
    { url: 'https://storage.example.com/scene.png', type: 'image', role: 'input_image' }
  ],
  parameters: { duration_seconds: 5, fps: 24, generate_audio: true },
  metadata: { source: 'editor', projectId: 'my-project-id' },
});

Audio (TTS)

const ttsResult = await unifiedGenerationService.generateAudio(
  'Welcome to WZRD Studio, your AI filmmaking platform.',
  {
    model: 'fal-ai/elevenlabs/tts/turbo-v2.5',
    parameters: { voiceId: 'JBFqnCBsd6RMkjVDRZzb' },
    source: 'project-setup',
  }
);

Text (Storyline Generation)

const result = await unifiedGenerationService.generateText(
  'Write a 3-sentence storyline for a sci-fi short film about time travel.',
  {
    model: 'gmi/deepseek-r1',
    source: 'project-setup',
  }
);

// Generated text is in raw metadata
const text = (result.metadata.raw as { text: string }).text;
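
The cast above trusts that the raw provider response always carries a string `text` field. A more defensive variant, sketched here with a hypothetical `extractText` helper, checks the shape at runtime and returns `undefined` when the field is missing.

```typescript
// Returns raw.text when it exists and is a string; otherwise undefined.
function extractText(raw: unknown): string | undefined {
  if (typeof raw === 'object' && raw !== null && 'text' in raw) {
    const text = (raw as { text: unknown }).text;
    return typeof text === 'string' ? text : undefined;
  }
  return undefined;
}

// Usage with a result like the one above:
//   const text = extractText(result.metadata.raw) ?? '';
console.log(extractText({ text: 'A scientist rewinds one day.' })); // A scientist rewinds one day.
console.log(extractText({ output: 'no text field' }));             // undefined
```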
