The Vision API powers two capabilities: previewing exact generation prompts before committing to a job, and triggering AI variation generation or re-analysis on existing images. Variations are generated via fal.ai’s nano-banana-pro/edit model. Analysis is performed by a VLM (vision-language model) via OpenRouter.

Modification modes

Each mode produces a different type of variation. The modificationMode argument in generation functions accepts one of these values.
| Mode | Description |
| --- | --- |
| shot-variation | Different camera angle or framing of the same subject. When no variationDetail is provided, cycles through 16 shot types (close-up, wide shot, bird’s eye view, etc.). |
| b-roll | Same location and environment with no people visible. Useful for cutaway coverage. |
| action-shot | A dramatic narrative or performance moment featuring the same person. |
| style-variation | Same person and face in a different scene, outfit, and location. |
| subtle-variation | A later beat in the same scene: same person, same setting, different moment. |
| coverage | A detail or object shot within the same environment. No people. |
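
In client code, the six modes listed above can be modeled as a string-literal union so that a mistyped mode is caught before a request is sent. This is a minimal sketch; the names MODIFICATION_MODES, ModificationMode, and isModificationMode are hypothetical and not part of the Vision API.

```typescript
// The six documented modification modes, as a readonly tuple.
const MODIFICATION_MODES = [
  "shot-variation",
  "b-roll",
  "action-shot",
  "style-variation",
  "subtle-variation",
  "coverage",
] as const;

// Hypothetical union type derived from the tuple above.
type ModificationMode = (typeof MODIFICATION_MODES)[number];

// Type guard: narrows an arbitrary string to one of the documented modes.
function isModificationMode(value: string): value is ModificationMode {
  return MODIFICATION_MODES.some((mode) => mode === value);
}
```

A guard like this is handy when the mode comes from user input, such as a dropdown value or a URL parameter, since the API itself types the argument as a plain string.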

Group context

For shot-variation and action-shot modes, the prompt is automatically adjusted based on the image’s group value:
  • Music Video → prefixes with “Later in the music video.”
  • Commercial → prefixes with “Later in the commercial.”
  • Film, TV Series, Web Series → prefixes action-shot with “Later in the scene.”
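
The group-to-prefix mapping in the list above can be pictured as a simple lookup. The sketch below is illustrative only: prefixForGroup is a hypothetical helper, and it deliberately does not model which modes the prefix applies to, since that decision is made by the API.

```typescript
// Group values and prefixes exactly as listed above.
const GROUP_PREFIXES = new Map<string, string>([
  ["Music Video", "Later in the music video."],
  ["Commercial", "Later in the commercial."],
  ["Film", "Later in the scene."],
  ["TV Series", "Later in the scene."],
  ["Web Series", "Later in the scene."],
]);

// Hypothetical helper: returns the prefix for a group, or null when the
// group is missing or not one of the listed values.
function prefixForGroup(group?: string): string | null {
  if (group === undefined) return null;
  return GROUP_PREFIXES.get(group) ?? null;
}
```

To see the prompts the API will actually use, call one of the preview queries rather than relying on a local table like this one.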

Aspect ratios

The aspectRatio argument accepts: 16:9 (default), 9:16, 1:1, 4:3, 3:4.
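
A client can apply the same default and reject unsupported values before calling the API. The following is a sketch under that assumption; ASPECT_RATIOS, AspectRatio, and resolveAspectRatio are hypothetical names.

```typescript
// The five documented aspect ratios.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"] as const;
type AspectRatio = (typeof ASPECT_RATIOS)[number];

// Documented default when aspectRatio is omitted.
const DEFAULT_ASPECT_RATIO: AspectRatio = "16:9";

// Hypothetical helper: returns the default for a missing value and throws
// for anything outside the documented list.
function resolveAspectRatio(input?: string): AspectRatio {
  if (input === undefined) return DEFAULT_ASPECT_RATIO;
  const match = ASPECT_RATIOS.find((ratio) => ratio === input);
  if (match === undefined) {
    throw new RangeError(`Unsupported aspectRatio "${input}"`);
  }
  return match;
}
```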

Queries

api.vision.getVariationPrompts

Preview the exact prompt strings that will be sent to fal.ai for the given settings. Use this to show users what will be generated before they commit.
  • modificationMode (string, required): The modification mode. See modification modes above.
  • variationCount (number, required): Number of prompts to generate. Range: 0–12. A count of 0 returns an empty array.
  • variationDetail (string): Optional detail instruction injected into the prompt (e.g. "extreme close-up of her hands"). When omitted for shot-variation, the system cycles through shot types automatically.
  • group (string): Project group context (e.g. "Film", "Commercial"). Adjusts the prompt prefix for supported modes.

Returns string[]: the exact prompts that would be sent, one per requested variation.
const prompts = await convex.query(api.vision.getVariationPrompts, {
  modificationMode: "shot-variation",
  variationCount: 3,
  group: "Film",
});
// ["Later in the scene. close-up of the same subject.",
//  "Later in the scene. extreme close-up of the same subject.",
//  "Later in the scene. close-up profile of the same subject."]

api.vision.getVariationPromptsForImage

Same as getVariationPrompts, but automatically uses the group context already stored on a specific image. Requires authentication and ownership of the image.
  • imageId (Id<'images'>, required): ID of the image whose group field will be used as context.
  • modificationMode (string, required): The modification mode.
  • variationCount (number, required): Number of prompts to generate. Range: 0–12.
  • variationDetail (string): Optional detail instruction.

Returns string[]: the exact prompts that would be sent for this image.
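
Because the two preview queries differ only in where the group context comes from, a client can choose between them based on whether it already has an image ID. The sketch below shows one way to do that. choosePreviewRequest and its types are hypothetical, imageId is typed as a plain string where a real Convex project would use Id<'images'>, and the whole-number check on variationCount is an assumption (only the 0–12 range is documented).

```typescript
// Arguments shared by both preview queries.
interface PreviewOptions {
  modificationMode: string;
  variationCount: number; // documented range: 0 to 12
  variationDetail?: string;
}

// Hypothetical tagged result: which query to call, and with which args.
type PreviewRequest =
  | { query: "getVariationPromptsForImage"; args: PreviewOptions & { imageId: string } }
  | { query: "getVariationPrompts"; args: PreviewOptions & { group?: string } };

// Picks the image-scoped query when an imageId is available; otherwise
// falls back to the generic query with an explicit (optional) group.
function choosePreviewRequest(
  options: PreviewOptions,
  source: { imageId?: string; group?: string },
): PreviewRequest {
  const count = options.variationCount;
  // Assumption: counts must be whole numbers.
  if (!Number.isInteger(count) || count < 0 || count > 12) {
    throw new RangeError("variationCount must be an integer from 0 to 12");
  }
  if (source.imageId !== undefined) {
    return {
      query: "getVariationPromptsForImage",
      args: { ...options, imageId: source.imageId },
    };
  }
  const args: PreviewOptions & { group?: string } = { ...options };
  if (source.group !== undefined) args.group = source.group;
  return { query: "getVariationPrompts", args };
}
```

The returned args would then be passed to convex.query together with the matching function reference, api.vision.getVariationPromptsForImage or api.vision.getVariationPrompts.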

Mutations

api.vision.generateVariations

Trigger AI variation generation for an existing image. The image’s aiStatus is set to processing immediately, and generation is scheduled asynchronously via fal.ai.
  • imageId (Id<'images'>, required): ID of the image to generate variations from. Must be owned by the authenticated user and have an imageUrl or storageId.
  • variationCount (number, required): Number of variations to generate. Range: 1–12.
  • modificationMode (string, required): The modification mode to apply. See modification modes.
  • variationDetail (string): Optional detail instruction to inject into the prompt.
  • aspectRatio (string): Output aspect ratio. One of 16:9, 9:16, 1:1, 4:3, 3:4. Defaults to 16:9.

Returns
  • success (boolean, required): true when generation was successfully scheduled.

Generated images are created as child Image records with parentImageId set to the source image ID and sourceType: "ai". Monitor their status by polling api.images.getProcessingImages.
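
Once a list of image records has been loaded, the generated children of a given source can be picked out by the two fields described above. This is a sketch only: ImageRecord is a hypothetical subset of the real record, IDs are typed as plain strings, and the check for aiStatus "processing" on child records is an assumption rather than documented behavior.

```typescript
// Only the fields this sketch reads; real Image records carry more.
interface ImageRecord {
  _id: string; // Id<'images'> in a real Convex project
  parentImageId?: string;
  sourceType?: string;
  aiStatus?: string;
}

// Returns the AI-generated children of one source image.
function generatedChildrenOf(images: ImageRecord[], sourceId: string): ImageRecord[] {
  return images.filter(
    (image) => image.parentImageId === sourceId && image.sourceType === "ai",
  );
}

// True while any generated child is still marked as processing.
// Assumption: children expose aiStatus the same way source images do.
function hasPendingChildren(images: ImageRecord[], sourceId: string): boolean {
  return generatedChildrenOf(images, sourceId).some(
    (image) => image.aiStatus === "processing",
  );
}
```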

api.vision.rerunSmartAnalysis

Re-run VLM analysis on an existing image. The image’s aiStatus is reset to processing. Analysis updates description, tags, colors, category, group, projectName, and moodboardName. Optionally triggers variation generation after analysis completes.
  • imageId (Id<'images'>, required): ID of the image to re-analyze. Must be owned by the authenticated user.
  • title (string, required): Title to provide as context for the VLM.
  • tags (string[], required): Current tags to pass as context.
  • category (string, required): Current category to pass as context.
  • storageId (Id<'_storage'>): Convex storage ID to use as the image source. Falls back to the stored storageId on the image record.
  • imageUrl (string): URL to use as the image source. Falls back to the stored imageUrl.
  • description (string): Current description to pass as context.
  • source (string): Source attribution.
  • sref (string): Style reference string.
  • group (string): Project group override.
  • projectName (string): Project name override.
  • moodboardName (string): Moodboard name override.
  • variationCount (number, default: 0): Number of variations to auto-generate after analysis. Defaults to 0 (no auto-generation).
  • modificationMode (string): Modification mode to use if variationCount is greater than 0. Falls back to the mode stored on the image.

Returns
  • success (boolean, required): true when analysis was successfully scheduled.
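
Since title, tags, and category are required even when they have not changed, a client will typically copy them from the record it already holds. The sketch below builds the argument object that way. StoredImage, RerunOptions, RerunArgs, and buildRerunArgs are hypothetical names, the record shape is an assumed subset, and IDs are typed as plain strings.

```typescript
// Hypothetical subset of a stored image record.
interface StoredImage {
  _id: string; // Id<'images'> in a real Convex project
  title: string;
  tags: string[];
  category: string;
  description?: string;
}

interface RerunOptions {
  variationCount?: number; // omitted means 0: analysis only
  modificationMode?: string; // only used when variationCount > 0
}

interface RerunArgs {
  imageId: string;
  title: string;
  tags: string[];
  category: string;
  description?: string;
  variationCount: number;
  modificationMode?: string;
}

// Copies the required context from the stored record and adds optional
// fields only when they carry a value.
function buildRerunArgs(image: StoredImage, options: RerunOptions = {}): RerunArgs {
  const variationCount = options.variationCount ?? 0;
  const args: RerunArgs = {
    imageId: image._id,
    title: image.title,
    tags: image.tags,
    category: image.category,
    variationCount,
  };
  if (image.description !== undefined) args.description = image.description;
  // A mode is sent only when variations are requested; if it is left out,
  // the documented fallback is the mode stored on the image.
  if (variationCount > 0 && options.modificationMode !== undefined) {
    args.modificationMode = options.modificationMode;
  }
  return args;
}
```

The result would be passed to convex.mutation along with api.vision.rerunSmartAnalysis.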
