Vision API

The Vision API powers two capabilities: previewing exact generation prompts before committing to a job, and triggering AI variation generation or re-analysis on existing images. Variations are generated via fal.ai’s nano-banana-pro/edit model. Analysis is performed by a VLM (vision-language model) via OpenRouter.

Modification modes

Each mode produces a different type of variation. The modificationMode argument of the prompt-preview queries and of generateVariations accepts one of these values.

Mode	Description
`shot-variation`	Different camera angle or framing of the same subject. When no `variationDetail` is provided, cycles through 16 shot types (close-up, wide shot, bird’s eye view, etc.).
`b-roll`	Same location and environment with no people visible. Useful for cutaway coverage.
`action-shot`	A dramatic narrative or performance moment featuring the same person.
`style-variation`	Same person and face in a different scene, outfit, and location.
`subtle-variation`	A later beat in the same scene — same person, same setting, different moment.
`coverage`	A detail or object shot within the same environment. No people.
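
As a minimal sketch, the six modes in the table can be captured as a TypeScript union so that invalid values are caught before a request is made. The names `MODIFICATION_MODES`, `ModificationMode`, and `isModificationMode` are illustrative and are not exported by the API.

```typescript
// The six documented modification modes, as a readonly tuple.
const MODIFICATION_MODES = [
  "shot-variation",
  "b-roll",
  "action-shot",
  "style-variation",
  "subtle-variation",
  "coverage",
] as const;

// Union type derived from the tuple (hypothetical name, not part of the API).
type ModificationMode = (typeof MODIFICATION_MODES)[number];

// Hypothetical helper: narrows arbitrary input to a documented mode.
function isModificationMode(value: string): value is ModificationMode {
  return (MODIFICATION_MODES as readonly string[]).indexOf(value) !== -1;
}
```

A guard like this is useful when the mode comes from user input, such as a dropdown or a URL parameter.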

Group context

For shot-variation and action-shot modes, the prompt is automatically adjusted based on the image’s group value:

Music Video → prefixes with “Later in the music video.”
Commercial → prefixes with “Later in the commercial.”
Film, TV Series, Web Series → prefixes action-shot with “Later in the scene.”
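
The mapping from group value to prefix phrase can be sketched as a lookup table. This is only an illustration of the list above: the actual prompt builder runs server-side, and the sketch deliberately does not model which modes receive the prefix. `GROUP_PREFIXES` and `prefixForGroup` are hypothetical names.

```typescript
// Prefix phrases keyed by group value, copied from the list above.
const GROUP_PREFIXES: Record<string, string> = {
  "Music Video": "Later in the music video.",
  "Commercial": "Later in the commercial.",
  "Film": "Later in the scene.",
  "TV Series": "Later in the scene.",
  "Web Series": "Later in the scene.",
};

// Hypothetical helper: returns the documented prefix for a group,
// or undefined when the group is missing or has no documented prefix.
function prefixForGroup(group?: string): string | undefined {
  if (group === undefined) return undefined;
  return Object.prototype.hasOwnProperty.call(GROUP_PREFIXES, group)
    ? GROUP_PREFIXES[group]
    : undefined;
}
```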

Aspect ratios

The aspectRatio argument accepts: 16:9 (default), 9:16, 1:1, 4:3, 3:4.
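
A small client-side sketch of the accepted values and the default. The names `ASPECT_RATIOS` and `resolveAspectRatio` are illustrative, and throwing on an unsupported value is a choice made for this example; how the server treats unknown ratios is not documented here.

```typescript
// The five documented aspect ratios.
const ASPECT_RATIOS = ["16:9", "9:16", "1:1", "4:3", "3:4"] as const;
type AspectRatio = (typeof ASPECT_RATIOS)[number];

// Documented default when aspectRatio is omitted.
const DEFAULT_ASPECT_RATIO: AspectRatio = "16:9";

// Hypothetical helper: applies the default and rejects unsupported values
// on the client before any request is sent.
function resolveAspectRatio(input?: string): AspectRatio {
  if (input === undefined) return DEFAULT_ASPECT_RATIO;
  const match = ASPECT_RATIOS.find((ratio) => ratio === input);
  if (match === undefined) {
    throw new RangeError(`Unsupported aspect ratio: ${input}`);
  }
  return match;
}
```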

Queries

`api.vision.getVariationPrompts`

Preview the exact prompt strings that will be sent to fal.ai for the given settings. Use this to show users what will be generated before they commit.

`modificationMode` (string, required): The modification mode. See modification modes above.

`variationCount` (number, required): Number of prompts to generate. Range: 0–12. A count of 0 returns an empty array.

`variationDetail` (string, optional): Detail instruction injected into the prompt (e.g. "extreme close-up of her hands"). When omitted for shot-variation, the system cycles through shot types automatically.

`group` (string, optional): Project group context (e.g. "Film", "Commercial"). Adjusts the prompt prefix for supported modes.

Returns string[] — the exact prompts that would be sent, one per requested variation.

```typescript
// Assumes `convex` is a Convex client instance (for example ConvexHttpClient)
// and `api` is the generated Convex API object.
const prompts = await convex.query(api.vision.getVariationPrompts, {
  modificationMode: "shot-variation",
  variationCount: 3,
  group: "Film",
});
// ["Later in the scene. close-up of the same subject.",
//  "Later in the scene. extreme close-up of the same subject.",
//  "Later in the scene. close-up profile of the same subject."]
```
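
Because the preview queries accept a variationCount of 0–12 while generateVariations requires 1–12, it can help to check the count on the client first. The sketch below is one way to do that; `assertVariationCount` is a hypothetical helper, and treating the count as an integer is an assumption, since the reference only gives the type as number.

```typescript
// Documented upper bound for variationCount.
const MAX_VARIATIONS = 12;

// Hypothetical helper: validates a count against the documented range.
// Pass min = 0 for the preview queries and min = 1 for generateVariations.
// Assumption: counts are whole numbers.
function assertVariationCount(count: number, min: 0 | 1): number {
  if (!Number.isInteger(count) || count < min || count > MAX_VARIATIONS) {
    throw new RangeError(
      `variationCount must be an integer from ${min} to ${MAX_VARIATIONS}, got ${count}`,
    );
  }
  return count;
}
```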

`api.vision.getVariationPromptsForImage`

Same as getVariationPrompts, but automatically uses the group context already stored on a specific image. Requires authentication and ownership of the image.

`imageId` (Id<'images'>, required): ID of the image whose group field will be used as context.

`modificationMode` (string, required): The modification mode.

`variationCount` (number, required): Number of prompts to generate. Range: 0–12.

`variationDetail` (string, optional): Detail instruction.

Returns string[] — the exact prompts that would be sent for this image.

Mutations

`api.vision.generateVariations`

Trigger AI variation generation for an existing image. The image’s aiStatus is set to processing immediately, and generation is scheduled asynchronously via fal.ai.

`imageId` (Id<'images'>, required): ID of the image to generate variations from. Must be owned by the authenticated user and have an imageUrl or storageId.

`variationCount` (number, required): Number of variations to generate. Range: 1–12.

`modificationMode` (string, required): The modification mode to apply. See modification modes.

`variationDetail` (string, optional): Detail instruction to inject into the prompt.

`aspectRatio` (string, optional): Output aspect ratio. One of 16:9, 9:16, 1:1, 4:3, 3:4. Defaults to 16:9.

Returns

`success` (boolean, required): true when generation was successfully scheduled.

Generated images are created as child Image records with parentImageId set to the source image ID and sourceType: "ai". Monitor their status by polling api.images.getProcessingImages.
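
The polling step can be sketched with a small generic helper. `pollUntil` is a hypothetical name, and the interval and attempt limits are arbitrary defaults chosen for illustration. The argument and return shapes of api.images.getProcessingImages are not documented in this section, so the usage comment assumes it takes no arguments and returns an array.

```typescript
// Hypothetical helper: repeatedly calls fetchOnce until isDone returns true,
// waiting intervalMs between attempts and giving up after maxAttempts.
async function pollUntil<T>(
  fetchOnce: () => Promise<T>,
  isDone: (value: T) => boolean,
  options: { intervalMs?: number; maxAttempts?: number } = {},
): Promise<T> {
  const intervalMs = options.intervalMs ?? 2000;
  const maxAttempts = options.maxAttempts ?? 60;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const value = await fetchOnce();
    if (isDone(value)) return value;
    if (attempt < maxAttempts) {
      // Wait before the next attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  throw new Error(`Still processing after ${maxAttempts} attempts`);
}

// Usage sketch (assumptions: `convex` is an authenticated Convex client, and
// getProcessingImages takes no arguments and returns an array of images):
//
//   await convex.mutation(api.vision.generateVariations, {
//     imageId,
//     variationCount: 3,
//     modificationMode: "b-roll",
//   });
//   await pollUntil(
//     () => convex.query(api.images.getProcessingImages, {}),
//     (images) => images.length === 0,
//   );
```

Bounding the number of attempts keeps a stuck job from polling indefinitely; the caller can decide how to surface the timeout.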

`api.vision.rerunSmartAnalysis`

Re-run VLM analysis on an existing image. The image’s aiStatus is reset to processing. Analysis updates description, tags, colors, category, group, projectName, and moodboardName. Optionally triggers variation generation after analysis completes.

`imageId` (Id<'images'>, required): ID of the image to re-analyze. Must be owned by the authenticated user.

`title` (string, required): Title to provide as context for the VLM.
