Overview
The Thumbnail Studio API provides a complete workflow for creating viral YouTube thumbnails and optimized titles using Gemini AI. The API analyzes your video content and generates:
- AI-suggested titles based on video content and virality patterns
- Custom thumbnails with text overlays and visual effects
- YouTube descriptions with automatic chapter generation
- Title refinement through conversational AI
Workflow
POST /api/thumbnail/upload
Uploads a video and starts background Whisper transcription immediately. This pre-processes the video so subsequent operations are faster.
Request Parameters
Video file to upload (multipart/form-data)
YouTube URL to download (alternative to file upload)
Provide either file or url, not both.
Response
Unique session identifier for this thumbnail studio session
Example
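A file upload can be sketched with the Python standard library alone. The sketch below hand-encodes a multipart/form-data body with a `file` part, as the parameter list describes. `BASE_URL` and the response field name `session_id` are assumptions, not confirmed by this page.

```python
import json
import urllib.request
import uuid
from typing import Tuple

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def encode_file_part(field: str, filename: str, content: bytes) -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, content type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(filename: str, content: bytes) -> urllib.request.Request:
    """Build the POST /api/thumbnail/upload request without sending it."""
    body, content_type = encode_file_part("file", filename, content)
    return urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


def upload(path: str) -> str:
    """Upload a local video and return its session ID (assumed field: session_id)."""
    with open(path, "rb") as fh:
        request = build_upload_request(path.rsplit("/", 1)[-1], fh.read())
    with urllib.request.urlopen(request) as response:
        return json.load(response)["session_id"]
```

This version reads the whole file into memory, which is fine for short clips; large videos would call for a streaming encoder.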
POST /api/thumbnail/analyze
Analyzes a video and suggests viral YouTube titles using Gemini AI. Optionally uses audio that was pre-transcribed by the /upload endpoint.
Authentication
Your Google Gemini API key, sent in the X-Gemini-Key header
Request Parameters
Session ID from /api/thumbnail/upload (for pre-transcribed videos)
Video file to analyze (if no session_id)
YouTube URL to analyze (if no session_id)
If you used /upload first, only provide session_id. Otherwise, provide file or url.
Response
Session ID for continuing the workflow
Array of AI-suggested viral titles (typically 5-10 options)
Summary of video content used for title generation
Detected video language (e.g., “en”, “es”)
Indices of recommended titles (highest virality potential)
Example
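A minimal sketch of the session-based call, assuming the request is sent as form data and the key travels in the `X-Gemini-Key` header named in the error table. `BASE_URL` and the response field names `titles` and `recommended` are hypothetical stand-ins for the fields described above.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def build_analyze_request(gemini_key: str, session_id: str) -> urllib.request.Request:
    """Build POST /api/thumbnail/analyze for a video that was already uploaded."""
    body = urllib.parse.urlencode({"session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/analyze",
        data=body,
        headers={"X-Gemini-Key": gemini_key},
        method="POST",
    )


def recommended_titles(response: dict) -> list:
    """Pick the titles whose indices are flagged as recommended.

    `titles` and `recommended` are hypothetical field names.
    """
    titles = response["titles"]
    return [titles[i] for i in response["recommended"] if 0 <= i < len(titles)]
```

The index filter guards against a recommended index that falls outside the returned title array.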
POST /api/thumbnail/titles
Refines title suggestions through conversational AI or accepts a manual title.
Authentication
Your Google Gemini API key, sent in the X-Gemini-Key header
Request Body
Session ID from previous analyze call (required for refinement mode)
User message for refinement (e.g., “make them shorter”, “more clickbaity”)
Manual title to use (skips AI refinement)
Provide either message (for AI refinement) or title (for manual entry).
Response
Session ID (created if not provided)
New array of refined titles or the manual title
Example - Refinement Mode
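A minimal sketch of a refinement request, assuming the body is JSON with the `session_id` and `message` fields described above. `BASE_URL` is a placeholder.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def build_refine_request(gemini_key: str, session_id: str, message: str) -> urllib.request.Request:
    """Build POST /api/thumbnail/titles in refinement mode (not sent here)."""
    payload = {"session_id": session_id, "message": message}
    return urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/titles",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Gemini-Key": gemini_key},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it; each further call with the same session ID continues the conversation.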
Example - Manual Title
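The either/or rule for `message` and `title` can be checked on the client before sending. The helper below is a hypothetical sketch that builds the JSON payload for either mode and omits `session_id` when a manual title is sent without one, since the response notes that a session is created if none is provided.

```python
from typing import Optional


def titles_payload(
    session_id: Optional[str] = None,
    message: Optional[str] = None,
    title: Optional[str] = None,
) -> dict:
    """Build the /api/thumbnail/titles body, enforcing message XOR title."""
    if (message is None) == (title is None):
        raise ValueError("provide exactly one of message or title")
    if message is not None and session_id is None:
        raise ValueError("session_id is required for refinement mode")
    payload = {"title": title} if title is not None else {"message": message}
    if session_id is not None:
        payload["session_id"] = session_id
    return payload
```

For example, `titles_payload(title="I Rebuilt My Studio in 24 Hours")` yields a body with only the `title` key.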
POST /api/thumbnail/generate
Generates YouTube thumbnails with AI-powered text overlays and visual effects using Gemini image generation.
Authentication
Your Google Gemini API key, sent in the X-Gemini-Key header
Request Parameters (Form Data)
Session ID from previous steps
Title text to overlay on thumbnails
Additional prompt for customization (e.g., “dark background”, “neon colors”)
Number of thumbnail variations to generate (1-6)
Face image to composite into thumbnail (optional)
Background image to use (optional)
Response
Array of generated thumbnail URLs
Example
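A minimal sketch of a generation request without the optional face or background images. Only `session_id` and `extra_prompt` are names that appear on this page; `title`, `count`, and `BASE_URL` are hypothetical, and the fields are sent URL-encoded on the assumption that the server accepts that for plain form data.

```python
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def generate_fields(
    session_id: str, title: str, count: int = 3, extra_prompt: Optional[str] = None
) -> dict:
    """Collect form fields, enforcing the documented 1-6 variation range."""
    if not 1 <= count <= 6:
        raise ValueError("count must be between 1 and 6")
    fields = {"session_id": session_id, "title": title, "count": str(count)}
    if extra_prompt:
        fields["extra_prompt"] = extra_prompt
    return fields


def build_generate_request(gemini_key: str, fields: dict) -> urllib.request.Request:
    """Build POST /api/thumbnail/generate (not sent here)."""
    return urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/generate",
        data=urllib.parse.urlencode(fields).encode("utf-8"),
        headers={"X-Gemini-Key": gemini_key},
        method="POST",
    )
```

Attaching a face or background image would require a multipart body instead of the URL-encoded one used here.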
POST /api/thumbnail/describe
Generates a YouTube description with automatic chapters based on the video transcript.
Authentication
Your Google Gemini API key, sent in the X-Gemini-Key header
Request Body
Session ID from analyze step (must have transcript)
Video title to use in description
Response
Generated YouTube description with chapters
Example
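A minimal sketch of the describe call, assuming a JSON body with `session_id` and `title` and a response field named `description` (both field names on the response side are hypothetical). The `opener` argument exists only so the function can be exercised without a live server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def describe(gemini_key: str, session_id: str, title: str, opener=urllib.request.urlopen) -> str:
    """POST /api/thumbnail/describe and return the generated description text."""
    payload = json.dumps({"session_id": session_id, "title": title}).encode("utf-8")
    request = urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/describe",
        data=payload,
        headers={"Content-Type": "application/json", "X-Gemini-Key": gemini_key},
        method="POST",
    )
    with opener(request) as response:
        return json.load(response)["description"]  # assumed field name
```

Because the session must already hold a transcript, this call belongs after the analyze step.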
POST /api/thumbnail/publish
Publishes the video to YouTube with the generated thumbnail and description via the Upload-Post API. Returns immediately while the upload continues in the background.
Request Parameters (Form Data)
Session ID (must have original video)
YouTube video title
YouTube video description
URL of the thumbnail to use (from /generate response)
Upload-Post API key
Upload-Post user/profile username
Response
Unique ID for tracking this publish job
Initial status (always “uploading”)
Example
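A minimal sketch of a publish request. The form field names (`title`, `description`, `thumbnail_url`, `api_key`, `user`) and the response field `publish_id` are hypothetical, chosen to mirror the parameter descriptions above; `BASE_URL` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def build_publish_request(
    session_id: str, title: str, description: str,
    thumbnail_url: str, api_key: str, user: str,
) -> urllib.request.Request:
    """Build POST /api/thumbnail/publish with URL-encoded form fields (not sent here)."""
    fields = {
        "session_id": session_id,
        "title": title,
        "description": description,
        "thumbnail_url": thumbnail_url,  # hypothetical field name
        "api_key": api_key,              # hypothetical field name
        "user": user,                    # hypothetical field name
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/thumbnail/publish",
        data=urllib.parse.urlencode(fields).encode("utf-8"),
        method="POST",
    )


def publish_id_from(response_body: bytes) -> str:
    """Extract the job ID from the JSON response (assumed field: publish_id)."""
    return json.loads(response_body)["publish_id"]
```

The returned ID is what the status endpoint expects, since the upload itself is still running when this call returns.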
GET /api/thumbnail/publish/status/
Polls the status of a background publish job.
Request
Publish ID from /api/thumbnail/publish
Response
Current status: uploading, done, or failed
Upload-Post API response (only if status is done)
Error message (only if status is failed)
Example
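Polling can be sketched as a loop that stops on `done` or `failed`. The version below assumes the publish ID is appended as the final path segment and that the response uses `status` and `error` as field names; none of these are confirmed here, and the interval and timeout defaults are arbitrary.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def fetch_status(publish_id: str) -> dict:
    """GET the job status; assumes the publish ID is the final path segment."""
    url = f"{BASE_URL}/api/thumbnail/publish/status/{publish_id}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_publish(
    get_status: Callable[[], dict],
    interval: float = 10.0,
    timeout: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job is done; raise on failure or when the timeout passes."""
    deadline = time.monotonic() + timeout
    while True:
        payload = get_status()
        status = payload.get("status")  # assumed field name
        if status == "done":
            return payload
        if status == "failed":
            raise RuntimeError(payload.get("error", "publish failed"))  # assumed field name
        if time.monotonic() >= deadline:
            raise TimeoutError("publish job did not finish in time")
        sleep(interval)
```

A typical call would be `wait_for_publish(lambda: fetch_status(publish_id))`.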
Complete Workflow Example
Python SDK
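A minimal Python sketch of the workflow from upload through description, written against the standard library rather than any official SDK. `BASE_URL`, the response field names (`session_id`, `titles`, `recommended`, `thumbnails`, `description`), the `title` and `count` form fields, and the choice of URL-encoded form bodies are all assumptions.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: point this at your deployment


def http_post(path: str, fields: dict, headers: dict, as_json: bool) -> dict:
    """POST fields as JSON or as a URL-encoded form and decode the JSON reply."""
    if as_json:
        data = json.dumps(fields).encode("utf-8")
        headers = {**headers, "Content-Type": "application/json"}
    else:
        data = urllib.parse.urlencode(fields).encode("utf-8")
    request = urllib.request.Request(BASE_URL + path, data=data, headers=headers, method="POST")
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def run_workflow(video_url: str, gemini_key: str, post=http_post) -> dict:
    """Upload by URL, analyze, generate thumbnails, and write a description."""
    auth = {"X-Gemini-Key": gemini_key}
    # Step 1: upload first so transcription starts in the background.
    session_id = post("/api/thumbnail/upload", {"url": video_url}, {}, False)["session_id"]
    # Step 2: analyze and take the first recommended title (or the first title).
    analysis = post("/api/thumbnail/analyze", {"session_id": session_id}, auth, False)
    recommended = analysis.get("recommended") or [0]
    title = analysis["titles"][recommended[0]]
    # Step 3: generate three thumbnail variations for that title.
    thumbs = post(
        "/api/thumbnail/generate",
        {"session_id": session_id, "title": title, "count": "3"},
        auth,
        False,
    )
    # Step 4: generate the description with chapters.
    desc = post("/api/thumbnail/describe", {"session_id": session_id, "title": title}, auth, True)
    return {
        "session_id": session_id,
        "title": title,
        "thumbnails": thumbs["thumbnails"],
        "description": desc["description"],
    }
```

Publishing is left out because it needs Upload-Post credentials; the returned dictionary holds everything that step requires.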
Error Codes
| Code | Description |
|---|---|
| 400 | Missing X-Gemini-Key header |
| 400 | Missing required parameters (file/url/session_id) |
| 404 | Session not found |
| 400 | No transcript segments available (analyze first) |
| 404 | Video file not found in session |
| 500 | Transcription failed |
| 500 | Gemini API error (quota, invalid key, etc.) |
| 500 | Upload-Post API error |
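Because several causes share one status code, a client can only branch on the code and then read the message. The helper below is a hypothetical sketch of that branching; the suggested actions are inferences from the table and the session lifecycle, not documented behavior.

```python
def suggest_action(status_code: int) -> str:
    """Map a status code from the table above to a coarse next step."""
    if status_code == 400:
        # Missing header, missing parameters, or no transcript yet.
        return "fix the request: check X-Gemini-Key, parameters, and that analyze has run"
    if status_code == 404:
        # Session or video missing, for example after the session expired.
        return "start over: upload the video again to get a new session"
    if status_code == 500:
        # Transcription, Gemini, or Upload-Post failure.
        return "inspect the error message: an upstream service failed"
    return "unexpected status: inspect the response body"
```

For example, `suggest_action(404)` points the caller back to the upload step.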
Session Lifecycle
Sessions are stored in-memory and cleaned up after 1 hour of inactivity. A session contains:
- Original video file path
- Whisper transcript and segments
- Video duration and language
- Generated titles and conversation history
- Video context/summary
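To illustrate what expiry after inactivity means for clients, here is a small in-memory store whose entries lapse one hour after their last access. It is an illustrative sketch only, not the server's actual implementation; the class and method names are hypothetical.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

SESSION_TTL_SECONDS = 3600.0  # 1 hour of inactivity, as stated above


class SessionStore:
    """In-memory sessions that expire after a period without access."""

    def __init__(
        self,
        ttl: float = SESSION_TTL_SECONDS,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl
        self._clock = clock
        self._items: Dict[str, Tuple[float, Dict[str, Any]]] = {}

    def put(self, session_id: str, data: Dict[str, Any]) -> None:
        """Store a session and stamp it with the current time."""
        self._items[session_id] = (self._clock(), data)

    def get(self, session_id: str) -> Optional[Dict[str, Any]]:
        """Return the session and refresh its timestamp, or None if expired."""
        entry = self._items.get(session_id)
        if entry is None:
            return None
        last_used, data = entry
        if self._clock() - last_used > self._ttl:
            del self._items[session_id]
            return None
        self._items[session_id] = (self._clock(), data)
        return data

    def sweep(self) -> int:
        """Drop every expired session and return how many were removed."""
        now = self._clock()
        stale = [sid for sid, (used, _) in self._items.items() if now - used > self._ttl]
        for sid in stale:
            del self._items[sid]
        return len(stale)
```

Under this model every successful request on a session resets its one-hour window, and a 404 after a long pause means the session has to be recreated.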
Performance Notes
- Upload: Instant response, transcription runs in background
- Analyze: 30-60 seconds (includes Whisper if not pre-transcribed)
- Titles Refinement: 5-10 seconds per iteration
- Thumbnail Generation: 20-40 seconds for 3 thumbnails
- Description: 10-15 seconds
- Publish: Instant response, upload runs in background (5-10 minutes)
Best Practices
- Use Upload First: Call /upload before /analyze to pre-transcribe and save time
- Iterate on Titles: Use multiple /titles refinement calls to perfect your title
- Test Thumbnails: Generate 3-6 variations and A/B test
- Custom Prompts: Use extra_prompt for brand-specific styling
- Face Overlays: Upload a consistent face image for channel branding
Next Steps
- Process short clips for multi-platform distribution
- Add subtitles to improve watch time
- Translate videos to reach global audiences