Thumbnail Studio - OpenShorts

Overview

The Thumbnail Studio API provides a complete workflow for creating viral YouTube thumbnails and optimized titles using Gemini AI. The API analyzes your video content and generates:

AI-suggested titles based on video content and virality patterns
Custom thumbnails with text overlays and visual effects
YouTube descriptions with automatic chapter generation
Title refinement through conversational AI

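Workflow

A typical session calls the endpoints in this order: upload, analyze, titles, generate, describe, publish, and then publish/status until the job finishes.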

POST /api/thumbnail/upload

Uploads a video and starts background Whisper transcription immediately. This pre-processes the video so subsequent operations are faster.

Request Parameters

file

Video file to upload (multipart/form-data)

url

string

YouTube URL to download (alternative to file upload)

Provide either file or url, not both.

Response

session_id

string

required

Unique session identifier for this thumbnail studio session

{
  "session_id": "thumb-550e8400-e29b-41d4-a716-446655440000"
}

Example

# Upload local file
curl -X POST http://localhost:8000/api/thumbnail/upload \
  -F "file=@/path/to/video.mp4"

# Or YouTube URL
curl -X POST http://localhost:8000/api/thumbnail/upload \
  -F "url=https://youtube.com/watch?v=VIDEO_ID"
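The "either file or url, not both" rule can be checked on the client before any bytes are sent. The following is a minimal sketch of such a check; `choose_upload_source` is a hypothetical helper name and is not part of the API.

```python
from typing import Optional, Tuple


def choose_upload_source(file_path: Optional[str] = None,
                         url: Optional[str] = None) -> Tuple[str, str]:
    """Return ("file", path) or ("url", url); reject both or neither."""
    # Exactly one of the two inputs must be non-empty.
    if bool(file_path) == bool(url):
        raise ValueError("Provide exactly one of file_path or url")
    if file_path:
        return ("file", file_path)
    return ("url", url)


print(choose_upload_source(url="https://youtube.com/watch?v=VIDEO_ID"))
```

With the requests library, a "file" result would typically be sent through the `files=` argument and a "url" result through `data=`.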

POST /api/thumbnail/analyze

Analyzes a video and suggests viral YouTube titles using Gemini AI. If the video was uploaded through the /upload endpoint first, the pre-transcribed audio is reused.

Authentication

X-Gemini-Key

string

required

Your Google Gemini API key

Request Parameters

session_id

string

Session ID from /api/thumbnail/upload (for pre-transcribed videos)

file

Video file to analyze (if no session_id)

url

string

YouTube URL to analyze (if no session_id)

If you used /upload first, only provide session_id. Otherwise, provide file or url.

Response

session_id

string

required

Session ID for continuing the workflow

titles

array

required

Array of AI-suggested viral titles (typically 5-10 options)

context

string

required

Summary of video content used for title generation

language

string

required

Detected video language (e.g., “en”, “es”)

recommended

array

Indices of recommended titles (highest virality potential)

{
  "session_id": "thumb-550e8400-e29b-41d4-a716-446655440000",
  "titles": [
    "I Built an AI That Can Read Minds (It Actually Works)",
    "This AI Invention Will Change Everything in 2026",
    "Mind-Reading AI: The Future is Here",
    "You Won't Believe What This AI Can Do",
    "I Spent 30 Days Building a Mind-Reading AI"
  ],
  "context": "Video demonstrates a machine learning project that uses EEG signals to predict user intentions...",
  "language": "en",
  "recommended": [0, 2, 4]
}

Example

# With pre-uploaded session
curl -X POST http://localhost:8000/api/thumbnail/analyze \
  -H "X-Gemini-Key: AIzaSy..." \
  -F "session_id=thumb-abc-123"

# Analyze new video
curl -X POST http://localhost:8000/api/thumbnail/analyze \
  -H "X-Gemini-Key: AIzaSy..." \
  -F "url=https://youtube.com/watch?v=VIDEO_ID"
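Because `recommended` holds indices into `titles` rather than the titles themselves, a client has to resolve them. The sketch below assumes the indices are zero-based, as the sample response suggests, and falls back to all titles when `recommended` is absent, since the field is not marked required. `recommended_titles` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def recommended_titles(analysis: Dict[str, Any]) -> List[str]:
    """Return the titles flagged by `recommended`, or all titles if none are."""
    titles = analysis["titles"]
    indices = analysis.get("recommended") or []
    # Ignore any index that falls outside the titles array.
    picked = [titles[i] for i in indices if 0 <= i < len(titles)]
    return picked or list(titles)


sample = {
    "titles": [
        "I Built an AI That Can Read Minds (It Actually Works)",
        "This AI Invention Will Change Everything in 2026",
        "Mind-Reading AI: The Future is Here",
        "You Won't Believe What This AI Can Do",
        "I Spent 30 Days Building a Mind-Reading AI",
    ],
    "recommended": [0, 2, 4],
}
for title in recommended_titles(sample):
    print(title)
```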

POST /api/thumbnail/titles

Refines title suggestions through conversational AI or accepts a manual title.

Authentication

X-Gemini-Key

string

required

Your Google Gemini API key

Request Body

session_id

string

Session ID from previous analyze call (required for refinement mode)

message

string

User message for refinement (e.g., “make them shorter”, “more clickbaity”)

title

string

Manual title to use (skips AI refinement)

Provide either message (for AI refinement) or title (for manual entry).

Response

session_id

string

Session ID (created if not provided)

titles

array

required

New array of refined titles or the manual title

{
  "session_id": "thumb-550e8400-e29b-41d4-a716-446655440000",
  "titles": [
    "Mind-Reading AI That Actually Works",
    "I Built an AI Mind Reader in 30 Days",
    "This AI Reads Your Thoughts (No Clickbait)"
  ]
}

Example - Refinement Mode

curl -X POST http://localhost:8000/api/thumbnail/titles \
  -H "X-Gemini-Key: AIzaSy..." \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "thumb-abc-123",
    "message": "make them shorter and more dramatic"
  }'

Example - Manual Title

curl -X POST http://localhost:8000/api/thumbnail/titles \
  -H "X-Gemini-Key: AIzaSy..." \
  -H "Content-Type: application/json" \
  -d '{
    "title": "My Custom YouTube Title"
  }'
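The two modes of this endpoint differ only in the JSON body. The following sketch builds that body and enforces the rules stated above: one of `message` or `title`, and a `session_id` for refinement. `build_titles_payload` is a hypothetical helper, and rejecting a request that carries both fields is a client-side assumption, since the server's behavior in that case is not documented here.

```python
from typing import Dict, Optional


def build_titles_payload(session_id: Optional[str] = None,
                         message: Optional[str] = None,
                         title: Optional[str] = None) -> Dict[str, str]:
    """Build the JSON body for POST /api/thumbnail/titles."""
    if bool(message) == bool(title):
        raise ValueError("Provide exactly one of message or title")
    if message and not session_id:
        raise ValueError("Refinement mode needs a session_id")
    payload = {"message": message} if message else {"title": title}
    # session_id is optional in manual mode; include it only when known.
    if session_id:
        payload["session_id"] = session_id
    return payload


print(build_titles_payload(session_id="thumb-abc-123", message="make them shorter"))
print(build_titles_payload(title="My Custom YouTube Title"))
```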

POST /api/thumbnail/generate

Generates YouTube thumbnails with AI-powered text overlays and visual effects using Gemini image generation.

Authentication

X-Gemini-Key

string

required

Your Google Gemini API key

Request Parameters (Form Data)

session_id

string

required

Session ID from previous steps

title

string

required

Title text to overlay on thumbnails

extra_prompt

string

default:""

Additional prompt for customization (e.g., “dark background”, “neon colors”)

count

integer

default:3

Number of thumbnail variations to generate (1-6)

face

file

Face image to composite into thumbnail (optional)

background

file

Background image to use (optional)

Response

thumbnails

array

required

Array of generated thumbnail URLs

{
  "thumbnails": [
    "/thumbnails/thumb_abc123_1.jpg",
    "/thumbnails/thumb_abc123_2.jpg",
    "/thumbnails/thumb_abc123_3.jpg"
  ]
}

Example

curl -X POST http://localhost:8000/api/thumbnail/generate \
  -H "X-Gemini-Key: AIzaSy..." \
  -F "session_id=thumb-abc-123" \
  -F "title=Mind-Reading AI" \
  -F "extra_prompt=futuristic, neon blue and purple, dramatic lighting" \
  -F "count=3" \
  -F "face=@/path/to/face.jpg"
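Two details of this endpoint are easy to get wrong on the client: `count` must stay within 1-6, and the response contains paths rather than full URLs. The sketch below validates the form fields and resolves the paths against the API base. Both helper names are hypothetical, and serving the thumbnails from the same host as the API is an assumption.

```python
from typing import Any, Dict, List
from urllib.parse import urljoin


def build_generate_form(session_id: str, title: str,
                        extra_prompt: str = "", count: int = 3) -> Dict[str, str]:
    """Build the form fields for POST /api/thumbnail/generate."""
    if not session_id or not title:
        raise ValueError("session_id and title are required")
    if not 1 <= count <= 6:
        raise ValueError("count must be between 1 and 6")
    return {
        "session_id": session_id,
        "title": title,
        "extra_prompt": extra_prompt,
        "count": str(count),
    }


def absolute_thumbnail_urls(api_base: str, response: Dict[str, Any]) -> List[str]:
    """Resolve returned thumbnail paths against the API base URL (assumed same host)."""
    return [urljoin(api_base, path) for path in response["thumbnails"]]


form = build_generate_form("thumb-abc-123", "Mind-Reading AI", count=3)
urls = absolute_thumbnail_urls(
    "http://localhost:8000",
    {"thumbnails": ["/thumbnails/thumb_abc123_1.jpg"]},
)
print(form, urls)
```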

POST /api/thumbnail/describe

Generates a YouTube description with automatic chapters based on the video transcript.

Authentication

X-Gemini-Key

string

required

Your Google Gemini API key

Request Body

session_id

string

required

Session ID from analyze step (must have transcript)

title

string

required

Video title to use in description

Response

description

string

required

Generated YouTube description with chapters

{
  "description": "In this video, I show you how I built an AI that can read minds using EEG signals and machine learning.\n\nChapters:\n0:00 - Introduction\n1:23 - How EEG Works\n3:45 - Building the AI Model\n7:12 - Testing Results\n9:30 - Conclusion\n\n#ai #machinelearning #science"
}

Example

curl -X POST http://localhost:8000/api/thumbnail/describe \
  -H "X-Gemini-Key: AIzaSy..." \
  -H "Content-Type: application/json" \
  -d '{
    "session_id": "thumb-abc-123",
    "title": "I Built an AI Mind Reader"
  }'
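If the chapters are needed as data, for example to sanity-check them before publishing, they can be pulled back out of the description text. The sketch below assumes chapter lines look like the sample response (`M:SS - Name`, one per line). The actual output format is produced by the model and may vary, and `parse_chapters` is a hypothetical helper.

```python
import re
from typing import List, Tuple

# Matches lines such as "1:23 - How EEG Works" or "1:02:03 - Outro".
CHAPTER_LINE = re.compile(r"^(\d+(?::\d{2}){1,2})[ \t]*-[ \t]*(.+)$", re.MULTILINE)


def to_seconds(timestamp: str) -> int:
    """Convert "M:SS" or "H:MM:SS" to a number of seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def parse_chapters(description: str) -> List[Tuple[int, str]]:
    """Return (start_seconds, chapter_name) pairs found in the description."""
    return [(to_seconds(ts), name.strip())
            for ts, name in CHAPTER_LINE.findall(description)]


sample_description = (
    "In this video, I show you how I built an AI that can read minds.\n\n"
    "Chapters:\n0:00 - Introduction\n1:23 - How EEG Works\n"
    "3:45 - Building the AI Model\n7:12 - Testing Results\n9:30 - Conclusion\n\n"
    "#ai #machinelearning #science"
)
print(parse_chapters(sample_description))
```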

POST /api/thumbnail/publish

Publishes the video to YouTube with the generated thumbnail and description via the Upload-Post API. The endpoint returns immediately while the upload runs in the background.

Request Parameters (Form Data)

session_id

string

required

Session ID (must have original video)

title

string

required

YouTube video title

description

string

required

YouTube video description

thumbnail_url

string

required

URL of the thumbnail to use (from /generate response)

api_key

string

required

Upload-Post API key

user_id

string

required

Upload-Post user/profile username

Response

publish_id

string

required

Unique ID for tracking this publish job

status

string

required

Initial status (always “uploading”)

{
  "publish_id": "pub-xyz789",
  "status": "uploading"
}

Example

curl -X POST http://localhost:8000/api/thumbnail/publish \
  -F "session_id=thumb-abc-123" \
  -F "title=I Built an AI Mind Reader" \
  -F "description=Check out this amazing project..." \
  -F "thumbnail_url=/thumbnails/thumb_abc123_1.jpg" \
  -F "api_key=YOUR_UPLOAD_POST_KEY" \
  -F "user_id=myusername"
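This request carries the Upload-Post key in the form data, so scripts are better off not hardcoding it. A minimal sketch, assuming the credentials live in environment variables: the variable names below are hypothetical and are not defined by this API.

```python
import os
from typing import Dict, Mapping

# Hypothetical environment variable names; pick whatever fits your setup.
ENV_NAMES = {
    "gemini_key": "GEMINI_API_KEY",
    "upload_post_key": "UPLOAD_POST_API_KEY",
    "upload_post_user": "UPLOAD_POST_USER",
}


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read the keys used by this workflow, failing early if any are missing."""
    missing = [var for var in ENV_NAMES.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {field: env[var] for field, var in ENV_NAMES.items()}


creds = load_credentials({
    "GEMINI_API_KEY": "AIzaSy...",
    "UPLOAD_POST_API_KEY": "YOUR_UPLOAD_POST_KEY",
    "UPLOAD_POST_USER": "myusername",
})
print(sorted(creds))
```

The resulting values map onto the `X-Gemini-Key` header and the `api_key` and `user_id` form fields.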

GET /api/thumbnail/publish/status/{publish_id}

Polls the status of a background publish job.

Request

publish_id

string

required

Publish ID from /api/thumbnail/publish

Response

status

string

required

Current status: uploading, done, or failed

result

object

Upload-Post API response (only if status is done)

error

string

Error message (only if status is failed)

{
  "status": "done",
  "result": {
    "success": true,
    "uploadId": "up_abc123",
    "youtube_url": "https://youtube.com/watch?v=NEW_VIDEO_ID"
  },
  "error": null
}

Example

curl http://localhost:8000/api/thumbnail/publish/status/pub-xyz789
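A polling loop should give up eventually rather than spin forever on a job that never leaves `uploading`. The sketch below separates the loop from the HTTP call so it can be exercised without a server. `wait_for_publish` and its default interval and timeout are illustrative choices, not values prescribed by the API.

```python
import time
from typing import Any, Callable, Dict


def wait_for_publish(fetch_status: Callable[[], Dict[str, Any]],
                     interval: float = 5.0,
                     timeout: float = 900.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until the job is done; raise on failure or when the timeout passes."""
    deadline = clock() + timeout
    while True:
        data = fetch_status()
        if data["status"] == "done":
            return data
        if data["status"] == "failed":
            raise RuntimeError(data.get("error") or "publish failed")
        if clock() >= deadline:
            raise TimeoutError("publish did not finish in time")
        sleep(interval)


# Simulated status responses standing in for the real endpoint.
fake_responses = iter([
    {"status": "uploading"},
    {"status": "done", "result": {"success": True}, "error": None},
])
final = wait_for_publish(lambda: next(fake_responses), sleep=lambda _s: None)
print(final["status"])
```

In real use, `fetch_status` could be a small function that issues the GET request shown above and returns the parsed JSON.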

Complete Workflow Example

Python (requests)

import requests
import time

API_BASE = "http://localhost:8000"
GEMINI_KEY = "AIzaSy..."
UPLOAD_POST_KEY = "YOUR_KEY"
USER_ID = "myusername"

# Step 1: Upload video (background transcription starts)
print("1. Uploading video...")
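# Open the file in a context manager so the handle is closed after the upload
with open("video.mp4", "rb") as video_file:
    upload_resp = requests.post(
        f"{API_BASE}/api/thumbnail/upload",
        files={"file": video_file}
    )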
session_id = upload_resp.json()["session_id"]
print(f"Session: {session_id}")

# Step 2: Analyze for titles (waits for transcription)
print("2. Analyzing content...")
analyze_resp = requests.post(
    f"{API_BASE}/api/thumbnail/analyze",
    headers={"X-Gemini-Key": GEMINI_KEY},
    data={"session_id": session_id}
)
titles = analyze_resp.json()["titles"]
print(f"Suggested titles: {titles}")

# Step 3: Refine titles
print("3. Refining titles...")
refine_resp = requests.post(
    f"{API_BASE}/api/thumbnail/titles",
    headers={"X-Gemini-Key": GEMINI_KEY},
    json={
        "session_id": session_id,
        "message": "make them shorter and more engaging"
    }
)
refined_titles = refine_resp.json()["titles"]
selected_title = refined_titles[0]
print(f"Selected: {selected_title}")

# Step 4: Generate thumbnails
print("4. Generating thumbnails...")
thumb_resp = requests.post(
    f"{API_BASE}/api/thumbnail/generate",
    headers={"X-Gemini-Key": GEMINI_KEY},
    data={
        "session_id": session_id,
        "title": selected_title,
        "extra_prompt": "dramatic, high contrast",
        "count": 3
    }
)
thumbnails = thumb_resp.json()["thumbnails"]
selected_thumbnail = thumbnails[0]
print(f"Generated: {len(thumbnails)} thumbnails")

# Step 5: Generate description
print("5. Generating description...")
desc_resp = requests.post(
    f"{API_BASE}/api/thumbnail/describe",
    headers={"X-Gemini-Key": GEMINI_KEY},
    json={
        "session_id": session_id,
        "title": selected_title
    }
)
description = desc_resp.json()["description"]
print("Description ready")

# Step 6: Publish to YouTube
print("6. Publishing to YouTube...")
publish_resp = requests.post(
    f"{API_BASE}/api/thumbnail/publish",
    data={
        "session_id": session_id,
        "title": selected_title,
        "description": description,
        "thumbnail_url": selected_thumbnail,
        "api_key": UPLOAD_POST_KEY,
        "user_id": USER_ID
    }
)
publish_id = publish_resp.json()["publish_id"]

# Step 7: Poll publish status
print("7. Waiting for upload...")
while True:
    status_resp = requests.get(
        f"{API_BASE}/api/thumbnail/publish/status/{publish_id}"
    )
    status_data = status_resp.json()
    
    if status_data["status"] == "done":
        print(f"✅ Published! {status_data['result']}")
        break
    elif status_data["status"] == "failed":
        print(f"❌ Failed: {status_data['error']}")
        break
    
    time.sleep(5)

print("Complete!")

Error Codes

Code	Description
400	Missing X-Gemini-Key header
400	Missing required parameters (file/url/session_id)
404	Session not found
400	No transcript segments available (analyze first)
404	Video file not found in session
500	Transcription failed
500	Gemini API error (quota, invalid key, etc.)
500	Upload-Post API error
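Several conditions in this table share a status code, so the code alone does not identify the cause. The sketch below simply mirrors the table as a lookup that a client could use in log messages. `possible_causes` is a hypothetical helper, and the shape of the error response body is not documented here, so it is not parsed.

```python
from typing import Dict, List

# Mirrors the error table above; one status code can have several causes.
ERROR_CAUSES: Dict[int, List[str]] = {
    400: [
        "Missing X-Gemini-Key header",
        "Missing required parameters (file/url/session_id)",
        "No transcript segments available (analyze first)",
    ],
    404: [
        "Session not found",
        "Video file not found in session",
    ],
    500: [
        "Transcription failed",
        "Gemini API error (quota, invalid key, etc.)",
        "Upload-Post API error",
    ],
}


def possible_causes(status_code: int) -> List[str]:
    """Return the documented causes for a status code."""
    return ERROR_CAUSES.get(status_code, ["Undocumented status code"])


print(possible_causes(404))
```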

Session Lifecycle

Sessions are stored in memory and cleaned up after 1 hour of inactivity. A session contains:

Original video file path
Whisper transcript and segments
Video duration and language
Generated titles and conversation history
Video context/summary
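Since sessions expire after an hour of inactivity, a long-running client may want to know when a stored `session_id` has probably gone stale before it hits a 404. A minimal sketch, assuming every request to the API counts as activity (the text above does not say what resets the timer); `SessionClock` is a hypothetical helper.

```python
import time
from typing import Callable

SESSION_TTL_SECONDS = 60 * 60  # one hour of inactivity, per the text above


class SessionClock:
    """Tracks time since the last request made with a session_id."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last_used = clock()

    def touch(self) -> None:
        """Call after every request that uses the session."""
        self._last_used = self._clock()

    def probably_expired(self, margin: float = 60.0) -> bool:
        """True once the idle time is within `margin` seconds of the TTL."""
        idle = self._clock() - self._last_used
        return idle >= SESSION_TTL_SECONDS - margin


# Demonstration with a controllable clock instead of real waiting.
now = [0.0]
tracker = SessionClock(clock=lambda: now[0])
now[0] = 3000.0
print(tracker.probably_expired())
now[0] = 3600.0
print(tracker.probably_expired())
```

When the check returns true, the safe recovery is to start over from /upload or /analyze, because an in-memory session cannot be restored.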

Performance Notes

Upload: Instant response, transcription runs in background
Analyze: 30-60 seconds (includes Whisper transcription if the video was not pre-transcribed)
Titles Refinement: 5-10 seconds per iteration
Thumbnail Generation: 20-40 seconds for 3 thumbnails
Description: 10-15 seconds
Publish: Instant response, upload runs in background (5-10 minutes)
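The requests library applies no timeout unless one is passed, so these figures are a reasonable basis for choosing one. The sketch below doubles the upper bound of each range. The safety factor and the 30-second default for the instant-response endpoints are arbitrary assumptions, and `read_timeout` is a hypothetical helper.

```python
from typing import Dict

# Upper bounds, in seconds, taken from the performance notes above.
TYPICAL_MAX_SECONDS: Dict[str, int] = {
    "analyze": 60,
    "titles": 10,
    "generate": 40,  # quoted for 3 thumbnails; larger counts may take longer
    "describe": 15,
}


def read_timeout(endpoint: str, safety_factor: float = 2.0,
                 default: float = 30.0) -> float:
    """Suggest a read timeout in seconds for one of the endpoints."""
    typical = TYPICAL_MAX_SECONDS.get(endpoint)
    if typical is None:
        return default
    return typical * safety_factor


print(read_timeout("analyze"))
```

The result can be passed as `timeout=` to `requests.post`.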

Best Practices

Use Upload First: Call /upload before /analyze to pre-transcribe and save time
Iterate on Titles: Use multiple /titles refinement calls to perfect your title
Test Thumbnails: Generate 3-6 variations and A/B test
Custom Prompts: Use extra_prompt for brand-specific styling
Face Overlays: Upload a consistent face image for channel branding

Next Steps

Process short clips for multi-platform distribution
Add subtitles to improve watch time
Translate videos to reach global audiences
