
Overview

OpenShorts provides numerous customization options to tailor the processing pipeline to your needs. These settings control aspect ratios, scene detection sensitivity, fonts, subtitle styling, and more.

Aspect Ratio & Crop Dimensions

The output aspect ratio is controlled by a global constant:
# main.py:29
ASPECT_RATIO = 9 / 16  # Standard vertical video (1080x1920)

Output Resolution Calculation

# main.py:613-618
original_width, original_height = get_video_resolution(input_video)

OUTPUT_HEIGHT = original_height  # Preserve original height
OUTPUT_WIDTH = int(OUTPUT_HEIGHT * ASPECT_RATIO)
if OUTPUT_WIDTH % 2 != 0:
    OUTPUT_WIDTH += 1  # Ensure even dimensions for codec compatibility
How it works:
  • Input: 1920x1080 (landscape) → Output: 608x1080
  • Input: 3840x2160 (4K) → Output: 1216x2160
  • Input: 1280x720 (HD) → Output: 406x720
The system always preserves input height and calculates width based on the aspect ratio. This ensures no vertical quality loss while maximizing the output resolution.
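The calculation above can be expressed as a standalone helper (compute_output_size is our name for it, not a function in main.py):

```python
def compute_output_size(original_width: int, original_height: int,
                        aspect_ratio: float = 9 / 16) -> tuple[int, int]:
    """Mirror the width calculation in main.py: keep the input height,
    derive width from the aspect ratio, round up to an even number."""
    output_height = original_height
    output_width = int(output_height * aspect_ratio)
    if output_width % 2 != 0:
        output_width += 1  # even dimensions for codec compatibility
    return output_width, output_height
```

For a 1080p input, 1080 × 9/16 = 607.5 truncates to 607, and the even-dimension fix bumps it to 608.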

Custom Aspect Ratios

Modify the constant for different formats:
ASPECT_RATIO = 1 / 1  # 1080x1080 for Instagram posts
Important: After changing ASPECT_RATIO, rebuild the Docker container:
docker compose down
docker compose up --build

Scene Detection Parameters

PySceneDetect’s ContentDetector analyzes pixel changes between frames:
# main.py:423-433
def detect_scenes(video_path):
    video_manager = VideoManager([video_path])
    scene_manager = SceneManager()
    scene_manager.add_detector(ContentDetector())  # Uses default threshold
    video_manager.set_downscale_factor()           # Auto-downscale for speed
    video_manager.start()
    scene_manager.detect_scenes(frame_source=video_manager)
    scene_list = scene_manager.get_scene_list()
    return scene_list

Default Settings

  • Threshold: 30.0 (default in PySceneDetect)
  • Downscale Factor: Automatic (based on resolution)
  • Min Scene Length: 0.6 seconds (15 frames at 25fps)

Adjust Sensitivity

For more/fewer scene cuts, modify the threshold:
# main.py:426
# Lower = more sensitive (more cuts)
# Higher = less sensitive (fewer cuts)
scene_manager.add_detector(ContentDetector(threshold=25.0))  # More cuts
scene_manager.add_detector(ContentDetector(threshold=40.0))  # Fewer cuts
Recommended values:
  • Action/Sports: 20-25 (detect quick movements)
  • Interviews/Podcasts: 35-45 (reduce false cuts)
  • Mixed Content: 30 (default, balanced)
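The recommendations above could be captured in a small lookup table (CONTENT_THRESHOLDS and threshold_for are illustrative names, not part of main.py):

```python
# Hypothetical mapping of content type to a ContentDetector threshold,
# following the recommended ranges above.
CONTENT_THRESHOLDS = {
    "action": 22.0,     # 20-25: detect quick movements
    "interview": 40.0,  # 35-45: reduce false cuts
    "mixed": 30.0,      # default, balanced
}

def threshold_for(content_type: str) -> float:
    """Fall back to the balanced default for unknown content types."""
    return CONTENT_THRESHOLDS.get(content_type, 30.0)
```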

Scene Strategy Thresholds

# main.py:404-418
avg_faces = sum(face_counts) / len(face_counts)

if avg_faces > 1.2 or avg_faces < 0.5:
    strategies.append('GENERAL')  # Multiple people or landscapes
else:
    strategies.append('TRACK')    # Single subject
Customize face thresholds:
# More aggressive tracking (track even with 2 people)
if avg_faces > 2.0 or avg_faces < 0.5:
    strategies.append('GENERAL')

# Stricter tracking (require clear single subject)
if avg_faces > 0.8 or avg_faces < 0.3:
    strategies.append('GENERAL')
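The decision above can be sketched as a single function with tunable thresholds (choose_strategy is our name, not in main.py):

```python
def choose_strategy(face_counts: list[int],
                    upper: float = 1.2, lower: float = 0.5) -> str:
    """Average the per-frame face counts; pick GENERAL for crowds or
    landscapes, TRACK for a clear single subject."""
    if not face_counts:
        return "GENERAL"
    avg_faces = sum(face_counts) / len(face_counts)
    if avg_faces > upper or avg_faces < lower:
        return "GENERAL"
    return "TRACK"
```

Raising `upper` makes tracking more aggressive; raising `lower` requires a more consistently present subject.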

Camera Movement Configuration

The SmoothedCameraman class controls how the camera follows subjects:

Safe Zone Radius

# main.py:102-104
# How far subject can move before camera reacts
self.safe_zone_radius = self.crop_width * 0.25  # 25% of crop width
Larger safe zone (0.35-0.40):
  • More stable (“tripod-like”)
  • Less reactive to small movements
  • Better for talking heads
Smaller safe zone (0.15-0.20):
  • More dynamic tracking
  • Follows subject closely
  • Better for moving subjects
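The safe-zone check amounts to a dead-band comparison; a minimal sketch (camera_should_move is our name for it):

```python
def camera_should_move(subject_x: float, camera_x: float,
                       crop_width: float,
                       safe_zone_factor: float = 0.25) -> bool:
    """The camera reacts only once the subject drifts outside the
    safe zone around the current camera center."""
    safe_zone_radius = crop_width * safe_zone_factor
    return abs(subject_x - camera_x) > safe_zone_radius
```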

Camera Speed

# main.py:126-135
if abs(diff) > self.crop_width * 0.5:
    speed = 15.0  # Fast re-frame (scene change)
else:
    speed = 3.0   # Slow, steady pan (gradual movement)
Adjust pan speed:
# Slower, cinematic movement
speed = 2.0   # Gentle pan
speed = 10.0  # Fast reframe

# Faster, more responsive
speed = 5.0   # Active pan
speed = 20.0  # Snap reframe
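The snippet selects a speed but does not show the interpolation itself. One plausible sketch, assuming the camera covers a speed-scaled fraction of the remaining distance each frame (pan_camera and the formula are our assumptions, not the exact main.py implementation):

```python
def pan_camera(camera_x: float, target_x: float, crop_width: float,
               dt: float = 1 / 30) -> float:
    """One smoothing step: big jumps (scene changes) use the fast speed,
    gradual drift uses the slow pan."""
    diff = target_x - camera_x
    speed = 15.0 if abs(diff) > crop_width * 0.5 else 3.0
    # Move a speed-scaled fraction of the remaining distance, capped at 100%
    return camera_x + diff * min(1.0, speed * dt)
```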

Speaker Tracking Configuration

Prevents rapid switching between multiple speakers:
# main.py:169-178
class SpeakerTracker:
    def __init__(self, stabilization_frames=15, cooldown_frames=30):
        self.stabilization_threshold = stabilization_frames  # Frames to confirm new speaker
        self.switch_cooldown = cooldown_frames               # Min frames before switching

Stabilization Parameters

stabilization_frames=15  # 0.5s at 30fps
cooldown_frames=30       # 1.0s at 30fps
Best for: Interviews, podcasts, debates
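If you prefer to reason in seconds rather than frames, a hypothetical helper (tracker_params is our name) converts durations to frame counts at a given fps:

```python
def tracker_params(stabilization_s: float, cooldown_s: float,
                   fps: float = 30.0) -> dict:
    """Convert second-based timings to the frame counts SpeakerTracker expects."""
    return {
        "stabilization_frames": round(stabilization_s * fps),
        "cooldown_frames": round(cooldown_s * fps),
    }
```

At 30 fps, the defaults above (0.5 s and 1.0 s) map back to 15 and 30 frames.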

Hysteresis Factor

# main.py:251-252
if pid == self.active_speaker_id:
    total_score *= 3.0  # Sticky factor (preference for current speaker)
Adjust stickiness:
  • 2.0 - Switch more easily
  • 3.0 - Default (balanced)
  • 5.0 - Very sticky (avoid switching)
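The hysteresis can be sketched as a scoring pass in which only the active speaker gets the boost (pick_speaker is an illustrative helper, not in main.py):

```python
def pick_speaker(scores: dict, active_id,
                 sticky_factor: float = 3.0):
    """Multiply the active speaker's score by the sticky factor,
    then pick the highest-scoring person id."""
    boosted = {
        pid: score * (sticky_factor if pid == active_id else 1.0)
        for pid, score in scores.items()
    }
    return max(boosted, key=boosted.get)
```

With the default factor of 3.0, a challenger must out-score the current speaker by more than 3x to take over.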

Font Customization

Hook Text Font

# hooks.py:7-9
FONT_URL = "https://github.com/googlefonts/noto-fonts/raw/main/hinted/ttf/NotoSerif/NotoSerif-Bold.ttf"
FONT_DIR = "fonts"
FONT_PATH = os.path.join(FONT_DIR, "NotoSerif-Bold.ttf")
Use a custom font:
1. Download Font

Place your .ttf file in the fonts/ directory:
mkdir -p fonts
cp MyCustomFont-Bold.ttf fonts/

2. Update Path

# hooks.py:9
FONT_PATH = os.path.join(FONT_DIR, "MyCustomFont-Bold.ttf")

3. Rebuild Container

docker compose up --build
Recommended fonts:
  • Montserrat-Bold: Modern, clean sans-serif
  • Bebas Neue: Condensed, impactful
  • Oswald-Bold: Strong, professional
  • Roboto Condensed: Tech-focused

Font Sizing

# hooks.py:44-46
base_font_size = int(target_width * 0.05)  # 5% of video width
font_size = int(base_font_size * font_scale)
Adjust base percentage:
base_font_size = int(target_width * 0.04)  # Smaller (4%)
base_font_size = int(target_width * 0.07)  # Larger (7%)

Hook Box Styling

# hooks.py:36-42
padding_x = 30           # Horizontal padding
padding_y = 25           # Vertical padding
line_spacing = 20        # Space between lines
cornerradius = 20        # Rounded corners
shadow_offset = (5, 5)   # Shadow position
shadow_blur = 10         # Shadow softness

Subtitle Positioning & Sizing

Position Presets

# subtitles.py:154-167
align_lower = str(alignment).lower()
if align_lower == 'top': 
    ass_alignment = 6   # Top-Center
elif align_lower == 'middle': 
    ass_alignment = 10  # Mid-Center
elif align_lower == 'bottom': 
    ass_alignment = 2   # Bottom-Center (default)

Font Size Scaling

# subtitles.py:175-176
final_fontsize = int(fontsize * 0.5)  # Scale down by 50%
if final_fontsize < 8: final_fontsize = 8  # Minimum size
Custom scaling:
# Larger subtitles (70% of input)
final_fontsize = int(fontsize * 0.7)

# No scaling (use exact size)
final_fontsize = fontsize

Vertical Margin

# subtitles.py:194
MarginV=25  # Pixels from top/bottom edge
Adjust margins:
MarginV=50   # More space from edge
MarginV=10   # Closer to edge

Subtitle Box Opacity

# subtitles.py:192
OutlineColour=&H60000000  # Alpha byte &H60 ≈ 38% transparent (~62% opaque box)
Alpha values (hex — note that ASS alpha encodes transparency, so &H00 is opaque and &HFF is transparent):
  • &H00 = fully opaque (solid black)
  • &H40 = ~25% transparent
  • &H60 = ~38% transparent (default)
  • &H80 = ~50% transparent
  • &HFF = fully transparent (invisible box)
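Because the ASS alpha byte encodes transparency rather than opacity, converting a desired box opacity to the hex value is easy to get backwards. A small helper (ass_alpha is our name):

```python
def ass_alpha(opacity_percent: float) -> str:
    """Convert a box opacity percentage to an ASS alpha hex byte.
    ASS alpha is transparency: &H00 is opaque, &HFF is fully transparent."""
    alpha = round((1.0 - opacity_percent / 100.0) * 255)
    return f"&H{alpha:02X}"
```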

Word Grouping

# subtitles.py:62
def generate_srt(transcript, clip_start, clip_end, output_path, 
                 max_chars=20, max_duration=2.0):
Customize grouping:
max_chars=15        # Shorter lines (3-4 words)
max_chars=30        # Longer lines (5-7 words)
max_duration=1.5    # Faster pacing
max_duration=3.0    # Slower pacing
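generate_srt's internals are not shown here, but the grouping rule can be sketched as follows (group_words is an illustrative stand-in that takes words as (text, start, end) tuples — the real function may differ):

```python
def group_words(words, max_chars=20, max_duration=2.0):
    """Start a new subtitle cue whenever adding the next word would
    exceed max_chars or stretch the cue past max_duration seconds."""
    cues, current = [], []
    for text, start, end in words:
        if current:
            length = len(" ".join(w[0] for w in current)) + 1 + len(text)
            duration = end - current[0][1]
            if length > max_chars or duration > max_duration:
                cues.append(current)
                current = []
        current.append((text, start, end))
    if current:
        cues.append(current)
    return cues
```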

Environment Variables

Configure runtime behavior without code changes:
# .env
MAX_CONCURRENT_JOBS=10  # Process 10 videos simultaneously

AI Model Configuration

Gemini Model Selection

# editor.py:12
self.model_name = "gemini-3-flash-preview"

# Alternative models
self.model_name = "gemini-2.5-flash"  # Faster, cheaper
self.model_name = "gemini-2.5-pro"    # More creative, expensive

Whisper Model Size

# main.py:758
model = WhisperModel("base", device="cpu", compute_type="int8")

# Alternatives
model = WhisperModel("tiny", ...)   # Faster, less accurate
model = WhisperModel("small", ...)  # Balanced
model = WhisperModel("medium", ...) # Better accuracy, slower
model = WhisperModel("large", ...)  # Best accuracy, very slow
Performance vs Accuracy:
  • tiny = 39M params, fastest, lowest accuracy
  • base = 74M params, fast, good accuracy (default)
  • small = 244M params, noticeably slower, better accuracy
  • medium = 769M params, much slower, high accuracy
  • large = 1550M params, slowest, best accuracy

Face Detection Confidence

# main.py:76
face_detection = mp_face_detection.FaceDetection(
    model_selection=1,           # 0=short-range, 1=full-range
    min_detection_confidence=0.5 # 0.0-1.0
)
Adjust confidence:
min_detection_confidence=0.3  # Detect more faces (more false positives)
min_detection_confidence=0.7  # Only confident detections (may miss some)

FFmpeg Encoding Settings

Control output quality and speed:
# main.py:633-634
command = [
    'ffmpeg', '-y', '-f', 'rawvideo', '-vcodec', 'rawvideo',
    '-s', f'{OUTPUT_WIDTH}x{OUTPUT_HEIGHT}', '-pix_fmt', 'bgr24',
    '-r', str(fps), '-i', '-', '-c:v', 'libx264',
    '-preset', 'fast',  # Encoding speed
    '-crf', '23',       # Quality (18=high, 28=low)
    '-an', temp_video_output
]
Preset options (speed vs compression):
  • ultrafast - Fastest, largest files
  • superfast
  • veryfast
  • faster
  • fast (default)
  • medium - Balanced
  • slow - Better compression
  • slower
  • veryslow - Best compression, slowest
CRF values (quality):
  • 18 - Visually lossless (large files)
  • 23 - High quality (default)
  • 28 - Medium quality
  • 32 - Low quality (small files)
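The encoder arguments above can be parameterized so preset and CRF become configurable in one place (encoder_args is an illustrative helper, not a function in main.py):

```python
def encoder_args(width, height, fps, preset="fast", crf=23):
    """Build the raw-video -> H.264 ffmpeg argument list with a
    configurable preset (speed) and CRF (quality)."""
    return [
        'ffmpeg', '-y', '-f', 'rawvideo', '-vcodec', 'rawvideo',
        '-s', f'{width}x{height}', '-pix_fmt', 'bgr24',
        '-r', str(fps), '-i', '-', '-c:v', 'libx264',
        '-preset', preset,
        '-crf', str(crf),
        '-an',
    ]
```

For an archival-quality render you might pass `preset="slow", crf=18`; for a quick preview, `preset="ultrafast", crf=28`.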
