Overview
OpenShorts provides numerous customization options to tailor the processing pipeline to your needs. These settings control aspect ratios, scene detection sensitivity, fonts, subtitle styling, and more.
Aspect Ratio & Crop Dimensions
The output aspect ratio is controlled by a global constant:
# main.py:29
ASPECT_RATIO = 9 / 16 # Standard vertical video (1080x1920)
Output Resolution Calculation
# main.py:613-618
original_width, original_height = get_video_resolution(input_video)
OUTPUT_HEIGHT = original_height  # Preserve original height
OUTPUT_WIDTH = int(OUTPUT_HEIGHT * ASPECT_RATIO)
if OUTPUT_WIDTH % 2 != 0:
    OUTPUT_WIDTH += 1  # Ensure even dimensions for codec compatibility
How it works:
Input: 1920x1080 (landscape) → Output: 608x1080 (int(1080 × 9/16) = 607, bumped to 608 for evenness)
Input: 3840x2160 (4K) → Output: 1216x2160
Input: 1280x720 (HD) → Output: 406x720
The system always preserves input height and calculates width based on the aspect ratio. This ensures no vertical quality loss while maximizing the output resolution.
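The height-preserving calculation can be sketched as a small standalone helper (the function name is illustrative, not from main.py):

```python
def compute_output_size(original_height, aspect_ratio=9 / 16):
    """Preserve the input height and derive the width from the aspect ratio."""
    width = int(original_height * aspect_ratio)
    if width % 2 != 0:
        width += 1  # even dimensions for codec compatibility
    return width, original_height
```

For a 1080p input this yields 608x1080: int(1080 × 9/16) is 607, which is odd and gets bumped to 608.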
Custom Aspect Ratios
Modify the constant for different formats:
ASPECT_RATIO = 1 / 1  # Square (1:1): 1080x1080 for Instagram posts
ASPECT_RATIO = 4 / 5  # Instagram portrait (4:5)
ASPECT_RATIO = 2 / 3  # Pinterest (2:3)
For ultra-vertical output, use a ratio narrower than 9:16.
Important: After changing ASPECT_RATIO, rebuild the Docker container:
docker compose down
docker compose up --build
Scene Detection Parameters
PySceneDetect’s ContentDetector analyzes pixel changes between frames:
# main.py:423-433
def detect_scenes(video_path):
    video_manager = VideoManager([video_path])
    scene_manager = SceneManager()
    scene_manager.add_detector(ContentDetector())  # Uses default threshold
    video_manager.set_downscale_factor()  # Auto-downscale for speed
    video_manager.start()
    scene_manager.detect_scenes(frame_source=video_manager)
    scene_list = scene_manager.get_scene_list()
    return scene_list
Default Settings
Threshold: 30.0 (PySceneDetect default)
Downscale Factor: automatic (based on resolution)
Min Scene Length: 0.6 seconds (15 frames at 25fps)
Adjust Sensitivity
For more/fewer scene cuts, modify the threshold:
# main.py:426
# Lower = more sensitive (more cuts)
# Higher = less sensitive (fewer cuts)
scene_manager.add_detector(ContentDetector(threshold=25.0))  # More cuts
scene_manager.add_detector(ContentDetector(threshold=40.0))  # Fewer cuts
Recommended values:
Action/Sports : 20-25 (detect quick movements)
Interviews/Podcasts : 35-45 (reduce false cuts)
Mixed Content : 30 (default, balanced)
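If you switch between content types often, a small lookup keeps the choice in one place (names and values are illustrative, following the recommendations above):

```python
# Hypothetical helper: map content type to a ContentDetector threshold.
SCENE_THRESHOLDS = {
    "action": 22.0,     # 20-25: catch quick movements
    "interview": 40.0,  # 35-45: avoid false cuts
    "mixed": 30.0,      # PySceneDetect default, balanced
}

def scene_threshold(content_type):
    """Return a threshold for the given content type, defaulting to 30.0."""
    return SCENE_THRESHOLDS.get(content_type, 30.0)
```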
Scene Strategy Thresholds
# main.py:404-418
avg_faces = sum(face_counts) / len(face_counts)
if avg_faces > 1.2 or avg_faces < 0.5:
    strategies.append('GENERAL')  # Multiple people or landscapes
else:
    strategies.append('TRACK')  # Single subject
Customize face thresholds:
# More aggressive tracking (track even with 2 people)
if avg_faces > 2.0 or avg_faces < 0.5:
    strategies.append('GENERAL')

# Stricter tracking (require clear single subject)
if avg_faces > 0.8 or avg_faces < 0.3:
    strategies.append('GENERAL')
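The branch above reduces to a small pure function, which makes the thresholds easy to test (function name is illustrative):

```python
def choose_strategy(avg_faces, upper=1.2, lower=0.5):
    """Pick a crop strategy from a scene's average face count."""
    if avg_faces > upper or avg_faces < lower:
        return 'GENERAL'  # multiple people, or no clear subject
    return 'TRACK'        # single subject worth following
```

With the defaults, one steady face (avg_faces near 1.0) yields 'TRACK'; a two-person shot or an empty landscape yields 'GENERAL'.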
Camera Movement Configuration
The SmoothedCameraman class controls how the camera follows subjects:
Safe Zone Radius
# main.py:102-104
# How far the subject can move before the camera reacts
self.safe_zone_radius = self.crop_width * 0.25  # 25% of crop width
Larger safe zone (0.35-0.40):
More stable (“tripod-like”)
Less reactive to small movements
Better for talking heads
Smaller safe zone (0.15-0.20):
More dynamic tracking
Follows subject closely
Better for moving subjects
Camera Speed
# main.py:126-135
if abs(diff) > self.crop_width * 0.5:
    speed = 15.0  # Fast re-frame (scene change)
else:
    speed = 3.0  # Slow, steady pan (gradual movement)
Adjust pan speed:
# Slower, cinematic movement
speed = 2.0   # Gentle pan
speed = 10.0  # Fast reframe

# Faster, more responsive
speed = 5.0   # Active pan
speed = 20.0  # Snap reframe
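A toy version of the two-speed logic, assuming the camera approaches the target exponentially each frame (the real SmoothedCameraman keeps more state; the class and attribute names here are illustrative):

```python
class MiniCameraman:
    """Holds still inside the safe zone, pans slowly on drift, snaps on big jumps."""

    def __init__(self, crop_width, x=0.0):
        self.crop_width = crop_width
        self.x = x  # current crop-center position
        self.safe_zone_radius = crop_width * 0.25  # 25% of crop width

    def update(self, target_x, dt=1 / 30):
        diff = target_x - self.x
        if abs(diff) <= self.safe_zone_radius:
            return self.x  # subject still inside the safe zone: don't move
        # big jump (over half the crop width) -> fast re-frame, else slow pan
        speed = 15.0 if abs(diff) > self.crop_width * 0.5 else 3.0
        self.x += diff * min(1.0, speed * dt)  # exponential approach
        return self.x
```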
Speaker Tracking Configuration
Prevents rapid switching between multiple speakers:
# main.py:169-178
class SpeakerTracker:
    def __init__(self, stabilization_frames=15, cooldown_frames=30):
        self.stabilization_threshold = stabilization_frames  # Frames to confirm a new speaker
        self.switch_cooldown = cooldown_frames  # Min frames before switching
Stabilization Parameters
Default (Balanced)
stabilization_frames = 15  # 0.5s at 30fps
cooldown_frames = 30       # 1.0s at 30fps
Best for: Interviews, podcasts, debates

Aggressive Tracking
stabilization_frames = 5   # 0.17s - quick response
cooldown_frames = 15       # 0.5s - switches faster
Best for: Fast-paced conversations, multiple speakers

Locked Camera
stabilization_frames = 45  # 1.5s - very stable
cooldown_frames = 90       # 3.0s - rarely switches
Best for: Single presenter, minimal movement
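The presets can be exercised with a stripped-down tracker (a hypothetical sketch, not the full SpeakerTracker): a challenger must lead for stabilization_frames consecutive frames, and after each switch further switches are blocked for cooldown_frames.

```python
class MiniSpeakerTracker:
    """Toy debouncer for the active-speaker decision."""

    def __init__(self, stabilization_frames=15, cooldown_frames=30):
        self.stabilization_frames = stabilization_frames
        self.cooldown_frames = cooldown_frames
        self.active = None
        self.candidate = None
        self.candidate_streak = 0
        self.cooldown = 0

    def update(self, leader):
        """Feed the per-frame leading speaker; returns the stabilized active speaker."""
        if self.cooldown > 0:
            self.cooldown -= 1
        if self.active is None:
            self.active = leader  # first frame: adopt immediately
            return self.active
        if leader == self.active:
            self.candidate, self.candidate_streak = None, 0  # reset any challenge
            return self.active
        if leader == self.candidate:
            self.candidate_streak += 1
        else:
            self.candidate, self.candidate_streak = leader, 1
        if self.candidate_streak >= self.stabilization_frames and self.cooldown == 0:
            self.active = self.candidate  # challenger confirmed: switch
            self.candidate, self.candidate_streak = None, 0
            self.cooldown = self.cooldown_frames
        return self.active
```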
Hysteresis Factor
# main.py:251-252
if pid == self.active_speaker_id:
    total_score *= 3.0  # Sticky factor (preference for current speaker)
Adjust stickiness:
2.0 - Switch more easily
3.0 - Default (balanced)
5.0 - Very sticky (avoid switching)
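The effect of the sticky factor is easiest to see in isolation (a minimal sketch; pick_speaker is an illustrative name):

```python
def pick_speaker(scores, active_id, sticky=3.0):
    """Boost the current speaker's score before comparing, so small leads don't cause a switch."""
    boosted = {pid: s * (sticky if pid == active_id else 1.0)
               for pid, s in scores.items()}
    return max(boosted, key=boosted.get)
```

With the default factor of 3.0, a challenger needs more than triple the active speaker's raw score to take over.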
Font Customization
Hook Text Font
# hooks.py:7-9
FONT_URL = "https://github.com/googlefonts/noto-fonts/raw/main/hinted/ttf/NotoSerif/NotoSerif-Bold.ttf"
FONT_DIR = "fonts"
FONT_PATH = os.path.join(FONT_DIR, "NotoSerif-Bold.ttf")
Use a custom font:
Download Font
Place your .ttf file in the fonts/ directory:
mkdir -p fonts
cp MyCustomFont-Bold.ttf fonts/
Update Path
# hooks.py:9
FONT_PATH = os.path.join(FONT_DIR, "MyCustomFont-Bold.ttf")
Rebuild Container
docker compose up --build
Recommended fonts:
Montserrat-Bold : Modern, clean sans-serif
Bebas Neue : Condensed, impactful
Oswald-Bold : Strong, professional
Roboto Condensed : Tech-focused
Font Sizing
# hooks.py:44-46
base_font_size = int(target_width * 0.05)  # 5% of video width
font_size = int(base_font_size * font_scale)
Adjust base percentage:
base_font_size = int(target_width * 0.04)  # Smaller (4%)
base_font_size = int(target_width * 0.07)  # Larger (7%)
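Putting the two sizing lines together (the helper name is illustrative):

```python
def hook_font_size(target_width, base_pct=0.05, font_scale=1.0):
    """Hook font size as a fraction of video width, then scaled."""
    base_font_size = int(target_width * base_pct)
    return int(base_font_size * font_scale)
```

On a 1080px-wide frame the default 5% gives a 54px font; a narrow 608px vertical crop gives 30px.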
Hook Box Styling
# hooks.py:36-42
padding_x = 30          # Horizontal padding
padding_y = 25          # Vertical padding
line_spacing = 20       # Space between lines
cornerradius = 20       # Rounded corners
shadow_offset = (5, 5)  # Shadow position
shadow_blur = 10        # Shadow softness
Subtitle Positioning & Sizing
Position Presets
# subtitles.py:154-167
align_lower = str(alignment).lower()
if align_lower == 'top':
    ass_alignment = 6   # Top-Center (legacy SSA \a value)
elif align_lower == 'middle':
    ass_alignment = 10  # Mid-Center (legacy SSA \a value)
elif align_lower == 'bottom':
    ass_alignment = 2   # Bottom-Center (default)
Font Size Scaling
# subtitles.py:175-176
final_fontsize = int(fontsize * 0.5)  # Scale down by 50%
if final_fontsize < 8:
    final_fontsize = 8  # Minimum size
Custom scaling:
# Larger subtitles (70% of input)
final_fontsize = int(fontsize * 0.7)

# No scaling (use exact size)
final_fontsize = fontsize
Vertical Margin
# subtitles.py:194
MarginV = 25 # Pixels from top/bottom edge
Adjust margins:
MarginV = 50 # More space from edge
MarginV = 10 # Closer to edge
Subtitle Box Opacity
# subtitles.py:192
OutlineColour=&H60000000  # Alpha &H60 = ~38% transparent (~62% opaque box)
Alpha values (hex; in the ASS format, higher alpha means more transparent):
&H00 = fully opaque (solid black)
&H40 = ~25% transparent
&H60 = ~38% transparent (default)
&H80 = ~50% transparent
&HFF = fully transparent (invisible)
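To pick a value for a target transparency, a one-liner helps (an illustrative helper, following the ASS convention that alpha &H00 is opaque and &HFF fully transparent):

```python
def ass_alpha(transparency):
    """ASS alpha byte for a transparency fraction (0.0 = opaque, 1.0 = invisible)."""
    value = max(0, min(255, round(transparency * 255)))
    return f"&H{value:02X}"
```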
Word Grouping
# subtitles.py:62
def generate_srt(transcript, clip_start, clip_end, output_path,
                 max_chars=20, max_duration=2.0):
Customize grouping:
max_chars = 15 # Shorter lines (3-4 words)
max_chars = 30 # Longer lines (5-7 words)
max_duration = 1.5 # Faster pacing
max_duration = 3.0 # Slower pacing
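The grouping behaves roughly like this (a simplified sketch of the chunking, assuming (word, start, end) tuples; the real generate_srt also writes SRT timestamps):

```python
def group_words(words, max_chars=20, max_duration=2.0):
    """Group (word, start, end) tuples into subtitle chunks bounded by length and duration."""
    groups, current = [], []
    for word, start, end in words:
        if current:
            text = " ".join(w for w, _, _ in current)
            too_long = len(text) + 1 + len(word) > max_chars
            too_slow = end - current[0][1] > max_duration
            if too_long or too_slow:
                groups.append(current)  # close the current chunk
                current = []
        current.append((word, start, end))
    if current:
        groups.append(current)
    return groups
```

Lowering max_chars produces more, shorter chunks; raising max_duration lets a chunk stay on screen longer before it is forced to break.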
Environment Variables
Configure runtime behavior without code changes:
Concurrency Control (example):
# .env
MAX_CONCURRENT_JOBS=10  # Process 10 videos simultaneously
Other .env settings follow the same pattern and cover job retention, the maximum upload file size, and the AWS S3 region.
AI Model Configuration
Gemini Model Selection
# editor.py:12
self.model_name = "gemini-3-flash-preview"

# Alternative models
self.model_name = "gemini-2.5-flash"  # Faster, cheaper
self.model_name = "gemini-2.5-pro"    # More creative, expensive
Whisper Model Size
# main.py:758
model = WhisperModel("base", device="cpu", compute_type="int8")

# Alternatives
model = WhisperModel("tiny", ...)    # Faster, less accurate
model = WhisperModel("small", ...)   # Balanced
model = WhisperModel("medium", ...)  # Better accuracy, slower
model = WhisperModel("large", ...)   # Best accuracy, very slow
Performance vs accuracy (approximate; larger models take proportionally longer):
tiny = 39M params, ~1x processing time, ~85% accuracy
base = 74M params, ~1.5x processing time, ~90% accuracy (default)
small = 244M params, ~3x processing time, ~94% accuracy
medium = 769M params, ~6x processing time, ~96% accuracy
large = 1550M params, ~10x processing time, ~98% accuracy
Face Detection Confidence
# main.py:76
face_detection = mp_face_detection.FaceDetection(
    model_selection=1,             # 0=short-range, 1=full-range
    min_detection_confidence=0.5,  # 0.0-1.0
)
Adjust confidence:
min_detection_confidence = 0.3  # Detect more faces (more false positives)
min_detection_confidence = 0.7  # Only confident detections (may miss some)
FFmpeg Encoding Settings
Control output quality and speed:
# main.py:633-634
command = [
    'ffmpeg', '-y', '-f', 'rawvideo', '-vcodec', 'rawvideo',
    '-s', f'{OUTPUT_WIDTH}x{OUTPUT_HEIGHT}', '-pix_fmt', 'bgr24',
    '-r', str(fps), '-i', '-', '-c:v', 'libx264',
    '-preset', 'fast',  # Encoding speed
    '-crf', '23',       # Quality (18=high, 28=low)
    '-an', temp_video_output
]
Preset options (speed vs compression):
ultrafast - Fastest, largest files
superfast
veryfast
faster
fast (default)
medium - Balanced
slow - Better compression
slower
veryslow - Best compression, slowest
CRF values (quality):
18 - Visually lossless (large files)
23 - High quality (default)
28 - Medium quality
32 - Low quality (small files)
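To experiment with presets and CRF without editing the snippet in place, the command can be parameterized (a sketch mirroring the array above; encode_command is an illustrative name):

```python
def encode_command(width, height, fps, out_path, preset="fast", crf=23):
    """Build the ffmpeg argv for a raw-BGR stdin to H.264 encode."""
    return [
        'ffmpeg', '-y', '-f', 'rawvideo', '-vcodec', 'rawvideo',
        '-s', f'{width}x{height}', '-pix_fmt', 'bgr24',
        '-r', str(fps), '-i', '-', '-c:v', 'libx264',
        '-preset', preset,   # speed vs compression trade-off
        '-crf', str(crf),    # quality: lower = better, larger files
        '-an', out_path,
    ]
```

For an archival-quality render you might call it with preset="slow" and crf=18.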
Next Steps