Overview
FastAPI backend that serves videos, provides predictions, and sorts videos into folders. It includes endpoints for listing videos, managing folders, sorting videos, and triggering model retraining.
Location: source/server.py
Server: FastAPI with Uvicorn
Default Port: 8000
Configuration Constants
Directory containing videos and category subfolders.
Directory containing predictions and trained models.
Path to the frontend HTML file.
Pydantic Models
SortRequest
Request body schema for sorting a video into a folder.
- Video filename (must match pattern \d+\.mp4, e.g., 7234567890123456.mp4)
- Target folder name (must exist as a subfolder in DATA_DIR)
Functions
get_folders()
Returns the list of category folders with video counts.
Returns: list[dict] - One entry per folder, carrying the folder name and the number of videos in it.
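The folder listing can be sketched with the standard library alone. This is a minimal illustration, not the code in server.py: it takes the data directory as a parameter instead of reading DATA_DIR, and the key names "name" and "count" are assumptions, since the exact structure is not shown here.

```python
from pathlib import Path


def get_folders(data_dir: Path) -> list[dict]:
    """List category subfolders of data_dir with the number of .mp4 files in each."""
    folders = []
    for entry in sorted(data_dir.iterdir()):
        if not entry.is_dir():
            continue  # root-level files are unsorted videos, not categories
        count = sum(1 for f in entry.iterdir() if f.is_file() and f.suffix == ".mp4")
        # "name" and "count" are hypothetical key names.
        folders.append({"name": entry.name, "count": count})
    return folders
```

Sorting the entries keeps the order stable between calls, which makes the list easier to render in a UI.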
load_predictions()
Startup event handler that loads predictions from artifacts/predictions.json into memory.
Populates the global predictions dictionary mapping filename → prediction data.
_run_retrain()
Background function that runs the full ML pipeline in sequence:
- extract_features.py - Extract features from all videos
- train.py - Train classifier on labeled data
- predict.py - Generate predictions for unsorted videos
Updates the retrain_status global state.
Timeout: 900 seconds (15 minutes) per script
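One way to run the three scripts in order with a per-script timeout is sketched below. It is an assumption-laden illustration rather than the actual _run_retrain(): the function name run_pipeline, the use of sys.executable, and the truncation of stderr are all choices made for this example. The status messages follow the formats documented for GET /api/retrain/status.

```python
import subprocess
import sys

PIPELINE = ("extract_features.py", "train.py", "predict.py")
SCRIPT_TIMEOUT = 900  # seconds per script, as documented above

# Shape follows the State Management section: a running flag plus the last result.
retrain_status = {"running": False, "last_result": None}


def run_pipeline(scripts=PIPELINE, cwd=None, timeout=SCRIPT_TIMEOUT):
    """Run each script in order, stopping at the first failure (hypothetical helper)."""
    retrain_status["running"] = True
    try:
        for script in scripts:
            try:
                proc = subprocess.run(
                    [sys.executable, script],
                    cwd=cwd,
                    capture_output=True,
                    text=True,
                    timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                retrain_status["last_result"] = (
                    f"Failed at {script}: timed out after {timeout}s"
                )
                return
            if proc.returncode != 0:
                # Keep only the tail of stderr so the status message stays short.
                tail = proc.stderr.strip()[-200:]
                retrain_status["last_result"] = f"Failed at {script}: {tail}"
                return
        retrain_status["last_result"] = "success"
    finally:
        # Always clear the flag, even if a script fails or times out.
        retrain_status["running"] = False
```

Stopping at the first failure matters because each stage consumes the output of the previous one; training on stale features would produce misleading predictions.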
API Endpoints
GET /
Serves the main HTML interface. Response: HTML file (index.html)
GET /api/videos
Lists all unsorted videos (those in the root directory) with their predictions.
Response:
- Array of unsorted videos with prediction data
- Total number of unsorted videos
Each video entry contains:
- Video filename
- Predicted category folder, or null if no prediction available
- Prediction confidence score (0-1)
- Array of top predictions with folder names and confidence scores
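As a rough sketch of how such a listing could be assembled, the function below scans the root directory and joins each file with the cached predictions. Every key name here ("videos", "total", "filename", "predicted_folder", "confidence", "top_predictions", and the "folder"/"top" keys inside a prediction) is hypothetical, as is the assumption that only names matching \d+\.mp4 are listed.

```python
import re
from pathlib import Path

VIDEO_NAME = re.compile(r"\d+\.mp4")


def list_unsorted(data_dir: Path, predictions: dict) -> dict:
    """Collect root-level videos and attach whatever prediction is cached for each."""
    videos = []
    for entry in sorted(data_dir.iterdir()):
        # Subfolders hold sorted videos; only root-level files count as unsorted.
        if not (entry.is_file() and VIDEO_NAME.fullmatch(entry.name)):
            continue
        pred = predictions.get(entry.name, {})
        videos.append({
            "filename": entry.name,
            "predicted_folder": pred.get("folder"),  # None when no prediction exists
            "confidence": pred.get("confidence"),
            "top_predictions": pred.get("top", []),
        })
    return {"videos": videos, "total": len(videos)}
```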
GET /api/folders
Lists all category folders with video counts.
Response: Array of folder objects, each containing:
- Folder name
- Number of videos in this folder
POST /api/sort
Moves a video from the root directory to a category folder.
Request Body: SortRequest
Response:
- Whether the sort operation succeeded
- Filename that was sorted
- Folder the video was moved to
- Updated list of all folders with new counts
Errors:
- 400 Bad Request - Invalid filename format or folder name
- 404 Not Found - Video file not found (may already be sorted)
- 409 Conflict - File already exists in target folder
Validation:
- Validates filename matches pattern \d+\.mp4
- Prevents path traversal (rejects folders containing .. or /)
- Checks that destination folder exists and is a directory
- Validates source file exists and is a file (not already moved)
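The checks listed for this endpoint can be sketched as a plain function that raises an exception carrying the status code. This is an illustration under stated assumptions, not the endpoint itself: SortError and sort_video are hypothetical names, and answering 400 for a folder that does not exist is a guess, since the status for that case is not spelled out.

```python
import re
import shutil
from pathlib import Path

VIDEO_NAME = re.compile(r"\d+\.mp4")


class SortError(Exception):
    """Carries the HTTP status code the endpoint would return (hypothetical)."""

    def __init__(self, status: int, detail: str):
        super().__init__(detail)
        self.status = status


def sort_video(data_dir: Path, filename: str, folder: str) -> Path:
    # 400: the whole name must be digits followed by .mp4
    if not VIDEO_NAME.fullmatch(filename):
        raise SortError(400, "invalid filename")
    # 400: reject anything that could escape data_dir
    if not folder or ".." in folder or "/" in folder:
        raise SortError(400, "invalid folder name")
    dest_dir = data_dir / folder
    if not dest_dir.is_dir():
        raise SortError(400, "folder does not exist")  # assumed status code
    src = data_dir / filename
    # 404: the source is gone, for example because it was already sorted
    if not src.is_file():
        raise SortError(404, "video not found")
    dest = dest_dir / filename
    # 409: never overwrite a file that is already in the target folder
    if dest.exists():
        raise SortError(409, "file already exists in target folder")
    shutil.move(str(src), str(dest))
    return dest
```

Using fullmatch rather than match means a name such as 123.mp4.exe is rejected, because the pattern has to cover the entire string.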
POST /api/retrain
Triggers a full model retraining pipeline in the background.
Request Body: None
Response: Either "started" if retraining began, or "already_running" if a retrain is in progress
Behavior:
- Checks if retraining is already in progress
- Launches background thread to run pipeline
- Returns immediately (doesn't wait for completion)
- Pipeline runs: extract_features.py → train.py → predict.py
- Reloads predictions into memory when complete
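The "start once, return immediately" behavior can be sketched with a daemon thread and a guard flag. This is a minimal sketch, not the server's code: trigger_retrain is a hypothetical name, the job is passed in as a callable, and the lock around the flag check is an addition for this example; whether server.py uses one is not stated.

```python
import threading

retrain_status = {"running": False, "last_result": None}
_lock = threading.Lock()  # assumption: guards the check-and-set of the running flag


def trigger_retrain(job) -> str:
    """Start job in a daemon thread unless one is already running."""
    with _lock:
        if retrain_status["running"]:
            return "already_running"
        # Set the flag before the thread starts so a second call cannot slip in.
        retrain_status["running"] = True

    def worker():
        try:
            job()
            retrain_status["last_result"] = "success"
        except Exception as exc:
            retrain_status["last_result"] = f"Failed: {exc}"
        finally:
            retrain_status["running"] = False

    threading.Thread(target=worker, daemon=True).start()
    return "started"
```

Without the lock, two requests arriving at the same moment could both see running as false and start two pipelines.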
GET /api/retrain/status
Checks the status of the retraining pipeline.
Response:
- Whether retraining is currently in progress
- Result of last retrain attempt:
  - "success" - Completed successfully
  - "Failed at [script]: [error]" - Failed with error message
  - null - No retrain has completed yet
Clients can poll this endpoint while running: true to show progress.
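A client that waits for retraining to finish might poll the status endpoint in a loop, as in the sketch below. The function name wait_for_retrain is hypothetical, and the response keys "running" and "last_result" are assumed to match the field names of the retrain_status state.

```python
import time


def wait_for_retrain(fetch_status, interval=2.0, timeout=3600.0):
    """Poll until running is false, then return last_result.

    fetch_status is any callable returning the decoded JSON body of
    GET /api/retrain/status as a dict.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if not status["running"]:
            return status["last_result"]
        if time.monotonic() >= deadline:
            raise TimeoutError("retrain still running")
        time.sleep(interval)
```

Passing the fetch function in keeps the loop independent of any HTTP library; in practice it could wrap a urllib.request.urlopen call against the server's /api/retrain/status URL.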
GET /videos/
Serves a video file for playback.
Path Parameters: Video filename (must match pattern \d+\.mp4)
Content-Type: video/mp4
File Lookup:
- First checks root directory (unsorted videos)
- If not found, searches all category subfolders
- Returns 404 if not found anywhere
Errors:
- 400 Bad Request - Invalid filename format
- 404 Not Found - Video file not found
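The lookup order described above (root first, then each category subfolder) can be sketched as follows. find_video is a hypothetical helper; it raises ValueError where the endpoint would answer 400 and returns None where it would answer 404.

```python
import re
from pathlib import Path
from typing import Optional

VIDEO_NAME = re.compile(r"\d+\.mp4")


def find_video(data_dir: Path, filename: str) -> Optional[Path]:
    """Return the path of filename, or None if it is in neither the root nor a subfolder."""
    if not VIDEO_NAME.fullmatch(filename):
        raise ValueError("invalid filename")  # the endpoint answers 400 here
    # Unsorted videos live directly in the root directory.
    root_path = data_dir / filename
    if root_path.is_file():
        return root_path
    # Otherwise look one level down, in each category subfolder.
    for sub in sorted(data_dir.iterdir()):
        candidate = sub / filename
        if sub.is_dir() and candidate.is_file():
            return candidate
    return None  # the endpoint answers 404 here
```

Validating the name before touching the filesystem means a request can never reach a path outside the data directory.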
Running the Server
Development
The server listens on http://0.0.0.0:8000.
Production
With Auto-Reload
State Management
The server maintains two global state variables:
predictions
Dictionary mapping filename → prediction data, loaded from artifacts/predictions.json at startup and after retraining.
retrain_status
- running: Boolean indicating if retrain is in progress
- last_result: String with success/failure message from last retrain
Dependencies
Required Python packages:
- fastapi - Web framework
- uvicorn - ASGI server
- pydantic - Request/response validation
Pipeline scripts run during retraining:
- extract_features.py
- train.py
- predict.py
Architecture Notes
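Thread Safety: The retraining pipeline runs in a daemon thread to avoid blocking API requests. Only one retrain can run at a time.
File System: All file operations use shutil.move(), which performs an atomic rename only when source and destination are on the same filesystem; across filesystems it falls back to copying and then deleting the source. Path traversal is prevented through validation.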
Caching: Predictions are cached in memory and only reloaded after successful retraining.
Error Handling: Invalid filenames, missing files, and folder conflicts return appropriate HTTP error codes.