The `predict.py` script uses your trained model to predict folder assignments for unsorted videos, with configurable confidence thresholds and automatic file organization.
How Predictions Work
The inference pipeline:
- Load trained model from `artifacts/model.pt` (or `.pkl`)
- Load unlabeled embeddings from `artifacts/unlabeled_embeddings.pt`
- Generate predictions with confidence scores (probability distribution)
- Display top-k predictions per video
- Optionally move files to predicted folders
- Save detailed predictions to `predictions.json`
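The prediction, top-k, and save steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the code inside `predict.py`: the function names `top_k` and `save_predictions`, the record layout, and the sample folder names are all hypothetical, and the real `predictions.json` schema may differ.

```python
import json


def top_k(probabilities, class_names, k=3):
    """Return the k most probable (folder, confidence) pairs, best first."""
    ranked = sorted(
        zip(class_names, probabilities), key=lambda pair: pair[1], reverse=True
    )
    return ranked[:k]


def save_predictions(records, path):
    """Write prediction records to a JSON file (assumed layout, for illustration)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(records, handle, indent=2)


# Hypothetical class names and one probability row, standing in for model output.
class_names = ["Cooking", "Travel", "Comedy", "Fitness"]
probabilities = [0.10, 0.65, 0.20, 0.05]

record = {
    "video": "example.mp4",  # hypothetical file name
    "predictions": [
        {"folder": folder, "confidence": confidence}
        for folder, confidence in top_k(probabilities, class_names, k=3)
    ],
}
```

Calling `save_predictions([record], "predictions.json")` would then write the record to disk for later review.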
Running Predictions
Ensure Prerequisites
You need:
- `artifacts/model.pt` and `artifacts/model_config.json` from training
- `artifacts/unlabeled_embeddings.pt` from feature extraction
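A quick check before running predictions can confirm that these files are in place. The following is a minimal sketch, not part of `predict.py`: the helper name `missing_artifacts` is hypothetical, and only the three file names come from the prerequisites above (a `.pkl` model is not covered here).

```python
from pathlib import Path

# File names taken from the prerequisites above.
REQUIRED_ARTIFACTS = (
    "model.pt",
    "model_config.json",
    "unlabeled_embeddings.pt",
)


def missing_artifacts(artifacts_dir="artifacts"):
    """Return the required artifact files that do not exist yet (hypothetical helper)."""
    root = Path(artifacts_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (root / name).is_file()]
```

An empty list from `missing_artifacts()` means training and feature extraction have both produced their outputs.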
Command-Line Arguments
`--move`
Actually move files to predicted folders (default: false).
`--threshold`
Minimum confidence (0.0 to 1.0) required to auto-assign a folder.
- `--threshold 0.5`: Only auto-sort videos the model is reasonably confident about
- `--threshold 0.8`: Very conservative, only high-confidence predictions
- `--threshold 0.0`: Sort everything (default)
`--top-k`
Number of predictions to show per video (default: 3).
Understanding Confidence Scores
Confidence scores are softmax probabilities that sum to 1.0 across all classes.
Interpreting Confidence
| Confidence | Interpretation | Action |
|---|---|---|
| 90-100% | Very confident | Trust the prediction |
| 70-90% | Confident | Usually correct, verify if important |
| 50-70% | Uncertain | Review manually, might be ambiguous |
| <50% | Very uncertain | Definitely review, likely wrong or ambiguous |
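To make the table concrete, the sketch below converts raw scores into softmax probabilities and maps the top confidence to one of the bands. The function names and sample scores are hypothetical, and assigning boundary values (exactly 50%, 70%, 90%) to the higher band is a choice made here, not something the table specifies.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1.0."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def interpret(confidence):
    """Map a top-1 confidence (0.0 to 1.0) to the bands in the table above."""
    if confidence >= 0.9:
        return "Very confident"
    if confidence >= 0.7:
        return "Confident"
    if confidence >= 0.5:
        return "Uncertain"
    return "Very uncertain"


logits = [2.0, 0.5, 0.1]  # hypothetical raw scores for three folders
probs = softmax(logits)
print(f"top confidence: {max(probs):.2f} -> {interpret(max(probs))}")
```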
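High Confidence (>90%)
The model strongly favors a single folder, and the prediction can generally be trusted.
Moderate Confidence (50-70%)
A video in this range might:
- Be a funny TikTok (overlap between categories)
- Have features of both categories
- Be mislabeled or genuinely ambiguous
Use `--threshold` to skip auto-sorting these videos.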
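Confused Predictions (Close Split)
When the top two predictions have nearly equal confidence, likely causes are:
- The video genuinely spans two categories (for example, cooking while traveling)
- Weak category definition
- Insufficient training data for this edge case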
Predictions Output File
All predictions are saved to `artifacts/predictions.json` for review:
- Audit predictions before moving files
- Find low-confidence videos for manual review
- Analyze which categories the model confuses
Finding Low-Confidence Predictions
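A short script can list the videos whose best prediction falls below a chosen confidence. This sketch assumes a hypothetical record layout (a `video` name plus a `predictions` list of `folder`/`confidence` entries); the actual keys in `artifacts/predictions.json` may differ, so adapt them after inspecting the file.

```python
def low_confidence(records, threshold=0.5):
    """Return (video, folder, confidence) for records whose top prediction is below threshold."""
    flagged = []
    for record in records:
        best = max(record["predictions"], key=lambda p: p["confidence"])
        if best["confidence"] < threshold:
            flagged.append((record["video"], best["folder"], best["confidence"]))
    # Least confident first, so the most doubtful videos are reviewed first.
    return sorted(flagged, key=lambda item: item[2])


# Hypothetical records in the assumed layout. In practice, load the real file with
# json.load() and adjust the keys to match its schema.
sample = [
    {"video": "a.mp4", "predictions": [{"folder": "Cooking", "confidence": 0.92}]},
    {"video": "b.mp4", "predictions": [{"folder": "Travel", "confidence": 0.41}]},
    {"video": "c.mp4", "predictions": [{"folder": "Comedy", "confidence": 0.33}]},
]

for video, folder, confidence in low_confidence(sample):
    print(f"{video}: {folder} ({confidence:.0%})")
```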
Finding Confused Categories
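One way to see which categories the model mixes up is to count how often two folders appear as a close top-two split. The sketch below again assumes a hypothetical record layout (`video`, plus `predictions` with `folder` and `confidence`), and the `max_margin` of 0.15 is an arbitrary illustrative cutoff, not a value used by `predict.py`.

```python
from collections import Counter


def confused_pairs(records, max_margin=0.15):
    """Count folder pairs whose top two confidences are within max_margin of each other."""
    pairs = Counter()
    for record in records:
        ranked = sorted(
            record["predictions"], key=lambda p: p["confidence"], reverse=True
        )
        if len(ranked) < 2:
            continue
        first, second = ranked[0], ranked[1]
        if first["confidence"] - second["confidence"] <= max_margin:
            # Sort the names so (A, B) and (B, A) count as the same pair.
            pairs[tuple(sorted((first["folder"], second["folder"])))] += 1
    return pairs.most_common()


# Hypothetical records in the assumed layout.
sample = [
    {"video": "a.mp4", "predictions": [
        {"folder": "Cooking", "confidence": 0.48},
        {"folder": "Travel", "confidence": 0.45},
    ]},
    {"video": "b.mp4", "predictions": [
        {"folder": "Travel", "confidence": 0.50},
        {"folder": "Cooking", "confidence": 0.42},
    ]},
    {"video": "c.mp4", "predictions": [
        {"folder": "Comedy", "confidence": 0.90},
        {"folder": "Cooking", "confidence": 0.06},
    ]},
]

for pair, count in confused_pairs(sample):
    print(f"{pair[0]} vs {pair[1]}: {count} close splits")
```

Pairs that show up repeatedly are candidates for clearer category definitions or more training examples.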
Confidence Threshold Strategy
Conservative Strategy (High Precision)
- High precision (few errors)
- Many videos skipped (low recall)
- Manual labeling required for uncertain cases
Balanced Strategy
- Good precision (~85-90%)
- High recall (~70-80% sorted)
- Occasional errors on ambiguous videos
Aggressive Strategy (High Recall)
- Lower precision (~80-90%)
- 100% recall (all videos sorted)
- Requires post-sorting review
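The trade-off between these strategies can be previewed by counting how many videos each threshold would auto-sort. This is a sketch with made-up confidences: it assumes a video is sorted when its top confidence is greater than or equal to the threshold (the exact comparison in `predict.py` is not confirmed here), and mapping 0.8, 0.5, and 0.0 to the three strategies follows the `--threshold` examples earlier on this page. It measures only the share of videos sorted; estimating precision would require ground-truth labels.

```python
def coverage(confidences, threshold):
    """Fraction of videos whose top confidence meets the threshold (and would be auto-sorted)."""
    if not confidences:
        return 0.0
    sorted_count = sum(1 for value in confidences if value >= threshold)
    return sorted_count / len(confidences)


# Hypothetical top-1 confidences for ten unsorted videos.
confidences = [0.97, 0.91, 0.85, 0.78, 0.66, 0.59, 0.52, 0.44, 0.38, 0.21]

for name, threshold in [("conservative", 0.8), ("balanced", 0.5), ("aggressive", 0.0)]:
    share = coverage(confidences, threshold)
    print(f"{name:>12} (threshold {threshold}): {share:.0%} auto-sorted")
```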
Active Learning Workflow
- Run an initial sort with a high threshold
- Review skipped videos (low confidence)
- Manually label uncertain videos via the labeling interface
- Re-extract features and retrain with the new labels
- Repeat until all videos are sorted or accuracy plateaus
Each iteration improves the model by teaching it edge cases. After 2-3 cycles, you typically reach 90%+ accuracy.
Troubleshooting
FileNotFoundError: artifacts/unlabeled_embeddings.pt
No unlabeled videos were found during feature extraction. This happens when all videos are already in category folders.
Solution: Move some videos to the `data/Favorites/videos/` root, then re-run feature extraction so that `artifacts/unlabeled_embeddings.pt` is created.
All predictions go to one category
The model is biased toward the majority class. Possible causes:
- Severe class imbalance: One category has 90%+ of the data
- Weak features: Categories aren’t visually/audibly distinct
- Training issue: Class weighting didn’t work
Solutions:
- Balance your dataset (add more examples to minority classes)
- Verify categories have distinct content
- Check training metrics for signs of mode collapse
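To confirm this symptom before retraining, you can tally the top predicted folder across all videos. The helper below is a hypothetical sketch: `dominant_folder` is not part of the project, and the 0.9 cutoff is an illustrative assumption for what counts as "one category".

```python
from collections import Counter


def dominant_folder(top_folders, max_share=0.9):
    """Return (folder, share) if one folder receives more than max_share of predictions, else None."""
    if not top_folders:
        return None
    folder, count = Counter(top_folders).most_common(1)[0]
    share = count / len(top_folders)
    return (folder, share) if share > max_share else None


# Hypothetical top-1 predictions for twenty videos.
skewed = ["Comedy"] * 19 + ["Travel"]
result = dominant_folder(skewed)
if result is not None:
    print(f"{result[0]} receives {result[1]:.0%} of predictions")
```

The same tally applied to the training labels would reveal whether the imbalance originates in the dataset itself.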
Video already exists in target folder
The video was already sorted (manually or by a previous prediction run).
This is not an error: the script skips the file to avoid overwriting. If you want to re-sort:
Predictions seem random (all ~same confidence)
Model hasn’t learned meaningful patterns. Check:
- Training accuracy: Was it >60%? If not, model didn’t learn
- Data quality: Are videos correctly labeled?
- Feature extraction: Did it complete successfully?
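One way to quantify "all roughly the same confidence" is the normalized entropy of each probability distribution, where 1.0 means perfectly uniform and values near 0 mean one folder dominates. This is a diagnostic sketch with hypothetical sample values, not something `predict.py` is known to report.

```python
import math


def normalized_entropy(probabilities):
    """Entropy of a distribution divided by its maximum, so 1.0 means perfectly uniform."""
    if len(probabilities) < 2:
        return 0.0
    entropy = -sum(p * math.log(p) for p in probabilities if p > 0)
    return entropy / math.log(len(probabilities))


# Hypothetical distributions over four folders.
flat = [0.25, 0.25, 0.25, 0.25]    # model cannot tell the folders apart
peaked = [0.97, 0.01, 0.01, 0.01]  # model strongly prefers one folder

print(f"flat:   {normalized_entropy(flat):.2f}")
print(f"peaked: {normalized_entropy(peaked):.2f}")
```

If most videos score close to 1.0, the model is effectively guessing, which points back to the training accuracy, labeling, and feature extraction checks above.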
Batch Processing Large Collections
For 500+ unlabeled videos, process in batches so you can review results incrementally.
Next Steps
Labeling Interface
Use the interactive web UI to review predictions and manually label videos