Overview
The TikTok Auto Collection Sorter compares three model types during training and selects the best performer via cross-validation:
k-Nearest Neighbors (k-NN): Non-parametric baseline
Logistic Regression: Linear classifier with L2 regularization
Multi-Layer Perceptron (MLP): Neural network with two hidden layers
This guide covers when to use each model, how to modify the MLP architecture, and how to add custom models.
Model Comparison
k-Nearest Neighbors
How it works (train.py:146-152):
k = min(5, len(X_train) - 1)
knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
knn.fit(X_train, y_train)
knn_preds = knn.predict(X_val)
Characteristics:
No optimization step (fitting only stores the training data)
Uses cosine distance between embeddings
k=5 neighbors by default, capped at len(X_train) - 1 for very small folds
When to use:
Best for small datasets (<100 samples)
When classes have tight, well-separated clusters
When you want instant “training” (no optimization step)
Limitations:
Slow inference on large datasets (compares against all training data)
No learned decision boundaries
Sensitive to noisy features
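The cap on k matters mainly for very small folds. A minimal sketch on synthetic data (the arrays below are invented stand-ins for real embeddings, not project data) runs the same calls end to end:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Hypothetical embeddings: two classes pointing in different directions.
class_a = rng.normal(loc=[1.0, 0.0, 0.0], scale=0.05, size=(2, 3))
class_b = rng.normal(loc=[0.0, 1.0, 0.0], scale=0.05, size=(2, 3))
X_train = np.vstack([class_a, class_b])
y_train = np.array([0, 0, 1, 1])

# Same cap as in the excerpt: with 4 training samples, k drops from 5 to 3.
k = min(5, len(X_train) - 1)
knn = KNeighborsClassifier(n_neighbors=k, metric="cosine")
knn.fit(X_train, y_train)

# Two queries, one near each class direction.
X_val = np.array([[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])
knn_preds = knn.predict(X_val)
print(k, knn_preds)
```

With only four training samples, each prediction is a majority vote over three neighbors, so a single mislabeled sample can flip the result. That is one way the sensitivity to noise listed above shows up in practice.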
Logistic Regression
How it works (train.py:155-160):
lr = LogisticRegression(max_iter=1000, C=1.0, class_weight="balanced")
lr.fit(X_train, y_train)
lr_preds = lr.predict(X_val)
Characteristics:
Linear decision boundaries
L2 regularization (C is the inverse of regularization strength, so smaller C means stronger regularization)
Built-in class balancing via class_weight="balanced"
When to use:
When classes are linearly separable
For interpretability (can inspect feature weights)
When you need fast, reliable inference
Limitations:
Cannot learn non-linear patterns
May underfit complex relationships
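To make the interpretability point concrete, here is a small sketch of inspecting the learned weights. The data is synthetic and the 8-dimensional features are an assumption chosen for readability; real embeddings are much wider.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: 3 classes, 20 samples each, 8 features.
X = rng.normal(size=(60, 8))
y = np.repeat([0, 1, 2], 20)
X[y == 1, 0] += 3.0  # class 1 stands out along feature 0
X[y == 2, 1] += 3.0  # class 2 stands out along feature 1

lr = LogisticRegression(max_iter=1000, C=1.0, class_weight="balanced")
lr.fit(X, y)

# coef_ holds one row of feature weights per class.
print(lr.coef_.shape)  # (3, 8)

# Largest-magnitude feature for each class.
top_feature_per_class = np.abs(lr.coef_).argmax(axis=1)
print(top_feature_per_class)
```

On high-dimensional embeddings the individual weights are harder to read, but the same `coef_` array can still be ranked or summed over groups of dimensions.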
Multi-Layer Perceptron (MLP)
Architecture (train.py:31-45):
class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),         # 1024 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),   # 256 → 128
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),  # 128 → N
        )

    def forward(self, x):
        return self.net(x)
Characteristics:
Two hidden layers (256 → 128 neurons)
ReLU activations
Dropout regularization (0.3 and 0.2)
Adam optimizer with weight decay
When to use:
When classes have non-linear decision boundaries
With sufficient training data (>50 samples per class)
When logistic regression underfits
Limitations:
Requires more data than linear models
Slower training than k-NN or logistic regression
Risk of overfitting on very small datasets
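A quick way to see why the MLP needs more data is to count its trainable parameters. The sketch below repeats the class definition so it runs on its own, and assumes 1024-dimensional inputs (as in the inline comments) and 4 classes (an arbitrary choice for illustration).

```python
import torch
import torch.nn as nn


class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)


model = MLP(input_dim=1024, num_classes=4)
model.eval()  # disable dropout for a deterministic forward pass

with torch.no_grad():
    logits = model(torch.randn(8, 1024))  # batch of 8 fake embeddings

n_params = sum(p.numel() for p in model.parameters())
print(tuple(logits.shape))  # (8, 4)
print(n_params)             # 295812
```

With roughly 296k parameters and often only a few hundred labeled samples, dropout and weight decay carry most of the burden of preventing overfitting.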
Modifying MLP Hyperparameters
Hidden Layer Size
Increase capacity for complex datasets:
class MLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=512):  # Was 256
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),         # 1024 → 512
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),   # 512 → 256
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),  # 256 → N
        )
Larger networks require more training data. If you have <200 labeled samples, stick with hidden_dim=256 or smaller to avoid overfitting.
Dropout Rates
Reduce overfitting by increasing dropout:
nn.Dropout(0.5),  # Was 0.3: more aggressive regularization
Or decrease it when the model is underfitting:
nn.Dropout(0.1),  # Was 0.3: less regularization
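Rather than editing the class by hand for each experiment, one option is to expose the layer width and dropout rates as constructor arguments. The sketch below is an assumption about how that could look; `ConfigurableMLP` and its `dropouts` parameter are hypothetical names, not part of train.py.

```python
import torch
import torch.nn as nn


class ConfigurableMLP(nn.Module):
    """Hypothetical variant of MLP with tunable width and dropout."""

    def __init__(self, input_dim, num_classes, hidden_dim=256, dropouts=(0.3, 0.2)):
        super().__init__()
        drop1, drop2 = dropouts
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(drop1),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(drop2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        return self.net(x)


# Wider network with more aggressive dropout.
model = ConfigurableMLP(1024, 5, hidden_dim=512, dropouts=(0.5, 0.3))
out = model(torch.randn(4, 1024))
print(tuple(out.shape))
```

The defaults reproduce the original architecture, so existing call sites that pass only `input_dim` and `num_classes` would behave as before.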
Learning Rate and Optimizer
Modify the train_mlp function (train.py:48-51):
def train_mlp(X_train, y_train, X_val, y_val, num_classes, device,
              epochs=100, lr=5e-4):  # Was 1e-3
    # input_dim is not defined in this excerpt; it is presumably the
    # feature width, e.g. X_train.shape[1]
    model = MLP(input_dim, num_classes).to(device)
    optimizer = optim.Adam(model.parameters(), lr=lr, weight_decay=1e-3)  # Was 1e-4
Guidelines:
Lower learning rate (5e-4) for more stable training
Higher weight decay (1e-3) for stronger L2 regularization
More epochs (200) if validation accuracy is still improving when training ends
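The body of train_mlp is not shown in this guide. As a rough sketch of where these three settings enter a PyTorch training loop, the example below assumes a cross-entropy loss and full-batch updates on a synthetic problem; the small `nn.Sequential` model is a stand-in, not the project's MLP.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy task: the label is the index of the largest of the first 3 features.
X = torch.randn(120, 16)
y = X[:, :3].argmax(dim=1)

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
criterion = nn.CrossEntropyLoss()  # assumed loss; train.py may differ

# lr and weight_decay are the two optimizer settings discussed above.
optimizer = optim.Adam(model.parameters(), lr=5e-4, weight_decay=1e-3)

losses = []
for epoch in range(200):  # the "more epochs" setting
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(f"first loss: {losses[0]:.3f}, last loss: {losses[-1]:.3f}")
```

Logging the loss per epoch like this makes it easier to judge whether a run ended while the model was still improving.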
Batch Size
Change in train.py:64:
loader = DataLoader(train_ds, batch_size=64, shuffle=True)  # Was 32
Larger batches (64) → more stable gradients, fewer updates per epoch, faster training
Smaller batches (16) → noisier gradients, which can improve generalization (useful for small datasets)
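Batch size also determines how many gradient updates happen per epoch. A small sketch, assuming a dataset of 100 samples with 1024-dimensional features (both numbers are illustrative):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 100 fake embeddings with labels from 4 classes.
train_ds = TensorDataset(torch.randn(100, 1024), torch.randint(0, 4, (100,)))

# len(loader) is the number of batches, i.e. optimizer steps per epoch.
steps_per_epoch = {
    bs: len(DataLoader(train_ds, batch_size=bs, shuffle=True))
    for bs in (16, 32, 64)
}
print(steps_per_epoch)  # {16: 7, 32: 4, 64: 2}
```

On a dataset this small, moving from batch size 16 to 64 cuts the number of updates per epoch from 7 to 2, so the epoch count may need to rise to compensate.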
Adding a Third Hidden Layer
For very complex classification tasks:
class DeepMLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),         # 1024 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim),        # 256 → 256
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),   # 256 → 128
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),  # 128 → N
        )

    def forward(self, x):
        return self.net(x)
Replace the MLP class in both train.py and predict.py with DeepMLP.
Deeper networks need significantly more data. Only use 3+ hidden layers if you have >500 labeled samples.
Custom Model: Attention-Based MLP
Add an attention mechanism to weight feature importance:
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionMLP(nn.Module):
    def __init__(self, input_dim, num_classes, hidden_dim=256):
        super().__init__()
        # Attention layer
        self.attention = nn.Sequential(
            nn.Linear(input_dim, input_dim),
            nn.Tanh(),
            nn.Linear(input_dim, input_dim),
            nn.Softmax(dim=1)
        )
        # Main network
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, num_classes),
        )

    def forward(self, x):
        # Compute attention weights
        attn_weights = self.attention(x)
        # Apply attention to input features
        x_attended = x * attn_weights
        # Pass through main network
        return self.net(x_attended)
This model learns a per-feature weighting of the input embedding, so it can emphasize the dimensions (for example, visual vs. audio) that matter most for classification.
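Two properties of the attention block are worth checking before training. The sketch below rebuilds only the attention sub-network, assuming 1024-dimensional inputs, and inspects its output and size:

```python
import torch
import torch.nn as nn

input_dim = 1024  # assumed embedding width

# Same layers as self.attention in AttentionMLP.
attention = nn.Sequential(
    nn.Linear(input_dim, input_dim),
    nn.Tanh(),
    nn.Linear(input_dim, input_dim),
    nn.Softmax(dim=1),
)

x = torch.randn(4, input_dim)  # batch of 4 fake embeddings
with torch.no_grad():
    weights = attention(x)

# Softmax over dim=1 makes each row of weights sum to 1,
# so the average weight is 1 / input_dim.
row_sums = weights.sum(dim=1)
mean_weight = weights.mean().item()

n_attn_params = sum(p.numel() for p in attention.parameters())
print(row_sums, mean_weight, n_attn_params)
```

Because the weights sum to 1 across 1024 features, `x * attn_weights` shrinks the input by about three orders of magnitude on average. The block also adds 2,099,200 parameters, which makes the data requirements above even stricter. If training stalls, a narrower bottleneck or a sigmoid gate may be worth trying; both are untested suggestions rather than part of the project.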
Integrating Custom Models
Add model class to train.py
Update training loop in main() function:
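# After line 166 in train.py, add:
# 4. Custom Attention MLP
# train_custom_mlp is not defined in this guide; it is assumed to mirror
# train_mlp but build an AttentionMLP. results["attention_mlp"] and
# all_preds["attention_mlp"] must be initialized alongside the other models.
attn_model, attn_acc = train_custom_mlp(
    X_train, y_train, X_val, y_val, num_classes, device
)
attn_model.eval()  # disable dropout before evaluating
with torch.no_grad():
    attn_preds = attn_model(torch.FloatTensor(X_val).to(device)).argmax(dim=1).cpu().numpy()
results["attention_mlp"].append((attn_preds == y_val).mean())
all_preds["attention_mlp"][val_idx] = attn_preds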
Update prediction script (predict.py) to handle new model type
Update model config to save model type metadata
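The last two steps can be handled together by storing a model type string in the saved config and looking it up at prediction time. The sketch below is one possible shape for this; `MODEL_REGISTRY`, `build_model`, and the config keys are hypothetical, and the classes are plain stand-ins so the example runs without PyTorch.

```python
import json


# Stand-ins for the nn.Module classes defined in train.py.
class MLP:
    def __init__(self, input_dim, num_classes):
        self.input_dim = input_dim
        self.num_classes = num_classes


class AttentionMLP(MLP):
    pass


# Hypothetical registry: maps the name stored in the config to a class.
MODEL_REGISTRY = {"mlp": MLP, "attention_mlp": AttentionMLP}


def build_model(config):
    """Instantiate the architecture named in a saved config dict."""
    model_type = config["model_type"]
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"Unknown model_type: {model_type!r}")
    return MODEL_REGISTRY[model_type](config["input_dim"], config["num_classes"])


# Training side: write the metadata alongside the weights.
saved = json.dumps({"model_type": "attention_mlp", "input_dim": 1024, "num_classes": 6})

# Prediction side: rebuild the same architecture before loading weights.
model = build_model(json.loads(saved))
print(type(model).__name__)  # AttentionMLP
```

Failing loudly on an unknown `model_type` avoids silently loading weights into the wrong architecture.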
Cross-Validation Strategy
The system uses Stratified K-Fold to ensure balanced folds (train.py:136):
skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
This keeps each fold's class proportions close to those of the full dataset. Custom models added to the training loop use the same folds automatically.
Key parameters:
n_splits: Adjusted based on smallest class size (min 2, max 5)
shuffle=True: Randomizes data before splitting
random_state=42: Ensures reproducibility
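The exact rule for choosing n_splits is not shown here. A minimal sketch, assuming it clamps the smallest class count to the range 2 to 5, with made-up labels:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: class sizes 12, 6, and 3.
y = np.array([0] * 12 + [1] * 6 + [2] * 3)
X = np.zeros((len(y), 4))  # features are irrelevant to how folds are built

# Assumed rule: one split per sample of the rarest class, clamped to [2, 5].
smallest_class = int(np.bincount(y).min())
n_splits = max(2, min(5, smallest_class))

skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)

# Per-class counts in each validation fold.
fold_counts = [
    np.bincount(y[val_idx], minlength=3).tolist()
    for _, val_idx in skf.split(X, y)
]
print(n_splits, fold_counts)  # 3 [[4, 2, 1], [4, 2, 1], [4, 2, 1]]
```

Every validation fold contains at least one sample of the rarest class, which is the reason for tying n_splits to the smallest class size.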
Hyperparameter Tuning Example
A systematic grid search for the best MLP configuration:
import itertools
import numpy as np

# Assumes X, y, skf, num_classes, device, and train_mlp are already
# defined as in train.py.

# Define hyperparameter grid
hidden_dims = [128, 256, 512]
dropout_rates = [(0.2, 0.1), (0.3, 0.2), (0.4, 0.3)]
learning_rates = [1e-4, 5e-4, 1e-3]

best_acc = 0
best_config = None

for hidden_dim, (drop1, drop2), lr in itertools.product(
    hidden_dims, dropout_rates, learning_rates
):
    print(f"\nTesting: hidden={hidden_dim}, dropout=({drop1}, {drop2}), lr={lr}")

    # NOTE: train_mlp as shown in this guide only accepts lr, so hidden_dim
    # and the dropout rates have no effect until MLP.__init__ and train_mlp
    # are extended to take them as arguments.

    # Run cross-validation
    cv_results = []
    for train_idx, val_idx in skf.split(X, y):
        X_train, X_val = X[train_idx], X[val_idx]
        y_train, y_val = y[train_idx], y[val_idx]
        model, acc = train_mlp(X_train, y_train, X_val, y_val,
                               num_classes, device, lr=lr)
        cv_results.append(acc)

    mean_acc = np.mean(cv_results)
    if mean_acc > best_acc:
        best_acc = mean_acc
        best_config = (hidden_dim, (drop1, drop2), lr)
    print(f"Mean CV accuracy: {mean_acc:.1%}")

print(f"\nBest config: {best_config} with {best_acc:.1%} accuracy")
Hyperparameter tuning requires many training runs. The grid above has 3 × 3 × 3 = 27 configurations, and each configuration multiplied by K folds can take 10-20 minutes on CPU. Consider using a GPU or reducing the search space.
Model Selection Insights
From train.py:176-179, the system automatically picks the best model:
mean_accs = {name: np.mean(accs) for name, accs in results.items()}
best_name = max(mean_accs, key=mean_accs.get)
print(f"\nBest model: {best_name} ({mean_accs[best_name]:.1%})")
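To see the selection logic in isolation, here is the same code run on invented per-fold accuracies. The dictionary keys and numbers are illustrative only.

```python
import numpy as np

# Hypothetical per-fold validation accuracies for three models.
results = {
    "knn": [0.80, 0.75, 0.85],
    "logreg": [0.90, 0.85, 0.80],
    "mlp": [0.85, 0.90, 0.95],
}

mean_accs = {name: np.mean(accs) for name, accs in results.items()}

# max() returns the first key with the highest mean, so ties go to
# whichever model was added to results first.
best_name = max(mean_accs, key=mean_accs.get)
print(f"\nBest model: {best_name} ({mean_accs[best_name]:.1%})")
```

Because only the mean is compared, a model that wins by a fraction of a percent on a few folds is treated the same as a clear winner. Looking at the per-fold spread as well can help when the means are close.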
Typical outcomes:
k-NN wins: Very small dataset (<50 samples) or highly clustered embeddings
Logistic Regression wins: Linearly separable classes, medium dataset (50-200 samples)
MLP wins: Complex boundaries, sufficient data (>200 samples), multimodal signals
If all models perform poorly (<70% accuracy):
Check feature quality: Visualize embeddings with t-SNE/UMAP
Verify labels: Ensure folder assignments are consistent
Increase data: Collect more labeled samples per class
Adjust class weights: See Class Imbalance
Try different architectures: Add/remove layers, change activations
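For the first check, a 2D projection can be computed with scikit-learn's t-SNE. The sketch below uses synthetic 32-dimensional clusters in place of real embeddings; the perplexity value is an assumption and must stay below the number of samples.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical embeddings: 3 classes, 20 samples each, 32 dimensions.
centers = rng.normal(size=(3, 32)) * 5.0
labels = np.repeat([0, 1, 2], 20)
X = centers[labels] + rng.normal(size=(60, 32))

# Project to 2D; a fixed random_state keeps the layout repeatable.
coords = TSNE(
    n_components=2, perplexity=10, init="pca", random_state=0
).fit_transform(X)

print(coords.shape)  # (60, 2)
```

Plotting `coords` colored by `labels` (for example with a scatter plot) shows whether classes form distinct groups. If they overlap heavily, the features may be limiting accuracy more than the choice of model.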
Class Imbalance: Handle skewed class distributions
Active Learning: Efficiently collect training data