Overview
All evaluation metrics in bun-scikit follow consistent patterns:

- Input validation - Ensures arrays are non-empty and properly shaped
- Sample weighting - Optional weights to give different importance to samples
- Multioutput support - Handle multiple target variables (regression)
- Efficient computation - Optimized implementations
Regression Metrics
Regression metrics measure how well your model predicts continuous values.

Mean Squared Error (MSE)
The average squared difference between predictions and actual values.

MSE = (1/n) * Σ(yᵢ - ŷᵢ)²
Lower MSE is better. MSE = 0 means perfect predictions.
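The formula maps directly to code. The sketch below is illustrative, not bun-scikit's actual implementation; it also shows the input-validation and optional sample-weight patterns described in the overview:

```typescript
// Illustrative MSE sketch (not bun-scikit's actual implementation).
// When sampleWeight is given, each squared error is weighted and the
// result is normalized by the total weight instead of n.
function meanSquaredError(
  yTrue: number[],
  yPred: number[],
  sampleWeight?: number[],
): number {
  if (yTrue.length === 0 || yTrue.length !== yPred.length) {
    throw new Error("inputs must be non-empty arrays of equal length");
  }
  const w = sampleWeight ?? yTrue.map(() => 1);
  let num = 0;
  let den = 0;
  for (let i = 0; i < yTrue.length; i++) {
    num += w[i] * (yTrue[i] - yPred[i]) ** 2;
    den += w[i];
  }
  return num / den;
}
```

With unit weights this reduces to the plain mean of squared errors.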
Mean Absolute Error (MAE)
The average absolute difference between predictions and actual values.

MAE = (1/n) * Σ|yᵢ - ŷᵢ|
MAE is less sensitive to outliers than MSE.
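A minimal sketch of the formula above (illustrative, not bun-scikit's implementation):

```typescript
// Illustrative MAE sketch: mean of |yTrue - yPred|.
// Errors enter linearly, so one large outlier shifts the result far
// less than it would shift MSE, where it enters squared.
function meanAbsoluteError(yTrue: number[], yPred: number[]): number {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    sum += Math.abs(yTrue[i] - yPred[i]);
  }
  return sum / yTrue.length;
}
```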
R² Score (Coefficient of Determination)
Measures the proportion of variance in the target variable that’s explained by the model.

- R² = 1.0 - Perfect predictions
- R² = 0.0 - Model performs as well as predicting the mean
- R² < 0.0 - Model performs worse than predicting the mean
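The three cases above follow from R² = 1 - SS_res / SS_tot. A hedged sketch (not bun-scikit's implementation):

```typescript
// Illustrative R² sketch: 1 - (residual sum of squares / total sum of squares).
// Predicting the mean gives ssRes === ssTot, hence R² = 0; doing worse
// than the mean makes ssRes > ssTot, hence R² < 0.
function r2Score(yTrue: number[], yPred: number[]): number {
  const mean = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < yTrue.length; i++) {
    ssRes += (yTrue[i] - yPred[i]) ** 2;
    ssTot += (yTrue[i] - mean) ** 2;
  }
  return 1 - ssRes / ssTot;
}
```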
Other Regression Metrics
- MAPE
- Explained Variance
Mean Absolute Percentage Error measures error as a percentage. It is useful for understanding error relative to the scale of the target.
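A sketch of MAPE (illustrative, not bun-scikit's implementation). Note that the metric is undefined when any true value is zero:

```typescript
// Illustrative MAPE sketch: mean of |(yTrue - yPred) / yTrue|.
// Returned as a fraction; multiply by 100 for a percentage.
// Undefined when any yTrue[i] is 0 (division by zero).
function meanAbsolutePercentageError(yTrue: number[], yPred: number[]): number {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    sum += Math.abs((yTrue[i] - yPred[i]) / yTrue[i]);
  }
  return sum / yTrue.length;
}
```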
Sample Weights
Sample weights give different importance to different samples.

Classification Metrics
Classification metrics evaluate how well your model predicts discrete labels.

Accuracy Score
The fraction of correctly classified samples.

Accuracy can be misleading for imbalanced datasets. Consider precision, recall, and F1 score instead.
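As a sketch (not bun-scikit's implementation), accuracy is just the matching fraction:

```typescript
// Illustrative accuracy sketch: count of exact label matches over n.
// On a 99%-negative dataset, always predicting "negative" scores 0.99,
// which is why accuracy alone can mislead on imbalanced data.
function accuracyScore(yTrue: number[], yPred: number[]): number {
  let correct = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yTrue[i] === yPred[i]) correct++;
  }
  return correct / yTrue.length;
}
```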
Precision, Recall, and F1 Score
These metrics provide deeper insight into classification performance.

Confusion Matrix
Visualize the performance of a classification model:

- Diagonal - Correct predictions
- Off-diagonal - Misclassifications
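For the binary case, the confusion matrix cells and the precision/recall/F1 derived from them can be sketched together (illustrative, not bun-scikit's API):

```typescript
// Illustrative binary sketch: tally the four confusion-matrix cells,
// then derive the metrics from them.
//   precision = tp / (tp + fp)  — of predicted positives, how many are real
//   recall    = tp / (tp + fn)  — of real positives, how many were found
//   f1        = harmonic mean of precision and recall
function binaryMetrics(yTrue: number[], yPred: number[]) {
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (let i = 0; i < yTrue.length; i++) {
    if (yPred[i] === 1) {
      yTrue[i] === 1 ? tp++ : fp++;
    } else {
      yTrue[i] === 1 ? fn++ : tn++;
    }
  }
  const precision = tp / (tp + fp);
  const recall = tp / (tp + fn);
  const f1 = (2 * precision * recall) / (precision + recall);
  return { tp, fp, fn, tn, precision, recall, f1 };
}
```

Here tp and tn are the diagonal (correct predictions); fp and fn are the off-diagonal misclassifications.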
Classification Report
Get a comprehensive summary of all metrics.

Probability-Based Metrics
- Log Loss
- Brier Score
- ROC AUC
Log loss measures the performance of probability predictions. Lower log loss indicates better probability estimates.
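A sketch of binary log loss (illustrative, not bun-scikit's implementation). Probabilities are clipped away from 0 and 1 so the logarithm stays finite:

```typescript
// Illustrative binary log loss sketch:
//   mean of -log(p) for true positives, -log(1 - p) for true negatives.
// Confident wrong predictions (p near 0 for a true 1) are punished heavily.
function logLoss(yTrue: number[], probs: number[], eps = 1e-15): number {
  let sum = 0;
  for (let i = 0; i < yTrue.length; i++) {
    const p = Math.min(Math.max(probs[i], eps), 1 - eps); // avoid log(0)
    sum += yTrue[i] === 1 ? -Math.log(p) : -Math.log(1 - p);
  }
  return sum / yTrue.length;
}
```

A model that always predicts 0.5 scores ln 2 ≈ 0.693, a common baseline for balanced binary problems.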
Advanced Classification Metrics
Clustering Metrics
Evaluate unsupervised clustering algorithms:

- Silhouette Score
- Calinski-Harabasz
- Davies-Bouldin
The Silhouette Score measures how similar objects are to their own cluster vs. other clusters.
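Per sample, the silhouette coefficient is s = (b - a) / max(a, b), where a is the mean distance to the sample's own cluster and b is the mean distance to the nearest other cluster; the score is the mean of s over all samples. A sketch (illustrative, not bun-scikit's implementation):

```typescript
// Illustrative silhouette sketch for Euclidean distance.
// a = mean distance to other members of the sample's own cluster;
// b = smallest mean distance to any other cluster;
// s = (b - a) / max(a, b), averaged over all samples.
function silhouetteScore(X: number[][], labels: number[]): number {
  const dist = (p: number[], q: number[]) =>
    Math.sqrt(p.reduce((s, v, k) => s + (v - q[k]) ** 2, 0));
  let total = 0;
  for (let i = 0; i < X.length; i++) {
    let aSum = 0;
    let own = 0;
    const others = new Map<number, { sum: number; n: number }>();
    for (let j = 0; j < X.length; j++) {
      if (j === i) continue;
      const d = dist(X[i], X[j]);
      if (labels[j] === labels[i]) {
        aSum += d;
        own++;
      } else {
        const e = others.get(labels[j]) ?? { sum: 0, n: 0 };
        e.sum += d;
        e.n++;
        others.set(labels[j], e);
      }
    }
    const a = own > 0 ? aSum / own : 0;
    const b = Math.min(...[...others.values()].map((e) => e.sum / e.n));
    // Convention: a singleton cluster contributes 0.
    total += own > 0 ? (b - a) / Math.max(a, b) : 0;
  }
  return total / X.length;
}
```

Scores near +1 mean tight, well-separated clusters; near 0 means overlapping clusters; negative values suggest samples assigned to the wrong cluster.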
Model Scoring Methods
All models have a built-in score() method:
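By convention in scikit-learn-style libraries, score() wraps a default metric: regression models return R², classifiers return accuracy. The estimator below is hypothetical (a trivial mean predictor, not a bun-scikit class), shown only to illustrate that convention:

```typescript
// Hypothetical estimator illustrating the score() convention for
// regression models: score(X, yTrue) returns R² of predict(X) vs yTrue.
class MeanRegressor {
  private mean = 0;

  // Fit just memorizes the training mean.
  fit(y: number[]): this {
    this.mean = y.reduce((a, b) => a + b, 0) / y.length;
    return this;
  }

  // Predicts the memorized mean for every input row.
  predict(X: unknown[]): number[] {
    return X.map(() => this.mean);
  }

  // score() = R² of the model's predictions against yTrue.
  score(X: unknown[], yTrue: number[]): number {
    const yPred = this.predict(X);
    const m = yTrue.reduce((a, b) => a + b, 0) / yTrue.length;
    let ssRes = 0;
    let ssTot = 0;
    for (let i = 0; i < yTrue.length; i++) {
      ssRes += (yTrue[i] - yPred[i]) ** 2;
      ssTot += (yTrue[i] - m) ** 2;
    }
    return 1 - ssRes / ssTot;
  }
}
```

A mean predictor scored on its own training data lands exactly at R² = 0, matching the interpretation given earlier.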
Complete Evaluation Example
Here’s a complete workflow for evaluating a classification model:

Best Practices
Choose metrics appropriate for your task
- Regression: MSE for penalty on large errors, MAE for robustness to outliers
- Balanced classification: Accuracy is usually sufficient
- Imbalanced classification: Precision, recall, F1, or balanced accuracy
Use multiple metrics
No single metric tells the whole story. Combine metrics to get a complete picture:
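For instance, reporting MSE, MAE, and R² side by side (illustrative helpers, not bun-scikit's API) separates sensitivity to large errors from typical error size and from explained variance:

```typescript
// Illustrative multi-metric report: MSE (penalizes large errors),
// MAE (robust to outliers), and R² (fraction of variance explained).
function evaluateRegression(yTrue: number[], yPred: number[]) {
  const n = yTrue.length;
  let se = 0;
  let ae = 0;
  for (let i = 0; i < n; i++) {
    se += (yTrue[i] - yPred[i]) ** 2;
    ae += Math.abs(yTrue[i] - yPred[i]);
  }
  const mean = yTrue.reduce((a, b) => a + b, 0) / n;
  const ssTot = yTrue.reduce((s, v) => s + (v - mean) ** 2, 0);
  return { mse: se / n, mae: ae / n, r2: 1 - se / ssTot };
}
```

A large gap between MSE and MAE², for example, hints that a few big outliers dominate the squared error.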
Always evaluate on held-out test data
Never evaluate on training data - it will give overly optimistic results:
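The pattern is: split first, fit on the training portion only, then score on the untouched test portion. The splitter below is a plain deterministic sketch (no shuffling, not bun-scikit's API):

```typescript
// Illustrative hold-out split sketch: keep the last testRatio of the
// data out of training entirely. Real splitters usually shuffle first;
// this deterministic version only shows the fit-on-train,
// score-on-test discipline.
function trainTestSplit<T>(data: T[], testRatio = 0.25): [T[], T[]] {
  const cut = Math.floor(data.length * (1 - testRatio));
  return [data.slice(0, cut), data.slice(cut)];
}
```

Fit your model on the first array only; any metric computed on the second array then estimates performance on unseen data rather than memorization.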
Related Topics
- Model Training - Learn how to train models
- Pipelines - Build evaluation into your workflow
- Model Selection - Cross-validation and hyperparameter tuning