Evaluating an insider threat model requires careful attention to class imbalance and operational trade-offs. A model that predicts “benign” for every record can achieve high accuracy while catching zero threats. ThreatDetect therefore tracks multiple complementary metrics and tunes its classification threshold explicitly against recall on the validation set, accepting a modest precision cost in exchange for fewer missed threats.
Classification metrics
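The following metrics are computed with `sklearn.metrics` on the held-out test set after all training and threshold tuning is complete. Cross-validation is the exception: `cross_val_score` comes from `sklearn.model_selection` and runs on the full dataset rather than the test set.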
| Metric | Function | What it measures |
|---|---|---|
| Accuracy | `accuracy_score` | Fraction of all records correctly classified |
| Precision | `precision_score` | Fraction of threat predictions that are correct |
| Recall | `recall_score` | Fraction of actual threats that are detected |
| F1 | `f1_score` | Harmonic mean of precision and recall |
| Confusion matrix | `confusion_matrix` | Counts of true positives, false positives, true negatives, and false negatives |
| Cross-validation | `cross_val_score` | Mean F1 over k folds on the full dataset, used to check for overfitting |
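A minimal sketch of computing the table's test-set metrics with `sklearn.metrics`, assuming binary labels where 1 marks a threat and `y_pred` already reflects the tuned threshold. The helper name `evaluate`, the dictionary layout, and the toy labels are illustrative only, not part of ThreatDetect.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)


def evaluate(y_true, y_pred):
    """Return the test-set metrics from the table for binary labels (1 = threat)."""
    # labels=[0, 1] fixes the 2x2 layout even if one class never appears in y_pred.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # zero_division=0 returns 0.0 instead of warning when no threats are predicted.
        "precision": precision_score(y_true, y_pred, zero_division=0),
        "recall": recall_score(y_true, y_pred, zero_division=0),
        "f1": f1_score(y_true, y_pred, zero_division=0),
        "tp": int(tp), "fp": int(fp), "tn": int(tn), "fn": int(fn),
    }


# Toy example: 4 actual threats, of which the model catches 3 and raises 2 false alarms.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
metrics = evaluate(y_true, y_pred)
print(metrics)
```

On this toy data, precision is 3/5 and recall is 3/4, so F1 lands between them at 2/3.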
Insider threat datasets are heavily class-imbalanced — malicious records typically represent a small minority of all observations. In this context, F1 is the primary metric because it penalises both missing threats (low recall) and generating excessive false alarms (low precision). Accuracy alone is misleading on imbalanced data.
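To make the accuracy problem concrete, here is an illustration on synthetic labels (not ThreatDetect data), assuming a hypothetical test set with 1% malicious records and a degenerate model that labels everything benign.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Hypothetical test set: 990 benign records (0) and 10 malicious records (1).
y_true = np.array([0] * 990 + [1] * 10)

# A degenerate "model" that predicts benign for every record.
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)       # 990 / 1000 correct
recall = recall_score(y_true, y_pred)           # 0 of 10 threats caught
f1 = f1_score(y_true, y_pred, zero_division=0)  # no true positives, so F1 is 0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} f1={f1:.2f}")
# prints: accuracy=0.99 recall=0.00 f1=0.00
```

Accuracy looks excellent while recall and F1 expose that the model is useless for detection.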
Precision-recall curve and threshold selection
By default, a binary classifier predicts the positive class when its predicted probability exceeds 0.5. For insider threat detection, recall matters more than precision, because a missed threat is a worse outcome than an unnecessary investigation. ThreatDetect therefore selects `best_threshold` by evaluating the precision-recall curve on the validation set and choosing the threshold that best balances the two.
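The exact selection rule is not specified here, so the following is only a sketch under an assumed criterion: pick the threshold on the validation precision-recall curve that maximises F-beta. With the default `beta=1.0` this is the F1-optimal point; `beta > 1` weights recall more heavily. The helper `select_threshold` and its signature are hypothetical.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve


def select_threshold(y_val, val_proba, beta=1.0):
    """Return (threshold, F-beta) maximising F-beta on the validation set.

    Assumed criterion for illustration; not necessarily ThreatDetect's rule.
    """
    precision, recall, thresholds = precision_recall_curve(y_val, val_proba)
    # The final point (precision=1, recall=0) has no threshold; drop it to align arrays.
    precision, recall = precision[:-1], recall[:-1]
    b2 = beta ** 2
    denom = b2 * precision + recall
    # Avoid 0/0 where precision and recall are both zero.
    fbeta = np.divide(
        (1 + b2) * precision * recall,
        denom,
        out=np.zeros_like(denom),
        where=denom > 0,
    )
    best = int(np.argmax(fbeta))
    return float(thresholds[best]), float(fbeta[best])


# Toy validation data: two threats (1) and two benign records (0).
best_threshold, best_score = select_threshold(
    np.array([0, 0, 1, 1]), np.array([0.1, 0.4, 0.35, 0.8])
)
print(best_threshold, round(best_score, 3))  # 0.35 0.8
```

`precision_recall_curve` treats a record as positive when its score is `>= threshold`, so inference code should apply the stored threshold with the same comparison.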
`PrecisionRecallDisplay` is used to plot the curve and visually confirm the selected operating point. The chosen threshold is stored under the `best_threshold` key of the model package and applied at inference time, so predictions use this threshold rather than 0.5.
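Applying the stored threshold at inference might look like the sketch below. It assumes, hypothetically, that the model package is a dict holding the fitted classifier under a `"model"` key next to `"best_threshold"`; `predict_threats` is an illustrative name, and `LogisticRegression` on synthetic data stands in for the XGBoost model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def predict_threats(package, X):
    """Label records with the tuned threshold instead of the default 0.5.

    Assumes a hypothetical package layout: {"model": ..., "best_threshold": ...}.
    """
    proba = package["model"].predict_proba(X)[:, 1]  # probability of the threat class
    # >= matches the convention of sklearn.metrics.precision_recall_curve.
    labels = (proba >= package["best_threshold"]).astype(int)
    return labels, proba


# Synthetic, imbalanced stand-in data (about 5% positives).
X, y = make_classification(n_samples=500, weights=[0.95], random_state=0)
package = {"model": LogisticRegression(max_iter=1000).fit(X, y), "best_threshold": 0.3}
labels, proba = predict_threats(package, X)
print(f"flagged at 0.3: {labels.sum()}, flagged at 0.5: {(proba >= 0.5).sum()}")
```

Lowering the threshold below 0.5 can only flag the same records or more, which is how the recall-for-precision trade is realised at inference.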
Cross-validation
`cross_val_score` is run with `scoring="f1"` across k stratified folds on the full dataset after the final model is selected. This confirms that the F1 score is stable across different data splits and that the model has not overfitted to the specific train/validation/test partition used during development.
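A minimal sketch of this check on synthetic data. The fold count (k = 5), the shuffling seed, and the use of `GradientBoostingClassifier` as a stand-in for the XGBoost model are all assumptions made so the example runs with scikit-learn alone.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, imbalanced stand-in for the full dataset (about 10% positives, 26 features).
X, y = make_classification(n_samples=600, n_features=26, weights=[0.9], random_state=42)

# Stratification keeps the minority-class ratio roughly equal in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

# cross_val_score fits a fresh clone of the estimator on each training split.
scores = cross_val_score(
    GradientBoostingClassifier(random_state=42), X, y, scoring="f1", cv=cv
)
print(f"F1 per fold: {scores.round(3)}  mean={scores.mean():.3f}  std={scores.std():.3f}")
```

A small standard deviation relative to the mean suggests stability across splits. Note that `scoring="f1"` scores the estimator's own `predict` output, so it reflects the default 0.5 cut-off rather than `best_threshold`.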
SHAP global summary
ThreatDetect uses `shap.TreeExplainer` to compute SHAP values for the XGBoost model. The global summary plot ranks features by their mean absolute SHAP value across the test set, which reveals which behavioural signals the model relies on most heavily. Reviewing this plot during model validation helps confirm that the model is responding to genuinely suspicious behaviour rather than spurious correlations in the training data.
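The ranking behind the global summary plot can be reproduced from a SHAP value matrix. This sketch assumes `shap_values` is a 2-D array of shape (records, features), which is what `TreeExplainer.shap_values` typically returns for a binary XGBoost model; the helper `global_importance` is hypothetical, and the `shap` calls appear only as comments so the sketch runs with NumPy alone.

```python
import numpy as np


def global_importance(shap_values, feature_names):
    """Rank features by mean absolute SHAP value across records (largest first)."""
    mean_abs = np.abs(np.asarray(shap_values)).mean(axis=0)
    order = np.argsort(mean_abs)[::-1]
    return [(feature_names[i], float(mean_abs[i])) for i in order]


# With the real model, the matrix would come from something like:
#   explainer = shap.TreeExplainer(xgb_model)
#   shap_values = explainer.shap_values(X_test)  # shape (n_records, n_features)

# Toy matrix: 2 records x 3 features.
toy_shap = np.array([[0.1, -2.0, 0.5], [-0.3, 1.0, 0.5]])
ranking = global_importance(toy_shap, ["a", "b", "c"])
print(ranking)
```

Taking the absolute value before averaging matters: feature `b` pushes one record strongly toward "threat" and the other toward "benign", and a signed mean would hide that influence.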
The `shap_explainer` object is pre-built and stored in the model package. At inference time, SHAP values are computed per record and surfaced in the Streamlit UI to explain individual predictions.
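One way to turn a single record's SHAP values into a readable explanation is to keep the features with the largest absolute contributions. This is a sketch under that assumption; `explain_record`, the `k` cut-off, and the commented way of obtaining the row from `shap_explainer` are illustrative, not the actual Streamlit code. It also assumes label 1 means "threat".

```python
import numpy as np


def explain_record(shap_row, feature_names, k=5):
    """Return the k features with the largest |SHAP| for one record, with signed values.

    Positive values push the prediction toward the threat class (label 1);
    negative values push it toward benign.
    """
    shap_row = np.asarray(shap_row)
    top = np.argsort(np.abs(shap_row))[::-1][:k]
    return [(feature_names[i], float(shap_row[i])) for i in top]


# Hypothetically, at inference:
#   row_shap = package["shap_explainer"].shap_values(x_row)[0]
#   explain_record(row_shap, feature_names)

toy_row = [0.2, -0.9, 0.05, 0.4]
explanation = explain_record(toy_row, ["a", "b", "c", "d"], k=2)
print(explanation)  # [('b', -0.9), ('d', 0.4)]
```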
SHAP values are computed on the augmented feature matrix: the 26 behavioural and engineered features plus the Isolation Forest anomaly score column. The anomaly score often appears among the top features, which suggests that unsupervised outlier detection adds signal beyond the labelled features alone.
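A sketch of how such an augmented matrix could be built with scikit-learn's `IsolationForest`. It assumes the forest is fitted on the training split only and that `decision_function` (lower means more anomalous) supplies the score column; ThreatDetect may instead use `score_samples` or different hyperparameters, and `augment_with_anomaly_score` is a hypothetical helper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def augment_with_anomaly_score(X_train, X_other, random_state=42):
    """Append an Isolation Forest anomaly-score column to two feature matrices.

    The forest sees only training rows, so validation/test rows never influence it.
    decision_function is an assumed choice: negative values indicate outliers.
    """
    iso = IsolationForest(random_state=random_state).fit(X_train)

    def augment(X):
        return np.column_stack([X, iso.decision_function(X)])

    return augment(X_train), augment(X_other), iso


# Synthetic stand-in for the 26 behavioural and engineered features.
rng = np.random.default_rng(0)
X_train, X_test = rng.normal(size=(200, 26)), rng.normal(size=(50, 26))
X_train_aug, X_test_aug, iso = augment_with_anomaly_score(X_train, X_test)
print(X_train_aug.shape, X_test_aug.shape)  # (200, 27) (50, 27)
```

The supervised model and `TreeExplainer` then operate on the 27-column matrix, which is why the anomaly score can show up in the SHAP ranking like any other feature.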