The Organisational Search via CSV page lets you analyse an entire workforce at once. You upload a CSV containing employee behavioural data, ThreatDetect runs every record through the XGBoost model and IsolationForest scorer, and you receive a complete breakdown of risk by employee, including visualisations, SHAP explanations, and a downloadable results file.

Run a batch analysis

  1. Navigate to Organisational Search via CSV

     Open ThreatDetect in your browser and use the sidebar dropdown to select Organisational Search via CSV.

  2. Upload your CSV file

     Click Browse files (or drag and drop) in the file uploader. ThreatDetect accepts .csv files only. Once loaded, the app displays a preview of the first 10 rows and two summary metrics: Total Records and, if the employee_campus column is present, Unique Campuses.

  3. Click Run Threat Detection

     Click the Run Threat Detection button. The app processes every record by encoding categorical columns, scaling numeric columns, engineering derived features, and computing both XGBoost probabilities and IsolationForest anomaly scores. A spinner indicates that analysis is in progress.

  4. Review the results

     Once complete, the app displays an Organisational Threat Summary with four metrics, two charts, a feature importance chart, and a SHAP summary plot. See Understanding the results below for details on each output.

  5. Download the results

     Expand the Detailed Results Table section and click Download results as CSV to save threat_analysis_results.csv to your machine. This file contains all original columns plus Prediction, Risk_Prob, Anomaly_Score, and Confidence for every employee.

Required CSV columns

Your CSV must include the following columns. ThreatDetect raises an error if any are missing.
| Column | Type | Description |
| --- | --- | --- |
| employee_campus | string | Campus or office location of the employee (must match training set values) |
| total_printed_pages | numeric | Total pages printed by the employee |
| num_printed_pages_off_hours | numeric | Pages printed outside standard hours |
| total_files_burned | numeric | Number of files written to removable media |
| has_criminal_record | binary (0/1) | Whether the employee has a criminal record |
| is_contractor | binary (0/1) | Whether the employee is a contractor |
| has_foreign_citizenship | binary (0/1) | Whether the employee holds foreign citizenship |
| entry_during_weekend | binary (0/1) | Whether the employee accessed the building on weekends |
| late_exit_flag | binary (0/1) | Whether the employee regularly exits late |

Any additional columns present in your CSV beyond these are carried through to the results file unchanged.
The employee_campus column is encoded using a LabelEncoder fitted on the training dataset. Any campus value not seen during training will cause a ValueError and halt processing. Validate all campus values before uploading.
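
Both failure modes described above (a missing column and an unseen campus value) can be caught locally before uploading. The following is a minimal sketch using only the Python standard library. The column names come from the table above, but `KNOWN_CAMPUSES` and the function name `validate_csv` are hypothetical placeholders; substitute the campus values your deployment was actually trained on.

```python
import csv
import io

# Column names taken from the required-columns table.
REQUIRED_COLUMNS = [
    "employee_campus", "total_printed_pages", "num_printed_pages_off_hours",
    "total_files_burned", "has_criminal_record", "is_contractor",
    "has_foreign_citizenship", "entry_during_weekend", "late_exit_flag",
]

# ASSUMPTION: placeholder names. Replace with the campuses in your training set.
KNOWN_CAMPUSES = {"Campus A", "Campus B"}


def validate_csv(handle, known_campuses=KNOWN_CAMPUSES):
    """Return a list of problems found; an empty list means the file looks OK."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in REQUIRED_COLUMNS if c not in header]
    if "employee_campus" in header:
        # Line 1 of the file is the header, so data rows start at line 2.
        for line_number, row in enumerate(reader, start=2):
            campus = row["employee_campus"]
            if campus not in known_campuses:
                problems.append(f"row {line_number}: unknown campus {campus!r}")
    return problems


sample = io.StringIO(
    "employee_campus,total_printed_pages,num_printed_pages_off_hours,"
    "total_files_burned,has_criminal_record,is_contractor,"
    "has_foreign_citizenship,entry_during_weekend,late_exit_flag\n"
    "Campus A,120,4,0,0,1,0,0,1\n"
    "Campus Z,15,0,2,0,0,0,1,0\n"
)
print(validate_csv(sample))  # ["row 3: unknown campus 'Campus Z'"]
```

To check a real file, pass an open file handle instead of the in-memory sample, for example `validate_csv(open("employees.csv", newline=""))`, where the filename is whatever you intend to upload.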

Understanding the results

Summary metrics

After detection runs, four metrics appear at the top of the results section:
| Metric | Description |
| --- | --- |
| Total Employees | Total number of records processed |
| Malicious | Count of employees predicted as malicious, with percentage in the delta label |
| Normal | Count of employees predicted as normal, with percentage in the delta label |
| Avg. Confidence | Mean confidence score across all employees (see confidence formula) |
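
These four metrics can also be recomputed from the downloaded threat_analysis_results.csv, which is useful for reporting outside the app. The sketch below assumes only the documented Prediction and Confidence columns; the function name `summarise` and the sample rows are illustrative, not part of ThreatDetect.

```python
import csv
import io


def summarise(handle):
    """Recompute the summary metrics from a results CSV."""
    rows = list(csv.DictReader(handle))
    total = len(rows)
    malicious = sum(1 for r in rows if r["Prediction"] == "Malicious")
    normal = sum(1 for r in rows if r["Prediction"] == "Normal")

    def pct(count):
        # Guard against an empty file to avoid dividing by zero.
        return 100.0 * count / total if total else 0.0

    avg_conf = sum(float(r["Confidence"]) for r in rows) / total if total else 0.0
    return {
        "total_employees": total,
        "malicious": malicious,
        "malicious_pct": pct(malicious),
        "normal": normal,
        "normal_pct": pct(normal),
        "avg_confidence": avg_conf,
    }


# Illustrative rows only; real files also carry the original upload columns.
sample_results = io.StringIO(
    "Prediction,Risk_Prob,Anomaly_Score,Confidence\n"
    "Malicious,0.9,-0.12,0.9\n"
    "Normal,0.2,0.05,0.8\n"
    "Normal,0.1,0.08,0.9\n"
    "Normal,0.4,0.01,0.6\n"
)
print(summarise(sample_results))
```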

Charts

Threat Prediction Count: A bar chart showing how many employees were classified as Malicious versus Normal. Use this for a quick visual split.

Risk Probability Distribution: A histogram of Risk_Prob values (0–1) for all employees. A vertical dashed red line marks the model’s decision threshold. Employees to the right of the line are classified as Malicious. The distribution shape reveals whether most employees cluster far from the threshold (clear cases) or are concentrated near it (uncertain cases).

Global Feature Importance (Top 15): A horizontal bar chart of the top 15 XGBoost feature importances (F-score). These are the features that most frequently split the decision trees across the entire model, giving you a global view of what drives predictions at the organisational level.

Global SHAP Summary Plot: A SHAP beeswarm plot computed over a random sample of up to 100 records. Each dot represents one employee for one feature. The horizontal position shows the SHAP value (positive = pushes toward Malicious), and the colour shows the raw feature value (red = high, blue = low). This plot reveals how feature values relate to risk direction across the organisation.
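
The "random sample of up to 100 records" used for the SHAP summary means that small uploads are explained in full, while larger ones are subsampled. A minimal sketch of that idea follows. How ThreatDetect actually draws its sample (and whether it fixes a seed) is not documented here, so `sample_for_shap` and its parameters are assumptions for illustration.

```python
import random


def sample_for_shap(records, max_records=100, seed=None):
    """Return all records if there are few enough, else a random subset."""
    if len(records) <= max_records:
        return list(records)
    # Sampling without replacement, so no record appears twice.
    rng = random.Random(seed)
    return rng.sample(records, max_records)


small = sample_for_shap(list(range(40)))
large = sample_for_shap(list(range(250)), seed=0)
print(len(small), len(large))  # 40 100
```

Because the subset is random, the beeswarm plot for a large upload may look slightly different from run to run unless the sample is seeded.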

Organisational risk insight

Below the charts, ThreatDetect displays one of two messages:
  • Warning — if at least one employee is predicted Malicious, listing the count, percentage, and the three features with the highest global importance.
  • Success — if no employees are flagged, confirming the organisation appears clean.
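
The choice between the two messages reduces to a single condition on the Malicious count. The sketch below illustrates that branching; the message wording, the function name `risk_insight`, and the example feature names passed in are assumptions, not the app's actual strings.

```python
def risk_insight(malicious, total, top_features):
    """Return (level, message) following the warning/success rule."""
    if malicious > 0:
        pct = 100.0 * malicious / total
        # Only the three most important features are listed.
        features = ", ".join(top_features[:3])
        return (
            "warning",
            f"{malicious} of {total} employees ({pct:.1f}%) predicted Malicious. "
            f"Top features: {features}",
        )
    return ("success", "No employees were flagged as Malicious.")


level, message = risk_insight(
    3, 200,
    ["total_files_burned", "num_printed_pages_off_hours", "late_exit_flag", "is_contractor"],
)
print(level, message)
```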

Per-employee explanation

After running detection, expand Explain a specific employee (SHAP per instance) to drill into any individual record.
  1. Select an employee from the dropdown. Each option shows the employee index, their prediction, and their confidence score.
  2. The app displays three metrics: Prediction, Confidence, and Anomaly Score for that employee.
  3. A human-readable list explains the top features pushing toward Malicious (increases risk) and toward Normal (reduces risk), showing the original feature value for each.
  4. A SHAP bar chart plots the top 10 features by SHAP value. Red bars push toward Malicious; green bars push toward Normal.
Use the per-employee explainer on any record whose Risk_Prob is close to the decision threshold to check whether the prediction is well supported or borderline.
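
To find candidates for this kind of review, you can scan the downloaded results file for records whose Risk_Prob lies within a small margin of the threshold. The sketch below is one way to do that. The threshold value is not stated on this page, so `ASSUMED_THRESHOLD = 0.5` is a placeholder, as are the `margin` default and the zero-based row index; read the real threshold off the dashed line in the Risk Probability Distribution chart.

```python
import csv
import io

# ASSUMPTION: placeholder value. Use the threshold shown in your deployment.
ASSUMED_THRESHOLD = 0.5


def borderline_rows(handle, threshold=ASSUMED_THRESHOLD, margin=0.1):
    """Return (row_index, prediction, risk_prob) tuples, closest to threshold first."""
    hits = []
    for index, row in enumerate(csv.DictReader(handle)):
        risk = float(row["Risk_Prob"])
        distance = abs(risk - threshold)
        if distance <= margin:
            hits.append((distance, index, row["Prediction"], risk))
    hits.sort()  # smallest distance first
    return [(index, prediction, risk) for _, index, prediction, risk in hits]


# Illustrative rows only.
results = io.StringIO(
    "Prediction,Risk_Prob\n"
    "Malicious,0.93\n"
    "Malicious,0.55\n"
    "Normal,0.08\n"
    "Normal,0.46\n"
)
for index, prediction, risk in borderline_rows(results):
    print(index, prediction, risk)
```

With the sample rows, the records at index 3 and index 1 are returned in that order, since 0.46 is slightly closer to 0.5 than 0.55 is.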

Downloading results

The Download results as CSV button inside the Detailed Results Table expander saves a file named threat_analysis_results.csv. This file includes all original columns from your upload plus:
  • Prediction: "Malicious" or "Normal"
  • Risk_Prob: probability of being malicious (0–1)
  • Anomaly_Score: IsolationForest decision function output
  • Confidence: model certainty for the assigned class
Confidence formula: Confidence = Risk_Prob when the prediction is Malicious, and Confidence = 1 − Risk_Prob when the prediction is Normal. This means Confidence always represents how certain the model is about whichever class it chose, not the raw probability of being malicious.
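
As a worked illustration of the formula, the following sketch recomputes Confidence from Risk_Prob and Prediction. The function name `confidence` is illustrative; the rule itself is the one stated above.

```python
def confidence(risk_prob, prediction):
    """Certainty of the assigned class, given the probability of Malicious."""
    if prediction == "Malicious":
        return risk_prob
    if prediction == "Normal":
        return 1.0 - risk_prob
    raise ValueError(f"unexpected prediction label: {prediction!r}")


print(confidence(0.75, "Malicious"))  # 0.75
print(confidence(0.25, "Normal"))     # 0.75
```

Both example employees get the same Confidence even though their Risk_Prob values differ, which is why a high Confidence alone does not indicate a high-risk employee.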
