Analyse your organisation for insider threats

The Organisational Search via CSV page lets you analyse an entire workforce at once. You upload a CSV containing employee behavioural data, ThreatDetect runs every record through the XGBoost model and the IsolationForest scorer, and you receive a complete breakdown of risk by employee, including visualisations, SHAP explanations, and a downloadable results file.

Run a batch analysis

Navigate to Organisational Search via CSV

Open ThreatDetect in your browser and use the sidebar dropdown to select Organisational Search via CSV.

Upload your CSV file

Click Browse files (or drag and drop) in the file uploader. ThreatDetect accepts .csv files only. Once loaded, the app displays a preview of the first 10 rows and two summary metrics: Total Records and, if the employee_campus column is present, Unique Campuses.

Click Run Threat Detection

Click the Run Threat Detection button. The app processes every record — encoding categorical columns, scaling numeric columns, engineering derived features, and computing both XGBoost probabilities and IsolationForest anomaly scores. A spinner indicates that analysis is in progress.

Review the results

Once complete, the app displays an Organisational Threat Summary with four metrics, two charts, a feature importance chart, and a SHAP summary plot. See Understanding the results below for details on each output.

Download the results

Expand the Detailed Results Table section and click Download results as CSV to save threat_analysis_results.csv to your machine. This file contains all original columns plus Prediction, Risk_Prob, Anomaly_Score, and Confidence for every employee.

Required CSV columns

Your CSV must include the following columns. ThreatDetect raises an error if any are missing.

Column	Type	Description
`employee_campus`	string	Campus or office location of the employee (must match training set values)
`total_printed_pages`	numeric	Total pages printed by the employee
`num_printed_pages_off_hours`	numeric	Pages printed outside standard hours
`total_files_burned`	numeric	Number of files written to removable media
`has_criminal_record`	binary (0/1)	Whether the employee has a criminal record
`is_contractor`	binary (0/1)	Whether the employee is a contractor
`has_foreign_citizenship`	binary (0/1)	Whether the employee holds foreign citizenship
`entry_during_weekend`	binary (0/1)	Whether the employee accessed the building on weekends
`late_exit_flag`	binary (0/1)	Whether the employee regularly exits late

Any additional columns present in your CSV beyond these are carried through to the results file unchanged.

The employee_campus column is encoded using a LabelEncoder fitted on the training dataset. Any campus value not seen during training will cause a ValueError and halt processing. Validate all campus values before uploading.
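A pre-upload check along the following lines can catch both problems (missing columns and unseen campus values) before ThreatDetect does. This is a minimal sketch using only the Python standard library. The `validate_upload` helper and the `known_campuses` set are hypothetical: the real list of campus values has to come from the training data your deployment was fitted on.

```python
import csv
import io

# Required columns, as listed in the table above.
REQUIRED_COLUMNS = [
    "employee_campus",
    "total_printed_pages",
    "num_printed_pages_off_hours",
    "total_files_burned",
    "has_criminal_record",
    "is_contractor",
    "has_foreign_citizenship",
    "entry_during_weekend",
    "late_exit_flag",
]


def validate_upload(csv_file, known_campuses):
    """Return a list of problems found in an open CSV file (empty if none).

    `known_campuses` is an assumption: the set of campus values present in
    the training data, which you must obtain for your own deployment.
    """
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    # Row 1 is the header, so data rows start at 2.
    for row_number, row in enumerate(reader, start=2):
        campus = row["employee_campus"]
        if campus not in known_campuses:
            problems.append(f"row {row_number}: unknown campus {campus!r}")
    return problems


# Made-up data for illustration: the second record uses an unseen campus.
sample = io.StringIO(
    "employee_campus,total_printed_pages,num_printed_pages_off_hours,"
    "total_files_burned,has_criminal_record,is_contractor,"
    "has_foreign_citizenship,entry_during_weekend,late_exit_flag\n"
    "Campus A,120,4,0,0,1,0,0,1\n"
    "Campus Z,35,0,2,0,0,0,1,0\n"
)
issues = validate_upload(sample, known_campuses={"Campus A", "Campus B"})
print(issues)  # ["row 3: unknown campus 'Campus Z'"]
```

An empty list means the file passed both checks. The sketch does not verify numeric or binary value ranges.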

Understanding the results

Summary metrics

After detection runs, four metrics appear at the top of the results section:

Metric	Description
Total Employees	Total number of records processed
Malicious	Count of employees predicted as malicious, with percentage in the delta label
Normal	Count of employees predicted as normal, with percentage in the delta label
Avg. Confidence	Mean confidence score across all employees (see confidence formula)
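The same four figures can be recomputed from the downloaded results file, which is useful when checking a report against the dashboard. The sketch below relies only on the Prediction and Confidence columns described on this page. The `summarise` helper and the sample rows are illustrative and not part of ThreatDetect.

```python
import csv
import io


def summarise(results_file):
    """Recompute the four summary metrics from an open results CSV."""
    rows = list(csv.DictReader(results_file))
    total = len(rows)
    malicious = sum(1 for row in rows if row["Prediction"] == "Malicious")
    normal = total - malicious  # assumes only the two documented labels occur
    avg_confidence = (
        sum(float(row["Confidence"]) for row in rows) / total if total else 0.0
    )
    return {
        "total_employees": total,
        "malicious": malicious,
        "malicious_pct": 100.0 * malicious / total if total else 0.0,
        "normal": normal,
        "normal_pct": 100.0 * normal / total if total else 0.0,
        "avg_confidence": avg_confidence,
    }


# Made-up results rows for illustration only.
sample = io.StringIO(
    "Prediction,Risk_Prob,Anomaly_Score,Confidence\n"
    "Malicious,0.90,-0.12,0.90\n"
    "Normal,0.20,0.08,0.80\n"
    "Normal,0.10,0.11,0.90\n"
    "Normal,0.40,0.02,0.60\n"
)
summary = summarise(sample)
print(summary)
```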

Charts

Threat Prediction Count: A bar chart showing how many employees were classified as Malicious versus Normal. Use this for a quick visual split.
Risk Probability Distribution: A histogram of Risk_Prob values (0–1) for all employees. A vertical dashed red line marks the model’s decision threshold. Employees to the right of the line are classified as Malicious. The distribution shape reveals whether most employees cluster far from the threshold (clear cases) or are concentrated near it (uncertain cases).
Global Feature Importance (Top 15): A horizontal bar chart of the top 15 XGBoost feature importances (F-score). These are the features that most frequently split the decision trees across the entire model, giving you a global view of what drives predictions at the organisational level.
Global SHAP Summary Plot: A SHAP beeswarm plot computed over a random sample of up to 100 records. Each dot represents one employee for one feature. The horizontal position shows the SHAP value (positive = pushes toward Malicious), and the colour shows the raw feature value (red = high, blue = low). This plot reveals how feature values relate to risk direction across the organisation.
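The "random sample of up to 100 records" behind the SHAP summary plot means that small uploads are plotted in full, while larger ones are subsampled. The sketch below illustrates that capping logic only. The `sample_for_shap` helper, its `seed` parameter, and the use of sampling without replacement are assumptions, not ThreatDetect's actual code.

```python
import random


def sample_for_shap(record_indices, max_records=100, seed=None):
    """Pick at most `max_records` indices at random, without replacement."""
    rng = random.Random(seed)
    indices = list(record_indices)
    k = min(max_records, len(indices))  # never ask for more than exist
    return sorted(rng.sample(indices, k))


small = sample_for_shap(range(40), seed=0)     # fewer than 100: all kept
large = sample_for_shap(range(5000), seed=0)   # more than 100: capped
print(len(small), len(large))  # 40 100
```

One consequence is that for large workforces the plot may differ slightly between runs, unless the sample is seeded.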

Organisational risk insight

Below the charts, ThreatDetect displays one of two messages:

Warning — if at least one employee is predicted Malicious, listing the count, percentage, and the three features with the highest global importance.
Success — if no employees are flagged, confirming the organisation appears clean.

Per-employee explanation

After running detection, expand Explain a specific employee (SHAP per instance) to drill into any individual record.

Select an employee from the dropdown. Each option shows the employee index, their prediction, and their confidence score.
The app displays three metrics: Prediction, Confidence, and Anomaly Score for that employee.
A human-readable list explains the top features pushing toward Malicious (increases risk) and toward Normal (reduces risk), showing the original feature value for each.
A SHAP bar chart plots the top 10 features by SHAP value. Red bars push toward Malicious; green bars push toward Normal.

Use the per-employee explainer on any record whose Risk_Prob is close to the decision threshold to understand whether the prediction is well-supported or borderline.
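To shortlist records worth inspecting, you can filter the downloaded results for Risk_Prob values close to the threshold. This sketch assumes a threshold of 0.5 and a margin of 0.1, neither of which is stated on this page. Substitute the threshold your deployment actually uses (the dashed line on the Risk Probability Distribution chart). The `borderline_records` helper is hypothetical, and its 0-based row positions may not match the employee index shown in the dropdown.

```python
import csv
import io


def borderline_records(results_file, threshold=0.5, margin=0.1):
    """Return (row_position, risk_prob) pairs within `margin` of `threshold`.

    Both defaults are assumptions for illustration; use your model's
    actual decision threshold.
    """
    flagged = []
    for position, row in enumerate(csv.DictReader(results_file)):
        risk = float(row["Risk_Prob"])
        if abs(risk - threshold) <= margin:
            flagged.append((position, risk))
    # Closest to the threshold first.
    return sorted(flagged, key=lambda item: abs(item[1] - threshold))


# Made-up results rows for illustration only.
sample = io.StringIO(
    "Prediction,Risk_Prob\n"
    "Malicious,0.92\n"
    "Malicious,0.52\n"
    "Normal,0.10\n"
    "Normal,0.41\n"
    "Malicious,0.58\n"
)
near = borderline_records(sample)
print(near)  # [(1, 0.52), (4, 0.58), (3, 0.41)]
```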

Downloading results

The Download results as CSV button inside the Detailed Results Table expander saves a file named threat_analysis_results.csv. This file includes all original columns from your upload plus:

Prediction: "Malicious" or "Normal"
Risk_Prob: probability of being malicious (0–1)
Anomaly_Score: IsolationForest decision function output (in scikit-learn's convention, lower values are more anomalous and negative values indicate outliers)
Confidence: model certainty for the assigned class

Confidence formula: Confidence = Risk_Prob when the prediction is Malicious, and Confidence = 1 − Risk_Prob when the prediction is Normal. This means Confidence always represents how certain the model is about whichever class it chose, not the raw probability of being malicious.
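The formula translates directly into a few lines of code. The `confidence` function below is an illustrative restatement of the rule above, not ThreatDetect's source. It takes the prediction label as given rather than deriving it from a threshold.

```python
def confidence(prediction, risk_prob):
    """Certainty for the assigned class, per the formula above."""
    if prediction == "Malicious":
        return risk_prob
    if prediction == "Normal":
        return 1.0 - risk_prob
    raise ValueError(f"unexpected prediction label: {prediction!r}")


print(confidence("Malicious", 0.75))  # 0.75
print(confidence("Normal", 0.25))     # 0.75
```

Both calls return the same confidence even though the raw probabilities differ, which is why Confidence should not be read as a risk score.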
