The test suite for ThreatDetect lives in the unit testing/ directory and covers every major stage of the pipeline: feature preparation, model inference, SHAP explainability, confidence scoring, and input validation. All tests are written for pytest and rely on lightweight in-memory fixtures so you do not need a real trained model to run them.

Install test dependencies

The core app dependencies are listed in requirements.txt. To run the tests you also need pytest and pytest-mock:
pip install -r requirements.txt
pip install pytest pytest-mock

Run the tests

Run all tests from the repository root by pointing pytest at the unit testing/ directory:
pytest "unit testing/"

To run a single test file, pass its path instead:
pytest "unit testing/test_prep_features.py"

Test structure

File                     What it covers
test_prep_features.py    prepare_features(): normal case, unseen categories, inf ratio handling
test_integrate.py        End-to-end prediction path with a mocked load_model
test_explainability.py   results_explainability() output shape and SHAP extraction
confidence_test.py       Model package construction and confidence scoring
testing_validation.py    validate_input_columns(): success and missing-column error
conftest.py              Shared fixtures used across all test files
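
The page does not show validate_input_columns() itself, so the following is only a rough sketch of the success and missing-column cases that testing_validation.py is described as covering. The helper name check_required_columns, the three-column subset, and the choice of ValueError are all assumptions made for illustration, not the project's actual implementation.

```python
import pandas as pd

# Illustrative subset of the required input columns (assumption, not the full list).
REQUIRED_COLUMNS = ["employee_id", "employee_campus", "num_entries"]


def check_required_columns(df, required=REQUIRED_COLUMNS):
    """Hypothetical stand-in: raise ValueError naming every missing column."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    return True


frame = pd.DataFrame({"employee_id": [1], "employee_campus": ["A"], "num_entries": [5]})

# Success case: every required column is present.
ok = check_required_columns(frame)

# Failure case: dropping a column should produce an error that names it.
try:
    check_required_columns(frame.drop(columns=["num_entries"]))
except ValueError as exc:
    error_message = str(exc)
```

Naming the missing columns in the message lets a test match on the column name rather than only on the exception type.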

Shared fixtures

conftest.py provides two fixtures that most tests depend on. sample_raw_data returns a minimal two-row DataFrame with all required input columns:
import pandas as pd
import pytest


@pytest.fixture
def sample_raw_data():
    """Minimal valid input DataFrame with required columns."""
    return pd.DataFrame({
        'employee_id': [1, 2],
        'employee_seniority_years': [2, 10],
        'total_printed_pages': [100, 50],
        'num_printed_pages_off_hours': [10, 5],
        'total_files_burned': [0, 20],
        'has_criminal_record': [0, 1],
        'is_contractor': [1, 0],
        'has_foreign_citizenship': [0, 0],
        'entry_during_weekend': [0, 1],
        'late_exit_flag': [1, 0],
        'employee_campus': ['A', 'B'],
        'trip_day_number': [0, 3],
        'num_entries': [5, 2],
        'num_unique_campus': [1, 2]
    })

mock_model_package builds a lightweight but fully functional model package in memory. It fits a LabelEncoder for the categorical campus column, a StandardScaler for the numerical columns, an IsolationForest, and a two-estimator XGBClassifier on random data, which is enough for the real pipeline code to run without loading a saved model file. It also attaches a shap.TreeExplainer and sets best_threshold to 0.5.
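
To give a feel for how such a fixture can be assembled, here is a partial sketch that covers only the scikit-learn pieces. The XGBClassifier and shap.TreeExplainer are left out to keep it dependency-light. The function name, the two-column feature list, and every dictionary key except feature_columns and best_threshold are assumptions; the real conftest.py may be organised differently.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.preprocessing import LabelEncoder, StandardScaler


def build_partial_model_package(seed=0):
    """Hypothetical, partial stand-in for the mock_model_package fixture."""
    rng = np.random.default_rng(seed)

    # Assumed subset of numerical columns; the real package lists more features.
    numeric_columns = ["total_printed_pages", "num_entries"]
    X = rng.random((50, len(numeric_columns)))

    # Fit each preprocessing step on throwaway data so the pipeline has
    # real, fitted objects to call without a saved model file.
    campus_encoder = LabelEncoder().fit(["A", "B"])
    scaler = StandardScaler().fit(X)
    iso_forest = IsolationForest(n_estimators=10, random_state=seed)
    iso_forest.fit(scaler.transform(X))

    return {
        "label_encoder": campus_encoder,  # assumed key name
        "scaler": scaler,                 # assumed key name
        "iso_forest": iso_forest,         # assumed key name
        "feature_columns": numeric_columns,
        "best_threshold": 0.5,
    }


package = build_partial_model_package()
```

In a conftest.py this function body would sit under a @pytest.fixture decorator so that tests can request the package by name.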

Feature preparation tests

The three tests in test_prep_features.py exercise prepare_features() against the shared fixtures.

Normal case: verifies the output shapes and that the engineered ratio columns print_ratio and file_ratio are added:
def test_prepare_features_normal_case(sample_raw_data, mock_model_package):
    df, x_append, iso_scores = prepare_features(sample_raw_data, mock_model_package)

    assert x_append.shape[0] == len(sample_raw_data)
    expected_feat_count = len(mock_model_package['feature_columns'])
    assert x_append.shape[1] == expected_feat_count

    assert iso_scores.shape == (len(sample_raw_data),)
    assert 'print_ratio' in df.columns
    assert 'file_ratio' in df.columns

Unseen category: confirms that an unknown campus value raises a descriptive ValueError rather than silently producing a wrong encoding:
def test_prepare_features_unseen_category(sample_raw_data, mock_model_package):
    sample_raw_data.loc[0, 'employee_campus'] = 'Z'
    with pytest.raises(ValueError, match="contains unseen categories"):
        prepare_features(sample_raw_data, mock_model_package)
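
The check this test expects can be sketched as follows. This is not the code inside prepare_features(); the helper name and the exact message are assumptions, chosen so that the message contains the "contains unseen categories" text the test matches on. In the real pipeline the known classes would presumably come from the fitted LabelEncoder's classes_ attribute.

```python
def check_known_categories(values, known_classes, column):
    """Hypothetical helper: reject labels the encoder was never fitted on."""
    unseen = sorted(set(values) - set(known_classes))
    if unseen:
        raise ValueError(f"Column '{column}' contains unseen categories: {unseen}")


# 'Z' was not part of the fitted classes, so the check should fail loudly.
try:
    check_known_categories(["A", "Z"], known_classes=["A", "B"], column="employee_campus")
except ValueError as exc:
    message = str(exc)
```

Failing before encoding avoids passing an arbitrary integer code for an unknown campus to the model.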

Inf ratio handling: forces a division-by-zero situation and checks that prepare_features() replaces any resulting inf values so the feature matrix stays finite:
def test_prepare_features_handles_inf_ratios(sample_raw_data, mock_model_package):
    sample_raw_data['num_printed_pages_off_hours'] = 0
    df, x_append, iso_scores = prepare_features(sample_raw_data, mock_model_package)
    assert np.isfinite(x_append).all()
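
For context, the snippet below shows why the inf values appear and one common way to neutralise them. The ratio formula (total pages divided by off-hours pages) and the choice of 0.0 as the fill value are assumptions; the page does not state how prepare_features() defines or cleans its ratios.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "total_printed_pages": [100, 50],
    "num_printed_pages_off_hours": [0, 0],  # zero denominator, as in the test
})

# Assumed formula. pandas does not raise on division by zero;
# a non-zero numerator over zero yields inf (0 / 0 would yield NaN).
ratio = df["total_printed_pages"] / df["num_printed_pages_off_hours"]

# Map +/-inf to NaN first, then fill, so both inf and NaN end up finite.
cleaned = ratio.replace([np.inf, -np.inf], np.nan).fillna(0.0)
```

Because the division never raises, a test has to assert explicitly on np.isfinite to catch the problem.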

End-to-end prediction test

test_integrate.py mocks load_model so the test never touches the filesystem, then runs the full inference pipeline and checks that predictions are valid binary values:
def test_end_to_end_prediction(mock_model_package, sample_raw_data, mocker):
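    # Replace load_model on the streamlit_app module with a stub that returns the fixture.
    mocker.patch('streamlit_app.load_model', return_value=mock_model_package)

    # Call through the module (assumes `import streamlit_app` at the top of the test file);
    # a name bound earlier via `from streamlit_app import load_model` would still be the real function.
    model = streamlit_app.load_model()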
    df, x_append, iso_scores = prepare_features(sample_raw_data, model)

    xgb_model = model['xgb_model']
    probs = xgb_model.predict_proba(x_append)[:, 1]
    preds = (probs >= model['best_threshold']).astype(int)

    assert len(preds) == len(sample_raw_data)
    assert set(preds).issubset({0, 1})

Note: test_end_to_end_prediction uses the mocker fixture, which is provided by pytest-mock. Install it with pip install pytest-mock.
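
The thresholding step in the end-to-end test can be seen in isolation with a few made-up probabilities. The values below are illustrative only; 0.5 is the best_threshold that the mock package sets.

```python
import numpy as np

probs = np.array([0.10, 0.50, 0.93])  # made-up positive-class probabilities
best_threshold = 0.5

# The comparison yields booleans; astype(int) turns them into 0/1 labels.
preds = (probs >= best_threshold).astype(int)
# preds is array([0, 1, 1]); a probability equal to the threshold counts as positive.
```

Because the comparison is >=, a score exactly at the threshold is flagged, which is worth remembering when choosing boundary values for a test.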
