After training your triage models, you can run batch predictions on new ticket data. This guide explains how to use the prediction script to classify tickets and export results.
Prerequisites
- Trained models in the `artifacts/` directory (see Training Models)
- Test data: `tickets_test.csv` in the project root
- Required columns: `subject`, `body`
Prediction Command
Prepare test data
Create or verify `tickets_test.csv` with at minimum these columns:
- `subject`: Ticket subject line
- `body`: Ticket description or body text
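A quick pre-flight check can catch a missing column before the prediction script runs. The sketch below uses only the standard library; the helper name `missing_columns` and the sample CSV text are illustrative and not part of the project.

```python
import csv
import io

REQUIRED_COLUMNS = ("subject", "body")


def missing_columns(header):
    """Return the required column names that are absent from a CSV header."""
    present = set(header or [])
    return [name for name in REQUIRED_COLUMNS if name not in present]


# Illustrative in-memory CSV; for the real file use open("tickets_test.csv", newline="").
sample = io.StringIO("ticket_id,subject\n1,Cannot log in\n")
reader = csv.DictReader(sample)
missing = missing_columns(reader.fieldnames)
if missing:
    print(f"Missing required columns: {missing}")
```

Extra columns are ignored by this check, which matches the requirement that only `subject` and `body` are mandatory.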
Run predictions
- Load trained models from `artifacts/`
- Preprocess test data
- Run category and priority predictions
- Compute confidence scores
- Save results to CSV
Output Format
The prediction script generates a CSV file with the original test data plus four additional columns:

| Column | Description |
|---|---|
| `predicted_category` | Predicted support category (e.g., “billing”, “technical”) |
| `category_confidence` | Confidence score (0.0 to 1.0) for category prediction |
| `predicted_priority` | Predicted priority level (e.g., “low”, “medium”, “high”) |
| `priority_confidence` | Confidence score (0.0 to 1.0) for priority prediction |
Example Output
How Predictions Work
The prediction pipeline executes the following steps (see `src/ml/predict.py:125`):
- Load Models: Loads trained pipelines from the `artifacts/` directory
- Read Test Data: Loads `tickets_test.csv` into a DataFrame
- Preprocess: Normalizes text (lowercase, strip whitespace, fill NaN values)
- Feature Extraction: Applies TF-IDF transformations via trained pipelines
- Predict: Generates category and priority predictions
- Confidence Scores: Computes max probability from `predict_proba`
- Export: Saves augmented DataFrame to `reports/predictions.csv`
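The steps above can be sketched roughly as follows. This is not the code in `src/ml/predict.py`: the function names, the choice to join `subject` and `body` into one text field, and the toy pipelines trained inline (standing in for the models loaded from `artifacts/`) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def predict_with_confidence(pipeline, texts):
    """Return (labels, confidences); confidence is the max class probability."""
    proba = pipeline.predict_proba(texts)  # shape: (n_samples, n_classes)
    best = proba.argmax(axis=1)
    return pipeline.classes_[best], proba[np.arange(len(best)), best]


def add_predictions(df, category_model, priority_model):
    """Return a copy of df with the four prediction columns appended."""
    # Assumption: subject and body are joined into a single text field.
    text = (df["subject"].fillna("") + " " + df["body"].fillna("")).str.lower().str.strip()
    out = df.copy()
    out["predicted_category"], out["category_confidence"] = predict_with_confidence(category_model, text)
    out["predicted_priority"], out["priority_confidence"] = predict_with_confidence(priority_model, text)
    return out


# Toy stand-ins for the trained pipelines that would normally be loaded from artifacts/.
train_text = ["refund my invoice", "charged twice on my card",
              "app crashes on login", "server error on upload"]
category_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
    train_text, ["billing", "billing", "technical", "technical"])
priority_model = make_pipeline(TfidfVectorizer(), LogisticRegression()).fit(
    train_text, ["low", "high", "high", "medium"])

tickets = pd.DataFrame({"subject": ["Invoice question", None],
                        "body": ["I was charged twice", "App crashes"]})
result = add_predictions(tickets, category_model, priority_model)
# The real script exports to reports/predictions.csv, e.g. result.to_csv(path, index=False).
print(result.columns.tolist())
```

Deriving the label from the same `predict_proba` row that supplies the confidence keeps the two columns consistent with each other.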
Confidence Scores
Confidence scores indicate model certainty (see `src/ml/predict.py:73`):
- High confidence (> 0.8): Model is very certain
- Medium confidence (0.5 - 0.8): Moderate certainty
- Low confidence (< 0.5): Uncertain prediction, may require manual review
When a model fell back to `ConstantPredictor` (single-class fallback), confidence is always 1.0.
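The bands described above can be applied to any confidence value with a small helper. The function names below are hypothetical and the thresholds are taken directly from the list: above 0.8 is high, 0.5 to 0.8 is medium, below 0.5 is low.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score in [0.0, 1.0] to a high/medium/low band."""
    if score > 0.8:
        return "high"
    if score >= 0.5:
        return "medium"
    return "low"


def needs_manual_review(score: float) -> bool:
    """Low-confidence predictions are candidates for manual review."""
    return confidence_band(score) == "low"


for score in (0.93, 0.8, 0.42):
    print(score, confidence_band(score), needs_manual_review(score))
```

Note that a score of exactly 0.8 falls in the medium band, since high confidence is defined as strictly greater than 0.8.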
Custom Prediction Paths
You can customize input/output paths programmatically:
Using Predictions in API
The trained models are automatically loaded by the `/triage` API endpoint:
Preprocessing Behavior
The prediction script applies lightweight preprocessing (see `src/ml/predict.py:46`):
- Convert text to lowercase
- Strip leading/trailing whitespace
- Fill NaN values with empty strings
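In pandas, the three operations above might look like the following. This is a minimal sketch, not the project's implementation; `preprocess_text` is a hypothetical name, and missing values are filled first so that the string methods always see text.

```python
import pandas as pd


def preprocess_text(series: pd.Series) -> pd.Series:
    """Fill missing values, lowercase, and strip surrounding whitespace."""
    return series.fillna("").astype(str).str.lower().str.strip()


raw = pd.Series(["  Payment FAILED ", None, "Login Issue"])
clean = preprocess_text(raw)
print(clean.tolist())
```

Applying the same function to both `subject` and `body` keeps the test text in the form the trained pipelines presumably saw during training.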
Troubleshooting
FileNotFoundError: Model file not found
Train models first using:
KeyError: 'subject' or 'body' column missing
Ensure your test CSV has both `subject` and `body` columns. Other columns are optional.
Low confidence scores across all predictions
This may indicate:
- Test data distribution differs from training data
- Models need retraining with more diverse examples
- Feature vocabulary mismatch
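To judge how widespread the problem is, it can help to measure the share of predictions below the 0.5 threshold. The sketch below uses only the standard library; the helper name and the sample rows are made up for illustration and are not real output.

```python
import csv
import io


def low_confidence_share(rows, column, threshold=0.5):
    """Fraction of rows whose confidence in `column` is below `threshold`."""
    scores = [float(row[column]) for row in rows]
    if not scores:
        return 0.0
    return sum(score < threshold for score in scores) / len(scores)


# Made-up rows; for real output use open("reports/predictions.csv", newline="").
sample = io.StringIO(
    "subject,category_confidence,priority_confidence\n"
    "Login fails,0.41,0.62\n"
    "Refund request,0.38,0.91\n"
    "Slow dashboard,0.77,0.45\n"
)
rows = list(csv.DictReader(sample))
for column in ("category_confidence", "priority_confidence"):
    print(column, round(low_confidence_share(rows, column), 2))
```

A high share for one model but not the other suggests the issue is specific to that model rather than to the test data as a whole.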
All predictions return the same value
This occurs when models were trained on single-class data and fell back to `ConstantPredictor`.
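To see why a single-class fallback yields identical predictions with a confidence of 1.0, consider the hypothetical stand-in below. It is not the project's `ConstantPredictor`, whose implementation is not shown in this guide; it only illustrates the behavior under the assumption that the fallback exposes `predict` and `predict_proba`.

```python
import numpy as np


class ConstantPredictorSketch:
    """Hypothetical stand-in: always predicts the one label seen in training."""

    def fit(self, X, y):
        labels = np.unique(y)
        if len(labels) != 1:
            raise ValueError("expected exactly one class in the training labels")
        self.classes_ = labels
        return self

    def predict(self, X):
        # Every row gets the single known label.
        return np.full(len(X), self.classes_[0])

    def predict_proba(self, X):
        # One class, so its probability is 1.0 for every row.
        return np.ones((len(X), 1))


model = ConstantPredictorSketch().fit(["a", "b", "c"], ["billing", "billing", "billing"])
print(model.predict(["new ticket", "another ticket"]))
print(model.predict_proba(["new ticket"]).max(axis=1))
```

If predictions are all identical, checking the label distribution of the training data is a reasonable first step before retraining.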