Dataset Overview
The Bank Marketing dataset contains information about direct marketing campaigns (phone calls) of a Portuguese banking institution.

Features:
- Demographics: age, job, marital, education
- Financial: default, balance, housing, loan
- Campaign: contact, day, month, duration, campaign
- Previous campaigns: pdays, previous, poutcome
- Target: deposit, whether the client subscribed to a term deposit (yes/no)
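To make the schema concrete, here is a minimal standard-library sketch that parses two rows in this column layout (the row values below are made up purely for illustration):

```python
import csv
import io

# Two synthetic rows in the Bank Marketing column layout (values are illustrative).
raw = """age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,deposit
30,services,married,secondary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
59,admin.,married,secondary,no,2343,yes,no,unknown,5,may,1042,1,-1,0,unknown,yes
"""

rows = list(csv.DictReader(io.StringIO(raw)))
target_labels = sorted({row["deposit"] for row in rows})
print(target_labels)  # ['no', 'yes']
```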
Basic Usage
Run a classification experiment with default settings.

What Happens
- Data Profiling: Analyzes schema, categorical features, class distribution
- Baseline Model: Trains Logistic Regression to establish baseline accuracy
- Iteration Loop: Gemini designs experiments considering class imbalance
- Report Generation: Creates narrative report with classification insights
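The baseline step can be sketched with scikit-learn on synthetic, similarly imbalanced data; this is a hedged stand-in, not the experiment runner's actual code:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bank data: roughly 14% positive class.
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.86],
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42)

# Plain Logistic Regression establishes the baseline metrics.
baseline = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = baseline.predict(X_test)
acc = accuracy_score(y_test, pred)
f1 = f1_score(y_test, pred)
print(f"baseline accuracy={acc:.3f}, f1={f1:.3f}")
```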
Expected Output
With Classification Constraints
Optimize for specific classification metrics by supplying a constraints file (for example, constraints_classification.md with a precision-focused configuration).
Impact of Constraints
With F1-focused constraints, Gemini will:
- Balance precision and recall optimization
- Apply techniques like SMOTE or class weights
- Test different decision thresholds
- Focus on ensemble methods
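Class weighting, one of the techniques listed above, can be sketched in scikit-learn as follows (SMOTE would instead require the separate imbalanced-learn package); the data here is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.86], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

plain = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
# class_weight='balanced' reweights errors by inverse class frequency,
# pushing the model to recover more of the rare positive class.
weighted = LogisticRegression(max_iter=1000,
                              class_weight="balanced").fit(X_tr, y_tr)

r_plain = recall_score(y_te, plain.predict(X_te))
r_weighted = recall_score(y_te, weighted.predict(X_te))
print(f"recall plain={r_plain:.3f}, weighted={r_weighted:.3f}")
```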
Advanced Configuration
- Handle Imbalance
- Feature Engineering
- Cross-Validation
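As a sketch of how the cross-validation option might be configured (scikit-learn assumed; again on synthetic imbalanced data):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.86], random_state=0)

# Stratified folds keep the ~14% positive rate constant in every split,
# which matters when scoring with F1 on imbalanced data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=cv, scoring="f1")
print([round(s, 3) for s in scores])
```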
Interpreting Classification Results
Confusion Matrix
The final report includes a confusion matrix:
- True Negatives (8,234): Correctly predicted "no"
- False Positives (145): Predicted “yes” but actual “no”
- False Negatives (856): Predicted “no” but actual “yes” (costly!)
- True Positives (677): Correctly predicted “yes”
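These four cells can be recovered from predictions with scikit-learn's `confusion_matrix`; a tiny sketch with toy labels (1 = "yes", 0 = "no"):

```python
from sklearn.metrics import confusion_matrix

# Toy labels: 1 = "yes" (subscribed), 0 = "no".
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 0, 1, 0]

# For binary labels the matrix is [[TN, FP], [FN, TP]], so ravel()
# yields the four counts in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 2 1 1 2
```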
Key Metrics
Metric Definitions
| Metric | Formula | Interpretation |
|---|---|---|
| Accuracy | (TP + TN) / Total | Overall correctness (misleading with imbalance) |
| Precision | TP / (TP + FP) | Of predicted “yes”, how many are correct? |
| Recall | TP / (TP + FN) | Of actual “yes”, how many did we catch? |
| F1-Score | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
| ROC-AUC | Area under ROC curve | Overall discrimination ability |
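Plugging the confusion-matrix counts from the report above into these formulas (plain Python, no libraries) makes the table concrete:

```python
# Counts from the confusion matrix reported above.
tn, fp, fn, tp = 8234, 145, 856, 677

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f}")   # inflated by the majority class
print(f"precision={precision:.3f}")
print(f"recall={recall:.3f}")       # under half of actual "yes" cases caught
print(f"f1={f1:.3f}")
```

Note how accuracy (about 0.90) looks strong while recall (about 0.44) shows fewer than half of the subscribers were caught, which is exactly why accuracy is misleading under imbalance.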
Feature Importance
Gemini's analysis typically reveals which features drive predictions (duration, for example, is a strong predictor on this dataset).

Common Results
Typical progression for this dataset:

| Iteration | Model | F1-Score | Precision | Recall | Strategy |
|---|---|---|---|---|---|
| Baseline | LogisticRegression | 0.3421 | 0.5234 | 0.2567 | Baseline |
| 1 | RandomForest | 0.4523 | 0.6234 | 0.3521 | Class weights |
| 2 | XGBClassifier | 0.5012 | 0.6432 | 0.4098 | scale_pos_weight |
| 3 | XGBClassifier + tuning | 0.5234 | 0.6521 | 0.4412 | Hyperparameter opt |
| 4 | LGBMClassifier | 0.5156 | 0.6289 | 0.4378 | Alternative booster |
| 5 | Ensemble | 0.5312 | 0.6678 | 0.4456 | Voting classifier |
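The scale_pos_weight strategy from iteration 2 conventionally sets the parameter to the negative-to-positive ratio; with this dataset's 13.7% positive class, a quick calculation gives:

```python
# Conventional choice: scale_pos_weight = (# negative) / (# positive).
pos_share = 0.137           # positive-class share reported for this dataset
neg_share = 1.0 - pos_share

scale_pos_weight = neg_share / pos_share
print(round(scale_pos_weight, 2))  # ~6.3
```

This value would then be passed as `XGBClassifier(scale_pos_weight=...)` from the xgboost package.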
Why These Results?
- Class imbalance is challenging: Only 13.7% positive class
- Duration is a strong predictor: But it may not be available before the call
- Boosting handles imbalance well: XGBoost and LightGBM with class weights
- F1-score around 0.50-0.53: Typical for this dataset with proper handling
- Precision-recall tradeoff: Can tune threshold based on business needs
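The precision-recall tradeoff above can be explored by sweeping the decision threshold over a fitted model's probabilities; a hedged scikit-learn sketch on synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, weights=[0.86], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# precision/recall have one more entry than thresholds; drop the last
# point so F1 values align with candidate thresholds.
prec, rec, thresholds = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / (prec + rec + 1e-12)
best = thresholds[np.argmax(f1[:-1])]
print(f"best threshold for F1: {best:.2f}")
```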
Threshold Optimization
For classification, you can optimize the decision threshold rather than relying on the default of 0.5.

Viewing Results in MLflow

In the MLflow UI, you can:
- Compare F1-scores across iterations
- View confusion matrices
- Analyze precision-recall curves
- Download classification reports
- Compare feature importance across models
Next Steps
- Regression Example: Learn about regression experiments
- Advanced Constraints: Complex constraint configurations
- Metrics: Understanding evaluation metrics
- Class Imbalance: Handling imbalanced datasets