Chapter 3 dives deep into classification using the MNIST handwritten-digit dataset as its primary playground. You will train a binary classifier to detect the digit “5”, then extend the techniques to multiclass and multilabel classification. The chapter covers the full evaluation toolkit: confusion matrices, precision and recall, the F1 score, precision/recall trade-offs, and ROC curves.
What you’ll learn
- Loading the MNIST dataset with fetch_openml
- Splitting into train and test sets (the first 60,000 / last 10,000 instances)
- Training a binary classifier with SGDClassifier
- Evaluating accuracy with cross_val_score and understanding why accuracy alone is misleading
- Building and interpreting confusion matrices
- Computing precision, recall, and the F1 score
- Plotting precision/recall curves and ROC curves
- Comparing classifiers using the area under the ROC curve (AUC)
- Multiclass classification with OvR and OvO strategies
- Error analysis: examining the confusion matrix at a per-class level
- Multilabel classification with KNeighborsClassifier
- Multioutput classification
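One point in the list above, that accuracy alone is misleading, can be illustrated with a small sketch. It uses synthetic data rather than MNIST (an assumption made to keep the example fast and offline), with one positive instance in ten, and a baseline that always predicts the majority class. The variable names are illustrative only.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score

# Synthetic, deliberately imbalanced data: 1 instance in 10 is positive.
rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))      # the features are ignored by the baseline
y = (np.arange(300) % 10) == 5     # 30 positives, 270 negatives

# Baseline that always predicts the most frequent class (here: negative).
baseline = DummyClassifier(strategy="most_frequent")
scores = cross_val_score(baseline, X, y, cv=3, scoring="accuracy")

# Fraction of the actual positives that the baseline detects.
baseline.fit(X, y)
detected = baseline.predict(X)[y].mean()

print("accuracy per fold:", scores)
print("fraction of positives detected:", detected)
```

The baseline reaches about 90% accuracy on every fold while detecting none of the positives, which is why the chapter moves on to confusion matrices, precision, and recall.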
Key concepts
MNIST dataset. MNIST contains 70,000 grayscale images of handwritten digits (0–9), each 28×28 pixels. Scikit-Learn’s fetch_openml, called with as_frame=False, returns the data as a 70,000 × 784 array (each pixel is a feature) and a target array of digit labels as strings. The first 60,000 images form the training set and the last 10,000 form the test set; the training set is already shuffled.
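The loading and splitting described above might look like the following minimal sketch. It assumes scikit-learn is installed and OpenML is reachable; the helper names load_mnist, split_train_test, and make_binary_target are hypothetical and are not taken from the chapter's notebook.

```python
import numpy as np
from sklearn.datasets import fetch_openml


def load_mnist():
    """Download MNIST from OpenML (or load it from the local cache)."""
    # as_frame=False asks for NumPy arrays instead of a pandas DataFrame.
    mnist = fetch_openml("mnist_784", as_frame=False)
    return mnist.data, mnist.target  # X: (70000, 784); y: digit labels as strings


def split_train_test(X, y, n_train=60_000):
    """Split by position: the first n_train rows train, the rest test."""
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


def make_binary_target(y, digit="5"):
    """Boolean target for a 'is this a 5?' detector (labels are strings)."""
    return np.asarray(y) == digit
```

Calling load_mnist() and passing its result to split_train_test should yield 60,000 training and 10,000 test instances; make_binary_target compares against the string "5" because the labels arrive as strings.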
Confusion matrix. A confusion matrix tabulates the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for a classifier. Precision = TP / (TP + FP) measures what fraction of positive predictions are correct. Recall = TP / (TP + FN) measures what fraction of actual positives were detected. The F1 score is the harmonic mean of precision and recall.
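The definitions above can be checked on a tiny hand-made example. This sketch uses ten made-up labels and predictions (not MNIST results) and compares the hand-computed formulas with the corresponding scikit-learn helpers.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up ground truth and predictions for a binary task (1 = positive).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
y_pred = np.array([1, 0, 0, 1, 1, 0, 1, 0, 0, 0])

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

precision = tp / (tp + fp)                           # correct share of positive predictions
recall = tp / (tp + fn)                              # detected share of actual positives
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two

# The library helpers should agree with the hand-computed values.
assert np.isclose(precision, precision_score(y_true, y_pred))
assert np.isclose(recall, recall_score(y_true, y_pred))
assert np.isclose(f1, f1_score(y_true, y_pred))
```

Here TP = 3, FP = 1, FN = 1, and TN = 5, so precision, recall, and F1 all come out at 0.75.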
Precision/recall trade-off. A classifier such as SGDClassifier computes a decision score for each instance and labels the instance positive when that score exceeds a threshold. Lowering the threshold can only raise recall or leave it unchanged, but it generally lowers precision; raising the threshold does the opposite. The precision/recall curve visualises this trade-off across all possible thresholds.
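To make the effect of the threshold concrete, the sketch below applies three thresholds to a handful of made-up decision scores (assumed values, not the output of a trained model). The helper precision_recall_at is hypothetical; precision_recall_curve is the scikit-learn function that sweeps all thresholds at once.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, precision_score, recall_score

# Made-up labels and decision scores, sorted by score for readability.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([-3.0, -1.5, -1.0, 0.5, 1.0, 2.0, 2.5, 4.0])


def precision_recall_at(threshold):
    """Label an instance positive when its score exceeds the threshold."""
    y_pred = (scores > threshold).astype(int)
    return precision_score(y_true, y_pred), recall_score(y_true, y_pred)


low = precision_recall_at(-2.0)   # permissive threshold
mid = precision_recall_at(0.0)    # moderate threshold
high = precision_recall_at(3.0)   # strict threshold

# Sweep every threshold in one call; precisions and recalls have one more
# entry than thresholds.
precisions, recalls, thresholds = precision_recall_curve(y_true, scores)

for name, (p, r) in [("low", low), ("mid", mid), ("high", high)]:
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

On this toy data recall falls from 1.0 to 0.75 to 0.25 as the threshold rises, while precision climbs from about 0.57 to 0.6 to 1.0.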
ROC curve. The Receiver Operating Characteristic (ROC) curve plots true positive rate (recall) against false positive rate (1 – specificity) across all thresholds. The area under the ROC curve (AUC) summarises overall classifier performance; random guessing yields AUC = 0.5 and a perfect classifier yields AUC = 1.0.
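As a rough illustration of the ROC curve and AUC, the following sketch computes both for a few made-up scores (assumed values, not MNIST results) and then checks the two reference points mentioned above: a perfect ranking, and scores that carry no information (all equal), which stand in here for random guessing.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up labels and decision scores.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 1])
scores = np.array([-3.0, -1.5, -1.0, 0.5, 1.0, 2.0, 2.5, 4.0])

# False positive rate and true positive rate (recall) at each threshold.
fpr, tpr, thresholds = roc_curve(y_true, scores)
auc = roc_auc_score(y_true, scores)

# Reference points: a perfect ranking, and constant (uninformative) scores.
perfect_auc = roc_auc_score(y_true, y_true)
uninformative_auc = roc_auc_score(y_true, np.zeros_like(scores))

print(f"AUC={auc:.2f} perfect={perfect_auc:.2f} uninformative={uninformative_auc:.2f}")
```

The AUC can be read as the fraction of (positive, negative) pairs in which the positive instance receives the higher score; here that is 12 of 16 pairs, or 0.75.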
Code examples
Loading MNIST and preparing train/test sets:
Running this notebook
Open in Colab
Download MNIST
The first code cell calls fetch_openml('mnist_784'), which downloads ~55 MB from OpenML. The download is cached automatically, so subsequent runs reuse it.