Chapter 3: Classification

Chapter 3 dives deep into classification using the MNIST handwritten-digit dataset as its primary playground. You will train a binary classifier to detect the digit “5”, then extend the techniques to multiclass and multilabel classification. The chapter covers the full evaluation toolkit: confusion matrices, precision and recall, the F1 score, precision/recall trade-offs, and ROC curves.

What you’ll learn

Loading the MNIST dataset with fetch_openml
Splitting into train and test sets (the first 60,000 / last 10,000 instances)
Training a binary classifier with SGDClassifier
Evaluating accuracy with cross_val_score and understanding why accuracy alone is misleading
Building and interpreting confusion matrices
Computing precision, recall, and the F1 score
Plotting precision/recall curves and ROC curves
Comparing classifiers using the area under the ROC curve (AUC)
Multiclass classification with one-versus-the-rest (OvR) and one-versus-one (OvO) strategies
Error analysis: examining the confusion matrix at a per-class level
Multilabel classification with KNeighborsClassifier
Multioutput classification

Key concepts

MNIST dataset. MNIST contains 70,000 grayscale images of handwritten digits (0–9), each 28×28 pixels. With as_frame=False, Scikit-Learn’s fetch_openml returns the data as a 70,000 × 784 array (one feature per pixel) and a target array of digit labels stored as strings. The first 60,000 images form the training set and the last 10,000 form the test set; the training set is already shuffled.

Confusion matrix. A confusion matrix tabulates the number of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for a classifier. In Scikit-Learn’s layout, each row is an actual class and each column is a predicted class. Precision = TP / (TP + FP) measures what fraction of positive predictions are correct. Recall = TP / (TP + FN) measures what fraction of actual positives were detected. The F1 score is the harmonic mean of the two: F1 = 2 × precision × recall / (precision + recall).

Precision/recall trade-off. Classifiers such as SGDClassifier compute a decision score for each instance and label it positive when the score exceeds a threshold. Lowering the threshold can only raise recall (or leave it unchanged) and generally lowers precision; raising it does the opposite. The precision/recall curve visualises this trade-off across all possible thresholds.

ROC curve. The Receiver Operating Characteristic (ROC) curve plots the true positive rate (recall) against the false positive rate (1 – specificity) across all thresholds. The area under the ROC curve (AUC) summarises overall classifier performance: random guessing yields an AUC of about 0.5, and a perfect classifier yields an AUC of 1.0.
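
The threshold trade-off and the AUC can be illustrated without any trained model. The sketch below is a minimal illustration in plain Python, assuming a hypothetical set of eight decision scores and labels (they are not taken from MNIST); rates_at and auc_by_ranking are illustrative helper names, not Scikit-Learn functions.

```python
# Hypothetical toy data: decision scores and true labels (1 = positive).
scores = [-3.0, -2.0, -1.0, -0.5, 0.5, 1.0, 2.0, 3.0]
labels = [0, 0, 1, 0, 0, 1, 1, 1]


def rates_at(threshold):
    """Return (precision, recall, false positive rate) at one threshold."""
    preds = [s >= threshold for s in scores]
    tp = sum(p and y == 1 for p, y in zip(preds, labels))
    fp = sum(p and y == 0 for p, y in zip(preds, labels))
    fn = sum((not p) and y == 1 for p, y in zip(preds, labels))
    tn = sum((not p) and y == 0 for p, y in zip(preds, labels))
    # Convention used here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    return precision, recall, fpr


def auc_by_ranking():
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


for t in (2.0, 0.0, -1.5):
    precision, recall, fpr = rates_at(t)
    print(f"threshold={t:+.1f}  precision={precision:.2f}  "
          f"recall={recall:.2f}  FPR={fpr:.2f}")
# threshold=+2.0  precision=1.00  recall=0.50  FPR=0.00
# threshold=+0.0  precision=0.75  recall=0.75  FPR=0.25
# threshold=-1.5  precision=0.67  recall=1.00  FPR=0.50

print(f"AUC = {auc_by_ranking():.3f}")
# AUC = 0.875
```

On real data, Scikit-Learn’s precision_recall_curve, roc_curve, and roc_auc_score in sklearn.metrics compute these quantities from the scores returned by decision_function.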

Code examples

Loading MNIST and preparing train/test sets:

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', as_frame=False)
X, y = mnist.data, mnist.target

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]

Training a binary SGD classifier:

from sklearn.linear_model import SGDClassifier

y_train_5 = (y_train == '5')  # True for all 5s, False for all other digits
y_test_5 = (y_test == '5')

sgd_clf = SGDClassifier(random_state=42)
sgd_clf.fit(X_train, y_train_5)

Cross-validation accuracy:

from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
# array([0.95035, 0.96035, 0.9604 ])
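
To see roughly what cross_val_score does for a classifier, the loop can be written by hand. This is a sketch under stated assumptions, not the library’s actual implementation: it uses Scikit-Learn’s small built-in 8×8 digits dataset (load_digits) as a stand-in for MNIST so that it runs without a download, and the variable names are illustrative.

```python
from sklearn.base import clone
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import StratifiedKFold

# Assumption: the small built-in digits dataset stands in for MNIST.
# Its labels are integers, not strings as in mnist_784.
X_small, y_small = load_digits(return_X_y=True)
y_small_5 = (y_small == 5)

clf = SGDClassifier(random_state=42)
skfolds = StratifiedKFold(n_splits=3)  # each fold keeps the 5 / not-5 ratio

fold_accuracies = []
for train_idx, test_idx in skfolds.split(X_small, y_small_5):
    fold_clf = clone(clf)  # fresh, unfitted copy for every fold
    fold_clf.fit(X_small[train_idx], y_small_5[train_idx])
    y_pred = fold_clf.predict(X_small[test_idx])
    # Accuracy on the held-out fold: fraction of correct predictions.
    fold_accuracies.append((y_pred == y_small_5[test_idx]).mean())

print(fold_accuracies)
```

Writing the loop explicitly is mainly useful when you need more control over each fold than cross_val_score exposes.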

Confusion matrix:

from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)
cm = confusion_matrix(y_train_5, y_train_pred)
# array([[53892,   687],
#        [ 1891,  3530]])
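
As a worked example, precision, recall, and F1 can be computed by hand from the four counts in the matrix above. The variable names are illustrative; on real predictions, precision_score, recall_score, and f1_score from sklearn.metrics do the same job.

```python
# Counts copied from the confusion matrix above.
# Rows are actual classes, columns are predicted classes.
tn, fp = 53892, 687
fn, tp = 1891, 3530

precision = tp / (tp + fp)  # share of predicted 5s that really are 5s
recall = tp / (tp + fn)     # share of actual 5s that were detected
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.4f}  recall={recall:.4f}  F1={f1:.4f}")
# precision=0.8371  recall=0.6512  F1=0.7325
```

So when this classifier claims an image is a 5, it is right about 84% of the time, and it finds only about 65% of the 5s.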

High accuracy can be misleading for imbalanced datasets. Only ~10% of MNIST images are “5”s, so a classifier that always predicts “not 5” achieves ~90% accuracy while detecting no 5s at all. Always examine precision and recall.
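
The arithmetic behind this warning is easy to check. The sketch below assumes a hypothetical set of 1,000 labels with exactly 10% positives, rather than the real MNIST labels, and scores a "classifier" that always answers "not 5".

```python
# Hypothetical labels: 1,000 instances, 10% of them positive ("5").
y_true = [True] * 100 + [False] * 900
y_pred = [False] * len(y_true)  # always predicts "not 5"

accuracy = sum(p == t for p, t in zip(y_pred, y_true)) / len(y_true)
true_positives = sum(p and t for p, t in zip(y_pred, y_true))
recall = true_positives / sum(y_true)

print(f"accuracy={accuracy:.2f}  recall={recall:.2f}")
# accuracy=0.90  recall=0.00
```

Scikit-Learn’s DummyClassifier (in sklearn.dummy) offers a comparable baseline that plugs directly into cross_val_score.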

Running this notebook

Open in Colab

Download MNIST

The first code cell calls fetch_openml('mnist_784'), which downloads ~55 MB from OpenML. The download is cached locally, so subsequent runs reuse it.

Run cells in order

Cells in the notebook build on previous results, so run them top-to-bottom.

Exercises

The chapter’s exercises ask you to (1) build a classifier that achieves over 97% accuracy on the MNIST test set, (2) write a function to shift MNIST images by one pixel in any direction to augment the training set, and (3) tackle the Titanic survival dataset from Kaggle. Solutions are provided in the notebook.
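
For exercise (2), one possible starting point is sketched below. It is not the notebook’s solution, which may take a different approach; shift_image is a hypothetical helper that assumes each image is a flattened 784-value array and fills the vacated pixels with zeros.

```python
import numpy as np


def shift_image(image, dx, dy):
    """Shift a flattened 28x28 image by (dx, dy) pixels, filling with zeros.

    Hypothetical helper: positive dx moves right, positive dy moves down.
    """
    img = np.asarray(image).reshape(28, 28)
    shifted = np.zeros_like(img)
    # Matching source and destination slices for each axis.
    src_rows = slice(max(0, -dy), 28 - max(0, dy))
    dst_rows = slice(max(0, dy), 28 - max(0, -dy))
    src_cols = slice(max(0, -dx), 28 - max(0, dx))
    dst_cols = slice(max(0, dx), 28 - max(0, -dx))
    shifted[dst_rows, dst_cols] = img[src_rows, src_cols]
    return shifted.reshape(-1)


# Usage: a blank image with one bright pixel at row 14, column 14.
demo = np.zeros(784)
demo[14 * 28 + 14] = 255
moved = shift_image(demo, dx=1, dy=0)
print(moved.reshape(28, 28)[14, 15])  # 255.0
```

Applying the four one-pixel shifts (left, right, up, down) to every training image and appending the results, with their labels, yields the augmented training set the exercise describes.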
