Chapter 6 covers decision trees, among the most interpretable ML algorithms. You will train trees for classification and regression, visualise their structure, understand the CART algorithm that builds them, and learn how to control their complexity through regularisation parameters.
What you’ll learn
- Training a DecisionTreeClassifier on the iris dataset
- Visualising a tree with export_graphviz and Graphviz
- How CART builds trees: recursive binary splitting using a purity criterion
- Gini impurity versus entropy as splitting criteria
- Predicting class probabilities with predict_proba
- Regularisation parameters: max_depth, min_samples_split, min_samples_leaf, and max_leaf_nodes
- Instability of decision trees and their sensitivity to rotation
- DecisionTreeRegressor for regression tasks
Key concepts
The CART algorithm. CART (Classification and Regression Trees) builds trees by recursively splitting the training data. At each node it searches for the feature and threshold that produce the purest split, minimising the sum of the two child nodes' impurities, each weighted by its share of the instances. The process continues until a stopping criterion is met (e.g., maximum depth reached, or the node is already pure).
Gini impurity and entropy. Gini impurity measures the probability that a randomly chosen instance would be misclassified if labelled at random according to the class distribution at a node. Entropy measures the average information content of that class distribution. Both are zero when a node contains a single class. In practice, the two criteria usually produce similar trees; Gini impurity is slightly faster to compute.
Pruning. Left unconstrained, decision trees tend to overfit the training data. The most direct way to regularise a tree is to restrict its depth with max_depth. Other useful constraints include min_samples_split (minimum number of samples required to split a node), min_samples_leaf (minimum number of samples a leaf must contain), and max_leaf_nodes (maximum number of leaf nodes allowed).
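To make the impurity measures and the split search concrete, here is a minimal NumPy sketch of a single CART-style split. It is an illustration under simplifying assumptions, not scikit-learn's implementation: the helper names (gini, entropy, best_split) and the toy data are hypothetical, and candidate thresholds are taken as midpoints between consecutive distinct feature values.

```python
import numpy as np

def gini(labels):
    """Gini impurity of a node: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy of a node in bits: -sum_k p_k * log2(p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()  # only classes present, so p > 0
    return -np.sum(p * np.log2(p))

def best_split(X, y, impurity=gini):
    """Exhaustive search for the (feature, threshold) pair that minimises
    the size-weighted impurity of the two child nodes."""
    n_samples, n_features = X.shape
    best = (None, None, np.inf)
    for k in range(n_features):
        values = np.unique(X[:, k])
        # Candidate thresholds: midpoints between consecutive distinct values
        for t in (values[:-1] + values[1:]) / 2:
            left = X[:, k] <= t
            n_left = left.sum()
            cost = (n_left * impurity(y[left])
                    + (n_samples - n_left) * impurity(y[~left])) / n_samples
            if cost < best[2]:
                best = (k, t, cost)
    return best

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not
X_toy = np.array([[1.0, 5.0], [2.0, 7.0], [3.0, 4.0], [4.0, 6.0]])
y_toy = np.array([0, 0, 1, 1])

feature, threshold, cost = best_split(X_toy, y_toy)
print(feature, threshold, cost)
```

A full tree would apply best_split recursively to each child until a stopping criterion is met. Passing impurity=entropy switches the criterion without changing the search.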
Decision tree regression. DecisionTreeRegressor works much like the classifier, but each leaf predicts a value (the mean target of the training instances in that leaf) rather than a class, and splits are chosen to minimise mean squared error rather than impurity. The same overfitting risk applies, and max_depth is again the primary regularisation knob.
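The effect of max_depth on a regression tree can be sketched as follows. The noisy quadratic dataset and the variable names are assumptions chosen for illustration. The example compares an unrestricted tree with a depth-limited one and checks that a prediction equals the mean target of the training instances in the matching leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic noisy quadratic data (an illustrative assumption)
rng = np.random.default_rng(42)
X_quad = rng.uniform(-0.5, 0.5, size=(200, 1))
y_quad = X_quad[:, 0] ** 2 + 0.025 * rng.standard_normal(200)

# No constraints: the tree keeps splitting until it fits the training set
unrestricted = DecisionTreeRegressor(random_state=42).fit(X_quad, y_quad)

# max_depth=2 allows at most 4 leaves, i.e. at most 4 distinct predictions
shallow = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X_quad, y_quad)

print("leaves:", unrestricted.get_n_leaves(), "vs", shallow.get_n_leaves())

# A prediction is the mean target of the training instances in the same leaf
x_new = np.array([[0.2]])
leaf_of_x_new = shallow.apply(x_new)[0]
leaf_mean = y_quad[shallow.apply(X_quad) == leaf_of_x_new].mean()
print(shallow.predict(x_new)[0], leaf_mean)
```

The unrestricted tree reproduces the training noise, which is the overfitting the constraint is meant to prevent.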
Code examples
Decision trees are highly sensitive to small variations in the training data: adding or removing a single instance can produce a very different tree. This instability is one motivation for ensemble methods like random forests (Chapter 7).
Training and visualising a decision tree on iris:
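A minimal sketch of the iris workflow, assuming scikit-learn is installed. The choice of the two petal features, max_depth=2, the sample [5, 1.5], and the variable names are illustrative and may differ from the notebook. With out_file=None, export_graphviz returns the DOT source as a string, so the Graphviz binaries are not needed to run this.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

iris = load_iris()
X_iris = iris.data[:, 2:]          # petal length and petal width (cm)
y_iris = iris.target

# A shallow tree keeps the diagram readable and limits overfitting
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X_iris, y_iris)

# DOT description of the tree; pass a filename as out_file to write a .dot file
dot_source = export_graphviz(
    tree_clf,
    out_file=None,
    feature_names=iris.feature_names[2:],
    class_names=list(iris.target_names),
    rounded=True,
    filled=True,
)

# Class probabilities are the class proportions in the leaf the sample lands in
sample = [[5.0, 1.5]]              # petal length 5 cm, petal width 1.5 cm
proba = tree_clf.predict_proba(sample)
predicted_class = iris.target_names[tree_clf.predict(sample)[0]]
print(proba.round(3), predicted_class)
```

The resulting DOT text can be rendered with Graphviz or inspected directly.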
Running this notebook
Install Graphviz (optional)
To render the exported .dot files as images, install Graphviz: run apt-get install graphviz (Linux / Colab) or follow the instructions on the Graphviz download page. The notebook includes inline SVG renderings that work without Graphviz.