Beyond the 19 main chapter notebooks, the handson-ml3 repository includes supplementary notebooks that go deeper on selected topics. This page describes two of them in detail, automatic differentiation and extra neural network architectures, and briefly lists a third on gradient descent comparisons. These notebooks are labelled as appendix material in the book.
Automatic differentiation (Appendix D)
Toy implementations of numeric differentiation, forward-mode autodiff using dual numbers, and reverse-mode autodiff (backpropagation). Includes a full TensorFlow GradientTape example. Open in Colab to run interactively.
Extra ANN architectures
Quick overviews of historically important neural network architectures: Hopfield networks, Boltzmann machines, restricted Boltzmann machines (RBMs), and deep belief nets. Open in Colab to run interactively.
Gradient descent comparison
Visual comparison of gradient descent variants — batch, stochastic, and mini-batch — on the same loss surface. Useful for building intuition before studying optimizers in depth.
Automatic differentiation
The extra_autodiff.ipynb notebook explains how modern deep-learning frameworks compute gradients automatically. It starts from first principles and builds up to TensorFlow’s GradientTape.
The problem
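Computing gradients analytically for a neural network with millions of parameters is impractical, and even a simple function becomes tedious to differentiate by hand. The notebook therefore walks through three ways to compute derivatives programmatically.
1. Numeric differentiation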
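As a rough illustration of numeric differentiation, the sketch below estimates each partial derivative by nudging one input at a time. The function f, the helper numeric_gradient, and the step size h are assumptions made for this example and are not taken from the notebook.

```python
def f(x, y):
    # Illustrative function (an assumption chosen for this sketch).
    return x * x * y + y + 2


def numeric_gradient(func, args, h=1e-6):
    """Approximate each partial derivative with a forward finite difference."""
    base = func(*args)  # one baseline evaluation
    grads = []
    for i in range(len(args)):
        shifted = list(args)
        shifted[i] += h  # nudge only the i-th input
        # One extra function evaluation per input.
        grads.append((func(*shifted) - base) / h)
    return grads


grads = numeric_gradient(f, [3.0, 4.0])
print(grads)  # close to the exact gradient [24.0, 10.0], but not exact
```

The loop makes the cost visible: every input needs its own call to func, which is why this approach is mostly used to check other methods rather than to train large models.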
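Approximate the derivative using the finite-difference formula f'(x) ≈ (f(x + h) − f(x)) / h for a small step h. Easy to implement, but the result is only an approximation and it requires at least one additional function evaluation per parameter, which does not scale to large models.
2. Forward-mode autodiff (dual numbers)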
Represent every number as a + bε, where ε² = 0 but ε ≠ 0. The b component automatically carries the derivative through every arithmetic operation. Each pass yields the derivative with respect to one input, so this approach is efficient when there are few inputs and many outputs.
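The dual-number rule can be sketched in a few lines of Python. The Dual class and the function f are hypothetical names for illustration. Only addition and multiplication are supported, which is enough for this polynomial.

```python
class Dual:
    """A number a + b*eps with eps**2 == 0; b carries the derivative."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def _lift(other):
        # Treat plain numbers as constants (derivative 0).
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual._lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual._lift(other)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, because eps**2 == 0
        return Dual(self.value * other.value,
                    self.value * other.deriv + self.deriv * other.value)

    __rmul__ = __mul__


def f(x, y):
    # Illustrative function (an assumption chosen for this sketch).
    return x * x * y + y + 2


# One pass per input: seed the input of interest with deriv=1.
df_dx = f(Dual(3.0, 1.0), Dual(4.0)).deriv
df_dy = f(Dual(3.0), Dual(4.0, 1.0)).deriv
print(df_dx, df_dy)  # 24.0 10.0
```

Note that f itself is unchanged: the derivative falls out of ordinary arithmetic on Dual values, at the cost of one evaluation per input.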
3. Reverse-mode autodiff (backpropagation)
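A toy sketch of reverse-mode autodiff follows. Each Node (a hypothetical class written for this example, not the notebook implementation) remembers its parents and the local gradient with respect to each of them, so a single backward sweep can fill in every gradient.

```python
class Node:
    """A value in the computation graph, linked to the nodes it was built from."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_gradient)
        self.grad = 0.0

    @staticmethod
    def _lift(other):
        return other if isinstance(other, Node) else Node(other)

    def __add__(self, other):
        other = Node._lift(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = Node._lift(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    __rmul__ = __mul__

    def backward(self):
        # Order the graph so each node is handled before its parents.
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local_grad in node.parents:
                parent.grad += node.grad * local_grad  # chain rule


def f(x, y):
    # Illustrative function (an assumption chosen for this sketch).
    return x * x * y + y + 2


x, y = Node(3.0), Node(4.0)
z = f(x, y)   # forward pass records the graph
z.backward()  # one backward pass computes all gradients
print(z.value, x.grad, y.grad)  # 42.0 24.0 10.0
```

Both partial derivatives come out of the same backward call, in contrast to the one-pass-per-input cost of the two previous approaches.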
Evaluate the function forward and record the operations in a computation graph, then propagate gradients backwards through the graph using the chain rule. For a scalar output such as a loss, this requires only one forward pass and one backward pass regardless of the number of parameters, which is why it is used in all major deep-learning frameworks.
TensorFlow GradientTape
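TensorFlow implements reverse-mode autodiff through tf.GradientTape. Operations executed inside the tape context are recorded, and tape.gradient() then computes the gradients of a result with respect to the watched variables.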
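A minimal sketch of typical tf.GradientTape usage, assuming TensorFlow 2.x is installed. The function f and the input values are illustrative choices rather than a copy of the notebook code.

```python
import tensorflow as tf


def f(x, y):
    # Illustrative function (an assumption chosen for this sketch).
    return x * x * y + y + 2


# Trainable variables are watched by the tape automatically.
x = tf.Variable(3.0)
y = tf.Variable(4.0)

with tf.GradientTape() as tape:
    z = f(x, y)  # operations on x and y are recorded here

# A single call returns the gradient with respect to every listed variable.
dz_dx, dz_dy = tape.gradient(z, [x, y])
print(dz_dx.numpy(), dz_dy.numpy())  # 24.0 10.0
```

By default the tape releases its resources after one gradient() call. Passing persistent=True to tf.GradientTape allows several calls on the same tape.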
Extra neural network architectures
The extra_ann_architectures.ipynb notebook surveys architectures that predate modern deep learning but are still referenced in the literature and occasionally used in practice.
Hopfield networks
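To make the store-and-recall idea concrete, here is a small NumPy sketch of a Hopfield network. It assumes Hebbian learning and synchronous updates for brevity (the classical formulation updates neurons one at a time), and the function names train_hopfield and recall are hypothetical.

```python
import numpy as np


def train_hopfield(patterns):
    """Hebbian learning: averaged outer products, with no self-connections."""
    patterns = np.asarray(patterns, dtype=float)
    n_neurons = patterns.shape[1]
    weights = patterns.T @ patterns / n_neurons
    np.fill_diagonal(weights, 0.0)
    return weights


def recall(weights, state, n_steps=10):
    """Update all neurons at once until the state stops changing."""
    state = np.asarray(state, dtype=float)
    for _ in range(n_steps):
        new_state = np.where(weights @ state >= 0, 1.0, -1.0)
        if np.array_equal(new_state, state):
            break
        state = new_state
    return state


# Store one pattern of +1/-1 values in an 8-neuron network.
stored = np.array([[1, -1, 1, -1, 1, -1, 1, -1]])
weights = train_hopfield(stored)

noisy = stored[0].copy()
noisy[0] = -noisy[0]  # corrupt one bit
recovered = recall(weights, noisy)
print(np.array_equal(recovered, stored[0]))  # True
```

With a single stored pattern the corrupted input falls back to the original after one update. Storing many more patterns in so few neurons would start producing the recall errors and spurious states this section mentions.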
Introduced by W. A. Little (1974) and popularised by J. Hopfield (1982). Fully connected associative memory networks that can store and recall patterns. Memory capacity is approximately 14% of the number of neurons, and spurious (unlearned) patterns can emerge. Largely superseded for practical tasks but historically important.
Boltzmann machines
Invented in 1985 by Geoffrey Hinton and Terrence Sejnowski. Fully connected stochastic ANNs that learn a probability distribution over binary inputs. Training is computationally expensive due to the need to reach thermal equilibrium.
Restricted Boltzmann machines (RBMs)
A simplified Boltzmann machine with no connections within the visible layer or within the hidden layer, only connections between the two layers. The restriction makes training tractable via contrastive divergence. RBMs were the building block of deep belief nets.
Deep belief nets (DBNs)
A stack of RBMs trained greedily, one layer at a time. DBNs were state of the art in deep learning until around 2012, when deep networks trained by backpropagation on large datasets with GPUs overtook them. Still the subject of active research.
These architectures are covered in the extra notebook rather than the main chapters because they are less commonly used in contemporary ML practice. However, understanding them provides useful historical context and helps explain why modern architectures were designed the way they are.