Gradient descent is the core optimization loop in autograd: compute a forward pass to get a loss value, call Backward() to populate gradients, update the variables, and repeat.
The training loop
Every gradient descent iteration follows the same three-step pattern:

1. Clear gradients. Call Cleargrad() on each parameter before computing a new backward pass. Without this, gradients accumulate across iterations instead of reflecting only the current forward pass.
2. Forward and backward pass. Evaluate the function to get a scalar output, then call Backward() to propagate gradients back through the computation graph.
3. Update the parameters. Apply the gradient descent step to each parameter using its gradient, then repeat.

Cleargrad() resets variable.Grad to nil. Forgetting this causes gradients to accumulate across iterations, producing incorrect updates.
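In code, the pattern looks roughly like the sketch below. It is schematic rather than runnable on its own: f stands for any differentiable function built from autograd operations, and update stands for the parameter update step; the Rosenbrock example below shows one concrete way to write both.

```go
// Schematic sketch of the three-step loop (placeholders: f, update, iters).
// x0 and x1 are the parameters being optimized.
for i := 0; i < iters; i++ {
	// 1. Clear gradients left over from the previous iteration.
	x0.Cleargrad()
	x1.Cleargrad()

	// 2. Forward pass to a scalar output, then backward pass to
	//    populate x0.Grad and x1.Grad.
	y := f(x0, x1)
	y.Backward()

	// 3. Update the parameters from their gradients.
	update(x0)
	update(x1)
}
```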
Rosenbrock function

The Rosenbrock function is a classic benchmark for optimization algorithms. Its global minimum is at (1, 1) with a value of 0. Starting from (0, 2), gradient descent with lr=0.001 converges toward the minimum over 10,000 iterations.
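A sketch of the full loop follows. The variable.New, Cleargrad, Backward, Grad, and tensor.F2 names come from this page; the F.Add, F.Sub, and F.Mul helpers, the import paths, and the exact tensor.F2 signature are assumptions here and may need adjusting against the package documentation.

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/tensor"
	"github.com/itsubaki/autograd/variable"
)

// rosenbrock computes 100*(x1 - x0^2)^2 + (x0 - 1)^2, which has its
// global minimum at (1, 1).
func rosenbrock(x0, x1 *variable.Variable) *variable.Variable {
	a := F.Sub(x1, F.Mul(x0, x0))
	b := F.Sub(x0, variable.New(1.0))
	return F.Add(F.Mul(variable.New(100.0), F.Mul(a, a)), F.Mul(b, b))
}

func main() {
	x0, x1 := variable.New(0.0), variable.New(2.0)
	lr, iters := 0.001, 10000

	// update applies the gradient descent step a - lr*b element-wise.
	// Assumed signature: tensor.F2(a, b, fn) returns a new tensor of the
	// same shape, and Variable.Data / Grad.Data are tensors it accepts.
	update := func(x *variable.Variable) {
		x.Data = tensor.F2(x.Data, x.Grad.Data, func(a, b float64) float64 {
			return a - lr*b
		})
	}

	for i := 0; i < iters; i++ {
		// Clear gradients from the previous iteration.
		x0.Cleargrad()
		x1.Cleargrad()

		// Forward and backward pass.
		y := rosenbrock(x0, x1)
		y.Backward()

		// Gradient descent step.
		update(x0)
		update(x1)

		if i%1000 == 0 {
			fmt.Println(i, x0, x1)
		}
	}
}
```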
The update function uses tensor.F2 to apply an element-wise operation over two tensors; here, it computes the gradient descent step a - lr*b.
Printing the output at every 1,000th iteration shows the parameters converging toward (1, 1).
Matyas function
The Matyas function, f(x, y) = 0.26(x² + y²) - 0.48xy, has a global minimum at (0, 0). It is useful for testing because both partial derivatives can be verified analytically.
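As a sketch, under the same assumptions about the F helpers as the Rosenbrock example, evaluating the function at (1, 1) and calling Backward() exposes both partial derivatives:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

// matyas computes 0.26*(x^2 + y^2) - 0.48*x*y, with its global minimum at (0, 0).
func matyas(x, y *variable.Variable) *variable.Variable {
	sum := F.Add(F.Mul(x, x), F.Mul(y, y))
	return F.Sub(
		F.Mul(variable.New(0.26), sum),
		F.Mul(variable.New(0.48), F.Mul(x, y)),
	)
}

func main() {
	x, y := variable.New(1.0), variable.New(1.0)

	z := matyas(x, y)
	z.Backward()

	// Both partial derivatives at (1, 1) are 0.52 - 0.48 = 0.04.
	fmt.Println(x.Grad, y.Grad)
}
```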
The gradients at (1, 1) are ∂f/∂x = 2*0.26*1 - 0.48*1 = 0.04 and ∂f/∂y = 0.04, which matches the output.
Why Cleargrad matters
Variables accumulate gradients by addition when they appear more than once in a computation graph, and the same accumulation happens across repeated Backward() calls; consider the sketch below. To keep each update based only on the current forward pass, call Cleargrad() at the start of each training iteration.
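This is a minimal sketch of the accumulation behavior, assuming F.Add from the function package as in the examples above:

```go
package main

import (
	"fmt"

	F "github.com/itsubaki/autograd/function"
	"github.com/itsubaki/autograd/variable"
)

func main() {
	x := variable.New(3.0)

	// First graph: y = x + x, so dy/dx = 2.
	y := F.Add(x, x)
	y.Backward()
	fmt.Println(x.Grad) // gradient is 2

	// Second graph without Cleargrad: the new gradient is added
	// to the old one, so x.Grad now holds 4 instead of 2.
	y = F.Add(x, x)
	y.Backward()
	fmt.Println(x.Grad)

	// Clearing first yields only the gradient of the current pass.
	x.Cleargrad()
	y = F.Add(x, x)
	y.Backward()
	fmt.Println(x.Grad) // gradient is 2 again
}
```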
Next steps
Deep Learning
Use MLP and LSTM models with SGD and Adam optimizers for end-to-end training.
Higher-Order Gradients
Compute gradients of gradients with CreateGraph for Newton’s method and meta-learning.
Optimizers
Reference for SGD, Adam, AdamW, and Momentum optimizers.
Variables
Understand how variables hold data and gradients in the computation graph.