

The hook package provides gradient-processing functions that run just before an optimizer updates model parameters. Hooks let you apply regularization and gradient clipping without modifying optimizer or model code. Both functions return a func(params []layer.Parameter), which satisfies the optimizer.Hook type and can be attached to any optimizer's Hook slice.
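Because a hook is just a function with that signature, you can also write your own and attach it the same way. The sketch below is illustrative only: CustomHook is a hypothetical name, and the body is left as a placeholder since the fields exposed by layer.Parameter are not covered on this page.

import "github.com/itsubaki/autograd/layer"

// CustomHook returns a hypothetical user-defined hook. Any function with
// the signature func(params []layer.Parameter) can sit in an optimizer's
// Hook slice alongside hook.WeightDecay and hook.ClipGrad.
func CustomHook() func(params []layer.Parameter) {
    return func(params []layer.Parameter) {
        // Inspect or modify gradients here; this runs once per update,
        // just before the optimizer applies the parameter step.
        _ = params
    }
}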

WeightDecay

func WeightDecay(lambda float64) func(params []layer.Parameter)
Adds L2 regularization to each parameter’s gradient before the optimizer update:
grad = grad + λ × param
This is equivalent to penalizing large weight values, which discourages overfitting.
lambda (float64, required): the regularization coefficient. Typical values range from 1e-4 to 1e-2. A larger value applies stronger regularization.
WeightDecay modifies gradients in place (gradient-based L2 regularization). For decoupled weight decay — where the penalty is applied directly to the parameters rather than the gradients — use optimizer.AdamW instead.
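
To make the update rule concrete, here is a minimal standalone sketch on plain float64 slices; it is not the library's implementation, which operates on layer.Parameter values, but it computes the same element-wise rule.

// weightDecay applies grad = grad + lambda*param element-wise.
// Illustration only; hook.WeightDecay does this for every parameter's
// gradient just before the optimizer update.
func weightDecay(lambda float64, param, grad []float64) {
    for i := range param {
        grad[i] += lambda * param[i]
    }
}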

ClipGrad

func ClipGrad(max float64) func(params []layer.Parameter)
Rescales all gradients so that their global L2 norm does not exceed max:
norm = √(Σ grad²)
if norm > max:
    grad = grad × (max / norm)
If the global norm is already within max, gradients are left unchanged.
max (float64, required): the maximum allowed global gradient norm. Typical values: 1.0 or 5.0.
Gradient clipping is especially useful when training RNNs and LSTMs, where exploding gradients are a common problem.
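
The norm is global: it is computed over all gradients together, and a single scale factor is applied to every element. Here is a standalone sketch of the rule on float64 slices (again, not the library's internal implementation):

import "math"

// clipGrad rescales the given gradients so their combined L2 norm does
// not exceed max. Illustration only; hook.ClipGrad applies the same rule
// to the gradients of layer.Parameter values.
func clipGrad(max float64, grads ...[]float64) {
    sum := 0.0
    for _, g := range grads {
        for _, v := range g {
            sum += v * v
        }
    }
    norm := math.Sqrt(sum)
    if norm <= max {
        return // already within the limit; leave gradients unchanged
    }
    scale := max / norm
    for _, g := range grads {
        for i := range g {
            g[i] *= scale
        }
    }
}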

Attaching hooks to optimizers

Hooks are added to the Hook field on each optimizer struct. The optimizer.Params helper applies them, in order, before the parameter update step.

SGD with weight decay

import (
    "github.com/itsubaki/autograd/hook"
    "github.com/itsubaki/autograd/optimizer"
)

opt := &optimizer.SGD{
    LearningRate: 0.01,
    Hook: []optimizer.Hook{
        hook.WeightDecay(1e-4),
    },
}

Adam with gradient clipping

opt := &optimizer.Adam{
    Alpha: 0.001,
    Beta1: 0.9,
    Beta2: 0.999,
    Hook: []optimizer.Hook{
        hook.ClipGrad(1.0),
    },
}

Combining multiple hooks

Hooks run in the order they appear in the slice.
opt := &optimizer.Adam{
    Alpha: 0.001,
    Beta1: 0.9,
    Beta2: 0.999,
    Hook: []optimizer.Hook{
        hook.WeightDecay(1e-4), // regularize first
        hook.ClipGrad(5.0),     // then clip
    },
}

Complete example

package main

import (
    "fmt"

    F "github.com/itsubaki/autograd/function"
    "github.com/itsubaki/autograd/hook"
    "github.com/itsubaki/autograd/model"
    "github.com/itsubaki/autograd/optimizer"
    "github.com/itsubaki/autograd/variable"
)

func main() {
    // MLP with layer sizes 64, 64, and 1
    mlp := model.NewMLP([]int{64, 64, 1})

    // Adam with weight decay applied first, then gradient clipping
    opt := &optimizer.Adam{
        Alpha: 0.001,
        Beta1: 0.9,
        Beta2: 0.999,
        Hook: []optimizer.Hook{
            hook.WeightDecay(1e-4),
            hook.ClipGrad(1.0),
        },
    }

    // toy input and target
    x := variable.New(1.0, 2.0, 3.0)
    t := variable.New(0.0, 1.0, 0.0)

    for i := range 200 { // range over an int requires Go 1.22+
        // forward pass and loss
        y := mlp.Forward(x)
        loss := F.MeanSquaredError(y, t)

        // reset gradients, backpropagate, and update parameters;
        // the hooks run just before the parameter update
        mlp.Cleargrads()
        loss.Backward()
        opt.Update(mlp)

        if i%20 == 0 {
            fmt.Println("loss:", loss.Data)
        }
    }
}
}
