Choosing Zoobot Finetuning Parameters for Best Results

learning_rate (default: 1e-4)

Learning rate sets how fast the model parameters are updated during training.

Zoobot uses the adaptive optimizer AdamW. Adaptive optimizers adjust the learning rate for each parameter based on the mean and variance of previous gradients, which means you don’t need to tune the learning rate as carefully as you would with a fixed-rate optimizer like SGD.

1e-4 is a good starting point for most tasks.

If the model is not learning, try increasing the learning rate.
If the training loss varies wildly, or decreases much faster than the validation loss (a sign of overfitting), try decreasing the learning rate.
Using training_mode='full' often requires a lower learning rate than training_mode='head_only', because more parameters are being updated per batch.
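
The reduced sensitivity to gradient scale can be illustrated with a minimal pure-Python sketch of the Adam update rule for a single parameter. This is not Zoobot or PyTorch code: `adam_updates` is a hypothetical helper, the beta and eps values are the commonly used Adam defaults, and weight decay is left out for clarity.

```python
import math


def adam_updates(grads, learning_rate=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the step Adam would apply to one parameter for each gradient in turn."""
    m, v = 0.0, 0.0  # running estimates of the gradient mean and squared magnitude
    updates = []
    for t, g in enumerate(grads, start=1):
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the zero initialisation
        v_hat = v / (1 - beta2 ** t)
        updates.append(-learning_rate * m_hat / (math.sqrt(v_hat) + eps))
    return updates


# Two parameters whose gradients differ in scale by a factor of 1000.
small = adam_updates([0.01] * 5)
large = adam_updates([10.0] * 5)
print(small[-1], large[-1])
```

With a constant gradient, both sequences produce steps of roughly `-learning_rate`, even though the gradients differ by a factor of 1000. Plain SGD would take steps 1000 times larger for the second sequence, which is why it usually needs more careful tuning.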

from zoobot.pytorch.training import finetune

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    learning_rate=1e-4  # default
)

training_mode ('full' vs 'head_only')

Deep learning models are often divided into an encoder (which extracts features from images) and a head (which makes predictions from those features). In Zoobot, when you load FinetuneableZoobotClassifier(name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano', ...), the encoder is the ConvNeXt model.

training_mode controls which parts of the model are updated during training:

| Mode | Description | Also known as |
| --- | --- | --- |
| `'full'` (default) | Trains both encoder and head end-to-end | End-to-end finetuning |
| `'head_only'` | Freezes the encoder; trains only the new head | Transfer learning, linear probing |

End-to-end finetuning ('full') can give better results, but often requires more labelled data (or a smaller pretrained model) and more careful tuning of the learning rate and other hyperparameters.

Linear probing ('head_only') is a useful starting point when you have very little data, or as a quick sanity check before committing to full finetuning.

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    training_mode='full'  # or 'head_only'
)

layer_decay (default: 0.75)

The common intuition in deep learning is that lower layers (closer to the input) learn simple, general features, while higher layers (closer to the output) learn more complex, task-specific features. It is often beneficial to use a lower learning rate for lower layers that have already learned to recognise basic galaxy features.

Layer decay reduces the learning rate for each successive encoder block from the top down.

For example, with learning_rate=1e-4 and layer_decay=0.75 (the default):

| Block | Learning Rate |
| --- | --- |
| Highest (nearest output) | `1e-4 × (0.75 ** 0)` = `1e-4` |
| Second-highest | `1e-4 × (0.75 ** 1)` = `7.5e-5` |
| Third-highest | `1e-4 × (0.75 ** 2)` ≈ `5.6e-5` |
| … and so on | … |

The head always uses the full learning rate, regardless of layer decay.

In the extreme cases:

layer_decay=0 — disables learning in all encoder blocks except the topmost (0 ** 0 = 1).
layer_decay=1 — gives every block the same learning rate (no decay).

This is slightly counterintuitive: a lower layer_decay value means a faster learning rate reduction for lower blocks.
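
The per-block arithmetic above can be reproduced with a short sketch. `block_learning_rates` is a hypothetical helper written for illustration, not part of Zoobot, and the number of blocks is an assumption: how the encoder is split into blocks depends on the architecture.

```python
def block_learning_rates(learning_rate, layer_decay, n_blocks):
    """Learning rate per encoder block, ordered from the highest block (nearest the output) down."""
    return [learning_rate * layer_decay ** depth for depth in range(n_blocks)]


# n_blocks=4 is an illustrative value, not a Zoobot setting.
rates = block_learning_rates(learning_rate=1e-4, layer_decay=0.75, n_blocks=4)
for depth, rate in enumerate(rates):
    print(f"block {depth} from the top: lr = {rate:.3g}")

# The two extreme cases described above.
only_top_block = block_learning_rates(1e-4, layer_decay=0, n_blocks=3)  # 0 ** 0 == 1 in Python
no_decay = block_learning_rates(1e-4, layer_decay=1, n_blocks=3)
```

Trying a few values of `layer_decay` in this sketch makes the direction of the effect easier to see: values near 1 keep the lower blocks learning at almost the full rate, while values near 0 leave them almost unchanged.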

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    layer_decay=0.75  # default
)

weight_decay (default: 0.05)

Weight decay is a regularization term that penalizes large weight values. When using Zoobot’s default AdamW optimizer, it is closely related to L2 regularization (see Decoupled Weight Decay Regularization for the subtlety).

Increasing weight decay strengthens the penalty on large weights, which can help prevent overfitting.
Decreasing it can help if the model is underfitting or training too slowly.

By default, Zoobot uses a small weight decay of 0.05. The head does not use weight decay.
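
To give a feel for the size of this effect, the sketch below isolates the decay part of an AdamW-style update, in which each decayed weight is multiplied by `1 - learning_rate * weight_decay` at every step, separately from the gradient-based step. `decoupled_decay_step` is a hypothetical helper, and the gradient step is deliberately omitted, so this is an illustration under those assumptions and not Zoobot's training loop.

```python
def decoupled_decay_step(weight, learning_rate=1e-4, weight_decay=0.05):
    """Apply only the decay part of an AdamW-style update: shrink the weight toward zero."""
    return weight * (1 - learning_rate * weight_decay)


w = 2.0
n_steps = 1000  # illustrative number of optimizer steps
for _ in range(n_steps):
    w = decoupled_decay_step(w)
print(f"weight after {n_steps} decay-only steps: {w:.4f}")
```

Because the shrink factor depends on the product of learning rate and weight decay, lowering the learning rate also weakens the effective regularization per step. It may be worth revisiting weight_decay after a large change in learning_rate.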

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    weight_decay=0.05  # default
)

head_dropout_prob (default: 0.5)

Dropout is a regularization technique that randomly sets a fraction of activations to zero during training. This prevents the model from becoming overly dependent on any single feature, which helps guard against overfitting.

Zoobot applies dropout before the final linear output layer in the head. The default probability is 0.5 (i.e. 50% of activations are zeroed per forward pass during training).

If the model overfits, try increasing head_dropout_prob.
If the model underfits or the head is not learning, try decreasing it.
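
As a rough illustration of what dropout does to the features entering the final layer, here is a minimal pure-Python sketch of standard "inverted" dropout, the variant used by `torch.nn.Dropout`, where surviving activations are rescaled by `1 / (1 - p)`. The `dropout` function and the example feature values are hypothetical and are not Zoobot code.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p and rescale the survivors."""
    if not training or p == 0.0:
        return list(activations)  # dropout is a no-op at inference time
    keep = 1.0 - p
    return [a / keep if rng.random() >= p else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the example is repeatable
features = [1.0] * 10
dropped = dropout(features, p=0.5, rng=rng)
unchanged = dropout(features, p=0.5, training=False)
print(dropped)
```

The rescaling keeps the expected value of each activation the same in training and inference, which is why no adjustment is needed when the model is switched to evaluation mode.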

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    head_dropout_prob=0.5  # default
)

scheduler_kwargs (default: None)

Gradually reducing the learning rate during training can slightly improve results by finding a better minimum near convergence. This is called learning rate scheduling.

Zoobot supports the full suite of timm learning rate schedulers. Pass a dict of scheduler arguments to scheduler_kwargs:

scheduler_kwargs = {
    'name': 'cosine',
    'warmup_epochs': 5,
    'max_epochs': 100
}

model = finetune.FinetuneableZoobotClassifier(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    num_classes=2,
    scheduler_kwargs=scheduler_kwargs
)

By default, no scheduler is used (scheduler_kwargs=None). We recommend only adding a scheduler after you have already tuned the other parameters, as it adds another degree of freedom to your search.
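
To show the general shape of a cosine schedule with warmup, here is a minimal pure-Python sketch. It is a generic textbook formulation, not timm's implementation, which may differ in details such as the warmup starting value. `cosine_with_warmup` and `min_lr` are hypothetical names; the other defaults mirror the example values above.

```python
import math


def cosine_with_warmup(epoch, base_lr=1e-4, warmup_epochs=5, max_epochs=100, min_lr=0.0):
    """Learning rate for a given epoch: linear warmup, then cosine decay to min_lr."""
    if epoch < warmup_epochs:
        # Linear ramp that reaches base_lr on the last warmup epoch.
        return base_lr * (epoch + 1) / warmup_epochs
    progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


for epoch in (0, 4, 5, 50, 100):
    print(f"epoch {epoch:3d}: lr = {cosine_with_warmup(epoch):.3g}")
```

The learning rate rises during warmup, peaks at the base learning rate, then falls smoothly toward `min_lr` as training approaches `max_epochs`.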
