Finetuning Zoobot for Custom Galaxy Morphology Tasks

Galaxy Zoo answers the most common morphology questions: does this galaxy have spiral arms, is it merging, and so on. But what if you want to answer a different question? You can finetune a pretrained Zoobot model to solve new tasks on new galaxy images. Zoobot has been trained to simultaneously answer all of the Galaxy Zoo questions. This provides an excellent starting point for other morphology-related tasks using very little new data. You will likely only need a few hundred labelled images — retraining (finetuning) this model requires much less time and data than training from scratch.

If you do want to train from scratch, Zoobot supports that too — see zoobot/benchmarks. But with many pretrained models available, you probably won’t need to.

What is Finetuning?

Finetuning (a form of transfer learning) retrains part of a model trained on one task so that it can solve a related task. This can drastically reduce the amount of labelled data needed. The high-level approach is:

Load the pretrained Zoobot model, replacing the output head to match your new task.
Retrain the model on your new task, typically with a low learning rate outside the new head.
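
One common way to keep the learning rate low outside the new head is layer-wise decay, where each encoder block further from the head gets a smaller learning rate. The sketch below only illustrates that arithmetic. The function name, the number of blocks, and the values are hypothetical, and this is not Zoobot's own implementation.

```python
def layerwise_learning_rates(head_lr, n_blocks, decay):
    """Return learning rates ordered from the new head back to the first encoder block.

    Hypothetical helper for illustration only; not part of Zoobot.
    """
    if not 0 < decay <= 1:
        raise ValueError("decay must be in (0, 1]")
    # depth 0 is the new head; depth k is the k-th encoder block counting back from the head
    return [head_lr * decay ** depth for depth in range(n_blocks + 1)]


# Assumed values: a head learning rate of 1e-4, four encoder blocks, decay of 0.5
rates = layerwise_learning_rates(head_lr=1e-4, n_blocks=4, decay=0.5)
for depth, lr in enumerate(rates):
    name = "new head" if depth == 0 else f"encoder block {depth} back from head"
    print(f"{name}: lr={lr:.2e}")
```

With a decay below 1, the head learns fastest and the earliest encoder blocks change least, which preserves most of what the pretrained model already knows.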

The Three FinetuneableZoobot Classes

Zoobot provides three ready-to-use finetuning classes, each suited to a different type of prediction problem:

Class	Loss	Use Case
`FinetuneableZoobotClassifier`	Cross-entropy	Binary or multi-class classification
`FinetuneableZoobotRegressor`	MSE or MAE	Continuous value regression
`FinetuneableZoobotTree`	Dirichlet-Multinomial	Galaxy Zoo-style vote count decision trees

All three classes share a common set of finetuning parameters. See Choosing Parameters for tuning advice.

Working Example

The following example walks through finetuning Zoobot to find ringed galaxies (binary classification). You can also follow along in the interactive Google Colab notebook.

Load the Pretrained Model

FinetuneableZoobotClassifier downloads the weights of a pretrained Zoobot model from HuggingFace and automatically replaces the head layer to suit a classification problem.

from zoobot.pytorch.training import finetune

model = finetune.FinetuneableZoobotClassifier(
  name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',  # which pretrained model to download
  num_classes=2
)

name — the HuggingFace Hub identifier for the pretrained encoder. See the Pretrained Models page for available options.
num_classes=2 — the number of output classes (2 for binary classification).

Prepare Galaxy Data

Download and prepare a catalog of galaxy images with labels. The train_catalog is a pandas DataFrame with three required columns:

id_str — any string uniquely identifying each galaxy
file_loc — path to the image file (jpg, png, or FITS)
ring — the label column (0 or 1 in this binary example)
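
As a minimal sketch, a catalog with these three columns could be built as follows. The file names, the images directory, and the labels are made up for illustration. In practice you would load your own labels, for example from a CSV file.

```python
from pathlib import Path

import pandas as pd

# Hypothetical image names and ring labels, for illustration only
image_dir = Path("images")
labels = {"galaxy_001.jpg": 1, "galaxy_002.jpg": 0, "galaxy_003.jpg": 1}

train_catalog = pd.DataFrame({
    "id_str": [Path(name).stem for name in labels],          # unique identifier per galaxy
    "file_loc": [str(image_dir / name) for name in labels],  # path to each image file
    "ring": list(labels.values()),                           # binary label: 1 = ringed, 0 = not
})

# Basic sanity checks before training
required = ["id_str", "file_loc", "ring"]
missing = [col for col in required if col not in train_catalog.columns]
assert not missing, f"catalog is missing columns: {missing}"
assert train_catalog["id_str"].is_unique, "id_str must uniquely identify each galaxy"
assert train_catalog["ring"].isin([0, 1]).all(), "ring must be 0 or 1"
```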

Then wrap it in a CatalogDataModule so PyTorch can load images and labels in batches:

from galaxy_datasets.pytorch.galaxy_datamodule import CatalogDataModule

datamodule = CatalogDataModule(
  label_cols=['ring'],
  catalog=train_catalog,
  batch_size=32
)

label_cols — a list of column names to use as labels. Must be a list even for a single label.
batch_size=32 — number of images per training batch. Reduce if you hit out-of-memory errors; increase if training is slow.
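
To get a feel for how batch_size affects training, the sketch below counts the batches in one epoch. It assumes the last partial batch is kept, which this guide does not confirm for CatalogDataModule. The catalog size of 500 images is a hypothetical example.

```python
import math


def batches_per_epoch(n_images, batch_size):
    """Number of batches in one pass over the data, keeping the final partial batch (an assumption)."""
    return math.ceil(n_images / batch_size)


# Hypothetical catalog of 500 labelled images
for batch_size in (16, 32, 64):
    print(f"batch_size={batch_size}: {batches_per_epoch(500, batch_size)} batches per epoch")
```

Halving the batch size roughly halves the memory needed per step but doubles the number of steps per epoch.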

CatalogDataModule applies image augmentations (rotations, crops, etc.) by default. See Loading Data for customization options.

Run the Finetuning

Create a Lightning Trainer and call fit:

save_dir = '/path/to/save/results'
trainer = finetune.get_trainer(save_dir, accelerator='cpu', max_epochs=100)
trainer.fit(model, datamodule)

The trainer uses early stopping: training ends automatically if validation loss stops improving. It saves the best checkpoint to save_dir/checkpoints/.

By default, get_trainer uses:

AdamW optimizer with cross-entropy loss (for the classifier)
Early stopping with patience=10 epochs
Top-1 checkpoint saving based on validation loss
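
The idea behind early stopping with patience can be sketched in plain Python. This is a simplified illustration of the concept, not Lightning's actual callback, which has more options. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for a sequence of per-epoch validation losses.

    Training stops once the loss has failed to improve for `patience` consecutive epochs.
    Simplified sketch for illustration only.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: remember it (this is the checkpoint worth keeping) and reset the counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Made-up validation losses; a short patience of 3 keeps the example small
losses = [0.70, 0.55, 0.48, 0.50, 0.49, 0.51]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best checkpoint from epoch {best_epoch}")
```

Here the loss last improves at epoch 2, so training stops three epochs later and the epoch 2 checkpoint is the one kept.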

Load the Saved Checkpoint

After training, reload the best model from its checkpoint for inference or further use:

best_checkpoint = trainer.checkpoint_callback.best_model_path
finetuned_model = finetune.FinetuneableZoobotClassifier.load_from_checkpoint(best_checkpoint)

The checkpoint contains all the new head weights as well as the finetuned encoder weights. You can share or redeploy it at any time.

Examples and Further Resources

Zoobot ships with several working finetuning examples:

Google Colab notebook — recommended starting point
finetune_binary_classification.py — script version of the Colab notebook
finetune_counts_full_tree.py — finetuning on a GZ-style decision tree

Next Steps

Choosing Parameters — tune learning rate, layer decay, dropout, and schedulers
Loading Data — work with catalogs, HuggingFace datasets, FITS files, and custom augmentations
