Galaxy Zoo answers the most common morphology questions: does this galaxy have spiral arms, is it merging, and so on. But what if you want to answer a different question? You can finetune a pretrained Zoobot model to solve new tasks on new galaxy images. Zoobot has been trained to simultaneously answer all of the Galaxy Zoo questions. This provides an excellent starting point for other morphology-related tasks using very little new data. You will likely only need a few hundred labelled images — retraining (finetuning) this model requires much less time and data than training from scratch.
If you do want to train from scratch, Zoobot supports that too — see zoobot/benchmarks. But with many pretrained models available, you probably won’t need to.

What is Finetuning?

Finetuning (a form of transfer learning) is when a model trained on one task is partially retrained for a related task. This can drastically reduce the amount of labelled data needed. The high-level approach is:
  1. Load the pretrained Zoobot model, replacing the output head to match your new task.
  2. Retrain the model on your new task, typically with a low learning rate outside the new head.
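To illustrate what "a low learning rate outside the new head" can look like, here is a minimal sketch that gives the head the full learning rate and shrinks it geometrically for each encoder block further from the head. The function name, the block naming, and the decay formula are assumptions made for illustration; Zoobot's own parameters and exact schedule may differ (see Choosing Parameters).

```python
def blockwise_learning_rates(base_lr, layer_decay, n_encoder_blocks):
    """Hypothetical helper: map each trainable part to a learning rate.

    The new head trains at base_lr. Encoder blocks are counted backwards
    from the head, and each step away multiplies the rate by layer_decay.
    """
    rates = {'head': base_lr}
    for depth in range(1, n_encoder_blocks + 1):
        # depth=1 is the encoder block closest to the head
        rates[f'encoder_block_-{depth}'] = base_lr * layer_decay ** depth
    return rates


rates = blockwise_learning_rates(base_lr=1e-4, layer_decay=0.75, n_encoder_blocks=3)
for name, lr in rates.items():
    print(f'{name}: {lr:.2e}')
```

With a decay below 1, the pretrained layers furthest from the head change the least, which helps preserve the general features learned during pretraining.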

The Three FinetuneableZoobot Classes

Zoobot provides three ready-to-use finetuning classes, each suited to a different type of prediction problem:
Class | Loss | Use Case
FinetuneableZoobotClassifier | Cross-entropy | Binary or multi-class classification
FinetuneableZoobotRegressor | MSE or MAE | Continuous value regression
FinetuneableZoobotTree | Dirichlet-Multinomial | Galaxy Zoo-style vote count decision trees
All three classes share a common set of finetuning parameters. See Choosing Parameters for tuning advice.
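To make the loss column more concrete, the sketch below computes cross-entropy, MSE, and MAE by hand on toy numbers. These are the textbook definitions written in plain Python, not Zoobot's internal code, and the Dirichlet-Multinomial loss is left out because it is considerably more involved.

```python
import math


def cross_entropy(predicted_probs, true_class):
    """Negative log of the probability assigned to the correct class."""
    return -math.log(predicted_probs[true_class])


def mean_squared_error(predictions, targets):
    """Average squared difference; penalises large errors heavily."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def mean_absolute_error(predictions, targets):
    """Average absolute difference; less sensitive to outliers than MSE."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


# Classification: the model gives 80% probability to the true class (index 1)
ce = cross_entropy([0.2, 0.8], true_class=1)

# Regression: two predictions, each off by 0.5
mse = mean_squared_error([1.0, 2.0], [1.5, 2.5])
mae = mean_absolute_error([1.0, 2.0], [1.5, 2.5])
```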

Working Example

The following example walks through finetuning Zoobot to find ringed galaxies (binary classification). You can also follow along in the interactive Google Colab notebook.
1. Load the Pretrained Model

FinetuneableZoobotClassifier downloads the weights of a pretrained Zoobot model from HuggingFace and automatically replaces the head layer to suit a classification problem.
from zoobot.pytorch.training import finetune

model = finetune.FinetuneableZoobotClassifier(
  name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',  # which pretrained model to download
  num_classes=2
)
  • name — the HuggingFace Hub identifier for the pretrained encoder. See the Pretrained Models page for available options.
  • num_classes=2 — the number of output classes (2 for binary classification).
2. Prepare Galaxy Data

Download and prepare a catalog of galaxy images with labels. The train_catalog is a pandas DataFrame with three required columns:
  • id_str — any string uniquely identifying each galaxy
  • file_loc — path to the image file (jpg, png, or FITS)
  • ring — the label column (0 or 1 in this binary example)
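As a minimal sketch of what such a catalog might look like, the snippet below builds a two-row DataFrame with the three columns listed above and checks the constraints described. The helper name `make_ring_catalog`, the galaxy IDs, and the file paths are placeholders invented for illustration; in practice the rows would come from your own image files and labels.

```python
import pandas as pd

REQUIRED_COLUMNS = ['id_str', 'file_loc', 'ring']


def make_ring_catalog(records):
    """Hypothetical helper: build a catalog from (id_str, file_loc, ring) tuples."""
    catalog = pd.DataFrame(list(records), columns=REQUIRED_COLUMNS)
    catalog['id_str'] = catalog['id_str'].astype(str)
    # id_str must uniquely identify each galaxy
    if not catalog['id_str'].is_unique:
        raise ValueError('id_str must uniquely identify each galaxy')
    # In this binary example, labels are 0 (no ring) or 1 (ring)
    if not catalog['ring'].isin([0, 1]).all():
        raise ValueError('ring labels must be 0 or 1 in this binary example')
    return catalog


# Placeholder rows; replace with your own images and labels
train_catalog = make_ring_catalog([
    ('galaxy_0001', '/path/to/images/galaxy_0001.jpg', 1),
    ('galaxy_0002', '/path/to/images/galaxy_0002.jpg', 0),
])
```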
Then wrap it in a CatalogDataModule so PyTorch can load images and labels in batches:
from galaxy_datasets.pytorch.galaxy_datamodule import CatalogDataModule

datamodule = CatalogDataModule(
  label_cols=['ring'],
  catalog=train_catalog,
  batch_size=32
)
  • label_cols — a list of column names to use as labels. Must be a list even for a single label.
  • batch_size=32 — number of images per training batch. Reduce if you hit out-of-memory errors; increase if training is slow.
CatalogDataModule applies image augmentations (rotations, crops, etc.) by default. See Loading Data for customization options.
3. Run the Finetuning

Create a Lightning Trainer and call fit:
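save_dir = '/path/to/save/results'  # checkpoints are saved under this directory
# accelerator='cpu' runs anywhere; Lightning also accepts values such as 'gpu' or 'auto'
trainer = finetune.get_trainer(save_dir, accelerator='cpu', max_epochs=100)
trainer.fit(model, datamodule)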
The trainer uses early stopping: training ends automatically if validation loss stops improving. It saves the best checkpoint to save_dir/checkpoints/.
By default, get_trainer uses:
  • AdamW optimizer with cross-entropy loss (for the classifier)
  • Early stopping with patience=10 epochs
  • Top-1 checkpoint saving based on validation loss
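The interaction between early stopping and top-1 checkpointing can be sketched in plain Python. This is a simplified illustration of the idea, not Lightning's implementation: the function name is hypothetical, and it ignores options such as minimum improvement thresholds.

```python
def simulate_early_stopping(val_losses, patience=10):
    """Hypothetical helper: return (epoch training stopped at, best epoch).

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs. The best epoch is the one whose
    checkpoint a top-1 saver would keep.
    """
    best_loss = float('inf')
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: this is the checkpoint that would be kept
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Ran out of epochs before patience was exhausted
    return len(val_losses) - 1, best_epoch


# Toy validation losses with a short patience so the stop is easy to see
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.64]
stopped_at, best = simulate_early_stopping(losses, patience=3)
print(stopped_at, best)  # prints: 5 2
```

Because the best checkpoint is tracked separately from the stopping point, the model you reload afterwards comes from the best epoch rather than the last one.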
4. Load the Saved Checkpoint

After training, reload the best model from its checkpoint for inference or further use:
best_checkpoint = trainer.checkpoint_callback.best_model_path
finetuned_model = finetune.FinetuneableZoobotClassifier.load_from_checkpoint(best_checkpoint)
The checkpoint contains all the new head weights as well as the finetuned encoder weights. You can share or redeploy it at any time.
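When using the reloaded model for inference, you will usually want class probabilities rather than raw scores. The sketch below assumes the classifier returns one raw score (logit) per class and that class index 1 means "ring"; both are assumptions to check against the model's actual outputs. The softmax itself is the standard definition, written in plain Python so it runs without a trained model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    # Subtracting the maximum avoids overflow in exp without changing the result
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for one galaxy: [not ring, ring]
logits = [0.0, 2.0]
probs = softmax(logits)
predicted_class = probs.index(max(probs))
```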

Examples and Further Resources

Zoobot ships with several working finetuning examples.

Next Steps

  • Choosing Parameters — tune learning rate, layer decay, dropout, and schedulers
  • Loading Data — work with catalogs, HuggingFace datasets, FITS files, and custom augmentations
