Galaxy Zoo answers the most common morphology questions: does this galaxy have spiral arms, is it merging, and so on. But what if you want to answer a different question? You can finetune a pretrained Zoobot model to solve new tasks on new galaxy images. Zoobot has been trained to answer all of the Galaxy Zoo questions simultaneously, which makes it an excellent starting point for other morphology-related tasks. Because finetuning requires much less time and data than training from scratch, you will likely need only a few hundred labelled images.
If you do want to train from scratch, Zoobot supports that too: see zoobot/benchmarks. But with many pretrained models available, you probably won't need to.
What is Finetuning?
Finetuning is a form of transfer learning in which a model trained on one task is partially retrained for a related task. This can drastically reduce the amount of labelled data needed. The high-level approach is:
- Load the pretrained Zoobot model, replacing the output head to match your new task.
- Retrain the model on your new task, typically with a lower learning rate for the pretrained layers than for the new head.
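The sketch below illustrates those two steps in plain PyTorch. It is not Zoobot's API. The encoder is a tiny hypothetical stand-in for a pretrained model, and the two learning rates are illustrative assumptions. The sketch attaches a new head sized for the task, then gives the pretrained layers a smaller learning rate than the head through separate optimizer parameter groups.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for a pretrained encoder (NOT Zoobot's architecture):
# maps a 3-channel image to a 64-dimensional feature vector.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 64),
)

# Step 1: attach a fresh output head sized for the new task (2 classes here).
num_classes = 2
head = nn.Linear(64, num_classes)
model = nn.Sequential(encoder, head)

# Step 2: retrain, giving the pretrained encoder a lower learning rate
# than the newly initialised head (values are illustrative only).
head_lr = 1e-3
encoder_lr = head_lr * 0.1
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": encoder_lr},
    {"params": head.parameters(), "lr": head_lr},
])

# One training step on a random batch, just to show the loop runs.
images = torch.randn(4, 3, 32, 32)
labels = torch.tensor([0, 1, 1, 0])
loss = F.cross_entropy(model(images), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

Per-group learning rates are one common way to make the encoder change more slowly than the new head. The Choosing Parameters page covers the options Zoobot itself exposes.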
The Three FinetuneableZoobot Classes
Zoobot provides three ready-to-use finetuning classes, each suited to a different type of prediction problem:

| Class | Loss | Use Case |
|---|---|---|
| FinetuneableZoobotClassifier | Cross-entropy | Binary or multi-class classification |
| FinetuneableZoobotRegressor | Mean squared error (MSE) or mean absolute error (MAE) | Continuous value regression |
| FinetuneableZoobotTree | Dirichlet-Multinomial | Galaxy Zoo-style vote count decision trees |
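To make the Loss column concrete, here are standalone pure-Python versions of the three loss types for a single example. These are illustrative sketches, not Zoobot's implementations. The Dirichlet-Multinomial function scores the vote counts for one question only, given positive concentration parameters `alpha` that a model would predict. It does not attempt to reproduce how FinetuneableZoobotTree handles a whole decision tree.

```python
import math


def cross_entropy(probs, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(probs[label])


def mse(pred, target):
    """Mean squared error over paired predictions and targets."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def mae(pred, target):
    """Mean absolute error over paired predictions and targets."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def dirichlet_multinomial_nll(counts, alpha):
    """Negative log-likelihood of vote counts for one question.

    counts: observed votes per answer, e.g. [12, 3]
    alpha:  positive concentration parameters, one per answer
    """
    n = sum(counts)
    a = sum(alpha)
    log_p = (
        # log of the multinomial coefficient n! / (k_1! ... k_K!)
        math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in counts)
        # log Gamma(A) - log Gamma(n + A), where A = sum(alpha)
        + math.lgamma(a) - math.lgamma(n + a)
        # sum of log Gamma(k_i + alpha_i) - log Gamma(alpha_i)
        + sum(math.lgamma(k + al) - math.lgamma(al) for k, al in zip(counts, alpha))
    )
    return -log_p
```

For example, with 12 votes for the first answer and 3 for the second, `dirichlet_multinomial_nll([12, 3], [8, 2])` is lower than `dirichlet_multinomial_nll([12, 3], [2, 8])`, because the first set of concentrations expects most votes to go to the first answer.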
Working Example
The following example walks through finetuning Zoobot to find ringed galaxies (binary classification). You can also follow along in the interactive Google Colab notebook.
Load the Pretrained Model
FinetuneableZoobotClassifier downloads the weights of a pretrained Zoobot model from HuggingFace and automatically replaces the head layer to suit a classification problem. The arguments used in this example are:
- name: the HuggingFace Hub identifier for the pretrained encoder. See the Pretrained Models page for available options.
- num_classes=2: the number of output classes (2 for binary classification).
Prepare Galaxy Data
Download and prepare a catalog of galaxy images with labels. The train_catalog is a pandas DataFrame with three required columns:
- id_str: any string uniquely identifying each galaxy
- file_loc: path to the image file (jpg, png, or FITS)
- ring: the label column (0 or 1 in this binary example)
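Below is a minimal sketch of such a catalog, built by hand with pandas. The galaxy ids and file paths are placeholders. `check_catalog` is a hypothetical helper, not part of Zoobot. It confirms that the required columns are present and the ids are unique before you hand the catalog to Zoobot.

```python
import pandas as pd

# Placeholder galaxies: ids, file paths and labels are made up for illustration.
train_catalog = pd.DataFrame({
    "id_str": ["galaxy_0001", "galaxy_0002", "galaxy_0003", "galaxy_0004"],
    "file_loc": [
        "images/galaxy_0001.jpg",
        "images/galaxy_0002.jpg",
        "images/galaxy_0003.png",
        "images/galaxy_0004.fits",
    ],
    "ring": [1, 0, 0, 1],  # 1 = ringed, 0 = not ringed
})


def check_catalog(catalog, label_cols):
    """Hypothetical helper: raise ValueError if required columns are
    missing or if id_str values are not unique."""
    required = ["id_str", "file_loc", *label_cols]
    missing = [col for col in required if col not in catalog.columns]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")
    if catalog["id_str"].duplicated().any():
        raise ValueError("id_str values must be unique")
    return catalog


check_catalog(train_catalog, label_cols=["ring"])
```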
Create a CatalogDataModule so PyTorch can load images and labels in batches:
- label_cols: a list of column names to use as labels. Must be a list, even for a single label.
- batch_size=32: the number of images per training batch. Reduce it if you hit out-of-memory errors; increase it if training is slow.
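The plain-pandas sketch below shows what these two parameters mean for a catalog of a few hundred galaxies. Selecting labels with a list keeps a 2D (galaxies × labels) table, while a bare string returns a 1D Series. Our assumption is that this is why CatalogDataModule wants a list; this page does not say so. The batch arithmetic assumes every galaxy is used for training and the final partial batch is kept. In practice some galaxies may be held out for validation.

```python
import math

import pandas as pd

# 300 placeholder galaxies with alternating ring labels.
catalog = pd.DataFrame({
    "id_str": [f"galaxy_{i:04d}" for i in range(300)],
    "file_loc": [f"images/galaxy_{i:04d}.jpg" for i in range(300)],
    "ring": [i % 2 for i in range(300)],
})

label_cols = ["ring"]  # a list, even for a single label

# Selecting with a list keeps a 2D (n_galaxies, n_labels) table...
labels_2d = catalog[label_cols]
# ...whereas selecting with a bare string returns a 1D Series.
labels_1d = catalog["ring"]

# With batch_size=32, one pass over 300 galaxies takes ceil(300 / 32) = 10 batches;
# the last batch holds the remaining 300 - 9 * 32 = 12 galaxies.
batch_size = 32
n_batches = math.ceil(len(catalog) / batch_size)
last_batch = len(catalog) - (n_batches - 1) * batch_size
```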
CatalogDataModule applies image augmentations (rotations, crops, etc.) by default. See Loading Data for customization options.
Run the Finetuning
Create a Lightning Trainer and call fit. The trainer uses early stopping: training ends automatically if validation loss stops improving. It saves the best checkpoint to save_dir/checkpoints/.
By default, get_trainer uses:
- AdamW optimizer with cross-entropy loss (for the classifier)
- Early stopping with patience=10 epochs
- Top-1 checkpoint saving based on validation loss
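The early-stopping and checkpointing behaviour described above can be sketched in plain Python. This is a simplified simulation of the logic under the stated defaults (patience=10, keep only the best checkpoint by validation loss), not Lightning's or Zoobot's code. For instance, Lightning's EarlyStopping callback also supports a `min_delta` threshold, which this sketch omits.

```python
def early_stopping_run(val_losses, patience=10):
    """Simulate early stopping plus top-1 checkpointing over a sequence of
    per-epoch validation losses.

    Returns (best_epoch, stopped_epoch): the epoch whose checkpoint would be
    kept, and the last epoch that actually ran.
    """
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best: a top-1 checkpoint would replace the previous best here.
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                # No improvement for `patience` epochs in a row: stop training.
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1
```

For example, if validation loss improves for five epochs and then plateaus, the best checkpoint is from epoch 4. With patience=10, training stops ten epochs later, at epoch 14.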
Examples and Further Resources
Zoobot ships with several working finetuning examples:
- Google Colab notebook: the recommended starting point
- finetune_binary_classification.py: script version of the Colab notebook
- finetune_counts_full_tree.py: finetuning on a GZ-style decision tree
Next Steps
- Choosing Parameters — tune learning rate, layer decay, dropout, and schedulers
- Loading Data — work with catalogs, HuggingFace datasets, FITS files, and custom augmentations