This guide walks you through finetuning a pretrained Zoobot encoder to detect ringed galaxies, a classic binary classification task. The same pattern applies to any morphological classification or regression problem. By the end you will have a trained model and predictions saved to disk.
The fastest way to get started is the interactive Google Colab notebook, which provides a free GPU and requires no local setup:
Open in Colab →
Step-by-Step Walkthrough
Install Zoobot
Install Zoobot and its PyTorch dependencies with a single pip command:

This installs Zoobot along with PyTorch (≥ 2.7.0), torchvision, Lightning (≥ 2.2.5), timm (≥ 1.0.15), and all other required packages.

For Google Colab (where PyTorch is pre-installed), use the lighter variant instead:

See the Installation guide for GPU / CUDA setup and source installation.
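After installing, it can be useful to confirm that the environment meets the minimum versions quoted above. The following is a minimal sketch using only the standard library. The distribution names `torch`, `lightning`, and `timm` are assumptions about how the packages are registered with pip, and `parse_version` is a simplified helper written for this example, not part of Zoobot.

```python
import re
from importlib import metadata

# Minimum versions quoted in the installation text above.
# The distribution names are assumed, not taken from the Zoobot docs.
MINIMUM_VERSIONS = {
    "torch": (2, 7, 0),
    "lightning": (2, 2, 5),
    "timm": (1, 0, 15),
}


def parse_version(text):
    """Turn a string like '2.7.0+cu126' into (2, 7, 0)."""
    numbers = []
    for part in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", part)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad short versions such as '2.7'
        numbers.append(0)
    return tuple(numbers)


def check_environment(minimums=MINIMUM_VERSIONS):
    """Return {package: status string} for each required package."""
    report = {}
    for package, minimum in minimums.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            report[package] = "not installed"
            continue
        ok = parse_version(installed) >= minimum
        report[package] = f"{installed} ({'ok' if ok else 'too old'})"
    return report


for package, status in check_environment().items():
    print(f"{package}: {status}")
```

Any package reported as "not installed" or "too old" is a sign that the install step did not complete as expected.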
Prepare Your Catalog
Zoobot reads galaxy images from a pandas DataFrame (or a CSV file loaded into one). Your DataFrame must contain at least these columns:
| Column | Type | Description |
|---|---|---|
| id_str | str | Unique string identifier for each galaxy |
| file_loc | str | Absolute path to the image file (.jpg, .png, or .fits) |
| ring (or any label name) | int / float | Your label, e.g. 0 = not a ring, 1 = ring |

Example CSV structure:

For regression tasks, the label column should contain continuous float values. For vote-count tasks (FinetuneableZoobotTree), you will need one column per answer in your decision tree schema.

Load a Pretrained Model
Load a pretrained Zoobot encoder directly from HuggingFace Hub. The encoder weights are downloaded automatically and cached locally:
- name: the HuggingFace Hub identifier for the pretrained encoder. See Pretrained Models for all available options.
- num_classes=2: binary classification (ring / not ring). Set to the number of classes in your problem.
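As a rough sketch of how these two arguments fit together, the function below wraps the model construction. The class name `FinetuneableZoobotClassifier`, the import path `zoobot.pytorch.training.finetune`, and the encoder identifier used as the default are assumptions not stated in this guide, so check them against the Pretrained Models page and your installed version. The import is deferred so that the file can be loaded without Zoobot installed.

```python
def load_ring_classifier(
    name="hf_hub:mwalmsley/zoobot-encoder-convnext_nano",  # assumed encoder identifier
    num_classes=2,  # ring / not ring
):
    """Build a finetuneable classifier on top of a pretrained encoder.

    Calling this needs zoobot installed and, on first use, network access
    to download the encoder weights.
    """
    # Assumed import path and class name; verify against your Zoobot version.
    from zoobot.pytorch.training import finetune

    return finetune.FinetuneableZoobotClassifier(name=name, num_classes=num_classes)
```

Calling `load_ring_classifier()` would then return the model used in the later steps.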
For regression tasks, use FinetuneableZoobotRegressor instead.

Create a Data Module
Zoobot uses CatalogDataModule from the companion galaxy-datasets package to handle image loading, augmentation, and batching:

- label_cols: list of column names containing your labels. Must match the label column(s) expected by your model.
- catalog: the full DataFrame; CatalogDataModule handles the train/validation split automatically.
- batch_size: number of images per batch. Reduce if you run out of GPU memory.
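Because label_cols must match columns that actually exist in the catalog, a quick check before building the data module can catch mistakes early. The sketch below uses only the standard library and a hypothetical three-galaxy catalog; the ids and paths are placeholders, and `check_catalog` is a helper written for this example, not part of Zoobot or galaxy-datasets. With pandas, the same test would inspect `df.columns`.

```python
import csv
import io

# Columns every catalog needs, per the table in "Prepare Your Catalog".
REQUIRED_COLUMNS = ("id_str", "file_loc")


def check_catalog(rows, label_cols):
    """Raise ValueError if rows lack required or label columns; return row count."""
    if not rows:
        raise ValueError("catalog is empty")
    missing = [c for c in (*REQUIRED_COLUMNS, *label_cols) if c not in rows[0]]
    if missing:
        raise ValueError(f"catalog is missing columns: {missing}")
    ids = [row["id_str"] for row in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("id_str values must be unique")
    return len(rows)


# Hypothetical catalog: placeholder ids and image paths.
EXAMPLE_CSV = """id_str,file_loc,ring
galaxy_0001,/data/images/galaxy_0001.jpg,1
galaxy_0002,/data/images/galaxy_0002.jpg,0
galaxy_0003,/data/images/galaxy_0003.jpg,0
"""

rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
n_galaxies = check_catalog(rows, label_cols=["ring"])
print(f"{n_galaxies} galaxies, columns: {list(rows[0])}")
```

Passing a label name that is not in the catalog, such as `label_cols=["spiral"]` here, raises a `ValueError` naming the missing column.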
Finetune the Model
Create a PyTorch Lightning trainer and start finetuning. The trainer saves checkpoints and stops early if validation loss stops improving:
get_trainer configures sensible defaults out of the box:

- Early stopping: stops after 10 epochs with no improvement in validation loss (configurable via patience)
- Model checkpointing: saves the best checkpoint to ./results/checkpoints/
- Learning-rate monitoring: logs the LR per epoch
- Auto device selection: automatically uses GPU if available
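To make the early-stopping rule concrete, here is a simplified pure-Python sketch of patience-based stopping. It is an illustration of the idea only, not Lightning's actual callback (which has further options), and the loss values are invented for the example.

```python
def epochs_until_stop(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both 1-based.

    Training stops once `patience` consecutive epochs pass without a new
    best validation loss; otherwise it runs through every epoch given.
    """
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# Invented validation losses: the best value is reached at epoch 3.
losses = [0.9, 0.7, 0.6, 0.65, 0.62, 0.61, 0.63]
stop_epoch, best_epoch = epochs_until_stop(losses, patience=3)
print(f"stopped at epoch {stop_epoch}, best checkpoint from epoch {best_epoch}")
```

With a patience of 3, this run stops three epochs after its best loss; with the default of 10, it would run through all seven epochs. In both cases the checkpoint worth keeping is the one from the best epoch, which is why early stopping is paired with model checkpointing.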
To override these defaults, pass arguments such as patience to get_trainer.

Make Predictions on New Galaxies
After training, run the finetuned model on an unlabelled catalog to generate predictions:
- label_cols: used to name the output columns in the saved CSV.
- inference_transform: deterministic (no augmentation) transform pipeline applied to each image before passing it to the model.
- save_loc: path where the predictions CSV will be written. Each row corresponds to a galaxy in unlabelled_df, with softmax probabilities for each class.
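The softmax probabilities in the output can be read as follows. This is a small standalone sketch of the softmax function itself, with invented logits for one galaxy. It does not read the predictions CSV, whose exact column names depend on label_cols.

```python
import math


def softmax(logits):
    """Convert raw model outputs into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits for a single galaxy: [not ring, ring].
p_not_ring, p_ring = softmax([-1.2, 2.3])
predicted_class = 1 if p_ring > p_not_ring else 0
print(f"p(ring) = {p_ring:.3f}, predicted class = {predicted_class}")
```

For a binary task the two probabilities sum to 1, so a threshold on the ring probability (0.5 by default, or stricter for a purer sample) is enough to select candidates.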
Complete Script
Here is the full end-to-end finetuning script for reference:

Next Steps
Finetuning Guide
Learn about training modes, learning-rate schedules, class weights, and advanced finetuning strategies.
Choosing Parameters
Guidance on selecting the right encoder architecture, batch size, and learning rate for your dataset size.
Pretrained Models
Full list of available encoder architectures and their HuggingFace Hub names.
Loading Data
How to structure your catalog, handle FITS files, and use custom data augmentations.