Zoobot is a PyTorch deep learning library built for galaxy morphology classification and analysis. Its pretrained encoders, trained on over 107 million volunteer votes collected through the Galaxy Zoo citizen science project, capture rich representations of galaxy structure that transfer well to new tasks, even when you have only a few hundred labelled examples. Whether you need to identify rings, bars, and merging pairs or predict continuous morphological quantities, Zoobot gives you a production-quality starting point without training from scratch.

What Problem Does Zoobot Solve?

Labelling galaxy images is expensive. Professional astronomers and citizen scientists can only inspect so many images by hand, yet modern sky surveys like DESI now catalogue tens of millions of galaxies. Zoobot addresses this by providing encoders that have already learned powerful, general-purpose galaxy representations. You supply a small labelled dataset; Zoobot supplies decades of collective morphological knowledge, baked into pretrained weights.
Zoobot 2.0 introduces larger, more capable pretrained models backed by the GZ Evo dataset, a unified collection of 823k galaxy images and 107M volunteer labels. See the Scaling Laws for Galaxy Images paper for details.

How Zoobot Works

Zoobot’s encoders are pretrained on the GZ Evo dataset, which aggregates volunteer classifications from five major Galaxy Zoo campaigns:

Survey                            Approximate Size
Galaxy Zoo 2 (GZ2)                ~240k galaxies
Galaxy Zoo Hubble (GZ Hubble)     ~100k galaxies
Galaxy Zoo CANDELS (GZ CANDELS)   ~50k galaxies
Galaxy Zoo DECaLS / DESI (GZD)    ~310k galaxies
Galaxy Zoo Cosmic Dawn (HSC H2O)  ~120k galaxies

Training on this breadth of surveys — spanning optical, space-based, and near-infrared imaging — produces encoders that generalise across instruments, resolutions, and redshifts. When you finetune on your own data, the encoder’s weights are updated (alongside a small classification or regression head) using PyTorch Lightning and the AdamW optimizer with layer-wise learning-rate decay.
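
The layer-wise learning-rate decay mentioned above can be illustrated with a short sketch. It assumes a geometric schedule: the head trains at the base learning_rate, and each encoder block further from the head has its rate multiplied by layer_decay once more. The function name, the block labels and the default values are hypothetical, and Zoobot's exact exponent convention may differ.

```python
def layerwise_learning_rates(n_blocks, learning_rate=1e-4, layer_decay=0.75):
    """Return (label, learning rate) pairs, ordered from the head downwards.

    Assumed schedule (illustrative only): a block that sits `depth` steps
    below the head trains at learning_rate * layer_decay ** depth.
    """
    rates = [("head", learning_rate)]
    for depth in range(1, n_blocks + 1):
        block_index = n_blocks - depth  # 0 is the earliest (input-side) block
        rates.append((f"block_{block_index}", learning_rate * layer_decay ** depth))
    return rates


# Print the schedule for a hypothetical four-block encoder.
for label, rate in layerwise_learning_rates(n_blocks=4):
    print(f"{label:8s} lr={rate:.2e}")
```

Under this assumption the earliest blocks, which hold the most general features, change the least during finetuning. Each pair could then be mapped to a parameter group with its own lr, which torch.optim.AdamW accepts.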

Finetuning Modes

Zoobot exposes three ready-to-use finetuning classes, all inheriting from a common FinetuneableZoobotAbstract base:

FinetuneableZoobotClassifier

Multi-class or binary classification with cross-entropy loss. Ideal for tasks like ring/not-ring detection or morphological type labelling. Reports accuracy during training.

FinetuneableZoobotRegressor

Single-value regression with MSE or MAE loss. Useful for predicting continuous quantities such as Sérsic index, ellipticity, or concentration. Reports RMSE during training.

FinetuneableZoobotTree

Vote-count / decision-tree prediction using the Dirichlet-Multinomial loss introduced in GZ DECaLS. Designed for reproducing full Galaxy Zoo answer distributions.
All three classes share a common set of hyperparameters — learning_rate, layer_decay, weight_decay, head_dropout_prob, training_mode — and can be loaded from a HuggingFace Hub name, a local checkpoint, or an in-memory PyTorch module.
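
For reference, the Dirichlet-Multinomial loss named under FinetuneableZoobotTree can be written out for a single question. The sketch below is the standard negative log-likelihood of observed vote counts given predicted concentration parameters, using only the Python standard library. The function and argument names are hypothetical, and this is not Zoobot's own implementation, which is not reproduced here.

```python
import math


def dirichlet_multinomial_nll(votes, concentrations):
    """Negative log-likelihood of vote counts under a Dirichlet-Multinomial.

    votes: non-negative vote counts, one per answer to a single question.
    concentrations: positive Dirichlet concentration parameters, one per answer.
    """
    if len(votes) != len(concentrations):
        raise ValueError("votes and concentrations must have the same length")
    total_votes = sum(votes)
    total_concentration = sum(concentrations)
    # Normalising term shared by all answers.
    log_p = (
        math.lgamma(total_concentration)
        + math.lgamma(total_votes + 1)
        - math.lgamma(total_votes + total_concentration)
    )
    # Per-answer terms.
    for k, a in zip(votes, concentrations):
        log_p += math.lgamma(k + a) - math.lgamma(a) - math.lgamma(k + 1)
    return -log_p


# With concentrations (1, 1), all six ways to split five votes are equally likely,
# so the loss is log(6).
print(f"{dirichlet_multinomial_nll([3, 2], [1.0, 1.0]):.4f}")  # prints 1.7918
```

The loss is lower when the predicted concentrations favour the answers that volunteers actually chose, and larger concentrations express more confidence in the predicted vote fractions.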

Pretrained Architectures

Zoobot 2.0 ships pretrained weights for several modern CNN and transformer architectures, all available on HuggingFace:

Architecture                           Example Hub Name
ConvNeXT (nano / tiny / small / base)  hf_hub:mwalmsley/zoobot-encoder-convnext_nano
MaxViT                                 hf_hub:mwalmsley/zoobot-encoder-maxvit_tiny_tf_224
EfficientNetV2                         hf_hub:mwalmsley/zoobot-encoder-efficientnet_b0
ResNet                                 hf_hub:mwalmsley/zoobot-encoder-resnet50

Greyscale variants are also available for single-channel imaging surveys — use the hf_hub:mwalmsley/zoobot-encoder-greyscale-convnext_nano family.
For most finetuning tasks, convnext_nano offers the best balance of speed and accuracy. Larger models (convnext_base, maxvit) tend to perform better when you have more labelled data or are working with higher-resolution images.

Scientific Impact

Zoobot is not a research prototype — it is actively deployed in production astronomical pipelines:
  • GZ DECaLS — classified detailed morphologies for 314,000 galaxies in the DESI Legacy Imaging Surveys
  • GZ DESI — scaled to 8.7 million galaxies, one of the largest morphology catalogues ever produced
  • Euclid pipeline — Zoobot powers the OU-MER morphology catalogue for ESA’s Euclid space mission (Q1 data, 2025), including strong-lensing discovery, bar fraction measurements, and dwarf galaxy census studies
The underlying methodology is described in the JOSS paper and has been cited across dozens of peer-reviewed publications.

Where to Go Next

Quickstart

Finetune a pretrained model to find ringed galaxies in under 20 lines of Python.

Pretrained Models

Browse all available encoder architectures and their HuggingFace Hub names.

Finetuning Guide

Deep-dive into finetuning options: training modes, schedulers, and class weights.

API Reference

Full reference for FinetuneableZoobotClassifier, Regressor, Tree, and utilities.
