Zoobot provides FinetuneableZoobotClassifier, FinetuneableZoobotRegressor, and FinetuneableZoobotTree for the most common finetuning scenarios. But what if your task doesn’t fit neatly into classification, regression, or vote-count prediction? This guide covers three advanced integration patterns:
  1. Using Zoobot’s encoder directly in your own pipeline
  2. Extracting frozen galaxy representations at scale
  3. Subclassing FinetuneableZoobotAbstract to implement a custom head and loss

Using Zoobot’s Encoder Directly

Because Zoobot encoders are standard timm models, you can plug them into any PyTorch pipeline.

Method 1: Via a FinetuneableZoobot Class

Load any FinetuneableZoobot class and access its .encoder attribute:
from zoobot.pytorch.training.finetune import FinetuneableZoobotClassifier

model = FinetuneableZoobotClassifier(name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano')
encoder = model.encoder

Method 2: Via timm Directly

Because Zoobot encoders are published as timm-compatible HuggingFace Hub models, you can load them without Zoobot at all:
import timm

encoder = timm.create_model(
    'hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    pretrained=True,
    num_classes=0  # removes the classification head, returns raw features
)
You can then use encoder like any other timm model — wrap it in a custom head, combine it with other networks, or pass it to a contrastive learning framework.
If you only need frozen feature vectors without any fine-tuning, use the Extracting Frozen Representations approach below instead — it handles batching and I/O boilerplate automatically.

Extracting Frozen Representations

Once you have a pretrained or finetuned Zoobot encoder, you can store its output vectors as fixed-dimensional features for downstream tasks like similarity search, anomaly detection, or clustering. Zoobot includes ZoobotEncoder, a PyTorch Lightning LightningModule wrapper that lets you pass the encoder to the same predict_on_catalog.predict() utility used for full model inference — handling batching, looping, and file I/O automatically.
from zoobot.pytorch.training.representations import ZoobotEncoder
from zoobot.pytorch.predictions import predict_on_catalog

# Wrap the encoder in a LightningModule
lightning_encoder = ZoobotEncoder.load_from_name(
    'hf_hub:mwalmsley/zoobot-encoder-convnext_nano'
)

# Run inference exactly as you would with a full Zoobot model
predict_on_catalog.predict(
    catalog,
    lightning_encoder,
    n_samples=1,
    label_cols=label_cols,
    save_loc=save_loc,
    datamodule_kwargs=datamodule_kwargs,
    trainer_kwargs=trainer_kwargs
)
See zoobot/pytorch/examples/representations for a complete working example.
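Once representations are saved, a similarity search reduces to ranking galaxies by how closely their feature vectors align. The sketch below is dependency-free and purely illustrative: the galaxy ids and three-dimensional vectors are made up, and `cosine_similarity` and `most_similar` are hypothetical helpers, not part of Zoobot. Real representations have hundreds of dimensions, and for large catalogs a vectorized or approximate nearest-neighbour library is likely a better fit.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, representations, k=2):
    """Return the k galaxy ids whose vectors are most similar to the query's."""
    query = representations[query_id]
    scores = [
        (cosine_similarity(query, vec), galaxy_id)
        for galaxy_id, vec in representations.items()
        if galaxy_id != query_id  # never return the query itself
    ]
    scores.sort(reverse=True)  # highest similarity first
    return [galaxy_id for _, galaxy_id in scores[:k]]


# Hypothetical ids and toy vectors, standing in for saved Zoobot features
representations = {
    'spiral_a': [0.9, 0.1, 0.0],
    'spiral_b': [0.8, 0.2, 0.1],
    'elliptical_a': [0.1, 0.9, 0.2],
    'merger_a': [0.0, 0.2, 0.9],
}

neighbours = most_similar('spiral_a', representations, k=2)
```

Cosine similarity ignores vector length and compares direction only, which is a common default for learned features; Euclidean distance is a reasonable alternative if the magnitudes carry meaning for your task.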

Dimensionality Reduction

Zoobot representations are typically high-dimensional (e.g. 1280 for EfficientNetB0) and, in practice, highly redundant. We recommend using PCA to compress them to a more manageable size (e.g. 40 dimensions) while retaining most of the information. This was the approach taken in the Practical Morphology Tools paper.
Pre-calculated representations for all DESI galaxies are available — see the Science Data page. HSC representations are coming soon.
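To make the PCA recommendation concrete, here is a minimal, dependency-free sketch of what the compression does: center the features, find the directions of greatest variance, and project onto them. It uses power iteration with deflation on toy four-dimensional data, and every function name here is hypothetical. This is an illustration of the idea rather than the method used in the paper; for real representations you would more likely reach for a library implementation such as scikit-learn's `PCA`.

```python
import math
import random


def mean_center(rows):
    """Subtract each feature's mean so the data is centered on the origin."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in rows]


def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def matvec(matrix, vector):
    """Multiply a square matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def top_component(cov, n_iter=200):
    """Leading eigenvector and eigenvalue of cov, found by power iteration."""
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    v = [rng.random() for _ in cov]
    for _ in range(n_iter):
        w = matvec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(cov, v)))
    return v, eigenvalue


def pca(rows, n_components):
    """Return (projected rows, explained variance ratio per component)."""
    centered = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    components, explained = [], []
    for _ in range(n_components):
        v, eigenvalue = top_component(cov)
        components.append(v)
        explained.append(eigenvalue / total_variance)
        # Deflate: subtract the variance along v so the next pass finds the next axis
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    projected = [
        [sum(x * c for x, c in zip(row, comp)) for comp in components]
        for row in centered
    ]
    return projected, explained


# Toy "representations": four features that all follow one hidden variable t,
# plus a little noise on the last feature. Real features come from the encoder.
data_rng = random.Random(42)
rows = []
for _ in range(50):
    t = data_rng.uniform(-1, 1)
    rows.append([t, 2 * t, -t, 0.5 * t + data_rng.gauss(0, 0.01)])

compressed, explained = pca(rows, n_components=2)
```

Because the toy features are almost perfectly correlated, the first component captures nearly all of the variance. Checking the explained variance ratio in the same way on your own representations is one way to judge how many dimensions to keep.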

Subclassing FinetuneableZoobotAbstract

For tasks that don’t fit the built-in classes (for example, multi-output regression, ordinal classification, or custom loss functions), you can subclass FinetuneableZoobotAbstract and plug in your own head and loss. Your subclass must:
  • Set self.head — a torch.nn.Module that maps encoder features to outputs.
  • Set self.loss — a callable with signature loss(y_pred, y).
  • Implement batch_to_supervised_tuple(self, batch) — extracts (x, y) from a batch dict.

Example: Custom Regression Head

Imagine you want to finetune Zoobot on a regression task with a custom loss. Here’s how you’d implement it:
import torch
from zoobot.pytorch.training.finetune import FinetuneableZoobotAbstract


class FinetuneableZoobotCustomRegression(FinetuneableZoobotAbstract):

    def __init__(
        self,
        foo,
        **super_kwargs
    ):
        super().__init__(**super_kwargs)

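        self.foo = foo
        # Placeholder loss: substitute any callable with signature loss(y_pred, y)
        self.loss = torch.nn.SmoothL1Loss()
        # Placeholder head: one linear layer from encoder features to a single output,
        # so labels should have shape (batch, 1).
        # num_features is the standard timm attribute giving the encoder's output width.
        self.head = torch.nn.Sequential(
            torch.nn.Linear(self.encoder.num_features, 1)
        )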

    # batch_to_supervised_tuple must be implemented
    def batch_to_supervised_tuple(self, batch):
        return batch['image'], batch['my_label']

# see zoobot/pytorch/training/finetune.py for more examples and all methods required
Once defined, you can train this custom class exactly like any built-in FinetuneableZoobot class:
from zoobot.pytorch.training import finetune

model = FinetuneableZoobotCustomRegression(
    foo='bar',
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    learning_rate=1e-4
)

trainer = finetune.get_trainer(save_dir, accelerator='gpu', max_epochs=100)
trainer.fit(model, datamodule)
All the inherited machinery — AdamW optimization, layer decay, early stopping, checkpointing, and scheduler support — works out of the box.
Look at the source code of FinetuneableZoobotClassifier and FinetuneableZoobotRegressor in zoobot/pytorch/training/finetune.py for concrete examples of the full interface you can override.
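Of the inherited features, layer decay is the one whose effect is easiest to see with numbers. The sketch below shows the generic idea of layer-wise learning-rate decay: the head trains at the base rate, and each encoder block further from the head trains at a geometrically smaller rate. The function and its parameter names (`base_lr`, `lr_decay`, `n_blocks`) are hypothetical, and the exact exponent convention in Zoobot may differ, so treat finetune.py as the reference.

```python
def layerwise_learning_rates(base_lr, lr_decay, n_blocks):
    """Learning rate for the head and for each tuned encoder block.

    block_0 is the block closest to the head; deeper blocks get smaller rates.
    """
    rates = {'head': base_lr}
    for depth in range(n_blocks):
        # Each step away from the head multiplies the rate by lr_decay once more
        rates[f'block_{depth}'] = base_lr * lr_decay ** (depth + 1)
    return rates


# Illustrative values only
rates = layerwise_learning_rates(base_lr=1e-4, lr_decay=0.75, n_blocks=3)
```

The intuition is that early encoder blocks hold general-purpose features that should change slowly, while the new head and the blocks nearest to it need to adapt most to the new task.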
