FinetuneableZoobotTree adapts a pretrained Zoobot encoder to predict Galaxy Zoo-style vote count distributions across a morphology decision tree. It uses the Dirichlet-Multinomial loss introduced in the Galaxy Zoo DECaLS paper, which accounts for the variable number of volunteer responses per question due to conditional branching in the decision tree. Use this class when training on raw volunteer vote counts from a Galaxy Zoo campaign.
from zoobot.pytorch.training.finetune import FinetuneableZoobotTree
from zoobot.shared.schemas import Schema

Quick Example

finetune_tree.py
from zoobot.pytorch.training import finetune
from zoobot.pytorch.training.finetune import FinetuneableZoobotTree
from zoobot.shared.schemas import Schema
from galaxy_datasets.pytorch.galaxy_datamodule import CatalogDataModule

# Define your decision tree schema (or import a pre-built one from galaxy-datasets).
# The gz_decals_* names and train_catalog are placeholders you must define first.
schema = Schema(
    question_answer_pairs=gz_decals_question_answer_pairs,
    dependencies=gz_decals_dependencies,
    label_cols=gz_decals_label_cols
)

datamodule = CatalogDataModule(
    label_cols=schema.label_cols,
    catalog=train_catalog,
    batch_size=64
)

model = FinetuneableZoobotTree(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    schema=schema
)

trainer = finetune.get_trainer(save_dir='./results')
trainer.fit(model, datamodule)

Constructor Parameters

All parameters from FinetuneableZoobotAbstract are accepted. The following is specific to the tree model.
schema (zoobot.shared.schemas.Schema, required)
Describes the layout of the decision tree: which questions exist, what answers each question has, and which answers lead to which follow-up questions. See the Schemas API reference for how to construct a Schema object. Pre-built schemas for GZ DECaLS, GZ DESI, GZ2, and other campaigns are available via the galaxy-datasets package.
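
To make the three ingredients concrete, here is a minimal sketch of a toy two-question tree written as plain Python dictionaries. The question and answer names are invented for illustration, and the dictionary layout (question mapped to answer suffixes, question mapped to its parent answer) is an assumption; check the Schemas API reference for the exact format that Schema expects.

```python
# Toy two-question tree. Names are illustrative, not a real Galaxy Zoo schema.
question_answer_pairs = {
    "smooth-or-featured": ["_smooth", "_featured-or-disk"],
    "has-spiral-arms": ["_yes", "_no"],
}

# Each question maps to the answer that must be chosen before it is asked.
# None means the question is asked of every volunteer.
dependencies = {
    "smooth-or-featured": None,
    "has-spiral-arms": "smooth-or-featured_featured-or-disk",
}

# One vote count column per answer: question name plus answer suffix.
label_cols = [
    question + answer
    for question, answers in question_answer_pairs.items()
    for answer in answers
]

# Sanity check: every dependency must point at a real answer column.
for question, parent_answer in dependencies.items():
    assert parent_answer is None or parent_answer in label_cols, question
```

In this toy tree, "has-spiral-arms" is only asked of volunteers who answered "featured or disk" to the first question, so it will typically receive fewer votes.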

How the Loss Works

The model predicts a Dirichlet concentration parameter for each answer in the decision tree. The Dirichlet-Multinomial loss then compares the predicted distribution of votes (given that k volunteers were asked a question) to the actual observed vote counts. This naturally handles the fact that some questions receive many votes (asked early in the tree, of every volunteer) and others receive very few (asked late in the tree, of only a subset of volunteers).
Input image → Encoder → Dirichlet Head → α concentrations (one per answer) → Dirichlet-Multinomial loss vs. observed vote counts
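
The loss described above can be sketched in pure Python as the negative log-likelihood of the standard Dirichlet-Multinomial distribution, evaluated per question and summed over the tree. This is an illustrative sketch, not Zoobot's implementation: the function names are hypothetical, and whether Zoobot sums or averages over questions is an assumption here.

```python
from math import lgamma


def dirichlet_multinomial_nll(votes, concentrations):
    """Negative log-likelihood of the vote counts for a single question.

    votes: observed vote count for each answer.
    concentrations: predicted Dirichlet concentration (> 0) for each answer.
    """
    total_votes = sum(votes)
    total_conc = sum(concentrations)
    # Normalising terms that depend only on the totals.
    log_prob = (
        lgamma(total_conc)
        + lgamma(total_votes + 1)
        - lgamma(total_votes + total_conc)
    )
    # One term per answer.
    for k, alpha in zip(votes, concentrations):
        log_prob += lgamma(k + alpha) - lgamma(alpha) - lgamma(k + 1)
    return -log_prob


def tree_nll(votes_by_question, concentrations_by_question):
    """Sum the per-question losses over every question in the tree."""
    return sum(
        dirichlet_multinomial_nll(
            votes_by_question[question], concentrations_by_question[question]
        )
        for question in votes_by_question
    )
```

Because the likelihood is conditioned on the total number of votes for each question, a question that nobody was asked (all counts zero) has probability 1 and contributes zero loss, whatever concentrations the model predicts for it.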

Batch Format

batch_to_supervised_tuple extracts the image and all vote count columns from the batch:
x = batch['image']                  # tensor (batch_size, C, H, W)
y = {col: batch[col] for col in schema.label_cols}  # dict of vote count tensors
Your catalog must include a column for every entry in schema.label_cols, with integer vote counts (use 0 for questions not asked, never NaN).
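
A small pre-flight check along these lines can catch missing columns and NaN values before training starts. This is a sketch that assumes the catalog is a pandas DataFrame; prepare_vote_columns is a hypothetical helper, not part of Zoobot or galaxy-datasets.

```python
import pandas as pd


def prepare_vote_columns(catalog: pd.DataFrame, label_cols: list[str]) -> pd.DataFrame:
    """Return a copy of the catalog with clean integer vote count columns.

    Hypothetical helper: raises if a label column is absent, then replaces
    NaN (question not asked) with 0 and casts the counts to integers.
    """
    missing = [col for col in label_cols if col not in catalog.columns]
    if missing:
        raise KeyError(f"catalog is missing vote count columns: {missing}")
    cleaned_cols = {col: catalog[col].fillna(0).astype(int) for col in label_cols}
    return catalog.assign(**cleaned_cols)
```

Filling with 0 rather than dropping rows keeps galaxies whose later questions were never asked.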

Metrics Logged

Unlike the classifier and regressor, FinetuneableZoobotTree logs only loss metrics because accuracy and RMSE are not meaningful for vote count distributions:
| Metric | Description |
| --- | --- |
| finetuning/train_loss | Dirichlet-Multinomial loss on training set |
| finetuning/val_loss | Dirichlet-Multinomial loss on validation set |
| finetuning/test_loss | Loss on test set (when trainer.test() is called) |

When to Use This vs. FinetuneableZoobotClassifier

Use FinetuneableZoobotTree when:

  • You have raw volunteer vote counts from a Galaxy Zoo campaign
  • You want to train on a full decision tree of morphology questions simultaneously
  • You are creating a new Galaxy Zoo morphology catalog
  • You want to replicate or extend GZ DECaLS / GZ DESI style predictions

Reloading After Training

model = FinetuneableZoobotTree.load_from_checkpoint(
    'results/checkpoints/epoch=20.ckpt',
    schema=schema  # schema must be re-provided as it is not a simple hyperparameter
)
See the Training on Vote Counts guide for a full walkthrough including catalog preparation and Schema construction.
