The schemas module defines the structure of Galaxy Zoo decision trees. A Schema maps each question to its possible answers and specifies which prior answer unlocks each subsequent question. Zoobot uses this structure to apply the Dirichlet-Multinomial loss question-by-question and to route predictions through the tree during evaluation.

Import

from zoobot.shared.schemas import Schema, Question, Answer

Schema

Schema(question_answer_pairs: dict, dependencies: dict)
Top-level container for a Galaxy Zoo decision tree. Constructs Question and Answer objects from the provided dictionaries, resolves the dependency links between them, and exposes slicing utilities used by the loss function and metric logging.

Constructor Parameters

question_answer_pairs (dict, required)
Ordered dictionary mapping each question’s text to a list of answer suffix strings. Each suffix is concatenated with the question text to form the full answer column name. For example:
{
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms':    ['_yes', '_no']
}
This produces label columns ['smooth-or-featured_smooth', 'smooth-or-featured_featured-or-disk', 'smooth-or-featured_artifact', 'has-spiral-arms_yes', 'has-spiral-arms_no']. The ordering of questions in the dict determines their ordering in the label space.
dependencies (dict, required)
Dictionary mapping each question’s text to the full answer text that must be chosen before that question is asked, or None for the root question. For example:
{
    'smooth-or-featured': None,
    'has-spiral-arms':    'smooth-or-featured_featured-or-disk'
}
Every question in question_answer_pairs must appear as a key in dependencies.
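
The two rules above (question text plus suffix gives the column name, and every question needs a dependencies entry) can be sketched in plain Python. The helpers build_label_cols and check_dependencies are hypothetical names used only for illustration and are not part of zoobot; the check that each dependency names a known answer is an extra assumption, not something this page states.

```python
# Hypothetical helpers, not part of zoobot: they only mirror the rules described above.
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}


def build_label_cols(pairs):
    """Concatenate question text and answer suffix, preserving dict order."""
    return [
        question + suffix
        for question, suffixes in pairs.items()
        for suffix in suffixes
    ]


def check_dependencies(pairs, deps):
    """Raise ValueError if a question has no entry in deps."""
    label_cols = build_label_cols(pairs)
    missing = [q for q in pairs if q not in deps]
    if missing:
        raise ValueError(f'Questions missing from dependencies: {missing}')
    # Assumption: a dependency should also name an answer that exists.
    unknown = [a for a in deps.values() if a is not None and a not in label_cols]
    if unknown:
        raise ValueError(f'Dependencies name unknown answers: {unknown}')


label_cols = build_label_cols(question_answer_pairs)
check_dependencies(question_answer_pairs, dependencies)  # passes silently
print(label_cols)
```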

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| schema.questions | List[Question] | Ordered list of Question objects, one per key in question_answer_pairs |
| schema.label_cols | List[str] | Flat list of all answer column names in order |
| schema.question_answer_pairs | dict | The question_answer_pairs dict passed to the constructor |
| schema.dependencies | dict | The dependencies dict passed to the constructor |
| schema.question_index_groups | List[Tuple[int, int]] | (start_index, end_index) in label_cols for each question; used for loss slicing |
| schema.answers | List[Answer] | Flat list of all Answer objects across all questions |
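
As a rough illustration of how the (start_index, end_index) pairs can be used for slicing, the sketch below splits one galaxy's vote counts by question. It is plain Python and does not call zoobot. The index groups are the values printed for the two-question example schema on this page, the vote counts are made up, and the end index is treated as inclusive because it is described as the index of the last answer.

```python
label_cols = [
    'smooth-or-featured_smooth',
    'smooth-or-featured_featured-or-disk',
    'smooth-or-featured_artifact',
    'has-spiral-arms_yes',
    'has-spiral-arms_no',
]
# Values from the two-question example schema; end index is inclusive.
question_index_groups = [(0, 2), (3, 4)]
votes = [12, 25, 3, 18, 7]  # made-up vote counts for a single galaxy

totals = []
for start, end in question_index_groups:
    cols = label_cols[start:end + 1]   # + 1 because end is inclusive
    counts = votes[start:end + 1]
    total = sum(counts)
    totals.append(total)
    fractions = [count / total for count in counts]
    print(cols, counts, fractions)

print(totals)  # [40, 25]
```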

Methods

schema.get_answer(answer_text)

Return the Answer object whose .text equals answer_text.
answer = schema.get_answer('smooth-or-featured_smooth')
print(answer.index)  # 0
Raises ValueError if the answer text is not found.

schema.get_question(question_text)

Return the Question object whose .text equals question_text.
question = schema.get_question('has-spiral-arms')
print(question.start_index, question.end_index)
Raises ValueError if the question text is not found.

schema.joint_p(prob_of_answers, answer_text)

Compute the joint probability that the question containing answer_text is asked and answer_text is then chosen, given a (galaxies, answers) array of per-answer probabilities. Useful for filtering predictions by how likely it is that a question was reached in the tree.
import numpy as np
p = np.random.dirichlet([1]*5, size=100)  # shape (100, 5)
joint = schema.joint_p(p, 'has-spiral-arms_yes')  # shape (100,)
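
The idea behind a joint probability of this kind can be sketched with NumPy alone: multiply the probability of an answer by the probabilities of the answers that must be chosen to reach its question. This is a minimal sketch under stated assumptions, not zoobot's implementation. It assumes each column holds the probability of an answer given that its question was asked, and the names parent_answer and joint_probability are hypothetical.

```python
import numpy as np

label_cols = [
    'smooth-or-featured_smooth',
    'smooth-or-featured_featured-or-disk',
    'smooth-or-featured_artifact',
    'has-spiral-arms_yes',
    'has-spiral-arms_no',
]
# Hypothetical lookup: the answer that must be chosen before each answer's
# question is asked (None for answers to the root question).
parent_answer = {
    'smooth-or-featured_smooth': None,
    'smooth-or-featured_featured-or-disk': None,
    'smooth-or-featured_artifact': None,
    'has-spiral-arms_yes': 'smooth-or-featured_featured-or-disk',
    'has-spiral-arms_no': 'smooth-or-featured_featured-or-disk',
}


def joint_probability(prob_of_answers, answer_text):
    """p(answer chosen and its question asked), one value per galaxy."""
    p_answer = prob_of_answers[:, label_cols.index(answer_text)]
    previous = parent_answer[answer_text]
    if previous is None:
        return p_answer  # root question is always asked
    # Recurse up the tree: multiply by the probability of reaching the question.
    return p_answer * joint_probability(prob_of_answers, previous)


# Two galaxies; each question's answers sum to 1 within a row.
prob_of_answers = np.array([
    [0.2, 0.7, 0.1, 0.9, 0.1],
    [0.8, 0.1, 0.1, 0.5, 0.5],
])
joint = joint_probability(prob_of_answers, 'has-spiral-arms_yes')
print(joint)  # approximately [0.63, 0.05]
```

In this toy input the second galaxy is probably smooth, so its spiral-arm answer gets a low joint probability even though the conditional probability is 0.5.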

Question

Question(question_text: str, answer_text: List[str], label_cols: List[str])
Represents a single node in the decision tree. Constructed automatically by Schema — you do not need to instantiate this class directly.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| question.text | str | The question text e.g. 'smooth-or-featured' |
| question.answers | List[Answer] | Answer objects for this question |
| question.start_index | int | Index of the first answer in label_cols |
| question.end_index | int | Index of the last answer in label_cols |
| question.asked_after | Answer or None | The Answer that leads to this question; None for the root question |

Answer

Answer(text: str, question: Question, index: int)
Represents a single answer to a question in the decision tree. An answer may lead to a further question (see next_question) or end the tree. Constructed automatically by Schema; you do not need to instantiate this class directly.

Attributes

| Attribute | Type | Description |
| --- | --- | --- |
| answer.text | str | Full answer text (question + suffix) e.g. 'smooth-or-featured_smooth' |
| answer.question | Question | The Question to which this answer belongs |
| answer.index | int | Position of this answer in label_cols; used to slice model outputs |
| answer.next_question | Question or None | The Question that follows this answer; None if the tree ends here |
| answer.pretty_text | str | Human-readable version of text, with hyphens and underscores replaced by spaces and title-cased |
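
The pretty_text transformation described above can be approximated with standard string methods. The function name prettify is hypothetical and the code is a sketch of the described behaviour, not zoobot's own code.

```python
def prettify(text: str) -> str:
    """Replace hyphens and underscores with spaces, then title-case."""
    return text.replace('-', ' ').replace('_', ' ').title()


pretty = prettify('smooth-or-featured_featured-or-disk')
print(pretty)  # Smooth Or Featured Featured Or Disk
```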

Defining a Custom Schema

from zoobot.shared.schemas import Schema

# Simple 2-question decision tree
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no']
}

# smooth-or-featured_featured-or-disk leads to has-spiral-arms
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk'
}

schema = Schema(question_answer_pairs, dependencies)

# Inspect the schema
print(schema.label_cols)
# ['smooth-or-featured_smooth', 'smooth-or-featured_featured-or-disk',
#  'smooth-or-featured_artifact', 'has-spiral-arms_yes', 'has-spiral-arms_no']

print(schema.question_index_groups)
# [(0, 2), (3, 4)]

for q in schema.questions:
    print(q)
# smooth-or-featured, indices 0 to 2, asked after None
# has-spiral-arms, indices 3 to 4, asked after smooth-or-featured_featured-or-disk
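
The dependencies dict alone is enough to trace which answers must be chosen before a question is reached. The sketch below does this in plain Python, without zoobot. The third question, 'spiral-arm-count', and the helper answers_required_for are invented here for illustration only.

```python
# Hypothetical three-question tree; 'spiral-arm-count' is invented for illustration.
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
    'spiral-arm-count': ['_1', '_2', '_more-than-2'],
}
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
    'spiral-arm-count': 'has-spiral-arms_yes',
}

# Map each full answer text back to the question it belongs to.
question_of_answer = {
    question + suffix: question
    for question, suffixes in question_answer_pairs.items()
    for suffix in suffixes
}


def answers_required_for(question_text):
    """Return the answers that must be chosen, root first, to reach a question."""
    path = []
    required = dependencies[question_text]
    while required is not None:
        path.append(required)
        # Step up to the question that this required answer belongs to.
        required = dependencies[question_of_answer[required]]
    return list(reversed(path))


path = answers_required_for('spiral-arm-count')
print(path)
# ['smooth-or-featured_featured-or-disk', 'has-spiral-arms_yes']
```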

Pre-Built Schemas

For the standard Galaxy Zoo surveys, schemas are pre-built and importable directly from zoobot.shared.schemas. There is no need to define question_answer_pairs or dependencies by hand for these datasets.

| Schema object | Survey |
| --- | --- |
| decals_dr5_ortho_schema | GZ DECaLS DR5 |
| decals_dr8_ortho_schema | GZ DECaLS DR8 |
| decals_all_campaigns_ortho_schema | GZ DECaLS all campaigns |
| gz2_ortho_schema | Galaxy Zoo 2 |
| gz_candels_ortho_schema | GZ CANDELS |
| gz_hubble_ortho_schema | GZ Hubble |
| cosmic_dawn_ortho_schema | Cosmic Dawn |
| gz_rings_schema | GZ Rings |
| desi_schema | GZ DESI (prediction use only; no orthogonal suffix) |
| gz_ukidss_schema | GZ UKIDSS |
| gz_jwst_schema | GZ JWST |
| gz_evo_v1_schema | GZ Evo v1 (current pretraining schema) |
| gz_evo_v2_schema | GZ Evo v2 (adds Euclid, updated Hubble) |

from zoobot.shared.schemas import gz2_ortho_schema, desi_schema
from zoobot.pytorch.training.finetune import FinetuneableZoobotTree

# Pass a pre-built schema to ZoobotTree or FinetuneableZoobotTree
model = FinetuneableZoobotTree(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    schema=gz2_ortho_schema,
)
The "ortho" suffix in schema names indicates that each question carries a survey-specific suffix (e.g. -dr5, -dr8) appended to question and answer texts. This allows multi-campaign training without column-name collisions. The galaxy-datasets package provides the underlying label_metadata dictionaries used to build these schemas.
See the Training on Vote Counts guide for a full worked example.
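
To see why a survey-specific suffix avoids column-name collisions, the sketch below builds suffixed copies of the same two-question tree and merges them. It is plain Python and does not reproduce how zoobot or galaxy-datasets construct the ortho schemas. The helper add_campaign_suffix is hypothetical, and placing the suffix directly after the question text is an assumption made for illustration.

```python
pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
deps = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}


def add_campaign_suffix(pairs, deps, campaign):
    """Hypothetical helper: append e.g. '-dr5' to every question text."""
    new_pairs = {q + campaign: answers for q, answers in pairs.items()}
    # Map each original full answer text to its suffixed equivalent.
    renamed = {
        q + a: q + campaign + a
        for q, answers in pairs.items()
        for a in answers
    }
    new_deps = {
        q + campaign: (None if dep is None else renamed[dep])
        for q, dep in deps.items()
    }
    return new_pairs, new_deps


dr5_pairs, dr5_deps = add_campaign_suffix(pairs, deps, '-dr5')
dr8_pairs, dr8_deps = add_campaign_suffix(pairs, deps, '-dr8')

# Merging is safe because no question text is shared between campaigns.
merged_pairs = {**dr5_pairs, **dr8_pairs}
merged_deps = {**dr5_deps, **dr8_deps}
merged_cols = [q + a for q, answers in merged_pairs.items() for a in answers]
print(merged_cols)
```

Without the suffix, both campaigns would produce the same five column names and the second dict would silently overwrite the first.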
