Schema, Question, and Answer — Decision Tree Definitions

The schemas module defines the structure of Galaxy Zoo decision trees. A Schema maps each question to its possible answers and specifies which prior answer unlocks each subsequent question. Zoobot uses this structure to apply the Dirichlet-Multinomial loss question-by-question and to route predictions through the tree during evaluation.

Import

from zoobot.shared.schemas import Schema, Question, Answer

`Schema`

Schema(question_answer_pairs: dict, dependencies: dict)

Top-level container for a Galaxy Zoo decision tree. Constructs Question and Answer objects from the provided dictionaries, resolves the dependency links between them, and exposes slicing utilities used by the loss function and metric logging.

Constructor Parameters

question_answer_pairs

dict

required

Ordered dictionary mapping each question’s text to a list of answer suffix strings. The suffixes are concatenated with the question text to form full answer column names. For example:

{
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms':    ['_yes', '_no']
}

This produces the label columns:

['smooth-or-featured_smooth', 'smooth-or-featured_featured-or-disk', 'smooth-or-featured_artifact', 'has-spiral-arms_yes', 'has-spiral-arms_no']

The ordering of questions in the dict determines their ordering in the label space.
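The concatenation rule can be sketched in plain Python. This is an illustration of the rule described here, not Zoobot's own code; `build_label_cols` is a hypothetical helper name.

```python
# Illustrative sketch only: `build_label_cols` is a hypothetical helper,
# not part of zoobot. It mirrors the "question text + suffix" rule.
def build_label_cols(question_answer_pairs):
    label_cols = []
    # dicts preserve insertion order, so question order fixes column order
    for question_text, suffixes in question_answer_pairs.items():
        for suffix in suffixes:
            label_cols.append(question_text + suffix)
    return label_cols


pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
label_cols = build_label_cols(pairs)
print(label_cols)
```

Reordering the keys of `pairs` would reorder the resulting columns, which is why the dict order matters.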

dependencies

dict

required

Dictionary mapping each question’s text to the full answer text that must be chosen before that question is asked, or None for the root question. For example:

{
    'smooth-or-featured': None,
    'has-spiral-arms':    'smooth-or-featured_featured-or-disk'
}

Every question in question_answer_pairs must appear as a key in dependencies.
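Before constructing a Schema, it can help to check the two dictionaries against each other. The sketch below is a hypothetical pre-flight check written for this page, not a Zoobot function. It assumes, beyond the rule stated here, that each non-None dependency should also be one of the full answer texts built from question_answer_pairs.

```python
# Hypothetical pre-flight check; `check_dependencies` is not part of zoobot.
def check_dependencies(question_answer_pairs, dependencies):
    # every full answer text that the pairs dict can produce
    valid_answers = {
        question_text + suffix
        for question_text, suffixes in question_answer_pairs.items()
        for suffix in suffixes
    }
    problems = []
    for question_text in question_answer_pairs:
        if question_text not in dependencies:
            problems.append(f'{question_text}: missing from dependencies')
            continue
        prior = dependencies[question_text]
        # assumption: a non-None dependency must be a known full answer text
        if prior is not None and prior not in valid_answers:
            problems.append(f'{question_text}: unknown prior answer {prior!r}')
    return problems


pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
good = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}
incomplete = {'smooth-or-featured': None}

ok_report = check_dependencies(pairs, good)
bad_report = check_dependencies(pairs, incomplete)
print(ok_report)
print(bad_report)
```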

Attributes

Attribute	Type	Description
`schema.questions`	`List[Question]`	Ordered list of `Question` objects, one per key in `question_answer_pairs`
`schema.label_cols`	`List[str]`	Flat list of all answer column names in order
`schema.question_answer_pairs`	`dict`	The `question_answer_pairs` dict passed to the constructor
`schema.dependencies`	`dict`	The `dependencies` dict passed to the constructor
`schema.question_index_groups`	`List[Tuple[int, int]]`	`(start_index, end_index)` in `label_cols` for each question, both inclusive; used for loss slicing
`schema.answers`	`List[Answer]`	Flat list of all `Answer` objects across all questions
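To make the index groups concrete, here is a minimal sketch of how inclusive `(start_index, end_index)` pairs could be derived and used to slice one row of labels per question. `index_groups` is a hypothetical helper and the vote counts are made up for illustration; this is not Zoobot's implementation.

```python
# Hypothetical helper, not part of zoobot: derive inclusive index groups
# from the number of answers each question has.
def index_groups(question_answer_pairs):
    groups = []
    start = 0
    for suffixes in question_answer_pairs.values():
        end = start + len(suffixes) - 1  # inclusive end index
        groups.append((start, end))
        start = end + 1
    return groups


pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
groups = index_groups(pairs)

# made-up vote counts for one galaxy, in label_cols order
votes = [12, 25, 3, 20, 5]
# end is inclusive, so slice to end + 1
per_question = [votes[start:end + 1] for start, end in groups]
print(groups)
print(per_question)
```

Because the end index is inclusive, slicing needs `end + 1`; forgetting this silently drops the last answer of every question.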

Methods

`schema.get_answer(answer_text)`

Return the Answer object whose .text equals answer_text.

answer = schema.get_answer('smooth-or-featured_smooth')
print(answer.index)  # 0

Raises ValueError if the answer text is not found.

`schema.get_question(question_text)`

Return the Question object whose .text equals question_text.

question = schema.get_question('has-spiral-arms')
print(question.start_index, question.end_index)  # 3 4 for the two-question example schema

Raises ValueError if the question text is not found.

`schema.joint_p(prob_of_answers, answer_text)`

Compute the joint probability that the question for answer_text is asked and that answer_text is then chosen, given a (galaxies, answers) array of per-answer probabilities. Useful for filtering predictions by how likely a question was reached in the tree.

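import numpy as np
# per-answer probabilities should sum to 1 within each question, not across all answers
p_q1 = np.random.dirichlet([1] * 3, size=100)  # smooth-or-featured, shape (100, 3)
p_q2 = np.random.dirichlet([1] * 2, size=100)  # has-spiral-arms, shape (100, 2)
p = np.concatenate([p_q1, p_q2], axis=1)  # shape (100, 5), in label_cols order
joint = schema.joint_p(p, 'has-spiral-arms_yes')  # shape (100,)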
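The idea behind the joint probability can be sketched without Zoobot. The sketch below assumes the joint probability is the product of the per-answer probabilities along the path from the root question to the requested answer. It handles a single galaxy (a flat list) rather than a (galaxies, answers) array, and `joint_p_sketch` is a hypothetical name.

```python
# Sketch only, under the stated assumption; not zoobot's implementation.
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}

label_cols = [q + s for q, suffixes in question_answer_pairs.items() for s in suffixes]
# map each full answer text back to the question it belongs to
question_of = {q + s: q for q, suffixes in question_answer_pairs.items() for s in suffixes}


def joint_p_sketch(probs, answer_text):
    # probability of this answer, given that its question was asked
    p_answer = probs[label_cols.index(answer_text)]
    prior_answer = dependencies[question_of[answer_text]]
    if prior_answer is None:
        return p_answer  # root question: always asked
    # multiply by the probability of reaching this question at all
    return p_answer * joint_p_sketch(probs, prior_answer)


# one galaxy; probabilities sum to 1 within each question
probs = [0.2, 0.7, 0.1, 0.6, 0.4]
joint = joint_p_sketch(probs, 'has-spiral-arms_yes')  # 0.6 * 0.7
print(joint)
```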

`Question`

Question(question_text: str, answer_text: List[str], label_cols: List[str])

Represents a single node in the decision tree. Constructed automatically by Schema — you do not need to instantiate this class directly.

Attributes

Attribute	Type	Description
`question.text`	`str`	The question text e.g. `'smooth-or-featured'`
`question.answers`	`List[Answer]`	`Answer` objects for this question
`question.start_index`	`int`	Index of the first answer in `label_cols`
`question.end_index`	`int`	Index of the last answer in `label_cols` (inclusive)
`question.asked_after`	`Answer | None`	The `Answer` that leads to this question; `None` for the root question

`Answer`

Answer(text: str, question: Question, index: int)

Represents a single leaf in the decision tree. Constructed automatically by Schema — you do not need to instantiate this class directly.

Attributes

Attribute	Type	Description
`answer.text`	`str`	Full answer text (question + suffix) e.g. `'smooth-or-featured_smooth'`
`answer.question`	`Question`	The `Question` to which this answer belongs
`answer.index`	`int`	Position of this answer in `label_cols`; used to slice model outputs
`answer.next_question`	`Question | None`	The `Question` that follows this answer; `None` if the tree ends here
`answer.pretty_text`	`str`	Human-readable version of `text`, with hyphens and underscores replaced by spaces and title-cased
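As a rough illustration of `next_question` and `pretty_text`, the sketch below inverts a dependencies dict to find which question each answer unlocks, and applies the text transformation described in the table. It assumes each answer unlocks at most one question; `next_question` (as a dict) and `pretty` are hypothetical names, not Zoobot code.

```python
# Sketch only; assumes each answer unlocks at most one question.
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}

# invert the mapping: prior answer text -> question it unlocks
next_question = {
    prior: question_text
    for question_text, prior in dependencies.items()
    if prior is not None
}


def pretty(text):
    # hyphens and underscores become spaces, then title-case
    return text.replace('-', ' ').replace('_', ' ').title()


unlocked = next_question.get('smooth-or-featured_featured-or-disk')
dead_end = next_question.get('smooth-or-featured_smooth')  # tree ends here
label = pretty('smooth-or-featured_smooth')
print(unlocked, dead_end, label)
```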

Defining a Custom Schema

from zoobot.shared.schemas import Schema

# Simple 2-question decision tree
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no']
}

# smooth-or-featured_featured-or-disk leads to has-spiral-arms
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk'
}

schema = Schema(question_answer_pairs, dependencies)

# Inspect the schema
print(schema.label_cols)
# ['smooth-or-featured_smooth', 'smooth-or-featured_featured-or-disk',
#  'smooth-or-featured_artifact', 'has-spiral-arms_yes', 'has-spiral-arms_no']

print(schema.question_index_groups)
# [(0, 2), (3, 4)]

for q in schema.questions:
    print(q)
# smooth-or-featured, indices 0 to 2, asked after None
# has-spiral-arms, indices 3 to 4, asked after smooth-or-featured_featured-or-disk

Pre-Built Schemas

For the standard Galaxy Zoo surveys, schemas are pre-built and importable directly from zoobot.shared.schemas. There is no need to define question_answer_pairs or dependencies by hand for these datasets.

Schema object	Survey
`decals_dr5_ortho_schema`	GZ DECaLS DR5
`decals_dr8_ortho_schema`	GZ DECaLS DR8
`decals_all_campaigns_ortho_schema`	GZ DECaLS all campaigns
`gz2_ortho_schema`	Galaxy Zoo 2
`gz_candels_ortho_schema`	GZ CANDELS
`gz_hubble_ortho_schema`	GZ Hubble
`cosmic_dawn_ortho_schema`	Cosmic Dawn
`gz_rings_schema`	GZ Rings
`desi_schema`	GZ DESI (prediction use only — no orthogonal suffix)
`gz_ukidss_schema`	GZ UKIDSS
`gz_jwst_schema`	GZ JWST
`gz_evo_v1_schema`	GZ Evo v1 (current pretraining schema)
`gz_evo_v2_schema`	GZ Evo v2 (adds Euclid, updated Hubble)

from zoobot.shared.schemas import gz2_ortho_schema, desi_schema
from zoobot.pytorch.training.finetune import FinetuneableZoobotTree

# Use with ZoobotTree or FinetuneableZoobotTree
model = FinetuneableZoobotTree(
    name='hf_hub:mwalmsley/zoobot-encoder-convnext_nano',
    schema=gz2_ortho_schema,
)

The "ortho" suffix in schema names indicates that each question carries a survey-specific suffix (e.g. -dr5, -dr8) appended to question and answer texts. This allows multi-campaign training without column-name collisions. The galaxy-datasets package provides the underlying label_metadata dictionaries used to build these schemas.
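The collision problem that the suffix solves can be shown with a small sketch. It assumes the suffix is appended to the question text, so every answer column built from that question inherits it; the exact placement in the pre-built schemas may differ, and `add_survey_suffix` is a hypothetical helper.

```python
# Sketch under the stated assumption; `add_survey_suffix` is hypothetical.
def add_survey_suffix(question_answer_pairs, survey_suffix):
    return {
        question_text + survey_suffix: suffixes
        for question_text, suffixes in question_answer_pairs.items()
    }


def label_cols_of(question_answer_pairs):
    return [q + s for q, suffixes in question_answer_pairs.items() for s in suffixes]


pairs = {'has-spiral-arms': ['_yes', '_no']}

# without suffixes, two campaigns asking the same question share column names
plain_overlap = set(label_cols_of(pairs)) & set(label_cols_of(pairs))

# with suffixes, each campaign gets its own columns
cols_dr5 = label_cols_of(add_survey_suffix(pairs, '-dr5'))
cols_dr8 = label_cols_of(add_survey_suffix(pairs, '-dr8'))
suffixed_overlap = set(cols_dr5) & set(cols_dr8)
print(cols_dr5, cols_dr8)
```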

See the Training on Vote Counts guide for a full worked example.
