The schemas module defines the structure of Galaxy Zoo decision trees. A Schema maps each question to its possible answers and specifies which prior answer unlocks each subsequent question. Zoobot uses this structure to apply the Dirichlet-Multinomial loss question-by-question and to route predictions through the tree during evaluation.
Import
Schema
A Schema builds Question and Answer objects from the provided dictionaries, resolves the dependency links between them, and exposes slicing utilities used by the loss function and metric logging.
Constructor Parameters
question_answer_pairs
Ordered dictionary mapping each question’s text to a list of answer suffix strings. The suffixes are concatenated with the question text to form full answer column names. For example, the question smooth-or-featured with suffixes _smooth, _featured-or-disk and _artifact, followed by the question has-spiral-arms with suffixes _yes and _no, produces the label columns ['smooth-or-featured_smooth', 'smooth-or-featured_featured-or-disk', 'smooth-or-featured_artifact', 'has-spiral-arms_yes', 'has-spiral-arms_no']. The ordering of questions in the dict determines their ordering in the label space.
dependencies
Dictionary mapping each question’s text to the full answer text that must be chosen before that question is asked, or None for the root question. Every question in question_answer_pairs must appear as a key in dependencies.
Attributes
| Attribute | Type | Description |
|---|---|---|
| schema.questions | List[Question] | Ordered list of Question objects, one per key in question_answer_pairs |
| schema.label_cols | List[str] | Flat list of all answer column names in order |
| schema.question_answer_pairs | dict | The question_answer_pairs dict passed to the constructor |
| schema.dependencies | dict | The dependencies dict passed to the constructor |
| schema.question_index_groups | List[Tuple[int, int]] | (start_index, end_index) in label_cols for each question; used for loss slicing |
| schema.answers | List[Answer] | Flat list of all Answer objects across all questions |
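The relationship between question_answer_pairs, label_cols and question_index_groups can be sketched in plain Python. This is an illustration only: it does not use or reproduce Zoobot's code, the helper names build_label_cols and build_index_groups are hypothetical, and it assumes that end_index is the inclusive index of a question's last answer.

```python
from typing import Dict, List, Tuple


def build_label_cols(question_answer_pairs: Dict[str, List[str]]) -> List[str]:
    """Concatenate each question text with each of its answer suffixes, in dict order."""
    return [q + suffix for q, suffixes in question_answer_pairs.items() for suffix in suffixes]


def build_index_groups(question_answer_pairs: Dict[str, List[str]]) -> List[Tuple[int, int]]:
    """Return (start_index, end_index) per question, with end_index assumed inclusive."""
    groups = []
    start = 0
    for suffixes in question_answer_pairs.values():
        end = start + len(suffixes) - 1
        groups.append((start, end))
        start = end + 1
    return groups


question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}

label_cols = build_label_cols(question_answer_pairs)
index_groups = build_index_groups(question_answer_pairs)

# Slice one galaxy's flat prediction vector into per-question chunks.
predictions = [0.7, 0.2, 0.1, 0.9, 0.1]
per_question = [predictions[start:end + 1] for start, end in index_groups]

print(label_cols)
print(index_groups)
print(per_question)
```

Because the end index is treated as inclusive here, each slice uses end + 1 as its upper bound.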
Methods
schema.get_answer(answer_text)
Return the Answer object whose .text equals answer_text.
Raises ValueError if the answer text is not found.
schema.get_question(question_text)
Return the Question object whose .text equals question_text.
Raises ValueError if the question text is not found.
schema.joint_p(prob_of_answers, answer_text)
Compute the joint probability that answer_text is both asked and chosen, given a (galaxies, answers) array of per-answer probabilities. Useful for filtering predictions by how likely a question was reached in the tree.
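The idea of a joint "asked and chosen" probability can be sketched without Zoobot. The sketch below assumes that each column of prob_of_answers holds the probability of an answer given that its question was asked, so the joint probability is the product of those conditional probabilities along the path back to the root question. The function joint_p_sketch and the lookup answer_to_question are hypothetical names for illustration, not Zoobot's implementation.

```python
import numpy as np

question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
}
# Assumed dependency for illustration: spiral arms are asked only after "featured-or-disk".
dependencies = {
    'smooth-or-featured': None,
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
}

# Map every full answer text back to its question; dict order gives the label column order.
answer_to_question = {
    q + suffix: q for q, suffixes in question_answer_pairs.items() for suffix in suffixes
}
label_cols = list(answer_to_question)


def joint_p_sketch(prob_of_answers: np.ndarray, answer_text: str) -> np.ndarray:
    """Multiply conditional answer probabilities along the path from the root question."""
    p_answer = prob_of_answers[:, label_cols.index(answer_text)]
    prior_answer = dependencies[answer_to_question[answer_text]]
    if prior_answer is None:
        return p_answer  # root question: always asked
    return p_answer * joint_p_sketch(prob_of_answers, prior_answer)


# Two galaxies, five answers (columns follow label_cols).
prob_of_answers = np.array([
    [0.125, 0.75, 0.125, 0.5, 0.5],
    [0.75, 0.25, 0.0, 0.5, 0.5],
])
p_spiral = joint_p_sketch(prob_of_answers, 'has-spiral-arms_yes')
print(p_spiral)  # approximately [0.375, 0.125]
```

The second galaxy is probably smooth, so its joint probability of having spiral arms is low even though the conditional probability is 0.5.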
Question
Question objects are created by Schema; you do not need to instantiate this class directly.
Attributes
| Attribute | Type | Description |
|---|---|---|
| question.text | str | The question text e.g. 'smooth-or-featured' |
| question.answers | List[Answer] | Answer objects for this question |
| question.start_index | int | Index of the first answer in label_cols |
| question.end_index | int | Index of the last answer in label_cols |
| question.asked_after | Answer or None | The Answer that leads to this question; None for the root question |
Answer
Answer objects are created by Schema; you do not need to instantiate this class directly.
Attributes
| Attribute | Type | Description |
|---|---|---|
| answer.text | str | Full answer text (question + suffix) e.g. 'smooth-or-featured_smooth' |
| answer.question | Question | The Question to which this answer belongs |
| answer.index | int | Position of this answer in label_cols; used to slice model outputs |
| answer.next_question | Question or None | The Question that follows this answer; None if the tree ends here |
| answer.pretty_text | str | Human-readable version of text, with hyphens and underscores replaced by spaces and title-cased |
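The pretty_text transformation described in the table can be approximated with standard string methods. This is a sketch of the described behaviour, not Zoobot's code; pretty_text_sketch is a hypothetical name.

```python
def pretty_text_sketch(text: str) -> str:
    """Replace hyphens and underscores with spaces, then title-case the result."""
    return text.replace('-', ' ').replace('_', ' ').title()


pretty = pretty_text_sketch('smooth-or-featured_smooth')
print(pretty)  # Smooth Or Featured Smooth
```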
Defining a Custom Schema
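A custom schema needs only the two dictionaries described under Constructor Parameters. The sketch below uses a hypothetical three-question tree (the bar question, its suffixes, and the chosen dependencies are invented for illustration) and checks the stated requirement that every question appears as a key in dependencies. The final step assumes Zoobot is installed and that Schema is importable from zoobot.shared.schemas; the import is guarded so the rest of the snippet runs without it.

```python
question_answer_pairs = {
    'smooth-or-featured': ['_smooth', '_featured-or-disk', '_artifact'],
    'has-spiral-arms': ['_yes', '_no'],
    'bar': ['_strong', '_weak', '_no'],  # hypothetical extra question
}
dependencies = {
    'smooth-or-featured': None,  # root question
    'has-spiral-arms': 'smooth-or-featured_featured-or-disk',
    'bar': 'smooth-or-featured_featured-or-disk',
}

# Full answer column names: question text + answer suffix, in dict order.
label_cols = [q + suffix for q, suffixes in question_answer_pairs.items() for suffix in suffixes]

# Sanity checks before building the schema.
missing = [q for q in question_answer_pairs if q not in dependencies]
unknown = [a for a in dependencies.values() if a is not None and a not in label_cols]
if missing or unknown:
    raise ValueError(f'Missing dependency keys: {missing}; unknown prior answers: {unknown}')

try:
    # Assumed import path; requires Zoobot to be installed.
    from zoobot.shared.schemas import Schema
except ImportError:
    Schema = None

if Schema is not None:
    schema = Schema(question_answer_pairs, dependencies)
    print(schema.label_cols)
```

Running the checks first gives a clearer error message than a failure partway through schema construction.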
Pre-Built Schemas
For the standard Galaxy Zoo surveys, schemas are pre-built and importable directly from zoobot.shared.schemas. There is no need to define question_answer_pairs or dependencies by hand for these datasets.
| Schema object | Survey |
|---|---|
| decals_dr5_ortho_schema | GZ DECaLS DR5 |
| decals_dr8_ortho_schema | GZ DECaLS DR8 |
| decals_all_campaigns_ortho_schema | GZ DECaLS all campaigns |
| gz2_ortho_schema | Galaxy Zoo 2 |
| gz_candels_ortho_schema | GZ CANDELS |
| gz_hubble_ortho_schema | GZ Hubble |
| cosmic_dawn_ortho_schema | Cosmic Dawn |
| gz_rings_schema | GZ Rings |
| desi_schema | GZ DESI (prediction use only; no orthogonal suffix) |
| gz_ukidss_schema | GZ UKIDSS |
| gz_jwst_schema | GZ JWST |
| gz_evo_v1_schema | GZ Evo v1 (current pretraining schema) |
| gz_evo_v2_schema | GZ Evo v2 (adds Euclid, updated Hubble) |
The "ortho" suffix in schema names indicates that each question carries a survey-specific suffix (e.g. -dr5, -dr8) appended to question and answer texts. This allows multi-campaign training without column-name collisions. The galaxy-datasets package provides the underlying label_metadata dictionaries used to build these schemas.