Zoobot’s ultimate purpose is to enable science. Beyond providing a finetunable model, the Zoobot project releases science-ready data products: compact galaxy representations suitable for unsupervised applications, and detailed volunteer-calibrated morphology catalogs covering millions of galaxies. These outputs let you do meaningful research without needing to run any deep learning yourself.
Precalculated Representations
Zoobot v2 ships with precalculated representations for every galaxy in the Galaxy Zoo DESI data release. Rather than working with raw images, you get a compact 40-dimensional PCA-compressed vector per galaxy that summarises its visual morphology.
Download: representations_pca_40_with_coords.parquet (2.5 GB)
Schema
| Column | Description |
|---|---|
| id_str | Unique galaxy identifier ({brickid}_{objid}) from DESI Legacy Surveys DR8 |
| ra | Right ascension in degrees |
| dec | Declination in degrees |
| feat_pca_0 | First PCA component of the Zoobot representation |
| feat_pca_1 | Second PCA component |
| ... | Components up to feat_pca_39 (40 total) |
id_str is formed as {brickid}_{objid}, where brickid is the unique identifier for the sky brick in the Legacy Surveys and objid is the unique identifier for the source within that brick. Use id_str to cross-match with the GZ DESI morphology catalog (below) via the dr8_id key.
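A minimal sketch of that cross-match with pandas, assuming both files have already been loaded into DataFrames. The helper name `cross_match` and the toy `smooth_fraction` column are hypothetical and used only for illustration; only `id_str`, `ra`, `dec`, `feat_pca_*` and `dr8_id` come from the descriptions above. The sketch also assumes each galaxy appears at most once in each table.

```python
import pandas as pd


def cross_match(representations: pd.DataFrame, catalog: pd.DataFrame) -> pd.DataFrame:
    """Attach catalog columns to each representation row.

    Joins representations.id_str against catalog.dr8_id. Both keys are cast
    to str so that a dtype mismatch does not silently produce zero matches.
    """
    reps = representations.assign(id_str=representations["id_str"].astype(str))
    cat = catalog.assign(dr8_id=catalog["dr8_id"].astype(str))
    # Inner join keeps only galaxies present in both tables;
    # validate raises an error if either key contains duplicates.
    return reps.merge(
        cat, left_on="id_str", right_on="dr8_id", how="inner", validate="one_to_one"
    )


# Toy stand-ins for the real files (all values are made up for illustration).
toy_reps = pd.DataFrame(
    {
        "id_str": ["1000_1", "1000_2", "2000_7"],
        "ra": [10.0, 10.1, 55.2],
        "dec": [-1.0, -1.1, 3.4],
        "feat_pca_0": [0.3, -0.2, 1.5],
    }
)
toy_catalog = pd.DataFrame(
    {
        "dr8_id": ["1000_2", "2000_7", "3000_4"],
        "smooth_fraction": [0.8, 0.1, 0.5],  # hypothetical column name
    }
)

matched = cross_match(toy_reps, toy_catalog)
```

With the real files, the two DataFrames would typically come from `pd.read_parquet`, and passing its `columns` argument to load only the columns you need can reduce memory use considerably for a file of this size.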
Use Cases
The precalculated representations are well-suited for tasks such as:
- Similarity search: find galaxies that look like a query example
- Anomaly detection: identify rare or unusual morphologies at scale
- Multi-modal models: use the representation as the vision branch alongside spectroscopic or photometric data
- Any application that needs a short vector summarising the morphology of a galaxy image
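As a rough illustration of the first two use cases, the sketch below runs a brute-force similarity search and a simple k-nearest-neighbour anomaly score on a random array standing in for the `feat_pca_*` columns. The function names, the choice of Euclidean distance, and the k-NN scoring rule are assumptions made for this example, not something the Zoobot project prescribes.

```python
import numpy as np


def most_similar(features: np.ndarray, query_index: int, k: int = 5) -> np.ndarray:
    """Return indices of the k rows closest to the query row (Euclidean)."""
    distances = np.linalg.norm(features - features[query_index], axis=1)
    distances[query_index] = np.inf  # exclude the query itself
    return np.argsort(distances)[:k]


def knn_anomaly_score(features: np.ndarray, k: int = 5) -> np.ndarray:
    """Mean distance from each row to its k nearest neighbours.

    Brute force with O(N^2) memory: fine for a subsample, not for
    millions of rows.
    """
    diffs = features[:, None, :] - features[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    np.fill_diagonal(distances, np.inf)  # ignore self-distances
    nearest = np.sort(distances, axis=1)[:, :k]
    return nearest.mean(axis=1)


# Random stand-in for an (n_galaxies, 40) array of feat_pca_* values.
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 40))
toy[1] = toy[0] + 0.01  # near-duplicate of row 0
toy[199] = 25.0  # row far from everything else

neighbours = most_similar(toy, query_index=0, k=3)
scores = knn_anomaly_score(toy, k=5)
```

On the toy data, row 1 comes back as the closest match to row 0, and row 199 receives the highest anomaly score. At the scale of the full release, a tree-based or approximate nearest-neighbour index would likely be a better fit than this brute-force version.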
Galaxy Zoo Morphology Catalogs
GZ DESI — 8.7 Million Galaxies
Zoobot was used to produce a detailed morphology catalog for every extended galaxy brighter than r = 19 in the DESI Legacy Surveys, 8.7 million galaxies in total. The catalog and full schema are available from Zenodo:
Download: https://zenodo.org/records/8360385
If you are new to the catalog, start with gz_desi_deep_learning_catalog_friendly.parquet. This file contains the most useful columns in a ready-to-use format, without requiring familiarity with the full schema.
GZ DECaLS DR5 (Superseded)
A previous Zoobot-powered morphology catalog was created for DECaLS DR5:
Download: https://zenodo.org/records/4573248
Future Catalogs
The Zoobot team is actively working on expanding coverage to additional surveys. Planned releases, roughly in order of priority:
- DESI-LS DR10: an updated morphology catalog using the full DR10 footprint (image redownload in progress)
- HSC: Hyper Suprime-Cam morphologies at greater depth
- JWST: high-redshift morphology measurements with JWST imaging
- Euclid: wide-field morphology from the Euclid satellite
Zoobot is already deployed in the Euclid processing pipeline to produce the OU-MER morphology catalog. The first public results from Euclid Q1 are documented in Euclid preparation: Measuring detailed galaxy morphologies for Euclid with Machine Learning (2024) and the Euclid Q1 first visual morphology catalogue (2025).