System components
BOOM’s architecture consists of several key components:
Kafka consumers
Read alerts from astronomical survey Kafka topics and transfer them to Redis/Valkey in-memory queues for processing.
Alert workers
Format alerts into BSON documents, enrich them with catalog crossmatches, and store them in MongoDB.
Enrichment workers
Run machine learning classifiers on alerts and write classification results back to the database.
Technology stack
Why Redis/Valkey as a cache and task queue?
BOOM uses Redis/Valkey for in-memory queueing between pipeline stages:
- Well maintained: Strong community support and excellent documentation across multiple programming languages
- Performance: Fast in-memory operations that can handle high-throughput alert streams
- Versatility: Serves as cache, task queue, and message broker - reducing system complexity
- Simplicity: Avoids the operational complexity, memory leaks, and maintenance overhead that come with heavier options such as Celery, Dask, or RabbitMQ
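The hand-off between stages can be modeled with a minimal std-only sketch (a `VecDeque` standing in for the Redis/Valkey list; in BOOM the consumer pushes raw Avro payloads onto the list and workers pop them off):

```rust
use std::collections::VecDeque;

/// Stand-in for the Redis/Valkey list that buffers alerts between stages.
/// A VecDeque models the same FIFO hand-off in-process: the Kafka-consumer
/// stage enqueues raw payloads, the worker stage dequeues them.
pub struct AlertQueue {
    inner: VecDeque<Vec<u8>>,
}

impl AlertQueue {
    pub fn new() -> Self {
        Self { inner: VecDeque::new() }
    }

    /// Kafka-consumer side: enqueue a raw alert payload.
    pub fn push(&mut self, payload: Vec<u8>) {
        self.inner.push_back(payload);
    }

    /// Worker side: dequeue the oldest payload, if any.
    pub fn pop(&mut self) -> Option<Vec<u8>> {
        self.inner.pop_front()
    }

    /// Current backlog, the "queue depth" that BOOM exposes as a metric.
    pub fn depth(&self) -> usize {
        self.inner.len()
    }
}
```

The real queue lives in Redis/Valkey rather than in-process so that consumers and workers can run as separate OS processes and be scaled independently.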
Why MongoDB as the database?
MongoDB proved successful in Kowalski, another broker that inspired BOOM:
- Multi-language support: Excellent client libraries across programming languages
- Schema flexibility: No migrations needed when adding new astronomical catalogs for crossmatching
- Rich query language: Complex aggregation pipelines perfect for alert filtering
- Performance: Fast queries and indexing for high-volume data
Unlike PostgreSQL, MongoDB doesn’t require enforcing a rigid schema or running migrations whenever new catalog fields are added.
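To illustrate the schema-flexibility point, here is a std-only sketch (field names are illustrative, not BOOM's actual document layout): with a document store, an alert is just a map of fields, so results from a newly added catalog can be attached without altering a schema or running a migration.

```rust
use std::collections::BTreeMap;

/// Attach a crossmatch result from a (possibly brand-new) catalog to an
/// alert document. No schema change is needed: the new field simply
/// appears on documents that have it, mimicking MongoDB dot-notation keys.
pub fn attach_crossmatch(
    mut alert: BTreeMap<String, String>,
    catalog: &str,
    match_id: &str,
) -> BTreeMap<String, String> {
    alert.insert(format!("cross_matches.{catalog}"), match_id.to_string());
    alert
}
```

In a relational database the same change would require an `ALTER TABLE` (or a new join table) before any row could carry the new catalog's data.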
Why Kafka as the message broker?
Kafka is the standard for astronomical alert distribution:
- Industry standard: Used by ZTF, LSST, and other major astronomical surveys
- Scalability: Handles massive data volumes with fault tolerance
- Ecosystem: Rich tooling and library support
- Compatibility: Downstream services expect Kafka topics
BOOM keeps the internal cache/task queue (Redis/Valkey) separate from the public-facing message broker (Kafka) for better modularity.
Avro schema handling
Working with Avro and Rust
BOOM uses the rsgen-avro crate to generate Rust structs from Avro schemas:
Download schemas
Get the latest Avro schemas for your survey:
ZTF Rust structs are already generated in src/types.rs. The Alert struct includes a custom from_avro_bytes method for deserialization.
Schema registry
For surveys like LSST that use schema registries, BOOM implements the SchemaRegistry struct (src/alert/base.rs:242-543) with:
- Caching: Stores fetched schemas to avoid repeated network requests
- GitHub fallback: Falls back to GitHub when the registry is unavailable
- Version management: Handles schema versioning and evolution
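The caching idea can be sketched in a few lines of std-only Rust (names here are illustrative, not the real SchemaRegistry API): look up a schema by id in a local cache first, and only on a miss invoke a fetcher, which in the real code is the registry HTTP request with a GitHub fallback.

```rust
use std::collections::HashMap;

/// Minimal cache-plus-fetch sketch: schemas are fetched at most once per
/// id; later lookups are served from memory, avoiding repeated network
/// round-trips.
pub struct SchemaCache {
    cache: HashMap<u32, String>,
}

impl SchemaCache {
    pub fn new() -> Self {
        Self { cache: HashMap::new() }
    }

    /// Return the cached schema for `id`, calling `fetch` only on a miss.
    pub fn get_or_fetch<F>(&mut self, id: u32, fetch: F) -> Result<&String, String>
    where
        F: FnOnce(u32) -> Result<String, String>,
    {
        if !self.cache.contains_key(&id) {
            let schema = fetch(id)?;
            self.cache.insert(id, schema);
        }
        Ok(self.cache.get(&id).unwrap())
    }
}
```

Because the fetcher is only invoked on a cache miss, a registry outage after the first successful fetch does not affect already-seen schema ids.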
BSON serialization
BOOM serializes Rust structs to BSON before writing to MongoDB:
Why not use Rust structs directly?
While MongoDB’s Rust driver can serialize and deserialize structs directly, Rust structs cannot dynamically drop null fields, which would leave many null fields in the database. Instead, BOOM:
- Serializes Rust structs to BSON documents
- Sanitizes documents (removes null fields)
- Writes clean documents to MongoDB
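The sanitize step can be sketched with a tiny std-only stand-in for a BSON document (in BOOM this runs on real BSON documents; the `Field` enum here is illustrative):

```rust
use std::collections::BTreeMap;

/// Tiny stand-in for a BSON document value: a field is either null or a
/// string. Real BSON has many more value types, but the sanitize logic
/// is the same idea.
#[derive(Clone, PartialEq, Debug)]
pub enum Field {
    Null,
    Str(String),
}

/// Drop every null field before the document is written, so absent
/// values take up no storage and don't pollute queries.
pub fn remove_null_fields(doc: BTreeMap<String, Field>) -> BTreeMap<String, Field> {
    doc.into_iter()
        .filter(|(_, v)| *v != Field::Null)
        .collect()
}
```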
Data flow
Here’s how an alert flows through BOOM:
Processing
Alert workers deserialize alerts, perform catalog crossmatches, and store them in MongoDB.
Enrichment
Enrichment workers retrieve alerts, run ML classifiers, and update classification results.
Scheduler architecture
The scheduler (src/bin/scheduler.rs) manages all worker pools:
- Spawns worker pools for each stage
- Monitors worker health with heartbeat logs every 60 seconds
- Handles graceful shutdown on SIGINT (Ctrl+C)
- Initializes database indexes for the survey
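The graceful-shutdown pattern can be sketched in std-only Rust (a simplification of what the scheduler does; in BOOM the flag is flipped by the SIGINT handler): each worker polls a shared atomic flag on every loop iteration and exits cleanly once it is set.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

/// Spawn a worker that processes batches until the shared shutdown flag
/// is set, then returns how many batches it handled.
pub fn run_until_shutdown(shutdown: Arc<AtomicBool>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut batches: u64 = 0;
        while !shutdown.load(Ordering::Relaxed) {
            // ... pop a batch from the queue and process it here ...
            batches += 1;
            thread::yield_now();
        }
        batches
    })
}
```

Because each worker checks the flag between batches rather than being killed mid-batch, no alert is left half-processed on shutdown.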
Observability
BOOM includes comprehensive observability features:
Metrics
Prometheus metrics are exposed on port 9090 for:
- Alert processing throughput
- Worker activity and batch sizes
- Error rates by stage
- Queue depths
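As a std-only sketch of how a throughput counter ends up on the metrics endpoint (the metric name below is illustrative, not necessarily what BOOM exposes): workers bump an atomic counter, and the exporter renders it in the Prometheus text exposition format.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Global throughput counter, incremented once per processed alert.
pub static ALERTS_PROCESSED: AtomicU64 = AtomicU64::new(0);

pub fn record_alert_processed() {
    ALERTS_PROCESSED.fetch_add(1, Ordering::Relaxed);
}

/// Render the counter in the Prometheus text exposition format, as it
/// would appear in the body served on the metrics port.
pub fn render_metrics() -> String {
    format!(
        "# TYPE boom_alerts_processed_total counter\nboom_alerts_processed_total {}\n",
        ALERTS_PROCESSED.load(Ordering::Relaxed)
    )
}
```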
Logging
Configurable logging via the RUST_LOG environment variable:
The “close” span event is particularly useful as it includes execution time information.
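For example, RUST_LOG accepts comma-separated per-module filter directives (the module path below is illustrative):

```shell
# Info-level logs everywhere, debug-level for the boom crate.
export RUST_LOG="info,boom=debug"
echo "$RUST_LOG"
```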