Alpha Leak scores every signal through LightGBM gradient boosting models compiled to ONNX format. They run in-process via `onnxruntime-node` — no separate inference server, no network hop. Inference latency stays in the low milliseconds even at high signal throughput, and models can be swapped without restarting the pipeline.
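For orientation, a minimal sketch of an in-process scoring call with `onnxruntime-node` (the model path, tensor shape, and output layout are illustrative; which declared output carries the class probabilities depends on how the model was exported):

```typescript
import * as ort from 'onnxruntime-node';

// In practice the session is created once and reused; shown inline for brevity.
async function scoreOnce(featureVector: Float32Array): Promise<Float32Array> {
  const session = await ort.InferenceSession.create('src/ml/models/reach_2x_1h.onnx');

  // LightGBM classifiers exported to ONNX take a [1, n_features] float tensor.
  const input = new ort.Tensor('float32', featureVector, [1, featureVector.length]);
  const results = await session.run({ [session.inputNames[0]]: input });

  // Everything happens inside the Node process: no inference server, no network hop.
  return results[session.outputNames[0]].data as Float32Array;
}
```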
Model families
Two families of models operate concurrently. Both use LightGBM multiclass or binary classification, compiled to ONNX and calibrated with Platt scaling. Standard models score signals produced by tracked wallet buys. They answer: given what we know about this wallet, this token, and the current market, what is the probability that the price reaches X× within Y minutes? Genesis models score newly created tokens based on their first-60-second behaviour. They answer: given the launch dynamics of this token, what is the probability it reaches X× within Y minutes? Genesis models use a separate 75-feature dataset assembled by the `GenesisWatcher` service.
Standard model targets
| Model file | Target | Use case |
|---|---|---|
| `reach_2x_1h.onnx` | Probability of 2× in 1 hour | Primary strategy: `reach_2x_1h` |
| `reach_3x_30m.onnx` | Probability of 3× in 30 minutes | Phase 2+ strategy: `reach_3x_30m` |
| `reach_2x_10m.onnx` | Probability of 2× in 10 minutes | Phase 3 strategy: `reach_2x_10m` |
| `is_dead_soon.onnx` | Probability of imminent death | Veto signal, combined with others |
ONNX deployment
Each model is stored as a pair of files in `src/ml/models/`: the compiled `.onnx` model and a metadata file holding its Platt calibration constants.
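As a rough sketch of the shape of that metadata (only `platt_a` and `platt_b` are documented on this page; the other fields are illustrative):

```typescript
// Hypothetical shape of a model's metadata sidecar. Only platt_a and platt_b
// are named on this page; the remaining fields are illustrative guesses.
interface ModelMetadata {
  platt_a: number;          // Platt scaling slope fitted on the held-out set
  platt_b: number;          // Platt scaling intercept
  feature_order?: string[]; // canonical FEATURE_ORDER used at training time
  target?: string;          // e.g. "reach_2x_1h"
}
```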
Hot reloading
Models are scanned every 5 minutes. If a new `.onnx` file appears in the models directory that is not already loaded, it is loaded and added to the active model set. ONNX sessions from superseded model versions are released to free memory.
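A sketch of that reload loop, assuming loaded sessions are kept in a map keyed by file name (the actual service structure may differ):

```typescript
import { readdir } from 'node:fs/promises';
import * as path from 'node:path';
import * as ort from 'onnxruntime-node';

const MODELS_DIR = 'src/ml/models';
const loaded = new Map<string, ort.InferenceSession>();

async function rescanModels(): Promise<void> {
  const files = (await readdir(MODELS_DIR)).filter((f) => f.endsWith('.onnx'));

  // Load any .onnx file that is not already in the active model set.
  for (const file of files) {
    if (!loaded.has(file)) {
      loaded.set(file, await ort.InferenceSession.create(path.join(MODELS_DIR, file)));
    }
  }

  // Release sessions whose file has disappeared (superseded versions) to free memory.
  for (const [file, session] of loaded) {
    if (!files.includes(file)) {
      await session.release();
      loaded.delete(file);
    }
  }
}

// Scan every 5 minutes, matching the interval described above.
setInterval(() => void rescanModels(), 5 * 60 * 1000);
```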
Calibration
LightGBM models output raw class probabilities that are often miscalibrated. The model may output 0.70 for signals that actually hit the target only 50% of the time. Every model is calibrated post-training on a held-out set using Platt scaling: `platt_a` and `platt_b` constants are stored in the model’s metadata file and applied at inference time. After calibration, a model output of 0.80 means approximately 80% of signals at that score level actually hit the target — which is what makes the threshold values in strategy configs meaningful rather than arbitrary.
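In code the calibration is a single fitted sigmoid over the raw score; a minimal sketch, with the sign convention assumed (it must match however `platt_a` and `platt_b` were fitted):

```typescript
// Platt scaling: squash the raw model probability through a fitted sigmoid.
// The sign convention here is an assumption and must match the fitting code.
function plattCalibrate(raw: number, plattA: number, plattB: number): number {
  return 1 / (1 + Math.exp(-(plattA * raw + plattB)));
}
```

With constants fitted on the held-out set, an overconfident raw 0.70 gets pulled down toward the ~50% hit rate described above.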
Inference pipeline
Every 5 seconds, `MlInference` fetches unscored signals from the database and runs them through all loaded models.
Assemble the feature vector
The 68-feature vector is assembled in canonical `FEATURE_ORDER`, with default values substituted for any missing data. Feature order must exactly match the order used during training.

Run the ONNX session
The ONNX session is executed. For classification models, the output is a `[1, 2]` probability tensor. The second element — P(class=1) — is extracted as the raw score.

Apply Platt calibration
The raw probability is passed through the sigmoid transform using the `platt_a` and `platt_b` values from the model’s metadata file.
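Taken together, scoring one signal against one model might look roughly like this; the feature names, signal record shape, and the index of the probability output are assumptions beyond what is stated above:

```typescript
import * as ort from 'onnxruntime-node';

// Canonical feature order (names here are illustrative); must match training exactly.
const FEATURE_ORDER = ['wallet_win_rate', 'token_age_seconds' /* ... 68 entries ... */];
const FEATURE_DEFAULTS: Record<string, number> = {};

async function scoreSignal(
  signal: Record<string, number | null | undefined>,
  session: ort.InferenceSession,
  meta: { platt_a: number; platt_b: number },
): Promise<number> {
  // 1. Assemble the feature vector in canonical order, substituting defaults.
  const vector = Float32Array.from(
    FEATURE_ORDER.map((name) => signal[name] ?? FEATURE_DEFAULTS[name] ?? 0),
  );

  // 2. Run the ONNX session and take P(class=1) from the [1, 2] probability tensor.
  const input = new ort.Tensor('float32', vector, [1, vector.length]);
  const results = await session.run({ [session.inputNames[0]]: input });
  const probs = results[session.outputNames[0]].data as Float32Array;
  const raw = probs[1];

  // 3. Apply Platt calibration from the model's metadata file.
  return 1 / (1 + Math.exp(-(meta.platt_a * raw + meta.platt_b)));
}
```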
Composite scoring
When multiple models are loaded, the inference service runs them all and stores their scores independently:

| Field | Source model | Description |
|---|---|---|
| `ml_score_1h` | `reach_2x_1h.onnx` | Calibrated probability of 2× in 1 hour |
| `ml_score_30m` | `reach_3x_30m.onnx` | Calibrated probability of 3× in 30 minutes |
| `ml_score_10m` | `reach_2x_10m.onnx` | Calibrated probability of 2× in 10 minutes |
| `dead_prob` | `is_dead_soon.onnx` | Calibrated probability of imminent token death |
A signal with `ml_score_1h = 0.85` and `dead_prob = 0.03` is a strong candidate for the `reach_2x_1h` strategy. The same signal with `dead_prob = 0.40` is rejected regardless of its `ml_score_1h`.
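A sketch of how a strategy gate might combine these fields; the 0.75 and 0.30 thresholds are placeholders, not the configured values:

```typescript
interface SignalScores {
  ml_score_1h: number | null;
  dead_prob: number | null;
}

// Hypothetical gate for reach_2x_1h: require a strong calibrated 1-hour score
// and veto anything the death model flags. Thresholds are placeholders; the
// real values live in the strategy config.
function passesReach2x1h(s: SignalScores, minScore = 0.75, maxDeadProb = 0.3): boolean {
  if ((s.dead_prob ?? 0) > maxDeadProb) return false; // veto regardless of ml_score_1h
  return (s.ml_score_1h ?? 0) >= minScore;
}
```

With these placeholder thresholds, the 0.85 / 0.03 signal above passes and the 0.85 / 0.40 signal is vetoed.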
Fallback behaviour
If ONNX runtime is not installed or no models are present, the pipeline degrades gracefully:

- `MlInference` logs a warning and disables itself
- Signals receive `ml_score = NULL`
- The live trader falls back to `COALESCE(ml_score_1h, rule_score, 0)`, using the rule-based signal score as its decision metric
- All other pipeline services continue operating normally
This means the pipeline can run without ML models from day one, collecting labelled data for future training while still trading on rule-based signals.
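For illustration, the decision-metric fallback might surface in the trader's scoring query along these lines (the table name and extra columns are placeholders; only the `COALESCE` chain is taken from above):

```typescript
// Illustrative only: table and column names beyond ml_score_1h and rule_score
// are placeholders. With no models loaded, ml_score_1h is NULL and COALESCE
// falls through to the rule-based score, then to 0.
const decisionScoreSql = `
  SELECT id, COALESCE(ml_score_1h, rule_score, 0) AS decision_score
  FROM signals
  ORDER BY decision_score DESC
`;
```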