
Alpha Leak scores every signal through LightGBM gradient boosting models compiled to ONNX format. They run in-process via onnxruntime-node — no separate inference server, no network hop. Inference latency stays in the low milliseconds even at high signal throughput, and models can be swapped without restarting the pipeline.

Model families

Two families of models operate concurrently. Both use LightGBM multiclass or binary classification, compiled to ONNX and calibrated with Platt scaling.

Standard models score signals produced by tracked wallet buys. They answer: given what we know about this wallet, this token, and the current market, what is the probability that the price reaches X× within Y minutes?

Genesis models score newly created tokens based on their first-60-second behaviour. They answer: given the launch dynamics of this token, what is the probability it reaches X× within Y minutes? Genesis models use a separate 75-feature dataset assembled by the GenesisWatcher service.

Standard model targets

| Model file | Target | Use case |
| --- | --- | --- |
| `reach_2x_1h.onnx` | Probability of 2× in 1 hour | Primary strategy: `reach_2x_1h` |
| `reach_3x_30m.onnx` | Probability of 3× in 30 minutes | Phase 2+ strategy: `reach_3x_30m` |
| `reach_2x_10m.onnx` | Probability of 2× in 10 minutes | Phase 3 strategy: `reach_2x_10m` |
| `is_dead_soon.onnx` | Probability of imminent death | Veto signal, combined with others |

ONNX deployment

Each model is stored as a pair of files in src/ml/models/:
```
src/ml/models/
  reach_2x_1h.onnx              # Compiled model weights
  reach_2x_1h_metadata.json     # Feature list, calibration params, PR-AUC
```
The metadata file is essential. It contains the ordered feature list that the inference code uses to assemble the feature vector in the exact order the model was trained with. A mismatch in feature order would silently corrupt every prediction.
```json
{
  "model_id": "reach_2x_1h_v3",
  "model_type": "classification",
  "target": "reach_2x_1h",
  "version": 3,
  "feature_names": ["alpha_score", "wallet_graduation_rate", "..."],
  "feature_count": 68,
  "calibration": {
    "method": "platt",
    "platt_a": 1.42,
    "platt_b": -0.31
  },
  "pr_auc": 0.34
}
```
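As a sketch of how the metadata might be consumed, the helper below loads the file and sanity-checks it so a feature-order mismatch fails loudly instead of silently corrupting predictions. The `ModelMetadata` interface and `loadMetadata` name are illustrative, not the codebase's actual API.

```typescript
import * as fs from "fs";

// Minimal shape of the metadata file (field names from the example above).
interface ModelMetadata {
  model_id: string;
  feature_names: string[];
  feature_count: number;
  calibration: { method: string; platt_a: number; platt_b: number };
  pr_auc: number;
}

// Load the metadata JSON that sits next to the .onnx file and verify that
// the declared feature count matches the ordered feature list.
function loadMetadata(metadataPath: string): ModelMetadata {
  const meta = JSON.parse(
    fs.readFileSync(metadataPath, "utf8")
  ) as ModelMetadata;
  if (meta.feature_names.length !== meta.feature_count) {
    throw new Error(
      `${meta.model_id}: feature_names length ${meta.feature_names.length} ` +
        `does not match feature_count ${meta.feature_count}`
    );
  }
  return meta;
}
```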

Hot reloading

The models directory is scanned every 5 minutes. Any new .onnx file that is not already loaded is loaded and added to the active model set. ONNX sessions from superseded model versions are released to free memory.
This means you can deploy a retrained model by dropping the new .onnx and _metadata.json files into the models directory. The pipeline picks them up within 5 minutes — no restart required.
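A minimal sketch of that scan, assuming the directory layout shown above. The `scanModels` function, the `loaded` map, and the simplification that a loaded model is released when its file disappears are all illustrative; the real service also handles version supersession.

```typescript
import * as fs from "fs";

// Active model set keyed by file name. In the real pipeline the values
// would be ONNX sessions; `unknown` keeps this sketch self-contained.
const loaded = new Map<string, unknown>();

// Compare the directory against the active set: which .onnx files still
// need loading, and which loaded entries no longer exist on disk and can
// have their sessions released.
function scanModels(dir: string): { toLoad: string[]; toRelease: string[] } {
  const present = new Set(
    fs.readdirSync(dir).filter((f) => f.endsWith(".onnx"))
  );
  const toLoad = [...present].filter((f) => !loaded.has(f));
  const toRelease = [...loaded.keys()].filter((f) => !present.has(f));
  return { toLoad, toRelease };
}

// In the pipeline this would run on a timer, e.g.:
// setInterval(() => applyScan(scanModels("src/ml/models")), 5 * 60 * 1000);
```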

Calibration

LightGBM models output raw class probabilities that are often miscalibrated. The model may output 0.70 for signals that actually hit the target only 50% of the time. Every model is calibrated post-training on a held-out set using Platt scaling:
```
calibrated_prob = σ(platt_a × raw_prob + platt_b)
                = 1 / (1 + exp(-(platt_a × raw_prob + platt_b)))
```
The platt_a and platt_b constants are stored in the model’s metadata file and applied at inference time. After calibration, a model output of 0.80 means approximately 80% of signals at that score level actually hit the target — which is what makes the threshold values in strategy configs meaningful rather than arbitrary.
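The transform is small enough to show directly. This helper is a sketch, but the arithmetic matches the formula above:

```typescript
// Apply Platt scaling to a raw model probability using the platt_a and
// platt_b constants stored in the model's metadata file.
function plattCalibrate(raw: number, plattA: number, plattB: number): number {
  return 1 / (1 + Math.exp(-(plattA * raw + plattB)));
}
```

With the constants from the example metadata (platt_a = 1.42, platt_b = -0.31), a raw output of 0.70 maps to roughly 0.66.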

Inference pipeline

Every 5 seconds, MlInference fetches unscored signals from the database and runs them through all loaded models.
1. Assemble the feature vector: The 68-feature vector is assembled in canonical FEATURE_ORDER, with default values substituted for any missing data. Feature order must exactly match the order used during training.
2. Create the input tensor: Each model's input tensor is created as a Float32Array of shape [1, 68].
3. Run the ONNX session: For classification models, the output is a [1, 2] probability tensor. The second element, P(class=1), is extracted as the raw score.
4. Apply Platt calibration: The raw probability is passed through the sigmoid transform using the platt_a and platt_b values from the model's metadata file.
5. Write scores to the database: The calibrated score is written back to the signals table for each model that ran.
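Steps 1 through 4 can be sketched as a single scoring function. In the real pipeline, onnxruntime-node's `InferenceSession` and `Tensor` would fill the `session` and `makeTensor` roles (e.g. `makeTensor = (d, dims) => new ort.Tensor("float32", d, dims)`); they are abstracted behind small interfaces here so the sketch stays self-contained. `scoreSignal` and the two-entry `FEATURE_ORDER` are illustrative.

```typescript
// The slice of the session API this sketch touches.
interface ProbTensor { data: Float32Array }        // [1, 2] class probabilities
interface Session {
  inputNames: readonly string[];
  run(feeds: Record<string, unknown>): Promise<Record<string, ProbTensor>>;
}

// Canonical feature order from training (truncated for illustration).
const FEATURE_ORDER = ["alpha_score", "wallet_graduation_rate" /* ... */];

async function scoreSignal(
  session: Session,
  features: Record<string, number>,
  platt: { platt_a: number; platt_b: number },
  makeTensor: (data: Float32Array, dims: number[]) => unknown
): Promise<number> {
  // 1. Assemble the vector in canonical order, defaulting missing features to 0.
  const vec = new Float32Array(FEATURE_ORDER.length);
  FEATURE_ORDER.forEach((name, i) => (vec[i] = features[name] ?? 0));

  // 2. Create the [1, N] input tensor.
  const tensor = makeTensor(vec, [1, FEATURE_ORDER.length]);

  // 3. Run the session and extract P(class=1) from the [1, 2] output.
  const outputs = await session.run({ [session.inputNames[0]]: tensor });
  const raw = Object.values(outputs)[0].data[1];

  // 4. Apply Platt calibration.
  return 1 / (1 + Math.exp(-(platt.platt_a * raw + platt.platt_b)));
}
```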

Composite scoring

When multiple models are loaded, the inference service runs them all and stores their scores independently:
| Field | Source model | Description |
| --- | --- | --- |
| `ml_score_1h` | `reach_2x_1h.onnx` | Calibrated probability of 2× in 1 hour |
| `ml_score_30m` | `reach_3x_30m.onnx` | Calibrated probability of 3× in 30 minutes |
| `ml_score_10m` | `reach_2x_10m.onnx` | Calibrated probability of 2× in 10 minutes |
| `dead_prob` | `is_dead_soon.onnx` | Calibrated probability of imminent token death |
The live trader reads the score for its specific strategy target. A signal with ml_score_1h = 0.85 and dead_prob = 0.03 is a strong candidate for the reach_2x_1h strategy. The same signal with dead_prob = 0.40 is rejected regardless of its ml_score_1h.
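The gating described above can be sketched as follows. The function name and both threshold values are placeholders, not the production strategy config; the one property taken from the text is that the death veto wins regardless of score.

```typescript
interface ScoredSignal {
  ml_score_1h: number | null;
  dead_prob: number | null;
}

// Accept a signal for the reach_2x_1h strategy only if its calibrated score
// clears the strategy threshold AND the death veto does not fire.
function passesGate(
  s: ScoredSignal,
  minScore = 0.8,    // placeholder strategy threshold
  maxDeadProb = 0.2  // placeholder veto threshold
): boolean {
  if ((s.dead_prob ?? 0) >= maxDeadProb) return false; // veto wins, always
  return (s.ml_score_1h ?? 0) >= minScore;
}
```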

Fallback behaviour

If the ONNX runtime is not installed or no models are present, the pipeline degrades gracefully:
  • MlInference logs a warning and disables itself
  • Signals receive ml_score = NULL
  • The live trader falls back to COALESCE(ml_score_1h, rule_score, 0), using the rule-based signal score as its decision metric
  • All other pipeline services continue operating normally
This means the pipeline can run without ML models from day one, collecting labelled data for future training while still trading on rule-based signals.
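The COALESCE fallback, expressed as a TypeScript sketch (the function name is illustrative):

```typescript
// Mirror of COALESCE(ml_score_1h, rule_score, 0): prefer the calibrated ML
// score, fall back to the rule-based score, and finally to 0.
function decisionScore(
  mlScore1h: number | null,
  ruleScore: number | null
): number {
  return mlScore1h ?? ruleScore ?? 0;
}
```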
