ML feature reference: the 68-feature signal vector
Complete reference for all 68 features used in Alpha Leak’s standard signal scoring, organized by category with source, description, and default value policy.
The standard model consumes a 68-feature vector assembled at inference time from multiple data sources: wallet history, signal-level event data, creator intelligence, token state, social graph scores, and live market context. Every feature is computed by the same code path in production as during training, which removes training/serving skew caused by divergent feature logic.
Features must be assembled in exactly the order defined by the model’s feature_names metadata array. The FEATURE_ORDER constant in the codebase is the canonical source of truth. Assembling features in a different order will not produce a runtime error — the model will silently generate incorrect predictions.
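Because a wrong order fails silently, it can help to check the order explicitly before building the input. The following is a minimal sketch, not the project's actual assembly code: the three-entry FEATURE_ORDER and the assemble_vector helper are hypothetical stand-ins for the real 68-entry constant and whatever assembler the codebase uses.

```python
from typing import Mapping, Sequence

# Hypothetical three-feature order for illustration only;
# the real FEATURE_ORDER constant has 68 entries.
FEATURE_ORDER: Sequence[str] = (
    "alpha_score",
    "wallet_graduation_rate",
    "wallet_avg_buy_rank",
)


def assemble_vector(
    features: Mapping[str, float],
    model_feature_names: Sequence[str],
) -> list[float]:
    """Return values in FEATURE_ORDER, failing loudly on any mismatch."""
    # Guard against the silent failure mode: order drift between code and model.
    if list(model_feature_names) != list(FEATURE_ORDER):
        raise ValueError("FEATURE_ORDER does not match the model's feature_names")
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    # Index by name so the dict's own ordering never matters.
    return [float(features[name]) for name in FEATURE_ORDER]


raw = {"wallet_avg_buy_rank": 12, "alpha_score": 71.5, "wallet_graduation_rate": 0.08}
vector = assemble_vector(
    raw, ["alpha_score", "wallet_graduation_rate", "wallet_avg_buy_rank"]
)
```

Here the input dict is deliberately out of order, and the result still follows FEATURE_ORDER. Comparing against the model's metadata at startup turns a silent prediction error into an immediate exception.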
These features capture the wallet’s stats as they were at the moment the signal fired — not their current values. The point-in-time snapshot is what prevents lookahead bias: the model only sees information that was actually available when the trade decision would have been made.
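One way to picture the point-in-time lookup is a search for the most recent snapshot at or before the signal timestamp. This is a sketch under assumptions: the (timestamp, stats) tuple layout and the snapshot_at helper are hypothetical, and the real storage format may differ.

```python
from bisect import bisect_right


def snapshot_at(snapshots, signal_ts):
    """Return the latest stats recorded at or before signal_ts, else None.

    snapshots: list of (timestamp, stats_dict), sorted by timestamp ascending.
    """
    timestamps = [ts for ts, _ in snapshots]
    # bisect_right gives the count of snapshots with timestamp <= signal_ts.
    idx = bisect_right(timestamps, signal_ts)
    if idx == 0:
        return None  # wallet had no recorded stats yet at signal time
    return snapshots[idx - 1][1]


history = [
    (100, {"alpha_score": 40.0}),
    (200, {"alpha_score": 55.0}),
    (300, {"alpha_score": 90.0}),
]
stats = snapshot_at(history, 250)
```

A signal at time 250 sees the snapshot from time 200; the later score of 90.0 is never visible to it, which is the property that prevents lookahead bias.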

| Feature | Description |
| --- | --- |
| alpha_score | 0–100 composite quality score at time of signal |
| wallet_graduation_rate | Share of tokens bought that graduated, at signal time |
| wallet_avg_buy_rank | Typical buy rank at signal time (lower = earlier entry) |

These features describe the wallet’s trading behaviour across its full observed history. They are computed by FeatureComputer on a 30-minute rolling basis for all active wallets.

| Feature | Description |
| --- | --- |
| tokens_traded | Total unique tokens bought |
| pct_top10_entries | Fraction of buys that were in the first 10 wallets |
| avg_seconds_after_creation | Average delay between token creation and the wallet’s buy |
| pct_buys_under_60s | Fraction of buys within 60 seconds of token creation |
| pct_buys_under_300s | Fraction of buys within 300 seconds of token creation |
| avg_sol_per_buy | Mean buy size in SOL |
| median_sol_per_buy | Median buy size in SOL |
| sol_size_stddev | Standard deviation of buy sizes (high = inconsistent sizing) |
| max_single_buy | Largest single buy ever recorded |
| avg_hold_time_seconds | Average time between buy and sell |
| pct_quick_flips | Fraction of positions closed within 5 minutes |
| pct_diamond_hands | Fraction of positions held over 24 hours |
| unique_creators_traded | Number of distinct token creators the wallet has engaged with |
| pct_repeat_creator_buys | Fraction of buys on tokens from creators the wallet has bought before |

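To make the entry-timing and sizing definitions concrete, here is a rough sketch that derives several of them from a list of buys. It is not FeatureComputer's implementation: the tuple layout is hypothetical, the strict "under" comparisons are a guess at the boundary handling, and population standard deviation is assumed because the source does not say which variant is used.

```python
from statistics import mean, median, pstdev

# Hypothetical record layout: (seconds_after_creation, buy_rank, sol_amount).
buys = [(12, 3, 0.5), (45, 8, 1.0), (200, 25, 0.5), (900, 140, 2.0)]


def timing_and_sizing_features(buys):
    """Compute a subset of the behaviour features from raw buy records."""
    if not buys:
        return {}  # caller is expected to fall back to default values
    n = len(buys)
    delays = [b[0] for b in buys]
    ranks = [b[1] for b in buys]
    sizes = [b[2] for b in buys]
    return {
        "pct_top10_entries": sum(r <= 10 for r in ranks) / n,
        "avg_seconds_after_creation": mean(delays),
        "pct_buys_under_60s": sum(d < 60 for d in delays) / n,
        "pct_buys_under_300s": sum(d < 300 for d in delays) / n,
        "avg_sol_per_buy": mean(sizes),
        "median_sol_per_buy": median(sizes),
        "sol_size_stddev": pstdev(sizes),  # assumption: population stddev
        "max_single_buy": max(sizes),
    }


behaviour = timing_and_sizing_features(buys)
```

Note that pct_buys_under_300s always includes the buys counted by pct_buys_under_60s, so the two features are nested rather than disjoint buckets.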
Computed from PnlCalculator and PeakTracker output. These features describe how profitable the wallet has been historically — both in terms of realised exits and peak opportunity captured.

| Feature | Description |
| --- | --- |
| graduation_rate | Share of bought tokens that graduated |
| win_rate | Share of closed positions that exited in profit |
| sol_weighted_return | SOL-weighted average return across all positions |
| avg_realized_multiple | Average exit multiple |
| avg_peak_multiple | Average peak multiple reached during the hold period |
| capture_efficiency | Ratio of realised multiple to peak multiple (0–1) |
| profit_factor | Total profit divided by total loss |
| return_stddev | Standard deviation of per-trade returns |
| avg_loss_pct_on_losers | Average loss magnitude on losing trades |
| avg_gain_pct_on_winners | Average gain magnitude on winning trades |
| is_bot | 1 if the wallet has been classified as a bot by BotDetector |

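The relationships between the return features are easier to see with numbers. The sketch below is an illustration only, not the PnlCalculator or PeakTracker logic: the position tuple layout is hypothetical, and capture_efficiency is computed here as a ratio of averages, which is one of several plausible readings of the definition.

```python
def performance_features(positions):
    """positions: closed trades as (entry_price, exit_price, peak_price, sol_in)."""
    if not positions:
        return {}  # caller is expected to fall back to default values
    n = len(positions)
    realized = [exit_price / entry for entry, exit_price, _, _ in positions]
    peaks = [peak / entry for entry, _, peak, _ in positions]
    sizes = [sol_in for _, _, _, sol_in in positions]
    returns = [multiple - 1.0 for multiple in realized]
    pnl = [r * s for r, s in zip(returns, sizes)]  # profit or loss in SOL
    gross_profit = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    avg_realized = sum(realized) / n
    avg_peak = sum(peaks) / n
    return {
        "win_rate": sum(r > 0 for r in returns) / n,
        "sol_weighted_return": sum(pnl) / sum(sizes),
        "avg_realized_multiple": avg_realized,
        "avg_peak_multiple": avg_peak,
        # Assumption: ratio of averages rather than average of per-trade ratios.
        "capture_efficiency": avg_realized / avg_peak,
        # Undefined when there are no losing trades; None signals "use default".
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else None,
    }


# One 2x winner on 1 SOL (peaked at 4x), one 0.5x loser on 2 SOL (peaked at 2x).
perf = performance_features([(1.0, 2.0, 4.0, 1.0), (1.0, 0.5, 2.0, 2.0)])
```

In this example the wallet wins half its trades but the loser was sized twice as large, so sol_weighted_return comes out at zero and profit_factor at exactly 1. That gap between win_rate and the SOL-weighted figures is why both appear in the vector.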
These features describe the specific buy event that triggered the signal, not the wallet’s history. They capture what is happening at this token, at this moment, for this particular buy.

| Feature | Description |
| --- | --- |
| buy_rank | This wallet’s buy rank on this token |
| sol_amount | Size of this specific buy in SOL |
| curve_pct_at_buy | Bonding curve fill percentage at time of buy (0–1) |
| curve_sol | Curve SOL reserves at time of buy |
| velocity_buys_60s | Number of buys on this token in the last 60 seconds |
| velocity_buys_300s | Number of buys on this token in the last 300 seconds |
| sol_volume_60s | SOL volume on this token in the last 60 seconds |
| buy_rank_percentile | This wallet’s buy rank divided by unique buyer count (0–1) |
| sol_vs_wallet_avg | This buy’s SOL amount relative to the wallet’s average (1.0 = typical size) |

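The velocity and relative features above can be sketched in a few lines. The helper names and the window boundary convention (exclusive at the old edge, inclusive at the signal time) are assumptions made for this example, not taken from the codebase.

```python
def velocity(buy_timestamps, now, window_s):
    """Count buys with a timestamp in the half-open window (now - window_s, now]."""
    return sum(1 for ts in buy_timestamps if now - window_s < ts <= now)


def relative_signal_features(buy_rank, unique_buyers, sol_amount, wallet_avg_sol):
    """Normalise this buy against the token's buyer count and the wallet's history."""
    return {
        # max(..., 1) avoids division by zero on malformed input.
        "buy_rank_percentile": min(buy_rank / max(unique_buyers, 1), 1.0),
        # 1.0 means "typical size", so it is used when no average is known.
        "sol_vs_wallet_avg": sol_amount / wallet_avg_sol if wallet_avg_sol > 0 else 1.0,
    }


token_buys = [0, 100, 250, 290, 300]  # seconds since token creation
v60 = velocity(token_buys, now=300, window_s=60)
v300 = velocity(token_buys, now=300, window_s=300)
rel = relative_signal_features(buy_rank=3, unique_buyers=12, sol_amount=1.5, wallet_avg_sol=0.5)
```

Here the wallet is the third of twelve buyers (percentile 0.25) and is buying three times its usual size, a combination that raw buy_rank and sol_amount alone would not reveal.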
These features encode time-of-day and day-of-week patterns. Market dynamics on Pump.fun vary significantly by time: weekend afternoons have different volume profiles than weekday mornings.
The source documentation labels this category as containing two features, but three are defined in the feature order.

| Feature | Description |
| --- | --- |
| buy_sell_ratio | Total buys divided by (total sells + 1) at time of signal |

These features capture the aggregate state of the token at the moment the signal fires, describing how much activity has accumulated and what risk indicators are present.

| Feature | Description |
| --- | --- |
| token_unique_buyers | Unique buyers on this token so far |
| token_total_buys | Total buy transactions on this token |
| token_total_sells | Total sell transactions on this token |
| token_risk_score | Token-level risk score (0–100) from RiskScorer |
| token_bot_buyer_pct | Fraction of buyers on this token classified as bots |
| token_top10_concentration | Share of token supply held by the top 10 wallets |
| token_bundle_confidence | Highest bundle confidence score detected for this token |

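As a rough illustration of how the aggregate token features relate to raw state, the sketch below derives a few of them from a holdings map, along with the buy_sell_ratio defined earlier. The input shapes and the token_state_features helper are hypothetical, and treating every holder in the map as a buyer is a simplification.

```python
def token_state_features(holdings, bot_wallets, total_supply, total_buys, total_sells):
    """holdings: wallet -> token balance, for wallets that have bought the token."""
    buyers = list(holdings)
    top10_balances = sorted(holdings.values(), reverse=True)[:10]
    return {
        "token_unique_buyers": len(buyers),
        "token_bot_buyer_pct": (
            sum(w in bot_wallets for w in buyers) / len(buyers) if buyers else 0.0
        ),
        "token_top10_concentration": sum(top10_balances) / total_supply,
        # The +1 in the denominator keeps the ratio finite before the first sell.
        "buy_sell_ratio": total_buys / (total_sells + 1),
    }


state = token_state_features(
    holdings={"a": 300, "b": 100, "c": 100},
    bot_wallets={"b"},
    total_supply=1000,
    total_buys=9,
    total_sells=2,
)
```

With fewer than ten holders, the top-10 concentration is simply the share of supply held by all buyers, which is why very new tokens tend to show high values for this feature.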
Sourced from the CoOccurrence and GraphBuilder services, which track which wallets tend to buy the same tokens together and build a cluster graph from that data.
The source documentation labels this category as containing four features, but five are defined in the feature order.

| Feature | Description |
| --- | --- |
| cluster_size | Number of wallets in this wallet’s co-occurrence cluster |
| co_occurrence_max_score | Highest co-occurrence edge score for this wallet |
| cluster_avg_grad_rate | Average graduation rate of wallets in the same cluster |
| tracked_wallets_already_in | Count of tracked wallets already holding this token when the signal fires |
| is_first_tracked_buy | 1 if this is the first tracked wallet to buy this token |

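The source does not specify how co-occurrence edges are scored, so the sketch below swaps in Jaccard similarity of the two wallets' token sets as a stand-in. All function names and input shapes are hypothetical; the second helper shows how the two tracked-wallet features relate to each other.

```python
def jaccard(a, b):
    """Overlap of two token sets: |a & b| / |a | b|, or 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def co_occurrence_max_score(wallet, tokens_by_wallet):
    """Highest edge score between `wallet` and any other wallet (Jaccard stand-in)."""
    mine = tokens_by_wallet[wallet]
    scores = (
        jaccard(mine, tokens)
        for other, tokens in tokens_by_wallet.items()
        if other != wallet
    )
    return max(scores, default=0.0)


def tracked_entry_features(current_holders, tracked_wallets, buyer):
    """Count tracked wallets already holding the token, excluding the buyer itself."""
    already_in = (set(current_holders) & set(tracked_wallets)) - {buyer}
    return {
        "tracked_wallets_already_in": len(already_in),
        "is_first_tracked_buy": int(not already_in),
    }


tokens_by_wallet = {"w1": {"t1", "t2", "t3"}, "w2": {"t2", "t3"}, "w3": {"t9"}}
max_score = co_occurrence_max_score("w1", tokens_by_wallet)
entry = tracked_entry_features({"w2", "x"}, {"w1", "w2", "w3"}, buyer="w1")
```

Under this reading, is_first_tracked_buy is fully determined by tracked_wallets_already_in being zero; it is kept as its own feature so the model gets an explicit binary flag.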
These features encode the broader market environment at signal time. A signal that looks identical in isolation may have very different expected outcomes depending on whether the market is in a hot creation period or a slow one.

| Feature | Description |
| --- | --- |
| tokens_created_last_hour | Market-wide token creation rate over the last hour |

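A trailing-hour count like this is commonly maintained with a sliding window. The class below is a minimal sketch of that technique, assuming creation events arrive in non-decreasing timestamp order; CreationRateWindow is a hypothetical name, not a component of the system described here.

```python
from collections import deque


class CreationRateWindow:
    """Counts events in the trailing window (now - window_s, now]."""

    def __init__(self, window_s=3600):
        self.window_s = window_s
        self._events = deque()  # timestamps, oldest on the left

    def record(self, ts):
        # Assumption: timestamps are recorded in non-decreasing order.
        self._events.append(ts)

    def count(self, now):
        # Evict everything that has aged out of the window.
        while self._events and self._events[0] <= now - self.window_s:
            self._events.popleft()
        return len(self._events)


window = CreationRateWindow(window_s=3600)
for created_at in (0, 1000, 3000, 4000):
    window.record(created_at)
count_at_4000 = window.count(now=4000)
count_at_4600 = window.count(now=4600)
```

Each timestamp is appended once and evicted once, so the cost per signal stays small even when creation volume is high.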
Every feature has a carefully chosen default value applied when the underlying data is unavailable — for example, when a signal fires from a wallet that has never been seen before, or when a token is too new to have velocity data.
Default values represent a neutral, unknown wallet in a neutral market — not a worst-case assumption. Using worst-case defaults would introduce systematic pessimism bias: the model would learn to treat unknown signals as bad signals, which is not correct. An unseen wallet could be excellent.
Defaults are chosen to be plausible midpoints within each feature’s observed distribution, or an explicit sentinel where the feature is an encoded category with no meaningful midpoint. For example:
wallet_avg_buy_rank defaults to 50 (median rank)
avg_sol_per_buy defaults to 0.5 SOL (typical small buy)
lifecycle_state_encoded defaults to -1 (unknown / not yet classified)
The models were trained with these same defaults applied whenever data was missing during training. This means the model has learned to handle the default values correctly and will not produce anomalous outputs when they appear at inference time.
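A default policy like this is simplest to keep consistent when training and serving share one fill function. The sketch below uses only the three defaults quoted above; the DEFAULTS mapping and apply_defaults helper are hypothetical names, and treating both None and NaN as missing is an assumption.

```python
import math

# Only the three defaults quoted in this document; the full table is not shown here.
DEFAULTS = {
    "wallet_avg_buy_rank": 50.0,
    "avg_sol_per_buy": 0.5,
    "lifecycle_state_encoded": -1.0,
}


def apply_defaults(raw, defaults=DEFAULTS):
    """Return a dict with every feature in `defaults`, filling gaps with defaults."""
    filled = {}
    for name, default in defaults.items():
        value = raw.get(name)
        # Assumption: both None and NaN count as "data unavailable".
        is_missing = value is None or (isinstance(value, float) and math.isnan(value))
        filled[name] = default if is_missing else float(value)
    return filled


# A never-before-seen wallet: only the current buy size is known.
filled = apply_defaults({"avg_sol_per_buy": 1.2, "wallet_avg_buy_rank": None})
```

Calling the same function when building the training set and at inference time is what makes the statement above hold: the model only ever sees default values it has already encountered during training.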
Model architecture
How the feature vector is assembled into an ONNX input tensor and passed through Platt calibration.
Training methodology
How the 68-feature dataset is constructed, labelled, and used to train each model.