ML feature reference: the 68-feature signal vector

The standard model consumes a 68-feature vector assembled at inference time from several data sources: wallet history, signal-level event data, creator intelligence, token state, social graph scores, and live market context. Every feature is computed by the same code path in production as in training, so the two cannot diverge through mismatched feature logic (training/serving skew).

Features must be assembled in exactly the order defined by the model’s feature_names metadata array. The FEATURE_ORDER constant in the codebase is the canonical source of truth. Assembling features in a different order will not produce a runtime error — the model will silently generate incorrect predictions.
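
The ordering requirement can be enforced before any prediction is made. The following is a minimal sketch, assuming features are first computed into a dict keyed by name. The three-entry `FEATURE_ORDER` is a hypothetical stand-in for the real 68-entry constant, and `check_against_model` and `assemble_vector` are illustrative names rather than functions from the codebase.

```python
from typing import Mapping, Sequence

# Hypothetical stand-in: the real FEATURE_ORDER constant has 68 entries.
FEATURE_ORDER: tuple[str, ...] = ("alpha_score", "buy_rank", "sol_amount")


def check_against_model(model_feature_names: Sequence[str]) -> None:
    """Raise if the code's order differs from the model's feature_names metadata."""
    if list(model_feature_names) != list(FEATURE_ORDER):
        raise ValueError("FEATURE_ORDER does not match the model's feature_names")


def assemble_vector(features: Mapping[str, float]) -> list[float]:
    """Return values in FEATURE_ORDER, regardless of the dict's own ordering."""
    missing = [name for name in FEATURE_ORDER if name not in features]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [float(features[name]) for name in FEATURE_ORDER]
```

Running a check like `check_against_model` once at model load time turns a silent mis-ordering into a loud startup failure.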

Wallet point-in-time (4 features)

These features capture the wallet’s stats as they were at the moment the signal fired — not their current values. The point-in-time snapshot is what prevents lookahead bias: the model only sees information that was actually available when the trade decision would have been made.

Feature	Description
`alpha_score`	0–100 composite quality score at time of signal
`wallet_graduation_rate`	Share of tokens bought that graduated, at signal time
`wallet_avg_buy_rank`	Typical buy rank at signal time (lower = earlier entry)
`wallet_trades_at`	Total trades observed at signal time
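
One way to read a point-in-time value is to keep a time-sorted history of snapshots per wallet and pick the latest one at or before the signal timestamp. The sketch below assumes that storage layout; the `Snapshot` shape and the `value_as_of` helper are hypothetical and not taken from the pipeline.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical history entry for one wallet: (unix_timestamp, alpha_score).
Snapshot = tuple[int, float]


def value_as_of(history: list[Snapshot], signal_ts: int) -> Optional[float]:
    """Return the latest value recorded at or before signal_ts.

    history must be sorted by timestamp in ascending order.
    """
    timestamps = [ts for ts, _ in history]
    idx = bisect_right(timestamps, signal_ts)
    if idx == 0:
        return None  # nothing was known yet; the caller falls back to a default
    return history[idx - 1][1]
```

Because the lookup never reads a snapshot later than `signal_ts`, values the wallet earned after the signal cannot leak into the feature.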

Wallet behavioural (15 features)

These features describe the wallet’s trading behaviour across its full observed history. They are computed by FeatureComputer on a 30-minute rolling basis for all active wallets.

Feature	Description
`tokens_traded`	Total unique tokens bought
`pct_top10_entries`	Fraction of buys that were in the first 10 wallets
`avg_seconds_after_creation`	Average delay between token creation and the wallet’s buy
`pct_buys_under_60s`	Fraction of buys within 60 seconds of token creation
`pct_buys_under_300s`	Fraction of buys within 300 seconds of token creation
`avg_sol_per_buy`	Mean buy size in SOL
`median_sol_per_buy`	Median buy size in SOL
`sol_size_stddev`	Standard deviation of buy sizes (high = inconsistent sizing)
`max_single_buy`	Largest single buy ever recorded
`avg_hold_time_seconds`	Average time between buy and sell
`pct_quick_flips`	Fraction of positions closed within 5 minutes
`pct_diamond_hands`	Fraction of positions held over 24 hours
`unique_creators_traded`	Number of distinct token creators the wallet has engaged with
`pct_repeat_creator_buys`	Fraction of buys on tokens from creators the wallet has bought before
`active_days_30d`	Distinct active trading days in the last 30 days
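
The entry-timing features in the table above reduce to simple aggregates over per-buy delays. This is an illustrative sketch, not the `FeatureComputer` implementation: the function name is hypothetical, and treating the 60 s and 300 s thresholds as strict upper bounds is an assumption.

```python
from statistics import mean


def timing_features(delays_s: list[float]) -> dict[str, float]:
    """delays_s: seconds between token creation and each of the wallet's buys."""
    if not delays_s:
        raise ValueError("no buys observed; use the default values instead")
    n = len(delays_s)
    return {
        "avg_seconds_after_creation": mean(delays_s),
        # Assumption: thresholds are exclusive (a buy at exactly 60 s is not "under 60 s").
        "pct_buys_under_60s": sum(d < 60 for d in delays_s) / n,
        "pct_buys_under_300s": sum(d < 300 for d in delays_s) / n,
    }
```

Every buy under 60 seconds is also under 300 seconds, so `pct_buys_under_300s` is never smaller than `pct_buys_under_60s`.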

Wallet returns (11 features)

Computed from PnlCalculator and PeakTracker output. These features describe how profitable the wallet has been historically — both in terms of realised exits and peak opportunity captured.

Feature	Description
`graduation_rate`	Share of bought tokens that graduated
`win_rate`	Share of closed positions that exited in profit
`sol_weighted_return`	SOL-weighted average return across all positions
`avg_realized_multiple`	Average exit multiple
`avg_peak_multiple`	Average peak multiple reached during the hold period
`capture_efficiency`	Ratio of realised multiple to peak multiple (0–1)
`profit_factor`	Total profit divided by total loss
`return_stddev`	Standard deviation of per-trade returns
`avg_loss_pct_on_losers`	Average loss magnitude on losing trades
`avg_gain_pct_on_winners`	Average gain magnitude on winning trades
`is_bot`	1 if the wallet has been classified as a bot by `BotDetector`
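
Several of the ratios above can be illustrated on a handful of closed positions. The sketch below is not the `PnlCalculator` or `PeakTracker` logic. It assumes `capture_efficiency` is the ratio of the two averages, capped at 1, and it returns infinity for `profit_factor` when there are no losses. Both choices, along with the `ClosedPosition` type, are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ClosedPosition:
    sol_in: float         # SOL spent on the buy
    sol_out: float        # SOL received on exit
    peak_multiple: float  # highest price during the hold, as a multiple of entry


def returns_summary(positions: list[ClosedPosition]) -> dict[str, float]:
    if not positions:
        raise ValueError("no closed positions; use the default values instead")
    n = len(positions)
    pnl = [p.sol_out - p.sol_in for p in positions]
    gross_profit = sum(x for x in pnl if x > 0)
    gross_loss = -sum(x for x in pnl if x < 0)
    avg_realized = sum(p.sol_out / p.sol_in for p in positions) / n
    avg_peak = sum(p.peak_multiple for p in positions) / n
    return {
        "win_rate": sum(x > 0 for x in pnl) / n,
        # Assumption: with no losing trades the ratio is reported as infinity.
        "profit_factor": gross_profit / gross_loss if gross_loss > 0 else float("inf"),
        "avg_realized_multiple": avg_realized,
        "avg_peak_multiple": avg_peak,
        # Assumption: ratio of averages, capped so it stays within 0-1.
        "capture_efficiency": min(avg_realized / avg_peak, 1.0),
    }
```

A wallet that tripled one position that peaked at 4x and halved another that peaked at 2x has an average realised multiple of 1.75 against an average peak of 3.0, which gives a capture efficiency of about 0.58.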

Wallet context (1 feature)

Feature	Description
`wallet_age_days`	Days since the wallet was first observed by the pipeline

Signal-level (10 features)

These features describe the specific buy event that triggered the signal, not the wallet’s history. They capture what is happening at this token, at this moment, for this particular buy.

Feature	Description
`buy_rank`	This wallet’s buy rank on this token
`sol_amount`	Size of this specific buy in SOL
`curve_pct_at_buy`	Bonding curve fill percentage at time of buy (0–1)
`curve_sol`	Curve SOL reserves at time of buy
`velocity_buys_60s`	Number of buys on this token in the last 60 seconds
`velocity_buys_300s`	Number of buys on this token in the last 300 seconds
`sol_volume_60s`	SOL volume on this token in the last 60 seconds
`buy_rank_percentile`	This wallet’s buy rank divided by unique buyer count (0–1)
`sol_vs_wallet_avg`	This buy’s SOL amount relative to the wallet’s average (1.0 = typical size)
`token_age_at_signal`	Token age in seconds at time of signal
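
The two relative features, `buy_rank_percentile` and `sol_vs_wallet_avg`, are plain ratios. A minimal sketch follows, with a hypothetical function name and with input validation that is an assumption rather than documented behaviour.

```python
def relative_signal_features(
    buy_rank: int,
    unique_buyers: int,
    sol_amount: float,
    wallet_avg_sol: float,
) -> dict[str, float]:
    """Express this buy relative to the token's buyer count and the wallet's norm."""
    # Assumption: non-positive denominators are rejected rather than defaulted.
    if unique_buyers <= 0 or wallet_avg_sol <= 0:
        raise ValueError("unique_buyers and wallet_avg_sol must be positive")
    return {
        "buy_rank_percentile": buy_rank / unique_buyers,
        "sol_vs_wallet_avg": sol_amount / wallet_avg_sol,
    }
```

For example, the 5th of 20 unique buyers has a percentile of 0.25, and a 1.0 SOL buy from a wallet that averages 0.5 SOL scores 2.0.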

Temporal (3 features)

These features mainly encode time-of-day and day-of-week patterns. Market dynamics on Pump.fun vary significantly by time: weekend afternoons have different volume profiles than weekday mornings.

The source documentation labels this category as containing 2 features, but three are defined in the feature order. The third, `buy_sell_ratio`, measures order flow rather than time.

Feature	Description
`buy_sell_ratio`	Total buys divided by (total sells + 1) at time of signal
`hour_of_day`	UTC hour of the signal (0–23)
`day_of_week`	Day of week (0 = Sunday, 6 = Saturday)
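
The Sunday-based `day_of_week` encoding is easy to get wrong in Python, because `datetime.weekday()` starts the week on Monday. The sketch below derives all three features from a Unix timestamp and the token's buy and sell counts; the function name and input shape are assumptions.

```python
from datetime import datetime, timezone


def temporal_features(signal_ts: float, total_buys: int, total_sells: int) -> dict[str, float]:
    """signal_ts is a Unix timestamp in seconds; the hour and day are taken in UTC."""
    dt = datetime.fromtimestamp(signal_ts, tz=timezone.utc)
    return {
        # The +1 in the denominator avoids division by zero before the first sell.
        "buy_sell_ratio": total_buys / (total_sells + 1),
        "hour_of_day": dt.hour,
        # weekday() is Monday=0 ... Sunday=6; shift so that Sunday=0 ... Saturday=6.
        "day_of_week": (dt.weekday() + 1) % 7,
    }
```

Converting with an explicit `timezone.utc` keeps the result independent of the server's local time zone.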

Creator intelligence (8 features)

Sourced from CreatorRiskScorer, which tracks historical performance and behaviour patterns for every token creator the pipeline has observed.

Feature	Description
`creator_graduation_rate`	This token creator’s historical graduation rate across all their tokens
`creator_rug_rate`	Quick-death rate across this creator’s tokens
`creator_risk_score`	0–100 composite creator risk score
`creator_tokens_created`	Total tokens created by this creator
`creator_serial_velocity`	Tokens created per day over the last 30 days
`creator_avg_insider_pct`	Average insider presence across the creator’s tokens
`creator_avg_bot_pct`	Average bot buyer percentage across the creator’s tokens
`creator_is_serial`	1 if the creator has been classified as a serial launcher

Token state (7 features)

These features capture the aggregate state of the token at the moment the signal fires, describing how much activity has accumulated and what risk indicators are present.

Feature	Description
`token_unique_buyers`	Unique buyers on this token so far
`token_total_buys`	Total buy transactions on this token
`token_total_sells`	Total sell transactions on this token
`token_risk_score`	Token-level risk score (0–100) from `RiskScorer`
`token_bot_buyer_pct`	Fraction of buyers on this token classified as bots
`token_top10_concentration`	Share of token supply held by the top 10 wallets
`token_bundle_confidence`	Highest bundle confidence score detected for this token
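
As an illustration of `token_top10_concentration`, the sketch below sums the ten largest balances and divides by total supply. The function name is hypothetical, and the source does not say whether non-wallet holders such as the bonding curve account are excluded, so this version simply counts every balance it is given.

```python
def top10_concentration(balances: dict[str, float], total_supply: float) -> float:
    """Share of total_supply held by the ten largest holders in balances."""
    if total_supply <= 0:
        raise ValueError("total_supply must be positive")
    largest = sorted(balances.values(), reverse=True)[:10]
    return sum(largest) / total_supply
```

With fewer than ten holders, the slice takes all of them, so the result is the share held by every known wallet.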

Lifecycle (1 feature)

Feature	Description
`lifecycle_state_encoded`	Token lifecycle state encoded as an integer (0 = launch … 7 = graduated)

Social graph (5 features)

Sourced from the CoOccurrence and GraphBuilder services, which track which wallets tend to buy the same tokens together and build a cluster graph from that data.

The source documentation labels this category as containing 4 features, but five are defined in the feature order.

Feature	Description
`cluster_size`	Number of wallets in this wallet’s co-occurrence cluster
`co_occurrence_max_score`	Highest co-occurrence edge score for this wallet
`cluster_avg_grad_rate`	Average graduation rate of wallets in the same cluster
`tracked_wallets_already_in`	Count of tracked wallets already holding this token when the signal fires
`is_first_tracked_buy`	1 if this is the first tracked wallet to buy this token
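
The last two features in the table above depend on which tracked wallets touched the token before this signal. A minimal sketch with set operations follows, assuming the caller can supply the token's prior buyers and current holders. The function name and input shapes are hypothetical.

```python
def tracked_entry_features(
    prior_buyers: set[str],
    current_holders: set[str],
    tracked_wallets: set[str],
    signal_wallet: str,
) -> dict[str, int]:
    """Count other tracked wallets on the token at the moment the signal fires."""
    others = tracked_wallets - {signal_wallet}
    return {
        # Holders only: a tracked wallet that bought and fully exited is not counted.
        "tracked_wallets_already_in": len(current_holders & others),
        # Any earlier tracked buy, even one since sold, means this is not the first.
        "is_first_tracked_buy": int(not (prior_buyers & others)),
    }
```

Keeping buyers and holders separate matters: if a tracked wallet bought and sold before the signal, the holder count is 0 but the buy is still not the first tracked buy.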

Market context (2 features)

These features encode the broader market environment at signal time. A signal that looks identical in isolation may have very different expected outcomes depending on whether the market is in a hot creation period or a slow one.

Feature	Description
`tokens_created_last_hour`	Market-wide token creation rate over the last hour
`rolling_graduation_rate_2h`	Market-wide graduation rate over the last 2 hours

Default values

Every feature has a carefully chosen default value applied when the underlying data is unavailable — for example, when a signal fires from a wallet that has never been seen before, or when a token is too new to have velocity data.

Default values represent a neutral, unknown wallet in a neutral market — not a worst-case assumption. Using worst-case defaults would introduce systematic pessimism bias: the model would learn to treat unknown signals as bad signals, which is not correct. An unseen wallet could be excellent.

Numeric defaults are chosen to be plausible midpoints within each feature’s observed distribution, while encoded states fall back to an explicit unknown value. For example:

`wallet_avg_buy_rank` defaults to 50 (median rank)
`avg_sol_per_buy` defaults to 0.5 SOL (typical small buy)
`lifecycle_state_encoded` defaults to -1 (unknown / not yet classified)

The models were trained with these same defaults applied wherever data was missing. As a result, the model has learned to handle the default values and will not produce anomalous outputs when they appear at inference time.
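
Applying defaults can be sketched as a single fill step shared by training and serving. Only the three defaults named above are documented here, so the `DEFAULTS` table below is deliberately partial. The `apply_defaults` helper and its treatment of `None` and NaN as "missing" are assumptions.

```python
import math
from typing import Mapping, Optional

# Partial table: only these three defaults are documented in this reference.
DEFAULTS: dict[str, float] = {
    "wallet_avg_buy_rank": 50.0,
    "avg_sol_per_buy": 0.5,
    "lifecycle_state_encoded": -1.0,
}


def apply_defaults(raw: Mapping[str, Optional[float]]) -> dict[str, float]:
    """Fill absent, None, or NaN values with the documented default."""
    filled: dict[str, float] = {}
    for name, default in DEFAULTS.items():
        value = raw.get(name)
        if value is None or math.isnan(value):
            value = default
        filled[name] = float(value)
    return filled
```

Calling one function of this kind from both the training set builder and the inference path keeps the substituted values identical in both places.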

Model architecture

How the feature vector is assembled into an ONNX input tensor and passed through Platt calibration.

Training methodology

How the 68-feature dataset is constructed, labelled, and used to train each model.
