Phase 2: Signal intelligence and token lifecycle

Phase 2 transforms raw trade data and Phase 1 wallet scores into structured intelligence about tokens, creators, and wallet relationships. Where Phase 1 answers "how good is this wallet?", Phase 2 answers "what is happening around this token right now, and is that activity organic?" The outputs of Phase 2 are consumed directly by the ML feature vector and the live trader’s position sizing logic.

TokenLifecycle

8-state lifecycle classifier updated every 60 seconds using real-time velocity data.

BundleDetector

5-second time-bucket clustering to identify coordinated buy activity.

CreatorRiskScorer

Rug rate, insider presence, bot buyer percentage, and serial velocity per creator.

SignalScorer

The SignalScorer computes a composite rule-based score for each signal, stored as rule_score. This score is used as a fallback when ML inference is unavailable or disabled. It is also included as a feature in the ML model, which lets the model learn which rule-based patterns are predictive and which are noise. The rule score is a weighted combination of five inputs:

Input	Source
Wallet alpha score	`WalletScorer` output from Phase 1
Token velocity	`buysLast60s` / elapsed minutes from `VelocityTracker`
Buy rank	Position of this wallet’s entry relative to all other buyers
Token lifecycle state	Current state from `TokenLifecycle` (numeric encoding 0–7)
Creator risk score	Risk score from `CreatorRiskScorer` (inverted — lower risk = higher contribution)

The rule score gives the system a meaningful signal quality estimate from the moment a signal is emitted, before the ML inference cycle (every 5 seconds) has had a chance to score it. For the live trader, the rule score acts as a pre-filter.
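The weighted combination described above can be sketched as follows. This is a minimal illustration, not the actual SignalScorer: the weights, the normalisation ranges (a 0–100 alpha score, velocity saturating at 30 buys per minute), the per-state lifecycle contributions, and the names `SignalInputs` and `rule_score` are all assumptions, since this page does not specify them.

```python
from dataclasses import dataclass

# Hypothetical weights; the real values are not given in this document.
WEIGHTS = {
    "wallet_alpha": 0.35,
    "velocity": 0.20,
    "buy_rank": 0.15,
    "lifecycle": 0.15,
    "creator_risk": 0.15,
}

# Hypothetical contribution per lifecycle state (numeric encoding 0-7).
LIFECYCLE_CONTRIB = {0: 0.5, 1: 1.0, 2: 0.8, 3: 0.4, 4: 0.0, 5: 0.0, 6: 0.0, 7: 0.0}


@dataclass
class SignalInputs:
    wallet_alpha: float     # assumed range 0..100
    velocity: float         # buys per minute
    buy_rank: int           # 1 = first buyer
    lifecycle_state: int    # 0..7
    creator_risk: float     # 0..100, higher = riskier


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def rule_score(s: SignalInputs) -> float:
    """Return a 0..100 weighted composite of the five inputs."""
    parts = {
        "wallet_alpha": clamp01(s.wallet_alpha / 100),
        "velocity": clamp01(s.velocity / 30),            # assumed saturation point
        "buy_rank": clamp01(1 / max(1, s.buy_rank)),     # earlier entry scores higher
        "lifecycle": LIFECYCLE_CONTRIB.get(s.lifecycle_state, 0.0),
        "creator_risk": clamp01(1 - s.creator_risk / 100),  # inverted: lower risk, higher contribution
    }
    return 100 * sum(WEIGHTS[name] * parts[name] for name in WEIGHTS)
```

Because every part is normalised to the 0–1 range before weighting, the weights alone decide how much each input can move the final score.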

TokenLifecycle

The TokenLifecycle classifier assigns every non-graduated token a lifecycle state, updated every 60 seconds. It uses real-time velocity data from Redis — specifically buysLast60s, buysLast300s, and corresponding sell counts — alongside the token’s age in seconds to select the appropriate state.

State	Numeric	Description
`launch`	0	Token is under 60 seconds old. Any activity is expected at this stage.
`early_accumulation`	1	Under 5 minutes old with rising buy pressure and few sellers.
`momentum`	2	Sustained buy velocity with new unique buyers joining.
`euphoria`	3	High velocity, strong SOL inflow, aggressive price action — the peak of retail excitement.
`distribution`	4	Smart wallets selling into retail buying pressure. A key exit signal for the live trader.
`decline`	5	Falling velocity with increasing sell pressure. Momentum has broken.
`dead`	6	No trades recorded for 5 or more minutes. The token has stalled.
`graduated`	7	Token has crossed 85 SOL in reserves and migrated to Raydium or PumpSwap.

The lifecycle state is encoded as a numeric value (0–7) and included as a feature in both the standard and genesis ML models. A distribution state is a strong negative signal — it indicates that the wallets with the best information are exiting, not entering.

A token in distribution state should be treated as an exit signal for any open position, not an entry opportunity. The live trader’s exit monitor reads the lifecycle state on every 3-second poll and can trigger early exit when distribution is detected.
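The numeric encoding and the three states with fixed thresholds in the table above can be sketched like this. The function names and the order of the checks are assumptions. The velocity-based states (`early_accumulation` through `decline`) are deliberately left unclassified because this page does not give their thresholds.

```python
from enum import IntEnum
from typing import Optional


class LifecycleState(IntEnum):
    LAUNCH = 0
    EARLY_ACCUMULATION = 1
    MOMENTUM = 2
    EUPHORIA = 3
    DISTRIBUTION = 4
    DECLINE = 5
    DEAD = 6
    GRADUATED = 7


LAUNCH_WINDOW_SECONDS = 60      # "under 60 seconds old"
DEAD_AFTER_SECONDS = 300        # "no trades recorded for 5 or more minutes"
GRADUATION_RESERVE_SOL = 85.0   # "crossed 85 SOL in reserves"


def classify_fixed_rules(
    age_seconds: float,
    seconds_since_last_trade: float,
    reserve_sol: float,
) -> Optional[LifecycleState]:
    """Apply only the rules with fixed thresholds; return None otherwise.

    The check order (graduated, then launch, then dead) is an assumption.
    """
    if reserve_sol >= GRADUATION_RESERVE_SOL:
        return LifecycleState.GRADUATED
    if age_seconds < LAUNCH_WINDOW_SECONDS:
        return LifecycleState.LAUNCH
    if seconds_since_last_trade >= DEAD_AFTER_SECONDS:
        return LifecycleState.DEAD
    return None  # velocity-based states need thresholds not documented here


def should_exit(state: LifecycleState) -> bool:
    """Hypothetical exit-monitor check: distribution means exit, not entry."""
    return state == LifecycleState.DISTRIBUTION
```

Using an `IntEnum` keeps the state readable in code while `int(state)` yields the 0–7 value used as the ML feature.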

BundleDetector

The BundleDetector identifies coordinated buy activity — groups of wallets buying the same token within the same 5-second time window, potentially as part of a sniping operation or pump-and-dump scheme. It runs every 10 minutes, scanning the last 15 minutes of trades. On first startup, it performs a historical scan of the last 24 hours in 2-hour chunks.

Detection logic

For each token, trades are grouped into 5-second time buckets. Any bucket containing 3 or more distinct wallets becomes a cluster candidate. Each candidate cluster is then scored against four confidence signals:

Signal	Condition	Confidence boost
Similar buy sizes	Amount coefficient of variation < 0.3	+25%
Consecutive entries	Buy rank span ≤ wallet count in cluster	+20%
Medium cluster	5 or more wallets in the cluster	+15%
Large cluster	10 or more wallets in the cluster	+10%

Clusters that score above the 30% base confidence threshold are written to detected_bundles. Each detection is classified with one of three methods:

time_window — proximity in time is the primary signal
similar_amounts — buy size uniformity is the dominant indicator
same_slot_coordinated — multiple wallets buying in the same slot, the strongest form of coordination
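The bucketing and confidence scoring described in this section might look roughly like the sketch below, for the buys of a single token. It assumes that each candidate starts at the 30% base and that the four boosts are additive (they sum to 100% with the base, which fits that reading, but this page does not state it outright). The `Buy` record and both function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev

BUCKET_SECONDS = 5
MIN_WALLETS = 3
BASE_CONFIDENCE = 0.30  # assumed starting confidence for every candidate


@dataclass(frozen=True)
class Buy:
    wallet: str
    timestamp: float   # unix seconds
    amount_sol: float
    buy_rank: int      # position among all buyers of the token


def cluster_candidates(buys: list[Buy]) -> list[list[Buy]]:
    """Group one token's buys into 5-second buckets; keep buckets with 3+ distinct wallets."""
    buckets: dict[int, list[Buy]] = defaultdict(list)
    for buy in buys:
        buckets[int(buy.timestamp // BUCKET_SECONDS)].append(buy)
    return [
        bucket for bucket in buckets.values()
        if len({b.wallet for b in bucket}) >= MIN_WALLETS
    ]


def cluster_confidence(cluster: list[Buy]) -> float:
    """Add the four boosts from the table to the base confidence."""
    wallet_count = len({b.wallet for b in cluster})
    amounts = [b.amount_sol for b in cluster]
    ranks = [b.buy_rank for b in cluster]

    confidence = BASE_CONFIDENCE
    avg = mean(amounts)
    if avg > 0 and pstdev(amounts) / avg < 0.3:      # similar buy sizes
        confidence += 0.25
    if max(ranks) - min(ranks) <= wallet_count:      # consecutive entries
        confidence += 0.20
    if wallet_count >= 5:                            # medium cluster
        confidence += 0.15
    if wallet_count >= 10:                           # large cluster (stacks with medium)
        confidence += 0.10
    return min(confidence, 1.0)
```

Under this reading, a candidate is recorded only when at least one boost applies, that is, when `cluster_confidence(cluster) > BASE_CONFIDENCE`.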

Side effects on co-occurrence data

For every detected bundle, all wallet pairs in the cluster have their buy_overlap_count incremented in wallet_co_occurrence. This data feeds both the CoOccurrence analysis and the CopyTradeDetector in Phase 3.
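The pairwise increment can be illustrated with an in-memory counter standing in for the wallet_co_occurrence table. Storing each pair in sorted order, so that (A, B) and (B, A) map to the same row, is an assumption about how the table is keyed, and `increment_bundle_pairs` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def increment_bundle_pairs(overlap_counts: Counter, bundle_wallets: list[str]) -> None:
    """Increment the overlap count once for every unordered wallet pair in a bundle."""
    unique_wallets = sorted(set(bundle_wallets))   # canonical order per pair (assumed)
    for wallet_a, wallet_b in combinations(unique_wallets, 2):
        overlap_counts[(wallet_a, wallet_b)] += 1
```

A bundle of n wallets touches n(n-1)/2 pairs, so a 10-wallet cluster produces 45 increments.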

Bundle detection runs retrospectively over a 15-minute window rather than inline, so it will not catch bundles the instant they form. The trade-off is fewer false positives from coincidental co-buying: the 15-minute window provides enough context to distinguish coordination from coincidence.

CreatorRiskScorer

The CreatorRiskScorer builds a risk profile for every token creator with 2 or more tokens, updated every 30 minutes. A creator’s risk score reflects their historical pattern of behaviour across all their tokens.

Metric	Description
Rug rate	Share of tokens that died within 10 minutes of launch (last trade within 600 seconds of creation, never graduated)
Avg insider presence	Average number of insider wallets detected in the early buyer set across this creator’s tokens
Avg bot buyer pct	Average percentage of buyers classified as bots across this creator’s tokens
Serial velocity	Rate of token creation over the last 30 days, expressed as tokens per day
Risk score (0–100)	Weighted composite of the above metrics

A creator who has launched 20 tokens, 80% of which died within minutes and 50% of whose buyers were bots, will score near 100. This score feeds directly into:

The ML feature vector in Phase 3
The AntiSignalEmitter in Phase 3, which uses it as one of six independent risk triggers
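Two of the metrics in the table above are defined precisely enough to sketch: rug rate and serial velocity. The `TokenRecord` shape and the function names are hypothetical. The weights of the 0–100 composite are not given on this page, so the composite is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional

RUG_WINDOW_SECONDS = 600       # last trade within 600 seconds of creation
SERIAL_WINDOW_DAYS = 30
MIN_TOKENS_FOR_PROFILE = 2     # creators with fewer tokens are not profiled
SECONDS_PER_DAY = 86_400


@dataclass
class TokenRecord:
    created_at: float      # unix seconds
    last_trade_at: float   # unix seconds
    graduated: bool


def is_rug(token: TokenRecord) -> bool:
    """A token counts as a rug if it never graduated and stopped trading within 10 minutes."""
    lifetime = token.last_trade_at - token.created_at
    return (not token.graduated) and lifetime <= RUG_WINDOW_SECONDS


def rug_rate(tokens: list[TokenRecord]) -> Optional[float]:
    """Share of a creator's tokens that were rugs, or None below the 2-token minimum."""
    if len(tokens) < MIN_TOKENS_FOR_PROFILE:
        return None
    return sum(is_rug(t) for t in tokens) / len(tokens)


def serial_velocity(tokens: list[TokenRecord], now: float) -> float:
    """Tokens created per day over the last 30 days."""
    cutoff = now - SERIAL_WINDOW_DAYS * SECONDS_PER_DAY
    recent = sum(1 for t in tokens if t.created_at >= cutoff)
    return recent / SERIAL_WINDOW_DAYS
```

For the example creator above, 16 rugs out of 20 tokens gives a rug rate of 0.8.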

CoOccurrence

The CoOccurrence service builds and maintains a wallet-pair co-occurrence matrix in wallet_co_occurrence. Every time two tracked wallets buy the same token within a short window, their pair’s overlap count is incremented. The table also tracks directional timing information between wallet pairs:

Field	Description
`buy_overlap_count`	Total number of tokens both wallets have bought in the same window
`avg_buy_delta_seconds`	Average time in seconds between wallet A’s buy and wallet B’s buy
`buy_delta_stddev`	Standard deviation of the time delta — measures consistency of the pattern
`a_buys_first_ratio`	Fraction of overlapping trades where wallet A buys before wallet B

The a_buys_first_ratio field is a directional indicator: a value consistently above 0.8 suggests wallet A is a leader that wallet B is following, which is the primary input to the CopyTradeDetector in Phase 3.
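To make the four fields concrete, the sketch below derives them from a list of signed time deltas, one per shared token, where a positive delta means wallet A bought first. Several details are assumptions: the average and standard deviation are taken over absolute deltas, the population standard deviation is used, simultaneous buys (delta of zero) do not count as A buying first, and the `PairStats` and `pair_stats` names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean, pstdev

LEADER_RATIO_THRESHOLD = 0.8   # from the text: above 0.8 suggests A leads B


@dataclass
class PairStats:
    buy_overlap_count: int
    avg_buy_delta_seconds: float
    buy_delta_stddev: float
    a_buys_first_ratio: float


def pair_stats(deltas_seconds: list[float]) -> PairStats:
    """deltas_seconds[i] = (B's buy time) - (A's buy time) for the i-th shared token."""
    if not deltas_seconds:
        raise ValueError("at least one overlapping buy is required")
    gaps = [abs(d) for d in deltas_seconds]
    a_first = sum(1 for d in deltas_seconds if d > 0)
    return PairStats(
        buy_overlap_count=len(deltas_seconds),
        avg_buy_delta_seconds=mean(gaps),
        buy_delta_stddev=pstdev(gaps),
        a_buys_first_ratio=a_first / len(deltas_seconds),
    )


def looks_like_leader(stats: PairStats) -> bool:
    """True when wallet A buys first in more than 80% of overlaps."""
    return stats.a_buys_first_ratio > LEADER_RATIO_THRESHOLD
```

A low `buy_delta_stddev` together with a high `a_buys_first_ratio` describes a follower that reacts with a consistent delay, which is the pattern the text points to.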

GraphBuilder

The GraphBuilder runs every hour and spawns a Python subprocess (src/ml/graph_builder.py) to perform graph-level analysis on the wallet_co_occurrence data. The Python layer applies community detection algorithms to identify clusters of wallets that co-buy frequently enough to be considered a coordinated group. For each identified cluster, the following features are computed and made available to the ML feature vector:

Feature	Description
`cluster_size`	Number of wallets in the identified cluster
`cluster_avg_grad_rate`	Average graduation rate across all wallets in the cluster
`co_occurrence_max_score`	Highest co-occurrence score between the signal’s wallet and any other wallet in the dataset

These cluster features allow the ML model to assess whether a signal wallet is acting as part of a known high-quality group or appears to be trading in isolation. A wallet embedded in a high-graduation-rate cluster carries more predictive weight than an isolated wallet with the same individual alpha score.
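The sketch below shows how `cluster_size` and `cluster_avg_grad_rate` could be derived from co-occurrence counts. This page does not name the community detection algorithm, so the sketch swaps in plain connected components over pairs whose overlap count meets a threshold, which is a much cruder grouping than real community detection. The `min_overlap` value and both function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def find_clusters(pair_counts: dict[tuple[str, str], int], min_overlap: int = 3) -> list[set[str]]:
    """Group wallets into connected components using union-find.

    Stand-in for community detection: two wallets are linked when their
    overlap count is at least min_overlap (an assumed threshold).
    """
    parent: dict[str, str] = {}

    def find(wallet: str) -> str:
        parent.setdefault(wallet, wallet)
        while parent[wallet] != wallet:
            parent[wallet] = parent[parent[wallet]]   # path halving
            wallet = parent[wallet]
        return wallet

    for (wallet_a, wallet_b), count in pair_counts.items():
        if count >= min_overlap:
            parent[find(wallet_a)] = find(wallet_b)

    groups: dict[str, set[str]] = defaultdict(set)
    for wallet in list(parent):
        groups[find(wallet)].add(wallet)
    return list(groups.values())


def cluster_features(cluster: set[str], grad_rates: dict[str, float]) -> dict[str, float]:
    """Compute two of the cluster features listed in the table above."""
    return {
        "cluster_size": len(cluster),
        "cluster_avg_grad_rate": mean(grad_rates[w] for w in cluster),
    }
```

Connected components will merge two tight groups joined by a single strong link, which is exactly the case where a proper community detection algorithm would keep them apart.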

Phase 3: ML inference

See how Phase 2 intelligence feeds into ONNX scoring, anti-signal detection, and copy-trade classification.

Adversarial detection

Deep dive into bundle detection, wash trading, and insider identification.
