System Design
Styx is a distributed failure detection system that uses probabilistic beliefs instead of binary true/false states. The architecture is designed around the core principle that in distributed systems, we cannot know with certainty whether a node is alive or dead.
Core Architecture
High-Level Components
Component Responsibilities
Oracle (oracle/oracle.go)
The Oracle is the main interface to Styx. It:
- Receives witness reports about node liveness
- Aggregates multiple witness opinions
- Detects network partitions
- Refuses to answer when uncertain (never guesses)
- Manages the finality of death declarations
Key methods:
- Query(target NodeID) - Ask about a node's liveness
- QueryWithRequirement() - Query with specific confidence requirements
- ReceiveReport() - Record a witness report
- RegisterWitness() - Add a trusted witness
Observer State (state/observer_state.go)
Each observer node maintains:
- Local beliefs about all known nodes
- Logical clock for causal ordering
- Evidence sets that support each belief
- Direct probes/health checks
- Causal events from the node
- Timeout observations
- Network quality measurements
Witness Registry (witness/registry.go)
Tracks all known witnesses and their trust scores:
- Correct reports: Trust increases by RecoveryRate (0.05)
- Wrong reports: Trust decays by DecayRate (0.1)
- Minimum trust: 0.1 (witnesses never reach zero weight)
- Default trust: 0.8 (new witnesses start high)
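The trust rules above can be sketched as a single update function. This is a minimal illustration, not the code in witness/registry.go: it assumes both adjustments are additive steps and that trust is capped at 1.0, neither of which is stated above. The name UpdateTrust and the MaxTrust constant are hypothetical.

```go
package main

import "fmt"

// Values quoted in the list above, except MaxTrust.
const (
	DefaultTrust = 0.8
	MinTrust     = 0.1
	MaxTrust     = 1.0 // assumption: the source does not state an upper bound
	RecoveryRate = 0.05
	DecayRate    = 0.1
)

// UpdateTrust returns the trust score after one more report.
// Assumption: recovery and decay are additive steps, clamped to
// [MinTrust, MaxTrust]. The real registry may scale them differently.
func UpdateTrust(trust float64, correct bool) float64 {
	if correct {
		trust += RecoveryRate
	} else {
		trust -= DecayRate
	}
	if trust < MinTrust {
		return MinTrust
	}
	if trust > MaxTrust {
		return MaxTrust
	}
	return trust
}

func main() {
	t := DefaultTrust
	for i := 0; i < 10; i++ {
		t = UpdateTrust(t, false) // ten wrong reports in a row
	}
	fmt.Printf("trust after 10 wrong reports: %.2f\n", t)
}
```

Because decay is twice the recovery step, a witness under these assumptions needs two correct reports to undo one wrong one, and the floor keeps even a badly behaved witness from being silenced entirely.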
Witness Aggregator (witness/aggregator.go)
Combines multiple witness reports into a single belief:
- Trust-weighted average of all witness beliefs
- Disagreement detection (P10: preserves uncertainty when witnesses disagree)
- Correlation detection (P11: reduces confidence when witnesses are too similar)
- Uncertainty injection when disagreement is high
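One way to picture trust-weighted aggregation with uncertainty injection is the sketch below. The types Belief and Report, the function Aggregate, and the disagreement measure (the spread between the highest and lowest reported alive probability) are all assumptions made for illustration; the aggregator in witness/aggregator.go may measure disagreement differently, and correlation detection (P11) is left out.

```go
package main

import "fmt"

// Belief is a three-state distribution; the fields are expected to sum to 1.
type Belief struct{ Alive, Dead, Unknown float64 }

// Report pairs one witness's belief with that witness's trust score.
type Report struct {
	Belief Belief
	Trust  float64
}

// Aggregate returns the trust-weighted mean belief, with probability mass
// moved into Unknown in proportion to how much the witnesses disagree.
// The second return value is the disagreement score in [0, 1].
func Aggregate(reports []Report) (Belief, float64) {
	var sum Belief
	var total float64
	minAlive, maxAlive := 1.0, 0.0
	for _, r := range reports {
		sum.Alive += r.Trust * r.Belief.Alive
		sum.Dead += r.Trust * r.Belief.Dead
		sum.Unknown += r.Trust * r.Belief.Unknown
		total += r.Trust
		if r.Belief.Alive < minAlive {
			minAlive = r.Belief.Alive
		}
		if r.Belief.Alive > maxAlive {
			maxAlive = r.Belief.Alive
		}
	}
	if total == 0 {
		return Belief{Unknown: 1}, 0 // no usable reports: pure uncertainty
	}
	mean := Belief{Alive: sum.Alive / total, Dead: sum.Dead / total, Unknown: sum.Unknown / total}

	// Assumed disagreement measure: spread of the reported alive probabilities.
	disagreement := maxAlive - minAlive

	// Uncertainty injection: shift a share of the decided mass into Unknown.
	out := Belief{
		Alive:   mean.Alive * (1 - disagreement),
		Dead:    mean.Dead * (1 - disagreement),
		Unknown: mean.Unknown + (mean.Alive+mean.Dead)*disagreement,
	}
	return out, disagreement
}

func main() {
	reports := []Report{
		{Belief{Alive: 0.9, Dead: 0.05, Unknown: 0.05}, 0.8},
		{Belief{Alive: 0.1, Dead: 0.8, Unknown: 0.1}, 0.8},
	}
	b, d := Aggregate(reports)
	fmt.Printf("belief=%+v disagreement=%.2f\n", b, d)
}
```

With the two conflicting reports in main, a plain weighted average would land near 50% alive; the injection step instead leaves Unknown as the dominant state, which is the behaviour P10 describes.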
Finality Engine (finality/engine.go)
Handles irreversible death declarations with strict requirements:
- Minimum dead confidence: 85%
- Minimum witnesses: 3
- Maximum disagreement: 20%
- Must have non-timeout evidence (P15: silence alone cannot trigger death)
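The four requirements above amount to a gate that every death declaration must pass. The sketch below shows one possible shape for such a gate; the Candidate struct, the EvidenceKind values, and CanFinalize are hypothetical names, and treating the thresholds as inclusive (exactly 85% passes, exactly 20% passes) is an assumption.

```go
package main

import "fmt"

// EvidenceKind is a simplified stand-in for the evidence types Styx tracks.
type EvidenceKind int

const (
	Timeout EvidenceKind = iota
	WitnessReport
	CausalEvent
)

// Thresholds quoted in the list above.
const (
	MinDeadConfidence = 0.85
	MinWitnesses      = 3
	MaxDisagreement   = 0.20
)

// Candidate holds what is known about a node being considered for finality.
type Candidate struct {
	DeadConfidence float64
	Witnesses      int
	Disagreement   float64
	Evidence       []EvidenceKind
}

// CanFinalize reports whether all four requirements hold and, if not,
// which requirement failed first.
func CanFinalize(c Candidate) (bool, string) {
	if c.DeadConfidence < MinDeadConfidence {
		return false, "dead confidence below 85%"
	}
	if c.Witnesses < MinWitnesses {
		return false, "fewer than 3 witnesses"
	}
	if c.Disagreement > MaxDisagreement {
		return false, "disagreement above 20%"
	}
	for _, e := range c.Evidence {
		if e != Timeout {
			return true, "all requirements met"
		}
	}
	return false, "only timeout evidence (P15)"
}

func main() {
	silent := Candidate{DeadConfidence: 0.95, Witnesses: 5, Disagreement: 0.05,
		Evidence: []EvidenceKind{Timeout, Timeout}}
	ok, reason := CanFinalize(silent)
	fmt.Println(ok, reason)
}
```

The candidate in main is rejected despite 95% dead confidence, because every piece of evidence is a timeout.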
Partition Detector (partition/detector.go)
Detects network partitions by analyzing witness disagreement:
- No partition: Witnesses agree
- Suspected partition: Some disagreement detected
- Confirmed partition: Witnesses split into conflicting groups (>40% disagreement)
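The three partition states can be read as a classification over a single disagreement score. In the hedged sketch below, only the 40% confirmation threshold comes from the text above; the suspectThreshold parameter is hypothetical, since the point at which "some disagreement" becomes a suspected partition is not specified, and so are the type and function names.

```go
package main

import "fmt"

// PartitionState mirrors the three states listed above.
type PartitionState int

const (
	NoPartition PartitionState = iota
	SuspectedPartition
	ConfirmedPartition
)

// ConfirmThreshold is the 40% disagreement level quoted above.
const ConfirmThreshold = 0.40

func (s PartitionState) String() string {
	switch s {
	case SuspectedPartition:
		return "suspected"
	case ConfirmedPartition:
		return "confirmed"
	default:
		return "none"
	}
}

// Classify maps a disagreement score in [0, 1] to a partition state.
// suspectThreshold is an assumed tuning knob, not a documented Styx setting.
func Classify(disagreement, suspectThreshold float64) PartitionState {
	switch {
	case disagreement > ConfirmThreshold:
		return ConfirmedPartition
	case disagreement > suspectThreshold:
		return SuspectedPartition
	default:
		return NoPartition
	}
}

func main() {
	for _, d := range []float64{0.0, 0.15, 0.45} {
		fmt.Printf("disagreement %.2f -> %s\n", d, Classify(d, 0.05))
	}
}
```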
Data Flow
Evidence Collection Flow
Witness Report Flow
Query Flow
Belief Representation
Styx uses a three-state probability distribution:
- [A:90% D:5% U:5%] - High confidence node is alive
- [A:10% D:80% U:10%] - High confidence node is dead
- [A:30% D:30% U:40%] - High uncertainty (dominant state: UNKNOWN)
- [A:0% D:0% U:100%] - Pure uncertainty (initial state)
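A small Go type makes the three-state distribution and its dominant state concrete. This is a sketch under assumptions: the names Belief, State, and Dominant are hypothetical, and resolving ties in favour of UNKNOWN is a guess that follows the "never guesses" rule rather than documented behaviour.

```go
package main

import "fmt"

// State names the three possible verdicts.
type State int

const (
	StateAlive State = iota
	StateDead
	StateUnknown
)

// Belief is a probability distribution over the three states.
type Belief struct{ Alive, Dead, Unknown float64 }

// Initial returns the pure-uncertainty belief every node starts with.
func Initial() Belief { return Belief{Unknown: 1} }

// Dominant returns the state with strictly the highest probability.
// Assumption: any tie falls back to StateUnknown.
func (b Belief) Dominant() State {
	switch {
	case b.Alive > b.Dead && b.Alive > b.Unknown:
		return StateAlive
	case b.Dead > b.Alive && b.Dead > b.Unknown:
		return StateDead
	default:
		return StateUnknown
	}
}

// String renders the belief in the bracket notation used above.
func (b Belief) String() string {
	return fmt.Sprintf("[A:%.0f%% D:%.0f%% U:%.0f%%]", b.Alive*100, b.Dead*100, b.Unknown*100)
}

func main() {
	b := Belief{Alive: 0.3, Dead: 0.3, Unknown: 0.4}
	fmt.Println(b, b.Dominant() == StateUnknown)
}
```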
Key Design Principles
P4: No Evidence = No Conclusion
Lack of response (timeout) is weak evidence, not proof of death.
P6: Load ≠ Failure
GC pauses, scheduling jitter, and high load are detected and do NOT trigger death.
P10: Disagreement is Preserved
When witnesses disagree, uncertainty increases rather than averaging to false confidence.
P11: Correlated Witnesses Weaken Confidence
If all witnesses report identical beliefs (correlation > 0.9), confidence is reduced to detect potential sybil attacks or common-mode failures.
P12: Witness Trust Decays
Witnesses that provide incorrect reports lose trust over time, reducing their influence.
P13: False Death is Forbidden
Death requires overwhelming evidence to prevent false positives.
P14: Death is Irreversible
Once declared dead, a node can never be resurrected (matches real-world semantics).
P15: Silence ≠ Death
Timeout alone cannot trigger death - requires positive evidence of failure.
API Interface
Styx exposes an HTTP API (api/server.go):
Query Endpoint
Report Endpoint
Evidence Types
| Evidence Type | Suggests | Weight | Notes |
|---|---|---|---|
| DirectResponse | Alive | 0.6-1.0 | Weight decreases with latency |
| Timeout | Dead | 0.1-0.3 | Always weak (P4, P15) |
| CausalEvent | Alive | 1.0 | Strong proof of recent liveness |
| WitnessReport | Varies | Trust-based | Indirect evidence |
| SchedulingJitter | Uncertain | 0.2-0.4 | Reduces confidence in timeouts |
| NetworkInstability | Uncertain | Varies | Contextual information |
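The DirectResponse row says the weight falls from 1.0 to 0.6 as latency grows, but not how. The sketch below assumes a linear falloff up to a caller-supplied worst-case latency; both the linear shape and the worst parameter are assumptions for illustration, and DirectResponseWeight is a hypothetical name.

```go
package main

import (
	"fmt"
	"time"
)

// Bounds of the DirectResponse weight range from the table above.
const (
	maxWeight = 1.0
	minWeight = 0.6
)

// DirectResponseWeight maps a probe's round-trip latency to an evidence
// weight. Assumption: the weight falls linearly from maxWeight at zero
// latency to minWeight at or beyond the worst latency.
func DirectResponseWeight(latency, worst time.Duration) float64 {
	if latency <= 0 || worst <= 0 {
		return maxWeight
	}
	if latency >= worst {
		return minWeight
	}
	frac := float64(latency) / float64(worst)
	return maxWeight - (maxWeight-minWeight)*frac
}

func main() {
	worst := time.Second
	for _, l := range []time.Duration{10 * time.Millisecond, 500 * time.Millisecond, 2 * time.Second} {
		fmt.Printf("latency %v -> weight %.2f\n", l, DirectResponseWeight(l, worst))
	}
}
```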
Logical Time
Styx uses Lamport logical clocks (time/logical.go) for:
- Causal ordering of events
- Evidence age calculation
- Belief decay over time
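For readers unfamiliar with Lamport clocks, the sketch below shows the standard algorithm: increment on a local event, and on receiving a message jump to one past the larger of the local and remote timestamps. It is a generic illustration rather than the contents of time/logical.go; the type and method names are hypothetical, and measuring evidence age as a difference in ticks is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// LamportClock is a mutex-guarded logical clock.
type LamportClock struct {
	mu  sync.Mutex
	now uint64
}

// Tick records a local event and returns its timestamp.
func (c *LamportClock) Tick() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.now++
	return c.now
}

// Observe merges a timestamp received from another node:
// the clock moves to max(local, remote) + 1.
func (c *LamportClock) Observe(remote uint64) uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	if remote > c.now {
		c.now = remote
	}
	c.now++
	return c.now
}

// Now returns the current logical time without advancing it.
func (c *LamportClock) Now() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.now
}

// Age returns how many ticks have passed since stamp (assumed age measure).
func Age(now, stamp uint64) uint64 {
	if stamp > now {
		return 0
	}
	return now - stamp
}

func main() {
	var c LamportClock
	stamp := c.Tick() // local evidence recorded at logical time 1
	c.Observe(10)     // a report stamped 10 arrives; clock moves to 11
	fmt.Println("evidence age in ticks:", Age(c.Now(), stamp))
}
```

Logical age of this kind counts events rather than seconds, so any belief decay built on it is driven by observed activity, not by wall-clock time.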
Thread Safety
All core components are thread-safe using sync.RWMutex:
- Oracle handles concurrent queries
- Registry handles concurrent witness updates
- Aggregator can process reports in parallel
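The usual sync.RWMutex pattern behind claims like these is to take the read lock for lookups, so many readers proceed in parallel, and the write lock for updates. The TrustTable type below is a hypothetical stand-in for illustration, not the actual registry, and its map-based storage is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// TrustTable is an illustrative concurrent map of witness ID to trust score.
type TrustTable struct {
	mu    sync.RWMutex
	trust map[string]float64
}

func NewTrustTable() *TrustTable {
	return &TrustTable{trust: make(map[string]float64)}
}

// Get takes the read lock, so concurrent lookups do not block each other.
func (t *TrustTable) Get(id string) (float64, bool) {
	t.mu.RLock()
	defer t.mu.RUnlock()
	v, ok := t.trust[id]
	return v, ok
}

// Set takes the write lock, excluding all readers and other writers.
func (t *TrustTable) Set(id string, v float64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.trust[id] = v
}

// Len returns the number of witnesses currently stored.
func (t *TrustTable) Len() int {
	t.mu.RLock()
	defer t.mu.RUnlock()
	return len(t.trust)
}

// FillConcurrently registers n witnesses from n goroutines, mixing
// writes and reads to exercise both locks.
func FillConcurrently(t *TrustTable, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			t.Set(fmt.Sprintf("witness-%d", i), 0.8)
			_, _ = t.Get("witness-0")
		}(i)
	}
	wg.Wait()
}

func main() {
	t := NewTrustTable()
	FillConcurrently(t, 100)
	fmt.Println("witnesses registered:", t.Len())
}
```

Running a program like this with the Go race detector (go run -race) is a quick way to check that no access bypasses the lock.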
Performance Characteristics
- Query latency: O(witnesses) - linear in number of witness reports
- Evidence storage: O(nodes × evidence_per_node)
- Witness registry: O(witnesses) lookup time
- Partition detection: O(witnesses) per query
Next Steps
Learn more about specific subsystems:
- Trust Scoring - How witness trust evolves
- Byzantine Fault Tolerance - Handling malicious nodes