Byzantine Fault Tolerance
Styx incorporates Byzantine Fault Tolerance (BFT) mechanisms to handle malicious or arbitrarily faulty nodes. This ensures the system can provide accurate liveness information even when some witnesses are compromised, malicious, or behaving incorrectly.Byzantine Fault Model
In a Byzantine fault scenario, nodes may:- Lie about observations - Report that alive nodes are dead or vice versa
- Send conflicting reports - Tell different observers different things
- Collude with other malicious nodes - Coordinate to mislead the system
- Behave arbitrarily - Any behavior not specified by the protocol
BFT Tolerance Threshold
n total nodes with f Byzantine nodes: n > 3f
This means:
- With 4 nodes, tolerate 1 Byzantine node
- With 7 nodes, tolerate 2 Byzantine nodes
- With 10 nodes, tolerate 3 Byzantine nodes
Why 3f+1?
With3f+1 total nodes:
- Malicious nodes:
f - Honest nodes:
2f+1(a majority) - Even if
fnodes are Byzantine: The remaining2f+1honest nodes constitute a majority - Even if
fhonest nodes are unreachable: Still havef+1honest nodes, which is more than thefmalicious ones
Cryptographic Signatures
Styx uses Ed25519 digital signatures to prevent forgery and ensure report authenticity.Signed Reports
- Content: Who is reporting (Witness), about whom (Target), what they believe (Belief), when (Timestamp)
- Signature: Cryptographic proof that the witness created this report
- Public Key: The witness’s identity used to verify the signature
Key Pair Generation
Signing Reports
Verifying Signatures
- The signature is cryptographically valid
- It was created by the witness claiming to have sent it
- The report content hasn’t been tampered with
BFT Aggregator
The BFTAggregator combines reports while tolerating Byzantine behavior.Sybil Attack Prevention
A Sybil attack involves a single malicious actor creating multiple fake identities to gain disproportionate influence. Styx prevents this by:Byzantine Tolerance Calculation
- 4 reports → tolerate 1 Byzantine node
- 7 reports → tolerate 2 Byzantine nodes
- 10 reports → tolerate 3 Byzantine nodes
Trimmed Mean Aggregation
To handle Byzantine behavior, Styx uses a trimmed mean approach:How Trimmed Mean Works
Givenn witness reports about node X:
- Sort reports by a metric (e.g., alive confidence)
- Remove
fhighest values (removes optimistic Byzantine nodes) - Remove
flowest values (removes pessimistic Byzantine nodes) - Average the remaining
n-2fvalues
f=2):
| Witness | Alive % | Dead % | Unknown % | |
|---|---|---|---|---|
| W1 | 95 | 0 | 5 | ← Remove (highest) |
| W2 | 90 | 5 | 5 | ← Remove (highest) |
| W3 | 85 | 10 | 5 | ← Keep |
| W4 | 80 | 15 | 5 | ← Keep |
| W5 | 75 | 20 | 5 | ← Keep |
| W6 | 10 | 85 | 5 | ← Remove (lowest) |
| W7 | 5 | 90 | 5 | ← Remove (lowest) |
[A:80% D:15% U:5%]
Even though W1, W2 tried to inflate alive confidence and W6, W7 tried to inflate dead confidence (Byzantine behavior), the trimmed mean gives an accurate result.
Defense Mechanisms
1. Signature Verification
Threat: Forged reports claiming to be from trusted witnesses Defense:2. Unique Public Keys
Threat: Sybil attack (one malicious actor, many fake identities) Defense:3. Trimmed Mean
Threat: Byzantine nodes reporting extreme values Defense: Removef highest and f lowest reports before averaging
4. Trust Scoring
Threat: Witnesses that are sometimes wrong or malicious Defense: Trust scores decay with wrong reports (see Trust Scoring)5. Minimum Witnesses
Threat: Insufficient witnesses for BFT guarantees Defense:6. Partition Detection
Threat: Network partition causes witnesses to see different realities Defense: Detect partitions and refuse to answer when detected (see System Design)Attack Scenarios
Scenario 1: Malicious Minority
Setup:- 10 total witnesses
- 3 malicious (within 33% threshold)
- 7 honest
Scenario 2: Malicious Majority
Setup:- 10 total witnesses
- 7 malicious (exceeds 33% threshold)
- 3 honest
Scenario 3: Sybil Attack
Attack: Single malicious actor generates 100 keypairs and sends 100 reports Defense:Scenario 4: Forgery Attack
Attack: Malicious actor tries to forge a report claiming to be from trusted witness W1 Defense:Integration with Trust Scoring
BFT and trust scoring work together:| Mechanism | Handles | Timeframe |
|---|---|---|
| BFT (trimmed mean) | Adversarial behavior | Immediate (per query) |
| Trust scoring | Mistakes and degradation | Long-term (across queries) |
| Partition detection | Network splits | Real-time (per query) |
- Byzantine witness sends false report
- BFT trimmed mean filters it out immediately
- Trust score decays over time as more false reports are detected
- Eventually, Byzantine witness reaches minimum trust (0.1)
- Even if not filtered by BFT, its reports have minimal weight
BFT Guarantees
Styx provides the following BFT guarantees:Safety (with ≤ 33% Byzantine nodes)
- No false death declarations - Byzantine nodes cannot cause incorrect death finality
- No forged reports - Cryptographic signatures prevent forgery
- No Sybil influence - One key = one vote
Liveness (with ≤ 33% Byzantine nodes)
- Queries complete - System returns a belief (or explicitly refuses)
- Honest reports matter - Honest majority’s view is reflected
Transparency
- Explicit refusal - When unable to provide confident answer, system refuses rather than guesses
- Evidence provided - Query results include which witnesses contributed
Limitations
Styx’s BFT mechanisms have limitations:1. 33% Threshold
If more than 1/3 of witnesses are Byzantine, guarantees break down. Mitigation: Choose trusted witnesses carefully, monitor trust scores.2. Public Key Infrastructure
Requires witnesses to have secure keypairs. Mitigation: Use secure key generation and storage practices.3. Collusion Detection
Styx cannot detect if Byzantine nodes are colluding (working together). Mitigation: Correlation detection reduces confidence when witnesses are too similar (see Trust Scoring).4. Adaptive Attacks
Malicious witnesses can observe system behavior and adapt their strategy. Mitigation: Trust scoring makes sustained attacks costly (trust decays with wrong reports).Configuration
- minWitnesses: 3-7 depending on security requirements
- Lower → faster bootstrap, lower BFT tolerance
- Higher → stronger BFT guarantees, slower bootstrap
Best Practices
For System Operators
- Deploy more than
3f+1witnesses - Provides tolerance for both Byzantine and crash faults - Monitor witness behavior - Watch for trust score patterns indicating malicious activity
- Rotate witness keys - Periodic key rotation limits damage from compromised keys
- Geographically distribute witnesses - Reduces correlation and common-mode failures
For Witness Operators
- Secure private keys - Use hardware security modules (HSMs) or secure enclaves
- Verify message integrity - Ensure your implementation correctly signs all report fields
- Monitor your reports - Check if your reports align with aggregated beliefs
- Investigate anomalies - If your reports often disagree, investigate why
Testing Byzantine Scenarios
The codebase includes brutal Byzantine testing:Related Components
- System Design - Overall architecture including BFT integration
- Trust Scoring - Complementary long-term defense mechanism
- Core Concepts - Understanding beliefs and confidence
Further Reading
- Byzantine Generals Problem - Lamport, Shostak, Pease (1982)
- Practical Byzantine Fault Tolerance - Castro, Liskov (1999)
- Ed25519 Signature Scheme - Bernstein et al. (2011)