Overview
HH-suite implements sophisticated algorithms for comparing Hidden Markov Models (HMMs) representing protein sequences. The core algorithms include:- Viterbi algorithm: Finds optimal alignment path
- Forward-Backward algorithms: Calculate alignment probabilities
- MAC (Maximum Accuracy): Posterior decoding for optimal expected accuracy
HMM-HMM Comparison
Profile HMM Structure
HH-suite uses profile HMMs with a linear topology:Each column (1 to L) contains:
- Match state (M): Emits aligned residue
- Insert state (I): Emits extra query residues
- Delete state (D): Skips template residues
Pair States
Aligning two HMMs creates pair states:| Pair State | Query State | Template State | Description |
|---|---|---|---|
| MM | Match | Match | Both emit residues |
| GD | Gap | Delete | Query gap, template emits |
| IM | Insert | Match | Query insert, template emits |
| DG | Delete | Gap | Query emits, template gap |
| MI | Match | Insert | Query emits, template insert |
Transitions only occur between MM state and the four other states (no direct I↔D transitions).
Viterbi Algorithm
The Viterbi algorithm finds the single best alignment between two HMMs using dynamic programming.Algorithm Description
Source:hhviterbialgorithm.cpp
Initialization
Initialize the top row (i=0) and leftmost column (j=0) of the dynamic programming matrix.
Recursion
For each cell (i,j), calculate scores for all pair states:Match-Match (MM):Where
S(qi, tj) is the emission score (scalar product of emission probabilities).SIMD Vectorization
HH-suite processes 4-8 alignments in parallel using SIMD instructions:- SSE2
- AVX2
hhviterbi.h
Secondary Structure Scoring
SS scoring is integrated into the Viterbi alignment:hhviterbi.h
SS scoring modes:
PRED_DSSP: Query has predicted SS, template has DSSPDSSP_PRED: Query has DSSP, template has predicted SSPRED_PRED: Both have predicted SS
Cell-Off Optimization
The “cell-off” mechanism excludes regions from alignment:hhviterbialgorithm.cpp
- Prevent self-alignments with |i-j| < threshold
- Exclude previously found alignments
- Enforce minimum overlap constraints
Forward-Backward Algorithms
Forward Algorithm
Calculates the probability of all alignment paths up to position (i,j). Source:hhforwardalgorithm.cpp
Forward recursion (simplified)
Backward Algorithm
Calculates the probability of all alignment paths from position (i,j) to the end. Source:hhbackwardalgorithm.cpp
Backward recursion (simplified)
Posterior Probabilities
Combining Forward and Backward:P_forward is the total alignment probability.
P_MM[i][j] = probability that positions i and j are aligned in state MM across all possible alignments.MAC Algorithm (Maximum Accuracy)
MAC finds the alignment with maximum expected accuracy using posterior probabilities. Source:hhmacalgorithm.cpp, hhposteriordecoder.h
Algorithm Overview
MAC Recursion
mact parameter controls greediness:- Higher
mact: shorter, more confident alignments - Lower
mact: longer, more greedy alignments
MAC vs Viterbi
Viterbi
- Finds single best path
- Maximizes joint probability
- Faster computation
- Can be overconfident
MAC
- Maximizes expected accuracy
- Considers all possible paths
- More reliable boundaries
- Requires Forward-Backward
Alignment Path Representation
State Encoding
Fromhhdecl.h:
Backtrace Structure
hhviterbi.h
Performance Optimizations
SIMD Vectorization
Key optimizations inhhviterbi.cpp:
Scalar Product (20 amino acids)
Scalar Product (20 amino acids)
hhviterbi.h
Prefetching (x86 only)
Prefetching (x86 only)
Multiple Algorithm Variants
Multiple Algorithm Variants
HH-suite compiles separate libraries for different feature combinations:
hhviterbialgorithm(base)hhviterbialgorithm_with_celloff(with exclusion)hhviterbialgorithm_with_celloff_and_ss(exclusion + SS scoring)
Memory Layout
Optimized cache access:hhviterbi.h
Complexity Analysis
Time Complexity
| Algorithm | Complexity | Notes |
|---|---|---|
| Viterbi | O(L_q × L_t) | L_q = query length, L_t = template length |
| Forward | O(L_q × L_t) | Same as Viterbi |
| Backward | O(L_q × L_t) | Same as Viterbi |
| MAC | O(L_q × L_t) | Requires F+B first |
SIMD speedup: ~4-8× faster with vectorization
Space Complexity
- Viterbi: O(L_t) per alignment (only stores one row)
- Forward-Backward: O(L_q × L_t) (stores full matrix for posteriors)
- Backtrace: O(L_q × L_t) (stores backtrace pointers)
Practical Implications
Speed vs Accuracy
- Use Viterbi-only for fast searches
- Add
-realignfor critical alignments - MAC improves boundary detection
Memory Constraints
- Viterbi: minimal memory
- MAC: needs ~8 bytes × L_q × L_t
- Use
-maxmemto limit realignment
References
HH-suite3 Paper
Steinegger et al., BMC Bioinformatics (2019)
Source Code
View implementation on GitHub
Next Steps
Parameters
Learn how to tune algorithm parameters
Compilation
Build with optimal SIMD flags