Overview
HHM is the profile hidden Markov model (HMM) format used by HH-suite. It contains profile information derived from multiple sequence alignments, including amino acid frequencies, transition probabilities, and optional secondary structure predictions.
Format Specification
Header Section
The file begins with metadata lines.
Header Fields
- HHsearch: Version identifier
- NAME: Protein name and description
- FAM: Family information (optional)
- FILE: Base filename
- COM: Command used to generate the HMM
- DATE: Creation timestamp
- LENG: Number of match states and alignment columns
- FILT: Filter statistics (how many sequences of the input alignment passed the filter)
- NEFF: Effective number of sequences (diversity measure)
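A minimal sketch of reading these fields in Python. It assumes that each keyword starts its own line and is separated from its value by whitespace, and that the metadata ends at the first SEQ, NULL, or HMM line. parse_hhm_header is a hypothetical helper, not an HH-suite function, and the sample header is made up.

```python
from typing import Dict

# Keywords from the header field list above; each is assumed to start its own
# line, followed by whitespace and the value.
HEADER_KEYS = {"HHsearch", "NAME", "FAM", "FILE", "COM", "DATE", "LENG", "FILT", "NEFF"}


def parse_hhm_header(text: str) -> Dict[str, str]:
    """Collect header fields until the first non-header section (hypothetical helper)."""
    header: Dict[str, str] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if not parts:
            continue  # skip blank lines
        key = parts[0]
        if key in ("SEQ", "NULL", "HMM"):
            break  # the metadata block is over
        if key in HEADER_KEYS:
            # Optional fields such as FAM may have no value at all.
            header[key] = parts[1].strip() if len(parts) > 1 else ""
    return header


# A made-up header for illustration.
example = """HHsearch 1.5
NAME  example protein
FAM
LENG  2 match states, 2 columns in multiple alignment
NEFF  1.0
SEQ
>Consensus
MR
"""
header = parse_hhm_header(example)
print(header["NAME"])                  # example protein
print(int(header["LENG"].split()[0]))  # 2
```

Values are returned as raw strings; converting LENG or NEFF to numbers is left to the caller, as in the last line.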
Sequence Section
Consensus and representative sequences:
- Consensus sequence (derived from the alignment)
- Representative sequences from the MSA
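The sequence block can be read in a similar way. This sketch assumes the block opens with a line reading SEQ, holds FASTA-style records (a > line followed by one or more sequence lines), and closes with a line containing only #. Extra records, such as predicted secondary structure, come back like ordinary sequences. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def parse_seq_block(text: str) -> List[Tuple[str, str]]:
    """Return (name, sequence) pairs found between a 'SEQ' line and a '#' line."""
    records: List[Tuple[str, str]] = []
    inside = False
    name: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        stripped = line.strip()
        if not inside:
            inside = stripped == "SEQ"  # wait for the start of the block
            continue
        if stripped == "#":
            break  # assumed end-of-block marker
        if stripped.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = stripped[1:], []  # header text after '>'
        elif stripped:
            chunks.append(stripped)  # a sequence may wrap over several lines
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


example = """NEFF  1.0
SEQ
>Consensus
MR
>query example protein
MR
#
"""
print(parse_seq_block(example))
# [('Consensus', 'MR'), ('query example protein', 'MR')]
```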
NULL Model
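Background amino acid frequencies, given on the NULL line as 20 values in the same negative-log format as the emission values.
HMM Header Line
The HMM line labels the 20 emission columns in the order A C D E F G H I K L M N P Q R S T V W Y. The line after it labels the columns of the transition lines (M->M, M->I, M->D, I->M, I->I, D->M, D->D, Neff, Neff_I, Neff_D), and the next line holds the transitions out of the begin state.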
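A sketch that pairs the NULL values with the amino acid labels of the HMM header line. It assumes both lines are whitespace-separated and, as an assumption of this sketch, that each NULL value is -1000·log2(p) with * meaning 0. Reading the labels from the file, rather than hard-coding an order, keeps the mapping correct even if the column order differs. The function names and the uniform sample values are hypothetical.

```python
from typing import Dict


def decode(field: str) -> float:
    # Assumed encoding: field = -1000 * log2(p); '*' means p = 0.
    return 0.0 if field == "*" else 2.0 ** (-int(field) / 1000.0)


def background_frequencies(null_line: str, hmm_line: str) -> Dict[str, float]:
    """Pair the 20 NULL values with the amino acid labels of the HMM header line."""
    null_fields = null_line.split()
    hmm_fields = hmm_line.split()
    if null_fields[0] != "NULL" or hmm_fields[0] != "HMM":
        raise ValueError("expected a NULL line and an HMM header line")
    labels = hmm_fields[1:21]
    return {aa: decode(v) for aa, v in zip(labels, null_fields[1:21])}


# Made-up NULL line: a uniform background, p = 0.05 -> -1000*log2(0.05) ~ 4322.
null_line = "NULL   " + "\t".join(["4322"] * 20)
hmm_line = "HMM    " + "\t".join("ACDEFGHIKLMNPQRSTVWY")
bg = background_frequencies(null_line, hmm_line)
print(round(bg["W"], 3), round(sum(bg.values()), 3))  # 0.05 1.0
```

A real background distribution is not uniform, but its decoded values should still sum to about 1, which makes a quick sanity check on a parsed file.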
Match State Entries
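Each match state takes two lines:
- Line 1: the amino acid at this position (for example R; a leading M denotes methionine, not a match state), the position number, 20 emission values (negative log probabilities, * = probability 0) in the column order of the HMM header line, and a trailing integer giving the corresponding alignment column
- Line 2: the seven transition values (M->M, M->I, M->D, I->M, I->I, D->M, D->D), followed by the Neff values for the match, insert, and delete states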
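The two-line layout can be parsed as sketched below. The sketch assumes the surrounding record looks like this: the HMM line, a line of transition labels, one line of begin-state transitions, then two lines per match state, with the record ending at a // line. Fields are kept as raw strings and paired with the labels read from the header lines; any fields after the 20 emissions are left unparsed. MatchState and parse_match_states are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatchState:
    residue: str                 # first field of line 1, e.g. "R"
    index: int                   # match state (position) number
    emissions: Dict[str, str]    # raw emission fields keyed by amino acid
    extra: List[str]             # any fields after the 20 emissions, unparsed
    transitions: Dict[str, str]  # raw line-2 fields keyed by the header labels


def parse_match_states(text: str) -> List[MatchState]:
    lines = [ln for ln in text.splitlines() if ln.strip()]
    start = next(i for i, ln in enumerate(lines) if ln.startswith("HMM"))
    amino_acids = lines[start].split()[1:21]  # A C D ... Y
    labels = lines[start + 1].split()         # M->M ... Neff_D
    states: List[MatchState] = []
    i = start + 3  # skip the HMM line, the label line, the begin-state line
    while i + 1 < len(lines) and not lines[i].startswith("//"):
        first, second = lines[i].split(), lines[i + 1].split()
        states.append(MatchState(
            residue=first[0],
            index=int(first[1]),
            emissions=dict(zip(amino_acids, first[2:22])),
            extra=first[22:],
            transitions=dict(zip(labels, second)),
        ))
        i += 2
    return states


# Made-up record with one match state, echoing the Example Match State section.
emis = ["*"] * 20
emis[13], emis[14] = "2443", "293"  # columns Q and R in A C D ... Y order
example = "\n".join([
    "HMM    " + "\t".join("ACDEFGHIKLMNPQRSTVWY"),
    "       " + "\t".join(["M->M", "M->I", "M->D", "I->M", "I->I",
                           "D->M", "D->D", "Neff", "Neff_I", "Neff_D"]),
    "       0\t*\t*\t0\t*\t0\t*\t*\t*\t*",
    "R 2    " + "\t".join(emis) + "\t2",
    "       0\t*\t*\t0\t*\t0\t*\t1000\t0\t0",
    "",
    "//",
])
state = parse_match_states(example)[0]
print(state.residue, state.index, state.emissions["R"], state.transitions["M->M"])
# R 2 293 0
```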
Emission Probabilities
Emission values are stored as:
- Rounded integers equal to -1000 × log2(p), so p = 2^(-value/1000); 0 means p = 1 and 1000 means p = 0.5
- * for probability zero (negative infinity in log space)
- Smaller numbers therefore indicate higher probability
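A minimal conversion in both directions, assuming the stored integer is round(-1000 × log2(p)). That assumption is consistent with the example match state, whose values 293 and 2443 decode to probabilities summing to about 1. The helper names are hypothetical.

```python
import math


def hhm_to_prob(field: str) -> float:
    """Decode one emission field, assuming field = round(-1000 * log2(p))."""
    if field == "*":
        return 0.0  # '*' stands for p = 0 (log p = -infinity)
    return 2.0 ** (-int(field) / 1000.0)


def prob_to_hhm(p: float) -> str:
    """Encode a probability the same way; rounding loses a little precision."""
    if p <= 0.0:
        return "*"
    return str(round(-1000.0 * math.log2(p)))


for field in ("0", "1000", "2000", "*"):
    print(field, hhm_to_prob(field))
# 0 1.0
# 1000 0.5
# 2000 0.25
# * 0.0
```

Because values are rounded to integers, a round trip through prob_to_hhm and hhm_to_prob reproduces a probability only to within about 0.07% of its value.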
Transition Probabilities
Transitions between states:
- M->M: Match to match
- M->I: Match to insert
- M->D: Match to delete
- I->M: Insert to match
- I->I: Insert to insert
- D->M: Delete to match
- D->D: Delete to delete
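In a profile HMM the outgoing transitions of each state sum to 1: M->M, M->I, and M->D from a match state, I->M and I->I from an insert state, D->M and D->D from a delete state. The sketch below checks this for one set of transition fields, assuming transitions use the same -1000·log2(p) encoding as emissions. The sample values are made up and the function names are hypothetical.

```python
from typing import Dict

# Outgoing transitions of each state type, using the labels listed above.
OUTGOING = {
    "M": ("M->M", "M->I", "M->D"),
    "I": ("I->M", "I->I"),
    "D": ("D->M", "D->D"),
}


def to_prob(field: str) -> float:
    # Assumes transitions are encoded like emissions: -1000*log2(p), '*' = 0.
    return 0.0 if field == "*" else 2.0 ** (-int(field) / 1000.0)


def outgoing_sums(transitions: Dict[str, str]) -> Dict[str, float]:
    """Total outgoing probability per state type; each should be close to 1."""
    return {
        state: sum(to_prob(transitions[label]) for label in labels)
        for state, labels in OUTGOING.items()
    }


# Made-up values: M->M = 0.9, M->I = M->D = 0.05, I->M = 1, D->M = D->D = 0.5.
example = {"M->M": "152", "M->I": "4322", "M->D": "4322",
           "I->M": "0", "I->I": "*", "D->M": "1000", "D->D": "1000"}
print({s: round(v, 3) for s, v in outgoing_sums(example).items()})
# {'M': 1.0, 'I': 1.0, 'D': 1.0}
```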
Example Match State
- Position 2
- Amino acid R (Arginine)
- Emission values of 293 for R and 2443 for Q, i.e. probabilities of about 0.82 and 0.18
- Neff = 2 (low diversity at this position)
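Assuming the stored values are -1000·log2(p), the two emission values convert as follows. Together they account for essentially all of the probability mass, leaving practically nothing for the other 18 amino acids at this position.

```latex
\begin{aligned}
p_R &= 2^{-293/1000} \approx 0.816 \\
p_Q &= 2^{-2443/1000} \approx 0.184 \\
p_R + p_Q &\approx 1.000
\end{aligned}
```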
Creating HHM Files
From Alignment
Build an HHM file from an A3M alignment with hhmake: hhmake -i alignment.a3m -o alignment.hhm
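With Custom Name
The -name option sets the NAME field instead of taking it from the first sequence: hhmake -i alignment.a3m -o alignment.hhm -name my_protein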
With Pseudocounts
Using HHM Files
As Query
Pass an HHM file as the query of a search: hhsearch -i query.hhm -d database
As Template
Align a query against an HHM template with hhalign: hhalign -i query.a3m -t template.hhm
Binary Format
HH-suite can also use a binary HHM format (.hhm.bin) for faster loading:
- More compact storage
- Faster parsing
- Generated automatically by some tools
Best Practices
Building Quality HMMs
- Diverse alignments: Use sequences with varied identity (filter with hhfilter)
- Sufficient sequences: Aim for Neff > 4 for good coverage
- Quality filtering: Remove low-coverage sequences
- Pseudocounts: Use context-specific pseudocounts for better profiles
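To illustrate the coverage criterion, the sketch below computes, for each A3M row, the fraction of match columns in which the sequence has a residue, and drops rows under a threshold. It relies on the A3M convention that uppercase letters and - occupy match columns while lowercase letters are insertions. It is not hhfilter, whose options and defaults are not reproduced here; the helper names, sample rows, and 0.5 threshold are hypothetical.

```python
from typing import List, Tuple


def match_columns(a3m_row: str) -> str:
    """Keep match-state characters: uppercase residues and '-' gaps.
    In A3M, lowercase letters (and '.') mark insertions and are dropped."""
    return "".join(c for c in a3m_row if c.isupper() or c == "-")


def coverage(a3m_row: str) -> float:
    """Fraction of match columns where this sequence has a residue."""
    cols = match_columns(a3m_row)
    return sum(c != "-" for c in cols) / len(cols) if cols else 0.0


def filter_by_coverage(records: List[Tuple[str, str]],
                       min_cov: float) -> List[Tuple[str, str]]:
    """Keep (name, A3M row) pairs whose coverage is at least min_cov."""
    return [(name, row) for name, row in records if coverage(row) >= min_cov]


alignment = [("query", "MRKLV"), ("hit1", "-RkKLV"), ("hit2", "MR---")]
for name, row in alignment:
    print(name, coverage(row))
# query 1.0
# hit1 0.8
# hit2 0.4
print([name for name, _ in filter_by_coverage(alignment, 0.5)])  # ['query', 'hit1']
```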
Neff Values
- Neff < 2: Low diversity, may need more sequences
- Neff 4-8: Good diversity for most purposes
- Neff > 10: High diversity, excellent for sensitive searches
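This section does not say how Neff is computed. One common entropy-based definition, assumed here, takes the exponential of the Shannon entropy of each column's amino acid distribution and averages it over columns. It ranges from 1 (a fully conserved column) to 20 (all amino acids equally frequent), which fits the scale above. HH-suite's own calculation applies sequence weighting and other details not covered here, so this sketch (hypothetical function names) will not reproduce its NEFF values exactly.

```python
import math
from collections import Counter
from typing import List


def column_neff(column: str) -> float:
    """exp(Shannon entropy, natural log) of one column's residues; gaps ignored."""
    counts = Counter(c for c in column if c != "-")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log(n / total) for n in counts.values())
    return math.exp(entropy)


def alignment_neff(rows: List[str]) -> float:
    """Average column_neff over the columns of non-empty, equal-length, unweighted rows."""
    columns = ["".join(col) for col in zip(*rows)]
    return sum(column_neff(c) for c in columns) / len(columns)


print(column_neff("AAAA"))                                # 1.0
print(round(column_neff("ACDEFGHIKLMNPQRSTVWY"), 6))      # 20.0
print(alignment_neff(["AC", "AD"]))                       # mean of 1.0 and 2.0
```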
File Size Considerations
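HHM files are typically:
- Larger than shallow input alignments (the probability matrices take the same space regardless of alignment depth)
- Smaller than deep alignments, since their size grows with profile length rather than with the number of sequences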
- Text format: ~5-10 KB per 100 residues
- Binary format: ~50% smaller
See Also
- A3M Format - Input alignment format
- HHR Format - Results output format
- hhmake - Build HMMs from alignments
- hhsearch - Search with HMMs