Overview
HH-suite uses several specialized file formats for storing alignments, HMM profiles, and search results. Understanding these formats is essential for working with HH-suite tools and integrating them into custom workflows.
A3M Format
Description
A3M is a compact multiple sequence alignment format that distinguishes between match states (aligned to consensus) and insert states (not aligned to consensus).
Format Specification
Source: scripts/a3m.py:13-20
Character Meanings:
- Upper case letters (A-Z): Match states (aligned to consensus)
- Lower case letters (a-z): Insert states (not aligned to consensus)
- Dash (-): Deletion in match columns
- Dot (.): Gap aligned to insert states (optional in A3M, required in A2M)
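The character classes above can be sketched as a small classifier. This is a minimal illustration, not the code in scripts/a3m.py; the function names and the sample sequence are hypothetical.

```python
def classify_a3m_sequence(seq):
    """Count the four A3M character classes in one aligned sequence."""
    counts = {"match": 0, "insert": 0, "deletion": 0, "insert_gap": 0}
    for ch in seq:
        if ch == "-":
            counts["deletion"] += 1      # gap in a match column
        elif ch == ".":
            counts["insert_gap"] += 1    # gap aligned to an insert state
        elif ch.isupper():
            counts["match"] += 1         # residue aligned to the consensus
        elif ch.islower():
            counts["insert"] += 1        # residue not aligned to the consensus
        else:
            raise ValueError(f"unexpected character {ch!r}")
    return counts


def match_column_count(seq):
    """Match columns are occupied by either a match residue or a deletion."""
    counts = classify_a3m_sequence(seq)
    return counts["match"] + counts["deletion"]


# Hypothetical sequence: 5 match residues, 3 inserts, 2 deletions, 1 insert gap
example = "MKkgaT--L.V"
print(classify_a3m_sequence(example))
print(match_column_count(example))
```

Every sequence in a well-formed A3M file should yield the same match column count, which makes this a convenient consistency check.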
Example
- Query has 53 match states
- Homolog_1: kgapeg are insertions (not in consensus)
- Homolog_1: -- are deletions (consensus has residues here)
- Homolog_2: Lower case sveqp are insertions
Special Sequences
A3M files can include annotation lines:
Secondary Structure Annotations
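Predicted secondary structure (>ss_pred):
- H = Helix
- E = Extended (beta sheet)
- C = Coil/Loop
Prediction confidence (>ss_conf):
- 0-9 = Confidence level (0 = low, 9 = high)
DSSP secondary structure (>ss_dssp):
- H = Alpha helix
- G = 3-10 helix
- I = Pi helix
- E = Extended (beta sheet)
- B = Beta bridge
- T = Turn
- S = Bend
- C = Coil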
Consensus Sequence
One sequence can be marked as the consensus by appending _consensus to its name.
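The special sequences can be separated from ordinary alignment rows by their names alone. The sketch below is an illustration under assumptions: the reader is deliberately minimal, lines starting with # are treated as commentary, annotation records are recognized by an ss_ name prefix, and all record names in the sample are hypothetical.

```python
def read_a3m(text):
    """Return (name, sequence) pairs from A3M text; sequences may span lines."""
    records, name, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):   # assumption: '#' lines are commentary
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def split_special(records):
    """Group records into annotations, consensus sequences and ordinary sequences."""
    groups = {"annotation": [], "consensus": [], "sequence": []}
    for name, seq in records:
        fields = name.split()
        ident = fields[0] if fields else ""
        if ident.startswith("ss_"):              # e.g. ss_pred, ss_conf
            groups["annotation"].append((ident, seq))
        elif ident.endswith("_consensus"):
            groups["consensus"].append((ident, seq))
        else:
            groups["sequence"].append((ident, seq))
    return groups


# Hypothetical A3M content
sample = """>ss_pred
CCHHHC
>ss_conf
987789
>query_consensus
MKTAYL
>query
MKTAYL
>homolog_1
MKtaTAYL
"""
groups = split_special(read_a3m(sample))
print({kind: [name for name, _ in recs] for kind, recs in groups.items()})
```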
A3M vs A2M
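The key difference is how gaps aligned to inserts are written:
- A3M: Gaps aligned to inserts are omitted (the dots are optional)
- A2M: Gaps aligned to inserts are explicitly represented with dots
HHM Format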
Description
HHM (HH-suite HMM) format stores Hidden Markov Model profiles with amino acid emission and transition probabilities.
File Structure
Match State Block
Each match state contains:
- POS: Position in the alignment (1-based)
- Emission probabilities: 20 values, one per amino acid (in log scale)
- Transition probabilities: 7 values for state transitions (in log scale)
Example
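Probabilities are stored in log scale as integers equal to -1000 × log2(p), so smaller values mean higher probabilities. An asterisk (*) represents infinity, that is, a probability of zero (for example, an impossible transition).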
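Converting stored values back to probabilities is a one-line calculation. The sketch below assumes the -1000 × log2(p) scaling used by HH-suite's HHM files; the function name and the sample row are hypothetical.

```python
def hhm_value_to_prob(token):
    """Convert one HHM field to a probability.

    Assumes the field holds -1000 * log2(p) as an integer,
    with '*' standing for p = 0.
    """
    if token == "*":
        return 0.0
    return 2.0 ** (-int(token) / 1000.0)


# Hypothetical row of stored values
row = "0 1000 2000 *".split()
print([hhm_value_to_prob(tok) for tok in row])  # → [1.0, 0.5, 0.25, 0.0]
```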
HHR Format
Description
HHR is the human-readable results format from hhsearch, hhblits, and hhalign. It contains alignment details, statistics, and match summaries.
File Structure
Match Summary Format
- No: Rank of hit
- Hit: Database identifier and description
- Prob: Probability that hit is a true positive (0-100%)
- E-value: Expected number of false positives with this score or better
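- P-value: Probability that a single pairwise comparison with a wrong hit scores at least this well (the E-value divided by the number of database entries)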
- Score: Raw alignment score (bits)
- SS: Secondary structure score
- Cols: Number of aligned columns
- Query HMM: Query alignment range
- Template HMM: Template alignment range
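The summary columns can be pulled out of a line with a regular expression anchored on the numeric fields at the right, which avoids guessing where the free-text hit description ends. This is a sketch under assumptions: the sample line is made up, the optional template length in parentheses is assumed to follow the template range, and real files may contain lines where adjacent fields touch and need extra handling.

```python
import re

# One group per summary column; the hit description is matched lazily
SUMMARY_RE = re.compile(
    r"^\s*(?P<no>\d+)\s+(?P<hit>.*?)\s+"
    r"(?P<prob>\d+\.\d+)\s+(?P<evalue>\S+)\s+(?P<pvalue>\S+)\s+"
    r"(?P<score>-?\d+\.\d+)\s+(?P<ss>-?\d+\.\d+)\s+(?P<cols>\d+)\s+"
    r"(?P<q_start>\d+)-(?P<q_end>\d+)\s+"
    r"(?P<t_start>\d+)-(?P<t_end>\d+)\s*(?:\((?P<t_len>\d+)\))?\s*$"
)


def parse_summary_line(line):
    """Return a dict of summary fields, or None if the line is not a hit row."""
    m = SUMMARY_RE.match(line)
    if m is None:
        return None
    d = m.groupdict()
    return {
        "no": int(d["no"]),
        "hit": d["hit"],
        "prob": float(d["prob"]),
        "evalue": float(d["evalue"]),
        "pvalue": float(d["pvalue"]),
        "score": float(d["score"]),
        "ss": float(d["ss"]),
        "cols": int(d["cols"]),
        "query_hmm": (int(d["q_start"]), int(d["q_end"])),
        "template_hmm": (int(d["t_start"]), int(d["t_end"])),
    }


# Made-up summary line for illustration only
line = "  1 hypo_A Example protein        99.8 1.5E-30 4.6E-35  212.3   0.0  120    1-120     5-130 (140)"
hit = parse_summary_line(line)
print(hit["hit"], hit["prob"], hit["query_hmm"], hit["template_hmm"])
```

Lines that do not start with a rank, such as the column header, return None and can simply be skipped.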
Detailed Alignment Format
- Q ss_pred: Query secondary structure prediction
- Q [name]: Query sequence
- Q Consensus: Query consensus sequence
- Match line: Symbols indicating match quality (| = identical, + = similar, . = weak)
- T Consensus: Template consensus
- T [name]: Template sequence
- T ss_pred: Template secondary structure
Match Line Symbols
- | = Identical residues
- + = Similar residues (positive substitution score)
- . = Weakly similar
- (space) = Dissimilar or gaps
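Tallying the symbols gives a quick per-alignment quality summary. The sketch below uses only the four symbols listed above and files anything else under "other"; the category labels and the sample match line are hypothetical.

```python
from collections import Counter

# Symbol meanings as listed above
SYMBOLS = {
    "|": "identical",
    "+": "similar",
    ".": "weak",
    " ": "dissimilar_or_gap",
}


def summarize_match_line(match_line):
    """Count how often each match-quality symbol occurs in a match line."""
    return dict(Counter(SYMBOLS.get(ch, "other") for ch in match_line))


# Hypothetical match line
print(summarize_match_line("||+. |"))
```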
Other Formats
FASTA Format
Standard unaligned sequence format.
STOCKHOLM Format
Used by HMMER and some other tools.
PSI-BLAST Format
Position-Specific Scoring Matrix format.
Compressed Formats
CA3M Format
Compressed A3M format that stores sequences as references to a sequence database.
Source: src/a3m_compress.cpp:245-354
Structure:
- Header/commentary (optional)
- Consensus sequence
- Separator: ;
- Compressed sequences:
  - 4 bytes: Database entry index
  - 2 bytes: Start position in sequence
  - 2 bytes: Number of blocks
  - Block data: Match counts + insertion/deletion counts
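The fixed 8-byte prefix of each compressed sequence maps naturally onto a struct layout. The sketch below assumes little-endian unsigned integers, which the list above does not state, and leaves the block data undecoded; check src/a3m_compress.cpp before relying on either assumption. All names are hypothetical.

```python
import struct

# Assumption: little-endian, unsigned (4-byte index, 2-byte start, 2-byte block count)
RECORD_HEADER = struct.Struct("<IHH")


def read_record_header(buf, offset=0):
    """Unpack one record header and return it with the offset of its block data."""
    entry_index, start_pos, n_blocks = RECORD_HEADER.unpack_from(buf, offset)
    header = {
        "entry_index": entry_index,  # database entry index
        "start_pos": start_pos,      # start position in the referenced sequence
        "n_blocks": n_blocks,        # number of blocks that follow
    }
    return header, offset + RECORD_HEADER.size


# Round trip on made-up values, followed by two placeholder block bytes
raw = RECORD_HEADER.pack(123456, 42, 3) + b"\x05\x00"
header, block_offset = read_record_header(raw)
print(header, block_offset)
```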
FFindex Format
FFindex is a database format for storing many small files efficiently.
Index file (.ffindex):
Data file (.ffdata):
Concatenated data with entries at specified offsets.
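Looking up an entry amounts to reading its offset and length from the index and slicing the data file. The sketch below works on in-memory bytes and assumes each index line holds a name, an offset, and a length separated by tabs, with the length counting a trailing null byte; the entry names and contents are made up.

```python
def parse_ffindex(index_text):
    """Map entry name -> (offset, length) from index text."""
    entries = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue
        # Assumption: three tab-separated columns per line
        name, offset, length = line.split("\t")
        entries[name] = (int(offset), int(length))
    return entries


def read_entry(data, entries, name):
    """Slice one entry out of the concatenated data, dropping trailing null bytes."""
    offset, length = entries[name]
    return data[offset:offset + length].rstrip(b"\0")


# Made-up database with two entries, each terminated by a null byte
ffdata = b">a\nMK\n\0>b\nMKT\n\0"
ffindex = "a\t0\t7\nb\t7\t8\n"

entries = parse_ffindex(ffindex)
print(read_entry(ffdata, entries, "b").decode())
```

For real databases the same slicing can be done with file seeks or mmap instead of loading the whole data file into memory.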
Format Conversion
Use reformat.pl to convert between formats.
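To illustrate what one such conversion involves, here is a pure-Python sketch of A3M to A2M: every insert run is padded with dots so that all rows end up the same length. It is not how reformat.pl is implemented, the function name is hypothetical, and it expects bare sequence strings without header lines.

```python
def a3m_to_a2m(seqs):
    """Pad insert runs with dots so that all sequences have equal length."""
    parsed = []
    for seq in seqs:
        matches = []     # one character per match column (residue or '-')
        inserts = [""]   # inserts[i] = insert run before match column i; last = trailing
        for ch in seq:
            if ch == ".":
                continue                 # drop existing insert gaps, re-padded below
            if ch.islower():
                inserts[-1] += ch
            else:                        # upper case residue or '-'
                matches.append(ch)
                inserts.append("")
        parsed.append((matches, inserts))

    lengths = {len(matches) for matches, _ in parsed}
    if len(lengths) != 1:
        raise ValueError("sequences differ in number of match columns")
    n = lengths.pop()

    # Widest insert run at each of the n + 1 insert positions
    widths = [max(len(ins[i]) for _, ins in parsed) for i in range(n + 1)]

    result = []
    for matches, ins in parsed:
        parts = []
        for i in range(n):
            parts.append(ins[i].ljust(widths[i], "."))
            parts.append(matches[i])
        parts.append(ins[n].ljust(widths[n], "."))
        result.append("".join(parts))
    return result


print(a3m_to_a2m(["ACD", "AklCD", "A-Dm"]))  # → ['A..CD.', 'AklCD.', 'A..-Dm']
```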
Format Validation
Validate A3M Files
- Valid character sets
- Consistent match state counts
- Proper consensus sequence
- Valid annotations
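The first two checks in the list are easy to approximate in a few lines. The sketch below is not check_a3m.py itself; it is a hypothetical validate_a3m function that takes (name, sequence) pairs for ordinary alignment rows and reports invalid characters and inconsistent match column counts.

```python
import string

# Letters plus the two gap characters used in A3M
ALLOWED = set(string.ascii_letters + "-.")


def validate_a3m(records):
    """Return a list of human-readable problems found in (name, sequence) pairs."""
    errors = []
    expected = None
    for name, seq in records:
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            errors.append(f"{name}: invalid characters {bad}")
        # Match columns hold either an upper case residue or a deletion
        n_match = sum(1 for ch in seq if ch.isupper() or ch == "-")
        if expected is None:
            expected = n_match           # the first sequence sets the reference
        elif n_match != expected:
            errors.append(f"{name}: {n_match} match columns, expected {expected}")
    return errors


# Hypothetical records: the last two are deliberately broken
records = [("query", "MKTAYL"), ("h1", "MKtaTAYL"), ("h2", "MKT"), ("h3", "MK*AYL")]
for problem in validate_a3m(records):
    print(problem)
```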
Common Format Errors
Format Best Practices
Tips for Working with HH-suite Formats
- Use A3M for alignments: It’s more compact than A2M and supported by all HH-suite tools
- Add secondary structure: Include >ss_pred and >ss_conf lines for better search performance
- Validate before processing: Run check_a3m.py on alignments before database building
- Use FFindex for databases: Essential for efficient storage and access of many profiles
- Compress large databases: Use CA3M format to reduce disk space by 60-80%
- Keep all database files: HH-suite databases need both .ffdata and .ffindex files
- Parse HHR programmatically: The HHR format is human-readable but can be parsed for automated workflows
Format Specifications Summary
| Format | Type | Used For | Tools |
|---|---|---|---|
| A3M | Alignment | Multiple sequence alignments | hhblits, hhmake |
| HHM | Profile | HMM profiles with emissions/transitions | hhsearch, hhalign |
| HHR | Results | Search results and alignments | hhsearch output |
| CA3M | Compressed | Compressed alignments | a3m_extract |
| FFindex | Database | Efficient multi-file storage | All database tools |
| FASTA | Sequence | Single sequences | Input format |
See Also
- A3M Tools - Working with A3M files
- Helper Scripts - Format conversion utilities
- Building Custom Databases - Database format requirements
- File Format Examples - Example files in the HH-suite repository