Overview
HH-suite provides several utilities for working with A3M (FASTA-like) multiple sequence alignments. These tools enable compression, extraction, filtering, and reduction of A3M files, which is essential for managing large alignment databases.
Available Tools
a3m_compress
Compress A3M alignments by storing sequences as references to a sequence database
a3m_extract
Extract compressed A3M alignments back to standard A3M format
a3m_reduce
Reduce redundancy in A3M alignments by filtering similar sequences
a3m_database_reduce
Reduce A3M databases in FFindex format
a3m_database_extract
Extract A3M alignments from FFindex databases
a3m_database_filter
Filter A3M databases based on various criteria
a3m_extract
Overview
Extracts compressed A3M alignments back to standard A3M format. The compressed format stores sequences as indices and block encodings that reference a sequence database.
Usage
Options
| Option | Description |
|---|---|
| -i <file> | Input compressed A3M file, or stdin |
| -o <file> | Output A3M file, or stdout |
| -d <prefix> | FFindex sequence database prefix (without .ffdata/.ffindex) |
| -q <prefix> | FFindex header database prefix |
| -h | Display help message |
Example
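A minimal Python sketch of the decoding step follows. It assumes a simplified representation in which each block is a (state, count) pair, with 'M' for match, 'I' for insertion and 'D' for deletion, and the start offset is 0-based. The name decode_blocks and this block layout are illustrative assumptions. The layout that a3m_extract actually reads is defined in the HH-suite sources.

```python
from __future__ import annotations


def decode_blocks(full_seq: str, start: int, blocks: list[tuple[str, int]]) -> str:
    """Rebuild an A3M row from a database sequence, a start offset and blocks.

    Hypothetical block states: 'M' = match (upper-case), 'I' = insertion
    (lower-case), 'D' = deletion ('-').
    """
    pieces = []
    pos = start  # current 0-based position in the full database sequence
    for state, count in blocks:
        if state == "D":
            pieces.append("-" * count)  # deletions consume no residues
            continue
        chunk = full_seq[pos:pos + count]
        if len(chunk) != count:
            raise ValueError("block runs past the end of the database sequence")
        if state == "M":
            pieces.append(chunk.upper())
        elif state == "I":
            pieces.append(chunk.lower())
        else:
            raise ValueError(f"unknown block state {state!r}")
        pos += count
    return "".join(pieces)


row = decode_blocks("MMACDEFGHKLL", 2, [("D", 2), ("M", 3), ("I", 2), ("M", 2), ("D", 1), ("M", 1)])
print(row)  # --ACDefGH-K
```

Matches and insertions consume residues from the database sequence, while deletions only emit gap characters. This is why a sequence index, a start offset and block counts are enough to restore the row.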
Compression Algorithm
How A3M Compression Works
The compression algorithm works by:
- Storing the consensus sequence in plain text
- Referencing sequences by index from a pre-built sequence database
- Encoding alignments as blocks of matches, insertions, and deletions:
  - Matches: upper-case letters aligned to the consensus
  - Insertions: lower-case letters (not in the consensus)
  - Deletions: gaps in the sequence relative to the consensus
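The Python sketch below illustrates this encoding and is a simplification under several assumptions. The sequence database is a plain dict. The ID is taken to be the first whitespace-delimited token of the header, though the rule in a3m_compress.cpp may differ. Rows contain only letters and '-'. Every run is stored as an explicit (state, count) pair rather than in the tool's own block encoding. All function names are hypothetical.

```python
from __future__ import annotations

from itertools import groupby


def sequence_id(header: str) -> str:
    """Hypothetical ID rule: first whitespace-delimited token after '>'."""
    tokens = header.lstrip(">").split()
    if not tokens:
        raise ValueError("empty header")
    return tokens[0]


def state(ch: str) -> str:
    """'M' for match (upper-case), 'I' for insertion (lower-case), 'D' for deletion ('-')."""
    if ch == "-":
        return "D"
    if ch.islower():
        return "I"
    if ch.isupper():
        return "M"
    raise ValueError(f"unexpected A3M character {ch!r}")


def find_start(row: str, full_seq: str) -> int:
    """0-based offset of the aligned residues inside the full database sequence."""
    residues = "".join(c for c in row if c != "-").upper()
    start = full_seq.upper().find(residues)
    if start < 0:
        raise ValueError("aligned residues not found in database sequence")
    return start


def encode_blocks(row: str) -> list[tuple[str, int]]:
    """Run-length encode the row as (state, count) blocks."""
    return [(s, sum(1 for _ in run)) for s, run in groupby(row, key=state)]


def compress_row(header: str, row: str, seq_db: dict[str, str]) -> tuple[str, int, list[tuple[str, int]]]:
    seq_id = sequence_id(header)
    full_seq = seq_db[seq_id]  # KeyError if the ID is missing from the database
    return seq_id, find_start(row, full_seq), encode_blocks(row)


print(compress_row(">seq1 example", "--ACDefGH-K", {"seq1": "MMACDEFGHKLL"}))
# ('seq1', 2, [('D', 2), ('M', 3), ('I', 2), ('M', 2), ('D', 1), ('M', 1)])
```

Only the ID, the start offset and the run lengths are kept. The residues themselves are recovered from the database at extraction time.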
Identify Sequence
Extract the sequence ID from the header and look it up in the sequence database (source: a3m_compress.cpp:356-382).
Find Start Position
Determine where the aligned sequence starts in the full sequence (source: a3m_compress.cpp:477-498).
Encode Blocks
Store the sequence as blocks of:
- Number of matches (upper-case residues)
- Number of insertions (lower-case residues) or deletions (gaps)
(source: a3m_compress.cpp:396-473)
A3M Format Validation
Source: scripts/a3m.py
The A3M format assigns specific meanings to characters.
Valid Characters
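A rough Python sketch of a character check follows. It assumes the common A3M conventions: upper-case letters for match states, lower-case letters for insertions, '-' for deletions, and '.' tolerated as a gap in insert columns. The alphabet accepted by scripts/a3m.py may be stricter, and summarize_row is a hypothetical name.

```python
from __future__ import annotations

import string

MATCH = set(string.ascii_uppercase)   # assumption: any upper-case letter is a match residue
INSERT = set(string.ascii_lowercase)  # assumption: any lower-case letter is an insertion
DELETION = "-"
IGNORED = {"."}                       # assumption: '.' gaps in insert columns are tolerated
ALLOWED = MATCH | INSERT | IGNORED | {DELETION}


def summarize_row(row: str) -> dict[str, int]:
    """Count match, insertion and deletion characters; raise on anything else."""
    bad = sorted({c for c in row if c not in ALLOWED})
    if bad:
        raise ValueError(f"invalid A3M characters: {''.join(bad)}")
    return {
        "match": sum(c in MATCH for c in row),
        "insertion": sum(c in INSERT for c in row),
        "deletion": row.count(DELETION),
    }


counts = summarize_row("AC-de.F")
print(counts)                                # {'match': 3, 'insertion': 2, 'deletion': 1}
print(counts["match"] + counts["deletion"])  # 4 match states
```

The number of match states of a row is the sum of its match and deletion characters. This count must be the same for every sequence in an alignment.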
Special Sequences
A3M files can include annotation sequences:
| Header | Description | Valid Characters |
|---|---|---|
| >ss_pred | Predicted secondary structure | E (extended), C (coil), H (helix) |
| >ss_conf | Secondary structure confidence | 0-9 (confidence levels) |
| >ss_dssp | DSSP secondary structure | CHBEGITS- |
| >*_consensus | Consensus sequence | Standard amino acids |
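The secondary-structure tracks can be checked against the character sets in the table above. The Python sketch below covers only the three ss_* tracks and leaves out the consensus row. It assumes the track name is the first token of the header. The function names are hypothetical.

```python
from __future__ import annotations

# Allowed characters per annotation track, mirroring the table above.
ANNOTATION_ALPHABETS = {
    "ss_pred": set("ECH"),
    "ss_conf": set("0123456789"),
    "ss_dssp": set("CHBEGITS-"),
}


def annotation_track(header: str) -> str | None:
    """Return the track name if the header starts an ss_* annotation, else None."""
    tokens = header.lstrip(">").split()
    if tokens and tokens[0] in ANNOTATION_ALPHABETS:
        return tokens[0]
    return None


def invalid_annotation_chars(header: str, seq: str) -> list[str]:
    """Characters in seq that the track does not allow (empty list means valid)."""
    track = annotation_track(header)
    if track is None:
        raise ValueError(f"not a secondary-structure annotation header: {header!r}")
    return sorted(set(seq) - ANNOTATION_ALPHABETS[track])


print(invalid_annotation_chars(">ss_pred", "CCHHHEEC"))  # []
print(invalid_annotation_chars(">ss_dssp", "CHB-X"))     # ['X']
```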
Working with FFindex Databases
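The a3m_database_* tools below operate on FFindex databases. Each database pairs a data file (.ffdata) with a tab-separated index (.ffindex) that lists every entry's name, byte offset and length. The Python sketch below looks up one entry. It assumes each entry in .ffdata ends with a NUL byte that is counted in the indexed length. It is not how the HH-suite tools are implemented, and the function names are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def parse_ffindex(index_text: str) -> dict[str, tuple[int, int]]:
    """Map entry name -> (offset, length) from 'name<TAB>offset<TAB>length' lines."""
    index = {}
    for line in index_text.splitlines():
        if not line.strip():
            continue
        name, offset, length = line.split("\t")
        index[name] = (int(offset), int(length))
    return index


def read_entry(data: bytes, index: dict[str, tuple[int, int]], name: str) -> str:
    offset, length = index[name]
    raw = data[offset:offset + length]
    return raw.rstrip(b"\0").decode("utf-8")  # assumption: length includes a trailing NUL


def load_entry(prefix: str, name: str) -> str:
    """Read one entry given a database prefix such as 'db_a3m' (files db_a3m.ffindex/.ffdata)."""
    index = parse_ffindex(Path(prefix + ".ffindex").read_text())
    data = Path(prefix + ".ffdata").read_bytes()
    return read_entry(data, index, name)


data = b">q\nACD\n\0>r\nEF\n\0"
index = parse_ffindex("q\t0\t8\nr\t8\t7\n")
print(read_entry(data, index, "r"))
```

For large databases, the index would normally be kept in memory and the data file memory-mapped rather than read whole.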
a3m_database_extract
Extract specific entries from an A3M FFindex database.
a3m_database_reduce
Reduce redundancy in an entire A3M database.
a3m_database_filter
Filter database entries by various criteria.
Python Utilities
Source: scripts/a3m.py
The a3m.py module provides Python utilities for A3M manipulation:
Key Methods
- check_and_add_sequence(): Validate and add a sequence to the container
- check_match_states(): Verify all sequences have the same number of match states
- split_a3m(): Extract a subsequence range from the alignment
- get_sub_sequence(): Get a specific region of a sequence
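The Python sketch below extracts a range of match columns in the spirit of split_a3m() and get_sub_sequence(). It does not reproduce their actual signatures or behavior. Here first and last are 0-based, inclusive match-column indices, and an insertion is kept only when it falls between two retained match columns. The names sub_row and sub_alignment are hypothetical.

```python
from __future__ import annotations


def sub_row(row: str, first: int, last: int) -> str:
    """Keep match columns first..last (0-based, inclusive) and the insertions between them."""
    if first < 0 or last < first:
        raise ValueError("need 0 <= first <= last")
    kept = []
    col = -1  # index of the most recent match column (upper-case residue or '-')
    for ch in row:
        if ch.isupper() or ch == "-":
            col += 1
            if first <= col <= last:
                kept.append(ch)
        elif ch.islower() and first <= col < last:
            kept.append(ch)  # insertion sitting between two retained match columns
        # everything else, including '.' and insertions outside the range, is dropped
    return "".join(kept)


def sub_alignment(rows: list[str], first: int, last: int) -> list[str]:
    return [sub_row(row, first, last) for row in rows]


print(sub_alignment(["AbC-dE", "A-CDeF"], 1, 2))  # ['C-', '-C']
```

Because every row has the same number of match states, every extracted row also has last - first + 1 match states.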
Performance Tips
Best Practices
- Compress alignments when storing large databases to save disk space
- Use FFindex format for databases with many alignments
- Validate A3M files with check_a3m.py before processing
- Keep the sequence and header databases when using the compressed format, since a3m_extract needs both (-d and -q) to restore the alignments
- Build separate header and sequence databases for optimal compression
Common Workflows
Create Compressed Database
Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| "More than one consensus sequence" | Multiple sequences ending in _consensus | Ensure only one consensus per A3M |
| "No protein sequences could be compressed" | No matching sequences in database | Check that sequence IDs match the database |
| "Sequence with zero match states" | Empty or invalid sequence | Validate A3M format |
| "Diverging number of match states" | Sequences have different numbers of match states | Check alignment integrity |
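Three of these errors can be caught with a small check before compression. The Python sketch below makes several assumptions. Entries are (header, sequence) pairs. The annotation tracks (ss_pred, ss_conf, ss_dssp) are skipped. Match states are counted as upper-case letters plus '-'. check_alignment is a hypothetical helper whose messages only mirror the table above.

```python
from __future__ import annotations

ANNOTATION_TRACKS = {"ss_pred", "ss_conf", "ss_dssp"}  # skipped: not amino-acid rows


def entry_name(header: str) -> str:
    tokens = header.lstrip(">").split()
    return tokens[0] if tokens else ""


def match_states(seq: str) -> int:
    """Match states = upper-case residues plus '-' deletions."""
    return sum(1 for c in seq if c.isupper() or c == "-")


def check_alignment(entries: list[tuple[str, str]]) -> int:
    """Return the shared match-state count, or raise ValueError like the errors above."""
    names = [entry_name(header) for header, _ in entries]
    if sum(name.endswith("_consensus") for name in names) > 1:
        raise ValueError("More than one consensus sequence")
    expected = None
    for name, (_, seq) in zip(names, entries):
        if name in ANNOTATION_TRACKS:
            continue
        n = match_states(seq)
        if n == 0:
            raise ValueError(f"Sequence with zero match states: {name}")
        if expected is None:
            expected = n
        elif n != expected:
            raise ValueError(f"Diverging number of match states: {name} has {n}, expected {expected}")
    return expected if expected is not None else 0


print(check_alignment([(">query", "AC-D"), (">hit1 desc", "aAcC-D")]))  # 4
```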
Related Tools
- reformat.pl - Convert between alignment formats
- FFindex Tools - Manage FFindex databases
- File Formats - A3M format specification