Overview
HH-suite includes numerous helper scripts written in Perl and Python for common bioinformatics tasks. These scripts handle format conversion, database preparation, secondary structure prediction, and workflow automation.
Format Conversion
reformat.pl
Convert multiple sequence alignments between different formats.
Supported Formats
Input formats:
- fas - Aligned FASTA (lower/upper case equivalent, '.' and '-' equivalent)
- a2m - Aligned FASTA (inserts: lower case, matches: upper case, deletes: '-', gaps: '.')
- a3m - Like A2M, but gaps aligned to inserts may be omitted
- sto - Stockholm format (HMMER output)
- psi - PSI-BLAST format
- clu - Clustal format

Output formats:
- fas - Aligned FASTA (all gaps as '-')
- a2m - A2M format
- a3m - A3M format (gaps aligned to inserts omitted)
- sto - Stockholm format (sequences in one block)
- psi - PSI-BLAST format
- clu - Clustal format
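To show how the A3M and aligned-FASTA conventions relate, here is a minimal Python sketch (not reformat.pl's own algorithm) that expands A3M into aligned FASTA. Because A3M may omit gaps aligned to inserts, each row is split into match symbols (upper case or '-') and the lower case insert preceding each match column; every insert region is then padded with '-' to the longest insert at that position. It assumes a well-formed A3M without >ss_ or >sa_ records and leaves insert residues in lower case, which aligned FASTA treats as equivalent. The function names are hypothetical.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; headers exclude the leading '>'."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_a3m(seq: str) -> tuple[list[str], list[str]]:
    """Split an A3M row into match symbols and the insert preceding each one.

    The insert list has one more entry than the match list: the last entry
    holds any insert residues after the final match column.
    """
    matches: list[str] = []
    inserts: list[str] = []
    current: list[str] = []
    for ch in seq:
        if ch.isupper() or ch == "-":   # match state or deletion
            inserts.append("".join(current))
            matches.append(ch)
            current = []
        elif ch.islower():              # insert-state residue
            current.append(ch)
        # '.' (a gap aligned to an insert) is dropped; padding is rebuilt below
    inserts.append("".join(current))
    return matches, inserts


def a3m_to_fas(text: str) -> list[tuple[str, str]]:
    """Expand A3M into aligned FASTA rows of equal length (all gaps as '-')."""
    records = parse_fasta(text)
    rows = [split_a3m(seq) for _, seq in records]
    n_match = len(rows[0][0])
    if any(len(matches) != n_match for matches, _ in rows):
        raise ValueError("sequences differ in their number of match states")
    # Width of the widest insert before each match column (and after the last).
    widths = [max(len(inserts[i]) for _, inserts in rows) for i in range(n_match + 1)]
    expanded = []
    for (header, _), (matches, inserts) in zip(records, rows):
        parts = []
        for i in range(n_match + 1):
            parts.append(inserts[i].ljust(widths[i], "-"))  # pad insert region
            if i < n_match:
                parts.append(matches[i])
        expanded.append((header, "".join(parts)))
    return expanded
```

For example, the A3M rows `AC-D`, `AkkC-D` and `ACg-D` expand to `A--C--D`, `AkkC--D` and `A--Cg-D`.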
Usage
If no format is specified, the file extension is used to determine the format (.aln is treated as clu).
Options
| Option | Description |
|---|---|
| -v | Verbose mode |
| -num | Add a number prefix to sequence names: 1:name, 2:name, etc. |
| -noss | Remove secondary structure sequences (beginning with >ss_) |
| -sa | Keep solvent accessibility sequences (beginning with >sa_) |
| -M first | Make all columns with a residue in the first sequence match columns |
| -M <int> | Make columns with fewer than <int>% gaps match columns (for a2m/a3m output) |
| -r | Remove all lower case residues (insert states) after -M has been processed |
| -r <int> | Remove lower case columns with more than <int>% gaps |
| -g '' | Suppress all gaps |
| -g '-' | Write all gaps as '-' |
| -uc | Write all residues in upper case (applied after all other options) |
| -lc | Write all residues in lower case (applied after all other options) |
| -l <int> | Number of residues per line (default: 100) |
| -d <int> | Maximum number of characters in the name line (default: 1000) |
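The effect of -M <int> together with the A2M case and gap conventions can be illustrated with a small Python sketch. It is not reformat.pl's implementation: it assumes the input is aligned FASTA with equal-length rows, treats both '-' and '.' as gaps, and applies the strict "fewer than X% gaps" test from the table. Residues in match columns become upper case and their gaps '-'; residues in insert columns become lower case and their gaps '.'. The function names are hypothetical.

```python
def assign_match_columns(seqs: list[str], max_gap_pct: float) -> list[bool]:
    """Flag a column as a match column if fewer than max_gap_pct percent
    of the sequences have a gap ('-' or '.') in it."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    n = len(seqs)
    flags = []
    for column in zip(*seqs):
        gaps = sum(ch in "-." for ch in column)
        flags.append(100.0 * gaps / n < max_gap_pct)
    return flags


def to_a2m(seqs: list[str], max_gap_pct: float) -> list[str]:
    """Rewrite aligned sequences using A2M case and gap conventions."""
    flags = assign_match_columns(seqs, max_gap_pct)
    out = []
    for s in seqs:
        chars = []
        for ch, is_match in zip(s, flags):
            if ch in "-.":
                chars.append("-" if is_match else ".")  # deletion vs. insert gap
            else:
                chars.append(ch.upper() if is_match else ch.lower())
        out.append("".join(chars))
    return out
```

With the rows `AC-D`, `A-GD` and `ACGD`, a 30% threshold turns the two columns that contain one gap each (33% gaps) into insert columns, giving `Ac.D`, `A.gD` and `AcgD`; a 50% threshold keeps every column as a match column.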
Examples
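For scripted use, reformat.pl can be driven from Python. This sketch assumes reformat.pl is on PATH and takes its arguments in the order informat, outformat, infile, outfile, options, with both formats omitted when they should be inferred from the file extensions; run reformat.pl without arguments to confirm the usage on your installation. The helper names build_reformat_cmd and run_reformat are hypothetical.

```python
import shutil
import subprocess
from typing import Optional, Sequence


def build_reformat_cmd(infile: str, outfile: str,
                       informat: Optional[str] = None,
                       outformat: Optional[str] = None,
                       options: Sequence[str] = ()) -> list[str]:
    """Assemble a reformat.pl argument list (assumed positional order)."""
    if (informat is None) != (outformat is None):
        raise ValueError("give both formats or neither")
    cmd = ["reformat.pl"]
    if informat is not None:
        cmd += [informat, outformat]
    cmd += [infile, outfile, *options]
    return cmd


def run_reformat(*args, **kwargs) -> None:
    """Run reformat.pl, raising CalledProcessError on a non-zero exit."""
    cmd = build_reformat_cmd(*args, **kwargs)
    exe = shutil.which(cmd[0])
    if exe is None:
        raise FileNotFoundError("reformat.pl not found on PATH")
    subprocess.run([exe, *cmd[1:]], check=True)
```

For instance, `run_reformat("query.a3m", "query.a2m", "a3m", "a2m", ["-M", "first", "-r"])` would keep only the columns in which the first sequence has a residue, since -r removes the insert states left after -M first.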
Database Building
hhsuitedb.py
Create HH-suite database files from A3M and HHM files. Source: scripts/hhsuitedb.py
Usage
Options
| Option | Description |
|---|---|
| -o <name> | Output database base name |
| -ia3m <dir> | Directory with A3M alignment files |
| -ihhm <dir> | Directory with HHM profile files |
| -ics <dir> | Directory with CS219 files |
| --threads <n> | Number of threads for HMM calculation |
What It Does
createdb.sh
Simplified wrapper for database creation.
Sequence Processing
splitfasta.pl
Split a FASTA file into individual sequence files.
pdbfilter.pl / pdbfilter.py
Filter protein sequences by length, composition, or other criteria.
PDB Processing
pdb2fasta.pl
Convert PDB structure files to FASTA sequences. It handles:
- Multiple chains
- Modified residues
- Missing residues
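The coordinate-based part of this conversion can be sketched in Python. This is not pdb2fasta.pl's logic: it assumes fixed-column PDB format, takes one residue per CA atom from the first model only, maps selenomethionine (MSE) to M and any other unrecognized residue in an ATOM record to X, and skips HETATM groups it does not recognize (such as calcium ions, whose atom is also named CA). Residues missing from the coordinates would require the SEQRES records, which this sketch ignores. The function name pdb_to_fasta is hypothetical.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "MSE": "M",  # selenomethionine, a common modified residue
}


def pdb_to_fasta(pdb_text: str, name: str) -> str:
    """Return one FASTA record per chain, built from CA atoms of model 1."""
    chains: dict[str, list[str]] = {}
    seen: set[tuple[str, str]] = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                                  # keep the first model only
        if not line.startswith(("ATOM  ", "HETATM")):
            continue
        if line[12:16].strip() != "CA":
            continue
        resname = line[17:20].strip()
        chain = line[21:22]
        resid = line[22:27]                        # residue number + insertion code
        if (chain, resid) in seen:
            continue                               # alternate location of the same CA
        seen.add((chain, resid))
        if line.startswith("HETATM") and resname not in THREE_TO_ONE:
            continue                               # ligands, ions (e.g. calcium "CA")
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(resname, "X"))
    return "".join(f">{name}_{chain}\n{''.join(seq)}\n"
                   for chain, seq in chains.items())
```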
cif2fasta.py
Convert mmCIF format structure files to FASTA.
renumberpdb.pl
Renumber residues in PDB files.
Secondary Structure
addss.pl
Add secondary structure prediction to alignments. The script writes the predicted states and their confidence values as >ss_pred and >ss_conf lines into A3M alignments.
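Where these lines end up can be illustrated with a Python sketch that takes an already computed prediction (in practice addss.pl obtains it from PSIPRED) and inserts it into an A3M text. It assumes the >ss_pred and >ss_conf records go immediately before the query (the first record that is not an >ss_ or >sa_ annotation) and must contain one character per query match column; any existing >ss_pred or >ss_conf records are replaced. add_ss_lines is a hypothetical name.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; headers exclude the leading '>'."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def add_ss_lines(a3m_text: str, ss_pred: str, ss_conf: str) -> str:
    """Insert >ss_pred/>ss_conf records just before the query sequence."""
    records = [(h, s) for h, s in parse_fasta(a3m_text)
               if (h.split() or [""])[0] not in ("ss_pred", "ss_conf")]
    # The query is the first record that is not an ss_/sa_ annotation.
    qi = next((i for i, (h, _) in enumerate(records)
               if not h.startswith(("ss_", "sa_"))), None)
    if qi is None:
        raise ValueError("no query sequence found")
    n_match = sum(ch.isupper() or ch == "-" for ch in records[qi][1])
    if not len(ss_pred) == len(ss_conf) == n_match:
        raise ValueError(f"expected {n_match} states, got "
                         f"{len(ss_pred)} and {len(ss_conf)}")
    records[qi:qi] = [("ss_pred", ss_pred), ("ss_conf", ss_conf)]
    return "".join(f">{h}\n{s}\n" for h, s in records)
```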
HMM Profile Tools
create_profile_from_hhm.pl
Convert HHM profiles to other formats.
create_profile_from_hmmer.pl
Convert HMMER profiles to HH-suite format.
Model Building
hhmakemodel.pl / hhmakemodel.py
Build 3D models from HH-suite alignments.
Model Building Workflow
- Takes HHsearch/HHblits results (HHR file)
- Reads template structure
- Maps query sequence to template using alignment
- Builds model by copying coordinates
- Handles insertions and deletions
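The mapping step can be sketched as a walk over the two aligned rows of one HHR alignment. This Python sketch assumes the query and template rows have already been extracted along with their start positions; it returns sequence positions (not PDB residue numbers, which may differ) for the columns where both rows carry a residue. Query residues absent from the mapping are insertions relative to the template and would need separate loop modeling. The function name is hypothetical.

```python
def map_query_to_template(q_aln: str, t_aln: str,
                          q_start: int = 1, t_start: int = 1) -> dict[int, int]:
    """Map query positions to template positions over aligned columns.

    Positions count residues, starting at the alignment start given for
    each row; gap symbols ('-' or '.') do not advance the counter.
    """
    if len(q_aln) != len(t_aln):
        raise ValueError("aligned rows must have the same length")
    mapping: dict[int, int] = {}
    qi, ti = q_start, t_start
    for qc, tc in zip(q_aln, t_aln):
        q_has = qc not in "-."
        t_has = tc not in "-."
        if q_has and t_has:
            mapping[qi] = ti        # coordinates for qi would be copied from ti
        if q_has:
            qi += 1
        if t_has:
            ti += 1
    return mapping
```

For example, `map_query_to_template("AC-DE", "A-GDE")` returns `{1: 1, 3: 3, 4: 4}`: query residue 2 (C) is an insertion and template residue 2 (G) is a deletion.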
Alignment Processing
mergeali.pl
Merge multiple alignments into a single alignment.
multithread.pl
Run HH-suite searches in parallel. Parameters:
- Query file
- Database name
- Number of threads
- Search program (hhblits/hhsearch)
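A parallel search runner along these lines can be sketched in Python with a thread pool that launches one subprocess per query. This is not multithread.pl's interface; it assumes hhblits or hhsearch is on PATH and uses their standard -i, -d, -o and -cpu options (plus -oa3m for hhblits). The helper names search_cmd and run_parallel are hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Sequence


def search_cmd(query: str, database: str, program: str = "hhblits",
               cpu: int = 1) -> list[str]:
    """Build one search command; outputs are named after the query file."""
    stem = str(Path(query).with_suffix(""))
    cmd = [program, "-i", query, "-d", database,
           "-o", stem + ".hhr", "-cpu", str(cpu)]
    if program == "hhblits":
        cmd += ["-oa3m", stem + ".a3m"]   # also write the output MSA
    return cmd


def run_parallel(commands: Sequence[list[str]], workers: int) -> list[int]:
    """Run commands as subprocesses, at most `workers` at once.

    Threads are enough here because the real work happens in the child
    processes; exit codes are returned in the same order as `commands`.
    """
    def run(cmd: list[str]) -> int:
        return subprocess.run(cmd, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))
```

A typical call would be `run_parallel([search_cmd(q, db_prefix) for q in query_files], workers=4)`; the total CPU load is roughly workers times the -cpu value of each search.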
Python Modules
a3m.py
Python module for A3M format handling (see A3M Tools for details).
hh_reader.py
Python module for reading HH-suite result files.
ffindex.py
Python interface to FFindex databases.
FFindex Utilities
While not included in the scripts directory, HH-suite uses the FFindex library for database management:
- ffindex_build - Create an FFindex database from files
- ffindex_get - Extract an entry from a database
- ffindex_modify - Modify database entries
- ffindex_unpack - Extract all entries to files
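The lookup that ffindex_get performs can be sketched in a few lines of Python. The sketch assumes the usual FFindex layout: a text index (.ffindex) with one tab-separated name, byte offset, and length per line, and a data file (.ffdata) in which each entry starts at that offset and ends with a NUL byte that is counted in the length. It is not the API of HH-suite's ffindex.py, and the function names are hypothetical.

```python
from pathlib import Path


def read_ffindex(index_path: str) -> dict[str, tuple[int, int]]:
    """Map entry name -> (byte offset, length) from a .ffindex file."""
    entries: dict[str, tuple[int, int]] = {}
    for line in Path(index_path).read_text().splitlines():
        if not line.strip():
            continue
        name, offset, length = line.split("\t")
        entries[name] = (int(offset), int(length))
    return entries


def ffindex_get(data_path: str, index_path: str, name: str) -> bytes:
    """Return one entry's bytes without its trailing NUL terminator."""
    offset, length = read_ffindex(index_path)[name]
    with open(data_path, "rb") as fh:
        fh.seek(offset)
        blob = fh.read(length)
    return blob.rstrip(b"\0")
```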
HHPaths.pm
Perl module providing path configuration for HH-suite.
Align.pm
Perl module for alignment manipulation.
Validation Scripts
check_a3m.py
Validate A3M format files. Checks include:
- Valid amino acid characters
- Consistent match state counts
- Proper consensus sequence
- Valid secondary structure annotations
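A reduced version of these checks can be sketched in Python. It covers character validity, match-state counts, and secondary structure alphabets, but not the consensus check. The allowed alphabets (the 20 standard residues plus X, B, Z, U and O; a DSSP-style state alphabet; digits for >ss_conf) are assumptions rather than check_a3m.py's exact rules, and the function name check_a3m_text is hypothetical.

```python
MATCH = set("ACDEFGHIKLMNPQRSTVWYXBZUO-")            # assumed match-state alphabet
INSERT = {c.lower() for c in MATCH if c != "-"} | {"."}
SS_STATES = set("HGIEBTSC-")                          # assumed DSSP-style states
CONF_DIGITS = set("0123456789")


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; headers exclude the leading '>'."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_a3m_text(text: str) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: list[str] = []
    expected = None
    for header, seq in parse_fasta(text):
        name = (header.split() or [""])[0]
        if name.startswith("sa_"):
            continue                                  # not checked in this sketch
        if name == "ss_conf":
            alphabet, n = CONF_DIGITS, len(seq)
        elif name.startswith("ss_"):
            alphabet, n = SS_STATES, len(seq)
        else:
            alphabet = MATCH | INSERT
            n = sum(ch.isupper() or ch == "-" for ch in seq)  # match columns
        bad = sorted(set(seq) - alphabet)
        if bad:
            problems.append(f"{name}: invalid characters {''.join(bad)}")
        if expected is None:
            expected = n                              # first record sets the count
        elif n != expected:
            problems.append(f"{name}: {n} match states, expected {expected}")
    return problems
```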
get_a3m_size.py
Report statistics about A3M files:
- Number of sequences
- Number of match states
- Average sequence length
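Comparable statistics can be computed with a short Python sketch. It assumes that >ss_ and >sa_ records are excluded from the sequence count, that the first remaining record is the query defining the match states, and that sequence length means the number of residues (upper and lower case) excluding gaps. get_a3m_size.py may define these differently, and a3m_stats is a hypothetical name.

```python
from statistics import mean


def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Return (header, sequence) pairs; headers exclude the leading '>'."""
    records: list[tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def a3m_stats(text: str) -> dict[str, float]:
    """Count sequences, query match states, and mean ungapped length."""
    seqs = [seq for header, seq in parse_fasta(text)
            if not header.startswith(("ss_", "sa_"))]
    if not seqs:
        raise ValueError("no sequences found")
    return {
        "sequences": len(seqs),
        "match_states": sum(ch.isupper() or ch == "-" for ch in seqs[0]),
        "mean_length": mean(sum(ch not in "-." for ch in s) for s in seqs),
    }
```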
Best Practices
Common Workflows
- Format Conversion Pipeline
- Database Building Pipeline
- Batch Structure Conversion
See Also
- A3M Tools - Specialized A3M utilities
- cstranslate - Abstract state translation
- File Formats - Format specifications
- Building Custom Databases