Helper Scripts - HH-suite

Overview

HH-suite includes numerous helper scripts written in Perl and Python for common bioinformatics tasks. These scripts handle format conversion, database preparation, secondary structure prediction, and workflow automation.

Format Conversion

reformat.pl

Convert multiple sequence alignments between different formats.

Supported Formats

Input formats:

fas - Aligned FASTA (lower/upper case equivalent, '.' and '-' equivalent)
a2m - Aligned FASTA (inserts: lower case, matches: upper case, deletes: '-', gaps: '.')
a3m - Like A2M, but gaps aligned to inserts may be omitted
sto - Stockholm format (HMMER output)
psi - PSI-BLAST format
clu - Clustal format

Output formats:

fas - Aligned FASTA (all gaps as '-')
a2m - A2M format
a3m - A3M format (gaps to inserts omitted)
sto - Stockholm format (sequences in one block)
psi - PSI-BLAST format
clu - Clustal format
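
The relationship between A3M and A2M can be made concrete with a small sketch. It assumes the A3M sequences contain only upper case match residues, lower case inserts, and '-' deletes; padding every insert region with '.' to the longest insert at that position then yields A2M. The function names are illustrative and are not taken from reformat.pl.

```python
def split_columns(seq):
    """Split one A3M sequence into match characters and insert strings.

    inserts[i] holds the lower case residues that precede match column i;
    the last element holds inserts after the final match column.
    """
    matches = []
    inserts = [""]
    for ch in seq:
        if ch.islower():          # lower case = insert state
            inserts[-1] += ch
        else:                     # upper case residue or '-' = match column
            matches.append(ch)
            inserts.append("")
    return matches, inserts


def a3m_to_a2m(seqs):
    """Pad insert regions with '.' so that all sequences have equal length."""
    parsed = [split_columns(s) for s in seqs]
    n_match = len(parsed[0][0])
    if any(len(m) != n_match for m, _ in parsed):
        raise ValueError("sequences differ in number of match columns")
    # Widest insert observed at each of the n_match + 1 insert positions.
    widths = [max(len(ins[i]) for _, ins in parsed) for i in range(n_match + 1)]
    result = []
    for matches, inserts in parsed:
        parts = []
        for i in range(n_match + 1):
            parts.append(inserts[i].ljust(widths[i], "."))
            if i < n_match:
                parts.append(matches[i])
        result.append("".join(parts))
    return result
```

Going the other way (A2M to A3M) under the same assumptions only requires deleting every '.' character.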

Usage

reformat.pl [informat] [outformat] <infile> <outfile> [options]

If no format is specified, the file extension is used to determine the format (.aln is treated as clu).

Options

Option	Description
`-v`	Verbose mode
`-num`	Add number prefix to sequence names: `1:name`, `2:name`, etc.
`-noss`	Remove secondary structure sequences (beginning with `>ss_`)
`-sa`	Keep solvent accessibility sequences (beginning with `>sa_`)
`-M first`	Make all columns with residue in first sequence match columns
`-M <int>`	Make all columns with fewer than <int>% gaps match columns (for a2m/a3m output)
`-r`	Remove all lower case residues (insert states) after `-M` is processed
`-r <int>`	Remove all lower case columns with more than <int>% gaps
`-g ''`	Suppress all gaps
`-g '-'`	Write all gaps as '-'
`-uc`	Write all residues in upper case (after all other options)
`-lc`	Write all residues in lower case (after all other options)
`-l <int>`	Number of residues per line (default: 100)
`-d <int>`	Maximum characters in nameline (default: 1000)
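
To illustrate what a gap-percentage rule such as `-M <int>` means, here is a minimal sketch of the idea; it is not reformat.pl's implementation. It assumes an aligned FASTA input in which all sequences have equal length, and the helper names are hypothetical.

```python
GAP_CHARS = "-."


def match_column_mask(seqs, gap_percent):
    """Return one boolean per column: True if the column becomes a match column.

    A column is a match column when strictly fewer than gap_percent percent
    of the sequences have a gap in it.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    n = len(seqs)
    mask = []
    for col in range(len(seqs[0])):
        gaps = sum(1 for s in seqs if s[col] in GAP_CHARS)
        mask.append(100.0 * gaps / n < gap_percent)
    return mask


def to_a2m(seqs, mask):
    """Write match columns in upper case / '-' and insert columns in lower case / '.'."""
    out = []
    for s in seqs:
        chars = []
        for ch, is_match in zip(s, mask):
            if is_match:
                chars.append("-" if ch in GAP_CHARS else ch.upper())
            else:
                chars.append("." if ch in GAP_CHARS else ch.lower())
        out.append("".join(chars))
    return out
```

With a threshold of 50, a column in which exactly half of the sequences have a gap is treated as an insert column, because the comparison is strict.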

Examples

reformat.pl a3m a2m input.a3m output.a2m

Database Building

hhsuitedb.py

Create HH-suite database files from A3M and HHM files. Source: scripts/hhsuitedb.py

Usage

hhsuitedb.py -o <db_name> \
  [-ia3m <a3m_dir>] \
  [-ihhm <hhm_dir>] \
  [-ics <cs_dir>] \
  [options]

Options

Option	Description
`-o <name>`	Output database base name
`-ia3m <dir>`	Directory with A3M alignment files
`-ihhm <dir>`	Directory with HHM profile files
`-ics <dir>`	Directory with CS219 files
`--threads <n>`	Number of threads for HMM calculation

What It Does

Collect Input Files

Scans specified directories for A3M, HHM, and CS219 files

Build FFindex

Creates FFindex databases for efficient random access

Generate Missing Profiles

Calls hhmake to generate HMM profiles from A3M files if not provided

Optimize Databases

Reorganizes FFindex databases for optimal sequential access
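
The FFindex layout behind these steps is simple: a data file in which entries are concatenated, each terminated by a null byte, plus a tab-separated index listing name, offset, and length (including the terminator) for each entry. The sketch below builds both in memory and reads one entry back. It only illustrates the layout; the helper names are hypothetical and are not the hhsuitedb.py or ffindex.py API.

```python
def build_ffindex(entries):
    """Build (ffdata_bytes, ffindex_text) from a dict of name -> bytes."""
    data = bytearray()
    index_lines = []
    for name in sorted(entries):              # index is kept sorted by name
        payload = entries[name] + b"\0"       # each entry ends with a null byte
        index_lines.append(f"{name}\t{len(data)}\t{len(payload)}")
        data += payload
    return bytes(data), "\n".join(index_lines) + "\n"


def read_entry(ffdata, ffindex_text, name):
    """Return the bytes of one entry, without its null terminator."""
    for line in ffindex_text.splitlines():
        entry_name, offset, length = line.split("\t")
        if entry_name == name:
            start = int(offset)
            return ffdata[start:start + int(length) - 1]
    raise KeyError(name)
```

Because every entry is addressed by offset and length, a single entry can be fetched without scanning the whole data file, which is what makes random access efficient.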

createdb.sh

Simplified wrapper for database creation.

createdb.sh -i input_dir/ -o database_name

Sequence Processing

splitfasta.pl

Split FASTA file into individual sequence files.

splitfasta.pl input.fasta output_dir/
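
For readers who want to see the splitting logic itself, here is a minimal sketch. It assumes that each record is named after the first word of its header line and written to `<name>.fasta`; the naming scheme used by splitfasta.pl may differ, and both functions are hypothetical.

```python
import re
from pathlib import Path


def split_fasta(text):
    """Return a dict mapping record name -> FASTA text of that single record."""
    records = {}
    name = None
    for line in text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a running number if the header is empty.
            name = fields[0] if fields else f"seq{len(records) + 1}"
            records[name] = [line]
        elif name is not None and line.strip():
            records[name].append(line.strip())
    return {n: "\n".join(lines) + "\n" for n, lines in records.items()}


def write_records(records, out_dir):
    """Write each record to out_dir/<name>.fasta, replacing unsafe characters."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in records.items():
        safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
        (out / f"{safe}.fasta").write_text(content)
```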

pdbfilter.pl / pdbfilter.py

Filter protein sequences by length, composition, or other criteria.

pdbfilter.pl input.fasta output.fasta --min-length 50 --max-length 500

pdbfilter.py input.fasta output.fasta --min-length 50

PDB Processing

pdb2fasta.pl

Convert PDB structure files to FASTA sequences.

pdb2fasta.pl input.pdb > output.fasta

Extracts sequence information from PDB coordinate files, handling:

Multiple chains
Modified residues
Missing residues
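
The general approach can be sketched in a few lines: read one residue per C-alpha atom, group residues by chain, and translate three-letter codes to one-letter codes. This is an illustration under simplifying assumptions, not the logic of pdb2fasta.pl: only the first model is read, selenomethionine (MSE) is the only modified residue mapped, and residues without coordinates are simply absent from the output.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
    "MSE": "M",  # selenomethionine, a common modified residue
}


def pdb_to_sequences(pdb_text):
    """Return a dict mapping chain ID -> one-letter sequence."""
    chains = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                                  # first model only
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":            # one C-alpha per residue
            continue
        resname = line[17:20]
        chain = line[21]
        key = (chain, line[22:27])                 # residue number + insertion code
        if key in seen:                            # skip alternate locations
            continue
        seen.add(key)
        if line.startswith("HETATM") and resname not in THREE_TO_ONE:
            continue                               # e.g. calcium ions named CA
        chains.setdefault(chain, []).append(THREE_TO_ONE.get(resname, "X"))
    return {chain: "".join(residues) for chain, residues in chains.items()}
```

Unknown residues in ATOM records are written as X so that the sequence length still matches the number of residues with coordinates.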

cif2fasta.py

Convert mmCIF format structure files to FASTA.

cif2fasta.py input.cif > output.fasta

Supports the modern mmCIF format used by the PDB.

renumberpdb.pl

Renumber residues in PDB files.

renumberpdb.pl input.pdb output.pdb

Secondary Structure

addss.pl

Add secondary structure prediction to alignments.

addss.pl input.a3m output.a3m

Integrates with external secondary structure tools (PSIPRED for prediction, DSSP for assignment from a known structure) and adds >ss_pred and >ss_conf lines to A3M alignments.

Requires PSIPRED or DSSP to be installed and in your PATH.
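
The effect on the alignment file can be sketched as follows. The example assumes that the prediction and confidence strings are already available (for instance from a PSIPRED run) and that they must contain one character per match state of the first sequence. The function is hypothetical and only shows where the >ss_pred and >ss_conf lines end up.

```python
def count_match_states(seq):
    """Match states are upper case residues and '-'; lower case letters are inserts."""
    return sum(1 for ch in seq if not ch.islower())


def add_ss_lines(a3m_text, ss_pred, ss_conf):
    """Prepend >ss_pred and >ss_conf records to an A3M alignment."""
    lines = a3m_text.splitlines()
    if not lines or not lines[0].startswith(">"):
        raise ValueError("input does not start with a FASTA header")
    # Collect the first sequence, which may span several lines.
    query = []
    for line in lines[1:]:
        if line.startswith(">"):
            break
        query.append(line.strip())
    n_match = count_match_states("".join(query))
    if len(ss_pred) != n_match or len(ss_conf) != n_match:
        raise ValueError("annotation length must equal the number of match states")
    return "\n".join([">ss_pred", ss_pred, ">ss_conf", ss_conf] + lines) + "\n"
```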

HMM Profile Tools

create_profile_from_hhm.pl

Convert HHM profiles to other formats.

create_profile_from_hhm.pl input.hhm output.prf

create_profile_from_hmmer.pl

Convert HMMER profiles to HH-suite format.

create_profile_from_hmmer.pl input.hmm output.hhm

Model Building

hhmakemodel.pl / hhmakemodel.py

Build 3D models from HH-suite alignments.

hhmakemodel.py -i alignment.hhr -ts template.pdb -o model.pdb

Model Building Workflow

Takes HHsearch/HHblits results (HHR file)
Reads template structure
Maps query sequence to template using alignment
Builds model by copying coordinates
Handles insertions and deletions
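
The mapping step of this workflow can be sketched as a walk over the two aligned strings. The sketch assumes that gaps are written as '-' and that residue numbering starts at the given start positions; the function name and signature are illustrative and are not the hhmakemodel.py API.

```python
def map_query_to_template(query_ali, template_ali, query_start=1, template_start=1):
    """Return a dict mapping query residue number -> template residue number.

    Only aligned pairs are included. Query residues opposite a gap are
    insertions with no template coordinates to copy; template residues
    opposite a gap are deletions and are skipped.
    """
    if len(query_ali) != len(template_ali):
        raise ValueError("aligned strings must have equal length")
    mapping = {}
    q, t = query_start, template_start
    for q_char, t_char in zip(query_ali, template_ali):
        q_is_residue = q_char != "-"
        t_is_residue = t_char != "-"
        if q_is_residue and t_is_residue:
            mapping[q] = t
        if q_is_residue:
            q += 1
        if t_is_residue:
            t += 1
    return mapping
```

A model builder would then copy the coordinates of template residue `mapping[i]` for every query residue `i` present in the mapping.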

Alignment Processing

mergeali.pl

Merge multiple alignments into a single alignment.

mergeali.pl alignment1.a3m alignment2.a3m > merged.a3m

multithread.pl

Run HH-suite searches in parallel.

multithread.pl queries.fasta database 8 hhblits

Parameters:

Query file
Database name
Number of threads
Search program (hhblits/hhsearch)
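
The same idea, running several searches concurrently with a fixed number of workers, can be sketched in Python. This is not how multithread.pl is implemented. The sketch assumes that the queries have already been split into one file each and that results are written next to each query with an `.hhr` suffix; `-i`, `-d`, and `-o` are the usual hhblits/hhsearch options for input, database, and output.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(program, query, database):
    """Build the argument list for one search, e.g. with hhblits or hhsearch."""
    output = str(Path(query).with_suffix(".hhr"))
    return [program, "-i", query, "-d", database, "-o", output]


def run_parallel(commands, workers, runner=None):
    """Run all commands with at most `workers` running at once.

    Returns one result per command, in input order. By default each result
    is the exit code of the external process.
    """
    if runner is None:
        def runner(cmd):
            return subprocess.run(cmd, check=False).returncode
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))
```

Threads are sufficient here because the work happens in the external processes; the Python side only waits for them to finish.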

Python Modules

a3m.py

Python module for A3M format handling (see A3M Tools for details).

import a3m

container = a3m.A3M_Container()
with open('input.a3m') as f:
    container.read_a3m(f)

hh_reader.py

Python module for reading HH-suite result files.

import hh_reader

results = hh_reader.read_result('output.hhr')
for hit in results:
    # Each result is a named tuple; the matched template is in template_id
    print(f"{hit.template_id}: {hit.probability}")

ffindex.py

Python interface to FFindex databases.

import ffindex

entries = ffindex.read_index('database.ffindex')
data = ffindex.read_data('database.ffdata')

for entry in entries:
    # read_entry_data returns the raw bytes of one entry
    content = ffindex.read_entry_data(entry, data)

FFindex Utilities

The FFindex command-line tools are not part of the scripts directory, but HH-suite relies on the FFindex library for database management:

ffindex_build - Create FFindex database from files
ffindex_get - Extract entry from database
ffindex_modify - Modify database entries
ffindex_unpack - Extract all entries to files

# Build database
ffindex_build -s database.ffdata database.ffindex input_dir/

# Extract entry
ffindex_get database.ffdata database.ffindex query_name

HHPaths.pm

Perl module providing path configuration for HH-suite.

use HHPaths;

my $hhlib = $HHPaths::hhlib;    # HH-suite library path
my $hhbin = $HHPaths::hhbin;    # HH-suite binary path
my $hhscripts = $HHPaths::hhscripts;  # Scripts path

Align.pm

Perl module for alignment manipulation.

use Align;

my $ali = Align::read_a3m("input.a3m");
Align::remove_inserts($ali);
Align::write_fasta($ali, "output.fasta");

Validation Scripts

check_a3m.py

Validate A3M format files.

check_a3m.py input.a3m

Checks for:

Valid amino acid characters
Consistent match state counts
Proper consensus sequence
Valid secondary structure annotations
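
Two of these checks, valid characters and consistent match state counts, can be sketched as follows. This is a simplified illustration rather than the logic of check_a3m.py; the accepted alphabet and the function names are assumptions.

```python
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBJZXUO-")


def parse_a3m(text):
    """Return a list of (name, sequence) pairs; sequences may span several lines."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line
    return [(name, seq) for name, seq in records]


def check_a3m(text):
    """Return a list of problem descriptions; an empty list means no problem found."""
    records = parse_a3m(text)
    if not records:
        return ["no sequences found"]
    problems = []
    expected = None
    for name, seq in records:
        # Match states: everything that is not a lower case insert.
        n_match = sum(1 for ch in seq if not ch.islower())
        if expected is None:
            expected = n_match
        elif n_match != expected:
            problems.append(f"{name}: {n_match} match states, expected {expected}")
        # Annotation records (ss_, sa_) use their own alphabets.
        if not name.startswith(("ss_", "sa_")):
            bad = sorted(set(seq.upper()) - VALID_RESIDUES)
            if bad:
                problems.append(f"{name}: invalid characters {''.join(bad)}")
    return problems
```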

get_a3m_size.py

Report statistics about A3M files.

get_a3m_size.py input.a3m

Outputs:

Number of sequences
Number of match states
Average sequence length
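
These three numbers are straightforward to compute from the A3M text. The sketch below assumes that annotation records (names starting with `ss_` or `sa_`) are not counted as sequences and that sequence length means the number of residues excluding gaps; get_a3m_size.py may define them differently.

```python
def a3m_stats(text):
    """Return the number of sequences, match states, and average sequence length."""
    seqs = []
    keep = False
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            # Skip secondary structure and solvent accessibility records.
            keep = not line[1:].startswith(("ss_", "sa_"))
            if keep:
                seqs.append("")
        elif keep and line:
            seqs[-1] += line
    if not seqs:
        raise ValueError("no sequences found")
    # Match states of the first sequence: upper case residues and '-'.
    match_states = sum(1 for ch in seqs[0] if not ch.islower())
    # Residues are letters of either case; gaps are not counted.
    lengths = [sum(1 for ch in s if ch.isalpha()) for s in seqs]
    return {
        "sequences": len(seqs),
        "match_states": match_states,
        "average_length": sum(lengths) / len(lengths),
    }
```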

Best Practices

Script Locations: Make sure the scripts directory is in your PATH:

export PATH="/path/to/hhsuite/scripts:$PATH"
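
Whether the scripts are actually reachable can be checked before starting a long workflow. The short sketch below uses the standard library function `shutil.which`; the list of names in the usage line is only an example.

```python
import shutil


def missing_scripts(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_scripts(["reformat.pl", "addss.pl", "hhmake"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
```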

Common Workflows

Format Conversion Pipeline

reformat.pl fas a3m input.fasta temp.a3m
addss.pl temp.a3m output.a3m
hhmake -i output.a3m -o output.hhm
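
The same three steps can be driven from Python when they are part of a larger program. The sketch below builds the argument lists used in the shell commands above and runs them in order, stopping at the first failure; the file naming scheme and both function names are assumptions made for illustration.

```python
import subprocess
from pathlib import Path


def pipeline_commands(fasta, workdir="."):
    """Return the three commands of the pipeline for one input file."""
    stem = Path(fasta).stem
    tmp_a3m = str(Path(workdir) / f"{stem}.tmp.a3m")
    out_a3m = str(Path(workdir) / f"{stem}.a3m")
    out_hhm = str(Path(workdir) / f"{stem}.hhm")
    return [
        ["reformat.pl", "fas", "a3m", fasta, tmp_a3m],   # FASTA -> A3M
        ["addss.pl", tmp_a3m, out_a3m],                  # add secondary structure
        ["hhmake", "-i", out_a3m, "-o", out_hhm],        # build the HMM
    ]


def run_pipeline(commands):
    """Run the commands in order; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Passing arguments as a list avoids shell quoting problems with file names that contain spaces.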

Database Building Pipeline

# Convert sequences to A3M
for f in *.fasta; do
    reformat.pl fas a3m "$f" "${f%.fasta}.a3m"
done

# Build database
hhsuitedb.py -ia3m . -o my_database

Batch Structure Conversion

for pdb in structures/*.pdb; do
    pdb2fasta.pl "$pdb" > "sequences/$(basename "$pdb" .pdb).fasta"
done
