
Overview

This guide walks you through running a complete HH-suite workflow: from a single protein sequence to identifying homologous structures in the PDB database.
Make sure you have installed HH-suite before proceeding.

Step 1: Prepare a Query Sequence

Create a file called query.fasta with your protein sequence in FASTA format:
query.fasta
>sp|Q5VUD6|FA69B_HUMAN Protein FAM69B
MRRLRRLAHLVLFCPFSKRLQGRLPGLRVRCIFLAWLGVFAGSWLVYVHYSSYSERCRGHVCQVVI
CDQYRKGIISGSVCQDLCELHMVEWRTCLSVAPGQQVYSGLWRDKDVTIKCGIEETLDSKARSDAA
PRRELVLFDKPTRGTSIKEFREMTLSFLKANLGDLPSLPALVGQVLLMADFNKDNRVSLAEAKSV
WALLQRNEFLLLLSLQEKEHASRLLGYCGDLYLTEGVPHGAWHAAALPPLLRPLLPPALQGALQQ
WLGPAWPWRAKIAIGLLEFVEELFHGSYGTFYMCETTLANVGYTATYDFKMADLQQVAPEATVRR
FLQGRRCEHSTDCTYGRDCRAPCDRLMRQCKGDLIQPNLAKVCALLRGYLLPGAPADLREELGTQ
LRTCTTLSGLASQVEAHHSLVLSHLKTLLWKKISNTKYS
You can use any protein sequence in FASTA format. For testing, the example above is from the query.a3m file included with HH-suite.
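Before searching, a quick sanity check of the query file can catch truncated or mis-pasted sequences. The sketch below is not part of HH-suite: `fasta_summary` is a hypothetical helper, and the set of accepted residue letters (the 20 standard amino acids plus X, B, Z, U, O) is an assumption that you may want to adjust.

```shell
# Summarize a FASTA file: number of records, total sequence characters,
# and how many of those characters fall outside the assumed alphabet.
fasta_summary() {
  awk '
    /^>/ { records++; next }              # header line starts a new record
    {
      gsub(/[ \t\r]/, "")                 # ignore whitespace and CR line endings
      residues += length($0)
      # gsub returns the number of characters it replaced
      invalid += gsub(/[^ACDEFGHIKLMNPQRSTVWYXBZUOacdefghiklmnpqrstvwyxbzuo]/, "")
    }
    END { printf "records=%d residues=%d invalid=%d\n", records, residues, invalid }
  ' "$1"
}
```

Running `fasta_summary query.fasta` should report a single record with no invalid characters for the example above.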

Step 2: Download a Database

For this quickstart, we’ll use the PDB70 database (PDB sequences clustered at a maximum of 70% pairwise sequence identity):
# Download PDB70 database
wget http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_from_mmcif_latest.tar.gz

# Extract the database
tar xvfz pdb70_from_mmcif_latest.tar.gz
Database files are large (the PDB70 archive is roughly 20 GB compressed and considerably larger once extracted). Make sure you have sufficient disk space and a stable internet connection.
For quick testing, you can use the smaller SCOP database:
wget http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/scop70_1.75.tar.gz
tar xvfz scop70_1.75.tar.gz
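After extraction, it can be worth confirming that the database files are where hhblits will look for them. The following is a minimal sketch, assuming the HH-suite 3 layout in which a database basename has `_a3m`, `_hhm`, and `_cs219` parts, each with an `.ffdata` and `.ffindex` file; `check_hhsuite_db` is a hypothetical helper name, not an HH-suite tool.

```shell
# Report which expected database files are missing or empty for a basename.
# Returns non-zero if any file is missing.
check_hhsuite_db() {
  base=$1
  missing=0
  for part in a3m hhm cs219; do
    for ext in ffdata ffindex; do
      f="${base}_${part}.${ext}"
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}
```

For example, `check_hhsuite_db pdb70 && echo "database looks complete"` checks the files in the current directory.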

Step 3: Run HHblits

Now let’s run HHblits to search for homologous sequences:
hhblits -i query.fasta -o results.hhr -oa3m results.a3m -n 3 -d pdb70

Command Breakdown

  • -i query.fasta - Input query sequence
  • -o results.hhr - Output file with search results
  • -oa3m results.a3m - Output multiple sequence alignment in A3M format
  • -n 3 - Number of search iterations (default: 2)
  • -d pdb70 - Database basename (without file extension)
The first search iteration may take 30 seconds to a few minutes depending on your CPU and database size.

Step 4: Examine the Results

Open results.hhr to see the homology search results:
less results.hhr
# or
head -100 results.hhr

Understanding the Output

The results file contains:

1. Summary Statistics

Header information about the query and search parameters

2. Hit List

Ranked list of homologous proteins with:
  • E-value: Statistical significance (lower is better; < 0.001 is significant)
  • Probability: Likelihood of true homology (0-100%)
  • P-value: Probability of seeing this score by chance
  • Score: Raw alignment score
  • Aligned columns: Length of the alignment

3. Detailed Alignments

Pairwise alignments between query and each hit, showing:
  • Sequence alignment
  • Secondary structure predictions
  • Confidence scores
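The hit list described above can also be filtered from the command line. This is a rough sketch rather than an official parser: `hhr_top_hits` is a hypothetical helper, and it assumes the hit list starts at a header row beginning with "No Hit" and ends at the first blank line. Because hit descriptions may contain spaces, it counts columns from the right.

```shell
# Print "hit_id probability e-value" for hits at or above a probability cutoff.
# Usage: hhr_top_hits results.hhr [min_prob]   (min_prob defaults to 50)
hhr_top_hits() {
  awk -v min_prob="${2:-50}" '
    /^ *No Hit/ { in_list = 1; next }     # header row of the hit list
    in_list && NF == 0 { exit }           # blank line ends the hit list
    in_list {
      # The template length, e.g. "(410)", is usually its own field but
      # may be attached to the template range; adjust the offset for that.
      off = ($NF ~ /^\(/) ? 1 : 0
      prob = $(NF - 7 - off)
      evalue = $(NF - 6 - off)
      if (prob + 0 >= min_prob + 0) print $2, prob, evalue
    }
  ' "$1"
}
```

For example, `hhr_top_hits results.hhr 90` lists only hits with a probability of at least 90%.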

Example Output Interpretation

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
 1 6xyz_A Protein FAM69B; HUMAN    99.9   1E-45   1E-50  312.5  25.2  389    1-389    15-403 (410)
 2 3abc_B FAM69 family member       95.2   0.012   1E-07   89.3  12.1  234   45-278    89-322 (450)
  • Prob > 95% and E-value < 0.001: Strong evidence of homology
  • Prob 50-95%: Possible homology, check alignment carefully
  • Prob < 50%: Likely not related
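These rules of thumb can be written down as a small helper. In this sketch, `classify_hit` is a hypothetical function, and one case the rules do not cover (probability above 95% but E-value of 0.001 or more) is treated as "possible", which is an assumption.

```shell
# Classify a hit from its probability (%) and E-value.
# Usage: classify_hit PROB EVALUE
classify_hit() {
  awk -v p="$1" -v e="$2" 'BEGIN {
    if (p + 0 > 95 && e + 0 < 0.001) print "strong"      # strong evidence of homology
    else if (p + 0 >= 50)            print "possible"    # inspect the alignment
    else                             print "unlikely"    # likely not related
  }'
}
```

With the two example hits, `classify_hit 99.9 1E-45` falls in the first category and `classify_hit 95.2 0.012` in the second.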

Step 5: Build a Multiple Sequence Alignment

Use the generated MSA for downstream analyses:
# View the alignment
less results.a3m

# Convert to other formats if needed
reformat.pl a3m fas results.a3m results.fasta
reformat.pl a3m sto results.a3m results.stockholm
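The .a3m format is HH-suite’s compact alignment format. It preserves insertion/deletion information: residues aligned to match columns are uppercase, insertions are lowercase, and deletions are written as "-".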
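Two quick checks on an A3M file follow from that convention. This is a small sketch with hypothetical helper names (`a3m_count`, `a3m_match_columns`); it assumes insertions are the only lowercase characters on sequence lines.

```shell
# Count the sequences in an A3M file (one header line per sequence).
a3m_count() {
  grep -c '^>' "$1"
}

# Print only match columns: drop lowercase insert states from sequence
# lines and leave header lines untouched.
a3m_match_columns() {
  sed '/^>/!s/[a-z]//g' "$1"
}
```

After `a3m_match_columns results.a3m`, every sequence line should have the same length, which makes the alignment easier to eyeball.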

Alternative Workflows

Using hhsearch Instead of hhblits

If you already have a multiple sequence alignment (MSA), use hhsearch for more sensitive profile-profile searches:
1. Build an HMM Profile

hhmake -i results.a3m -o query.hhm

2. Search Against a Profile Database

hhsearch -i query.hhm -d pdb70 -o hhsearch_results.hhr

Searching Against Sequence Databases

To search against large sequence databases like Uniclust30 or BFD:
# Download Uniclust30 (note: this is a large file ~17 GB)
wget http://wwwuser.gwdg.de/~compbiol/uniclust/current_release/uniclust30_2018_08_hhsuite.tar.gz
tar xvfz uniclust30_2018_08_hhsuite.tar.gz

# Run iterative search
hhblits -i query.fasta -o results.hhr -oa3m results.a3m -n 3 -d uniclust30_2018_08/uniclust30_2018_08

Common Options

Customize your search with these frequently used parameters:
# More sensitive (slower)
hhblits -i query.fasta -d database -o results.hhr -n 3 -e 0.001

# Less sensitive (faster)
hhblits -i query.fasta -d database -o results.hhr -n 1 -e 1
  • -e 0.001 - E-value cutoff for including hits in the alignment built between iterations (default: 0.001)
  • -n 3 - More iterations = more sensitivity
# Use multiple CPU cores
hhblits -i query.fasta -d database -o results.hhr -cpu 8

# Require hits to cover at least 50% of the query
hhblits -i query.fasta -d database -o results.hhr -cov 50
  • -cpu 8 - Use 8 threads (default: 2)
  • -cov 50 - Minimum coverage of the query in percent (default: 0)
# Only report significant hits
hhblits -i query.fasta -d database -o results.hhr -E 0.001 -p 50
  • -E 0.001 - Maximum E-value to report
  • -p 50 - Minimum probability threshold (0-100)
# Multiple output options
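hhblits -i query.fasta -d database \
  -o results.hhr \
  -oa3m results.a3m \
  -opsi results.psi \
  -ohhm results.hhm
  • -o results.hhr - Search results in HH-suite format
  • -oa3m results.a3m - Multiple sequence alignment
  • -opsi results.psi - Alignment in PSI-BLAST format
  • -ohhm results.hhm - HMM profile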
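When options vary between runs, assembling the command step by step keeps each flag next to its comment without relying on line continuations. The sketch below only prints the command; `build_hhblits_cmd` and the variables `ITERATIONS`, `THREADS`, and `MIN_PROB` are illustrative names, and the printed line does not preserve quoting for paths that contain spaces.

```shell
# Assemble an hhblits command from settings and print it instead of running it.
# Usage: build_hhblits_cmd QUERY DATABASE
build_hhblits_cmd() {
  query=$1
  db=$2
  set -- hhblits -i "$query" -d "$db" -o results.hhr -oa3m results.a3m
  set -- "$@" -n "${ITERATIONS:-2}"       # search iterations
  set -- "$@" -cpu "${THREADS:-2}"        # threads
  if [ -n "${MIN_PROB:-}" ]; then
    set -- "$@" -p "$MIN_PROB"            # report only hits at or above this probability
  fi
  printf '%s\n' "$*"
}
```

For example, setting `ITERATIONS=3` and calling `build_hhblits_cmd query.fasta pdb70` prints the command so it can be reviewed before it is run.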

Troubleshooting

Database not found

Make sure:
  • Database files are extracted in the current directory
  • You’re using the basename rather than a file name: -d pdb70, not -d pdb70_a3m.ffdata
  • Required database files exist (<basename>_a3m, <basename>_hhm, and <basename>_cs219, each as .ffdata and .ffindex)

Search is slow

  • Use fewer iterations: -n 1 instead of -n 3
  • Enable more threads: -cpu 8
  • Use the AVX2 build if your CPU supports it
  • Try a smaller database for testing (e.g., SCOP70 instead of Uniclust30)

No hits found

  • Increase sensitivity: -n 3 -e 1 (more iterations, higher E-value)
  • Check if your query is too short (< 30 residues)
  • Try a different database (e.g., Uniclust30 for more diversity)
  • Your protein may be truly novel or from an undersampled family

Out of memory

  • Reduce number of threads: -cpu 2
  • Search against smaller database subsets
  • Increase system swap space
  • Use a machine with more RAM (16+ GB recommended for large databases)

Next Steps

  • Database Guide: Download and set up additional databases
  • HHblits Reference: Complete documentation for hhblits options
  • HHsearch Guide: Profile-profile searches for maximum sensitivity
  • Output Formats: Understanding and parsing HH-suite results

Example Workflow Script

Here’s a complete bash script for running a protein homology search:
run_hhblits.sh
#!/bin/bash
# Stop on the first failed command or unset variable
set -eu

# Configuration
QUERY="query.fasta"
DATABASE="pdb70"
OUTPUT_DIR="hhblits_results"
THREADS=4

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Run HHblits search
echo "Running HHblits search..."
hhblits \
  -i "$QUERY" \
  -d "$DATABASE" \
  -o "$OUTPUT_DIR/results.hhr" \
  -oa3m "$OUTPUT_DIR/results.a3m" \
  -n 3 \
  -cpu "$THREADS" \
  -e 0.001 \
  -v 2

echo "Search complete! Results saved to $OUTPUT_DIR/"

# Show the header lines of the first 10 hit alignments
# (printf is used because bash's echo does not expand \n by default)
printf '\nTop 10 hits:\n'
grep "^>" "$OUTPUT_DIR/results.hhr" | head -10
Make it executable and run:
chmod +x run_hhblits.sh
./run_hhblits.sh
