Quick Start Guide

Overview

This guide walks you through running a complete HH-suite workflow: from a single protein sequence to identifying homologous structures in the PDB database.

Make sure you have installed HH-suite before proceeding.
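One way to confirm the installation is to check that the HH-suite executables are on your PATH. The sketch below relies only on the POSIX command -v builtin. The helper name require_tools is hypothetical and not part of HH-suite.

```shell
# Hypothetical helper (not part of HH-suite): report any tool missing from PATH.
require_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      status=1
    fi
  done
  return "$status"
}

# Example: require_tools hhblits hhsearch hhmake
```

The function returns a non-zero status if any tool is missing, so it can guard the rest of a script. Helper scripts such as reformat.pl may live in a separate scripts directory that also needs to be on your PATH.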

Step 1: Prepare a Query Sequence

Create a file called query.fasta with your protein sequence in FASTA format:

query.fasta

>sp|Q5VUD6|FA69B_HUMAN Protein FAM69B
MRRLRRLAHLVLFCPFSKRLQGRLPGLRVRCIFLAWLGVFAGSWLVYVHYSSYSERCRGHVCQVVI
CDQYRKGIISGSVCQDLCELHMVEWRTCLSVAPGQQVYSGLWRDKDVTIKCGIEETLDSKARSDAA
PRRELVLFDKPTRGTSIKEFREMTLSFLKANLGDLPSLPALVGQVLLMADFNKDNRVSLAEAKSV
WALLQRNEFLLLLSLQEKEHASRLLGYCGDLYLTEGVPHGAWHAAALPPLLRPLLPPALQGALQQ
WLGPAWPWRAKIAIGLLEFVEELFHGSYGTFYMCETTLANVGYTATYDFKMADLQQVAPEATVRR
FLQGRRCEHSTDCTYGRDCRAPCDRLMRQCKGDLIQPNLAKVCALLRGYLLPGAPADLREELGTQ
LRTCTTLSGLASQVEAHHSLVLSHLKTLLWKKISNTKYS

You can use any protein sequence in FASTA format. For testing, the example above is from the query.a3m file included with HH-suite.
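Before searching, it can help to sanity-check the query file. The following is a minimal sketch that counts FASTA records and residues with awk. The function name fasta_summary is an assumption made for illustration.

```shell
# Hypothetical helper: summarize a FASTA file (record count and total residues).
fasta_summary() {
  awk '
    /^>/ { n++; next }                            # header line starts a new record
    { gsub(/[ \t\r]/, ""); len += length($0) }    # sequence line: count residues
    END { printf "sequences=%d residues=%d\n", n, len }
  ' "$1"
}

# Example: fasta_summary query.fasta
```

A query file for this guide should report exactly one sequence. A very low residue count is a hint that the query may be too short to produce significant hits.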

Step 2: Download a Database

For this quickstart, we’ll use the PDB70 database (profiles for PDB protein sequences, clustered at 70% maximum sequence identity):

# Download PDB70 database
wget http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/pdb70_from_mmcif_latest.tar.gz

# Extract the database
tar xvfz pdb70_from_mmcif_latest.tar.gz

Database files are large: the PDB70 archive is on the order of 20 GB compressed and more than twice that once extracted. Make sure you have sufficient disk space and a stable internet connection.
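A quick way to check free space before downloading is to read the available kilobytes from df. This is a sketch under the assumption that df -P is available (it is specified by POSIX). The function name has_free_gb and the REQUIRED_GB variable are hypothetical.

```shell
# Hypothetical helper: succeed if directory $1 has at least $2 GB available.
has_free_gb() {
  dir=$1
  need_gb=$2
  # df -Pk prints one header line, then one line per filesystem;
  # column 4 is the available space in 1024-byte blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $((need_gb * 1024 * 1024)) ]
}

# Example (REQUIRED_GB is a placeholder you set yourself):
#   has_free_gb . "$REQUIRED_GB" || echo "Not enough disk space" >&2
```

Remember to budget for both the compressed archive and the extracted files, since both exist on disk until you delete the archive.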

Alternative: Use a Smaller Test Database

For quick testing, you can use the smaller SCOP database:

wget http://wwwuser.gwdg.de/~compbiol/data/hhsuite/databases/hhsuite_dbs/scop70_1.75.tar.gz
tar xvfz scop70_1.75.tar.gz

Step 3: Run Your First Search

Now let’s run HHblits to search the database for proteins homologous to the query:

hhblits -i query.fasta -o results.hhr -oa3m results.a3m -n 1 -d pdb70

Command Breakdown

-i query.fasta - Input query sequence
-o results.hhr - Output file with search results
-oa3m results.a3m - Output multiple sequence alignment in A3M format (used in Step 5)
-n 1 - Number of search iterations (default: 2)
-d pdb70 - Database basename (without file extension)

The search may take 30 seconds to a few minutes depending on your CPU and database size.

Step 4: Examine the Results

Open results.hhr to see the homology search results:

less results.hhr
# or
head -n 100 results.hhr

Understanding the Output

The results file contains:

Summary Statistics

Header information about the query and search parameters

Hit List

Ranked list of homologous proteins with:

E-value: Statistical significance (lower is better; < 0.001 is significant)
Probability: Likelihood of true homology (0-100%)
P-value: Probability of seeing this score by chance
Score: Raw alignment score
Aligned columns: Length of the alignment

Detailed Alignments

Pairwise alignments between query and each hit, showing:

Sequence alignment
Secondary structure predictions
Confidence scores

Example Output Interpretation

No Hit                             Prob E-value P-value  Score    SS Cols Query HMM  Template HMM
 1 6xyz_A Protein FAM69B; HUMAN    99.9   1E-45   1E-50  312.5  25.2  389    1-389    15-403 (410)
 2 3abc_B FAM69 family member       95.2   0.012   1E-07   89.3  12.1  234   45-278    89-322 (450)

Prob > 95% and E-value < 0.001: Strong evidence of homology
Prob 50-95%: Possible homology, check alignment carefully
Prob < 50%: Likely not related
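To apply thresholds like these automatically, one option is to read the per-hit summary lines in the detailed alignment section rather than the fixed-width hit list. The sketch below assumes each alignment block contains a header line starting with ">" followed by a line of key=value pairs beginning with Probab=, as written by hhblits and hhsearch. The function name hhr_hits is hypothetical.

```shell
# Hypothetical helper: print "id<TAB>probability<TAB>E-value" for every hit
# whose probability is at least $2 (default 50).
hhr_hits() {
  awk -v min_prob="${2:-50}" '
    /^>/ { id = substr($1, 2); next }      # remember the hit identifier
    /^Probab=/ {
      # Split each key=value token on this line into the val[] array.
      for (i = 1; i <= NF; i++) { split($i, kv, "="); val[kv[1]] = kv[2] }
      if (val["Probab"] + 0 >= min_prob + 0)
        printf "%s\t%s\t%s\n", id, val["Probab"], val["E-value"]
    }
  ' "$1"
}

# Example: hhr_hits results.hhr 95
```

Parsing the key=value lines avoids depending on column offsets in the hit list, where long hit descriptions make whitespace splitting unreliable.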

Step 5: Build a Multiple Sequence Alignment

If you passed -oa3m results.a3m to hhblits, the resulting MSA can be used for downstream analyses:

# View the alignment
less results.a3m

# Convert to other formats if needed
reformat.pl a3m fas results.a3m results.fasta
reformat.pl a3m sto results.a3m results.stockholm

The .a3m format is HH-suite’s compact alignment format: residues aligned to the query are uppercase, insertions relative to the query are lowercase, and deletions are written as '-'.
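To illustrate how A3M encodes insertions, the sketch below removes insert states (lowercase letters and '.') from every sequence line, leaving only the columns aligned to the query. It assumes one sequence per line, as hhblits writes it, and the function name a3m_match_columns is hypothetical. For full format conversion, reformat.pl remains the appropriate tool.

```shell
# Hypothetical helper: drop insert states (lowercase letters and '.') from
# sequence lines so every sequence has the same number of columns as the query.
a3m_match_columns() {
  LC_ALL=C awk '
    /^>/ { print; next }             # keep header lines unchanged
    { gsub(/[a-z.]/, ""); print }    # remove insertions relative to the query
  ' "$1"
}

# Example: a3m_match_columns results.a3m > results.match_columns.fasta
```

LC_ALL=C is set so that the a-z range matches only ASCII lowercase letters regardless of the system locale.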

Alternative Workflows

Using hhsearch Instead of hhblits

If you already have a multiple sequence alignment (MSA), use hhsearch for more sensitive profile-profile searches:

Build an HMM Profile

hhmake -i results.a3m -o query.hhm

Search Against a Profile Database

hhsearch -i query.hhm -d pdb70 -o hhsearch_results.hhr

Searching Against Sequence Databases

To search against large sequence databases like Uniclust30 or BFD:

# Download Uniclust30 (note: this is a large file ~17 GB)
wget http://wwwuser.gwdg.de/~compbiol/uniclust/current_release/uniclust30_2018_08_hhsuite.tar.gz
tar xvfz uniclust30_2018_08_hhsuite.tar.gz

# Run iterative search
hhblits -i query.fasta -o results.hhr -oa3m results.a3m -n 3 -d uniclust30_2018_08/uniclust30_2018_08

Common Options

Customize your search with these frequently used parameters:

Control Sensitivity

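# More sensitive (slower): more search iterations
hhblits -i query.fasta -d database -o results.hhr -n 3 -e 0.001

# Less sensitive (faster): a single iteration
hhblits -i query.fasta -d database -o results.hhr -n 1

-n 3 - More iterations = more sensitivity (default: 2)
-e 0.001 - E-value cutoff for including hits in the alignment used for the next iteration (default: 0.001). Raising it pulls in more distant sequences at the risk of false positives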

Speed Up Searches

# Use multiple CPU cores
hhblits -i query.fasta -d database -o results.hhr -cpu 8

# Drop sequences covering less than 50% of the query from the alignment
hhblits -i query.fasta -d database -o results.hhr -cov 50

-cpu 8 - Use 8 threads (default: 2)
-cov 50 - Minimum coverage with the query sequence, in percent (default: 0)
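Rather than hard-coding a thread count, you can derive it from the machine. This is a minimal sketch that assumes either nproc (GNU coreutils) or getconf _NPROCESSORS_ONLN is available. The function name cpu_count is hypothetical.

```shell
# Hypothetical helper: print the number of online CPU cores, falling back to 1.
cpu_count() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  else
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
  fi
}

# Example: hhblits -i query.fasta -d database -o results.hhr -cpu "$(cpu_count)"
```

On shared machines you may prefer to pass fewer threads than the reported core count.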

Filter Results

# Only report significant hits
hhblits -i query.fasta -d database -o results.hhr -E 0.001 -p 50

-E 0.001 - Maximum E-value to report
-p 50 - Minimum probability threshold (0-100)

Output Formats

# Multiple output options:
#   -o     HH-suite results (.hhr)
#   -oa3m  multiple sequence alignment (A3M)
#   -opsi  alignment in PSI-BLAST format
#   -ohhm  HMM profile
hhblits -i query.fasta -d database \
  -o results.hhr \
  -oa3m results.a3m \
  -opsi results.psi \
  -ohhm results.hhm

Troubleshooting

Error: Could not open database

Make sure:

Database files are extracted in the current directory (or -d includes the path to them)
You’re using the basename without any suffix: -d pdb70 not -d pdb70_hhm.ffdata
Required database files exist (pdb70_a3m, pdb70_hhm, and pdb70_cs219, each as .ffdata and .ffindex)
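The checks above can be scripted. The sketch below assumes the usual HH-suite database layout of six files per basename (the _a3m, _hhm, and _cs219 parts, each with .ffdata and .ffindex). The function name check_hhsuite_db is hypothetical.

```shell
# Hypothetical helper: verify that all six files of an HH-suite database exist
# and are non-empty. $1 is the basename you would pass to -d.
check_hhsuite_db() {
  db=$1
  missing=0
  for kind in a3m hhm cs219; do
    for ext in ffdata ffindex; do
      f="${db}_${kind}.${ext}"
      if [ ! -s "$f" ]; then
        echo "missing or empty: $f" >&2
        missing=1
      fi
    done
  done
  return "$missing"
}

# Example: check_hhsuite_db pdb70 && echo "database looks complete"
```

Any file it reports usually points to an interrupted download or an extraction into a different directory.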

Search is very slow

Use fewer iterations: -n 1 instead of -n 3
Enable more threads: -cpu 8
Use the AVX2 build if your CPU supports it
Try a smaller database for testing (e.g., SCOP instead of Uniclust)
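To find out whether your CPU supports AVX2, you can look for the flag in /proc/cpuinfo. This sketch is Linux-specific and assumes the x86 layout, where supported features are listed on a line starting with "flags". The function name cpu_has_flag is hypothetical.

```shell
# Hypothetical helper: succeed if CPU flag $1 appears on the first "flags" line
# of a cpuinfo file. $2 defaults to /proc/cpuinfo (Linux only).
cpu_has_flag() {
  awk -v flag="$1" '
    /^flags/ {
      for (i = 1; i <= NF; i++) if ($i == flag) found = 1
      exit                      # only the first CPU entry is needed
    }
    END { exit !found }         # exit status 0 when the flag was seen
  ' "${2:-/proc/cpuinfo}" 2>/dev/null
}

# Example: cpu_has_flag avx2 && echo "AVX2 supported"
```

The comparison is an exact match on each flag, so checking for avx2 does not succeed on a CPU that only lists avx.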

No significant hits found

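Increase sensitivity: -n 3 (more iterations); a higher inclusion E-value such as -e 1 also admits more distant sequences, but can pull false positives into the alignment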
Check if your query is too short (< 30 residues)
Try a different database (e.g., Uniclust30 for more diversity)
Your protein may be truly novel or from an undersampled family

Out of memory errors

Reduce number of threads: -cpu 2
Search against smaller database subsets
Increase system swap space
Use a machine with more RAM (16+ GB recommended for large databases)

Next Steps

Database Guide

Download and set up additional databases

HHblits Reference

Complete documentation for hhblits options

HHsearch Guide

Profile-profile searches for maximum sensitivity

Output Formats

Understanding and parsing HH-suite results

Example Workflow Script

Here’s a complete bash script for running a protein homology search:

run_hhblits.sh

#!/bin/bash
set -eu  # stop on the first error or unset variable

# Configuration
QUERY="query.fasta"
DATABASE="pdb70"
OUTPUT_DIR="hhblits_results"
THREADS=4

# Create output directory
mkdir -p "$OUTPUT_DIR"

# Run HHblits search
echo "Running HHblits search..."
hhblits \
  -i "$QUERY" \
  -d "$DATABASE" \
  -o "$OUTPUT_DIR/results.hhr" \
  -oa3m "$OUTPUT_DIR/results.a3m" \
  -n 3 \
  -cpu "$THREADS" \
  -e 0.001 \
  -v 2

echo "Search complete! Results saved to $OUTPUT_DIR/"

# Print the header lines of the first 10 alignments
printf '\nTop 10 hits:\n'
grep "^>" "$OUTPUT_DIR/results.hhr" | head -n 10

Make it executable and run:

chmod +x run_hhblits.sh
./run_hhblits.sh
