Overview
This guide walks you through running a complete HH-suite workflow: from a single protein sequence to identifying homologous structures in the PDB database. Make sure you have installed HH-suite before proceeding.
Step 1: Prepare a Query Sequence
Create a file called query.fasta with your protein sequence in FASTA format:
query.fasta
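For example, this writes a query file containing the 76-residue human ubiquitin sequence. Ubiquitin is only an illustrative choice; replace the header and sequence with your own protein.

```shell
# Write an example query in FASTA format: a ">" header line, then the sequence.
# Human ubiquitin is used purely as an illustration; substitute your own protein.
cat > query.fasta <<'EOF'
>ubiquitin_human example query
MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG
EOF
```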
Step 2: Download a Database
For this quickstart, we’ll use the PDB70 database (profiles for protein chains in the PDB, filtered to a maximum of 70% pairwise sequence identity).
Alternative: Use a Smaller Test Database
For quick testing, you can use the smaller SCOP database instead.
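Both databases are distributed as compressed tar archives. The bash sketch below streams an archive with wget and unpacks it into a directory. fetch_hh_db is a hypothetical helper, and the archive URL is a placeholder that you copy from the HH-suite database download page, since archive names change between releases.

```shell
# Hypothetical helper: download an HH-suite database archive and unpack it.
# Usage: fetch_hh_db <archive-url> [target-dir]
fetch_hh_db() {
    local url="$1"
    local dest="${2:-.}"
    mkdir -p "$dest"
    # -q: quiet; -O -: write the archive to stdout so tar unpacks it on the fly
    wget -q -O - "$url" | tar -xzf - -C "$dest"
}
```

For example, fetch_hh_db "<archive-url>" databases unpacks into ./databases; you then pass the prefix of the unpacked files to -d (for instance -d databases/pdb70 if the archive contains pdb70_* files).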
Step 3: Run Your First Search
Now let’s run HHblits to search for homologous sequences:
hhblits -i query.fasta -o results.hhr -oa3m results.a3m -n 3 -d pdb70
Command Breakdown
-i query.fasta - Input query sequence
-o results.hhr - Output file with search results
-oa3m results.a3m - Output multiple sequence alignment in A3M format
-n 3 - Number of search iterations (default: 2)
-d pdb70 - Database basename (without file extension)
The first search iteration may take 30 seconds to a few minutes depending on your CPU and database size.
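Because the runtime depends on your CPU, it can help to give hhblits every available core. The sketch below wraps the search described by the options above in a bash function; run_first_search is a hypothetical name, and getconf _NPROCESSORS_ONLN is used to count online cores on Linux and macOS.

```shell
# Hypothetical wrapper around the Step 3 search that uses all online CPU cores.
run_first_search() {
    local query="${1:-query.fasta}"
    local db="${2:-pdb70}"              # database basename, without file extension
    local ncpu
    ncpu=$(getconf _NPROCESSORS_ONLN)   # number of online cores (Linux and macOS)
    hhblits -i "$query" -o results.hhr -oa3m results.a3m -n 3 -d "$db" -cpu "$ncpu"
}
```

Running run_first_search query.fasta pdb70 writes results.hhr and results.a3m to the current directory.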
Step 4: Examine the Results
Open results.hhr in a text editor or pager (for example, less results.hhr) to see the homology search results.
Understanding the Output
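The results file contains a header describing the query and the search, a ranked hit list, and the pairwise alignment for each hit.
Hit List
A ranked list of hits (candidate homologs), each with:
- E-value: Statistical significance (lower is better; values below 0.001 are commonly treated as significant)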
- Probability: Likelihood of true homology (0-100%)
- P-value: Probability of seeing this score by chance
- Score: Raw alignment score
- Aligned columns: Length of the alignment
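To pull the significant hits out of a results file in a script, the columns can be read from the right-hand end of each hit line, because hit descriptions contain spaces. The bash/awk sketch below assumes the usual .hhr summary layout: a column header starting with "No Hit", one line per hit ending in Prob, E-value, P-value, Score, SS, Cols, query range, template range and (template length), and a blank line after the last hit. significant_hits is a hypothetical helper, not an HH-suite command.

```shell
# Hypothetical helper: list hits from the summary table of an .hhr file whose
# E-value is below a cutoff (default 0.001).
# Assumes each hit line ends with:
#   Prob E-value P-value Score SS Cols QueryRange TemplateRange (TemplateLength)
# so the numeric columns are counted from the right; descriptions may contain spaces.
significant_hits() {
    local hhr="$1"
    local max_evalue="${2:-0.001}"
    awk -v max="$max_evalue" '
        /^ *No Hit/ { in_table = 1; next }   # column header of the hit list
        in_table && NF == 0 { exit }         # blank line after the last hit
        in_table {
            prob = $(NF - 8)
            evalue = $(NF - 7)
            if (evalue + 0 < max + 0)
                printf "%s\t%s\t%s\n", $2, prob, evalue   # hit name, Prob, E-value
        }
    ' "$hhr"
}
```

For example, significant_hits results.hhr 0.001 prints the hit name, probability and E-value, tab-separated, for every hit below the cutoff. If a line deviates from the assumed layout (for example, a template range fused with its length), the column offsets will be wrong.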
Example Output Interpretation
Step 5: Build a Multiple Sequence Alignment
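The results.a3m file written by the -oa3m option in Step 3 contains the multiple sequence alignment of your query and its significant hits; use it for downstream analyses. The .a3m format is HH-suite’s compact alignment format: match columns hold upper-case residues or - for deletions, insertions relative to the query are written in lower case, and gaps aligned to insertions are left out, which keeps the file small while preserving insertion and deletion information.
Alternative Workflows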
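One simple option is to use the alignment with tools outside HH-suite. Because lower-case letters in A3M are insertions relative to the query, deleting them leaves an aligned FASTA of the match columns only. The helper below is a hypothetical sketch of that conversion; for other formats, HH-suite ships the reformat.pl script.

```shell
# Hypothetical helper: convert an A3M alignment to aligned FASTA (match columns only).
# - '#' comment lines, if any, are dropped
# - header lines starting with '>' are kept unchanged
# - on sequence lines, lower-case insertions and '.' gap characters are removed
a3m_to_match_fasta() {
    sed '/^#/d; /^>/!s/[[:lower:].]//g' "$1"
}
```

For example, a3m_to_match_fasta results.a3m > results_match.fasta produces sequences that all have the same length, one character per match column.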
Using hhsearch Instead of hhblits
If you already have a multiple sequence alignment (MSA), use hhsearch for more sensitive profile-profile searches.
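A minimal sketch of the hhsearch call, assuming the alignment from Step 3 and the PDB70 database from Step 2. search_with_msa and the output name hhsearch.hhr are our own choices, not HH-suite defaults.

```shell
# Hypothetical wrapper: profile-profile search of an existing MSA against a database.
search_with_msa() {
    local msa="${1:-results.a3m}"     # input alignment in A3M format
    local db="${2:-pdb70}"            # database basename, without file extension
    local out="${3:-hhsearch.hhr}"    # results in the same .hhr layout as Step 4
    hhsearch -i "$msa" -d "$db" -o "$out"
}
```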
Searching Against Sequence Databases
To search against large sequence databases like Uniclust30 or BFD, download them as in Step 2 and pass their basenames to hhblits with -d.
Common Options
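When building an alignment from large sequence databases such as Uniclust30 and BFD, hhblits accepts -d more than once, so several databases can be searched in one run. In the sketch below the function name and output file names are hypothetical, and the database basenames are passed as arguments because they depend on the release you download.

```shell
# Hypothetical wrapper: search one query against several databases in a single run.
# Usage: search_sequence_dbs <query> <db-basename> [<db-basename> ...]
search_sequence_dbs() {
    local query="$1"
    shift
    local db_args=()
    local db
    for db in "$@"; do
        db_args+=(-d "$db")    # one -d option per database basename
    done
    hhblits -i "$query" -o query_msa.hhr -oa3m query_msa.a3m -n 3 -cpu 8 "${db_args[@]}"
}
```

For example: search_sequence_dbs query.fasta /path/to/bfd /path/to/uniclust30 (placeholder paths).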
Customize your search with these frequently used parameters:
Control Sensitivity
-e 0.001 - E-value cutoff for including hits in the alignment used for the next iteration (default: 0.001)
-n 3 - Number of iterations; more iterations give more sensitivity
Speed Up Searches
-cpu 8 - Use 8 threads (default: 2)
-cov 50 - Keep only sequences that cover at least 50% of the query (default: 0)
Filter Results
-E 0.001 - Maximum E-value to report
-p 50 - Minimum probability threshold (0-100)
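The three groups of options above can be combined in a single call. The sketch below uses the illustrative values from this section rather than tuned recommendations; tuned_search and tuned.hhr are hypothetical names.

```shell
# Hypothetical example: one hhblits call combining sensitivity, speed and filter options.
tuned_search() {
    local query="${1:-query.fasta}"
    local db="${2:-pdb70}"
    hhblits -i "$query" -d "$db" -o tuned.hhr \
        -n 3 -e 0.001 \
        -cpu 8 -cov 50 \
        -E 0.001 -p 50
    # line 2: sensitivity (iterations, inclusion E-value)
    # line 3: threads and minimum query coverage
    # line 4: reporting filters (max E-value, min probability)
}
```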
Output Formats
Troubleshooting
Error: Could not open database
Make sure:
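- Database files are extracted in the current directory, or -d includes their path (e.g., -d /path/to/pdb70)
- You’re using the basename without extension: -d pdb70, not -d pdb70.ff
- The required database files exist: the .ffdata and .ffindex files for the _a3m, _hhm and _cs219 parts (e.g., pdb70_cs219.ffindex)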
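A quick way to check these points is to test for the expected files before starting a search. This sketch assumes the usual HH-suite3 naming, <basename>_{a3m,hhm,cs219}.{ffdata,ffindex}; not every tool needs all six files, so treat a missing file as a hint rather than a definite error. check_hh_db is a hypothetical helper.

```shell
# Hypothetical pre-flight check: list missing database files for a basename
# such as "pdb70" or "/path/to/pdb70". Returns 0 if all six files exist, 1 otherwise.
check_hh_db() {
    local base="$1"
    local missing=0 part ext
    for part in a3m hhm cs219; do
        for ext in ffdata ffindex; do
            if [ ! -f "${base}_${part}.${ext}" ]; then
                echo "missing: ${base}_${part}.${ext}"
                missing=1
            fi
        done
    done
    return "$missing"
}
```

For example, check_hh_db pdb70 prints nothing and succeeds when all six files are present in the current directory.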
Search is very slow
- Use fewer iterations: -n 1 instead of -n 3
- Enable more threads: -cpu 8
- Use the AVX2 build if your CPU supports it
- Try a smaller database for testing (e.g., SCOP instead of Uniclust)
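To find out whether your CPU supports AVX2, you can check its feature flags. The sketch reads /proc/cpuinfo on Linux and, as an assumption, the machdep.cpu.leaf7_features sysctl on Intel Macs; cpu_has_avx2 is a hypothetical name.

```shell
# Hypothetical helper: print "yes" if the CPU advertises AVX2, "no" if not,
# or "unknown" if neither source below is available.
cpu_has_avx2() {
    if [ -r /proc/cpuinfo ]; then
        # Linux: AVX2 appears as the word "avx2" in the flags line
        if grep -qw avx2 /proc/cpuinfo; then echo yes; else echo no; fi
    elif command -v sysctl >/dev/null 2>&1; then
        # Intel macOS (assumed key): AVX2 is listed among the leaf-7 CPU features
        if sysctl -n machdep.cpu.leaf7_features 2>/dev/null | grep -qw AVX2; then echo yes; else echo no; fi
    else
        echo unknown
    fi
}
```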
No significant hits found
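- Increase sensitivity: -n 3 -e 1 (more iterations and a more permissive inclusion E-value; this also lets more unrelated sequences into the profile, so check the resulting hits carefully)
- Check whether your query is too short (fewer than about 30 residues)
- Try a different database (e.g., Uniclust30 for more diversity)
- Your protein may be truly novel or from an undersampled family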
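To check the query length, count the residues in the FASTA record. The awk sketch below counts the first record only and ignores whitespace; fasta_length is a hypothetical helper.

```shell
# Hypothetical helper: count residues in the first record of a FASTA file.
# Header lines start with ">"; whitespace inside sequence lines is ignored.
fasta_length() {
    awk '
        /^>/ { if (seen) exit; seen = 1; next }   # stop at the second header
        { gsub(/[[:space:]]/, ""); n += length($0) }
        END { print n + 0 }
    ' "$1"
}
```

For example, fasta_length query.fasta prints the number of residues in your query.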
Out of memory errors
- Reduce the number of threads: -cpu 2
- Search against smaller database subsets
- Increase system swap space
- Use a machine with more RAM (16+ GB recommended for large databases)
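To see how much memory a machine has before choosing a database, you can read the total from the operating system. This sketch assumes Linux (/proc/meminfo) or macOS (sysctl hw.memsize); total_ram_gib is a hypothetical name.

```shell
# Hypothetical helper: print total physical memory in whole GiB (rounded down).
total_ram_gib() {
    if [ -r /proc/meminfo ]; then
        # Linux: MemTotal is reported in kB
        awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 / 1024 }' /proc/meminfo
    else
        # macOS: hw.memsize is reported in bytes
        echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
    fi
}
```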
Next Steps
- Database Guide: Download and set up additional databases
- HHblits Reference: Complete documentation for hhblits options
- HHsearch Guide: Profile-profile searches for maximum sensitivity
- Output Formats: Understanding and parsing HH-suite results
Example Workflow Script
Here’s a complete bash script for running a protein homology search:
run_hhblits.sh
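A minimal sketch of such a script follows. The argument order, the defaults (3 iterations, 2 threads), writing outputs to the current directory and the _cs219.ffindex sanity check are our own assumptions; run without arguments, it only prints its usage line.

```shell
#!/usr/bin/env bash
# run_hhblits.sh: sketch of the quickstart workflow (query FASTA -> hhblits -> .hhr + .a3m).
# Usage: ./run_hhblits.sh <query.fasta> <db-basename> [iterations] [threads]
set -euo pipefail

usage() {
    echo "usage: run_hhblits.sh <query.fasta> <db-basename> [iterations] [threads]"
}

main() {
    local query="${1:?query FASTA file required}"
    local db="${2:?database basename required, e.g. pdb70}"
    local iterations="${3:-3}"
    local threads="${4:-2}"
    local name prefix
    name=$(basename "$query")
    prefix="${name%.*}"    # query.fasta -> query; outputs go to the current directory

    # Fail early with a clear message instead of a cryptic hhblits error.
    command -v hhblits >/dev/null || { echo "hhblits not found in PATH" >&2; return 1; }
    [ -f "$query" ] || { echo "query file not found: $query" >&2; return 1; }
    # Assumes the standard HH-suite3 layout, where <basename>_cs219.ffindex exists.
    [ -f "${db}_cs219.ffindex" ] || { echo "database not found for basename: $db" >&2; return 1; }

    hhblits -i "$query" -d "$db" -n "$iterations" -cpu "$threads" \
        -o "${prefix}.hhr" -oa3m "${prefix}.a3m"

    echo "Hit list and alignments: ${prefix}.hhr"
    echo "Multiple sequence alignment: ${prefix}.a3m"
}

if [ "$#" -gt 0 ]; then
    main "$@"
else
    usage
fi
```

For example, ./run_hhblits.sh query.fasta pdb70 3 8 runs three iterations on eight threads and writes query.hhr and query.a3m.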