HHM Format

Overview

HHM is the Hidden Markov Model format used by HH-suite. It contains profile information derived from multiple sequence alignments, including amino acid frequencies, transition probabilities, and optional secondary structure predictions.

Format Specification

Header Section

The file begins with metadata:

HHsearch 1.6
NAME  sp|Q5VUD6|FA69B_HUMAN Protein FAM69B OS=Homo sapiens GN=FAM69B PE=2 SV=3
FAM   
FILE  query
COM   hhmake -i /path/to/query.a3m 
DATE  Wed Jan  4 15:14:55 2012
LENG  431 match states, 431 columns in multiple alignment
FILT  149 out of 270 sequences passed filter (-id 90 -cov 0 -qid 0 -qsc -20.00 -diff 100)
NEFF  5.2

Header Fields

HHsearch: Version identifier
NAME: Protein name and description
FAM: Family information (optional)
FILE: Base filename
COM: Command used to generate the HMM
DATE: Creation timestamp
LENG: Number of match states and alignment columns
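FILT: Number of sequences that passed the input filter, followed by the filter settings used (-id, -cov, -qid, -qsc, -diff)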
NEFF: Effective number of sequences (diversity measure)
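Because every header line is a keyword followed by its value, the header can be read into a dictionary in a few lines. This is a minimal sketch, assuming the header ends at the first SEQ, NULL or HMM line and that an empty value (as for FAM above) should be kept as an empty string; parse_hhm_header and match_state_count are hypothetical helper names, not HH-suite functions.

```python
import re

# Keywords assumed to mark the end of the header section.
STOP_KEYWORDS = {"SEQ", "NULL", "HMM"}

def parse_hhm_header(text):
    """Return the header fields of an HHM file as a dict of strings."""
    header = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        if parts[0] in STOP_KEYWORDS:
            break  # the sequence or model section starts here
        # The first token is the key ("HHsearch" holds the version);
        # a field with no value (e.g. FAM) is stored as "".
        header[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return header

def match_state_count(header):
    """Extract the number of match states from the LENG field, or None."""
    m = re.match(r"(\d+) match states", header.get("LENG", ""))
    return int(m.group(1)) if m else None

example = """HHsearch 1.6
NAME  sp|Q5VUD6|FA69B_HUMAN Protein FAM69B OS=Homo sapiens GN=FAM69B PE=2 SV=3
FAM
FILE  query
LENG  431 match states, 431 columns in multiple alignment
NEFF  5.2
SEQ
"""
header = parse_hhm_header(example)
print(header["FILE"], float(header["NEFF"]), match_state_count(header))  # prints: query 5.2 431
```

Values are kept as strings so that fields with free-form text (NAME, COM, FILT) are not mangled; numeric fields are converted only where needed.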

Sequence Section

Consensus and representative sequences:

SEQ
>Consensus
xxxxxxxxxxxxxxxxxxxxxxrxxxxxxxxxxxxwxxxxxxsxxxyxxyssxselcrxxxcxxxiCxxYxxGxisGxlCxxLCxxxxlxxxxClxxxxx
>sp|Q5VUD6|FA69B_HUMAN Protein FAM69B
MRRLRRLAHLVLFCPFSKRLQGRLPGLRVRCIFLAWLGVFAGSWLVYVHYSSYSERCRGHVCQVVICDQYRKGIISGSVCQDLCELHMVEWRTCLSVAPG

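Sequences are listed in FASTA format, typically including:

Consensus sequence (derived from the alignment; x marks positions without a clear consensus)
The master (query) sequence of the MSA, whose residues label the match states
Optionally, further sequences from the MSA and secondary structure lines such as >ss_pred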
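The sequence block can be pulled out with a small FASTA-style reader. This sketch assumes the block starts at the SEQ line and stops at a line beginning with # or at the NULL line, and that the master sequence is the first record that is neither the consensus nor a secondary-structure line (names starting with ss_); both functions are hypothetical helpers.

```python
def parse_seq_section(text):
    """Return (name, sequence) pairs from the SEQ block of an HHM file."""
    records = []
    in_seq = False
    for line in text.splitlines():
        if line.startswith("SEQ"):
            in_seq = True  # records start on the next line
            continue
        if not in_seq:
            continue
        # Assumed terminators: a '#' line or, failing that, the NULL line.
        if line.startswith("#") or line.startswith("NULL"):
            break
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line.strip()  # join wrapped sequence lines
    return [(name, seq) for name, seq in records]

def master_sequence(records):
    """First record that is neither the consensus nor a secondary-structure line."""
    for name, seq in records:
        if name != "Consensus" and not name.startswith("ss_"):
            return name, seq
    return None

example = """SEQ
>Consensus
xxxxxxxxxxxxxxxxxxxxxxrxxxx
>sp|Q5VUD6|FA69B_HUMAN Protein FAM69B
MRRLRRLAHLVLFCPFSKRLQGRLPGL
#
NULL   3706 5728
"""
records = parse_seq_section(example)
print(master_sequence(records)[1][:3])  # prints: MRR
```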

NULL Model

Background amino acid frequencies:

NULL   3706	5728	4211	4064	4839	3729	4763	4308	4069	3323	5509	4640	4464	4937	4285	4423	3815	3783	6325	4665

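Values are listed in the order A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y and encoded as -1000 * log2(p), rounded to an integer, where p is the background probability (3706 for A corresponds to p of about 0.077).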
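Decoding the NULL line is a one-line transformation per value. This sketch assumes the -1000 * log2(p) encoding, with * standing for probability zero; under that assumption the twenty background probabilities from the example sum to about 1, which is a useful sanity check. The function names are hypothetical.

```python
# Column order shared by the NULL line and the HMM header line.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score_to_prob(token):
    """Decode one score: '*' is probability 0, otherwise p = 2 ** (-score / 1000)."""
    return 0.0 if token == "*" else 2.0 ** (-int(token) / 1000.0)

def parse_null_line(line):
    """Map each amino acid to its background probability."""
    fields = line.split()  # handles tabs and spaces alike
    if fields[0] != "NULL" or len(fields) != 21:
        raise ValueError("expected 'NULL' followed by 20 values")
    return {aa: score_to_prob(tok) for aa, tok in zip(AMINO_ACIDS, fields[1:])}

null_line = ("NULL   3706 5728 4211 4064 4839 3729 4763 4308 4069 3323 "
             "5509 4640 4464 4937 4285 4423 3815 3783 6325 4665")
background = parse_null_line(null_line)
print(round(sum(background.values()), 2))  # prints: 1.0
```

With these values, L (3323) is the most frequent background residue and W (6325) the rarest, as expected for typical protein composition.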

HMM Header Line

HMM    A	C	D	E	F	G	H	I	K	L	M	N	P	Q	R	S	T	V	W	Y
       M->M	M->I	M->D	I->M	I->I	D->M	D->D	Neff	Neff_I	Neff_D

Match State Entries

For each match state:

M 1    2443	*	*	*	*	*	3455	*	*	*	1095	*	1962	*	*	*	*	*	*	*	1
       0	*	*	*	*	*	*	1695	0	0

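First line:

Amino acid of the master (query) sequence at this position (M here, the first residue of the query sequence in the SEQ block)
Match state number
Emission scores for the 20 amino acids, in the column order of the HMM header line, stored as -1000 * log2(probability) and rounded; * means probability zero
Alignment column that corresponds to this match state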

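Second line:

Transition scores (M->M, M->I, M->D, I->M, I->I, D->M, D->D), stored in the same -1000 * log2 form as the emissions
Local effective number of sequences for the match, insert and delete states (Neff_M, Neff_I, Neff_D), multiplied by 1000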
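The two lines of a match state can be decoded together. This is a sketch under stated assumptions: scores are -1000 * log2(p) with * for zero, the first token is the master-sequence residue, the last field of the first line is the alignment column index, and the three trailing values of the second line are Neff_M, Neff_I and Neff_D multiplied by 1000. parse_match_state and its output keys are hypothetical names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
TRANSITIONS = ("M->M", "M->I", "M->D", "I->M", "I->I", "D->M", "D->D")

def to_prob(token):
    """Decode a -1000 * log2(p) score; '*' means p = 0."""
    return 0.0 if token == "*" else 2.0 ** (-int(token) / 1000.0)

def parse_match_state(first_line, second_line):
    """Decode the two lines that describe one match state."""
    f = first_line.split()
    s = second_line.split()
    if len(f) != 23 or len(s) != 10:
        raise ValueError("unexpected number of fields")
    return {
        "residue": f[0],      # master-sequence residue at this position
        "state": int(f[1]),   # match state number
        "emissions": {aa: to_prob(t) for aa, t in zip(AMINO_ACIDS, f[2:22])},
        "column": int(f[22]),  # alignment column (assumed meaning)
        "transitions": {name: to_prob(t) for name, t in zip(TRANSITIONS, s[:7])},
        "neff": tuple(int(t) / 1000.0 for t in s[7:]),  # Neff_M, Neff_I, Neff_D
    }

line1 = "M 1    2443 * * * * * 3455 * * * 1095 * 1962 * * * * * * * 1"
line2 = "       0 * * * * * * 1695 0 0"
state = parse_match_state(line1, line2)
print(state["residue"], state["column"], round(state["emissions"]["M"], 2))  # prints: M 1 0.47
```

Splitting on whitespace rather than on tabs keeps the parser tolerant of files whose columns were re-spaced, as in the examples on this page.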

Emission Probabilities

Emission values are stored as:

Integers equal to -1000 * log2(p), rounded, where p is the emission probability
* for probability zero (log p is negative infinity)
Smaller numbers indicate higher probability: 0 means p = 1, and each additional 1000 halves the probability
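Going the other way, for example when writing a profile by hand, a probability is encoded by taking -1000 * log2(p) and rounding. This minimal sketch assumes that convention; it uses Python's round(), whose tie-breaking may differ from HH-suite's own rounding by one unit. Rounding to 1/1000 of a bit limits the relative error of a decoded probability to about 0.035%.

```python
import math

def prob_to_score(p):
    """Encode a probability as an HHM score string; '*' for p = 0."""
    if p <= 0.0:
        return "*"
    return str(round(-1000.0 * math.log2(p)))

def score_to_prob(token):
    """Inverse of prob_to_score (up to rounding)."""
    return 0.0 if token == "*" else 2.0 ** (-int(token) / 1000.0)

print(prob_to_score(0.5), prob_to_score(0.0))  # prints: 1000 *
```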

Transition Probabilities

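Transitions between states, stored in the same -1000 * log2(p) form as emissions (0 means probability 1, * means probability 0):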

M->M: Match to match
M->I: Match to insert
M->D: Match to delete
I->M: Insert to match
I->I: Insert to insert
D->M: Delete to match
D->D: Delete to delete
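Since M->M, M->I and M->D are the only ways out of a match state (and likewise for I and D), their decoded probabilities should sum to 1. The sketch below checks this on a transition line, assuming the -1000 * log2(p) encoding and reporting a source state whose transitions are all * as None (unused); transition_sums is a hypothetical helper.

```python
TRANSITIONS = ("M->M", "M->I", "M->D", "I->M", "I->I", "D->M", "D->D")

def transition_sums(second_line):
    """Sum the outgoing transition probabilities per source state (M, I, D)."""
    tokens = second_line.split()[:7]  # ignore the trailing Neff values
    sums = {"M": None, "I": None, "D": None}
    for name, tok in zip(TRANSITIONS, tokens):
        if tok == "*":
            continue  # probability zero contributes nothing
        src = name[0]  # 'M', 'I' or 'D'
        sums[src] = (sums[src] or 0.0) + 2.0 ** (-int(tok) / 1000.0)
    return sums

print(transition_sums("       0 * * * * * * 1695 0 0"))
```

A sum noticeably different from 1 for a used state usually points to a truncated or hand-edited line.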

Example Match State

R 2    *	*	*	*	*	*	*	*	*	*	*	*	*	2443	293	*	*	*	*	*	2
       0	*	*	*	*	*	*	1695	0	0

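This represents:

Position 2, where the master sequence has R (Arginine)
Nonzero emission probabilities only for R (293, p of about 0.82) and Q (2443, p of about 0.18)
Alignment column 2 (the last number on the first line)
Neff_M = 1.695 (1695 on the second line): low diversity at this position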
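Assuming scores are -1000 * log2(p) and Neff values are scaled by 1000, the example row decodes by hand as follows; the two nonzero emissions account for essentially all of the probability mass.

```latex
p_R = 2^{-293/1000} \approx 0.816, \qquad
p_Q = 2^{-2443/1000} \approx 0.184, \qquad
p_R + p_Q \approx 1.000, \qquad
N_{\mathrm{eff},M} = 1695 / 1000 = 1.695
```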

Creating HHM Files

From Alignment

hhmake -i alignment.a3m -o model.hhm

With Custom Name

hhmake -i alignment.a3m -o model.hhm -name MyProtein

With Pseudocounts

hhmake -i alignment.a3m -o model.hhm -pc_hhm_contxt_a 0.9
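When many alignments need converting, the commands above can be driven from a script. This sketch builds the argument list using only the flags shown on this page (-i, -o, -name, -pc_hhm_contxt_a) and runs hhmake through subprocess if it is on PATH; hhmake_command and run_hhmake are hypothetical helpers, and no other hhmake options are assumed.

```python
import shutil
import subprocess

def hhmake_command(a3m, out, name=None, pc_contxt_a=None):
    """Build an hhmake argument list from the options shown above."""
    cmd = ["hhmake", "-i", a3m, "-o", out]
    if name is not None:
        cmd += ["-name", name]
    if pc_contxt_a is not None:
        cmd += ["-pc_hhm_contxt_a", str(pc_contxt_a)]
    return cmd

def run_hhmake(a3m, out, **kwargs):
    """Run hhmake; raises if it is missing or exits with an error."""
    if shutil.which("hhmake") is None:
        raise FileNotFoundError("hhmake not found on PATH")
    subprocess.run(hhmake_command(a3m, out, **kwargs), check=True)
```

For a batch, one could loop over pathlib.Path("alignments").glob("*.a3m") (the directory name is hypothetical) and call run_hhmake(str(p), str(p.with_suffix(".hhm"))). Passing a list instead of a shell string avoids quoting problems with unusual file names.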

Using HHM Files

As Query

hhsearch -i query.hhm -d database

As Template

hhalign -i query.a3m -t template.hhm
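Since the examples above pass an alignment in one place and an HHM in another, a cheap content check can catch a mislabeled input before a long search. The sketch assumes an HHM begins with an HHsearch version line, contains NULL and HMM lines, and ends each model with a // line; looks_like_hhm is a hypothetical helper, not part of HH-suite.

```python
def looks_like_hhm(text):
    """Heuristic check that text is an HHM file rather than an alignment."""
    lines = [ln.rstrip() for ln in text.splitlines()]
    if not lines or not lines[0].startswith("HHsearch"):
        return False
    has_null = any(ln.startswith("NULL") for ln in lines)
    has_hmm = any(ln.startswith("HMM") for ln in lines)
    has_end = "//" in lines  # assumed end-of-model marker
    return has_null and has_hmm and has_end
```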

Binary Format

HH-suite can also use a binary HHM format (.hhm.bin) for faster loading:

More compact storage
Faster parsing
Generated automatically by some tools

Best Practices

Building Quality HMMs

Diverse alignments: Use sequences with varied identity (filter with hhfilter)
Sufficient sequences: Aim for Neff > 4 for good coverage
Quality filtering: Remove low-coverage sequences
Pseudocounts: Use context-specific pseudocounts for better profiles

Neff Values

Neff < 2: Low diversity, may need more sequences
Neff 4-8: Good diversity for most purposes
Neff > 10: High diversity, excellent for sensitive searches
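One common way to define Neff is the exponential of the average per-column entropy of the profile, which gives 1 for a perfectly conserved column and 20 for a uniform one. The sketch below assumes that definition and computes it from plain per-column probabilities (for example the decoded emissions); HH-suite's own value is derived from weighted, filtered alignments, so the numbers will not match exactly. The "intermediate" label for values between the listed bands is a hypothetical addition.

```python
import math

def entropy(probs):
    """Shannon entropy (natural log) of one column's distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def profile_neff(columns):
    """exp of the mean per-column entropy over all columns."""
    mean_h = sum(entropy(c) for c in columns) / len(columns)
    return math.exp(mean_h)

def neff_band(neff):
    """Classify using the rule-of-thumb bands listed above."""
    if neff < 2:
        return "low"
    if 4 <= neff <= 8:
        return "good"
    if neff > 10:
        return "high"
    return "intermediate"  # between the listed bands
```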

File Size Considerations

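HHM file size grows mainly with the number of match states, not with the number of sequences in the alignment. HHM files are typically:

Larger than a shallow input alignment (every position stores 20 emission and 7 transition values)
Much smaller than a deep alignment with many sequences
Text format: ~5-10 KB per 100 residues
Binary format: ~50% smaller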
