
Synopsis

cstranslate -i <infile> [options]

Description

CStranslate translates a sequence or alignment into a sequence (or profile) over the 219-state abstract state alphabet (AS219). This tool is part of the CS-BLAST package and is used for context-specific sequence representation.

Required Parameters

-i <file> (required)
  Input file with alignment or sequence

Output Options

-o <file>
  Output file for generated abstract state sequence (default: <infile>.as)

-a <file>
  Append generated abstract state sequence to this file

-O <string> (default: "seq")
  Output format:
    • seq: abstract state sequence
    • prf: abstract state profile

Input Options

-I <string> (default: "auto")
  Input format: prf, seq, fas, a2m, a3m, or ca3m. With auto, the format is detected from the file extension.

-M <integer> (default: -1)
  Match column assignment for FASTA alignments:
    • -1: treat every column in which the first sequence has a residue as a match column
    • 0 to 100: treat every FASTA column with fewer than X% gaps as a match column, where X is the given value
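
The two -M rules can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name match_columns, the gap characters "-" and ".", and the unweighted gap percentage are assumptions, and the real tool may weight sequences differently.

```python
def match_columns(alignment, m=-1, gap_chars="-."):
    """Return the indices of the columns treated as match columns.

    alignment: list of equal-length aligned sequences (strings).
    m: -1 for the first-sequence rule, or a gap threshold in percent (0-100).
    """
    if not alignment:
        return []
    ncols = len(alignment[0])
    if any(len(row) != ncols for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    if m == -1:
        # Rule for -M -1: match wherever the first sequence has a residue.
        return [j for j in range(ncols) if alignment[0][j] not in gap_chars]

    if not 0 <= m <= 100:
        raise ValueError("m must be -1 or in the range 0-100")

    # Rule for -M X: match wherever fewer than X% of the rows have a gap.
    # Assumption: plain, unweighted gap percentage.
    matches = []
    for j in range(ncols):
        gaps = sum(1 for row in alignment if row[j] in gap_chars)
        if 100.0 * gaps / len(alignment) < m:
            matches.append(j)
    return matches


alignment = [
    "MK-LV",
    "MKALV",
    "MKAL-",
    "-KAL-",
]
print(match_columns(alignment))        # [0, 1, 3, 4]
print(match_columns(alignment, m=50))  # [0, 1, 2, 3]
```

In this toy alignment the first-sequence rule drops column 2, where the first sequence has a gap, while the 50% rule keeps it and instead drops column 4, where half of the rows are gapped.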

Pseudocount Options

-A <file> (default: "internal")
  Abstract state alphabet consisting of exactly 219 states

-D <file> (default: "internal")
  Add context-specific pseudocounts using given context-data

-x <float> (default: 0.90)
  Pseudocount admix for context-specific pseudocounts (range: 0-1)

-c <float> (default: 12.0)
  Constant in pseudocount calculation for alignments (range: 0-inf)

-w <float> (default: 1000.0)
  Weight of abstract state column in emission calculation (range: 0-inf)
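
To give a feel for what an admix value such as -x means, the sketch below shows the general form of pseudocount admixture: a linear blend of observed frequencies with pseudocount frequencies. It is a generic illustration on a toy four-letter alphabet, not the exact CS-BLAST calculation. The function name admix_frequencies and the example numbers are made up, and the way -c adjusts the admixture for alignments is not modeled here.

```python
import math


def admix_frequencies(observed, pseudocounts, tau):
    """Blend observed frequencies with pseudocount frequencies.

    tau plays the role of the admix value: 0 keeps the observed
    frequencies unchanged, 1 replaces them with the pseudocounts.
    """
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in the range 0-1")
    if len(observed) != len(pseudocounts):
        raise ValueError("both distributions must have the same length")
    return [(1.0 - tau) * f + tau * g for f, g in zip(observed, pseudocounts)]


# Toy example: a single sequence contributes a one-hot observation.
observed = [1.0, 0.0, 0.0, 0.0]
context = [0.4, 0.3, 0.2, 0.1]   # hypothetical context-specific pseudocounts
mixed = admix_frequencies(observed, context, tau=0.9)
print(mixed)                      # approximately [0.46, 0.27, 0.18, 0.09]
print(math.isclose(sum(mixed), 1.0))
```

Because both inputs sum to 1, the blend is again a probability distribution for any admix value in the allowed range.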

FFindex Options

-f <flag>
  Enable FFindex mode: read the FFindex database given with -i and write an FFindex database to the name given with -o. For ca3m input, give the database name without the _ca3m suffix. Enables OpenMP if available.

Other Options

-v <boolean> (default: true)
  Verbose mode: show progress and results

Examples

Translate sequence to abstract states

cstranslate -i protein.seq -o protein.as

Translate alignment with custom pseudocount parameters

cstranslate -i alignment.a3m -o output.as -x 0.8 -c 10.0

Process FFindex database

cstranslate -i input_db -o output_db -f -I ca3m
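
When cstranslate is called from a script, it can help to assemble the argument list programmatically. The sketch below is a hypothetical helper (build_cstranslate_cmd and its parameter names are not part of CS-BLAST) that uses only the flags documented on this page and assumes that the order of options does not matter.

```python
import shlex


def build_cstranslate_cmd(infile, outfile=None, informat=None,
                          admix=None, pc_const=None, ffindex=False):
    """Build an argument list for cstranslate from the documented flags."""
    cmd = ["cstranslate", "-i", str(infile)]
    if outfile is not None:
        cmd += ["-o", str(outfile)]      # output file or FFindex database
    if informat is not None:
        cmd += ["-I", informat]          # e.g. "a3m" or "ca3m"
    if admix is not None:
        cmd += ["-x", str(admix)]        # pseudocount admix
    if pc_const is not None:
        cmd += ["-c", str(pc_const)]     # pseudocount constant for alignments
    if ffindex:
        cmd.append("-f")                 # FFindex mode
    return cmd


cmd = build_cstranslate_cmd("alignment.a3m", "output.as",
                            admix=0.8, pc_const=10.0)
print(shlex.join(cmd))
# cstranslate -i alignment.a3m -o output.as -x 0.8 -c 10.0
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where cstranslate is installed and on the PATH.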

Output Format

The tool writes abstract state sequences in the AS219 alphabet, which represents protein sequences using 219 context-dependent states. When verbose mode is enabled, it also shows:
  • Position numbers
  • Consensus sequence
  • Match symbols indicating confidence
  • AS219 sequence
  • Confidence values (0-9)

Exit Codes

  • 0: Success
  • 1: Error reading input file
  • 2: Invalid parameters
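
A calling script can branch on these exit codes. The following is a minimal sketch that relies only on the three codes listed above; the names EXIT_MEANINGS, describe_exit, and run_cstranslate are hypothetical, and any other code is reported as undocumented rather than guessed at.

```python
import subprocess

# Exit codes as listed on this page.
EXIT_MEANINGS = {
    0: "success",
    1: "error reading input file",
    2: "invalid parameters",
}


def describe_exit(code):
    """Translate a cstranslate exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"undocumented exit code {code}")


def run_cstranslate(args):
    """Run cstranslate with the given argument list.

    Returns (exit_code, description). Raises FileNotFoundError if the
    cstranslate binary is not on the PATH.
    """
    result = subprocess.run(["cstranslate", *args])
    return result.returncode, describe_exit(result.returncode)


print(describe_exit(2))  # invalid parameters
```

For example, run_cstranslate(["-i", "protein.seq", "-o", "protein.as"]) would run the first command from the Examples section and return the exit code together with its description.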

Notes

CStranslate is typically used as a preprocessing step for HHblits searches. The AS219 alphabet provides a compressed representation that enables fast prefiltering while maintaining sensitivity.
