Overview

Selections let you target specific atoms for analysis. The selection language is inspired by VMD/CHARMM syntax but simplified. Selections compile to an immutable list of atom indices and are cached by the System for reuse.

Basic Usage

from warp_md import System

system = System.from_pdb("protein.pdb")

# Returns a Selection object
backbone = system.select("backbone")
print(backbone.indices)  # Array of atom indices

Selection Syntax

From crates/traj-core/src/selection.rs:244-281:

Keywords

name <atom_name>
Predicate
Select atoms by name (case-insensitive).
ca = system.select("name CA")
oxygens = system.select("name O")
resname <residue_name>
Predicate
Select atoms by residue name.
water = system.select("resname SOL")
alanine = system.select("resname ALA")
resid <N>
Predicate
Select atoms in residue with ID N.
res10 = system.select("resid 10")
resid <start>:<end>
Predicate
Select atoms in residue range (inclusive).
n_terminus = system.select("resid 1:20")
middle = system.select("resid 50:150")
chain <chain_id>
Predicate
Select atoms by chain identifier.
chain_a = system.select("chain A")
chain_b = system.select("chain B")
protein
Predicate
Select all protein atoms. Matches the 20 standard amino acids plus selenomethionine (MSE) and the CHARMM histidine variants (HSD, HSE, HSP): ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL, MSE, HSD, HSE, HSP.
prot = system.select("protein")
backbone
Predicate
Select protein backbone atoms: N, CA, C, O, OXT.
bb = system.select("backbone")
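
To make the keyword semantics concrete, here is a minimal pure-Python sketch that models a few keywords as predicates over a toy atom table. The `Atom` record and the helper functions are hypothetical and are not part of warp_md; only the residue list and the backbone atom names come from the descriptions above. Whether `backbone` on its own also requires a protein residue is an assumption of this sketch.

```python
from dataclasses import dataclass

# Residue and atom names copied from the keyword descriptions above.
PROTEIN_RESNAMES = {
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
    "MSE", "HSD", "HSE", "HSP",
}
BACKBONE_NAMES = {"N", "CA", "C", "O", "OXT"}


@dataclass(frozen=True)
class Atom:  # hypothetical record, for illustration only
    name: str
    resname: str
    resid: int
    chain: str


def is_protein(atom):
    return atom.resname.upper() in PROTEIN_RESNAMES


def is_backbone(atom):
    # Assumption: a backbone atom name only counts inside a protein residue.
    return is_protein(atom) and atom.name.upper() in BACKBONE_NAMES


def in_resid_range(atom, start, end):
    return start <= atom.resid <= end  # both ends inclusive


def select(atoms, predicate):
    """Return the indices of all atoms for which the predicate holds."""
    return [i for i, atom in enumerate(atoms) if predicate(atom)]


atoms = [
    Atom("N", "ALA", 1, "A"),
    Atom("CA", "ALA", 1, "A"),
    Atom("CB", "ALA", 1, "A"),
    Atom("O", "SOL", 2, "A"),  # water oxygen: backbone-like name, not protein
]

print(select(atoms, is_protein))                          # [0, 1, 2]
print(select(atoms, is_backbone))                         # [0, 1]
print(select(atoms, lambda a: in_resid_range(a, 1, 1)))   # [0, 1, 2]
```

The water oxygen shows why combining `protein and backbone` can matter when atom names overlap between residue types.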

Boolean Operators

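# Intersection
ca_chain_a = system.select("name CA and chain A")
protein_backbone = system.select("protein and backbone")

# Union
chains_ab = system.select("chain A or chain B")

# Negation
non_water = system.select("not resname SOL")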

Parentheses

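Use parentheses to group sub-expressions and override the default precedence (not binds tighter than and, which binds tighter than or):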
# Select backbone atoms in chain A or B
sel = system.select("backbone and (chain A or chain B)")

# Select protein atoms that are NOT backbone
sidechain = system.select("protein and not backbone")
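
The effect of grouping can be sketched with plain boolean masks, one entry per atom. This is an illustration only: the masks are made-up data, the helper names are hypothetical, and the sketch assumes that `and` binds tighter than `or`, as the parser precedence listed under Implementation Details suggests.

```python
# Toy masks over six atoms; True means the atom matches the predicate.
backbone = [True, True, False, True, True, False]
chain_a = [True, True, True, False, False, False]
chain_b = [False, False, False, True, True, True]


def mask_and(a, b):
    return [x and y for x, y in zip(a, b)]


def mask_or(a, b):
    return [x or y for x, y in zip(a, b)]


def mask_not(a):
    return [not x for x in a]


def to_indices(mask):
    return [i for i, keep in enumerate(mask) if keep]


# "backbone and (chain A or chain B)"
grouped = mask_and(backbone, mask_or(chain_a, chain_b))

# "backbone and chain A or chain B" read as "(backbone and chain A) or chain B"
ungrouped = mask_or(mask_and(backbone, chain_a), chain_b)

# "not backbone"
non_backbone = mask_not(backbone)

print(to_indices(grouped))       # [0, 1, 3, 4]
print(to_indices(ungrouped))     # [0, 1, 3, 4, 5]
print(to_indices(non_backbone))  # [2, 5]
```

Without the parentheses, atom 5 (chain B, not backbone) slips into the result, which is usually not what was intended.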

Examples from Tests

From crates/traj-core/src/selection.rs:438-463:
#[test]
fn selection_name() {
    let mut system = build_system();
    let sel = compile_selection("name CA", &mut system).unwrap();
    assert_eq!(sel.indices.as_slice(), &[1]);
}

#[test]
fn selection_resid_range() {
    let mut system = build_system();
    let sel = compile_selection("resid 1:2", &mut system).unwrap();
    assert_eq!(sel.indices.len(), 4);
}

#[test]
fn selection_protein_backbone() {
    let mut system = build_system();
    let sel = compile_selection("protein and backbone", &mut system).unwrap();
    assert_eq!(sel.indices.as_slice(), &[0, 1]);
}

Implementation Details

Parsing

From crates/traj-core/src/selection.rs:135-282:
  1. Lexer: Tokenizes input string
  2. Parser: Builds an AST by recursive descent, from loosest to tightest binding (OR → AND → NOT → PRIMARY)
  3. Evaluator: Walks AST to produce boolean mask
  4. Compiler: Converts mask to index array
pub fn compile_selection(expr: &str, system: &mut System) -> TrajResult<Selection> {
    let mut parser = Parser::new(expr, system)?;
    let ast = parser.parse()?;
    let mask = eval(&ast, system)?;
    let mut indices = Vec::new();
    for (i, &keep) in mask.iter().enumerate() {
        if keep {
            indices.push(i as u32);
        }
    }
    Ok(Selection {
        expr: expr.to_string(),
        indices: Arc::new(indices),
    })
}
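
The four stages can be sketched in Python as a small recursive-descent parser. This is not a port of selection.rs: `tokenize`, `Parser`, `evaluate`, and the dict-based atom table are hypothetical, and only the `name`, `chain`, and `resid` keywords are handled. It is meant to show how the OR → AND → NOT → PRIMARY call chain produces the precedence, under those assumptions.

```python
import re


def tokenize(expr):
    # Parentheses are their own tokens; everything else splits on whitespace.
    return re.findall(r"[()]|[^\s()]+", expr)


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of selection")
        self.pos += 1
        return tok

    def parse(self):
        ast = self.parse_or()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return ast

    def parse_or(self):  # loosest binding
        node = self.parse_and()
        while self.peek() == "or":
            self.take()
            node = ("or", node, self.parse_and())
        return node

    def parse_and(self):
        node = self.parse_not()
        while self.peek() == "and":
            self.take()
            node = ("and", node, self.parse_not())
        return node

    def parse_not(self):
        if self.peek() == "not":
            self.take()
            return ("not", self.parse_not())
        return self.parse_primary()

    def parse_primary(self):  # tightest binding
        tok = self.take()
        if tok == "(":
            node = self.parse_or()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok in ("name", "chain"):
            return (tok, self.take())
        if tok == "resid":
            lo, _, hi = self.take().partition(":")
            return ("resid", int(lo), int(hi or lo))
        raise ValueError(f"unknown keyword {tok!r}")


def evaluate(node, atoms):
    """Walk the AST and return one boolean per atom."""
    kind = node[0]
    if kind == "not":
        return [not x for x in evaluate(node[1], atoms)]
    if kind in ("and", "or"):
        left, right = evaluate(node[1], atoms), evaluate(node[2], atoms)
        if kind == "and":
            return [a and b for a, b in zip(left, right)]
        return [a or b for a, b in zip(left, right)]
    if kind == "name":
        return [atom["name"].upper() == node[1].upper() for atom in atoms]
    if kind == "chain":
        return [atom["chain"] == node[1] for atom in atoms]
    if kind == "resid":
        return [node[1] <= atom["resid"] <= node[2] for atom in atoms]
    raise ValueError(f"unknown node {kind!r}")


def compile_selection(expr, atoms):
    ast = Parser(tokenize(expr)).parse()
    mask = evaluate(ast, atoms)
    return [i for i, keep in enumerate(mask) if keep]


atoms = [
    {"name": "N", "resid": 1, "chain": "A"},
    {"name": "CA", "resid": 1, "chain": "A"},
    {"name": "N", "resid": 2, "chain": "B"},
    {"name": "CA", "resid": 2, "chain": "B"},
]

print(compile_selection("name CA and chain A", atoms))              # [1]
print(compile_selection("name N or name CA and chain B", atoms))    # [0, 2, 3]
print(compile_selection("(name N or name CA) and chain B", atoms))  # [2, 3]
```

Each parsing method only calls the next tighter level, so precedence falls out of the call structure rather than from an explicit table.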

Caching

From crates/traj-core/src/system.rs:58-69:
pub fn select(&mut self, expr: &str) -> TrajResult<Selection> {
    if let Some(sel) = self.selection_cache.get(expr) {
        return Ok(Selection {
            expr: sel.expr.clone(),
            indices: Arc::clone(&sel.indices),
        });
    }
    let compiled = compile_selection(expr, self)?;
    self.selection_cache.insert(expr.to_string(), compiled.clone());
    Ok(compiled)
}
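Selections are cached at the System level, keyed by the expression string. Re-using the same expression skips parsing and evaluation: a cache hit only clones the expression string and bumps the reference count on the shared index array.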
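
A minimal Python sketch of the same lookup-then-compile pattern follows. `CachingSystem` and `fake_compile` are hypothetical stand-ins, not warp_md classes. Because the Rust code above uses the raw expression string as the key, the sketch assumes two spellings of the same query (for example with extra whitespace) occupy separate cache entries.

```python
class CachingSystem:  # hypothetical stand-in for System
    def __init__(self, compile_fn):
        self._compile = compile_fn
        self._cache = {}
        self.compile_calls = 0

    def select(self, expr):
        cached = self._cache.get(expr)
        if cached is not None:
            return cached  # cache hit: shared result, nothing recompiled
        self.compile_calls += 1
        compiled = self._compile(expr)
        self._cache[expr] = compiled
        return compiled


def fake_compile(expr):
    # Placeholder for the real parse/evaluate step.
    return (0, 1) if "CA" in expr else ()


system = CachingSystem(fake_compile)
first = system.select("name CA")
second = system.select("name CA")   # same string: cache hit
third = system.select("name  CA")   # extra space: different key, compiled again

print(first is second)       # True
print(system.compile_calls)  # 2
```

Keeping expression strings consistent (for example by defining them once as constants) makes the best use of such a cache.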

Selection Object

sel = system.select("backbone")

# Access indices
print(sel.indices)  # np.ndarray of uint32

# Original expression
print(sel.expr)  # "backbone"
From crates/traj-core/src/selection.rs:6-10:
pub struct Selection {
    pub expr: String,
    pub indices: Arc<Vec<u32>>,
}

Advanced Patterns

Selecting by Wildcard

Atom/residue names support wildcards:
# All hydrogens
hydrogens = system.select("name H*")

# All atom names starting with C (CA, CB, CG, etc.)
carbons = system.select("name C*")
From crates/traj-core/src/selection.rs:131-133:
fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '*'
}
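
The lexer excerpt only shows that `*` is accepted inside identifiers; how patterns are matched is not shown here. As a rough mental model, the sketch below assumes shell-style globbing with case-insensitive comparison, using Python's standard `fnmatch` module. The `name_matches` helper is hypothetical.

```python
from fnmatch import fnmatchcase


def name_matches(atom_name, pattern):
    # Upper-case both sides to mimic case-insensitive matching.
    return fnmatchcase(atom_name.upper(), pattern.upper())


names = ["N", "CA", "CB", "C", "O", "H", "HA", "HB1", "CL"]

print([n for n in names if name_matches(n, "H*")])  # ['H', 'HA', 'HB1']
print([n for n in names if name_matches(n, "C*")])  # ['CA', 'CB', 'C', 'CL']
```

Note that a prefix pattern such as `C*` matches by name, not by element, so a chloride ion named CL is included. Also, `fnmatch` gives `?` and `[...]` special meaning, which the selection language may not.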

Complex Queries

# Active site: residues 50-60 in chain A, heavy atoms only
active_site = system.select(
    "chain A and resid 50:60 and not name H*"
)

# Protein plus nucleic-acid residues, listed by residue name (no nucleic keyword is documented here)
bio = system.select("protein or resname GUA or resname CYT")

Using Selections with Plans

All analysis Plans accept a Selection as the first argument:
from warp_md import System, RgPlan, RmsdPlan, DistancePlan

system = System.from_pdb("protein.pdb")
backbone = system.select("backbone")
ca_atoms = system.select("name CA")

# Radius of gyration for backbone
rg_plan = RgPlan(backbone, mass_weighted=True)

# RMSD for CA atoms
rmsd_plan = RmsdPlan(ca_atoms, reference="topology", align=True)

# Distance between two selections
sel_a = system.select("resid 10 and name CA")
sel_b = system.select("resid 50 and name CA")
dist_plan = DistancePlan(sel_a, sel_b)

Limitations

  • No arithmetic: resid 10+5 is invalid
  • No regex: Use wildcards (*) only
  • No distance-based selections: within 5 of resid 10 not supported
  • Case-insensitive matching only: name CA and name ca are equivalent
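
Since distance-based selections are not supported, one workaround is to compute them outside the selection language from index arrays and coordinates. The sketch below is a brute-force illustration in pure Python. The `within` helper is hypothetical, and it assumes you already have per-atom coordinates as `(x, y, z)` tuples in the same units as the cutoff; how to obtain them from warp_md is not covered here.

```python
import math


def within(coords, center_indices, cutoff):
    """Indices of atoms within `cutoff` of any atom in `center_indices`."""
    centers = [coords[i] for i in center_indices]
    return [
        i
        for i, xyz in enumerate(coords)
        if any(math.dist(xyz, c) <= cutoff for c in centers)
    ]


coords = [
    (0.0, 0.0, 0.0),
    (3.0, 0.0, 0.0),
    (0.0, 4.0, 0.0),
    (10.0, 0.0, 0.0),
]
center = [0]  # e.g. the indices returned for "resid 10"

print(within(coords, center, 5.0))  # [0, 1, 2]
```

The result includes the centre atoms themselves, and the cost grows with the product of the two atom counts, so a spatial grid or tree would be preferable for large systems.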
