Overview
Selections let you target specific atoms for analysis. The selection language is inspired by VMD/CHARMM syntax but simplified.
Selections compile to an immutable list of atom indices and are cached by the System for reuse.
Basic Usage
from warp_md import System
system = System.from_pdb("protein.pdb")
# Returns a Selection object
backbone = system.select("backbone")
print(backbone.indices) # Array of atom indices
Selection Syntax
From crates/traj-core/src/selection.rs:244-281:
Keywords
Select atoms by name (case-insensitive).
ca = system.select("name CA")
oxygens = system.select("name O")
Select atoms by residue name.
water = system.select("resname SOL")
alanine = system.select("resname ALA")
Select atoms in residue with ID N.
res10 = system.select("resid 10")
Select atoms in residue range (inclusive).
n_terminus = system.select("resid 1:20")
middle = system.select("resid 50:150")
Select atoms by chain identifier.
chain_a = system.select("chain A")
chain_b = system.select("chain B")
Select all protein atoms (standard amino acids).
Matches residues: ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE, LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL, MSE, HSD, HSE, HSP.
prot = system.select("protein")
Select protein backbone atoms: N, CA, C, O, OXT.
bb = system.select("backbone")
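To make the keyword semantics concrete, here is a minimal pure-Python model in which each keyword acts as a per-atom predicate. It does not use warp_md. The Atom record, the matches and select helpers, and the abbreviated residue set are hypothetical, and treating resname as case-insensitive and backbone as a name-only test are assumptions made for the sketch.

```python
from dataclasses import dataclass

# Hypothetical atom record; warp_md's internal layout is not shown in this page.
@dataclass(frozen=True)
class Atom:
    name: str
    resname: str
    resid: int
    chain: str

BACKBONE_NAMES = {"N", "CA", "C", "O", "OXT"}  # from the backbone description
PROTEIN_RESNAMES = {"ALA", "GLY", "SER"}       # abbreviated for the sketch

def matches(atom: Atom, keyword: str, value: str = "") -> bool:
    """Return True if `atom` satisfies a single keyword term."""
    if keyword == "name":
        return atom.name.upper() == value.upper()
    if keyword == "resname":
        # Assumption: residue names are compared case-insensitively too.
        return atom.resname.upper() == value.upper()
    if keyword == "resid":
        # "N" selects one residue; "N:M" selects an inclusive range.
        lo, _, hi = value.partition(":")
        return int(lo) <= atom.resid <= int(hi or lo)
    if keyword == "chain":
        return atom.chain == value
    if keyword == "protein":
        return atom.resname.upper() in PROTEIN_RESNAMES
    if keyword == "backbone":
        # Assumption: a name-only test, without checking the residue type.
        return atom.name.upper() in BACKBONE_NAMES
    raise ValueError(f"unknown keyword: {keyword}")

atoms = [
    Atom("N", "ALA", 1, "A"),
    Atom("CA", "ALA", 1, "A"),
    Atom("CB", "ALA", 1, "A"),
    Atom("CA", "GLY", 2, "A"),
    Atom("OW", "SOL", 3, "B"),
]

def select(keyword: str, value: str = "") -> list:
    """Indices of all atoms that satisfy one keyword term."""
    return [i for i, a in enumerate(atoms) if matches(a, keyword, value)]
```

With this toy data, select("resid", "1:2") covers both residue 1 and residue 2, which is what the inclusive range means.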
Boolean Operators
# Intersection
ca_chain_a = system.select("name CA and chain A")
protein_backbone = system.select("protein and backbone")
Parentheses
Group expressions for precedence:
# Select backbone atoms in chain A or B
sel = system.select("backbone and (chain A or chain B)")
# Select protein atoms that are NOT backbone
sidechain = system.select("protein and not backbone")
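One way to reason about the operators is as set operations over atom indices: "and" is intersection, "or" is union, and "not" is the complement with respect to all atoms. The sketch below uses plain Python sets with made-up index values. It is a mental model rather than warp_md's implementation, and the last example assumes that "and" binds tighter than "or", as the precedence order listed under Parsing suggests.

```python
# Hypothetical index sets for a 10-atom system (values are made up).
all_atoms = set(range(10))
protein = {0, 1, 2, 3, 4, 5}
backbone = {0, 1, 4, 5, 8}
chain_a = {0, 1, 2, 3}
chain_b = {4, 5, 6, 7}

# "protein and backbone" -> intersection
protein_backbone = sorted(protein & backbone)

# "backbone and (chain A or chain B)" -> the union in parentheses is formed first
bb_chains = sorted(backbone & (chain_a | chain_b))

# "protein and not backbone" -> complement, then intersection
sidechain = sorted(protein & (all_atoms - backbone))

# Without parentheses, "backbone and chain A or chain B" would group as
# "(backbone and chain A) or chain B" under the assumed precedence.
ungrouped = sorted((backbone & chain_a) | chain_b)
```

The difference between bb_chains and ungrouped is why the parentheses matter: the ungrouped form pulls in every atom of chain B, backbone or not.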
Examples from Tests
From crates/traj-core/src/selection.rs:438-463:
#[test]
fn selection_name() {
    let mut system = build_system();
    let sel = compile_selection("name CA", &mut system).unwrap();
    assert_eq!(sel.indices.as_slice(), &[1]);
}

#[test]
fn selection_resid_range() {
    let mut system = build_system();
    let sel = compile_selection("resid 1:2", &mut system).unwrap();
    assert_eq!(sel.indices.len(), 4);
}

#[test]
fn selection_protein_backbone() {
    let mut system = build_system();
    let sel = compile_selection("protein and backbone", &mut system).unwrap();
    assert_eq!(sel.indices.as_slice(), &[0, 1]);
}
Implementation Details
Parsing
From crates/traj-core/src/selection.rs:135-282:
- Lexer: Tokenizes input string
- Parser: Builds an AST with precedence levels OR → AND → NOT → PRIMARY, listed from loosest to tightest binding
- Evaluator: Walks AST to produce boolean mask
- Compiler: Converts mask to index array
pub fn compile_selection(expr: &str, system: &mut System) -> TrajResult<Selection> {
    let mut parser = Parser::new(expr, system)?;
    let ast = parser.parse()?;
    let mask = eval(&ast, system)?;
    let mut indices = Vec::new();
    for (i, &keep) in mask.iter().enumerate() {
        if keep {
            indices.push(i as u32);
        }
    }
    Ok(Selection {
        expr: expr.to_string(),
        indices: Arc::new(indices),
    })
}
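The four stages can be sketched end to end in Python. The toy below is not a port of selection.rs: it supports only name, resid (single IDs), and, or, not, and parentheses, and every function name and the ATOMS table are hypothetical. It is meant to show how the precedence chain falls out of a recursive-descent parser (an assumed technique; the source only states the precedence order) and how the boolean mask becomes an index list.

```python
import re

# Toy topology: (atom name, residue id). Purely illustrative data.
ATOMS = [("N", 1), ("CA", 1), ("CB", 1), ("N", 2), ("CA", 2)]

def tokenize(expr):
    """Lexer: split into parentheses and whitespace-free words."""
    return re.findall(r"[()]|[^\s()]+", expr)

def parse(tokens):
    """Parser: recursive descent, loosest-binding operator first."""
    pos = 0

    def peek():
        return tokens[pos].lower() if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "or":
            take()
            node = ("or", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "and":
            take()
            node = ("and", node, parse_not())
        return node

    def parse_not():
        if peek() == "not":
            take()
            return ("not", parse_not())
        return parse_primary()

    def parse_primary():
        tok = take()
        if tok == "(":
            node = parse_or()
            if take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok.lower() in ("name", "resid"):
            return (tok.lower(), take())
        raise ValueError(f"unexpected token: {tok}")

    ast = parse_or()
    if pos != len(tokens):
        raise ValueError(f"trailing input: {tokens[pos]}")
    return ast

def evaluate(node):
    """Evaluator: walk the AST and return one boolean per atom."""
    op = node[0]
    if op == "name":
        return [name.upper() == node[1].upper() for name, _ in ATOMS]
    if op == "resid":
        return [resid == int(node[1]) for _, resid in ATOMS]
    if op == "not":
        return [not keep for keep in evaluate(node[1])]
    left, right = evaluate(node[1]), evaluate(node[2])
    if op == "and":
        return [a and b for a, b in zip(left, right)]
    return [a or b for a, b in zip(left, right)]

def compile_selection(expr):
    """Compiler: convert the mask to a list of atom indices."""
    mask = evaluate(parse(tokenize(expr)))
    return [i for i, keep in enumerate(mask) if keep]
```

Because parse_or calls parse_and, which calls parse_not, an expression such as "name N or name CA and resid 2" groups as "name N or (name CA and resid 2)" in this sketch.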
Caching
From crates/traj-core/src/system.rs:58-69:
pub fn select(&mut self, expr: &str) -> TrajResult<Selection> {
    if let Some(sel) = self.selection_cache.get(expr) {
        return Ok(Selection {
            expr: sel.expr.clone(),
            indices: Arc::clone(&sel.indices),
        });
    }
    let compiled = compile_selection(expr, self)?;
    self.selection_cache.insert(expr.to_string(), compiled.clone());
    Ok(compiled)
}
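Selections are cached at the System level, keyed by the exact expression string. Re-using the same expression skips parsing and evaluation: a cache hit only clones the expression string and the Arc pointer to the shared index vector.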
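The cache behavior can be modeled in a few lines of Python. This sketch assumes what the Rust snippet shows: the lookup key is the raw expression string, and a hit hands back the already-compiled indices. ToySystem, its toy compiler, and the compile_count counter are hypothetical names added for illustration.

```python
class ToySystem:
    """Hypothetical stand-in for System that models only the selection cache."""

    def __init__(self, atom_names):
        self.atom_names = list(atom_names)
        self.selection_cache = {}  # expression string -> tuple of indices
        self.compile_count = 0     # how many times a selection was compiled

    def _compile(self, expr):
        # Toy compiler: only understands "name <X>".
        self.compile_count += 1
        _, wanted = expr.split()
        return tuple(
            i for i, name in enumerate(self.atom_names)
            if name.upper() == wanted.upper()
        )

    def select(self, expr):
        cached = self.selection_cache.get(expr)
        if cached is not None:
            return cached  # cache hit: same immutable object, no recompilation
        compiled = self._compile(expr)
        self.selection_cache[expr] = compiled
        return compiled

system = ToySystem(["N", "CA", "C", "O", "CA"])
first = system.select("name CA")
second = system.select("name CA")   # served from the cache
third = system.select("name  CA")   # extra space: a different key, compiled again
```

If the key really is the raw string, as the snippet suggests, then normalizing whitespace in expressions you build programmatically avoids duplicate cache entries for the same logical selection.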
Selection Object
sel = system.select("backbone")
# Access indices
print(sel.indices) # np.ndarray of uint32
# Original expression
print(sel.expr) # "backbone"
From crates/traj-core/src/selection.rs:6-10:
pub struct Selection {
    pub expr: String,
    pub indices: Arc<Vec<u32>>,
}
Advanced Patterns
Selecting by Wildcard
Atom/residue names support wildcards:
# Atoms whose name starts with H (typically hydrogens)
hydrogens = system.select("name H*")
# All atoms whose name starts with C (CA, CB, CG, etc.)
carbons = system.select("name C*")
From crates/traj-core/src/selection.rs:131-133:
fn is_ident_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_' || c == '.' || c == '*'
}
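The lexer excerpt only shows that an asterisk is accepted as part of an identifier; it does not show how the pattern is matched. As a rough model, glob-style matching from Python's standard fnmatch module behaves the way the examples above suggest. The name_matches helper is hypothetical, and upper-casing both sides is an assumption based on the case-insensitive note for name.

```python
from fnmatch import fnmatchcase

def name_matches(pattern: str, atom_name: str) -> bool:
    """Glob-style, case-insensitive match of an atom name (illustrative only)."""
    return fnmatchcase(atom_name.upper(), pattern.upper())

# Made-up atom names, including a chloride and a digit-prefixed hydrogen.
atom_names = ["N", "H", "HA", "CA", "CB", "CL", "O", "1HB"]

hydrogens = [n for n in atom_names if name_matches("H*", n)]
carbons = [n for n in atom_names if name_matches("C*", n)]
```

Under this model a name pattern is not an element filter: the C pattern also picks up CL, and the H pattern misses names such as 1HB, so check the naming convention of your topology before relying on it.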
Complex Queries
# Active site: residues 50-60 in chain A, heavy atoms only
active_site = system.select(
"chain A and resid 50:60 and not name H*"
)
# Protein plus nucleotide residues, matched explicitly by residue name
bio = system.select("protein or resname GUA or resname CYT")
Using Selections with Plans
All analysis Plans accept a Selection as the first argument:
from warp_md import System, RgPlan, RmsdPlan, DistancePlan
system = System.from_pdb("protein.pdb")
backbone = system.select("backbone")
ca_atoms = system.select("name CA")
# Radius of gyration for backbone
rg_plan = RgPlan(backbone, mass_weighted=True)
# RMSD for CA atoms
rmsd_plan = RmsdPlan(ca_atoms, reference="topology", align=True)
# Distance between two selections
sel_a = system.select("resid 10 and name CA")
sel_b = system.select("resid 50 and name CA")
dist_plan = DistancePlan(sel_a, sel_b)
Limitations
- No arithmetic: resid 10+5 is invalid
- No regex: Use wildcards (*) only
- No distance-based selections: within 5 of resid 10 not supported
- Case-insensitive matching: Name CA and name ca are equivalent
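Since the language has no arithmetic, one workaround is to compute residue numbers in Python and format them into the expression string. The resid_window helper below is a hypothetical convenience, not part of warp_md; it only produces a string that you would then pass to system.select.

```python
def resid_window(center: int, half_width: int, extra: str = "") -> str:
    """Build a 'resid N:M' expression around `center` (inclusive on both ends)."""
    first = center - half_width
    last = center + half_width
    expr = f"resid {first}:{last}"
    # Optionally narrow the window with another term, e.g. "name CA".
    return f"{expr} and {extra}" if extra else expr

# Do the arithmetic in Python instead of inside the selection string.
expr = resid_window(10, 5, extra="name CA")
```

The resulting string uses only the documented range syntax, so it stays within what the selection language supports.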
See Also