casr-cluster provides a suite of operations for managing large collections of .casrep
files. It deduplicates reports by hashing their filtered stack traces, groups similar crashes
into numbered cluster directories, merges new findings into existing sets, updates live
cluster structures with incremental results, and computes set differences for CI
pipelines. Operations run in parallel worker threads; by default, half of the available CPU cores are used.
Synopsis
Options
- Similarity: Compute and print the similarity score (0.0–1.0) between two .casrep files based on their filtered stack traces.
- Cluster: Cluster all .casrep files in INPUT_DIR. If OUTPUT_DIR is also provided, cluster subdirectories (cl1, cl2, …) are created inside OUTPUT_DIR and the original files in INPUT_DIR are left untouched. If only one directory is given, clusters are created in place and the existing reports in that directory are preserved.
- Unique crash line (--unique-crashline): After clustering, keep only one report per unique crash line within each cluster directory. This can also be set with the CASR_CLUSTER_UNIQUE_CRASHLINE environment variable (see below).
- Deduplicate: Remove duplicate reports. If two directories are provided, unique reports are copied to OUTPUT_DIR. If only one directory is provided, duplicate reports are deleted in place.
- Merge: Merge unique reports from INPUT_DIR into OUTPUT_DIR. Only reports whose stack traces are not already represented in OUTPUT_DIR are copied. Useful for accumulating new fuzzer findings into an existing triage set.
- Update (--update): Add reports from NEW_DIR to an existing cluster structure in OLD_DIR. Reports that fit into existing clusters are added there; reports that do not fit form new clusters. Prints a silhouette score after updating.
- Silhouette score: Calculate and print the average silhouette coefficient for an existing cluster directory. A score near 1.0 indicates well-separated clusters.
- Diff: Compute the set difference NEW_DIR \ PREV_DIR and copy unique reports to DIFF_DIR. Useful in CI pipelines to find crashes discovered since the last run.
- Ignore file (--ignore): Path to a file containing regular expressions for function names and file paths that should be ignored during stack trace comparison (see Ignore File Format below).
- Jobs: Number of parallel worker threads. Defaults to half of the available CPU cores.
Deduplication
Deduplication compares filtered stack traces and removes reports whose traces are identical to one already seen. Run deduplication before clustering to improve cluster quality.
Two-directory deduplication (copy unique reports)
Unique reports are copied to out-dedup/; the originals in the source directory are not
modified.
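A minimal Python sketch of the deduplication idea described in this section, assuming each report has already been reduced to a list of filtered frame strings. The helper names trace_hash() and deduplicate() are hypothetical, and SHA-256 is an arbitrary choice; this page does not say which hash casr-cluster uses.

```python
import hashlib


def trace_hash(frames):
    """Return a hex digest for a filtered stack trace (a list of frame strings).

    Assumption: SHA-256 over newline-separated frames; the real hash may differ.
    """
    h = hashlib.sha256()
    for frame in frames:
        h.update(frame.encode("utf-8"))
        h.update(b"\n")  # separator, so ["ab", "c"] and ["a", "bc"] hash differently
    return h.hexdigest()


def deduplicate(reports):
    """Keep the first report seen for each distinct filtered stack trace.

    reports: list of (report_name, frames) pairs (hypothetical representation).
    Returns the names of the kept reports, in input order.
    """
    seen = set()
    kept = []
    for name, frames in reports:
        key = trace_hash(frames)
        if key not in seen:
            seen.add(key)
            kept.append(name)
    return kept


reports = [
    ("a.casrep", ["main", "parse_header", "memcpy"]),
    ("b.casrep", ["main", "parse_header", "memcpy"]),  # same trace as a.casrep
    ("c.casrep", ["main", "parse_body", "strlen"]),
]
print(deduplicate(reports))  # ['a.casrep', 'c.casrep']
```

Keeping the first report seen makes the result depend on input order; which of several duplicates casr-cluster keeps is not specified on this page.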
In-place deduplication (delete duplicates)
Provide only one directory to remove duplicates directly.
Clustering
Clustering groups reports with similar stack traces into numbered subdirectories (cl1, cl2, …). Reports that cannot be parsed are placed in a clerr/ subdirectory.
Basic clustering workflow
Resulting directory structure
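After clustering, out-cluster contains one numbered subdirectory per cluster (cl1, cl2, …), plus clerr/ for any reports that could not be parsed.
Each clN directory contains reports from the same crash cluster. The number of reports
per cluster reflects how often that crash type was triggered.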
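The directory layout can be pictured as a mapping from cluster assignments to destination paths. This Python sketch assumes the clustering step has already produced a label for each report (None for reports that failed to parse); the function name layout() is hypothetical, and numbering clusters in order of first appearance is an assumption rather than documented casr-cluster behaviour.

```python
from pathlib import PurePosixPath


def layout(assignments, out_dir):
    """Map each report file name to its destination path under out_dir.

    assignments: dict of report name -> cluster label, or None if the report
    could not be parsed. Hypothetical helper: clusters are numbered cl1, cl2, ...
    in order of first appearance, which is an assumption.
    """
    numbers = {}  # cluster label -> cluster number
    dest = {}
    for name, label in assignments.items():
        if label is None:
            sub = "clerr"  # reports that could not be parsed
        else:
            if label not in numbers:
                numbers[label] = len(numbers) + 1
            sub = f"cl{numbers[label]}"
        dest[name] = str(PurePosixPath(out_dir) / sub / name)
    return dest


paths = layout(
    {"a.casrep": "A", "b.casrep": "B", "c.casrep": "A", "bad.casrep": None},
    "out-cluster",
)
for path in paths.values():
    print(path)
```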
Cluster with unique crash line filtering
Keep only one report per unique crash line within each cluster.
Similarity Comparison
Print the normalised similarity score between two individual reports. A score of 1.0 means the filtered stack traces are identical; 0.0 means they share
no common frames.
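To illustrate what a normalised 0.0 to 1.0 score over stack traces can look like, the sketch below uses 2 * LCS / (len(a) + len(b)), where LCS is the length of the longest common subsequence of frames. This is a stand-in metric chosen because it matches the two endpoints described above (identical traces score 1.0, traces with no shared frame score 0.0); it is not necessarily the formula casr-cluster implements, and the function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two frame lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Stand-in similarity in [0.0, 1.0]: 2 * LCS / (len(a) + len(b)).

    Hypothetical metric for illustration, not necessarily casr-cluster's formula.
    It is 1.0 only for identical frame lists and 0.0 when no frame is shared.
    """
    if not a and not b:
        return 1.0  # two empty traces are treated as identical
    return 2 * lcs_length(a, b) / (len(a) + len(b))


print(similarity(["main", "parse", "memcpy"], ["main", "parse", "memcpy"]))  # 1.0
print(similarity(["main", "parse", "memcpy"], ["main", "read", "memcpy"]))
print(similarity(["foo", "bar"], ["baz", "qux"]))  # 0.0
```

Because LCS respects frame order, two traces containing the same functions in a different call order score below 1.0.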
Merge
Merge new reports into an existing set, adding only those not already represented.
Update (Continuous Fuzzing)
--update is designed for long-running fuzzing campaigns. It integrates new reports into
an existing cluster tree, extending existing clusters where possible and creating new
ones for genuinely different crashes.
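One way to picture the update step is the Python sketch below. It takes a caller-supplied similarity function, treats a report as a duplicate when its exact frame list is already present, attaches other reports to the cluster containing their most similar member if that score reaches a threshold, and otherwise opens a new clN cluster. The 0.8 default threshold, the nearest-member rule and the function name update() are all hypothetical; they are not taken from casr-cluster.

```python
def update(clusters, new_reports, similarity, threshold=0.8):
    """Fold new reports into an existing cluster structure (hypothetical sketch).

    clusters: dict of cluster name -> non-empty list of frame lists; updated in place.
    new_reports: list of frame lists.
    similarity: function(frames_a, frames_b) -> float in [0.0, 1.0].
    threshold: assumed cut-off for joining an existing cluster.
    Returns (added, duplicates, created) counts.
    """
    added = duplicates = created = 0
    for frames in new_reports:
        # A trace that already exists in some cluster counts as a duplicate.
        if any(frames in members for members in clusters.values()):
            duplicates += 1
            continue
        # Assumption: pick the cluster holding the most similar single member.
        best_name, best_score = None, -1.0
        for name, members in clusters.items():
            score = max((similarity(frames, m) for m in members), default=0.0)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= threshold:
            clusters[best_name].append(frames)
            added += 1
        else:
            # Open a new clN cluster, skipping any number already in use.
            n = len(clusters) + 1
            while f"cl{n}" in clusters:
                n += 1
            clusters[f"cl{n}"] = [frames]
            created += 1
    return added, duplicates, created


def jaccard(a, b):
    """Toy similarity for the example: shared frames over all distinct frames."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


clusters = {"cl1": [["main", "parse", "memcpy"]]}
new = [
    ["main", "parse", "memcpy"],            # duplicate
    ["main", "parse", "memcpy", "strlen"],  # close to cl1
    ["main", "net", "recv"],                # different crash
]
print(update(clusters, new, jaccard, threshold=0.7))  # (1, 1, 1)
```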
Example — simulating an incremental update
After running --update, casr-cluster prints:
- Number of reports added to existing clusters
- Number of duplicates skipped
- Number of new clusters created
- Cluster silhouette score
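The silhouette score printed here is the average silhouette coefficient. For a report i, a is its mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and s(i) = (b - a) / max(a, b); the score is the mean of s(i) over all reports. The Python sketch below implements that textbook definition. For stack traces a distance such as 1 - similarity would be a natural choice, but which distance casr-cluster uses is an assumption here, as is the common convention of scoring reports in singleton clusters as 0. The function name silhouette() is hypothetical.

```python
def silhouette(points, labels, dist):
    """Average silhouette coefficient over all points.

    points: items being clustered; labels: cluster label per point;
    dist: function(p, q) -> non-negative distance.
    Convention assumed here: points in singleton clusters score 0.
    """
    clusters = {}
    for i, label in enumerate(labels):
        clusters.setdefault(label, []).append(i)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # s(i) = 0 for a singleton cluster
        # a: mean distance to the other members of i's own cluster
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        m = max(a, b)
        total += (b - a) / m if m > 0 else 0.0
    return total / len(points)


# Toy 1-D example: two well-separated groups give a score close to 1.0.
print(silhouette([0.0, 1.0, 10.0, 11.0], [0, 0, 1, 1], lambda p, q: abs(p - q)))
```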
Diff (CI Pipelines)
Compute the set of reports that are new in NEW_DIR compared with PREV_DIR and save
them to DIFF_DIR. This is useful for tracking which crashes were found since the last triage.
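Merge and diff both select reports from one directory whose stack traces are not already represented in another: merge takes INPUT_DIR relative to OUTPUT_DIR, and diff takes NEW_DIR relative to PREV_DIR. The Python sketch below shows that shared selection step, assuming the comparison key is the filtered stack trace (stated for merge on this page, assumed for diff). The function names new_reports() and trace_key() are hypothetical, and copying the selected files into OUTPUT_DIR or DIFF_DIR is left out.

```python
def new_reports(source, reference, key):
    """Reports from source whose key is absent from reference (hypothetical helper).

    Duplicates within source are also dropped, so each kept key appears once.
    key: function(report) -> hashable; assumed here to be the filtered trace.
    """
    known = {key(r) for r in reference}
    selected = []
    for report in source:
        k = key(report)
        if k not in known:
            known.add(k)
            selected.append(report)
    return selected


def trace_key(report):
    # Assumption: a report is a (name, frames) pair; the key is the frame tuple.
    return tuple(report[1])


prev_dir = [("p1.casrep", ["main", "parse", "memcpy"])]
new_dir = [
    ("n1.casrep", ["main", "parse", "memcpy"]),  # already known in prev_dir
    ("n2.casrep", ["main", "net", "recv"]),      # new crash
    ("n3.casrep", ["main", "net", "recv"]),      # duplicate of n2
]
print([name for name, _ in new_reports(new_dir, prev_dir, trace_key)])  # ['n2.casrep']
```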
Silhouette Score
Estimate the quality of an existing clustering with the average silhouette coefficient.
Ignore File Format
The --ignore flag accepts a plain-text file with two optional sections. Frames whose
function names or file paths match any of the listed regular expressions are excluded from
stack trace comparison.
- Both sections are optional and may appear in either order.
- Each line inside a section is treated as a separate Go-compatible regular expression.
- Frames matching any pattern are dropped before similarity and deduplication analysis.
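A Python sketch of the frame filtering step, assuming each frame is a (function, file) pair. Because the section headers of the ignore file are not shown on this page, the sketch takes the two pattern lists directly instead of parsing a file; filter_frames() is a hypothetical name. It uses Python's re module with unanchored search, which is an assumption about matching semantics and which handles simple patterns but does not reproduce Go regular-expression syntax exactly.

```python
import re


def filter_frames(frames, function_patterns, file_patterns):
    """Drop frames whose function name or file path matches any pattern.

    frames: list of (function, file) pairs (assumed representation).
    Patterns are applied with re.search, i.e. unanchored, like Go's
    regexp.MatchString; Python and Go regex syntax only partly overlap.
    """
    funcs = [re.compile(p) for p in function_patterns]
    files = [re.compile(p) for p in file_patterns]
    kept = []
    for function, path in frames:
        if any(rx.search(function) for rx in funcs):
            continue  # ignored function
        if any(rx.search(path) for rx in files):
            continue  # ignored file path
        kept.append((function, path))
    return kept


frames = [
    ("__asan_memcpy", "/src/llvm/compiler-rt/asan_interceptors.cpp"),
    ("parse_header", "/src/app/parse.c"),
    ("std::vector<int>::push_back", "/usr/include/c++/11/bits/stl_vector.h"),
    ("main", "/src/app/main.c"),
]
for function, path in filter_frames(frames, [r"^__asan"], [r"^/usr/include/"]):
    print(function, path)
```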
CASR_CLUSTER_UNIQUE_CRASHLINE Environment Variable
The --unique-crashline flag can also be controlled via the environment variable
CASR_CLUSTER_UNIQUE_CRASHLINE. The following values are treated as false:
n, no, f, false, off, 0, or an absent variable.
Any other value (e.g., 1, true, yes) is treated as true.
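The environment-variable rule above, together with the per-cluster filtering it enables, can be sketched in Python as follows. Whether casr-cluster compares values case-insensitively is not stated, so this sketch matches the listed spellings exactly; keeping the first report seen for each crash line is likewise an assumption. Both function names are hypothetical.

```python
import os

# Spellings this page lists as "false"; exact, case-sensitive matching is an assumption.
FALSE_VALUES = {"n", "no", "f", "false", "off", "0"}


def unique_crashline_enabled(environ=None):
    """Interpret CASR_CLUSTER_UNIQUE_CRASHLINE: absent or a listed false value -> False."""
    env = os.environ if environ is None else environ
    value = env.get("CASR_CLUSTER_UNIQUE_CRASHLINE")
    if value is None:
        return False
    return value not in FALSE_VALUES


def keep_unique_crashlines(cluster):
    """Keep one report per crash line within a single cluster (hypothetical helper).

    cluster: list of (report_name, crash_line) pairs; the first report seen
    for each crash line is kept, which is an assumption.
    """
    seen = set()
    kept = []
    for name, crashline in cluster:
        if crashline not in seen:
            seen.add(crashline)
            kept.append(name)
    return kept


cluster = [("a.casrep", "parse.c:42"), ("b.casrep", "parse.c:42"), ("c.casrep", "parse.c:57")]
if unique_crashline_enabled():
    cluster_names = keep_unique_crashlines(cluster)
else:
    cluster_names = [name for name, _ in cluster]
print(cluster_names)
```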