Overview
The recommended workflow follows these stages:
- Triage — Understand what’s in your profile
- Drill down — Explore hot methods and their callers
- Trace — Find the hottest execution path
- Compare — Identify performance regressions or improvements
1. Triage
Start every analysis with the info command to understand your profile’s contents:
- Available event types (cpu, wall, alloc, lock, or hardware counters)
- Total recording duration
- Sample distribution across threads
- Top 20 hot methods by self time
- CPU vs WALL thread-group comparison (when both events exist)
Use info for:
- First command on any profile
- Quick overview without specifying event types
- Understanding thread distribution before filtering
- Checking if cpu and wall show different bottlenecks
If your profile contains both cpu and wall events, info automatically compares them across thread groups to highlight where threads are waiting vs actively computing.
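A first triage pass might look like this (the profile filename is illustrative):

```shell
# Triage: list event types, duration, thread distribution, and top hot methods
ap-query info profile.jfr
```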
2. Drill Down
Once you’ve identified hot methods from triage, use tree to explore their call trees:
- Self% — samples in this method itself
- Total% — samples in this method plus its callees
- Hierarchical call structure
Use tree when:
- A hot method has high Total% but low Self% (it’s calling expensive methods)
- Understanding which callee paths are most expensive
- Comparing behavior across thread groups
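A drill-down pass might look like this (the method name and regex are hypothetical; the -m and --hide flags are the ones this guide names):

```shell
# Explore the call tree under a hot method (method name is hypothetical)
ap-query tree profile.jfr -m 'UserService.handle'

# Same view with framework wrapper frames hidden (regex is illustrative)
ap-query tree profile.jfr -m 'UserService.handle' --hide 'org\.springframework\..*'
```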
Two refinements help at this stage:
- Remove framework noise — use --hide to strip framework or wrapper frames before analysis; the --hide flag accepts regex patterns and works with the tree, trace, and callers commands.
- Time window focus — restrict analysis to a time range with --from/--to.
3. Trace
After drilling down with tree, use trace to find the hottest execution path from your method to the leaf:
- Root → Leaf — single hottest path through the call stack
- Self% — samples at each frame
- Leaf frame is the actual bottleneck
Use trace for:
- Quick path from entry point to bottleneck
- Confirming tree analysis with a linear view
- Understanding the dominant execution path
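A trace run might look like this, assuming trace accepts the same -m method selector shown for tree (the method name is hypothetical):

```shell
# Show the single hottest root-to-leaf path through a method
ap-query trace profile.jfr -m 'UserService.handle'
```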
Callers Analysis
Use callers to see who’s calling your hot method:
- Which caller paths contribute most samples
- Whether the method is expensive in all contexts or just one
- Opportunity to optimize specific call sites
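A callers query might look like this, assuming callers accepts the same -m selector shown for tree (the method name is hypothetical):

```shell
# List caller paths into a hot method, ranked by sample contribution
ap-query callers profile.jfr -m 'HashMap.resize'
```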
Line-Level Detail
When JFR contains line-number information, use lines for a line-by-line breakdown:
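A lines query might look like this, assuming lines accepts the same -m selector shown for tree (the method name is hypothetical):

```shell
# Line-by-line sample breakdown within a method
# (requires line numbers in the JFR recording)
ap-query lines profile.jfr -m 'HashMap.resize'
```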
4. Compare
Compare profiles to identify regressions, improvements, or behavioral changes.
Compare Two Profiles
Comparing a before and an after profile classifies each method as:
- REGRESSION — methods that got slower
- IMPROVEMENT — methods that got faster
- NEW — methods only in the after profile
- GONE — methods only in the before profile
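A two-profile comparison might look like this; the compare subcommand form and the filenames are assumptions, since this guide does not spell out the exact invocation:

```shell
# Classify methods as REGRESSION / IMPROVEMENT / NEW / GONE
# (subcommand form is assumed)
ap-query compare before.jfr after.jfr
```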
Compare Time Windows
For JFR files, compare two time windows within the same recording. Typical uses:
- Before/after performance testing
- Regression detection in CI
- Comparing different phases of a long-running profile
- A/B testing different configurations
Timeline Analysis
Use timeline to visualize sample distribution over time and identify spikes. Timeline reveals:
- When hot methods are active (startup vs steady-state)
- Spikes that indicate intermittent issues
- Per-bucket hot method (by self time)
- CPU/WALL ratio per bucket (with --compare)
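A timeline run might look like this (the profile filename is illustrative; --compare is the flag this guide names for the per-bucket CPU/WALL ratio):

```shell
# Bucketed sample distribution over time
ap-query timeline profile.jfr

# Add per-bucket CPU/WALL comparison
ap-query timeline profile.jfr --compare
```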
After spotting a spike, zoom into that window with --from/--to.
Workflow Summary
Here’s the complete workflow in practice: triage with info, drill down with tree, confirm the hot path with trace, and compare profiles or time windows to catch regressions.
Best Practices
- Always start with info — understand event types and thread distribution first
- Quote method names — when analyzing methods with special characters: ap-query tree profile.jfr -m 'MyService$1'
- Use --fqn for disambiguation — when multiple classes have the same method name
- Filter threads early — when profiles mix different workloads (-t flag)
- Remove idle frames for wall profiles — use --no-idle to focus on active work
- Combine timeline with --from/--to — identify spikes with timeline, then drill in
- Export for visualization — ap-query collapse profile.jfr outputs collapsed stacks for flamegraph tools
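The export step might look like this; flamegraph.pl is the renderer from the FlameGraph project, and the filenames are illustrative:

```shell
# Export collapsed stacks, then render them as an SVG flamegraph
ap-query collapse profile.jfr > stacks.collapsed
flamegraph.pl stacks.collapsed > flame.svg
```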
Interpreting Results
Self% ≈ Total%
The method is a leaf bottleneck — most time is spent in the method itself, not its callees. Focus optimization effort here. Example: the resize method itself is the bottleneck.
Total% >> Self%
The method is an entry point — most time is in its callees. Use tree to find the expensive callees. Example: processRequest is expensive because of what it calls, not what it does directly.
High cpu, Low wall
The workload is CPU-bound — threads are actively computing, not waiting. Optimize computational algorithms.
Low cpu, High wall
The workload is I/O or lock-bound — threads spend most time waiting. Use --no-idle to remove wait frames and focus on active work between waits. Switch to --event lock or examine I/O configuration.