Core Event Types
ap-query recognizes four core event types, plus hardware counters.
cpu (default)
Captures on-CPU samples: where your application burns CPU cycles.
When to use cpu:
- Identifying compute-intensive code
- Optimizing algorithms and hot loops
- Default choice when CPU usage is high
- When you want to reduce CPU consumption
What cpu captures:
- Methods actively executing on CPU cores
- Excludes threads waiting on I/O, locks, or sleep
- Shows computational bottlenecks
CPU is the default event type. If you don’t specify --event, ap-query uses cpu when available.
wall
Captures wall-clock samples: all threads at regular intervals, including blocked threads.
When to use wall:
- Application has low CPU but high latency
- Investigating slow requests where threads wait on I/O
- Understanding total time (compute + waiting)
What wall captures:
- Threads blocked on database, network, or locks
- Everything cpu shows PLUS waiting time
- Threads in sleep, park, futex, epoll_wait
- True total time spent in methods
CPU vs WALL
Timeline comparison:
CPU: On-CPU time only (computational work)
WALL: Total elapsed time (compute + waiting)
If the same code shows 5% in a CPU profile and 60% in a WALL profile, most of its elapsed time is spent blocked or waiting rather than computing.
Use info to compare both when available.
When wall profiles show >50% idle leaf frames, use --no-idle. The --no-idle flag removes frames like:
- futex (lock waiting)
- sleep/park (explicit delays)
- epoll_wait (network I/O)
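The Go sketch below shows what filtering on idle leaf frames means in practice. It assumes a few things that do not come from ap-query: the Sample type, the idleLeafMarkers list (modeled on the frames named above), and the substring match. It is not ap-query's --no-idle implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Sample is a hypothetical stack sample: frames are ordered root-first,
// so the last element is the leaf frame.
type Sample struct {
	Frames []string
	Count  int
}

// idleLeafMarkers is an assumed list modeled on the idle frames named
// above (futex, sleep, park, epoll_wait); ap-query's real list may differ.
var idleLeafMarkers = []string{"futex", "sleep", "park", "epoll_wait"}

// isIdleLeaf reports whether a frame name contains any idle marker.
func isIdleLeaf(frame string) bool {
	lower := strings.ToLower(frame)
	for _, m := range idleLeafMarkers {
		if strings.Contains(lower, m) {
			return true
		}
	}
	return false
}

// dropIdle keeps only the samples whose leaf frame is not idle.
func dropIdle(samples []Sample) []Sample {
	var kept []Sample
	for _, s := range samples {
		if len(s.Frames) == 0 || !isIdleLeaf(s.Frames[len(s.Frames)-1]) {
			kept = append(kept, s)
		}
	}
	return kept
}

func main() {
	samples := []Sample{
		{Frames: []string{"Worker.run", "Queue.take", "Unsafe.park"}, Count: 70},
		{Frames: []string{"Worker.run", "Parser.parse"}, Count: 30},
	}
	// Only the Parser.parse sample survives; the parked thread is dropped.
	for _, s := range dropIdle(samples) {
		fmt.Println(s.Frames[len(s.Frames)-1], s.Count)
	}
}
```

Plain substring matching is deliberately crude here. A name such as "sparkle" would also match "park", so a real filter would more likely compare exact method or symbol names.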
alloc
Captures allocation samples: where your application allocates memory.
When to use alloc:
- High GC overhead or long GC pauses
- Memory consumption issues
- Identifying allocation-heavy code paths
- Optimizing object creation
What alloc captures:
- Methods allocating objects (both TLAB and non-TLAB)
- Allocation hotspots causing GC pressure
- Which types are being allocated
lock
Captures lock contention samples: where threads block waiting for locks.
When to use lock:
- High thread contention
- Scalability issues under load
- Threads spending time in BLOCKED state
- Synchronization bottlenecks
What lock captures:
- Monitor enter events (Java synchronized blocks/methods)
- Which locks have high contention
- Call stacks waiting to acquire locks
Hardware Counters
Beyond the four core event types, async-profiler supports hardware performance counters:
- branch-misses: CPU branch prediction failures
- cache-misses: CPU cache misses
- cycles: CPU cycle count
- instructions: Instructions executed
- Other PMU events supported by Linux perf
When to use hardware counters:
- Low-level CPU optimization
- Cache optimization for hot loops
- Understanding microarchitecture behavior
Hardware counters require Linux perf support and appropriate kernel capabilities. They’re auto-detected from JFR metadata when present.
Event Selection Logic
ap-query automatically selects the appropriate event type based on what’s available in your profile.
Explicit Selection
When you specify --event, ap-query uses that event.
Automatic Selection
When you don’t specify --event, ap-query follows this logic:
Single Available Event
If the profile contains only one event type, ap-query uses it.
Default Present
If the profile contains multiple events and one is cpu, ap-query defaults to cpu.
Fallback to Dominant
If cpu is not available but other events are, ap-query picks the event with the most samples.
Source Code Reference
The event selection logic is implemented in event_select.go:43-65.
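The selection rules can be restated as a small Go function. This is a hypothetical sketch, not the contents of event_select.go. The name selectEvent, the counts map, the error for a missing explicit event, and alphabetical tie-breaking are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// selectEvent is a hypothetical restatement of the rules above, not the code
// in event_select.go. counts maps each event type in the profile to its
// sample count; explicit is the --event value ("" when not given).
func selectEvent(counts map[string]int, explicit string) (string, error) {
	// Explicit selection: honor --event (erroring if absent is an assumption).
	if explicit != "" {
		if _, ok := counts[explicit]; !ok {
			return "", fmt.Errorf("event %q not found in profile", explicit)
		}
		return explicit, nil
	}
	if len(counts) == 0 {
		return "", fmt.Errorf("profile contains no events")
	}
	// Default present: prefer cpu whenever the profile has it.
	if _, ok := counts["cpu"]; ok {
		return "cpu", nil
	}
	// Single available event / fallback to dominant: the event with the most
	// samples wins. Names are sorted first so ties resolve deterministically
	// (alphabetical tie-breaking is an assumption).
	names := make([]string, 0, len(counts))
	for name := range counts {
		names = append(names, name)
	}
	sort.Strings(names)
	best := names[0]
	for _, name := range names[1:] {
		if counts[name] > counts[best] {
			best = name
		}
	}
	return best, nil
}

func main() {
	examples := []map[string]int{
		{"alloc": 500},                          // single available event
		{"cpu": 100, "wall": 900},               // default present
		{"wall": 900, "alloc": 120, "lock": 40}, // fallback to dominant
	}
	for _, counts := range examples {
		ev, err := selectEvent(counts, "")
		fmt.Println(ev, err)
	}
}
```

The single-event case needs no separate branch here. With one entry, the "most samples" loop simply returns it, and if that entry is cpu the default rule returns the same answer.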
Event Validation
ap-query validates event types at runtime.
Known Event Types
The four core events (cpu, wall, alloc, lock) are always recognized (event_select.go:10).
Dynamic Hardware Counters
For JFR files, hardware counter names are discovered from jdk.ActiveSetting events. This allows ap-query to work with any hardware counter recorded by async-profiler.
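Validation combines a fixed set of core events with whatever counters the profile declares. The Go sketch below illustrates that idea under several assumptions. validateEvent, the discovered slice, and the error message are hypothetical. Reading jdk.ActiveSetting events out of a JFR file is not shown.

```go
package main

import "fmt"

// coreEvents holds the four always-recognized event types.
var coreEvents = map[string]bool{"cpu": true, "wall": true, "alloc": true, "lock": true}

// validateEvent is a hypothetical sketch: a name is valid if it is a core
// event or one of the hardware counters discovered in the profile
// (for JFR, from jdk.ActiveSetting events; discovery itself is not shown).
func validateEvent(name string, discovered []string) error {
	if coreEvents[name] {
		return nil
	}
	for _, c := range discovered {
		if c == name {
			return nil
		}
	}
	return fmt.Errorf("unknown event type %q", name)
}

func main() {
	discovered := []string{"cache-misses"} // hypothetical counters found in a JFR file
	for _, ev := range []string{"wall", "cache-misses", "branch-misses"} {
		fmt.Println(ev, validateEvent(ev, discovered))
	}
}
```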
Error Cases
pprof SampleType Mapping
pprof profiles map their SampleType fields to ap-query events automatically (pprof.go:42-68):
| pprof SampleType | ap-query Event | Priority |
|---|---|---|
| cpu/nanoseconds | cpu | 2 (high) |
| samples/count | cpu | 1 (low) |
| wall/nanoseconds | wall | 2 (high) |
| alloc_objects/count | alloc | 1 (low) |
| alloc_space/bytes | alloc | 2 (high) |
| inuse_objects/count | alloc | 1 (low) |
| inuse_space/bytes | alloc | 2 (high) |
| contentions/count | lock | 1 (low) |
| delay/nanoseconds | lock | 2 (high) |
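When several sample types map to the same event, the higher-priority one is used. For example, if a profile contains both alloc_objects/count and alloc_space/bytes, ap-query uses alloc_space/bytes (priority 2) for the alloc event.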
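The Go sketch below encodes the mapping table and picks, for each event, the highest-priority sample type present. Only the table contents come from the source. The function name pickSampleTypes, the first-occurrence tie-breaking, and skipping unmapped types are assumptions, and this is not the code in pprof.go.

```go
package main

import "fmt"

// mapping is one row of the table: the ap-query event a pprof SampleType
// ("type/unit") maps to, and its priority.
type mapping struct {
	event    string
	priority int
}

var sampleTypeMap = map[string]mapping{
	"cpu/nanoseconds":     {"cpu", 2},
	"samples/count":       {"cpu", 1},
	"wall/nanoseconds":    {"wall", 2},
	"alloc_objects/count": {"alloc", 1},
	"alloc_space/bytes":   {"alloc", 2},
	"inuse_objects/count": {"alloc", 1},
	"inuse_space/bytes":   {"alloc", 2},
	"contentions/count":   {"lock", 1},
	"delay/nanoseconds":   {"lock", 2},
}

// pickSampleTypes is a hypothetical sketch: for each ap-query event it returns
// the index of the sample type that feeds it. The highest priority wins;
// on a tie the first occurrence wins (an assumption).
func pickSampleTypes(sampleTypes []string) map[string]int {
	chosen := map[string]int{}
	best := map[string]int{} // missing events read as priority 0
	for i, st := range sampleTypes {
		m, ok := sampleTypeMap[st]
		if !ok {
			continue // unmapped sample types are ignored in this sketch
		}
		if m.priority > best[m.event] {
			best[m.event] = m.priority
			chosen[m.event] = i
		}
	}
	return chosen
}

func main() {
	// Example: a profile carrying all four allocation sample types.
	types := []string{"alloc_objects/count", "alloc_space/bytes", "inuse_objects/count", "inuse_space/bytes"}
	idx := pickSampleTypes(types)
	fmt.Println("alloc ->", types[idx["alloc"]])
}
```

alloc_space/bytes and inuse_space/bytes share priority 2. In this sketch, the one listed first wins, which is alloc_space/bytes in the example.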
Best Practices
Start with CPU
When unsure, start with cpu.
Switch to WALL for Latency
If CPU samples are low but your application is slow, switch to wall.
Use --no-idle with WALL
Wall profiles often show >50% idle frames. Use --no-idle to remove them and focus on active work. The --no-idle handling lives in main.go:266-279.
Use info for Multi-Event Profiles
When a profile contains multiple event types, use info to compare them.
Record the Right Event
Match your profiling event to your investigation:
- CPU-bound workload → -e cpu (default)
- I/O-bound workload → -e wall
- Memory issues → -e alloc
- Contention issues → -e lock
- Cache optimization → -e cache-misses or -e branch-misses