Overview
Snapshots capture a point-in-time view of everything happening on one server (or across a server group). They’re designed for fast forensics, side-by-side comparisons (before/after a deploy, during an incident), and long-term audit trails.
A snapshot records the current state of a server, including processes, connections, mounts, devices, metrics, jobs, alerts, and more.
Key Points
- A snapshot records the current state of a server (processes, connections, mounts, devices, metrics, jobs, alerts)
- Group snapshots record a whole group at once (all current members, plus recently offline servers)
- Snapshots can be created manually, by API, or automatically via Actions and Watches
- Snapshots are visible on the Snapshots page and linked from servers, groups, jobs and alerts
- Snapshots are retained up to a global cap (default 100,000) and pruned nightly
What a Snapshot Contains
Server Snapshots
All server snapshots include:
- A full copy of the current ServerMonitorData:
  - CPU, memory, load, OS/platform/release/arch, uptime
  - Full process list and process stats
  - Active network connections (including listeners)
  - Network interfaces and stats
  - Disk mounts and filesystem stats
  - Monitors (computed values) and deltas
  - Raw plugin command output
- The last 60 seconds of per-second “quick” samples for CPU/mem/disk/net
- IDs of active jobs at capture time
- IDs of active alerts at capture time
- For workflow sub-jobs, parents may be included for context
Group Snapshots
Group snapshots add fleet context:
- All current members (online) plus recently offline servers (within the last hour)
- Per-server ServerMonitorData objects aligned 1:1 with servers
- Per-server 60-second quick samples aligned 1:1 with servers
- Active alerts and jobs relevant to any member server at capture time
- Each server labeled with online/offline state
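The 1:1 alignment means that index i in each per-server array refers to the same server. A minimal sketch of joining those parallel arrays, assuming hypothetical field names (servers, snapshots, quickmon, online) that are illustrative and not taken from the xyOps schema:

```javascript
// Hypothetical group snapshot shape; all field names here are assumptions.
const groupSnapshot = {
  servers: [
    { id: 's1', hostname: 'web01', online: true },
    { id: 's2', hostname: 'web02', online: false }, // recently offline member
  ],
  snapshots: [{ load: 0.42 }, { load: 0.0 }], // one monitor data object per server
  quickmon: [[{ cpu: 12 }], []],              // per-second samples per server
};

// Join the parallel arrays by index into one record per server.
function zipGroupSnapshot(snap) {
  const n = snap.servers.length;
  if (snap.snapshots.length !== n || snap.quickmon.length !== n) {
    throw new Error('per-server arrays are not aligned 1:1 with servers');
  }
  return snap.servers.map((server, i) => ({
    server,
    data: snap.snapshots[i],
    quick: snap.quickmon[i],
  }));
}

const rows = zipGroupSnapshot(groupSnapshot);
```

Checking the array lengths up front turns a misaligned snapshot into an explicit error rather than silently pairing the wrong data with a server.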
Code Implementation
Snapshot creation is handled in /workspace/source/lib/monitor.js:731-761.
Creating Snapshots
You can create snapshots in several ways:
Manually (UI)
- Server: Open a server page and click “Snapshot”
- Group: Open a group page and click “Snapshot”
Automatically via Actions
Add a Snapshot action to a job or alert:
- Jobs: The job must target a specific server; the snapshot is taken on that server
- Alerts: The snapshot is taken on the alert’s server when the action triggers
By API
- Server: create_snapshot API call
- Group: create_group_snapshot API call
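As a rough illustration of choosing between the two calls, the sketch below only builds a request description and sends nothing. The URL layout (/api/app/NAME/v1), the X-API-Key header, and the server/group body parameters are assumptions made for this example, not confirmed details of the xyOps API; check the API reference for the real request format.

```javascript
// Assumed URL layout, header name, and body fields; verify against the API reference.
function buildSnapshotRequest(baseUrl, apiKey, target) {
  const isGroup = target.type === 'group';
  const apiName = isGroup ? 'create_group_snapshot' : 'create_snapshot';
  const body = isGroup ? { group: target.id } : { server: target.id };
  return {
    url: `${baseUrl.replace(/\/+$/, '')}/api/app/${apiName}/v1`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json', 'X-API-Key': apiKey },
      body: JSON.stringify(body),
    },
  };
}

// Placeholder host, key, and IDs for illustration.
const serverReq = buildSnapshotRequest('https://xyops.example.com/', 'API_KEY_HERE', { type: 'server', id: 's1' });
const groupReq = buildSnapshotRequest('https://xyops.example.com', 'API_KEY_HERE', { type: 'group', id: 'g1' });
```

The returned url and options can be passed to fetch(url, options) once the request format has been confirmed.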
Watches
Watches instruct xyOps to take snapshots every minute for a specified duration. Use these to capture short-lived issues or observe changes during a rollout.
Server Watch
Automatic snapshots
Snapshots are taken when that server’s minute data arrives (each server’s minute offset is deterministically staggered across the fleet)
Group Watch
Automatic snapshots
Snapshots run once per minute on the :30 second mark, capturing all matching servers using their most recent minute samples
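The two timing rules above can be pictured with a small sketch. It is illustrative only: the FNV-1a hash used to derive a stable per-server offset is a stand-in chosen for this example, and the function names are hypothetical; the actual staggering logic in xyOps may differ.

```javascript
// Map a server ID to a stable second-of-minute offset (0-59).
// Uses 32-bit FNV-1a over UTF-16 code units as a stand-in hash (an assumption).
function minuteOffset(serverId) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < serverId.length; i++) {
    hash ^= serverId.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep the value unsigned 32-bit
  }
  return hash % 60;
}

// Group watches fire on the :30 second mark of each minute.
function isGroupWatchTick(date) {
  return date.getUTCSeconds() === 30;
}

const offsetA = minuteOffset('web01.example.com');
const offsetB = minuteOffset('web01.example.com'); // same input, same offset
```

Because the offset depends only on the server ID, each server reports at the same second every minute while the fleet as a whole is spread across the minute.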
Watch Implementation
Group watch checking is handled in /workspace/source/lib/monitor.js:835-859.
Provenance: Automatically created snapshots record source as watch; manually created ones record source as user and include username.
Snapshot Retention
From config.json:242-244:
- Retained up to a global cap (default: 100,000)
- Pruned nightly during database maintenance
- Deduplicated by type/source/time to prevent excessive storage
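The retention rules can be sketched as a dedupe pass followed by a cap. This is a simplified model, not the xyOps maintenance code: the field names (type, source, date) and the exact-match dedupe key are assumptions for illustration.

```javascript
// Sketch: drop duplicates by type/source/time, then keep the newest `cap` snapshots.
// Field names and the exact-match key are assumptions, not the real schema.
function pruneSnapshots(snapshots, cap = 100000) {
  const seen = new Set();
  const unique = snapshots.filter((snap) => {
    const key = `${snap.type}|${snap.source}|${snap.date}`;
    if (seen.has(key)) return false; // duplicate of an earlier entry
    seen.add(key);
    return true;
  });
  // Sort newest first, then truncate to the cap.
  return unique.sort((a, b) => b.date - a.date).slice(0, cap);
}

const kept = pruneSnapshots(
  [
    { type: 'server', source: 'watch', date: 100 },
    { type: 'server', source: 'watch', date: 100 }, // duplicate
    { type: 'server', source: 'user', date: 200 },
    { type: 'group', source: 'watch', date: 300 },
  ],
  2
);
```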
Viewing and Searching
- UI: Click on “Snapshots” in the sidebar
- Links: Snapshots also link from server and group pages, and from job/alert activity when actions create them
- API search: Use search_snapshots to filter and paginate snapshot history
Search Parameters
- Filter by server ID or group ID
- Filter by source (user, watch, alert, job)
- Filter by date range
- Sort by creation time
- Paginate results
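To show how these options combine, here is an in-memory sketch that filters, sorts, and paginates a list of snapshot records. The option names (server, group, source, start, end, offset, limit) and record fields are hypothetical and are not the actual search_snapshots parameters.

```javascript
// In-memory illustration of the search options; all names here are hypothetical.
function searchSnapshots(snapshots, opts = {}) {
  const { server, group, source, start, end, offset = 0, limit = 25 } = opts;
  const matches = snapshots
    .filter((s) => !server || s.server === server)
    .filter((s) => !group || s.group === group)
    .filter((s) => !source || s.source === source)
    .filter((s) => start === undefined || s.date >= start)
    .filter((s) => end === undefined || s.date <= end)
    .sort((a, b) => b.date - a.date); // newest first
  return { total: matches.length, rows: matches.slice(offset, offset + limit) };
}

const history = [
  { id: 'a', server: 's1', source: 'user', date: 10 },
  { id: 'b', server: 's1', source: 'watch', date: 20 },
  { id: 'c', server: 's2', source: 'watch', date: 30 },
  { id: 'd', server: 's1', source: 'watch', date: 40 },
];
const page = searchSnapshots(history, { server: 's1', source: 'watch', limit: 1 });
```

Returning the total alongside the current page lets a caller work out how many further pages remain.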
Comparison and Forensics
Use Cases
Pre/Post Comparisons
Take one snapshot before and one after your change; record links in the related ticket or job notes to compare:
- Process changes (what started/stopped)
- Network connection changes
- Resource usage deltas (CPU, memory, disk)
- Configuration changes
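A process comparison of this kind can be sketched as a set difference between two snapshots. The processes array and command field are assumed names used only for illustration; adapt them to the actual snapshot data.

```javascript
// Compare process lists from two snapshots by command name.
// `processes` and `command` are assumed field names for illustration.
function diffProcesses(before, after) {
  const beforeCmds = new Set(before.processes.map((p) => p.command));
  const afterCmds = new Set(after.processes.map((p) => p.command));
  return {
    started: [...afterCmds].filter((c) => !beforeCmds.has(c)),
    stopped: [...beforeCmds].filter((c) => !afterCmds.has(c)),
  };
}

const preDeploy = { processes: [{ command: 'nginx' }, { command: 'app-v1' }] };
const postDeploy = { processes: [{ command: 'nginx' }, { command: 'app-v2' }] };
const changes = diffProcesses(preDeploy, postDeploy);
```

Here changes.started lists commands present only in the later snapshot and changes.stopped lists those present only in the earlier one.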
Troublesome Jobs
Assign snapshot actions on both job start AND job complete to compare the server state before and after job execution.
Transient Issues
If a problem is bursty or short-lived, start a short watch (5-10 minutes) rather than taking a single manual snapshot to capture the full behavior.
Alert Forensics
When an alert fires, the universal snapshot action captures the exact state at firing time, allowing you to examine processes, connections, and metrics.
Troubleshooting and Tips
Understand minute vs. second data: The core state is minute-granularity ServerMonitorData; the quickmon buffer adds the previous 60 seconds of per-second context.
Best Practices
- Prefer watches for transient issues: Capture continuous state over several minutes
- Align timing with events: Take snapshots before/after deployments or changes
- Use workflow context: Snapshots automatically include parent workflow jobs for full context
- Check permissions: Ensure your user or API Key has the create_snapshots privilege
- Recently offline hosts: Group snapshots include recently offline hosts (last hour) marked offline
Data Structure
Group snapshot creation is handled in /workspace/source/lib/monitor.js:764-832; the relevant excerpt is monitor.js:774-786.
Related APIs
- create_snapshot - Take a snapshot of a specific server
- watch_server - Start/stop a server watch
- create_group_snapshot - Take a snapshot of an entire group
- watch_group - Start/stop a group watch
- search_snapshots - Search and filter snapshot history