Overview

Snapshots capture a point-in-time view of everything happening on one server (or across a server group). They’re designed for fast forensics, side-by-side comparisons (before/after a deploy, during an incident), and long-term audit trails.
A snapshot records the current state of a server including processes, connections, mounts, devices, metrics, jobs, alerts, and more.

Key Points

  • A snapshot records the current state of a server (processes, connections, mounts, devices, metrics, jobs, alerts)
  • Group snapshots record a whole group at once (all current members, plus recently offline servers)
  • Snapshots can be created manually, by API, or automatically via Actions and Watches
  • Snapshots are visible on the Snapshots page and linked from servers, groups, jobs and alerts
  • Snapshots are retained up to a global cap (default 100,000) and pruned nightly

What a Snapshot Contains

Server Snapshots

All server snapshots include:
Minute sample (ServerMonitorData): a full copy of the current ServerMonitorData:
  • CPU, memory, load, OS/platform/release/arch, uptime
  • Full process list and process stats
  • Active network connections (including listeners)
  • Network interfaces and stats
  • Disk mounts and filesystem stats
  • Monitors (computed values) and deltas
  • Raw plugin command output
Quick metrics (QuickmonData): the last 60 seconds of per-second “quick” samples for CPU/mem/disk/net
Context (references):
  • IDs of active jobs at capture time
  • IDs of active alerts at capture time
  • For workflow sub-jobs, parents may be included for context

Group Snapshots

Group snapshots add fleet context:
  • All current members (online) plus recently offline servers (within the last hour)
  • Per-server ServerMonitorData objects aligned 1:1 with servers
  • Per-server 60-second quick samples aligned 1:1 with servers
  • Active alerts and jobs relevant to any member server at capture time
  • Each server labeled with online/offline state

Code Implementation

Snapshot creation from /workspace/source/lib/monitor.js:731-761:
saveSnapshot(server, params, callback) {
    // save snapshot of server data
    var self = this;
    if (!callback) callback = function() {};
    
    var snapshot = Tools.mergeHashes(params, {
        id: Tools.generateShortID('sn'),
        type: 'server',
        server: server.id,
        groups: server.groups,
        alerts: this.findActiveAlerts({ server: server.id }).map( function(alert) { return alert.id; } ),
        jobs: this.findActiveJobs({ server: server.id }).map( function(job) { return job.id; } ),
        quickmon: this.quickMonCache[server.id] || []
    });
    
    // for workflow sub-jobs, also include the parent workflow job in the list
    snapshot.jobs.forEach( function(id) {
        var job = self.activeJobs[id];
        if (!job) return; // sanity
        
        if (job.workflow && job.workflow.job && !snapshot.jobs.includes(job.workflow.job)) {
            snapshot.jobs.push(job.workflow.job);
        }
    });
    
    this.logMonitor(6, "Saving snapshot of server: " + server.id, { hostname: server.hostname, snapshot_id: snapshot.id });
    
    this.unbase.insert( 'snapshots', snapshot.id, snapshot, function(err) {
        if (err) return callback(err);
        callback(null, snapshot.id);
    } );
}
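The parent-workflow step in the excerpt above can be isolated as a small pure function. This is a minimal sketch rather than the xyOps implementation: `withWorkflowParents` and the sample job objects are hypothetical, and only the `workflow.job` field is taken from the excerpt.

```javascript
// Hypothetical helper mirroring the forEach loop in saveSnapshot():
// for each active job ID, append its parent workflow job ID if it is missing.
function withWorkflowParents(jobIds, activeJobs) {
    const result = jobIds.slice(); // copy so the caller's array is untouched
    jobIds.forEach(function (id) {
        const job = activeJobs[id];
        if (!job) return; // unknown ID, skip it
        if (job.workflow && job.workflow.job && !result.includes(job.workflow.job)) {
            result.push(job.workflow.job);
        }
    });
    return result;
}

// Illustrative data only; these job IDs are made up.
const activeJobs = {
    jsub1: { id: 'jsub1', workflow: { job: 'jparent' } },
    jsub2: { id: 'jsub2', workflow: { job: 'jparent' } },
    jsolo: { id: 'jsolo' }
};

const snapshotJobs = withWorkflowParents(['jsub1', 'jsub2', 'jsolo'], activeJobs);
console.log(snapshotJobs); // [ 'jsub1', 'jsub2', 'jsolo', 'jparent' ]
```

Two sub-jobs of the same workflow add the parent only once, because of the `includes` check.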

Creating Snapshots

You can create snapshots in several ways:
Manually (UI):
  • Server: Open a server page and click “Snapshot”
  • Group: Open a group page and click “Snapshot”
Via Actions: Add a Snapshot action to a job or alert:
  • Jobs: The job must target a specific server; the snapshot is taken on that server
  • Alerts: The snapshot is taken on the alert’s server when the action triggers
Note: A snapshot is included by default via universal alert actions on alert_new.
Via API:
  • Server: create_snapshot API call
  • Group: create_group_snapshot API call
Creating snapshots (UI or API) requires the create_snapshots privilege.
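As a rough sketch of an API-driven snapshot, the helper below only builds the request and does not send it. Everything about the wire format is an assumption and should be checked against the API reference: the `/api/app/create_snapshot/v1` path, the `X-API-Key` header, the `server` body field, and the example host are all hypothetical. Only the `create_snapshot` call name and the `create_snapshots` privilege come from this page.

```javascript
// Hypothetical request builder. The URL path, header name and body field
// are assumptions for illustration, not the documented xyOps wire format.
function buildCreateSnapshotRequest(baseUrl, apiKey, serverId) {
    return {
        url: baseUrl.replace(/\/+$/, '') + '/api/app/create_snapshot/v1',
        options: {
            method: 'POST',
            headers: {
                'Content-Type': 'application/json',
                'X-API-Key': apiKey // the key needs the create_snapshots privilege
            },
            body: JSON.stringify({ server: serverId })
        }
    };
}

const req = buildCreateSnapshotRequest('https://xyops.example.com/', 'YOUR_API_KEY', 'YOUR_SERVER_ID');
console.log(req.url); // https://xyops.example.com/api/app/create_snapshot/v1

// To send it (Node 18+ provides a global fetch):
// const res = await fetch(req.url, req.options);
```

Keeping request construction separate from sending makes the shape easy to inspect before any real call is made.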

Watches

Watches instruct xyOps to take snapshots every minute for a specified duration. Use these to capture short-lived issues or observe changes during a rollout.

Server Watch

1. Start a watch: Set from a server page (UI) or through the watch_server API.
2. Automatic snapshots: Snapshots are taken when that server’s minute data arrives (each server’s minute offset is deterministically staggered across the fleet).
3. Cancel: Set the duration to 0 (UI or API). The UI defaults to a 5-minute duration.
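To illustrate what "deterministically staggered" can mean, here is one possible way to derive a stable per-server offset within the minute. This is not taken from xyOps, and the real scheme may differ; `minuteOffset` and the hashing approach are assumptions chosen only to show the idea that the same server ID always maps to the same second.

```javascript
// One possible way to derive a stable second offset (0-59) from a server ID.
// NOT the xyOps algorithm; it only illustrates deterministic staggering.
function minuteOffset(serverId) {
    let hash = 0;
    for (let i = 0; i < serverId.length; i++) {
        // simple rolling hash over the ID's characters, kept in unsigned 32-bit range
        hash = (hash * 31 + serverId.charCodeAt(i)) >>> 0;
    }
    return hash % 60;
}

// The same ID always yields the same offset, so a watched server's
// snapshots would land at a consistent second of each minute.
console.log(minuteOffset('a') === minuteOffset('a')); // true
```

Because the offset depends only on the ID, no coordination between servers is needed to spread submissions across the minute.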

Group Watch

1. Start a watch: Set from a group page (UI) or through the watch_group API.
2. Automatic snapshots: Snapshots run once per minute on the :30 second mark, capturing all matching servers using their most recent minute samples.
3. Offline servers: Recently offline servers (within the last hour) are included and marked offline.
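To make the :30 timing concrete, the helper below computes how long to wait until the next :30 second mark. It is an illustrative sketch; `msUntilNextHalfMinute` is a hypothetical name, and how xyOps actually schedules the check is not shown on this page.

```javascript
// Milliseconds from `now` (a Date) until the next wall-clock second :30.
function msUntilNextHalfMinute(now) {
    const msIntoMinute = now.getUTCSeconds() * 1000 + now.getUTCMilliseconds();
    const target = 30 * 1000;
    // If :30 has already passed (or is exactly now), aim for :30 of the next minute.
    return msIntoMinute < target
        ? target - msIntoMinute
        : 60 * 1000 - msIntoMinute + target;
}

console.log(msUntilNextHalfMinute(new Date('2024-01-01T00:00:10.000Z'))); // 20000
console.log(msUntilNextHalfMinute(new Date('2024-01-01T00:00:45.500Z'))); // 44500
```

Running at :30 rather than :00 gives each server's staggered minute sample time to arrive before the group snapshot reads it.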

Watch Implementation

Group watch checking from /workspace/source/lib/monitor.js:835-859:
checkGroupWatches() {
    // see if any groups need to be watched, runs every minute on the :30 sec
    var self = this;
    var now = Tools.timeNow(true);
    var snap_groups = [];
    
    if (this.state.watches && this.state.watches.groups) {
        for (var group_id in this.state.watches.groups) {
            var expires = this.state.watches.groups[group_id];
            if (now < expires) snap_groups.push(group_id);
        }
    }
    
    if (!snap_groups.length) return;
    
    async.eachSeries( snap_groups,
        function(group_id, callback) {
            self.createGroupSnapshot( group_id, { source: 'watch' }, callback );
        },
        function(err) {
            if (err) self.logError('snapshot', "Failed to watch groups: " + err);
            else self.logMonitor(7, "Group watch snaps complete");
        }
    );
}
Provenance: Snapshots created by a watch record source as watch; manually created ones record source as user and include the username.
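The expiry check in checkGroupWatches() implies a simple state shape: a map from group ID to an expiration time in epoch seconds. The sketch below models that shape with two hypothetical helpers, `setGroupWatch` and `activeGroupWatches`. Deleting the entry when the duration is 0 is an assumption; the excerpt shows only the `now < expires` comparison.

```javascript
// Hypothetical helpers modelled on the state shape read by checkGroupWatches():
// watches.groups maps a group ID to an expiration time (epoch seconds).
function setGroupWatch(watches, groupId, durationSec, now) {
    if (!watches.groups) watches.groups = {};
    if (durationSec > 0) watches.groups[groupId] = now + durationSec;
    else delete watches.groups[groupId]; // assumption: duration 0 removes the watch
    return watches;
}

function activeGroupWatches(watches, now) {
    const groups = (watches && watches.groups) || {};
    return Object.keys(groups).filter(function (groupId) {
        return now < groups[groupId]; // same comparison as the excerpt
    });
}

const now = 1700000000; // fixed epoch seconds, for a repeatable example
const watches = {};
setGroupWatch(watches, 'web', 300, now); // 5-minute watch
setGroupWatch(watches, 'db', 60, now);   // 1-minute watch
setGroupWatch(watches, 'web', 0, now);   // cancel the web watch

console.log(activeGroupWatches(watches, now + 30)); // [ 'db' ]
console.log(activeGroupWatches(watches, now + 90)); // []
```

Storing an absolute expiration instead of a countdown means the minute check needs no per-watch bookkeeping; an expired entry simply stops matching.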

Snapshot Retention

From config.json:242-244:
"snapshots": {
    "max_rows": 100000
}
Snapshots are:
  • Retained up to a global cap (default: 100,000)
  • Pruned nightly during database maintenance
  • Deduplicated by type/source/time to prevent excessive storage
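As an illustration of what a row cap implies, the sketch below keeps the newest `maxRows` records and reports the rest as prunable. It is not the xyOps maintenance routine: the `date` field name and the oldest-first pruning order are assumptions, and deduplication is not modelled.

```javascript
// Illustrative only: split records into those kept under the cap and those pruned.
// `date` (epoch seconds) is an assumed name for the snapshot creation time.
function pruneToCap(snapshots, maxRows) {
    const newestFirst = snapshots.slice().sort(function (a, b) {
        return b.date - a.date;
    });
    return {
        keep: newestFirst.slice(0, maxRows),
        prune: newestFirst.slice(maxRows)
    };
}

const sample = [
    { id: 'sn1', date: 100 },
    { id: 'sn2', date: 300 },
    { id: 'sn3', date: 200 }
];
const result = pruneToCap(sample, 2);
console.log(result.keep.map(s => s.id));  // [ 'sn2', 'sn3' ]
console.log(result.prune.map(s => s.id)); // [ 'sn1' ]
```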

Viewing and Searching

  • UI: Click on “Snapshots” in the sidebar
  • Links: Snapshots also link from server and group pages, and from job/alert activity when actions create them
  • API search: Use search_snapshots to filter and paginate snapshot history

Search Parameters

  • Filter by server ID or group ID
  • Filter by source (user, watch, alert, job)
  • Filter by date range
  • Sort by creation time
  • Paginate results
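The filters listed above can be pictured as a plain in-memory query. The sketch below is a local illustration, not the search_snapshots API: the option names (`server`, `group`, `source`, `start`, `end`, `offset`, `limit`) and the `date` field are assumptions.

```javascript
// Local illustration of the listed filters over an in-memory array.
// Option names and the `date` field are assumed, not the real API parameters.
function searchSnapshots(snapshots, opts) {
    const { server, group, source, start, end, offset = 0, limit = 25 } = opts;
    const matches = snapshots.filter(function (snap) {
        if (server && snap.server !== server) return false;
        if (group && !(snap.groups || []).includes(group)) return false;
        if (source && snap.source !== source) return false;
        if (start !== undefined && snap.date < start) return false;
        if (end !== undefined && snap.date > end) return false;
        return true;
    }).sort(function (a, b) { return b.date - a.date; }); // newest first
    return { total: matches.length, rows: matches.slice(offset, offset + limit) };
}

const history = [
    { id: 'sn1', server: 's1', groups: ['web'], source: 'user',  date: 1000 },
    { id: 'sn2', server: 's1', groups: ['web'], source: 'watch', date: 1060 },
    { id: 'sn3', server: 's2', groups: ['db'],  source: 'watch', date: 1120 },
    { id: 'sn4', server: 's1', groups: ['web'], source: 'watch', date: 1180 }
];

const page = searchSnapshots(history, { server: 's1', source: 'watch', limit: 1 });
console.log(page.total, page.rows.map(s => s.id)); // 2 [ 'sn4' ]
```

Returning the total alongside one page of rows lets a caller work out how many further pages to request.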

Comparison and Forensics

Use Cases

Before and after a change: Take one snapshot before your change and one after it, and record the links in the related ticket or job notes. Comparing the two shows:
  • Process changes (what started/stopped)
  • Network connection changes
  • Resource usage deltas (CPU, memory, disk)
  • Configuration changes
Job start and completion: Assign snapshot actions to both job start and job completion to compare the server state before and after job execution.
Transient issues: If a problem is bursty or short-lived, start a short watch (5-10 minutes) rather than taking a single manual snapshot, so the full behavior is captured.
Alert forensics: When an alert fires, the universal snapshot action captures the exact state at firing time, allowing you to examine processes, connections, and metrics.
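A before/after comparison of process lists can be sketched as a set difference. The helper below is hypothetical: it assumes each snapshot exposes a `processes` array whose entries carry a `command` string, which may not match the real ServerMonitorData field names.

```javascript
// Hypothetical comparison helper. Assumes snapshot.processes is an array of
// objects with a `command` string; real field names may differ.
function diffProcesses(before, after) {
    const beforeCmds = new Set(before.processes.map(p => p.command));
    const afterCmds = new Set(after.processes.map(p => p.command));
    return {
        started: [...afterCmds].filter(cmd => !beforeCmds.has(cmd)),
        stopped: [...beforeCmds].filter(cmd => !afterCmds.has(cmd))
    };
}

// Made-up sample data for two snapshots of the same server.
const beforeSnap = { processes: [{ command: 'nginx' }, { command: 'node app.js' }, { command: 'cron' }] };
const afterSnap  = { processes: [{ command: 'nginx' }, { command: 'node app.js' }, { command: 'backup.sh' }] };

const diff = diffProcesses(beforeSnap, afterSnap);
console.log(diff); // { started: [ 'backup.sh' ], stopped: [ 'cron' ] }
```

Comparing by command rather than PID avoids reporting a restarted process as both stopped and started, at the cost of hiding restarts entirely.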

Troubleshooting and Tips

Understand minute vs. second data: The core state is minute-granularity ServerMonitorData; the quickmon buffer adds the previous 60 seconds of per-second context.
Group snapshots timing: Group watch runs on :30; servers submit minute samples on staggered offsets. Group snapshots use the latest saved minute for each server.

Best Practices

  • Prefer watches for transient issues: Capture continuous state over several minutes
  • Align timing with events: Take snapshots before/after deployments or changes
  • Use workflow context: Snapshots automatically include parent workflow jobs for full context
  • Check permissions: Ensure your user or API Key has create_snapshots privilege
  • Recently offline hosts: Group snapshots include recently offline hosts (last hour) marked offline

Data Structure

Excerpt from group snapshot creation (/workspace/source/lib/monitor.js:764-832; lines 774-786 shown):
// gather all active servers for group, and make copies so we can decorate
var servers = Object.values(this.servers).filter( function(server) {
    // also filter out servers that were JUST added, to avoid possible race condition
    return server.groups.includes(group_id) && ((now - server.modified) > 1);
} ).map( function(server) {
    return Object.assign( {}, server, { offline: false } );
} );

// add in recently offline (cached) servers too, also copy and decorate
servers = servers.concat( Object.values(this.serverCache).filter( function(server) {
    return server.groups.includes(group_id) && !self.servers[server.id] && (server.modified > now - 3600);
} ).map( function(server) {
    return Object.assign( {}, server, { offline: true } );
} ) );
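The excerpt relies on variables from its enclosing function (`now`, `self`, `group_id`). A standalone restatement, with those passed in as arguments, makes the two filters easier to try out. `gatherGroupServers` and the sample servers are hypothetical; the conditions themselves are copied from the excerpt, and timestamps are assumed to be epoch seconds.

```javascript
// Standalone restatement of the excerpt; inputs that the original reads from
// `this` and the enclosing scope are passed in explicitly.
function gatherGroupServers(servers, serverCache, groupId, now) {
    // online members, skipping servers added within the last second
    const online = Object.values(servers)
        .filter(s => s.groups.includes(groupId) && (now - s.modified) > 1)
        .map(s => Object.assign({}, s, { offline: false }));
    // cached servers that are no longer live but were seen within the last hour
    const offline = Object.values(serverCache)
        .filter(s => s.groups.includes(groupId) && !servers[s.id] && s.modified > now - 3600)
        .map(s => Object.assign({}, s, { offline: true }));
    return online.concat(offline);
}

const now = 1700000000; // fixed epoch seconds, for a repeatable example
const servers = {
    s1: { id: 's1', groups: ['web'], modified: now - 120 },
    s2: { id: 's2', groups: ['web'], modified: now }        // just added, skipped
};
const serverCache = {
    s1: { id: 's1', groups: ['web'], modified: now - 120 },  // still live, skipped
    s3: { id: 's3', groups: ['web'], modified: now - 600 },  // offline 10 minutes
    s4: { id: 's4', groups: ['web'], modified: now - 7200 }, // offline too long
    s5: { id: 's5', groups: ['db'],  modified: now - 600 }   // different group
};

const members = gatherGroupServers(servers, serverCache, 'web', now);
console.log(members.map(s => s.id + ':' + s.offline)); // [ 's1:false', 's3:true' ]
```

Copying each server with `Object.assign` before adding the `offline` flag leaves the live and cached records unmodified.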

Related APIs

  • create_snapshot - Take a snapshot of a specific server
  • watch_server - Start/stop a server watch
  • create_group_snapshot - Take a snapshot of an entire group
  • watch_group - Start/stop a group watch
  • search_snapshots - Search and filter snapshot history

See Also

  • Servers - Server management and monitoring
  • Monitors - Time-series metrics and expressions
  • Alerts - Alert definitions and actions
  • Data Objects: Snapshot, GroupSnapshot, ServerMonitorData, QuickmonData
