Stress testing pushes your system beyond normal operating capacity to identify breaking points, observe how it fails, and test recovery mechanisms.

Purpose

Stress tests help you:
  • Identify system breaking points and maximum capacity
  • Observe how the system fails under extreme load
  • Test system recovery after failure
  • Find memory leaks and resource exhaustion issues
  • Validate that the system degrades gracefully
Stress tests will likely cause errors and failures. This is intentional: the goal is to understand failure modes and breaking points.

Configuration Pattern

Stress tests ramp up in steps beyond normal capacity, hold each level, and then ramp down to observe recovery:
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },  // Below normal load
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },  // Normal load
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },  // Around breaking point
    { duration: '5m', target: 300 },
    { duration: '2m', target: 400 },  // Beyond breaking point
    { duration: '5m', target: 400 },
    { duration: '10m', target: 0 },   // Recovery
  ],
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com');
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  sleep(1);
}
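The staircase above repeats the same ramp-then-hold pair for each VU level before a final ramp-down. As a minimal sketch, assuming you would rather generate that array than hard-code it, a small plain-JavaScript helper can build it. buildStressStages is a hypothetical name, not part of the k6 API.

```javascript
// Hypothetical helper (not part of the k6 API): builds a staircase of
// ramp-then-hold stages, followed by a single ramp-down stage for recovery.
function buildStressStages(levels, rampDuration, holdDuration, recoveryDuration) {
  const stages = [];
  for (const target of levels) {
    // Ramp up to the next VU level...
    stages.push({ duration: rampDuration, target: target });
    // ...then hold it long enough to observe steady-state behavior.
    stages.push({ duration: holdDuration, target: target });
  }
  // Ramp down to 0 VUs and watch how the system recovers.
  stages.push({ duration: recoveryDuration, target: 0 });
  return stages;
}

// Reproduces the nine stages from the example above.
const stages = buildStressStages([100, 200, 300, 400], '2m', '5m', '10m');
```

Inside a k6 script you would assign the result to options.stages. Keeping the levels in one array makes it easier to add or remove a step without editing each stage by hand.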

Using the Ramping VUs Executor

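The ramping-vus executor is well suited to stress testing. Top-level stages are a shortcut for a single ramping-vus scenario. Declaring the scenario explicitly also lets you set options such as startVUs and gracefulRampDown: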
export const options = {
  scenarios: {
    stress: {
      executor: 'ramping-vus',
      startVUs: 0,
      stages: [
        { duration: '2m', target: 100 },
        { duration: '5m', target: 100 },
        { duration: '2m', target: 200 },
        { duration: '5m', target: 200 },
        { duration: '2m', target: 300 },
        { duration: '5m', target: 300 },
        { duration: '2m', target: 400 },
        { duration: '5m', target: 400 },
        { duration: '10m', target: 0 },
      ],
      gracefulRampDown: '30s',
    },
  },
};
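Stress runs are long, so it helps to know the planned duration before starting one. The stages in this scenario add up to 38 minutes (2+5+2+5+2+5+2+5+10), and the run can take somewhat longer while k6 lets in-flight iterations finish. The sketch below sums a stages array. durationToSeconds and totalStageMinutes are hypothetical helpers, and they only understand simple h/m/s/ms duration strings, which is an assumption rather than the full k6 duration syntax.

```javascript
// Hypothetical helpers (not part of the k6 API) for estimating how long a
// stages array will run. Only simple h/m/s/ms duration strings are handled.
const UNIT_SECONDS = { h: 3600, m: 60, s: 1, ms: 0.001 };

function durationToSeconds(duration) {
  // 'ms' is listed before 'm' so that '500ms' is not read as minutes.
  const pattern = /(\d+(?:\.\d+)?)(h|ms|m|s)/g;
  let total = 0;
  let consumed = '';
  let match;
  while ((match = pattern.exec(duration)) !== null) {
    total += parseFloat(match[1]) * UNIT_SECONDS[match[2]];
    consumed += match[0];
  }
  // Reject strings containing anything the pattern did not understand.
  if (consumed === '' || consumed !== duration) {
    throw new Error(`Unsupported duration: ${duration}`);
  }
  return total;
}

function totalStageMinutes(stages) {
  const seconds = stages.reduce((sum, stage) => sum + durationToSeconds(stage.duration), 0);
  return seconds / 60;
}

// The stages from the ramping-vus scenario above.
const stressStages = [
  { duration: '2m', target: 100 },
  { duration: '5m', target: 100 },
  { duration: '2m', target: 200 },
  { duration: '5m', target: 200 },
  { duration: '2m', target: 300 },
  { duration: '5m', target: 300 },
  { duration: '2m', target: 400 },
  { duration: '5m', target: 400 },
  { duration: '10m', target: 0 },
];

const plannedMinutes = totalStageMinutes(stressStages); // 38
```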

Each stress test moves through three phases:
  • Ramp up: Gradually increase load toward the breaking point
  • Peak stress: Maintain extreme load
  • Recovery: Monitor how the system recovers

Stress Test Stages

1. Baseline Load

Start below normal operating capacity to establish a baseline.
{ duration: '2m', target: 100 },
{ duration: '5m', target: 100 },

2. Normal Capacity

Reach expected peak load to verify normal operation.
{ duration: '2m', target: 200 },
{ duration: '5m', target: 200 },

3. Stress Zone

Push beyond normal capacity to find breaking points.
{ duration: '2m', target: 300 },
{ duration: '5m', target: 300 },
{ duration: '2m', target: 400 },
{ duration: '5m', target: 400 },

4. Recovery Period

Ramp down and observe how the system recovers.
{ duration: '10m', target: 0 },

Advanced Stress Testing

Multi-Stage Stress Pattern

Step through several stress levels, holding each one for 5 minutes:
import http from 'k6/http';
import { check } from 'k6';

export const options = {
  stages: [
    // Warm up
    { duration: '1m', target: 50 },
    
    // Stress level 1: 150% of normal (assuming normal load is ~100 VUs)
    { duration: '3m', target: 150 },
    { duration: '5m', target: 150 },
    
    // Stress level 2: 200% of normal
    { duration: '3m', target: 200 },
    { duration: '5m', target: 200 },
    
    // Stress level 3: 300% of normal
    { duration: '3m', target: 300 },
    { duration: '5m', target: 300 },
    
    // Recovery
    { duration: '10m', target: 0 },
  ],
  thresholds: {
    // More lenient thresholds for stress tests
    http_req_duration: ['p(95)<2000'],
    http_req_failed: ['rate<0.1'], // Allow up to 10% errors
  },
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com/api/pizza');
  check(res, { 'status is 200': (r) => r.status === 200 });
}
Stress test thresholds are typically more lenient than those for load tests, because you expect the system to struggle under extreme conditions.
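Because each level in the multi-stage example is defined as a percentage of normal load, the targets can be derived from a single number instead of being hard-coded. This is a minimal sketch, assuming normal load is known in VUs and keeping that example's 1m warm-up at half of normal load, 3m ramps, 5m holds, and 10m recovery. multiStageStress is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the multi-stage stress pattern from a normal
// load (in VUs) and stress levels given as percentages of that load.
function multiStageStress(normalVUs, percents) {
  const stages = [
    // Warm up at half of normal load, as in the example above.
    { duration: '1m', target: Math.round(normalVUs * 0.5) },
  ];
  for (const percent of percents) {
    const target = Math.round((normalVUs * percent) / 100);
    stages.push({ duration: '3m', target }); // ramp to this stress level
    stages.push({ duration: '5m', target }); // hold it
  }
  stages.push({ duration: '10m', target: 0 }); // recovery
  return stages;
}
```

For instance, multiStageStress(100, [150, 200, 300]) reproduces the stages above, while a normal load of 200 VUs would give stress targets of 300, 400, and 600 VUs.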

What to Monitor

System Metrics

  • Response times: When do they start degrading?
  • Error rates: At what load do errors appear?
  • Throughput: Where does it plateau?
  • Resource usage: CPU, memory, disk, network
  • Queue depths: Database connections, message queues

Breaking Point Indicators

import http from 'k6/http';
import { check, sleep } from 'k6';
import { Counter } from 'k6/metrics';

const errors = new Counter('errors');
const timeouts = new Counter('timeouts');

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 100 },
    { duration: '2m', target: 200 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
  ],
  thresholds: {
    errors: ['count<100'],
    timeouts: ['count<50'],
  },
};

export default function() {
  const res = http.get('https://quickpizza.grafana.com', {
    timeout: '10s',
  });
  
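  if (res.error_code === 1050) {
    // k6 error code 1050: the request timed out (status is 0 in this case)
    timeouts.add(1);
  } else if (res.status === 0 || res.status >= 400) {
    // No HTTP response (e.g. connection refused or reset) or an error status
    errors.add(1);
  }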
  
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
  
  sleep(1);
}
Use custom metrics to track specific failure modes like timeouts, connection errors, and server errors.
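The failure modes mentioned above can be separated with a small classifier. This sketch reads only two fields of the k6 Response object, status and error_code, and assumes that error code 1050 marks a request timeout, as listed in k6's error-code reference. classifyFailure and the mode names are hypothetical.

```javascript
// Hypothetical helper: classifies a k6 HTTP response into a failure mode,
// or returns null when the request succeeded.
// Assumption: error_code 1050 is k6's "request timed out" code.
const REQUEST_TIMEOUT_CODE = 1050;

function classifyFailure(res) {
  if (res.status === 0) {
    // No HTTP response at all: either a timeout or a network-level error.
    return res.error_code === REQUEST_TIMEOUT_CODE ? 'timeout' : 'connection_error';
  }
  if (res.status >= 500) return 'server_error';
  if (res.status >= 400) return 'client_error';
  return null;
}
```

In a script like the one above, one option is a single hypothetical Counter, new Counter('failures'), tagged by mode: const mode = classifyFailure(res); if (mode) failures.add(1, { mode: mode });. Tagging keeps one metric while still letting you filter results per failure mode.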

Best Practices

Gradual Stress Increase

  • Incremental steps: Increase load in 25-50% increments to narrow down where the breaking point lies
  • Hold periods: Maintain each stress level for 3-5 minutes to observe steady-state behavior
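To make the increment guideline concrete, one reading of a 25% step is to raise each level by 25% of the previous one. The sketch below, a hypothetical stepTargets helper, lists VU targets under that compounding assumption and stops at a maximum you choose.

```javascript
// Hypothetical helper: VU targets that grow by incrementPercent of the
// previous level (compounding), capped at maxVUs.
function stepTargets(startVUs, incrementPercent, maxVUs) {
  if (startVUs < 1 || incrementPercent <= 0) {
    throw new Error('startVUs must be >= 1 and incrementPercent > 0');
  }
  const factor = 1 + incrementPercent / 100;
  const targets = [];
  for (let step = 0; ; step++) {
    const target = Math.round(startVUs * Math.pow(factor, step));
    if (target > maxVUs) break;
    // Skip duplicates caused by rounding at very small VU counts.
    if (target !== targets[targets.length - 1]) targets.push(target);
  }
  return targets;
}
```

For example, stepTargets(100, 25, 300) returns [100, 125, 156, 195, 244] and stepTargets(100, 50, 400) returns [100, 150, 225, 338]. Each target would then get its own ramp stage and a 3-5 minute hold.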

Recovery Testing

The recovery period is critical. Leave enough time after the peak to see whether the system stabilizes on its own:
export const options = {
  stages: [
    // ... stress stages ...
    
    // Long recovery period to monitor
    { duration: '10m', target: 0 },
  ],
};
During recovery, monitor:
  • How quickly do response times return to normal?
  • Are there lingering errors or stuck processes?
  • Do queues drain properly?
  • Does memory get released?
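To put a number on the first question, you can export p(95) samples taken during the recovery window from your monitoring and find when latency settles back near the pre-stress baseline. This sketch assumes a hypothetical sample shape ({ t, p95Ms }, with t in seconds since recovery began) and a tolerance you choose. It returns the first time after which every remaining sample stays within tolerance, or null if latency never settles.

```javascript
// Hypothetical helper: returns the time (seconds since recovery began) after
// which p(95) latency stays within `tolerance` of the baseline, or null if it
// never settles. Samples must be sorted by t.
function recoveryTime(samples, baselineMs, tolerance) {
  const limitMs = baselineMs * (1 + tolerance);
  let settledAt = null;
  for (const sample of samples) {
    if (sample.p95Ms <= limitMs) {
      // Start (or continue) a run of samples that are back to normal.
      if (settledAt === null) settledAt = sample.t;
    } else {
      // Latency spiked again, so the earlier recovery did not stick.
      settledAt = null;
    }
  }
  return settledAt;
}
```

Requiring the remaining samples to stay within tolerance avoids reporting a brief dip as full recovery.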

Realistic Stress Scenarios

import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 100 },
    { duration: '5m', target: 200 },
    { duration: '2m', target: 300 },
    { duration: '5m', target: 300 },
    { duration: '10m', target: 0 },
  ],
};

export default function() {
  // Mix of read and write operations
  http.get('https://quickpizza.grafana.com/api/pizza');
  sleep(0.5);
  
  http.post('https://quickpizza.grafana.com/api/cart', JSON.stringify({
    pizzaId: 1,
    quantity: 2,
  }), {
    headers: { 'Content-Type': 'application/json' },
  });
  
  sleep(1);
}
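The script above sends one read and one write per iteration. If you want a different ratio, one option is to pick an operation per iteration with a weighted random choice. This sketch assumes an arbitrary 80/20 read/write split. pickWeighted is a hypothetical helper, and random can be injected so the choice is testable deterministically.

```javascript
// Hypothetical helper: picks an operation name according to relative weights,
// e.g. { read: 80, write: 20 } reads in roughly 80% of iterations.
// `random` defaults to Math.random but can be injected for repeatable tests.
function pickWeighted(weights, random = Math.random) {
  const entries = Object.entries(weights);
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let threshold = random() * total;
  for (const [name, weight] of entries) {
    if (threshold < weight) return name;
    threshold -= weight;
  }
  // Guard against floating-point rounding at the upper edge.
  return entries[entries.length - 1][0];
}
```

Inside the default function you would branch on pickWeighted({ read: 80, write: 20 }) and send only the GET or only the POST in that iteration.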

When to Use

  • Capacity planning: Determine absolute maximum capacity
  • Failure mode analysis: Understand how the system fails
  • Auto-scaling validation: Test that scaling mechanisms work
  • Resource limits: Identify resource bottlenecks
  • Pre-production: Before major releases or traffic events

Common Findings

Expected Behaviors

  • Graceful degradation: System slows but doesn’t crash
  • Error handling: Meaningful error messages
  • Resource limits: Clear capacity boundaries
  • Recovery: System returns to normal after stress

Red Flags

Watch for these serious issues:
  • Complete system crashes
  • Cascading failures across services
  • Memory leaks that persist after recovery
  • Data corruption or inconsistency
  • Inability to recover without manual intervention

Analysis Tips

Identify your breaking point by analyzing:
  1. Response time curve: Where does p(95) exceed acceptable limits?
  2. Error rate: When do errors start appearing?
  3. Throughput plateau: Where does requests/sec stop increasing?
  4. Resource exhaustion: When do CPU/memory/connections max out?
export const options = {
  thresholds: {
    http_req_duration: [
      'p(50)<500',   // Median should be fast
      'p(95)<2000',  // 95th percentile can be slower
      'p(99)<5000',  // 99th percentile: tail latency under stress
    ],
    http_req_failed: ['rate<0.1'], // Up to 10% errors acceptable
  },
};
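The first three signals in the list above can be combined into a rough rule of thumb. This sketch assumes you have summarized each stress level into a hypothetical { vus, p95Ms, errorRate, rps } record, sorted by VUs. k6 does not produce this shape directly; you would assemble it from results or dashboards. The sketch picks the first level that breaches a latency or error limit, or where requests/sec grows by less than a chosen fraction. Resource exhaustion is left out because the load generator does not see server-side CPU or memory.

```javascript
// Hypothetical helper: returns the first stress level that looks like a
// breaking point, or null if none is found in the tested range.
// Levels must be sorted by ascending VUs.
function findBreakingPoint(levels, limits) {
  for (let i = 0; i < levels.length; i++) {
    const level = levels[i];
    if (level.p95Ms > limits.p95Ms) {
      return { vus: level.vus, reason: 'p(95) latency above limit' };
    }
    if (level.errorRate > limits.errorRate) {
      return { vus: level.vus, reason: 'error rate above limit' };
    }
    // Throughput plateau: more VUs no longer raise requests/sec by at least
    // minThroughputGain (e.g. 0.05 means 5%).
    if (i > 0) {
      const previous = levels[i - 1];
      const gain = (level.rps - previous.rps) / previous.rps;
      if (gain < limits.minThroughputGain) {
        return { vus: level.vus, reason: 'throughput plateau' };
      }
    }
  }
  return null;
}
```

The limits can mirror the thresholds above, for example findBreakingPoint(levels, { p95Ms: 2000, errorRate: 0.1, minThroughputGain: 0.05 }). The 5% gain cutoff is an arbitrary assumption to tune for your system.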
