Lighthouse performance scores change between runs even when no code has changed. This is normal. Understanding where variability comes from — and how to reduce it — leads to more trustworthy results.
Variability affects the Performance category most significantly; the Accessibility, SEO, and Best Practices scores are generally stable across runs.

Why scores vary

The table below shows common sources of variability and their likelihood across different environments.
| Source | Impact | Typical end user | PageSpeed Insights | Controlled lab |
| --- | --- | --- | --- | --- |
| Page nondeterminism | High | Likely | Likely | Likely |
| Local network variability | High | Likely | Unlikely | Unlikely |
| Tier-1 network variability | Medium | Possible | Possible | Possible |
| Web server variability | Low | Likely | Likely | Likely |
| Client hardware variability | High | Likely | Unlikely | Unlikely |
| Client resource contention | High | Likely | Possible | Unlikely |
| Browser nondeterminism | Medium | Certain | Certain | Certain |

Page nondeterminism

Pages with A/B tests, ad campaigns, or randomly loaded assets produce different results on each run by design. This variance is intentional and cannot be removed by Lighthouse. The only mitigation is ensuring you test the exact same version of the page across runs.

Local network variability

Packet loss, variable traffic prioritization, and last-mile congestion all affect timing. Simulated throttling (Lighthouse’s default) mitigates this by replaying network activity independently of the real network. DevTools throttling only partially masks these effects.

Client hardware variability

The machine running Chrome directly affects how fast JavaScript executes and how quickly the page renders. Simulated throttling partially mitigates this by capping theoretical CPU task execution time during simulation. DevTools throttling does not.

Client resource contention

Other processes running alongside Lighthouse — anti-virus software, browser extensions, or other Lighthouse instances — compete for CPU, memory, and network. Multi-tenant CI environments (Travis, shared AWS/GCP instances) are particularly susceptible.

Browser nondeterminism

Browsers have inherent variability in task scheduling. This is unavoidable when using DevTools throttling, which records whatever the browser observed. Simulated throttling partially mitigates this by re-simulating execution using task durations captured during the real run.

How throttling strategies compare

| Source | Impact | Simulated throttling | DevTools throttling | No throttling |
| --- | --- | --- | --- | --- |
| Page nondeterminism | High | No mitigation | No mitigation | No mitigation |
| Local network variability | High | Mitigated | Partially mitigated | No mitigation |
| Tier-1 network variability | Medium | Mitigated | Partially mitigated | No mitigation |
| Web server variability | Low | No mitigation | Partially mitigated | No mitigation |
| Client hardware variability | High | Partially mitigated | No mitigation | No mitigation |
| Client resource contention | High | Partially mitigated | No mitigation | No mitigation |
| Browser nondeterminism | Medium | Partially mitigated | No mitigation | No mitigation |
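The throttling strategy is selectable from the Lighthouse CLI via the `--throttling-method` flag. A sketch of the three modes compared above (the URL is a placeholder; `simulate` is already the default and is shown only for explicitness):

```shell
# Simulated throttling (default): observe on the real connection, then simulate load.
npx lighthouse https://example.com --throttling-method=simulate

# DevTools throttling: apply throttling in the browser during the real load.
npx lighthouse https://example.com --throttling-method=devtools

# No throttling: report whatever the current network and CPU deliver.
npx lighthouse https://example.com --throttling-method=provided
```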

Strategies to reduce variability

Run on adequate hardware

Underpowered machines produce noisy results. Minimum requirements for reliable performance testing:
  • 2 dedicated CPU cores (4 recommended)
  • 2 GB RAM (4–8 GB recommended)
  • Avoid burstable or shared-core instance types (AWS t instances, GCP shared-core N1/E2)
  • Avoid function-as-a-service infrastructure (AWS Lambda, Google Cloud Functions)
Suitable single-instance options: AWS m5.large, GCP n2-standard-2, Azure D2 (approximately $0.10/hour, ~30 seconds per test).
Do not run multiple Lighthouse tests concurrently on the same machine. Parallel runs compete for CPU and memory, and will skew all results. Scale horizontally across machines rather than vertically on one machine.

Isolate external factors

  • Test against a local or same-network server to eliminate network hops.
  • Disable browser extensions in the test profile.
  • Remove anti-virus scanning from the test environment when possible.
  • Avoid A/B tests or randomized content during performance measurement.
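When launching Lighthouse directly, Chrome flags can keep extensions out of the test instance. A command-line sketch (the URL is a placeholder; `--disable-extensions` is a standard Chromium switch passed through Lighthouse's `--chrome-flags` option):

```shell
# Launch the test Chrome instance with extensions disabled.
npx lighthouse https://example.com --chrome-flags="--disable-extensions"
```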

Run Lighthouse multiple times and use the median

A single run is not reliable enough for decision-making. The median of 5 runs is approximately twice as stable as a single run.
Run Lighthouse at least 3 times per URL and use the median score for comparisons. Use 5 runs when precision matters.
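The median selection itself is simple to reason about. A minimal sketch in plain JavaScript (no Lighthouse dependency; the score values are illustrative, on the 0–1 scale Lighthouse reports in `categories.performance.score`):

```javascript
// Hypothetical helper: pick the median performance score from several runs.
function medianScore(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // For an even run count, average the two middle runs.
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Five runs of the same page, varying by a few points.
console.log(medianScore([0.91, 0.88, 0.93, 0.9, 0.89])); // → 0.9
```

With an odd run count the median is an actual observed run rather than an average of two, which is one reason 3 or 5 runs are preferable to 2 or 4.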
The simplest way to collect multiple runs and extract the median is Lighthouse CI:
```shell
npx -p @lhci/cli lhci collect --url https://example.com -n 5
npx -p @lhci/cli lhci upload --target filesystem --outputDir ./lhci-reports
```
To read the median result from the saved reports:
```javascript
import fs from 'fs';

const manifest = JSON.parse(fs.readFileSync('./lhci-reports/manifest.json', 'utf-8'));
const medianEntry = manifest.find(entry => entry.isRepresentativeRun);
const medianResult = JSON.parse(fs.readFileSync(medianEntry.jsonPath, 'utf-8'));

console.log('Median performance score:', medianResult.categories.performance.score * 100);
```
You can also use the PageSpeed Insights API as the data source for multiple runs:
```shell
npx -p @lhci/cli lhci collect \
  --url https://example.com \
  -n 5 \
  --method=psi \
  --psiApiKey YOUR_API_KEY
```
When running Lighthouse directly via Node, use computeMedianRun to select the representative run:
```javascript
import { spawnSync } from 'child_process';
import { createRequire } from 'module';
import { computeMedianRun } from 'lighthouse/core/lib/median-run.js';

const require = createRequire(import.meta.url);
const lighthouseCli = require.resolve('lighthouse/cli');

const results = [];
for (let i = 0; i < 5; i++) {
  console.log(`Run ${i + 1} of 5...`);
  // encoding: 'utf-8' makes stdout a string rather than a Buffer.
  const { status, stdout } = spawnSync('node', [
    lighthouseCli,
    'https://example.com',
    '--output=json',
  ], { encoding: 'utf-8' });
  // Skip runs that crashed or were killed by a signal.
  if (status !== 0) continue;
  results.push(JSON.parse(stdout));
}

const median = computeMedianRun(results);
console.log('Median performance score:', median.categories.performance.score * 100);
```
For ongoing monitoring, tracking trends is more meaningful than asserting a fixed score threshold. A score that fluctuates within a known range is expected. A score that drops consistently across multiple runs signals a real regression. Lighthouse CI is the recommended tool for tracking score trends over time. It stores results per commit and surfaces regressions in pull requests.
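Trend-friendly checks can be expressed in Lighthouse CI as assertions with a tolerance rather than an exact score. A sketch of a `lighthouserc.js` (the URL, thresholds, and output directory are placeholders; `minScore` is on the 0–1 scale):

```javascript
module.exports = {
  ci: {
    collect: {
      url: ['https://example.com'],
      numberOfRuns: 5, // median of 5 runs, per the guidance above
    },
    assert: {
      assertions: {
        // Warn rather than fail on fluctuations within the known range.
        'categories:performance': ['warn', { minScore: 0.9 }],
      },
    },
    upload: {
      target: 'filesystem',
      outputDir: './lhci-reports',
    },
  },
};
```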
