Lighthouse performance scores change between runs even when no code has changed. This is normal. Understanding where variability comes from — and how to reduce it — leads to more trustworthy results.
Variability affects the Performance category most significantly; the Accessibility, SEO, and Best Practices scores are generally stable across runs.

Why scores vary

The table below shows common sources of variability and their likelihood across different environments.
| Source | Impact | Typical end user | PageSpeed Insights | Controlled lab |
| --- | --- | --- | --- | --- |
| Page nondeterminism | High | Likely | Likely | Likely |
| Local network variability | High | Likely | Unlikely | Unlikely |
| Tier-1 network variability | Medium | Possible | Possible | Possible |
| Web server variability | Low | Likely | Likely | Likely |
| Client hardware variability | High | Likely | Unlikely | Unlikely |
| Client resource contention | High | Likely | Possible | Unlikely |
| Browser nondeterminism | Medium | Certain | Certain | Certain |

Page nondeterminism

Pages with A/B tests, ad campaigns, or randomly loaded assets produce different results on each run by design. This variance is intentional and cannot be removed by Lighthouse. The only mitigation is ensuring you test the exact same version of the page across runs.

Local network variability

Packet loss, variable traffic prioritization, and last-mile congestion all affect timing. Simulated throttling (Lighthouse’s default) mitigates this by replaying network activity independently of the real network. DevTools throttling only partially masks these effects.

Client hardware variability

The machine running Chrome directly affects how fast JavaScript executes and how quickly the page renders. Simulated throttling partially mitigates this by capping theoretical CPU task execution time during simulation. DevTools throttling does not.

Client resource contention

Other processes running alongside Lighthouse — anti-virus software, browser extensions, or other Lighthouse instances — compete for CPU, memory, and network. Multi-tenant CI environments (Travis, shared AWS/GCP instances) are particularly susceptible.

Browser nondeterminism

Browsers have inherent variability in task scheduling. This is unavoidable when using DevTools throttling, which records whatever the browser observed. Simulated throttling partially mitigates this by re-simulating execution using task durations captured during the real run.

How throttling strategies compare

| Source | Impact | Simulated throttling | DevTools throttling | No throttling |
| --- | --- | --- | --- | --- |
| Page nondeterminism | High | No mitigation | No mitigation | No mitigation |
| Local network variability | High | Mitigated | Partially mitigated | No mitigation |
| Tier-1 network variability | Medium | Mitigated | Partially mitigated | No mitigation |
| Web server variability | Low | No mitigation | Partially mitigated | No mitigation |
| Client hardware variability | High | Partially mitigated | No mitigation | No mitigation |
| Client resource contention | High | Partially mitigated | No mitigation | No mitigation |
| Browser nondeterminism | Medium | Partially mitigated | No mitigation | No mitigation |
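The throttling strategy is selectable from the Lighthouse CLI via the `--throttling-method` flag. A sketch of the three modes compared above (the URL is a placeholder; `simulate` is already the default and is shown only for explicitness):

```shell
# Simulated throttling (default): observe on the real connection, then simulate load.
npx lighthouse https://example.com --throttling-method=simulate

# DevTools throttling: apply throttling in the browser during the real load.
npx lighthouse https://example.com --throttling-method=devtools

# No throttling: report whatever the current network and CPU deliver.
npx lighthouse https://example.com --throttling-method=provided
```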

Strategies to reduce variability

Run on adequate hardware

Underpowered machines produce noisy results. Minimum requirements for reliable performance testing:
  • 2 dedicated CPU cores (4 recommended)
  • 2 GB RAM (4–8 GB recommended)
  • Avoid burstable or shared-core instance types (AWS t instances, GCP shared-core N1/E2)
  • Avoid function-as-a-service infrastructure (AWS Lambda, Google Cloud Functions)
Suitable single-instance options: AWS m5.large, GCP n2-standard-2, Azure D2 (approximately $0.10/hour, ~30 seconds per test).
Do not run multiple Lighthouse tests concurrently on the same machine. Parallel runs compete for CPU and memory, and will skew all results. Scale horizontally across machines rather than vertically on one machine.

Isolate external factors

  • Test against a local or same-network server to eliminate network hops.
  • Disable browser extensions in the test profile.
  • Remove anti-virus scanning from the test environment when possible.
  • Avoid A/B tests or randomized content during performance measurement.
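When launching Lighthouse directly, Chrome flags can keep extensions out of the test instance. A command-line sketch (the URL is a placeholder; `--disable-extensions` is a standard Chromium switch passed through Lighthouse's `--chrome-flags` option):

```shell
# Launch the test Chrome instance with extensions disabled.
npx lighthouse https://example.com --chrome-flags="--disable-extensions"
```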

Run Lighthouse multiple times and use the median

A single run is not reliable enough for decision-making. The median of 5 runs is approximately twice as stable as a single run.
Run Lighthouse at least 3 times per URL and use the median score for comparisons. Use 5 runs when precision matters.
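The median selection itself is simple to reason about. A minimal sketch in plain JavaScript (no Lighthouse dependency; the score values are illustrative, on the 0–1 scale Lighthouse reports in `categories.performance.score`):

```javascript
// Hypothetical helper: pick the median performance score from several runs.
function medianScore(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // For an even run count, average the two middle runs.
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Five runs of the same page, varying by a few points.
console.log(medianScore([0.91, 0.88, 0.93, 0.9, 0.89])); // → 0.9
```

With an odd run count the median is an actual observed run rather than an average of two, which is one reason 3 or 5 runs are preferable to 2 or 4.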
The simplest way to collect multiple runs and extract the median is Lighthouse CI:
```shell
npx -p @lhci/cli lhci collect --url https://example.com -n 5
npx -p @lhci/cli lhci upload --target filesystem --outputDir ./lhci-reports
```
To read the median result from the saved reports:
```javascript
import fs from 'fs';

const manifest = JSON.parse(fs.readFileSync('./lhci-reports/manifest.json', 'utf-8'));
const medianEntry = manifest.find(entry => entry.isRepresentativeRun);
const medianResult = JSON.parse(fs.readFileSync(medianEntry.jsonPath, 'utf-8'));

console.log('Median performance score:', medianResult.categories.performance.score * 100);
```
You can also use the PageSpeed Insights API as the data source for multiple runs:
```shell
npx -p @lhci/cli lhci collect \
  --url https://example.com \
  -n 5 \
  --method=psi \
  --psiApiKey YOUR_API_KEY
```
When running Lighthouse directly via Node, use computeMedianRun to select the representative run:
```javascript
import { spawnSync } from 'child_process';
import { createRequire } from 'module';
import { computeMedianRun } from 'lighthouse/core/lib/median-run.js';

const require = createRequire(import.meta.url);
const lighthouseCli = require.resolve('lighthouse/cli');

const results = [];
for (let i = 0; i < 5; i++) {
  console.log(`Run ${i + 1} of 5...`);
  // encoding: 'utf-8' makes stdout a string rather than a Buffer.
  const { status, stdout } = spawnSync('node', [
    lighthouseCli,
    'https://example.com',
    '--output=json',
  ], { encoding: 'utf-8' });
  // Skip runs that crashed or were killed by a signal.
  if (status !== 0) continue;
  results.push(JSON.parse(stdout));
}

const median = computeMedianRun(results);
console.log('Median performance score:', median.categories.performance.score * 100);
```
For ongoing monitoring, tracking trends is more meaningful than asserting a fixed score threshold. A score that fluctuates within a known range is expected. A score that drops consistently across multiple runs signals a real regression. Lighthouse CI is the recommended tool for tracking score trends over time. It stores results per commit and surfaces regressions in pull requests.
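Trend-friendly checks can be expressed in Lighthouse CI as assertions with a tolerance rather than an exact score. A sketch of a `lighthouserc.js` (the URL, thresholds, and output directory are placeholders; `minScore` is on the 0–1 scale):

```javascript
module.exports = {
  ci: {
    collect: {
      url: ['https://example.com'],
      numberOfRuns: 5, // median of 5 runs, per the guidance above
    },
    assert: {
      assertions: {
        // Warn rather than fail on fluctuations within the known range.
        'categories:performance': ['warn', { minScore: 0.9 }],
      },
    },
    upload: {
      target: 'filesystem',
      outputDir: './lhci-reports',
    },
  },
};
```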
