Lighthouse performance scores change between runs even when no code has changed. This is normal. Understanding where variability comes from — and how to reduce it — leads to more trustworthy results.
Variability affects the performance category most significantly. Accessibility, SEO, and best practices scores are generally stable across runs.
## Why scores vary
The table below shows common sources of variability and their likelihood across different environments.
| Source | Impact | Typical end user | PageSpeed Insights | Controlled lab |
|---|---|---|---|---|
| Page nondeterminism | High | Likely | Likely | Likely |
| Local network variability | High | Likely | Unlikely | Unlikely |
| Tier-1 network variability | Medium | Possible | Possible | Possible |
| Web server variability | Low | Likely | Likely | Likely |
| Client hardware variability | High | Likely | Unlikely | Unlikely |
| Client resource contention | High | Likely | Possible | Unlikely |
| Browser nondeterminism | Medium | Certain | Certain | Certain |
### Page nondeterminism
Pages with A/B tests, ad campaigns, or randomly loaded assets produce different results on each run by design. This variance is intentional and cannot be removed by Lighthouse. The only mitigation is ensuring you test the exact same version of the page across runs.
### Local network variability
Packet loss, variable traffic prioritization, and last-mile congestion all affect timing. Simulated throttling (Lighthouse’s default) mitigates this by replaying network activity independently of the real network. DevTools throttling only partially masks these effects.
### Client hardware variability
The machine running Chrome directly affects how fast JavaScript executes and how quickly the page renders. Simulated throttling partially mitigates this by capping theoretical CPU task execution time during simulation. DevTools throttling does not.
### Client resource contention
Other processes running alongside Lighthouse — anti-virus software, browser extensions, or other Lighthouse instances — compete for CPU, memory, and network. Multi-tenant CI environments (Travis, shared AWS/GCP instances) are particularly susceptible.
### Browser nondeterminism
Browsers have inherent variability in task scheduling. This is unavoidable when using DevTools throttling, which records whatever the browser observed. Simulated throttling partially mitigates this by re-simulating execution using task durations captured during the real run.
## How throttling strategies compare
| Source | Impact | Simulated throttling | DevTools throttling | No throttling |
|---|---|---|---|---|
| Page nondeterminism | High | No mitigation | No mitigation | No mitigation |
| Local network variability | High | Mitigated | Partially mitigated | No mitigation |
| Tier-1 network variability | Medium | Mitigated | Partially mitigated | No mitigation |
| Web server variability | Low | No mitigation | Partially mitigated | No mitigation |
| Client hardware variability | High | Partially mitigated | No mitigation | No mitigation |
| Client resource contention | High | Partially mitigated | No mitigation | No mitigation |
| Browser nondeterminism | Medium | Partially mitigated | No mitigation | No mitigation |
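To make the chosen strategy explicit rather than relying on defaults, the throttling method can be pinned in a custom Lighthouse config. A minimal sketch (the filename is illustrative; `throttlingMethod` is the documented settings key):

```javascript
// custom-config.js (illustrative filename)
// Pins the throttling strategy so every run uses the same method.
export default {
  extends: 'lighthouse:default',
  settings: {
    // 'simulate' (the default), 'devtools', or 'provided' (no throttling)
    throttlingMethod: 'simulate',
  },
};
```

Pass it on the command line with `lighthouse https://example.com --config-path=./custom-config.js`.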
## Strategies to reduce variability
### Run on adequate hardware
Underpowered machines produce noisy results. Minimum requirements for reliable performance testing:
- 2 dedicated CPU cores (4 recommended)
- 2 GB RAM (4–8 GB recommended)
- Avoid burstable or shared-core instance types (AWS t instances, GCP shared-core N1/E2)
- Avoid function-as-a-service infrastructure (AWS Lambda, Google Cloud Functions)
Suitable single-instance options: AWS m5.large, GCP n2-standard-2, Azure D2 (approximately $0.10/hour, ~30 seconds per test).
Do not run multiple Lighthouse tests concurrently on the same machine. Parallel runs compete for CPU and memory and will skew all results. Scale horizontally across machines rather than vertically on one machine.
### Isolate external factors
- Test against a local or same-network server to eliminate network hops.
- Disable browser extensions in the test profile.
- Remove anti-virus scanning from the test environment when possible.
- Avoid A/B tests or randomized content during performance measurement.
### Run multiple times and use the median

A single run is not reliable enough for decision-making. The median of 5 runs is approximately twice as stable as a single run. Run Lighthouse at least 3 times per URL and use the median score for comparisons; use 5 runs when precision matters.
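As a simplified illustration of picking a median score, here is a minimal sketch (the function name and the example scores are hypothetical; Lighthouse's own `computeMedianRun`, shown later, selects the representative run from timing metrics such as First Contentful Paint rather than from raw scores):

```javascript
// Sketch: pick the median from several Lighthouse performance scores (0–1).
// With an odd run count (3 or 5), the median is a real observed run,
// which is why odd counts are recommended.
function medianScore(scores) {
  const sorted = [...scores].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid] // odd count: the middle element
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: average the middle pair
}

console.log(medianScore([0.91, 0.87, 0.93, 0.89, 0.92])); // → 0.91
```

Unlike a mean, the median ignores a single outlier run, which is exactly the failure mode that variability produces.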
The simplest way to collect multiple runs and extract the median is Lighthouse CI:
```shell
npx -p @lhci/cli lhci collect --url https://example.com -n 5
npx -p @lhci/cli lhci upload --target filesystem --outputDir ./lhci-reports
```
To read the median result from the saved reports:
```javascript
import fs from 'fs';

const manifest = JSON.parse(fs.readFileSync('./lhci-reports/manifest.json', 'utf-8'));
const medianEntry = manifest.find(entry => entry.isRepresentativeRun);
const medianResult = JSON.parse(fs.readFileSync(medianEntry.jsonPath, 'utf-8'));
console.log('Median performance score:', medianResult.categories.performance.score * 100);
```
You can also use the PageSpeed Insights API as the data source for multiple runs:
```shell
npx -p @lhci/cli lhci collect \
  --url https://example.com \
  -n 5 \
  --method psi \
  --psiApiKey YOUR_API_KEY
```
When running Lighthouse directly via Node, use `computeMedianRun` to select the representative run:
```javascript
import { spawnSync } from 'child_process';
import { createRequire } from 'module';
import { computeMedianRun } from 'lighthouse/core/lib/median-run.js';

const require = createRequire(import.meta.url);
const lighthouseCli = require.resolve('lighthouse/cli');

const results = [];
for (let i = 0; i < 5; i++) {
  console.log(`Run ${i + 1} of 5...`);
  const { status = -1, stdout } = spawnSync('node', [
    lighthouseCli,
    'https://example.com',
    '--output=json',
  ]);
  if (status !== 0) continue;
  results.push(JSON.parse(stdout));
}

const median = computeMedianRun(results);
console.log('Median performance score:', median.categories.performance.score * 100);
```
## Tracking trends over time
For ongoing monitoring, tracking trends is more meaningful than asserting a fixed score threshold. A score that fluctuates within a known range is expected. A score that drops consistently across multiple runs signals a real regression.
Lighthouse CI is the recommended tool for tracking score trends over time. It stores results per commit and surfaces regressions in pull requests.
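For CI setups, a minimal `lighthouserc.js` along these lines configures repeated runs and an upload target (the URL and run count are illustrative; `collect` and `upload` are the documented config sections):

```javascript
// lighthouserc.js — minimal sketch; adjust URLs and run count for your project.
module.exports = {
  ci: {
    collect: {
      url: ['https://example.com'],
      // Collect several runs so the representative (median) run is meaningful.
      numberOfRuns: 5,
    },
    upload: {
      // 'temporary-public-storage' is the zero-setup option; a self-hosted
      // LHCI server stores results per commit for long-term trend tracking.
      target: 'temporary-public-storage',
    },
  },
};
```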