Puppeteer Provider — JavaScript Rendering

PuppeteerProvider drives a real headless Chromium browser through the Puppeteer library. Unlike FetchProvider, it executes JavaScript on every page it visits, so single-page applications, lazy-loaded content, and client-side-rendered portfolios are rendered before their HTML is returned. A single browser process is launched on first use and reused across multiple getPageContent calls; each URL gets its own fresh page that is closed immediately after.

Installation

Puppeteer is listed as a direct dependency of @clyrisai/gitresolve and is downloaded automatically:

npm install @clyrisai/gitresolve

For a global CLI install, the bundled Puppeteer may not be found at runtime. If running gitresolve with BROWSER_PROVIDER=puppeteer reports that the provider is unavailable, install Puppeteer globally alongside it:

npm install -g puppeteer

Set BROWSER_PROVIDER=puppeteer in your shell profile to make Puppeteer the default for every run without touching your code.

export BROWSER_PROVIDER=puppeteer

Usage

Direct instantiation

Always call provider.cleanup() in a finally block so the browser process is terminated even if an error occurs:

import { PuppeteerProvider } from '@clyrisai/gitresolve';

const provider = new PuppeteerProvider();

try {
  const html = await provider.getPageContent('https://github.com/torvalds');
  console.log(html.slice(0, 500));
} finally {
  await provider.cleanup(); // closes the Chromium process
}

Via the factory

import { createProvider } from '@clyrisai/gitresolve';

const provider = await createProvider('puppeteer');

try {
  const html = await provider.getPageContent('https://gitlab.com/someone');
} finally {
  await provider.cleanup();
}
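
When several URLs need to be fetched, the same try/finally pattern can be wrapped in a small helper so that cleanup cannot be forgotten. The sketch below relies only on the two methods shown above (getPageContent and cleanup); withProvider and fetchAll are hypothetical names for illustration and are not part of the package.

```javascript
// Hypothetical helper: runs `work` with a provider and always cleans up.
// `provider` is any object exposing getPageContent(url) and cleanup().
async function withProvider(provider, work) {
  try {
    return await work(provider);
  } finally {
    await provider.cleanup(); // runs on success and on error
  }
}

// Hypothetical usage: fetch several URLs one after another,
// reusing a single provider (and therefore a single browser).
async function fetchAll(provider, urls) {
  return withProvider(provider, async (p) => {
    const pages = [];
    for (const url of urls) {
      pages.push(await p.getPageContent(url));
    }
    return pages;
  });
}
```

With a real provider this could be called as fetchAll(await createProvider('puppeteer'), [...urls]).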

CLI

To use PuppeteerProvider from the command line, set the BROWSER_PROVIDER environment variable:

BROWSER_PROVIDER=puppeteer gitresolve resolve resume.pdf

How it works

Lazy browser launch

The first call to getPageContent triggers ensureBrowser(), which imports Puppeteer dynamically and launches Chromium with the flags --no-sandbox and --disable-setuid-sandbox. Subsequent calls reuse the same browser instance.

New page per URL

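For every getPageContent call, a fresh browser page (browser.newPage()) is created, so DOM and in-page JavaScript state never carry over between requests. Pages opened this way still share the browser's default context, which means cookies, local storage, and the HTTP cache persist until the browser is closed.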

Navigation and content extraction

Puppeteer navigates to the URL using page.goto(url, { waitUntil, timeout }) and then calls page.content() to retrieve the fully rendered HTML — including all content injected by JavaScript.

Page teardown

The page is closed in a finally block after each call, regardless of whether navigation succeeded or threw an error.
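
The per-URL steps above (new page, navigation, content extraction, teardown) can be sketched as a single function. This is an illustration rather than the package's actual source: renderPage is a hypothetical name, and the defaults are taken from the Options section of this page. The Puppeteer calls used (newPage, goto, content, close) are part of its public API.

```javascript
// Sketch of one getPageContent call against an already-launched browser.
// `renderPage` is an illustrative name, not an export of the package.
async function renderPage(browser, url, options = {}) {
  // Defaults as documented in the Options section.
  const { timeout = 30000, waitUntil = 'networkidle2' } = options;

  const page = await browser.newPage(); // fresh page for this URL
  try {
    await page.goto(url, { waitUntil, timeout });
    return await page.content(); // HTML after client-side rendering
  } finally {
    await page.close(); // runs whether navigation succeeded or threw
  }
}
```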

Browser cleanup

Calling provider.cleanup() closes the Chromium process and sets the internal reference to null. After cleanup, the next getPageContent call will re-launch the browser automatically.
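
The browser lifecycle (lazy launch, reuse, cleanup, relaunch) can be approximated as follows. This is a minimal sketch under stated assumptions, not the provider's real implementation: BrowserHolder and its loadPuppeteer parameter are hypothetical, and the loader is injectable only so the pattern can be exercised without starting Chromium.

```javascript
// Sketch of the browser lifecycle: lazy launch, reuse, cleanup, relaunch.
// BrowserHolder and loadPuppeteer are hypothetical names for illustration.
class BrowserHolder {
  constructor(loadPuppeteer = () => import('puppeteer')) {
    this.loadPuppeteer = loadPuppeteer;
    this.browser = null; // no Chromium process until first use
  }

  async ensureBrowser() {
    if (this.browser === null) {
      const mod = await this.loadPuppeteer(); // dynamic import on first use
      const puppeteer = mod.default ?? mod;   // ESM default or CommonJS export
      this.browser = await puppeteer.launch({
        args: ['--no-sandbox', '--disable-setuid-sandbox'],
      });
    }
    return this.browser; // later calls reuse the same instance
  }

  async cleanup() {
    if (this.browser !== null) {
      const browser = this.browser;
      this.browser = null;   // the next ensureBrowser() launches again
      await browser.close(); // terminates the Chromium process
    }
  }
}
```

Note that this simplified version does not guard against two concurrent first calls each launching a browser; caching the launch promise instead of the resolved browser would be one way to handle that.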

Launch arguments

Chromium is always started with the following flags:

| Flag | Purpose |
| --- | --- |
| `--no-sandbox` | Required in Docker containers and most CI environments where the kernel sandbox is restricted |
| `--disable-setuid-sandbox` | Disables the setuid sandbox as a complementary measure for the same environments |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `timeout` | `number` | `30000` | Maximum milliseconds to wait for the page to reach the `waitUntil` state before throwing a timeout error. |
| `waitUntil` | `string` | `'networkidle2'` | Defines when navigation is considered complete. See the table below. |

`waitUntil` values

| Value | When navigation completes |
| --- | --- |
| `'load'` | The `load` event fires: the page and its dependent resources (scripts, stylesheets, images) are loaded. Faster than the network-idle options, but may miss late-rendered content. |
| `'domcontentloaded'` | The `DOMContentLoaded` event fires: the DOM is parsed but external resources may still be loading. This is the earliest of the four states. |
| `'networkidle0'` | No open network connections for at least 500 ms. Safest for heavily async apps, but slowest. |
| `'networkidle2'` | No more than 2 network connections for at least 500 ms. Good balance of completeness and speed; this is the default. |

const html = await provider.getPageContent('https://example.com/spa', {
  waitUntil: 'networkidle0', // wait for all async requests to finish
  timeout: 45000,
});
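
A strict waitUntil value can time out on pages that never become fully idle, for example because of long-polling. One possible way to handle this is to retry with progressively looser conditions. The helper below is a hypothetical sketch built only on the getPageContent(url, options) call shown above; getWithFallback and the order of fallbacks are assumptions, not part of the package.

```javascript
// Hypothetical helper: try the strictest wait condition first, then relax it.
// `provider` is any object exposing getPageContent(url, options).
async function getWithFallback(provider, url, timeout = 30000) {
  const strategies = ['networkidle0', 'networkidle2', 'load'];
  let lastError;
  for (const waitUntil of strategies) {
    try {
      return await provider.getPageContent(url, { waitUntil, timeout });
    } catch (err) {
      // This sketch retries on any error, not only on timeouts.
      lastError = err;
    }
  }
  throw lastError; // every strategy failed
}
```

In the worst case this waits up to three times the timeout, so a shorter timeout per attempt may be appropriate.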

Always call provider.cleanup() in a finally block. If your process exits without calling it, the Chromium subprocess will be left running in the background. Over time this can exhaust system memory and file descriptors in long-lived services.
