PuppeteerProvider drives a real headless Chromium browser through the Puppeteer library. Unlike FetchProvider, it executes JavaScript on every page it visits, which means single-page applications, lazy-loaded content, and client-side-rendered portfolios are all resolved correctly. A single browser process is launched on first use and reused across multiple getPageContent calls; each URL gets its own fresh page that is closed immediately after.

Installation

Puppeteer is listed as a direct dependency of @clyrisai/gitresolve and is downloaded automatically:
npm install @clyrisai/gitresolve
For a global CLI install, the bundled Puppeteer may not be on the path. If running gitresolve with BROWSER_PROVIDER=puppeteer reports that the provider is unavailable, install Puppeteer globally alongside it:
npm install -g puppeteer
Set BROWSER_PROVIDER=puppeteer in your shell profile to make Puppeteer the default for every run without touching your code.
export BROWSER_PROVIDER=puppeteer

Usage

Direct instantiation

Always call provider.cleanup() in a finally block so the browser process is terminated even if an error occurs:
import { PuppeteerProvider } from '@clyrisai/gitresolve';

const provider = new PuppeteerProvider();

try {
  const html = await provider.getPageContent('https://github.com/torvalds');
  console.log(html.slice(0, 500));
} finally {
  await provider.cleanup(); // closes the Chromium process
}

Via the factory

import { createProvider } from '@clyrisai/gitresolve';

const provider = await createProvider('puppeteer');

try {
  const html = await provider.getPageContent('https://gitlab.com/someone');
} finally {
  await provider.cleanup();
}
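
The try/finally pattern in both examples can be factored into a small wrapper so that cleanup is never forgotten. The following is a minimal sketch, not part of the package: the `PageProvider` interface and the `withProvider` helper are hypothetical names introduced here, and the interface only assumes the two methods shown above.

```typescript
// Hypothetical minimal shape: only the two methods used in the examples above.
interface PageProvider {
  getPageContent(url: string): Promise<string>;
  cleanup(): Promise<void>;
}

// Hypothetical helper: runs `work` with the provider and always calls
// cleanup() afterwards, whether `work` resolves or throws.
async function withProvider<T>(
  provider: PageProvider,
  work: (p: PageProvider) => Promise<T>,
): Promise<T> {
  try {
    return await work(provider);
  } finally {
    await provider.cleanup();
  }
}
```

Assuming the object returned by createProvider('puppeteer') matches this shape, a call could look like `withProvider(await createProvider('puppeteer'), (p) => p.getPageContent(url))`.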

CLI

To use PuppeteerProvider from the command line, set the BROWSER_PROVIDER environment variable:
BROWSER_PROVIDER=puppeteer gitresolve resolve resume.pdf

How it works

1. Lazy browser launch

The first call to getPageContent triggers ensureBrowser(), which imports Puppeteer dynamically and launches Chromium with the flags --no-sandbox and --disable-setuid-sandbox. Subsequent calls reuse the same browser instance.

2. New page per URL

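For every getPageContent call, a fresh browser page (browser.newPage()) is created. This keeps page-level state such as the DOM, JavaScript globals, and navigation history from leaking between requests. Pages created with browser.newPage() share the browser's default context, so cookies and cached resources can persist across calls until cleanup() closes the browser.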

3. Navigation and content extraction

Puppeteer navigates to the URL using page.goto(url, { waitUntil, timeout }) and then calls page.content() to retrieve the fully rendered HTML, including all content injected by JavaScript.

4. Page teardown

The page is closed in a finally block after each call, regardless of whether navigation succeeded or threw an error.

5. Browser cleanup

Calling provider.cleanup() closes the Chromium process and sets the internal reference to null. After cleanup, the next getPageContent call will re-launch the browser automatically.
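
The five steps above can be sketched as a small class. This is an illustration of the lifecycle only, not the package's source: `LazyBrowserProvider`, `BrowserLike`, and `PageLike` are hypothetical names, the browser launcher is injected so the sketch runs without Chromium, and calls are assumed to be sequential.

```typescript
// Hypothetical stand-ins for the small part of the browser API used here.
interface PageLike {
  goto(url: string): Promise<void>;
  content(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

class LazyBrowserProvider {
  private browser: BrowserLike | null = null;
  private launch: () => Promise<BrowserLike>;

  // `launch` is injected so the sketch can be exercised with a fake browser.
  constructor(launch: () => Promise<BrowserLike>) {
    this.launch = launch;
  }

  // Step 1: launch on first use, reuse afterwards.
  private async ensureBrowser(): Promise<BrowserLike> {
    if (this.browser === null) {
      this.browser = await this.launch();
    }
    return this.browser;
  }

  async getPageContent(url: string): Promise<string> {
    const browser = await this.ensureBrowser();
    const page = await browser.newPage(); // Step 2: one page per URL
    try {
      await page.goto(url); // Step 3: navigate, then read the rendered HTML
      return await page.content();
    } finally {
      await page.close(); // Step 4: close the page on success or failure
    }
  }

  // Step 5: close the browser and drop the reference so the next call relaunches.
  async cleanup(): Promise<void> {
    if (this.browser !== null) {
      await this.browser.close();
      this.browser = null;
    }
  }
}
```

Storing the browser reference rather than a launch promise keeps the sketch close to the description above; a version meant for concurrent first calls would need to guard against launching twice.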

Launch arguments

Chromium is always started with the following flags:
| Flag | Purpose |
| --- | --- |
| --no-sandbox | Required in Docker containers and most CI environments where the kernel sandbox is restricted |
| --disable-setuid-sandbox | Disables the setuid sandbox as a complementary measure for the same environments |

Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| timeout | number | 30000 | Maximum milliseconds to wait for the page to reach the waitUntil state before throwing a timeout error. |
| waitUntil | string | 'networkidle2' | Defines when navigation is considered complete. See the table below. |
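
As a rough illustration of how these defaults combine with caller-supplied values, the sketch below merges partial options with the documented defaults. The names `PageOptions`, `DEFAULT_OPTIONS`, and `resolveOptions` are hypothetical and are not taken from the package; only the two option names and their default values come from the table.

```typescript
// The four waitUntil values this page documents.
type WaitUntil = 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2';

// Hypothetical option shape mirroring the table above.
interface PageOptions {
  timeout?: number;
  waitUntil?: WaitUntil;
}

// Defaults as documented: 30000 ms and 'networkidle2'.
const DEFAULT_OPTIONS: Required<PageOptions> = {
  timeout: 30000,
  waitUntil: 'networkidle2',
};

// Hypothetical helper: fills in any option the caller left out.
// `??` is used so an explicit undefined also falls back to the default.
function resolveOptions(options: PageOptions = {}): Required<PageOptions> {
  return {
    timeout: options.timeout ?? DEFAULT_OPTIONS.timeout,
    waitUntil: options.waitUntil ?? DEFAULT_OPTIONS.waitUntil,
  };
}
```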

waitUntil values

| Value | When navigation completes |
| --- | --- |
| 'load' | The load event fires: the page and its dependent resources (scripts, stylesheets, images) have loaded. May miss content rendered later by JavaScript. |
| 'domcontentloaded' | The DOMContentLoaded event fires: the DOM is parsed but external resources may still be loading. Fastest, but the most likely to miss late-rendered content. |
| 'networkidle0' | No network connections for at least 500 ms. Safest for heavily async apps, but slowest. |
| 'networkidle2' | No more than 2 network connections for at least 500 ms. Good balance of completeness and speed; this is the default. |
const html = await provider.getPageContent('https://example.com/spa', {
  waitUntil: 'networkidle0', // wait for all async requests to finish
  timeout: 45000,
});
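
Pages that keep connections open may never reach a network-idle state within the timeout. One way to handle this, sketched below under stated assumptions, is to retry with a less strict waitUntil value. `ContentProvider` and `getContentWithFallback` are hypothetical names, and the helper retries on any error, not only on timeouts.

```typescript
type WaitUntil = 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2';

// Hypothetical minimal shape: a provider that accepts the documented options.
interface ContentProvider {
  getPageContent(
    url: string,
    options?: { waitUntil?: WaitUntil; timeout?: number },
  ): Promise<string>;
}

// Hypothetical helper: tries each waitUntil strategy in order and returns
// the first successful result. If all fail, the last error is rethrown.
async function getContentWithFallback(
  provider: ContentProvider,
  url: string,
  strategies: WaitUntil[] = ['networkidle2', 'domcontentloaded'],
  timeout: number = 30000,
): Promise<string> {
  let lastError: unknown = new Error('no waitUntil strategies given');
  for (const waitUntil of strategies) {
    try {
      return await provider.getPageContent(url, { waitUntil, timeout });
    } catch (error) {
      lastError = error; // remember the failure and try the next strategy
    }
  }
  throw lastError;
}
```

The trade-off is that a fallback result may be missing late-rendered content, so callers that need complete HTML may prefer to raise the timeout instead.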
Always call provider.cleanup() in a finally block. If your process exits without calling it, the Chromium subprocess will be left running in the background. Over time this can exhaust system memory and file descriptors in long-lived services.
