
GitResolve separates page-fetching concerns from link extraction through the BrowserProvider interface. This lets you choose between a lightweight plain-fetch approach and a full headless browser without changing any other code. The createProvider factory handles automatic detection so most applications never need to instantiate a provider class directly.

createProvider

Creates and returns the best available BrowserProvider. Each candidate must pass its isAvailable() check before it is returned, so you never receive a provider that reported itself unavailable.
async function createProvider(preferred?: ProviderName): Promise<BrowserProvider>

Parameters

preferred
ProviderName
Optional explicit provider name. When specified, createProvider attempts to use only that provider. If the requested provider is unavailable, the function throws rather than silently falling back.

Returns

Promise<BrowserProvider> — a ready-to-use provider instance.

Resolution order

  1. preferred argument — if supplied, that provider is attempted. Throws Error("Requested provider '${preferred}' is not available") if unavailable.
  2. BROWSER_PROVIDER environment variable — if set to a valid ProviderName, behaves as if preferred was passed (including the throw-on-unavailable behaviour).
  3. Automatic fallback chain: puppeteer → browserless → fetch. The first provider whose isAvailable() resolves to true is returned.
FetchProvider.isAvailable() always returns true because fetch is built into Node.js 18+, so the automatic fallback chain always yields a provider. If neither Puppeteer nor a Browserless instance is reachable, FetchProvider is returned as the final fallback.
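The resolution order can be sketched as follows. This is a hypothetical re-implementation for illustration, not GitResolve's source. resolveProvider, AvailabilityCheck, and the envValue parameter (which stands in for process.env.BROWSER_PROVIDER) are invented names. The sketch also assumes that an unrecognised BROWSER_PROVIDER value is ignored rather than rejected.

```typescript
// Hypothetical sketch of createProvider's documented resolution order.
// Providers are reduced to the two members this logic needs.
type ProviderName = 'puppeteer' | 'browserless' | 'fetch';

interface AvailabilityCheck {
  readonly name: ProviderName;
  isAvailable(): Promise<boolean>;
}

const FALLBACK_CHAIN: readonly ProviderName[] = ['puppeteer', 'browserless', 'fetch'];

function isProviderName(value: string | undefined): value is ProviderName {
  return value !== undefined && (FALLBACK_CHAIN as readonly string[]).includes(value);
}

async function resolveProvider(
  candidates: Record<ProviderName, AvailabilityCheck>,
  preferred?: ProviderName,
  envValue?: string, // stands in for process.env.BROWSER_PROVIDER
): Promise<AvailabilityCheck> {
  // Steps 1 and 2: an explicit choice (argument first, then env var) never falls back.
  const explicit = preferred ?? (isProviderName(envValue) ? envValue : undefined);
  if (explicit !== undefined) {
    const provider = candidates[explicit];
    if (!(await provider.isAvailable())) {
      throw new Error(`Requested provider '${explicit}' is not available`);
    }
    return provider;
  }
  // Step 3: the first available provider in the chain wins.
  for (const name of FALLBACK_CHAIN) {
    if (await candidates[name].isAvailable()) return candidates[name];
  }
  // Unreachable in practice, because the fetch provider always reports true.
  throw new Error('No provider available');
}
```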

Examples

import { createProvider, scrapePortfolio } from '@clyrisai/gitresolve';

// Automatically picks the best available provider
const provider = await createProvider();

try {
  const result = await scrapePortfolio('https://janedoe.dev', provider);
  console.log(result.ownerProfile?.username);
  console.log(result.warnings[0]); // which provider was used
} finally {
  await provider.cleanup();
}

ProviderName type

type ProviderName = 'puppeteer' | 'browserless' | 'fetch';

BrowserProvider interface

The contract that all three provider classes — and any custom provider — must implement.
interface BrowserProvider {
  readonly name: string;
  getPageContent(url: string, options?: BrowserProviderOptions): Promise<string>;
  isAvailable(): Promise<boolean>;
  cleanup(): Promise<void>;
}
name
string (readonly)
Human-readable identifier for the provider. Used in scrapePortfolio warning messages. Values for the built-in providers: 'fetch', 'puppeteer', 'browserless'.
getPageContent
(url: string, options?: BrowserProviderOptions) => Promise<string>
Fetches a URL and returns its HTML as a string. For FetchProvider this is the raw server response. For PuppeteerProvider and BrowserlessProvider this is the DOM serialised after JavaScript has executed.
isAvailable
() => Promise<boolean>
Returns true if the provider can be used in the current environment. FetchProvider always returns true. PuppeteerProvider attempts a dynamic import of puppeteer. BrowserlessProvider attempts a GET /json/version health check with a 3-second timeout.
cleanup
() => Promise<void>
Releases any held resources. For PuppeteerProvider this closes the managed browser instance. For FetchProvider and BrowserlessProvider this is a no-op. Always call cleanup() in a finally block.

BrowserProviderOptions

Options accepted by getPageContent to control navigation behaviour.
interface BrowserProviderOptions {
  timeout?: number;
  waitUntil?: 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2';
}
timeout
number
Navigation timeout in milliseconds. Defaults differ by provider:
| Provider | Default timeout |
| --- | --- |
| FetchProvider | 15000 ms |
| PuppeteerProvider | 30000 ms |
| BrowserlessProvider | 30000 ms |
waitUntil
'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2'
When to consider the navigation complete. Applies only to PuppeteerProvider and BrowserlessProvider. FetchProvider ignores this option because fetch has no page lifecycle events.
| Value | Meaning |
| --- | --- |
| 'load' | Wait for the load event to fire |
| 'domcontentloaded' | Wait for DOMContentLoaded; faster, but client-side rendering may not have finished |
| 'networkidle0' | Wait until there are zero in-flight network requests for 500 ms |
| 'networkidle2' | Wait until there are ≤ 2 in-flight network requests for 500 ms (default) |
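The sketch below shows how the documented defaults might combine with caller-supplied options. It is illustrative only. effectiveOptions and EffectiveOptions are hypothetical names, and the sketch assumes 'networkidle2' is the default for both browser-based providers.

```typescript
type WaitUntil = 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2';
type ProviderName = 'puppeteer' | 'browserless' | 'fetch';

interface BrowserProviderOptions {
  timeout?: number;
  waitUntil?: WaitUntil;
}

// Default timeouts as listed in the timeout table above.
const DEFAULT_TIMEOUT_MS: Record<ProviderName, number> = {
  fetch: 15000,
  puppeteer: 30000,
  browserless: 30000,
};

interface EffectiveOptions {
  timeout: number;
  waitUntil: WaitUntil | undefined; // undefined: no lifecycle event to wait for
}

// Hypothetical helper: caller options win, otherwise the documented defaults apply.
function effectiveOptions(provider: ProviderName, options: BrowserProviderOptions = {}): EffectiveOptions {
  return {
    timeout: options.timeout ?? DEFAULT_TIMEOUT_MS[provider],
    // fetch has no page lifecycle, so waitUntil is dropped for that provider.
    waitUntil: provider === 'fetch' ? undefined : (options.waitUntil ?? 'networkidle2'),
  };
}
```

For example, effectiveOptions('fetch', { waitUntil: 'load' }) keeps the 15000 ms default and discards waitUntil.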

Provider classes

FetchProvider

Uses Node.js built-in fetch to download HTML. No extra dependencies, no browser process. Works on any static site or server-rendered page. Does not execute JavaScript.
import { FetchProvider } from '@clyrisai/gitresolve';

const provider = new FetchProvider();
// name === 'fetch'
// isAvailable() always resolves true
// cleanup() is a no-op
  • Best for: Static portfolio sites, GitHub Pages sites, server-rendered Rails or Django apps, and Next.js apps that use SSR.
  • Not suitable for: SPAs that render links via React, Vue, Angular, or similar client-side routing.
  • Sends a realistic User-Agent header (Mozilla/5.0 (compatible; ClyrisBot/1.0)) to avoid bot-blocking on common static hosts.
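A plain-fetch loader along these lines can be sketched in a few lines of Node.js 18+ code. This is a hypothetical illustration, not FetchProvider's source. fetchPageContent and the injectable fetchImpl parameter are invented for the example, and throwing on non-2xx responses is an assumption. The User-Agent string and the 15000 ms default come from this page.

```typescript
// Hypothetical plain-fetch page loader; assumes Node.js 18+ (global fetch and
// AbortSignal.timeout). fetchImpl is injectable so the logic can be tested offline.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

const USER_AGENT = 'Mozilla/5.0 (compatible; ClyrisBot/1.0)';

async function fetchPageContent(
  url: string,
  timeoutMs = 15000,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  const response = await fetchImpl(url, {
    headers: { 'User-Agent': USER_AGENT },
    redirect: 'follow',
    // Abort the request if it exceeds the navigation timeout.
    signal: AbortSignal.timeout(timeoutMs),
  });
  // Assumption: non-2xx responses are treated as failures.
  if (!response.ok) {
    throw new Error(`GET ${url} failed with HTTP ${response.status}`);
  }
  // Raw server HTML: no JavaScript is executed, so client-rendered links are absent.
  return response.text();
}
```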

PuppeteerProvider

Launches a headless Chromium browser via Puppeteer. One browser instance is reused across multiple getPageContent calls within the same provider instance.
import { PuppeteerProvider } from '@clyrisai/gitresolve';

const provider = new PuppeteerProvider();
// name === 'puppeteer'
// Requires: npm install puppeteer
puppeteer is a peer dependency and is not installed automatically. Run npm install puppeteer to enable this provider. PuppeteerProvider.isAvailable() returns false when puppeteer cannot be imported.
  • Launches Chrome with --no-sandbox --disable-setuid-sandbox flags (required for most CI/container environments).
  • Each page is opened in a new tab and closed after getPageContent returns.
  • Call provider.cleanup() to close the browser and free resources.
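The lifecycle described above can be sketched like this: one lazily launched browser, a new tab per call, and cleanup() closing the browser. It is written against small structural interfaces rather than Puppeteer itself so it runs without Chromium. SharedBrowserPages, PageLike, and BrowserLike are hypothetical names, not GitResolve internals.

```typescript
// Structural stand-ins for the parts of a Puppeteer Browser/Page this sketch uses.
interface PageLike {
  goto(url: string, options?: { timeout?: number; waitUntil?: string }): Promise<unknown>;
  content(): Promise<string>;
  close(): Promise<void>;
}

interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

// Hypothetical sketch: one shared browser, a fresh tab per call.
class SharedBrowserPages {
  private browserPromise: Promise<BrowserLike> | null = null;
  private readonly launch: () => Promise<BrowserLike>;

  constructor(launch: () => Promise<BrowserLike>) {
    this.launch = launch;
  }

  async getPageContent(url: string, timeout = 30000): Promise<string> {
    // Caching the promise (not the browser) lets concurrent first calls share one launch.
    this.browserPromise ??= this.launch();
    const browser = await this.browserPromise;
    const page = await browser.newPage();
    try {
      await page.goto(url, { timeout, waitUntil: 'networkidle2' });
      return await page.content();
    } finally {
      await page.close(); // close the tab whether or not navigation succeeded
    }
  }

  async cleanup(): Promise<void> {
    const pending = this.browserPromise;
    this.browserPromise = null;
    if (pending) await (await pending).close();
  }
}
```

With Puppeteer installed, the launcher could be () => puppeteer.launch({ args: ['--no-sandbox', '--disable-setuid-sandbox'] }), matching the flags listed above.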

BrowserlessProvider

Uses the Browserless /content REST endpoint for full JS rendering without managing a local browser process. Ideal for serverless environments and autoscaled pipelines.
import { BrowserlessProvider } from '@clyrisai/gitresolve';

// Uses BROWSERLESS_URL env var, or defaults to http://localhost:3000
const provider = new BrowserlessProvider();

// Or pass an explicit base URL
const provider2 = new BrowserlessProvider('https://chrome.browserless.io');
// name === 'browserless'
The base URL is resolved in this order:
  1. Constructor argument baseUrl
  2. BROWSERLESS_URL environment variable
  3. Default: http://localhost:3000
isAvailable() performs a GET {baseUrl}/json/version health check with a 3-second timeout. cleanup() is a no-op because the provider holds no local resources; it only makes REST calls.
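The base-URL resolution and health check can be sketched as two small functions. Both are hypothetical and mirror the documented behaviour; they are not the provider's source. The sketch assumes that empty strings count as unset, that trailing slashes are stripped, and that any 2xx response means "available".

```typescript
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

// 1. constructor argument, 2. BROWSERLESS_URL, 3. local default.
// Assumption: empty strings are treated as unset.
function resolveBrowserlessBaseUrl(baseUrl?: string, envUrl?: string): string {
  const chosen = baseUrl || envUrl || 'http://localhost:3000';
  return chosen.replace(/\/+$/, ''); // drop trailing slashes before appending paths
}

// Health check against {baseUrl}/json/version with the documented 3-second timeout.
async function browserlessIsAvailable(baseUrl: string, fetchImpl: FetchLike = fetch): Promise<boolean> {
  try {
    const res = await fetchImpl(`${baseUrl}/json/version`, {
      signal: AbortSignal.timeout(3000),
    });
    return res.ok;
  } catch {
    // Network errors and timeouts both mean "not available".
    return false;
  }
}
```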

Custom provider

You can implement your own provider by satisfying the BrowserProvider interface. This is useful for injecting a mock in tests or integrating an alternative rendering service.
import { scrapePortfolio } from '@clyrisai/gitresolve';
import type { BrowserProvider, BrowserProviderOptions } from '@clyrisai/gitresolve';

// Playwright has a single 'networkidle' state (no connections for 500 ms),
// so both Puppeteer-style idle variants map onto it.
function toPlaywrightWaitUntil(
  waitUntil: BrowserProviderOptions['waitUntil'],
): 'load' | 'domcontentloaded' | 'networkidle' {
  return waitUntil === 'load' || waitUntil === 'domcontentloaded' ? waitUntil : 'networkidle';
}

class PlaywrightProvider implements BrowserProvider {
  readonly name = 'playwright';

  private browser: import('playwright').Browser | null = null;

  async getPageContent(url: string, options?: BrowserProviderOptions): Promise<string> {
    const { chromium } = await import('playwright');
    // Launch once and reuse the browser for later calls.
    this.browser ??= await chromium.launch({ headless: true });

    const page = await this.browser.newPage();
    try {
      await page.goto(url, {
        waitUntil: toPlaywrightWaitUntil(options?.waitUntil),
        timeout: options?.timeout ?? 30000,
      });
      // Await here so the page is not closed before its content is read.
      return await page.content();
    } finally {
      await page.close();
    }
  }

  async isAvailable(): Promise<boolean> {
    try {
      await import('playwright');
      return true;
    } catch {
      return false;
    }
  }

  async cleanup(): Promise<void> {
    await this.browser?.close();
    this.browser = null;
  }
}

// Use directly with scrapePortfolio
const provider = new PlaywrightProvider();
try {
  const result = await scrapePortfolio('https://janedoe.dev', provider);
  console.log(result.ownerProfile?.username);
} finally {
  await provider.cleanup();
}
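For tests, a provider can serve canned HTML instead of touching the network. StaticHtmlProvider below is a hypothetical test double, and the interfaces are copied from this page so the block stands alone. It throws when asked for a URL that has no fixture. Which URLs scrapePortfolio requests beyond the first page is not documented here, so a missing fixture surfaces as an error rather than silently returning empty HTML.

```typescript
interface BrowserProviderOptions {
  timeout?: number;
  waitUntil?: 'load' | 'domcontentloaded' | 'networkidle0' | 'networkidle2';
}

interface BrowserProvider {
  readonly name: string;
  getPageContent(url: string, options?: BrowserProviderOptions): Promise<string>;
  isAvailable(): Promise<boolean>;
  cleanup(): Promise<void>;
}

// Hypothetical test double: serves fixture HTML keyed by URL.
class StaticHtmlProvider implements BrowserProvider {
  readonly name = 'static';
  readonly requested: string[] = [];
  private readonly pages: Map<string, string>;

  constructor(pages: Record<string, string>) {
    this.pages = new Map(Object.entries(pages));
  }

  async getPageContent(url: string, _options?: BrowserProviderOptions): Promise<string> {
    this.requested.push(url); // record calls so tests can assert on them
    const html = this.pages.get(url);
    if (html === undefined) throw new Error(`No fixture for ${url}`);
    return html;
  }

  async isAvailable(): Promise<boolean> {
    return true;
  }

  async cleanup(): Promise<void> {
    // Nothing to release.
  }
}
```

In a test, pass new StaticHtmlProvider({ 'https://janedoe.dev': '<html>…</html>' }) to scrapePortfolio in place of a real provider.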
