
Overview

The Page Ripper endpoint captures any public web page using headless Chrome and returns a downloadable ZIP archive containing:
  • Self-contained HTML (via SingleFile)
  • Categorized assets (CSS, JavaScript, images, fonts, media)
The entire capture happens synchronously within a single HTTP request (no background jobs).
This endpoint enforces SSRF protection at multiple layers: hostname validation, DNS resolution checks, and post-redirect validation. Requests to private/internal addresses are blocked.

Endpoint

POST /api/download-page

Authentication

Requires a Supabase access token:
Authorization: Bearer <supabase_access_token>

Request Body

url
string
required
Full HTTP/HTTPS URL of the page to capture.
Restrictions:
  • Must use http:// or https:// protocol
  • Cannot resolve to private/internal IP addresses
  • Cannot be localhost or reserved IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, etc.)

Example Request

{
  "url": "https://example.com/landing-page"
}

Response

Success (200 OK)

Returns a ZIP file download:
HTTP/1.1 200 OK
Content-Type: application/zip
Content-Disposition: attachment; filename="example.com-2026-03-02T14-30-15.zip"
Content-Length: 4567890

[ZIP binary data]

ZIP Archive Structure

example.com-2026-03-02T14-30-15.zip
├── page.html                    # Self-contained HTML (SingleFile output)
└── assets/
    ├── css/                     # Stylesheets
    │   ├── main.css
    │   └── styles_1.css
    ├── js/                      # JavaScript files
    │   ├── app.js
    │   └── vendor.js
    ├── images/                  # PNG, JPG, SVG, WebP, AVIF, ICO, GIF
    │   ├── logo.png
    │   └── hero.jpg
    ├── fonts/                   # WOFF, WOFF2, TTF, OTF, EOT
    │   └── font.woff2
    └── media/                   # MP4, WebM, MP3, OGG, WAV
        └── video.mp4
HTML documents are not duplicated in the assets folder — page.html at the root is the only HTML file. The SingleFile library inlines critical resources directly into this HTML.

Error Codes

Status  Condition                 Response Body
400     Missing url field         {"error": "Missing required field: url"}
400     Invalid URL format        {"error": "Invalid URL."}
400     Non-HTTP/HTTPS protocol   {"error": "Invalid URL. Must use http or https protocol."}
400     Private/internal address  {"error": "Requests to private/internal addresses are not allowed."}
400     DNS rebinding detected    {"error": "Requests to private/internal addresses are not allowed."}
401     Missing bearer token      {"error": "Missing Authorization bearer token."}
401     Invalid/expired token     {"error": "Invalid or expired session."}
405     Non-POST method           {"error": "Method not allowed."}
429     Rate limit exceeded       {"error": "Rate limit exceeded. Maximum 10 page captures per 15 minutes."}
500     Capture failed            {"error": "Page capture failed: <details>"}
504     Timeout                   {"error": "Page capture timed out."}

Rate Limiting

Each authenticated user is limited to:
  • 10 captures per 15-minute window
  • Tracked via page_rip_log table keyed by user_id
Rate-limited responses include a Retry-After header (seconds until reset).
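The sliding-window check can be sketched as a pure function. This is a hypothetical sketch: the real handler counts rows in page_rip_log per user_id, while `checkRateLimit` and the `timestamps` array here are illustrative stand-ins for those logged capture times.

```javascript
// Hypothetical sketch of the 15-minute sliding-window rate limit.
// `timestamps` stands in for the capture times logged in page_rip_log
// for one user_id; names and shapes here are illustrative.
const WINDOW_MS = 15 * 60 * 1000
const MAX_CAPTURES = 10

function checkRateLimit(timestamps, now = Date.now()) {
  const recent = timestamps.filter(t => now - t < WINDOW_MS)
  if (recent.length < MAX_CAPTURES) {
    return { allowed: true }
  }
  // Retry-After: seconds until the oldest in-window capture leaves the window
  const oldest = Math.min(...recent)
  const retryAfter = Math.ceil((oldest + WINDOW_MS - now) / 1000)
  return { allowed: false, retryAfter }
}
```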

Rate Limit Response

HTTP/1.1 429 Too Many Requests
Retry-After: 900

{
  "error": "Rate limit exceeded. Maximum 10 page captures per 15 minutes."
}

SSRF Protection

The endpoint implements defense-in-depth SSRF protection:

1. Hostname Validation

Rejects obviously private hostnames (source:api/download-page.js:104-107):
Private Hostname Patterns
const PRIVATE_IP_PATTERNS = [
  /^127\./,              // Loopback (127.0.0.0/8)
  /^10\./,               // Private class A (10.0.0.0/8)
  /^172\.(1[6-9]|2\d|3[01])\./, // Private class B (172.16.0.0/12)
  /^192\.168\./,         // Private class C (192.168.0.0/16)
  /^169\.254\./,         // Link-local (169.254.0.0/16)
  /^0\.0\.0\.0$/,        // Unspecified
  /^::1$/,               // IPv6 loopback
  /^::ffff:(127\.|10\.|...)/, // IPv4-mapped IPv6 private ranges
  /^fc00:/i,             // IPv6 unique local (fc00::/7)
  /^fd00:/i,
  /^fe80:/i,             // IPv6 link-local
  /^localhost$/i
]
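These patterns feed a hostname check; a minimal sketch, assuming the `isPrivateHostname` helper name referenced later on this page (the pattern list is abbreviated here for illustration — the full list is shown above):

```javascript
// Minimal sketch of the hostname check built on the patterns above.
// (Abbreviated pattern list; the complete list appears above.)
const PRIVATE_IP_PATTERNS = [
  /^127\./, /^10\./, /^172\.(1[6-9]|2\d|3[01])\./, /^192\.168\./,
  /^169\.254\./, /^0\.0\.0\.0$/, /^::1$/, /^fc00:/i, /^fd00:/i,
  /^fe80:/i, /^localhost$/i
]

function isPrivateHostname(hostname) {
  const h = hostname.trim().toLowerCase()
  return PRIVATE_IP_PATTERNS.some(pattern => pattern.test(h))
}
```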

2. DNS Resolution Check

Resolves hostname via DNS and validates all returned IPs (source:api/download-page.js:114-134):
DNS Validation
import dns from 'node:dns/promises'

async function assertPublicDns(hostname) {
  const addresses = await dns.resolve4(hostname).catch(() => [])
  const addresses6 = await dns.resolve6(hostname).catch(() => [])
  const allAddresses = [...addresses, ...addresses6]
  
  for (const ip of allAddresses) {
    if (PRIVATE_IP_PATTERNS.some(pattern => pattern.test(ip))) {
      throw new Error(`DNS for ${hostname} resolved to private address ${ip}`)
    }
  }
}
This prevents DNS rebinding attacks where a public hostname resolves to an internal IP.

3. Post-Redirect Validation

After Puppeteer navigation, the final URL (post-redirects) is re-validated (source:api/download-page.js:451-460):
Post-Navigation Check
const finalUrl = new URL(page.url())
if (isPrivateHostname(finalUrl.hostname)) {
  throw new Error('Redirect to private/internal address detected.')
}
await assertPublicDns(finalUrl.hostname)

Resource Limits

To prevent memory exhaustion:
Limit               Value      Behavior When Exceeded
Max total size      100 MB     Stop capturing additional resources (source:api/download-page.js:46)
Max resource count  500 items  Ignore additional resources (source:api/download-page.js:49)
These limits apply to captured network resources only. The SingleFile HTML can be larger as it’s generated separately.

Timeouts

Timeout                    Value        Purpose
Navigation timeout         60 seconds   Puppeteer page load (source:api/download-page.js:13)
Hard timeout               110 seconds  Total request duration (source:api/download-page.js:12)
Auto-scroll timeout        15 seconds   Lazy-load trigger (source:api/download-page.js:332)
Network idle after scroll  2 seconds    Wait for lazy resources (source:api/download-page.js:16)
The 110-second hard timeout is just under Vercel’s 120-second function limit. Long-running captures will return 504 Gateway Timeout.
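One common way to enforce such a hard timeout is to race the capture against a timer. This is a sketch, not necessarily how the handler implements it; `withTimeout` is a hypothetical helper name.

```javascript
// Hypothetical sketch: race a long-running capture against a hard deadline.
// If the timer wins, the handler can respond with 504 before Vercel kills
// the function at 120 seconds.
function withTimeout(promise, ms, message = 'Page capture timed out.') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}

// Example usage (capturePage is a hypothetical stand-in for the capture step):
// const zip = await withTimeout(capturePage(url), 110000)
```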

Examples

Basic Capture

const token = session.access_token // From Supabase auth

const response = await fetch('/api/download-page', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/landing-page'
  })
})

if (!response.ok) {
  const error = await response.json()
  throw new Error(error.error)
}

const blob = await response.blob()
const url = URL.createObjectURL(blob)

// Trigger download, then release the object URL
const a = document.createElement('a')
a.href = url
a.download = 'captured-page.zip'
a.click()
URL.revokeObjectURL(url)

Error Handling

Comprehensive Error Handling
async function capturePage(url) {
  const { data: { session } } = await supabase.auth.getSession()
  
  if (!session) {
    throw new Error('Not authenticated')
  }
  
  const response = await fetch('/api/download-page', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${session.access_token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  })
  
  if (!response.ok) {
    const error = await response.json()
    
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After')
      throw new Error(`Rate limit exceeded. Try again in ${retryAfter} seconds.`)
    }
    
    if (response.status === 400 && error.error.includes('private')) {
      throw new Error('Cannot capture internal/private URLs')
    }
    
    if (response.status === 504) {
      throw new Error('Page capture timed out. Try a simpler page.')
    }
    
    throw new Error(error.error)
  }
  
  return response.blob()
}

React Hook Example

usePageCapture Hook
import { useState } from 'react'
import { useSupabaseClient } from '@/hooks/useSupabase'

export function usePageCapture() {
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState(null)
  const supabase = useSupabaseClient()
  
  const capture = async (url) => {
    setLoading(true)
    setError(null)
    
    try {
      const { data: { session } } = await supabase.auth.getSession()
      if (!session) throw new Error('Not authenticated')
      
      const response = await fetch('/api/download-page', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${session.access_token}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ url })
      })
      
      if (!response.ok) {
        const err = await response.json()
        throw new Error(err.error)
      }
      
      const blob = await response.blob()
      const downloadUrl = URL.createObjectURL(blob)
      
      // Extract filename from Content-Disposition header
      const disposition = response.headers.get('Content-Disposition')
      const filename = disposition?.match(/filename="(.+)"/)?.[1] || 'page.zip'
      
      // Trigger download
      const a = document.createElement('a')
      a.href = downloadUrl
      a.download = filename
      a.click()
      
      URL.revokeObjectURL(downloadUrl)
      
      return { success: true }
    } catch (err) {
      setError(err.message)
      return { success: false, error: err.message }
    } finally {
      setLoading(false)
    }
  }
  
  return { capture, loading, error }
}

Implementation Details

Browser Engine

  • Puppeteer Core with @sparticuz/chromium (optimized for Vercel/serverless)
  • Headless Chrome with disabled sandboxing for serverless environments
  • Viewport: 1280x800 (source:api/download-page.js:408)

Page Capture Process

  1. Launch browser (different executable paths for dev/production)
  2. Navigate to target URL with networkidle0 wait condition
  3. SSRF re-check on final URL after redirects
  4. Auto-scroll to trigger lazy-loaded content (300px steps with 100ms pauses)
  5. Network idle wait (2 seconds after scroll completes)
  6. SingleFile capture — inlines critical CSS/fonts/images into HTML
  7. Close browser immediately after HTML capture
  8. Build ZIP with captured resources organized by type
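Step 4's auto-scroll can be sketched as an environment-agnostic loop. This is a hypothetical sketch: in the real handler the equivalent logic runs inside page.evaluate, and the injected `scrollBy`/`getScrollHeight` callbacks here are illustrative (in-page they would be window.scrollBy and document.body.scrollHeight).

```javascript
// Hypothetical sketch of step 4: scroll in 300px steps with 100ms pauses
// until the bottom of the page (or the 15s auto-scroll deadline) is reached.
// Scroll callbacks are injected so the loop is testable anywhere.
async function autoScroll({ scrollBy, getScrollHeight, step = 300, pauseMs = 100, deadlineMs = 15000 }) {
  const deadline = Date.now() + deadlineMs
  let scrolled = 0
  while (scrolled < getScrollHeight() && Date.now() < deadline) {
    scrollBy(step)
    scrolled += step
    await new Promise(resolve => setTimeout(resolve, pauseMs))
  }
  return scrolled
}
```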

Resource Classification

Assets are categorized by MIME type and file extension (source:api/download-page.js:149-179):
Asset Folder Mapping
const MIME_FOLDER_MAP = [
  { test: ct => ct.startsWith('text/css'), folder: 'css' },
  { test: ct => ct.includes('javascript'), folder: 'js' },
  { test: ct => ct.startsWith('image/'), folder: 'images' },
  { test: ct => ct.includes('font'), folder: 'fonts' },
  { test: ct => ct.startsWith('video/') || ct.startsWith('audio/'), folder: 'media' }
]
HTML documents are skipped (already captured by SingleFile).
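A classifier over this mapping might look like the following sketch; `classifyAsset` and the 'other' fallback folder are illustrative assumptions, not confirmed from the source.

```javascript
// Hypothetical sketch of asset classification using the mapping above.
const MIME_FOLDER_MAP = [
  { test: ct => ct.startsWith('text/css'), folder: 'css' },
  { test: ct => ct.includes('javascript'), folder: 'js' },
  { test: ct => ct.startsWith('image/'), folder: 'images' },
  { test: ct => ct.includes('font'), folder: 'fonts' },
  { test: ct => ct.startsWith('video/') || ct.startsWith('audio/'), folder: 'media' }
]

function classifyAsset(contentType) {
  const ct = (contentType || '').toLowerCase()
  if (ct.startsWith('text/html')) return null // skipped: SingleFile already captured it
  const match = MIME_FOLDER_MAP.find(entry => entry.test(ct))
  return match ? match.folder : 'other' // fallback folder name is an assumption
}
```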

User-Agent

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36

Best Practices

Validate URLs client-side

Reject private IPs and invalid protocols before sending requests to save quota
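A lightweight client-side pre-check might look like this sketch. It mirrors the server's protocol and obvious-private-host checks but cannot replace them, since only the server can validate DNS resolution and redirects; `preflightUrl` is a hypothetical helper name.

```javascript
// Hypothetical client-side pre-check: reject invalid protocols and obviously
// private hosts before spending a capture against the rate limit.
// The server remains the authority (it also validates DNS and redirects).
function preflightUrl(input) {
  let url
  try {
    url = new URL(input)
  } catch {
    return { ok: false, reason: 'Invalid URL.' }
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    return { ok: false, reason: 'Must use http or https protocol.' }
  }
  const privatePatterns = [
    /^127\./, /^10\./, /^172\.(1[6-9]|2\d|3[01])\./, /^192\.168\./,
    /^169\.254\./, /^localhost$/i
  ]
  if (privatePatterns.some(p => p.test(url.hostname))) {
    return { ok: false, reason: 'Private/internal addresses are not allowed.' }
  }
  return { ok: true }
}
```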

Implement retry logic

Handle 504 timeouts with exponential backoff for large pages
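A retry wrapper with exponential backoff might look like the following sketch; `fetchWithRetry`, the delay values, and the choice to retry only on 504 are illustrative.

```javascript
// Hypothetical retry sketch: retry only on 504 (timeouts), with exponential
// backoff between attempts. 4xx errors (bad URL, auth, rate limit) are not
// retried because repeating them cannot succeed.
async function fetchWithRetry(url, options, { retries = 2, baseDelayMs = 2000 } = {}) {
  for (let attempt = 0; ; attempt++) {
    const response = await fetch(url, options)
    if (response.status !== 504 || attempt >= retries) {
      return response
    }
    // 2s, 4s, 8s, ... between attempts
    await new Promise(resolve => setTimeout(resolve, baseDelayMs * 2 ** attempt))
  }
}
```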

Monitor rate limits

Track remaining captures and show warnings before hitting limits

Provide feedback

Captures can take 30-60+ seconds — show progress indicators to users