Overview
The Page Ripper endpoint captures any public web page using headless Chrome and returns a downloadable ZIP archive containing:
Self-contained HTML (via SingleFile)
Categorized assets (CSS, JavaScript, images, fonts, media)
The entire capture happens synchronously within a single HTTP request (no background jobs).
This endpoint enforces SSRF protection at multiple layers: hostname validation, DNS resolution checks, and post-redirect validation. Requests to private/internal addresses are blocked.
Endpoint
POST /api/download-page
Authentication
Requires a Supabase access token:
Authorization: Bearer <supabase_access_token>
Request Body
url (string, required): the full HTTP/HTTPS URL of the page to capture. Restrictions:
Must use http:// or https:// protocol
Cannot resolve to private/internal IP addresses
Cannot be localhost or reserved IP ranges (127.0.0.0/8, 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16, 169.254.0.0/16, etc.)
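Because invalid URLs still count against the capture quota once they reach the server, a lightweight client-side pre-check can reject them first. The sketch below mirrors a subset of the server's rules; the pattern list is abbreviated and illustrative, and the server remains the authority.

```javascript
// Abbreviated mirror of the server-side private-address patterns.
const CLIENT_PRIVATE_PATTERNS = [
  /^127\./, /^10\./, /^172\.(1[6-9]|2\d|3[01])\./,
  /^192\.168\./, /^169\.254\./, /^localhost$/i
]

function isCapturableUrl(input) {
  let url
  try {
    url = new URL(input)
  } catch {
    return false // not a parseable URL
  }
  // Only http/https are accepted by the endpoint.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false
  // Reject obviously private hostnames before spending a capture.
  return !CLIENT_PRIVATE_PATTERNS.some(p => p.test(url.hostname))
}
```

This is a convenience check only — DNS rebinding and redirects can still only be caught server-side.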
Example Request
{
  "url": "https://example.com/landing-page"
}
Response
Success (200 OK)
Returns a ZIP file download:
HTTP/1.1 200 OK
Content-Type: application/zip
Content-Disposition: attachment; filename="example.com-2026-03-02T14-30-15.zip"
Content-Length: 4567890

[ZIP binary data]
ZIP Archive Structure
example.com-2026-03-02T14-30-15.zip
├── page.html # Self-contained HTML (SingleFile output)
└── assets/
├── css/ # Stylesheets
│ ├── main.css
│ └── styles_1.css
├── js/ # JavaScript files
│ ├── app.js
│ └── vendor.js
├── images/ # PNG, JPG, SVG, WebP, AVIF, ICO, GIF
│ ├── logo.png
│ └── hero.jpg
├── fonts/ # WOFF, WOFF2, TTF, OTF, EOT
│ └── font.woff2
└── media/ # MP4, WebM, MP3, OGG, WAV
└── video.mp4
HTML documents are not duplicated in the assets folder — page.html at the root is the only HTML file. The SingleFile library inlines critical resources directly into this HTML.
Error Codes
| Status | Condition | Response Body |
| --- | --- | --- |
| 400 | Missing url field | {"error": "Missing required field: url"} |
| 400 | Invalid URL format | {"error": "Invalid URL."} |
| 400 | Non-HTTP/HTTPS protocol | {"error": "Invalid URL. Must use http or https protocol."} |
| 400 | Private/internal address | {"error": "Requests to private/internal addresses are not allowed."} |
| 400 | DNS rebinding detected | {"error": "Requests to private/internal addresses are not allowed."} |
| 401 | Missing bearer token | {"error": "Missing Authorization bearer token."} |
| 401 | Invalid/expired token | {"error": "Invalid or expired session."} |
| 405 | Non-POST method | {"error": "Method not allowed."} |
| 429 | Rate limit exceeded | {"error": "Rate limit exceeded. Maximum 10 page captures per 15 minutes."} |
| 500 | Capture failed | {"error": "Page capture failed: <details>"} |
| 504 | Timeout | {"error": "Page capture timed out."} |
Rate Limiting
Each authenticated user is limited to:
10 captures per 15-minute window
Tracked via page_rip_log table keyed by user_id
Rate-limited responses include a Retry-After header (seconds until reset).
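The sliding-window math behind the Retry-After value can be sketched as a pure function. This assumes captures are logged with timestamps (e.g. a created_at column on page_rip_log); the actual schema and query may differ.

```javascript
const WINDOW_MS = 15 * 60 * 1000 // 15-minute window
const MAX_CAPTURES = 10

// captureTimesMs: timestamps (ms) of this user's logged captures.
function rateLimitStatus(captureTimesMs, nowMs) {
  // Only captures inside the sliding 15-minute window count.
  const inWindow = captureTimesMs.filter(t => nowMs - t < WINDOW_MS)
  if (inWindow.length < MAX_CAPTURES) {
    return { allowed: true, remaining: MAX_CAPTURES - inWindow.length }
  }
  // Retry-After = seconds until the oldest in-window capture ages out.
  const oldest = Math.min(...inWindow)
  const retryAfter = Math.ceil((oldest + WINDOW_MS - nowMs) / 1000)
  return { allowed: false, remaining: 0, retryAfter }
}
```

A client can use the same function to warn users before they hit the limit, provided it tracks its own capture timestamps.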
Rate Limit Response
HTTP/1.1 429 Too Many Requests
Retry-After: 900

{
  "error": "Rate limit exceeded. Maximum 10 page captures per 15 minutes."
}
SSRF Protection
The endpoint implements defense-in-depth SSRF protection:
1. Hostname Validation
Rejects obviously private hostnames (source:api/download-page.js:104-107):
Private Hostname Patterns
const PRIVATE_IP_PATTERNS = [
  /^127\./,                      // Loopback (127.0.0.0/8)
  /^10\./,                       // Private class A (10.0.0.0/8)
  /^172\.(1[6-9]|2\d|3[01])\./,  // Private class B (172.16.0.0/12)
  /^192\.168\./,                 // Private class C (192.168.0.0/16)
  /^169\.254\./,                 // Link-local (169.254.0.0/16)
  /^0\.0\.0\.0$/,                // Unspecified
  /^::1$/,                       // IPv6 loopback
  /^::ffff:(127\.|10\.|...)/,    // IPv4-mapped IPv6 private ranges
  /^fc00:/i,                     // IPv6 unique local (fc00::/7)
  /^fd00:/i,
  /^fe80:/i,                     // IPv6 link-local
  /^localhost$/i
]
2. DNS Resolution Check
Resolves hostname via DNS and validates all returned IPs (source:api/download-page.js:114-134):
import dns from 'node:dns/promises'

async function assertPublicDns(hostname) {
  const addresses = await dns.resolve4(hostname).catch(() => [])
  const addresses6 = await dns.resolve6(hostname).catch(() => [])
  const allAddresses = [...addresses, ...addresses6]
  for (const ip of allAddresses) {
    if (PRIVATE_IP_PATTERNS.some(pattern => pattern.test(ip))) {
      throw new Error(`DNS for ${hostname} resolved to private address ${ip}`)
    }
  }
}
This prevents DNS rebinding attacks where a public hostname resolves to an internal IP.
3. Post-Redirect Validation
After Puppeteer navigation, the final URL (post-redirects) is re-validated (source:api/download-page.js:451-460):
const finalUrl = new URL(page.url())
if (isPrivateHostname(finalUrl.hostname)) {
  throw new Error('Redirect to private/internal address detected.')
}
await assertPublicDns(finalUrl.hostname)
Resource Limits
To prevent memory exhaustion:
| Limit | Value | Behavior When Exceeded |
| --- | --- | --- |
| Max total size | 100 MB | Stop capturing additional resources (source:api/download-page.js:46) |
| Max resource count | 500 items | Ignore additional resources (source:api/download-page.js:49) |
These limits apply to captured network resources only. The SingleFile HTML can be larger as it’s generated separately.
Timeouts
| Timeout | Value | Purpose |
| --- | --- | --- |
| Navigation timeout | 60 seconds | Puppeteer page load (source:api/download-page.js:13) |
| Hard timeout | 110 seconds | Total request duration (source:api/download-page.js:12) |
| Auto-scroll timeout | 15 seconds | Lazy-load trigger (source:api/download-page.js:332) |
| Network idle after scroll | 2 seconds | Wait for lazy resources (source:api/download-page.js:16) |
The 110-second hard timeout is just under Vercel’s 120-second function limit. Long-running captures will return 504 Gateway Timeout.
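One common way to enforce a hard cap on total duration is to race the capture against a timer. This is a hedged sketch of that pattern, not the actual api/download-page.js implementation, which may differ.

```javascript
// Race a long-running promise against a hard deadline; whichever settles
// first wins, and the pending timer is always cleared so the process can
// exit cleanly.
function withHardTimeout(promise, ms, message = 'Page capture timed out.') {
  let timer
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(message)), ms)
  })
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer))
}
```

On the server this maps naturally to the 504 path; on the client the same idea (e.g. via AbortController) keeps the UI from hanging on a dead connection.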
Examples
Basic Capture
const token = session.access_token // From Supabase auth

const response = await fetch('/api/download-page', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    url: 'https://example.com/landing-page'
  })
})

if (!response.ok) {
  const error = await response.json()
  throw new Error(error.error)
}

const blob = await response.blob()
const url = URL.createObjectURL(blob)

// Trigger download
const a = document.createElement('a')
a.href = url
a.download = 'captured-page.zip'
a.click()
Error Handling
Comprehensive Error Handling
async function capturePage(url) {
  const { data: { session } } = await supabase.auth.getSession()
  if (!session) {
    throw new Error('Not authenticated')
  }

  const response = await fetch('/api/download-page', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${session.access_token}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ url })
  })

  if (!response.ok) {
    const error = await response.json()
    if (response.status === 429) {
      const retryAfter = response.headers.get('Retry-After')
      throw new Error(`Rate limit exceeded. Try again in ${retryAfter} seconds.`)
    }
    if (response.status === 400 && error.error.includes('private')) {
      throw new Error('Cannot capture internal/private URLs')
    }
    if (response.status === 504) {
      throw new Error('Page capture timed out. Try a simpler page.')
    }
    throw new Error(error.error)
  }

  return response.blob()
}
React Hook Example
import { useState } from 'react'
import { useSupabaseClient } from '@/hooks/useSupabase'

export function usePageCapture() {
  const [loading, setLoading] = useState(false)
  const [error, setError] = useState(null)
  const supabase = useSupabaseClient()

  const capture = async (url) => {
    setLoading(true)
    setError(null)
    try {
      const { data: { session } } = await supabase.auth.getSession()
      const response = await fetch('/api/download-page', {
        method: 'POST',
        headers: {
          'Authorization': `Bearer ${session.access_token}`,
          'Content-Type': 'application/json'
        },
        body: JSON.stringify({ url })
      })

      if (!response.ok) {
        const err = await response.json()
        throw new Error(err.error)
      }

      const blob = await response.blob()
      const downloadUrl = URL.createObjectURL(blob)

      // Extract filename from Content-Disposition header
      const disposition = response.headers.get('Content-Disposition')
      const filename = disposition?.match(/filename="(.+)"/)?.[1] || 'page.zip'

      // Trigger download
      const a = document.createElement('a')
      a.href = downloadUrl
      a.download = filename
      a.click()
      URL.revokeObjectURL(downloadUrl)

      return { success: true }
    } catch (err) {
      setError(err.message)
      return { success: false, error: err.message }
    } finally {
      setLoading(false)
    }
  }

  return { capture, loading, error }
}
Implementation Details
Browser Engine
Puppeteer Core with @sparticuz/chromium (optimized for Vercel/serverless)
Headless Chrome with disabled sandboxing for serverless environments
Viewport: 1280x800 (source:api/download-page.js:408)
Page Capture Process
Launch browser (different executable paths for dev/production)
Navigate to target URL with networkidle0 wait condition
SSRF re-check on final URL after redirects
Auto-scroll to trigger lazy-loaded content (300px steps with 100ms pauses)
Network idle wait (2 seconds after scroll completes)
SingleFile capture — inlines critical CSS/fonts/images into HTML
Close browser immediately after HTML capture
Build ZIP with captured resources organized by type
Resource Classification
Assets are categorized by MIME type and file extension (source:api/download-page.js:149-179):
const MIME_FOLDER_MAP = [
  { test: ct => ct.startsWith('text/css'), folder: 'css' },
  { test: ct => ct.includes('javascript'), folder: 'js' },
  { test: ct => ct.startsWith('image/'), folder: 'images' },
  { test: ct => ct.includes('font'), folder: 'fonts' },
  { test: ct => ct.startsWith('video/') || ct.startsWith('audio/'), folder: 'media' }
]
HTML documents are skipped (already captured by SingleFile).
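A per-resource lookup over a map like this might look as follows. The folderFor helper and its null fallback (resource left uncategorized) are illustrative assumptions, not confirmed behavior of the endpoint; the map is repeated here so the sketch is self-contained.

```javascript
// Same shape as the map documented above.
const MIME_FOLDER_MAP = [
  { test: ct => ct.startsWith('text/css'), folder: 'css' },
  { test: ct => ct.includes('javascript'), folder: 'js' },
  { test: ct => ct.startsWith('image/'), folder: 'images' },
  { test: ct => ct.includes('font'), folder: 'fonts' },
  { test: ct => ct.startsWith('video/') || ct.startsWith('audio/'), folder: 'media' }
]

// Hypothetical helper: first matching rule wins; null means "no folder".
function folderFor(contentType) {
  const ct = (contentType || '').toLowerCase()
  const entry = MIME_FOLDER_MAP.find(({ test }) => test(ct))
  return entry ? entry.folder : null
}
```

Note the rules run in order, so a type matching several predicates lands in the first matching folder.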
User-Agent
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/125.0.0.0 Safari/537.36
Best Practices
Validate URLs client-side: reject private IPs and invalid protocols before sending requests, to save quota.
Implement retry logic: handle 504 timeouts with exponential backoff for large pages.
Monitor rate limits: track remaining captures and show warnings before hitting limits.
Provide feedback: captures can take 30-60+ seconds, so show progress indicators to users.
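The retry guidance above can be sketched as follows. The capture callback, attempt count, and base delay are illustrative; only 504-style timeouts are retried, since 4xx errors will not succeed on retry.

```javascript
// Exponential backoff schedule, e.g. backoffDelaysMs(3) -> [2000, 4000, 8000].
function backoffDelaysMs(attempts, baseMs = 2000) {
  return Array.from({ length: attempts }, (_, i) => baseMs * 2 ** i)
}

// capture: a function like capturePage(url) that throws on failure.
async function captureWithRetry(url, capture, attempts = 3) {
  const delays = backoffDelaysMs(attempts)
  for (let i = 0; i < attempts; i++) {
    try {
      return await capture(url)
    } catch (err) {
      // Retry only timeouts; rethrow everything else (400/401/429) immediately.
      const isTimeout = /timed out/i.test(err.message)
      if (!isTimeout || i === attempts - 1) throw err
      await new Promise(resolve => setTimeout(resolve, delays[i]))
    }
  }
}
```

For 429 responses, prefer waiting out the Retry-After header instead of a fixed backoff schedule.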