The Spinney constructor accepts up to three arguments: the target URL string, an optional Options object that controls scraping behavior, and an optional AxiosRequestConfig object that is passed to axios.create(). Spinney uses { responseType: 'stream' } as the base config and applies yours on top, so any key you provide, including responseType, overrides the default. In practice, leave responseType as 'stream': Spinney pipes response data incrementally, and overriding it will break parsing.
The Options type
The Options type is defined as { overide?: boolean; debug?: boolean }. Both fields are optional and default to false.
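As a rough illustration of how the two fields and their defaults fit together, the sketch below fills in false for anything omitted and picks an error handler based on debug, following the field descriptions below. resolveOptions and makeErrorHandler are hypothetical helpers written for this example, not part of Spinney's exported API.

```typescript
// Mirrors the documented Options type (note the library's spelling "overide").
type Options = { overide?: boolean; debug?: boolean };

// Hypothetical helper: apply the documented defaults (both fields false).
function resolveOptions(options?: Options): Required<Options> {
  return {
    overide: options?.overide ?? false,
    debug: options?.debug ?? false,
  };
}

// Hypothetical helper: with debug on, log the error message to stderr;
// otherwise return a no-op so catch blocks stay silent.
function makeErrorHandler(debug: boolean): (error?: Error) => void {
  return debug ? (error?: Error) => console.error(error?.message) : () => {};
}

const resolved = resolveOptions({ debug: true });
const onError = makeErrorHandler(resolved.debug);
onError(new Error("request failed")); // logs "request failed" to stderr
```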
overide (boolean, default false)
Skip enforcement of robots.txt Disallow rules. When true, _isForbidden() returns true immediately for every URL, so all paths on the origin domain are eligible for crawling regardless of what robots.txt says. See robots.txt for details.
debug (boolean, default false)
When true, errors are logged to stderr via console.error(error?.message). When false, the internal error handler is a no-op, so catch blocks log nothing before forwarding errors to the subscriber. Enabling debug is useful during development for diagnosing network failures and parse errors without adding custom error handlers everywhere.
Axios configuration
The third constructor argument is any valid AxiosRequestConfig. Spinney calls axios.create(Object.assign({ responseType: 'stream' }, config ?? {})), so your config values are applied on top of the stream default. The resulting axiosInstance is used for every request Spinney makes: robots.txt, sitemaps, and all HTML pages.
Any AxiosRequestConfig option is valid here, including auth, proxy, httpsAgent, and maxRedirects. Consult the Axios documentation for the full list.
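To see how the merge order plays out, the sketch below reproduces the Object.assign expression on its own. RequestConfigSketch is a hypothetical stand-in for a few AxiosRequestConfig fields so the example runs without axios installed, and mergeConfig is likewise illustrative rather than a Spinney function.

```typescript
// Hypothetical stand-in for a handful of AxiosRequestConfig fields;
// the real type has many more.
interface RequestConfigSketch {
  responseType?: string;
  timeout?: number;
  maxRedirects?: number;
}

// The same merge expression Spinney passes to axios.create().
function mergeConfig(config?: RequestConfigSketch): RequestConfigSketch {
  return Object.assign({ responseType: "stream" }, config ?? {});
}

// Caller keys are added on top of the stream default.
const merged = mergeConfig({ timeout: 10000, maxRedirects: 3 });
// → { responseType: "stream", timeout: 10000, maxRedirects: 3 }

// A caller-supplied responseType wins over the default.
const overridden = mergeConfig({ responseType: "json" });
// → { responseType: "json" }
```

Because Object.assign copies properties left to right, any key present in your config replaces the matching key in the base object, which is why passing responseType disables streaming.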
Automatic retry behavior
httpXMLOrDocument() includes built-in retry logic for transient HTTP errors. The behavior depends on the response status code:
- HTTP 404: resolved immediately with no data; the URL is silently skipped with no retries.
- Any other error (5xx, 429, connection reset, etc.): the request is retried. On each retry attempt the timeout is increased by (retries * 1000) / 4 milliseconds.
- MAX_RETRIES (5) exhausted: throws Error('retries reached maximum' + retries), which is caught by the outer try/catch and forwarded to subscriber.error(), and pause() is called to stop the crawl.
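The retry policy can be approximated with a small loop. This is a minimal sketch, not Spinney's httpXMLOrDocument(): the request callback, the statusOf helper, the 1000 ms base timeout, and counting retries from zero are all hypothetical choices made for illustration. The 404 short-circuit, the (retries * 1000) / 4 increment, the limit of 5, and the error message follow the list above.

```typescript
const MAX_RETRIES = 5;

// Hypothetical helper: read an HTTP status from an axios-like error, if any.
function statusOf(error: unknown): number | undefined {
  if (typeof error !== "object" || error === null) return undefined;
  const withStatus = error as { status?: number; response?: { status?: number } };
  return withStatus.response?.status ?? withStatus.status;
}

// Extra timeout added on a given retry, using the documented formula.
function timeoutIncrement(retries: number): number {
  return (retries * 1000) / 4;
}

// 404 resolves with no data; any other failure is retried with a longer
// timeout until MAX_RETRIES retries have been used, then an error is thrown.
async function fetchWithRetry<T>(
  request: (timeoutMs: number) => Promise<T>,
  baseTimeoutMs = 1000,
): Promise<T | undefined> {
  let timeoutMs = baseTimeoutMs;
  let retries = 0;
  while (true) {
    try {
      return await request(timeoutMs);
    } catch (error) {
      if (statusOf(error) === 404) return undefined; // skip, no retries
      if (retries >= MAX_RETRIES) {
        throw new Error("retries reached maximum" + retries);
      }
      retries += 1;
      timeoutMs += timeoutIncrement(retries);
    }
  }
}
```

Under these assumptions, a request that always fails is attempted six times (the first try plus five retries) with timeouts of 1000, 1250, 1750, 2500, 3500, and 4750 ms before the error is thrown.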
Once the retry limit is reached, the error is emitted to your subscriber's error handler and the crawl stops. See Error Handling for how to handle that case and restart the crawl if needed.
Batched concurrency
_setUp() processes URLs in batches of four using Promise.all(promises.splice(0, 4)). The splice(0, 4) call removes the first four entries from the queue; once they resolve, what remains is passed back into _setUp() as the next batch. This keeps peak concurrency bounded at four in-flight requests without serializing the crawl into a single queue. There is currently no configuration option to change the batch size.
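A minimal sketch of the same splice-and-recurse pattern, assuming the queue holds task functions rather than already-created promises (a promise starts its work as soon as it is created, so only deferred tasks are actually throttled by the batch). runInBatches and BATCH_SIZE are hypothetical names, not Spinney's internals.

```typescript
const BATCH_SIZE = 4;

// Takes up to BATCH_SIZE tasks off the front of the queue, starts them
// together, waits for all of them, then recurses on whatever remains.
async function runInBatches<T>(queue: Array<() => Promise<T>>): Promise<Awaited<T>[]> {
  if (queue.length === 0) return [];
  const batch = queue.splice(0, BATCH_SIZE); // mutates queue: removes the first four
  const results = await Promise.all(batch.map((task) => task()));
  return results.concat(await runInBatches(queue));
}
```

Promise.all rejects as soon as any task in a batch rejects, so in this sketch a single failure ends the whole run; per-task error handling would need to happen inside each task.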