Spinney is the main, and only, export of the spinney package. It extends Observable<any> from RxJS, so any RxJS operator (take, filter, tap, and so on) can be applied through pipe(), and the standard subscription methods work on it directly. Internally, the constructor creates the Axios HTTP instance with responseType: 'stream' preset, initialises a seen Set for deduplication and a forbidden Set for robots.txt paths, and defines the Observable source function that calls setUp() on first subscribe. setUp() fetches robots.txt, populates forbidden, then calls resume() to set isProcessing to true and starts the recursive batch crawl.
Import
Constructor signature
Parameters
site: The target URL to crawl. Must be a valid URL that new URL(site) can parse; an invalid or empty string throws a TypeError at construction time.
options: Scraping behaviour flags. See the Options type reference for full details. Both fields (debug and overide) default to false if the object is omitted entirely.
config: Passed to axios.create() via Object.assign({ responseType: 'stream' }, config ?? {}). Spinney supplies responseType: 'stream' and depends on streamed responses, so do not set responseType in config: Object.assign copies later sources over earlier ones, so with the expression as written a value supplied there would replace 'stream' in the merged object. All other Axios config (timeouts, headers, proxy, etc.) is respected as-is.
Throws
The constructor calls new URL(site) synchronously. If site is not a valid URL, including an empty string, the URL constructor throws a TypeError, which propagates immediately out of new Spinney(...) before any network activity begins.
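The two constructor steps described above can be sketched with built-ins only. This is a minimal illustration, not Spinney's source: parseSite, mergeConfig, throwsTypeError, and ConfigLike are hypothetical names, and only URL and Object.assign are real APIs. Under the merge expression exactly as quoted, a responseType key in config would replace 'stream', so it is safest to leave that key unset.

```typescript
// Hypothetical helpers for illustration; not part of the spinney package.
type ConfigLike = Record<string, unknown>;

// Mirrors the documented synchronous check: new URL(site) throws a TypeError
// for an empty or otherwise unparsable string.
function parseSite(site: string): URL {
  return new URL(site);
}

// Mirrors the documented merge expression. Object.assign copies keys from
// later sources over earlier ones, so keys present in `config` win on conflict.
function mergeConfig(config?: ConfigLike): ConfigLike {
  const base: ConfigLike = { responseType: 'stream' };
  return Object.assign(base, config ?? {});
}

// Reports whether parsing `site` fails with a TypeError.
function throwsTypeError(site: string): boolean {
  try {
    parseSite(site);
    return false;
  } catch (error) {
    return error instanceof TypeError;
  }
}
```

Wrapping the constructor call in a try/catch that checks for TypeError, as throwsTypeError does here, is one way to validate user-supplied URLs before any crawl is attempted.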
Internal state initialised
When the constructor runs, it sets up the following private fields that drive the crawl lifecycle:
cbs: an object registry that stores htmlparser2 handler callbacks (e.g. onattribute, ontext) passed in through subscribe(), kept separate from the RxJS observer callbacks.
debug: either a no-op function (when options.debug is false or omitted) or console.error(error?.message) (when options.debug is true). Called on non-fatal internal errors.
isOveride: boolean derived from options?.overide ?? false. When true, the forbidden Set is bypassed and all URLs are treated as crawlable.
isProcessing: starts as false. Flipped to true by resume() once robots.txt has been fetched and parsed inside the httpText('/robots.txt').then(...) callback in setUp(), and back to false by pause() when the crawl ends or a fatal error occurs.
forbidden: a Set<string> of disallowed path patterns parsed from the site's robots.txt. Initially empty; populated by setForbidden() once robots.txt has been fetched.
seen: a Set<string> of URLs already visited or queued. Used to deduplicate the crawl queue so the same page is never fetched twice.
site: the raw site string passed to the constructor, stored for use in getURL().
decodeURL: a URL object constructed from site, used to extract hostname and origin for URL matching.
axiosInstance: the Axios instance created with axios.create(Object.assign({ responseType: 'stream' }, config ?? {})).
subscriber: the RxJS subscriber object captured inside the Observable source function when subscribe() is first called. Used internally to emit next, error, and complete notifications.
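How seen, forbidden, and isOveride might interact when deciding whether to fetch a URL can be sketched as follows. CrawlState and shouldVisit are hypothetical names, and treating each forbidden entry as a plain path prefix is an assumption; Spinney's actual pattern matching may differ.

```typescript
// Hypothetical stand-in for the documented fields; not Spinney's actual code.
class CrawlState {
  readonly seen = new Set<string>();
  readonly forbidden = new Set<string>();
  private readonly isOveride: boolean;

  constructor(isOveride: boolean = false) {
    this.isOveride = isOveride;
  }

  // Returns true only the first time an allowed URL is offered.
  shouldVisit(rawUrl: string): boolean {
    const url = new URL(rawUrl);
    if (this.seen.has(url.href)) return false; // already visited or queued
    if (!this.isOveride && this.isForbidden(url.pathname)) return false;
    this.seen.add(url.href);
    return true;
  }

  // Assumption: each forbidden entry is matched as a simple path prefix.
  private isForbidden(pathname: string): boolean {
    let blocked = false;
    this.forbidden.forEach((prefix) => {
      if (pathname.startsWith(prefix)) blocked = true;
    });
    return blocked;
  }
}
```

With isOveride set to true, the forbidden check is skipped entirely while deduplication through seen still applies.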
Example
isProcessing starts as false at construction time. It is set to true by resume() only after the robots.txt response has been received and parsed inside setUp(). The internal batch loop (_setUp) checks isProcessing before processing each URL batch — no crawling occurs until resume() has been called.
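The gating described above can be pictured with a small flag-guarded queue. This is a sketch under stated assumptions: CrawlGate and takeBatch are hypothetical names, and the real batch loop (_setUp) is asynchronous and recursive rather than a single synchronous call.

```typescript
// Hypothetical stand-in for the isProcessing gate; not Spinney's actual code.
class CrawlGate {
  isProcessing = false;

  // In Spinney, resume() is called after robots.txt has been fetched and parsed.
  resume(): void {
    this.isProcessing = true;
  }

  // In Spinney, pause() is called when the crawl ends or a fatal error occurs.
  pause(): void {
    this.isProcessing = false;
  }

  // Removes and returns up to `size` URLs from the queue, or nothing while paused.
  takeBatch(queue: string[], size: number): string[] {
    if (!this.isProcessing) return [];
    return queue.splice(0, size);
  }
}
```

Until resume() runs, takeBatch leaves the queue untouched, which matches the documented guarantee that no crawling occurs before robots.txt has been processed.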