Spinney Public Methods Reference — Complete API Guide

Beyond subscribe(), the Spinney class exposes several public utility methods. Some (getURL, isApproved, getApproved, and isMatch) are used internally during the crawl but are part of the public interface, so you can call them directly for testing, custom filtering logic, or introspection. Others (pause and resume) control the crawl’s state machine by toggling the isProcessing flag that gates the internal batch loop. The remaining helpers (toArray, isArrayEmpty, and setForbidden) are documented at the end of this page.

`pause()`

spinney.pause(): void

Sets isProcessing to false, which prevents the internal _setUp batch loop from scheduling any further URL processing. Called automatically when the crawl completes (subscriber.complete() has been called) or when a fatal error causes the Observable to terminate. You can also call it manually to temporarily halt crawling.

// Pause the scraper temporarily
spinney.pause();

`resume()`

spinney.resume(): void

Sets isProcessing to true, allowing the batch loop to begin or continue processing URL batches. This is called automatically by setUp() immediately after the robots.txt response is received and parsed. Calling resume() after a fatal error will set the flag but will not restart a terminated Observable — you must construct a new Spinney instance to begin a fresh crawl.
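The gating behaviour described for pause() and resume() can be pictured with a small stand-in. This is a sketch under assumptions, not Spinney's source: the GatedLoop class, its step() method, and the batches array are hypothetical names introduced for illustration. Only the isProcessing flag and the pause/resume semantics come from the descriptions above.

```javascript
// Hypothetical stand-in for a flag-gated batch loop; not Spinney's implementation.
class GatedLoop {
  constructor(batches) {
    this.batches = [...batches]; // pending URL batches (illustrative data)
    this.processed = [];
    this.isProcessing = false; // nothing runs until resume() is called
  }

  pause() {
    this.isProcessing = false;
  }

  resume() {
    this.isProcessing = true;
  }

  // Processes one batch if the gate is open; returns whether work was done.
  step() {
    if (!this.isProcessing || this.batches.length === 0) {
      return false;
    }
    this.processed.push(this.batches.shift());
    return true;
  }
}

const loop = new GatedLoop([['/a', '/b'], ['/c']]);
loop.step();   // gate closed: no batch is consumed
loop.resume();
loop.step();   // consumes ['/a', '/b']
loop.pause();
loop.step();   // gate closed again: ['/c'] stays queued
```

In this sketch the flag only gates future steps. It does not cancel work already done, which mirrors the note that resume() cannot revive a terminated Observable.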

`getURL()`

spinney.getURL(pathname: string): string

Resolves a path or URL string to an absolute URL relative to the scraper’s base site. The resolution rules are:

If pathname starts with // (two forward slashes), the first / is stripped with pathname.slice(1) and the result is set as the pathname of a new URL built from the base site.
If pathname starts with / (single forward slash), it is set directly as the pathname of a new URL built from the base site.
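If pathname does not start with / (e.g. an absolute URL like https://other.com), it is returned unchanged. This also covers inputs that are not URLs at all, such as #anchor, so the result is only fully qualified when the input was root-relative, double-slash, or already absolute.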
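The three rules above can be sketched as a stand-alone function. This is a minimal illustration, assuming the base site is held as a URL string and that the standard URL class is used to set the pathname; resolvePathname and its baseSite parameter are hypothetical names, not part of Spinney's API.

```javascript
// Hypothetical re-implementation of the documented resolution rules.
function resolvePathname(baseSite, pathname) {
  if (typeof pathname !== 'string') {
    throw new TypeError('pathname is not type string');
  }
  // Rule 3: anything that does not start with '/' is returned unchanged.
  if (!pathname.startsWith('/')) {
    return pathname;
  }
  const url = new URL(baseSite);
  // Rule 1: '//x' loses its first slash. Rule 2: '/x' is used directly.
  url.pathname = pathname.startsWith('//') ? pathname.slice(1) : pathname;
  return url.href;
}

resolvePathname('https://www.example.com/', '/path');             // 'https://www.example.com/path'
resolvePathname('https://www.example.com/', '//collections');     // 'https://www.example.com/collections'
resolvePathname('https://www.example.com/', 'https://other.com'); // 'https://other.com'
```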

Parameters

pathname

string

required

A root-relative path (e.g. /about), double-slash path (e.g. //collections), or absolute URL (e.g. https://other.com).

Returns

string — a fully qualified URL string.

Throws

Throws a TypeError with the message 'pathname is not type string' if pathname is not a string.

Examples

const spinney = new Spinney('https://www.example.com/');

spinney.getURL('/path');              // => 'https://www.example.com/path'
spinney.getURL('https://other.com'); // => 'https://other.com'

`isMatch()`

spinney.isMatch(testPathname: string, basePathname: string): boolean

Tests whether basePathname (a full URL string) matches the pattern defined by testPathname (a disallow path entry from robots.txt). The method works in these steps:

It checks RegExps.ForwardSlashWord.test(testPathname). If that returns false, isMatch returns false immediately.
If the check passes, it finds the index of '/' in testPathname.
If the index is found (i.e. Not(index === -1) is true), it slices testPathname from that index to get the pathname portion and builds a RegExp via RegExps.getHostnameAndPathname(hostname, pathname). Otherwise it builds the RegExp from testPathname directly.
The RegExp is then tested against basePathname, and the result is returned.

Parameters

testPathname

string

required

A robots.txt-style disallow path, such as /private or /*/collections/name. Patterns are taken directly from the robots.txt Disallow: lines.

basePathname

string

required

A full absolute URL to test against the pattern, e.g. 'https://www.example.com/dontdoit/collections/name'.

Returns

boolean — true if basePathname matches the pattern built from testPathname and the scraper’s hostname. Returns false if testPathname contains no path segment matching RegExps.ForwardSlashWord.

Examples

const spinney = new Spinney('https://www.example.com/');

spinney.isMatch(
  '/*/collections/name',
  'https://www.example.com/dontdoit/collections/name'
); // => true

spinney.isMatch(
  '/*/collections/name',
  'https://www.example.com/dontdoit/collections'
); // => false
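The matching flow can be approximated with the sketch below. The internals of RegExps.ForwardSlashWord and RegExps.getHostnameAndPathname are not shown on this page, so both stand-ins are guesses: FORWARD_SLASH_WORD assumes "a slash followed by a word character", and buildPattern assumes that * matches any run of characters and that the pattern is anchored to the scheme and hostname. All names in this block are hypothetical.

```javascript
// Assumption: stands in for RegExps.ForwardSlashWord.
const FORWARD_SLASH_WORD = /\/\w/;

// Assumption: stands in for RegExps.getHostnameAndPathname.
function buildPattern(hostname, pathname) {
  // Escape regex metacharacters, then turn each '*' into a wildcard.
  const escape = (text) => text.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  const path = pathname.split('*').map(escape).join('.*');
  return new RegExp(`^https?://${escape(hostname)}${path}`);
}

function isMatchSketch(hostname, testPathname, basePathname) {
  if (!FORWARD_SLASH_WORD.test(testPathname)) {
    return false;
  }
  const index = testPathname.indexOf('/');
  const pathname = index === -1 ? testPathname : testPathname.slice(index);
  return buildPattern(hostname, pathname).test(basePathname);
}

isMatchSketch(
  'www.example.com',
  '/*/collections/name',
  'https://www.example.com/dontdoit/collections/name'
); // true

isMatchSketch(
  'www.example.com',
  '/*/collections/name',
  'https://www.example.com/dontdoit/collections'
); // false
```

Under these assumptions the sketch gives the same results as the two examples above. The real patterns may treat wildcards or anchoring differently.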

`isApproved()`

spinney.isApproved(site: string): boolean

Returns true if site passes all three approval checks:

It is a syntactically valid URL (tested against RegExps.getURL()).
Its hostname or origin starts with the base site’s hostname or origin — i.e. it belongs to the same domain.
It has not been visited before (it is not in seen) and is not forbidden by any robots.txt Disallow rule (checked via isForbidden()). As a side effect, an approved URL is immediately added to the seen Set to prevent re-queuing, so a second call with the same URL returns false.

Used internally to filter candidate URLs before adding them to the crawl batch queue.
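The sketch below illustrates the order of the checks and the side effect on seen. It is not Spinney's code and simplifies two things, both labelled in the comments: URL validity is tested with the standard URL constructor instead of RegExps.getURL(), and the same-domain check uses strict hostname equality instead of a starts-with comparison. createApprover and the isForbidden callback are hypothetical names.

```javascript
// Hypothetical factory that mimics the documented approval checks.
function createApprover(baseSite, isForbidden) {
  const base = new URL(baseSite);
  const seen = new Set();

  return function isApprovedSketch(site) {
    let candidate;
    try {
      candidate = new URL(site); // simplification: not RegExps.getURL()
    } catch {
      return false; // check 1 failed: not a valid absolute URL
    }
    if (candidate.hostname !== base.hostname) {
      return false; // check 2 failed (simplified to strict equality)
    }
    if (seen.has(site) || isForbidden(site)) {
      return false; // check 3 failed: already seen or disallowed
    }
    seen.add(site); // side effect: approval marks the URL as seen
    return true;
  };
}

const approveOnce = createApprover(
  'https://www.example.com/',
  (site) => site.includes('/private') // illustrative Disallow rule
);

approveOnce('https://www.example.com/about'); // true
approveOnce('https://www.example.com/about'); // false: already in seen
```

Because approval mutates state, calling the real isApproved() in a test consumes the URL for the rest of that instance's lifetime.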

Parameters

site

string

required

An absolute URL string to evaluate, e.g. 'https://www.example.com/about'.

Returns

boolean — true if the URL is valid, on the same domain, unseen, and not forbidden.

`getApproved()`

spinney.getApproved(hrefs: string[]): string[]

Accepts an array of raw href attribute values collected from a single HTML page, runs each one through getURL() to resolve relative paths to absolute URLs, then filters the results through isApproved(). The returned array contains only the URLs that are valid, same-domain, unseen, and not robots.txt-forbidden. This is the primary mechanism by which Spinney builds the next batch of URLs to crawl.
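As a rough picture of the resolve-then-filter pipeline, the sketch below uses hypothetical resolve and approve helpers in place of spinney.getURL() and spinney.isApproved(). It assumes a base site of https://example.com/, uses strict hostname equality for the same-domain check, and omits the robots.txt check entirely to stay short.

```javascript
// Hypothetical stand-ins; the real methods are spinney.getURL() and spinney.isApproved().
const base = new URL('https://example.com/');
const seen = new Set();

const resolve = (href) => {
  if (!href.startsWith('/')) return href; // returned unchanged
  const url = new URL(base.href);
  url.pathname = href.startsWith('//') ? href.slice(1) : href;
  return url.href;
};

const approve = (site) => {
  let candidate;
  try {
    candidate = new URL(site);
  } catch {
    return false; // not a parseable absolute URL, e.g. '#anchor'
  }
  if (candidate.hostname !== base.hostname || seen.has(candidate.href)) {
    return false;
  }
  seen.add(candidate.href);
  return true;
};

// Resolve every href first, then keep only the approved URLs.
const getApprovedSketch = (hrefs) =>
  hrefs.map((href) => resolve(href)).filter((site) => approve(site));

const approved = getApprovedSketch([
  '/about',
  'https://example.com/blog',
  '#anchor',
  'https://external.com',
  '/about', // duplicate: rejected because it was already seen
]);
// approved: ['https://example.com/about', 'https://example.com/blog']
```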

Parameters

hrefs

string[]

required

Raw href attribute values collected from an HTML page, e.g. ['/about', 'https://example.com/blog', '#anchor', 'https://external.com'].

Returns

string[] — filtered array of approved, fully-qualified absolute URLs ready to be added to the crawl queue.

`toArray()`

spinney.toArray(data: any): any[]

Normalises data to an array. If data is already an array it is returned as-is; otherwise it is wrapped in [data]. Used internally to normalise values before array operations.
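A minimal sketch of this behaviour, assuming "as-is" means the same array reference is returned and no copy is made. toArraySketch is a hypothetical name.

```javascript
// Hypothetical re-implementation of the documented behaviour.
function toArraySketch(data) {
  return Array.isArray(data) ? data : [data];
}

const list = ['/about'];
toArraySketch(list) === list; // true: the array is passed through, not copied
toArraySketch('/about');      // ['/about']
toArraySketch(undefined);     // [undefined]: even undefined is wrapped
```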

Parameters

data

any

required

Any value. Arrays are passed through; all other values are wrapped.

Returns

any[]

`isArrayEmpty()`

spinney.isArrayEmpty(data: any): boolean

Returns true if data is not an array, or if it is an array with a length of 0. Implemented as Not(Array.isArray(data)) || data.length === 0, where Not(condition) returns true when condition is strictly false. Used internally as the base case for the recursive _setUp batch loop — when the pending sites array is empty, the crawl completes and subscriber.complete() is called.
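The expression above can be tried in isolation with the sketch below. Not is written here from its one-line description, and isArrayEmptySketch is a hypothetical name.

```javascript
// Not(condition) is true only when condition is strictly false.
const Not = (condition) => condition === false;

function isArrayEmptySketch(data) {
  // The || short-circuits, so data.length is never read on a non-array.
  return Not(Array.isArray(data)) || data.length === 0;
}

isArrayEmptySketch([]);       // true: empty array
isArrayEmptySketch(['/a']);   // false: one pending URL
isArrayEmptySketch('abc');    // true: a non-empty string is still not an array
isArrayEmptySketch(null);     // true: no TypeError thanks to the short-circuit
```

One consequence worth noting is that any non-array input counts as "empty", so passing a single URL string instead of an array would end the loop in this sketch.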

Parameters

data

any

required

The value to check. Typically the current batch of pending URLs.

Returns

boolean — true if data is not an array or is an empty array.

`setForbidden()`

spinney.setForbidden({ forbidden }: { forbidden: Set<string> }): void

Assigns a new Set<string> of disallowed path patterns to the instance’s forbidden field, replacing any previously stored set. Called automatically by setUp() with the result of parsing the site’s robots.txt via ParseText. Can also be called directly if you need to inject a custom forbidden set before or between crawls.
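The replace-not-merge behaviour can be illustrated with a stand-in class. ForbiddenHolder is hypothetical and exists only to show the destructured argument shape and the assignment; the initial empty Set is also an assumption.

```javascript
// Hypothetical stand-in; only the method signature mirrors the documented one.
class ForbiddenHolder {
  constructor() {
    this.forbidden = new Set(); // assumption: starts empty
  }

  setForbidden({ forbidden }) {
    this.forbidden = forbidden; // replaces the previous set, does not merge
  }
}

const holder = new ForbiddenHolder();
holder.setForbidden({ forbidden: new Set(['/private', '/admin']) });
holder.setForbidden({ forbidden: new Set(['/drafts']) });
// holder.forbidden now contains only '/drafts'
```

With the real class, the equivalent call would be spinney.setForbidden({ forbidden: new Set(['/private', '/admin']) }). To extend an existing set instead of replacing it, build the combined Set before passing it in.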

Parameters

forbidden

Set<string>

required

A Set of path strings matching robots.txt Disallow: entries, e.g. new Set(['/private', '/admin']).

Returns

void
