Class constructors
All constructors are class methods on Crawl. Use Crawl.load() for auto-detected loading, or the named constructors for explicit control.
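Crawl.load() is documented as choosing a loader from the path suffix and directory contents. The actual detection rules are not spelled out in this reference, so the following is only a minimal sketch of what suffix-based detection could look like. The function name `guess_loader` and the suffix mapping are assumptions made for illustration; only the loader names are taken from the documented values.

```python
import uuid
from pathlib import Path

# Hypothetical suffix-to-loader table. The loader names are the documented
# values for Crawl.load(); the mapping itself is an assumption.
SUFFIX_TO_LOADER = {
    ".seospider": "seospider",
    ".dbseospider": "dbseospider",
    ".duckdb": "duckdb",
    ".db": "sqlite",
    ".sqlite": "sqlite",
}


def guess_loader(source: str) -> str:
    """Guess a loader name for a crawl source (illustrative only)."""
    path = Path(source)
    suffix = path.suffix.lower()
    if suffix in SUFFIX_TO_LOADER:
        return SUFFIX_TO_LOADER[suffix]
    if path.is_dir():
        # Assumption: a directory holding CSV files is an export folder,
        # and any other directory is a Derby database directory.
        return "exports" if any(path.glob("*.csv")) else "derby"
    try:
        uuid.UUID(source)  # a bare UUID is treated as a DB crawl ID
    except ValueError:
        raise ValueError(f"cannot infer a loader for {source!r}") from None
    return "db_id"
```

When detection is ambiguous, forcing a specific loader or calling one of the named constructors avoids relying on any heuristic of this kind.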
Crawl.load()
Auto-detect and load a crawl from any supported source.
Path to the crawl source. Accepts a directory, file path, or DB crawl UUID. Auto-detection is based on path suffix and directory contents.
Force a specific loader. One of "auto", "exports", "csv", "duckdb", "sqlite", "db", "derby", "dbseospider", "seospider", "db_id".
Backend to use when loading .seospider files. One of "duckdb", "derby", "csv".
Backend to use when loading by DB crawl ID. One of "duckdb", "derby", "csv".
Backend to use when loading .dbseospider files. One of "duckdb", "derby".
Path for the DuckDB analytics cache. Defaults to a sibling file next to the source.
Namespace to use within a multi-crawl DuckDB file.
Tabs to materialize into the DuckDB cache. Pass "all" to materialize every mapped tab.
Cache refresh strategy. "auto" rebuilds only when the Derby source changed. Also accepts "replace" or "skip".
Whether to create a .dbseospider sidecar file when loading .seospider crawls.
Enable automatic CSV export fallback for Derby-backed crawls when a tab or column is missing.
Tabs to export when using CLI-backed loaders.
Named export profile. Use "kitchen_sink" for the bundled full-tab profile.
Crawl
Crawl.from_exports()
Load from a directory of CSV export files.
Path to the directory containing exported .csv files.
Crawl
Crawl.from_database()
Load from a SQLite database file (legacy backend, limited tab support).
Path to the SQLite .db or .sqlite file.
Crawl
Crawl.from_duckdb()
Load from a DuckDB analytics cache file.
Path to the .duckdb file.
Namespace to read within a multi-crawl DuckDB file.
Crawl
Crawl.duckdb_namespaces()
List all crawl namespaces stored in a DuckDB file.
Path to the .duckdb file.
list[str]
Crawl.from_derby()
Load directly from a Derby (.dbseospider) database.
Path to the Derby database directory or .dbseospider archive.
Analysis backend. "duckdb" (default) promotes to a DuckDB analytics cache; "derby" queries Derby directly.
Path for the DuckDB cache. Defaults to a sibling file next to the source.
Namespace for the DuckDB cache.
Raw Derby tables to export into DuckDB.
Mapped tabs to materialize into DuckDB. Use "all" for every available tab.
Cache refresh strategy. One of "auto", "replace", or "skip".
Fall back to CLI CSV exports for tabs or columns unavailable in Derby.
Crawl
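The cache refresh strategies can be read as a small decision table. This reference only states the behavior of "auto" (rebuild only when the Derby source changed); the meanings given to "replace" and "skip" below are inferred from their names and should be treated as assumptions. The helper `should_rebuild` is hypothetical and not part of the library.

```python
def should_rebuild(strategy: str, cache_exists: bool, source_changed: bool) -> bool:
    """Decide whether a DuckDB cache would be rebuilt (illustrative only)."""
    if not cache_exists:
        # Nothing to reuse, so every strategy has to build the cache.
        return True
    if strategy == "replace":
        return True  # assumption: always overwrite the existing cache
    if strategy == "skip":
        return False  # assumption: always reuse the existing cache
    if strategy == "auto":
        return source_changed  # documented: rebuild only if the source changed
    raise ValueError(f"unknown refresh strategy: {strategy!r}")
```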
Crawl.from_seospider()
Load from a Screaming Frog .seospider crawl file. Runs the Screaming Frog CLI internally.
Path to the .seospider file.
Backend to use. One of "duckdb", "derby", "csv".
Create a .dbseospider sidecar archive next to the source crawl.
Overwrite an existing .dbseospider sidecar.
Temporarily set storage.mode=DB in spider.config before loading.
Tabs to export when using the CSV backend.
Named export profile (e.g. "kitchen_sink").
Crawl
Crawl.from_db_id()
Load a DB-mode crawl by its UUID from the local ProjectInstanceData directory.
The UUID of the DB-mode crawl folder inside ProjectInstanceData.
Backend to use. One of "duckdb", "derby", "csv".
Override the ProjectInstanceData root directory. Defaults to the standard Screaming Frog data path.
Crawl
Views and queries
crawl.internal
Sitewide internal page view. Returns an InternalView object backed by the internal page model.
InternalView
crawl.pages()
Sitewide page view backed by the internal model. Use .filter() and .select() to narrow results.
PageView
crawl.links()
Sitewide inlinks or outlinks view.
Link direction. "in" for inlinks, "out" for outlinks.
LinkView
crawl.tab()
Access any export tab by name (CSV filename without extension, or normalized name).
Tab name. Case-insensitive; snake_case and title-case forms accepted. Extension optional.
TabView
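Accepting case-insensitive, snake_case, and title-case names with an optional extension implies some form of name normalization. The library's actual rules are not documented here; the sketch below shows one plausible approach. The function `normalize_tab_name` and the sample tab names are hypothetical.

```python
import re


def normalize_tab_name(name: str) -> str:
    """Reduce a tab name to a lowercase snake_case key (illustrative only)."""
    # Drop an optional .csv extension, whatever its letter case.
    stem = re.sub(r"\.csv$", "", name.strip(), flags=re.IGNORECASE)
    # Collapse runs of spaces, hyphens, and underscores into one underscore.
    return re.sub(r"[\s_\-]+", "_", stem).lower()


# Different spellings of the same (example) tab reduce to a single key.
keys = {normalize_tab_name(n) for n in ("Internal All.csv", "internal_all", "INTERNAL-ALL")}
```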
crawl.section()
Scope page and link views to a URL path prefix or full URL prefix.
URL path prefix (e.g. "/blog") or full URL prefix (e.g. "https://example.com/blog").
CrawlSection
crawl.search()
Search across the sitewide page view.
Search string.
Limit search to these column names. Searches all string fields when None.
Whether the search is case-sensitive.
SearchRowView
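The search options described above (a search string, an optional column restriction, and a case-sensitivity flag) can be illustrated over plain dict rows. This is a minimal sketch rather than the library's implementation: `search_rows`, its parameter names, its substring matching, and its default of case-insensitive matching are all assumptions, and the sample rows are made up.

```python
from typing import Any, Iterable, Iterator, Optional, Sequence


def search_rows(
    rows: Iterable[dict[str, Any]],
    text: str,
    columns: Optional[Sequence[str]] = None,
    case_sensitive: bool = False,
) -> Iterator[dict[str, Any]]:
    """Yield rows where any searched string field contains `text`."""
    needle = text if case_sensitive else text.lower()
    for row in rows:
        # With no column restriction, every field of the row is a candidate.
        names = columns if columns is not None else list(row)
        for name in names:
            value = row.get(name)
            if not isinstance(value, str):
                continue  # only string fields are searched
            haystack = value if case_sensitive else value.lower()
            if needle in haystack:
                yield row
                break  # emit each matching row once


sample_pages = [
    {"Address": "https://example.com/Blog", "Title 1": "Welcome", "Status Code": 200},
    {"Address": "https://example.com/about", "Title 1": "About the blog", "Status Code": 200},
]
title_hits = list(search_rows(sample_pages, "blog", columns=["Title 1"]))
```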
crawl.tabs
List available tab names for the current backend.
list[str]
crawl.query()
Build a chainable SQL query against a raw backend table (DB-backed crawls only).
Schema name (e.g. "APP").
Table name (e.g. "URLS").
QueryView
crawl.raw()
Yield raw rows from a backend table as dicts. DB-backed crawls only.
Fully qualified table name (e.g. "APP.URLS").
Iterator[dict[str, Any]]
crawl.sql()
Execute a raw SQL query and yield rows as dicts. DB-backed crawls only.
SQL query string. Use ? for parameterized values.
Query parameters corresponding to ? placeholders.
Iterator[dict[str, Any]]
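The ? placeholder style and the rows-as-dicts result shape can be demonstrated with Python's standard sqlite3 module, which uses the same positional placeholder syntax. This is a stand-in for illustration only: crawl.sql() runs against the crawl's own backend, and the helper `iter_rows`, the table `urls`, and its columns are hypothetical.

```python
import sqlite3
from typing import Any, Iterator, Sequence


def iter_rows(
    conn: sqlite3.Connection, query: str, params: Sequence[Any] = ()
) -> Iterator[dict[str, Any]]:
    """Run a parameterized query and yield each row as a dict."""
    cursor = conn.execute(query, params)
    names = [col[0] for col in cursor.description]
    for values in cursor:
        yield dict(zip(names, values))


# In-memory stand-in table with made-up data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (address TEXT, status_code INTEGER)")
conn.executemany(
    "INSERT INTO urls VALUES (?, ?)",
    [
        ("https://example.com/", 200),
        ("https://example.com/old", 404),
        ("https://example.com/gone", 410),
    ],
)

# The value 400 is bound to the ? placeholder rather than formatted into the SQL.
broken = list(
    iter_rows(
        conn,
        "SELECT address, status_code FROM urls WHERE status_code >= ? ORDER BY address",
        (400,),
    )
)
```

Binding values through placeholders instead of string formatting keeps quoting correct and avoids SQL injection when values come from user input.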
Graph helpers
crawl.inlinks()
Return all inlinks for a given URL.
The destination URL to look up inlinks for.
Iterator[Link]
crawl.outlinks()
Return all outlinks from a given URL.
The source URL to look up outlinks for.
Iterator[Link]
Chain helpers
crawl.redirect_chains()
Iterate redirect chain rows, optionally filtered by hop count and loop flag.
Minimum number of redirect hops. None means no lower bound.
Maximum number of redirect hops. None means no upper bound.
Filter by loop status. True returns only loops; False excludes loops; None returns all.
Iterator[dict[str, Any]]
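The hop bounds and the three-state loop filter combine as independent conditions. The sketch below applies them to plain dict rows; it is not the library's code. The function `filter_chains`, its parameter names, and the row keys `hops` and `loop` are hypothetical, and the bounds are assumed to be inclusive.

```python
from typing import Any, Iterable, Iterator, Optional


def filter_chains(
    rows: Iterable[dict[str, Any]],
    min_hops: Optional[int] = None,
    max_hops: Optional[int] = None,
    loop: Optional[bool] = None,
) -> Iterator[dict[str, Any]]:
    """Yield chain rows that satisfy the hop bounds and loop filter."""
    for row in rows:
        hops = row["hops"]
        if min_hops is not None and hops < min_hops:
            continue  # below the lower bound
        if max_hops is not None and hops > max_hops:
            continue  # above the upper bound
        if loop is not None and row["loop"] != loop:
            continue  # True keeps only loops, False keeps only non-loops
        yield row


sample_chains = [
    {"start": "/a", "hops": 1, "loop": False},
    {"start": "/b", "hops": 3, "loop": False},
    {"start": "/c", "hops": 4, "loop": True},
]
long_chains = [row["start"] for row in filter_chains(sample_chains, min_hops=2)]
```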
crawl.canonical_chains()
Iterate canonical chain rows.
Minimum number of canonical hops.
Maximum number of canonical hops.
Filter by loop status.
Iterator[dict[str, Any]]
crawl.redirect_and_canonical_chains()
Iterate mixed redirect and canonical chain rows.
Minimum total hops.
Maximum total hops.
Filter by loop status.
Iterator[dict[str, Any]]
Audit report helpers
Apart from crawl.summary(), which returns a dict, the report helpers return a flat list[dict[str, Any]] of issue rows, ready to export or load into a dataframe.
crawl.summary()
Return a compact crawl-level summary dict with counts for pages, broken links, orphans, redirect chains, and issue families.
Core counts (pages, tabs, broken_pages) are always populated. Issue-family and chain totals may be None on lean DuckDB caches until those tabs are materialized.
dict[str, Any]
crawl.broken_links_report()
Return broken internal URLs with inlink counts and sampled inlink sources.
Minimum HTTP status code to include.
Maximum HTTP status code to include.
Maximum number of sampled inlink sources per broken URL. Pass None to include all.
list[dict[str, Any]]
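A report of this shape combines a status-code range filter with per-URL inlink counts and a capped sample of sources. The following is a minimal sketch over in-memory data, not the library's implementation. The function `broken_links`, its parameter names, the output keys, and the default range of 400 to 599 are all assumptions, since this reference does not state the defaults.

```python
from collections import defaultdict
from typing import Any, Iterable, Optional


def broken_links(
    pages: dict[str, int],
    links: Iterable[tuple[str, str]],
    min_status: int = 400,  # assumed default
    max_status: int = 599,  # assumed default
    max_sources: Optional[int] = None,
) -> list[dict[str, Any]]:
    """Build broken-URL rows with inlink counts and sampled sources."""
    # Group link sources by the URL they point to.
    sources_by_target: dict[str, list[str]] = defaultdict(list)
    for source, target in links:
        sources_by_target[target].append(source)

    report = []
    for url, status in pages.items():
        if not (min_status <= status <= max_status):
            continue
        sources = sources_by_target.get(url, [])
        # None keeps every source; otherwise keep only the first few.
        sample = sources if max_sources is None else sources[:max_sources]
        report.append(
            {"url": url, "status": status, "inlinks": len(sources), "sample_sources": list(sample)}
        )
    return report


sample_status = {"/": 200, "/old": 404, "/gone": 410, "/error": 500}
sample_links = [("/", "/old"), ("/about", "/old"), ("/blog", "/old"), ("/", "/gone")]
sampled_report = broken_links(sample_status, sample_links, max_sources=2)
```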
crawl.broken_inlinks_report()
Return sitewide inlinks pointing to broken destinations.
Minimum HTTP status code.
Maximum HTTP status code.
list[dict[str, Any]]
crawl.nofollow_inlinks_report()
Return sitewide inlinks marked as nofollow.
Returns list[dict[str, Any]]
crawl.title_meta_audit()
Return page-level rows for missing titles and missing meta descriptions.
Returns list[dict[str, Any]]
crawl.indexability_audit()
Return non-indexable pages with key indexability fields (Indexability, Indexability Status, Canonical, Meta Robots, X-Robots-Tag).
Returns list[dict[str, Any]]
crawl.orphan_pages_report()
Return pages with no incoming internal links.
Exclude self-referencing links when computing inlink counts.
Return only indexable orphan pages.
list[dict[str, Any]]
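Orphan detection reduces to collecting every URL that receives at least one internal link and reporting the pages outside that set. The sketch below shows how the two options described above could interact; it is illustrative only. The function `orphan_pages`, its parameter names, and the input shapes (a URL-to-indexable mapping and source/target pairs) are hypothetical.

```python
from typing import Iterable


def orphan_pages(
    pages: dict[str, bool],
    links: Iterable[tuple[str, str]],
    ignore_self_links: bool = True,
    only_indexable: bool = False,
) -> list[str]:
    """Return URLs that receive no incoming internal links."""
    linked: set[str] = set()
    for source, target in links:
        if ignore_self_links and source == target:
            continue  # a page linking to itself does not count as an inlink
        linked.add(target)
    return [
        url
        for url, indexable in pages.items()
        if url not in linked and (indexable or not only_indexable)
    ]


# URL -> indexable flag, plus (source, target) internal links; made-up data.
sample_site = {"/": True, "/about": True, "/lonely": True, "/hidden": False, "/self": True}
sample_edges = [("/", "/about"), ("/about", "/"), ("/self", "/self")]
orphans = orphan_pages(sample_site, sample_edges)
```

Excluding self-referencing links matters because a page whose only inlink comes from itself is still unreachable from the rest of the site.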
crawl.security_issues_report()
Return rows from all available security issue tabs (missing HSTS, CSP, mixed content, insecure forms, etc.).
Returns list[dict[str, Any]]
crawl.canonical_issues_report()
Return rows from all available canonical issue tabs (missing, multiple, conflicting, non-indexable, etc.).
Returns list[dict[str, Any]]
crawl.hreflang_issues_report()
Return rows from all available hreflang issue tabs.
Returns list[dict[str, Any]]
crawl.redirect_issues_report()
Return rows from available redirect issue tabs (redirect chains, loops, meta refresh, JS redirect).
Returns list[dict[str, Any]]
crawl.redirect_chain_report()
Collected version of crawl.redirect_chains(). Returns results as a list.
Minimum redirect hops.
Maximum redirect hops.
Filter by loop status.
list[dict[str, Any]]
Tab metadata
crawl.tab_filters()
List available GUI filter names for a tab.
Tab name.
list[str]
crawl.tab_filter_defs()
Return the full filter definition objects for a tab.
Tab name.
list[Any]
crawl.tab_columns()
Return the column names for a tab.
Tab name.
list[str]
crawl.describe_tab()
Return a dict with tab, columns, and filters for a given tab name.
Tab name.
dict[str, Any]
DuckDB export
crawl.export_duckdb()
Export the current crawl into a DuckDB analytics cache file.
Destination path for the DuckDB file.
Raw Derby tables to include.
Mapped tabs to materialize. Pass "all" for every available tab.
What to do when the cache already exists. One of "replace", "skip", "auto".
Label stored in the cache to identify the crawl source.
Namespace within the DuckDB file for multi-crawl storage.
Path
export_duckdb_from_backend()
Export a crawl backend directly to a DuckDB file (lower-level than crawl.export_duckdb()). Used internally; exposed for advanced workflows.
A crawl backend instance.
Destination path for the DuckDB file.
Raw Derby tables to export. Defaults to DEFAULT_DUCKDB_TABLES.
Mapped tabs to materialize.
Cache refresh strategy: "replace", "skip", or "auto".
Label stored in the cache to identify the crawl source.
Namespace within the DuckDB file.
Path
Exported constants
DEFAULT_DUCKDB_TABLES
The default set of raw Derby tables exported when creating a DuckDB cache without specifying tables.
tuple[str, ...]
DEFAULT_DUCKDB_TABS
The default set of mapped tabs materialized when creating a DuckDB cache without specifying tabs.
tuple[str, ...]
Crawl comparison
crawl.compare()
Compare two crawls and return structural changes as a CrawlDiff.
The baseline crawl to compare against.
Field names to use for title comparison. Defaults to ("Title 1", "Title").
Field names for redirect URL comparison. Defaults to ("Redirect URL", "Redirect URI", "Redirect Destination").
Field names for redirect type comparison. Defaults to ("Redirect Type",).
Additional field groups to diff (canonical, meta description, H1-3, word count, indexability, robots directives). Pass a custom dict to override the defaults.
CrawlDiff
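The field-name defaults are tuples of alternatives, which suggests that the comparison reads whichever of those columns is present in a given crawl. That reading is an assumption, as is everything in the sketch below: the helpers `first_present` and `changed_titles`, the output keys, and the sample rows are hypothetical and do not reflect the structure of CrawlDiff.

```python
from typing import Any, Optional, Sequence

TITLE_FIELDS = ("Title 1", "Title")  # documented default for title comparison


def first_present(row: dict[str, Any], candidates: Sequence[str]) -> Optional[Any]:
    """Return the first non-empty value found under the candidate names."""
    for name in candidates:
        value = row.get(name)
        if value not in (None, ""):
            return value
    return None


def changed_titles(
    baseline: dict[str, dict[str, Any]],
    current: dict[str, dict[str, Any]],
    fields: Sequence[str] = TITLE_FIELDS,
) -> list[dict[str, Any]]:
    """List URLs present in both crawls whose title value differs."""
    changes = []
    for url, row in current.items():
        if url not in baseline:
            continue  # added pages are a separate kind of change
        old = first_present(baseline[url], fields)
        new = first_present(row, fields)
        if old != new:
            changes.append({"url": url, "old": old, "new": new})
    return changes


old_crawl = {"/": {"Title 1": "Home"}, "/about": {"Title": "About us"}, "/old": {"Title 1": "Old"}}
new_crawl = {"/": {"Title 1": "Home"}, "/about": {"Title 1": "About"}, "/new": {"Title 1": "New"}}
title_changes = changed_titles(old_crawl, new_crawl)
```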
Top-level helpers
list_crawls()
Enumerate all DB-mode crawls in the local ProjectInstanceData directory without opening Derby.
Override the ProjectInstanceData root directory.
list[CrawlInfo]
export_duckdb_from_derby()
Export a Derby crawl to a DuckDB file directly (without creating a Crawl instance).
Path to the Derby database directory or .dbseospider file.
Destination path for the DuckDB file.
Raw Derby tables to export.
Mapped tabs to materialize.
Cache refresh strategy.
Path
export_duckdb_from_db_id()
Export a DB-mode crawl by ID to a DuckDB file.
The DB crawl UUID.
Destination path for the DuckDB file.
Raw Derby tables to export.
Mapped tabs to materialize.
Cache refresh strategy.
Path