Pre-built report helpers on the Crawl object for common SEO audit workflows — broken links, title and meta issues, indexability, orphans, security, canonicals, hreflang, and redirects.
Audit helpers are thin, opinionated wrappers that surface the most common SEO issues without requiring you to remember tab names or write manual filters. Each helper returns a list[dict] you can iterate, export, or pass straight into a dataframe.
All helpers work with any crawl backend (DuckDB, Derby, CSV). On lean DuckDB caches, helpers read directly from the prewarmed Derby source so they do not force a full tab materialization first.
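Because every helper returns a plain list[dict], the output can be exported with the standard library alone. The following is a minimal sketch, not part of the library: `rows_to_csv` is a hypothetical helper name, and the sample rows are stand-ins shaped like helper output rather than real crawl data.

```python
import csv
import io


def rows_to_csv(rows: list[dict]) -> str:
    """Serialize helper output (a list of dicts) to CSV text."""
    # Collect every key that appears, in first-seen order, because
    # rows are not guaranteed to share an identical set of fields.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Stand-in rows shaped like helper output; the values are illustrative only.
sample_rows = [
    {"Address": "https://example.com/old", "Status Code": 404},
    {"Address": "https://example.com/api", "Status Code": 500},
]
csv_text = rows_to_csv(sample_rows)
```

In real use, the stand-in list would be replaced by the return value of a helper such as crawl.broken_links_report(), and the text written to a file.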
Returns broken internal URLs with inlink counts and sampled inlink sources.
    crawl.broken_links_report(
        min_status: int = 400,
        max_status: int = 599,
        max_inlinks: int = 25,
    ) -> list[dict]
Parameters
min_status
Lower bound of the HTTP status code range to flag. Defaults to 400.
max_status
Upper bound of the HTTP status code range to flag. Defaults to 599.
max_inlinks
Maximum number of sampled inlink sources to include per broken URL. Defaults to 25.
Each row includes the broken URL, its HTTP status code, the total inlink count, and up to max_inlinks sampled source URLs.
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    broken = crawl.broken_links_report()
    for row in broken:
        print(row["Address"], row.get("Status Code"))
To restrict the report to client errors only, or to server errors only:
    # Only 4xx client errors
    client_errors = crawl.broken_links_report(min_status=400, max_status=499)

    # Only 5xx server errors, show up to 50 inlink sources per URL
    server_errors = crawl.broken_links_report(min_status=500, max_status=599, max_inlinks=50)
The example script in examples/broken_links_report.py shows how to print inlink sources for each broken URL:
    from screamingfrog import Crawl
    import sys

    crawl = Crawl.load(sys.argv[1] if len(sys.argv) > 1 else "./crawl.dbseospider")

    for row in crawl.tab("response_codes_internal_client_error_(4xx)"):
        url = str(row.get("Address") or "")
        code = row.get("Status Code")
        if not url:
            continue
        print(f"{code}: {url}")
        inlinks = list(crawl.inlinks(url))
        for link in inlinks[:25]:
            print(f" <- {link.source} ({link.anchor_text or ''})")
        if len(inlinks) > 25:
            print(f" ... {len(inlinks) - 25} more")
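When a report is fetched with the default 400 to 599 range, it can help to see how the rows split between client and server errors before drilling into individual URLs. The sketch below assumes only that each row carries a "Status Code" value, as in the examples above. `status_class_counts` is a hypothetical helper name, and the sample rows are illustrative stand-ins, not real crawl output.

```python
from collections import Counter


def status_class_counts(rows: list[dict]) -> Counter:
    """Count rows per status class, such as '4xx' or '5xx'."""
    counts: Counter = Counter()
    for row in rows:
        code = row.get("Status Code")
        try:
            # 404 -> "4xx", 503 -> "5xx"
            bucket = f"{int(code) // 100}xx"
        except (TypeError, ValueError):
            # Missing or non-numeric status codes land in their own bucket.
            bucket = "unknown"
        counts[bucket] += 1
    return counts


# Stand-in rows shaped like broken_links_report() output; values are illustrative.
sample_rows = [
    {"Address": "https://example.com/a", "Status Code": 404},
    {"Address": "https://example.com/b", "Status Code": "410"},
    {"Address": "https://example.com/c", "Status Code": 503},
    {"Address": "https://example.com/d", "Status Code": None},
]
summary = status_class_counts(sample_rows)
```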
Surfaces pages with missing titles or missing meta descriptions as flat issue rows.
crawl.title_meta_audit() -> list[dict]
Each row includes at minimum the Address and an Issue field describing what is missing (Missing Title, Missing Meta Description, etc.).
This helper runs DuckDB-first when internal_all is already cached and falls back to the high-level internal model on lean caches, so it stays fast regardless of cache state.
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    for row in crawl.title_meta_audit():
        print(row.get("Address"), "|", row.get("Issue"))
The examples/title_meta_audit.py script shows a manual fallback approach that also works against CSV exports:
    from screamingfrog import Crawl
    import sys

    crawl = Crawl.load(sys.argv[1] if len(sys.argv) > 1 else "./crawl.dbseospider")

    # Missing titles: tries the page_titles_missing tab first, falls back to internal scan
    print("Missing titles:")
    try:
        for row in crawl.tab("page_titles_missing"):
            address = row.get("Address")
            if address:
                print(f" {address}")
    except Exception:
        for page in crawl.internal:
            title = page.data.get("Title 1") or page.data.get("Title")
            if not title:
                print(f" {page.address}")

    # Missing meta descriptions
    print("\nMissing meta descriptions:")
    try:
        for row in crawl.tab("meta_description_missing"):
            address = row.get("Address")
            if address:
                print(f" {address}")
    except Exception:
        for page in crawl.internal:
            meta = page.data.get("Meta Description 1") or page.data.get("Meta Description")
            if not meta:
                print(f" {page.address}")
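Since title_meta_audit() returns one flat row per issue, a common next step is to group addresses by issue type. This is a minimal sketch that assumes only the Address and Issue fields described above. `group_by_issue` is a hypothetical helper name, and the sample rows are illustrative stand-ins.

```python
from collections import defaultdict


def group_by_issue(rows: list[dict]) -> dict[str, list[str]]:
    """Map each Issue label to the list of addresses that have it."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for row in rows:
        address = row.get("Address")
        if not address:
            continue  # skip rows without a usable URL
        issue = row.get("Issue") or "Unknown"
        grouped[issue].append(address)
    return dict(grouped)


# Stand-in rows shaped like title_meta_audit() output; values are illustrative.
sample_rows = [
    {"Address": "https://example.com/a", "Issue": "Missing Title"},
    {"Address": "https://example.com/b", "Issue": "Missing Meta Description"},
    {"Address": "https://example.com/c", "Issue": "Missing Title"},
]
grouped = group_by_issue(sample_rows)
```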
Returns non-indexable pages with the key indexability fields that explain why.
crawl.indexability_audit() -> list[dict]
Typical fields in each row: Address, Indexability, Indexability Status, Meta Robots 1, X-Robots-Tag 1, Canonical Link Element 1.
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    for row in crawl.indexability_audit():
        print(row.get("Address"), "|", row.get("Indexability Status"))
To group non-indexable pages by reason:
    from collections import Counter
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    counts = Counter(
        row.get("Indexability Status", "Unknown")
        for row in crawl.indexability_audit()
    )
    for reason, n in counts.most_common():
        print(f"{n:>6} {reason}")
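The Meta Robots 1 and X-Robots-Tag 1 fields can also show where a noindex directive comes from, which matters because one is fixed in the HTML and the other in the server configuration. The sketch below is an assumption-laden illustration: it presumes those two fields hold the raw directive strings, `noindex_source` is a hypothetical helper name, and the sample rows are stand-ins rather than real audit output.

```python
def noindex_source(row: dict) -> str:
    """Report where a noindex directive appears: 'meta', 'header', 'both', or 'none'."""
    # Assumption: both fields contain the raw directive text (or are empty/None).
    in_meta = "noindex" in str(row.get("Meta Robots 1") or "").lower()
    in_header = "noindex" in str(row.get("X-Robots-Tag 1") or "").lower()
    if in_meta and in_header:
        return "both"
    if in_meta:
        return "meta"
    if in_header:
        return "header"
    return "none"


# Stand-in rows shaped like indexability_audit() output; values are illustrative.
sample_rows = [
    {"Address": "https://example.com/a", "Meta Robots 1": "noindex, follow"},
    {"Address": "https://example.com/b", "X-Robots-Tag 1": "NOINDEX"},
    {"Address": "https://example.com/c", "Meta Robots 1": None},
]
sources = [noindex_source(row) for row in sample_rows]
```

Rows classified as 'none' are non-indexable for some other reason, such as a canonical or a non-200 response, which the Indexability Status field should describe.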
Returns pages with canonical issues (e.g., non-indexable canonicalised pages, or a canonical that points to a redirect).
crawl.canonical_issues_report() -> list[dict]
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    for row in crawl.canonical_issues_report():
        print(row.get("Address"), row.get("Canonical Link Element 1"))
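To see at a glance which reported pages canonicalise to a different URL, the two fields printed above can be compared directly. This is a rough sketch using exact string comparison with no URL normalization, so trailing-slash or case differences count as mismatches. `non_self_canonicals` is a hypothetical helper name, and the sample rows are illustrative stand-ins.

```python
def non_self_canonicals(rows: list[dict]) -> list[tuple[str, str]]:
    """Return (address, canonical) pairs where the canonical points elsewhere."""
    pairs: list[tuple[str, str]] = []
    for row in rows:
        address = str(row.get("Address") or "").strip()
        canonical = str(row.get("Canonical Link Element 1") or "").strip()
        # Skip rows missing either value; keep only genuine mismatches.
        if address and canonical and canonical != address:
            pairs.append((address, canonical))
    return pairs


# Stand-in rows shaped like canonical_issues_report() output; values are illustrative.
sample_rows = [
    {"Address": "https://example.com/a",
     "Canonical Link Element 1": "https://example.com/a"},
    {"Address": "https://example.com/b?ref=1",
     "Canonical Link Element 1": "https://example.com/b"},
    {"Address": "https://example.com/c", "Canonical Link Element 1": None},
]
pairs = non_self_canonicals(sample_rows)
```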
Returns pages with hreflang issues (e.g., missing return tags, incorrect language codes).
crawl.hreflang_issues_report() -> list[dict]
Some hreflang edge cases (specifically incorrect language-code cases) do not yet have exact Derby parity and may differ slightly from the Screaming Frog UI.
    from screamingfrog import Crawl

    crawl = Crawl.load("./crawl.dbseospider")
    for row in crawl.hreflang_issues_report():
        print(row.get("Address"), row.get("Issue"))