Skip to main content
Linkspector provides powerful pattern matching capabilities to ignore certain links or transform them before checking.

Ignore Patterns

The ignorePatterns section allows you to define regular expressions that match URLs to be ignored during link checking.

Configuration

ignorePatterns:
  - pattern: '^https://example.com/skip/.*$'
  - pattern: "^(ftp)://[^\\s/$?#]*\\.[^\\s]*$"
  - pattern: '^https://localhost.*$'

Behavior

  • URLs matching any of the specified patterns will be skipped
  • Patterns use regular expression syntax
  • Each pattern is defined as an object with a pattern key
  • Case-sensitive by default

Common Use Cases

ignorePatterns:
  - pattern: "^(ftp)://[^\\s/$?#]*\\.[^\\s]*$"
This pattern skips all FTP protocol links.

Ignore Localhost URLs

ignorePatterns:
  - pattern: '^https?://localhost.*$'
  - pattern: '^https?://127\\.0\\.0\\.1.*$'
This skips all localhost and 127.0.0.1 URLs.

Ignore Specific Domains

ignorePatterns:
  - pattern: '^https://private-internal.company.com/.*$'
  - pattern: '^https://staging.example.com/.*$'
This skips all links to specific internal or staging domains.

Ignore URL Paths

ignorePatterns:
  - pattern: '^https://example.com/skip/.*$'
  - pattern: '.*/admin/.*'
  - pattern: '.*/api/v1/.*'
This skips links to specific paths on any domain.
Be careful with overly broad patterns like '.*' which would ignore all links.

Regular Expression Tips

  • Use ^ to match the start of the URL
  • Use $ to match the end of the URL
  • Use .* to match any characters
  • Use \\. to match a literal period
  • Use [^\\s] to match any non-whitespace character
  • Remember to escape special characters in YAML strings

Replacement Patterns

The replacementPatterns section lets you define regular expressions and replacement strings to modify URLs before checking them.

Configuration

replacementPatterns:
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'
  - pattern: "\\[([^\\]]+)\\]\\((https?://example.com)/file\\)"
    replacement: '<a href="$2/file">$1</a>'

Behavior

  • Patterns use regular expression syntax with capture groups
  • Replacements can reference capture groups using $1, $2, etc.
  • Each entry requires both pattern and replacement keys
  • Transformations are applied before link checking

Common Use Cases

Normalize URL Format

replacementPatterns:
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'
This transforms URLs like:
  • https://example.com/user/123https://example.com/id/123
  • https://example.com/post/456https://example.com/id/456

Convert Short URLs

replacementPatterns:
  - pattern: "https://short.link/(\\w+)"
    replacement: 'https://full-domain.com/redirect/$1'
This transforms short URLs to their full format before checking.

Update Deprecated Paths

replacementPatterns:
  - pattern: "(https?://api.example.com)/v1/"
    replacement: '$1/v2/'
This automatically updates old API version paths to new ones.
replacementPatterns:
  - pattern: "\\[([^\\]]+)\\]\\((https?://example.com)/file\\)"
    replacement: '<a href="$2/file">$1</a>'
This transforms Markdown links to HTML anchor tags.
Replacement patterns are useful when you’ve migrated URLs but haven’t updated your documentation yet, or when dealing with dynamic URL structures.

Capture Groups

Capture groups allow you to extract parts of the URL and use them in the replacement:
replacementPatterns:
  # Capture group 1: protocol and domain
  # Capture group 2: category
  # Capture group 3: numeric ID
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'
In the replacement:
  • $1 refers to the first capture group (https?://example.com)
  • $2 refers to the second capture group (\\w+)
  • $3 refers to the third capture group (\\d+)

Base URL

The baseUrl option sets the base URL used when checking relative links in your files.

Configuration

baseUrl: https://example.com

Behavior

  • When a relative link is found, it’s converted to an absolute URL using the base URL
  • Only applies to relative links (links that don’t start with a protocol)
  • Useful when your documentation will be hosted at a specific domain

Example

With this configuration:
baseUrl: https://docs.example.com
Relative links are resolved as:
  • /getting-startedhttps://docs.example.com/getting-started
  • ../api/referencehttps://docs.example.com/api/reference
  • images/logo.pnghttps://docs.example.com/images/logo.png
Absolute URLs (starting with http:// or https://) are not affected by the baseUrl setting.

Use Case: Static Site Testing

When building a static site that will be deployed to a specific domain:
baseUrl: https://www.mysite.com

dirs:
  - ./content
  - ./pages
This ensures relative links are tested against the actual production domain.

Complete Example

Here’s a comprehensive example combining all link pattern options:
# Set base URL for relative links
baseUrl: https://docs.example.com

# Ignore certain URL patterns
ignorePatterns:
  # Skip localhost URLs
  - pattern: '^https?://localhost.*$'
  - pattern: '^https?://127\\.0\\.0\\.1.*$'
  
  # Skip FTP links
  - pattern: "^(ftp)://[^\\s/$?#]*\\.[^\\s]*$"
  
  # Skip internal admin pages
  - pattern: '.*/admin/.*'
  
  # Skip specific paths
  - pattern: '^https://example.com/skip/.*$'

# Transform URLs before checking
replacementPatterns:
  # Update API version in URLs
  - pattern: "(https?://api.example.com)/v1/"
    replacement: '$1/v2/'
  
  # Normalize ID URLs
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'

# Other configuration...
dirs:
  - ./docs
This configuration:
  • Resolves relative links using https://docs.example.com as the base
  • Skips localhost, FTP, and admin URLs
  • Skips specific paths on example.com
  • Updates API v1 URLs to v2
  • Normalizes ID-based URLs to a consistent format

Build docs developers (and LLMs) love