Phisherman processes every URL through a multi-stage pipeline: a cache lookup, parallel checker execution, score aggregation, and a cache write. This page walks through each stage in detail.
## Request lifecycle
```text
HTTP request
      │
      ▼
Rate limiter
      │
      ▼
Redis cache check ──── hit ────▶ return cached ScanResult
      │ miss
      ▼
CheckerRegistry.runAll(url)
   ┌──┴─────────────────────────────────────┐
   │ heuristics   openphish   gsb   urlhaus │  (parallel, 2500 ms timeout each)
   │ phishtank    phishstats                │
   └──┬─────────────────────────────────────┘
      │ []CheckResult
      ▼
Aggregate score → Math.min(100, sum)
      │
      ▼
Determine verdict (safe / suspicious / phishing)
      │
      ▼
Cache result in Redis (if non-safe, or SCAN_CACHE_SAFE_RESULTS=true)
      │
      ▼
Return ScanResult
```
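The aggregation step near the bottom of the diagram can be sketched as follows. The `Math.min(100, sum)` cap comes from the pipeline above; the 40/70 verdict thresholds here are illustrative placeholders, not Phisherman's actual cutoffs:

```typescript
interface CheckResult {
  score: number;
  reason: string;
}

// Sum the per-checker scores, capped at 100, then map the total to a
// verdict. The 40/70 cutoffs are assumed values for illustration only.
function aggregate(checks: CheckResult[]): {
  score: number;
  verdict: "safe" | "suspicious" | "phishing";
} {
  const score = Math.min(100, checks.reduce((sum, c) => sum + c.score, 0));
  const verdict = score >= 70 ? "phishing" : score >= 40 ? "suspicious" : "safe";
  return { score, verdict };
}
```

Because every checker contributes a non-negative score, a single strong signal can push a URL over the threshold even when other feeds return 0.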
## The CheckerRegistry
`CheckerRegistry` is a small registry class that holds a list of `Checker` objects and runs them all in parallel.
```typescript
// src/CheckerRegistry.ts
class CheckerRegistry {
  private checkers: Checker[] = [];

  register(checker: Checker) {
    this.checkers.push(checker);
  }

  async runAll(url: string): Promise<{ checks: CheckResult[]; timing: Record<string, number> }> {
    const timing: Record<string, number> = {};
    const TIMEOUT_MS = 2500; // 2.5 s maximum per checker

    const checks = await Promise.all(
      this.checkers.map(async (checker) => {
        const start = Date.now();
        let timer: NodeJS.Timeout | undefined;
        try {
          const checkPromise = checker.check(url);
          const timeoutPromise = new Promise<CheckResult>((_, reject) => {
            timer = setTimeout(() => reject(new Error("Timeout")), TIMEOUT_MS);
          });
          return await Promise.race([checkPromise, timeoutPromise]);
        } catch (err: any) {
          if (err.message === "Timeout") {
            console.warn(`Checker ${checker.name} timed out for ${url}`);
            return { score: 0, reason: `Checker ${checker.name} timed out` };
          }
          console.error(`Checker ${checker.name} failed:`, err);
          return { score: 0, reason: `Checker ${checker.name} error` };
        } finally {
          // Clear the pending timer so a fast checker doesn't leave a
          // 2.5 s timeout keeping the event loop alive.
          clearTimeout(timer);
          timing[checker.name] = Date.now() - start;
        }
      })
    );

    return { checks, timing };
  }
}
```
Key properties of this design:
- All checkers run concurrently via `Promise.all` — latency is bounded by the slowest checker, not the sum of all checkers.
- Each checker races against a 2500 ms timeout via `Promise.race`. A timed-out checker contributes a score of 0 and does not fail the request.
- Per-checker execution time is recorded in the `timing` map and returned in the `ScanResult` as `executionTimeMs`.
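The race-against-a-timeout pattern can be demonstrated in isolation. This is a standalone sketch, not Phisherman's code: `withTimeout` is a hypothetical helper, and the budget is shortened to 50 ms so a deliberately slow fake checker trips it:

```typescript
interface CheckResult {
  score: number;
  reason: string;
}

// Race a checker promise against a timeout; on timeout (or any error),
// fall back to a zero-score result instead of failing the whole scan.
async function withTimeout(
  name: string,
  check: Promise<CheckResult>,
  timeoutMs: number
): Promise<CheckResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<CheckResult>((_, reject) => {
    timer = setTimeout(() => reject(new Error("Timeout")), timeoutMs);
  });
  try {
    return await Promise.race([check, timeout]);
  } catch (err: any) {
    if (err.message === "Timeout") {
      return { score: 0, reason: `Checker ${name} timed out` };
    }
    return { score: 0, reason: `Checker ${name} error` };
  } finally {
    clearTimeout(timer); // don't leave the timer pending after the race settles
  }
}

// A fake checker that resolves after 200 ms — well past a 50 ms budget:
const slowChecker = new Promise<CheckResult>((resolve) =>
  setTimeout(() => resolve({ score: 80, reason: "late" }), 200)
);
```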
## Checker registration
Checkers are registered in `Scanner.ts` at startup:

```typescript
// src/Scanner.ts
registry.register(HeuristicsChecker);
registry.register(OpenPhishChecker);
registry.register(SafeBrowsingChecker);
registry.register(URLHausChecker);
registry.register(PhishTankChecker);
// registry.register(WebRiskChecker); // disabled
registry.register(PhishStatsChecker);
```
Each checker implements the `Checker` interface:

```typescript
// src/types.ts
export interface Checker {
  name: string;
  check: (url: string) => Promise<CheckResult>;
}
```
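A minimal implementation of this interface might look like the following. This is a hypothetical example, not one of Phisherman's real checkers, and its scoring weight of 30 is an assumed value:

```typescript
interface CheckResult {
  score: number;
  reason: string;
}

interface Checker {
  name: string;
  check: (url: string) => Promise<CheckResult>;
}

// Hypothetical checker: flag URLs whose host is a raw IPv4 address,
// a common phishing tell. The score of 30 is illustrative.
const IpHostChecker: Checker = {
  name: "ip-host",
  check: async (url: string) => {
    const host = new URL(url).hostname;
    const isIp = /^\d{1,3}(\.\d{1,3}){3}$/.test(host);
    return isIp
      ? { score: 30, reason: "URL uses a raw IP address as its host" }
      : { score: 0, reason: "hostname is not a raw IP" };
  },
};
```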
## Result caching
Scan results are cached in Redis to avoid re-running the full checker pipeline for recently seen URLs.
```typescript
// src/Scanner.ts
const RESULT_CACHE_TTL_SECONDS = 300; // 5 minutes
const SCAN_CACHE_HASH = "scan_results";
const SCAN_CACHE_EXPIRY_ZSET = "scan_results_expiry";
```
**Cache key** — The URL is hashed with SHA-256:

```typescript
function scanCacheId(url: string) {
  return crypto.createHash("sha256").update(url).digest("hex");
}
```
**Storage structure** — To avoid Redis key explosion, all scan results are stored as fields of a single hash (`scan_results`). A companion sorted set (`scan_results_expiry`) stores each field ID with its expiry timestamp as the score, enabling efficient batch cleanup.
**Cache read** — On a cache hit, Phisherman checks the `exp` field against `Date.now()`. Expired entries are deleted opportunistically before the fresh scan runs.
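A write/read pair over this hash + ZSET layout might look like the following sketch. The helper names are assumptions, and the stub interface only mirrors the Redis commands involved (method names follow the node-redis v4 style); it is not Phisherman's actual code:

```typescript
import crypto from "crypto";

// Minimal shape of the Redis commands used, so an in-memory stub works.
interface RedisLike {
  hSet(key: string, field: string, value: string): Promise<number>;
  hGet(key: string, field: string): Promise<string | null>;
  zAdd(key: string, entry: { score: number; value: string }): Promise<number>;
}

const SCAN_CACHE_HASH = "scan_results";
const SCAN_CACHE_EXPIRY_ZSET = "scan_results_expiry";
const RESULT_CACHE_TTL_SECONDS = 300;

function scanCacheId(url: string) {
  return crypto.createHash("sha256").update(url).digest("hex");
}

// Hypothetical write helper: one hash field per result, plus a ZSET member
// scored by expiry timestamp so cleanup can range-scan by time.
async function writeScanCache(redis: RedisLike, url: string, result: Record<string, unknown>) {
  const id = scanCacheId(url);
  const exp = Date.now() + RESULT_CACHE_TTL_SECONDS * 1000;
  await redis.hSet(SCAN_CACHE_HASH, id, JSON.stringify({ ...result, exp }));
  await redis.zAdd(SCAN_CACHE_EXPIRY_ZSET, { score: exp, value: id });
}

// Hypothetical read helper: entries past their exp timestamp count as misses.
async function readScanCache(redis: RedisLike, url: string) {
  const raw = await redis.hGet(SCAN_CACHE_HASH, scanCacheId(url));
  if (!raw) return null;
  const entry = JSON.parse(raw);
  return entry.exp > Date.now() ? entry : null;
}
```

Embedding `exp` in the stored JSON lets a read detect staleness without a round trip to the ZSET; the ZSET exists purely so the background cleanup can find expired fields in one range query.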
**Caching safe results** — By default, URLs that resolve to a safe verdict are not cached, because they are high-volume and low-value to retain. Set `SCAN_CACHE_SAFE_RESULTS=true` to cache them:

```typescript
const CACHE_SAFE_RESULTS = (process.env.SCAN_CACHE_SAFE_RESULTS || "").toLowerCase() === "true";
// ...
if (CACHE_SAFE_RESULTS || result.verdict !== "safe") {
  // write to cache
}
```
## Background feed refresh

`CacheManager` runs a background loop that keeps all threat feed data current. It is started once at server startup:
```typescript
// src/CacheManager.ts
async start(intervalMs: number = 3600000) { // default: 1 hour
  if (this.interval) return;
  await this.runAll(); // run immediately on startup
  this.interval = setInterval(() => this.runAll(), intervalMs);
}
```
On each cycle, `runAll()` invokes every registered `RefreshTask` in sequence, then runs three cleanup routines:
| Cleanup step | What it removes |
|---|---|
| `cleanupScanResults()` | Expired entries from the `scan_results` hash + ZSET |
| `cleanupWhois()` | Expired WHOIS lookups from the `whois_data` hash + ZSET |
| `cleanupHashCaches()` | Expired entries from GSB, GWR, and DNS `HashCache` instances |
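The scan-result cleanup step can be sketched as follows. This is assumed logic built on the hash + ZSET layout described earlier; the stub interface mirrors the Redis commands involved, not Phisherman's actual code:

```typescript
// Minimal shape of the Redis commands a cleanup pass needs.
interface ExpiryStore {
  zRangeByScore(key: string, min: number, max: number): Promise<string[]>;
  hDel(key: string, fields: string[]): Promise<number>;
  zRemRangeByScore(key: string, min: number, max: number): Promise<number>;
}

// Hypothetical cleanup: expired field IDs are found via the ZSET (score =
// expiry timestamp), then removed from both the hash and the ZSET.
async function cleanupScanResults(redis: ExpiryStore, now = Date.now()): Promise<number> {
  const expired = await redis.zRangeByScore("scan_results_expiry", 0, now);
  if (expired.length === 0) return 0;
  await redis.hDel("scan_results", expired);
  await redis.zRemRangeByScore("scan_results_expiry", 0, now);
  return expired.length;
}
```

Scoring the ZSET by expiry timestamp is what makes this cheap: one range query finds every expired field, with no need to scan the whole hash.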
Each feed source registers its own refresh task with a source-specific interval:
| Source | Refresh interval |
|---|---|
| URLHaus | 5 minutes |
| OpenPhish | 15 minutes |
| PhishTank | 60 minutes |
| PhishStats | 90 minutes |
Feed refresh is checked on every invocation — not on a separate per-source timer. The `CacheManager` loop fires every hour by default, but each source compares `Date.now()` against its own last-update timestamp and only refetches if its individual interval has elapsed.
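That per-source gating might look like the following sketch. The `RefreshTask` shape here is an assumption inferred from the behavior described, not Phisherman's actual type:

```typescript
// Assumed shape of a refresh task: each source carries its own interval
// and last-update timestamp.
interface RefreshTask {
  name: string;
  intervalMs: number;
  lastUpdate: number;
  refresh: () => Promise<void>;
}

// Called on every CacheManager cycle; only refetches when the source's own
// interval has elapsed. Returns whether a fetch actually ran.
async function maybeRefresh(task: RefreshTask, now = Date.now()): Promise<boolean> {
  if (now - task.lastUpdate < task.intervalMs) return false; // not due yet
  await task.refresh();
  task.lastUpdate = now;
  return true;
}
```

With a 1-hour loop, a 5-minute source like URLHaus still refreshes at most once per cycle; shortening the loop interval tightens how closely each source tracks its nominal schedule.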