Phisherman processes every URL through a multi-stage pipeline: a cache lookup, parallel checker execution, score aggregation, and a cache write. This page walks through each stage in detail.

Request lifecycle

 HTTP request
      │
      ▼
 Rate limiter
      │
      ▼
 Redis cache check  ──── hit ────▶  return cached ScanResult
      │ miss
      ▼
 CheckerRegistry.runAll(url)
  ┌──┴─────────────────────────────────────┐
  │  heuristics  openphish   gsb  urlhaus  │  (parallel, 2500 ms timeout each)
  │  phishtank   phishstats                │
  └──┬─────────────────────────────────────┘
     │ []CheckResult
     ▼
 Aggregate score  →  Math.min(100, sum)
      │
      ▼
 Determine verdict  (safe / suspicious / phishing)
      │
      ▼
 Cache result in Redis  (if non-safe, or SCAN_CACHE_SAFE_RESULTS=true)
      │
      ▼
 Return ScanResult
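
The aggregation and verdict steps in the diagram can be sketched as two pure functions. The suspicious/phishing thresholds below (30 and 60) are illustrative assumptions, not the values Phisherman actually uses:

```typescript
// Sketch of the "Aggregate score" and "Determine verdict" stages.
// NOTE: the 30/60 cutoffs are assumed for illustration only; the real
// thresholds are defined inside Scanner.ts and are not shown here.
interface CheckResult {
    score: number;
    reason: string;
}

type Verdict = "safe" | "suspicious" | "phishing";

function aggregateScore(checks: CheckResult[]): number {
    const sum = checks.reduce((acc, c) => acc + c.score, 0);
    return Math.min(100, sum); // cap at 100, as in the diagram
}

function determineVerdict(score: number): Verdict {
    if (score >= 60) return "phishing";   // assumed threshold
    if (score >= 30) return "suspicious"; // assumed threshold
    return "safe";
}
```

Capping the sum at 100 means several weak signals can still add up to a confident verdict, while a single high-scoring checker cannot push the score past the maximum.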

The CheckerRegistry

CheckerRegistry is a small registry class that holds a list of Checker objects and runs them all in parallel.
// src/CheckerRegistry.ts
class CheckerRegistry {
    private checkers: Checker[] = [];

    register(checker: Checker) {
        this.checkers.push(checker);
    }

    async runAll(url: string): Promise<{ checks: CheckResult[]; timing: Record<string, number> }> {
        const timing: Record<string, number> = {};
        const TIMEOUT_MS = 2500; // 2.5s maximum per checker

        const checks = await Promise.all(
            this.checkers.map(async (checker) => {
                const start = Date.now();
                try {
                    const checkPromise = checker.check(url);
                    const timeoutPromise = new Promise<CheckResult>((_, reject) =>
                        setTimeout(() => reject(new Error("Timeout")), TIMEOUT_MS)
                    );

                    const result = await Promise.race([checkPromise, timeoutPromise]);
                    timing[checker.name] = Date.now() - start;
                    return result;
                } catch (err: any) {
                    timing[checker.name] = Date.now() - start;
                    if (err.message === "Timeout") {
                        console.warn(`Checker ${checker.name} timed out for ${url}`);
                        return { score: 0, reason: `Checker ${checker.name} timed out` };
                    }
                    console.error(`Checker ${checker.name} failed:`, err);
                    return { score: 0, reason: `Checker ${checker.name} error` };
                }
            })
        );

        return { checks, timing };
    }
}
Key properties of this design:
  • All checkers run concurrently via Promise.all — latency is bounded by the slowest checker, not the sum of all checkers.
  • Each checker races against a 2500 ms timeout via Promise.race. A timed-out checker contributes a score of 0 and does not fail the request.
  • Per-checker execution time is recorded in the timing map and returned in the ScanResult as executionTimeMs.
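
The timeout race at the heart of runAll() can be isolated into a small helper. This sketch follows the same Promise.race pattern as the code above, with one refinement it could adopt: clearing the timer once the race settles so it does not linger on the fast path:

```typescript
// Isolated sketch of the runAll() timeout pattern: a checker promise
// races a rejecting timer, so a slow checker degrades to a zero-score
// result instead of stalling the whole scan.
interface CheckResult {
    score: number;
    reason: string;
}

async function withTimeout(
    check: Promise<CheckResult>,
    name: string,
    timeoutMs: number
): Promise<CheckResult> {
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("Timeout")), timeoutMs);
    });
    try {
        return await Promise.race([check, timeout]);
    } catch {
        return { score: 0, reason: `Checker ${name} timed out` };
    } finally {
        clearTimeout(timer); // refinement: free the timer on the fast path
    }
}
```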

Checker registration

Checkers are registered in Scanner.ts at startup:
// src/Scanner.ts
registry.register(HeuristicsChecker);
registry.register(OpenPhishChecker);
registry.register(SafeBrowsingChecker);
registry.register(URLHausChecker);
registry.register(PhishTankChecker);
// registry.register(WebRiskChecker);  // disabled
registry.register(PhishStatsChecker);
Each checker implements the Checker interface:
// src/types.ts
export interface Checker {
    name: string;
    check: (url: string) => Promise<CheckResult>;
}
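
A minimal example of implementing this interface — a hypothetical checker that flags URLs using a raw IP address as the host. The name, the score value, and the heuristic itself are illustrative and not part of Phisherman:

```typescript
interface CheckResult {
    score: number;
    reason: string;
}

interface Checker {
    name: string;
    check: (url: string) => Promise<CheckResult>;
}

// Hypothetical checker: flags URLs whose host is a bare IPv4 address,
// a common trait of throwaway phishing infrastructure.
const IpHostChecker: Checker = {
    name: "ip-host",
    async check(url: string): Promise<CheckResult> {
        const host = new URL(url).hostname;
        if (/^\d{1,3}(\.\d{1,3}){3}$/.test(host)) {
            return { score: 25, reason: "URL uses a raw IP address as host" };
        }
        return { score: 0, reason: "Hostname is not a raw IP" };
    },
};
```

Because checkers are plain objects behind one interface, adding a new signal is just another registry.register() call.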

Result caching

Scan results are cached in Redis to avoid re-running the full checker pipeline for recently seen URLs.
// src/Scanner.ts
const RESULT_CACHE_TTL_SECONDS = 300; // 5 minutes
const SCAN_CACHE_HASH = "scan_results";
const SCAN_CACHE_EXPIRY_ZSET = "scan_results_expiry";
Cache key — The URL is hashed with SHA-256:
function scanCacheId(url: string) {
  return crypto.createHash("sha256").update(url).digest("hex");
}
Storage structure — To avoid Redis key explosion, all scan results are stored as fields of a single hash (scan_results). A companion sorted set (scan_results_expiry) stores each field ID with its expiry timestamp as the score, enabling efficient batch cleanup.
Cache read — On a cache hit, Phisherman checks the exp field against Date.now(). Expired entries are deleted opportunistically before the fresh scan runs.
Caching safe results — By default, URLs that resolve to a safe verdict are not cached, because they are high-volume and low-value to retain. Set SCAN_CACHE_SAFE_RESULTS=true to cache them:
const CACHE_SAFE_RESULTS = (process.env.SCAN_CACHE_SAFE_RESULTS || "").toLowerCase() === "true";

// ...
if (CACHE_SAFE_RESULTS || result.verdict !== "safe") {
  // write to cache
}
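
Putting the pieces together, the write path looks roughly like this. The in-memory client below is a stand-in for Redis (hset/zadd mirror the usual client method names), the trimmed ScanResult shape and the exact field layout are assumptions based on the keys described above:

```typescript
// Sketch of the cache write: the serialized result becomes a field of
// one hash, and its expiry timestamp is tracked in a companion sorted
// set so cleanup can find stale entries in one range query.
import { createHash } from "crypto";

interface ScanResult {
    verdict: "safe" | "suspicious" | "phishing";
    score: number;
}

const RESULT_CACHE_TTL_SECONDS = 300;
const SCAN_CACHE_HASH = "scan_results";
const SCAN_CACHE_EXPIRY_ZSET = "scan_results_expiry";

// In-memory stand-in for a Redis client.
class FakeRedis {
    hashes = new Map<string, Map<string, string>>();
    zsets = new Map<string, Map<string, number>>();
    hset(key: string, field: string, value: string): void {
        if (!this.hashes.has(key)) this.hashes.set(key, new Map());
        this.hashes.get(key)!.set(field, value);
    }
    zadd(key: string, score: number, member: string): void {
        if (!this.zsets.has(key)) this.zsets.set(key, new Map());
        this.zsets.get(key)!.set(member, score);
    }
}

function scanCacheId(url: string): string {
    return createHash("sha256").update(url).digest("hex");
}

// Returns true if the result was written to the cache.
function cacheResult(
    redis: FakeRedis,
    url: string,
    result: ScanResult,
    cacheSafe = false
): boolean {
    if (result.verdict === "safe" && !cacheSafe) return false; // skip safe results by default
    const id = scanCacheId(url);
    const exp = Date.now() + RESULT_CACHE_TTL_SECONDS * 1000;
    redis.hset(SCAN_CACHE_HASH, id, JSON.stringify({ ...result, exp }));
    redis.zadd(SCAN_CACHE_EXPIRY_ZSET, exp, id);
    return true;
}
```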

Background feed refresh

CacheManager runs a background loop that keeps all threat feed data current. It is started once at server startup:
// src/CacheManager.ts
async start(intervalMs: number = 3600000) { // Default 1 hour
    if (this.interval) return;
    await this.runAll();  // run immediately on startup
    this.interval = setInterval(() => this.runAll(), intervalMs);
}
On each cycle, runAll() invokes every registered RefreshTask in sequence, then runs three cleanup routines:
Cleanup step              What it removes
cleanupScanResults()      Expired entries from the scan_results hash + ZSET
cleanupWhois()            Expired WHOIS lookups from the whois_data hash + ZSET
cleanupHashCaches()       Expired entries from the GSB, GWR, and DNS HashCache instances
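The first two cleanup routines follow the same hash + expiry-ZSET pattern: collect the IDs whose expiry score has passed, then delete them from both structures. An in-memory sketch (against real Redis this would be a ZRANGEBYSCORE followed by HDEL and ZREM):

```typescript
// In-memory sketch of ZSET-driven cleanup. Maps stand in for the
// Redis hash and sorted set; the ZSET score is the expiry timestamp.
function cleanupExpired(
    hash: Map<string, string>,
    expiryZset: Map<string, number>, // member -> expiry timestamp (ZSET score)
    now: number
): string[] {
    const expired: string[] = [];
    for (const [member, exp] of expiryZset) {
        if (exp <= now) expired.push(member); // ZRANGEBYSCORE key 0 now
    }
    for (const member of expired) {
        hash.delete(member);       // HDEL scan_results <member>
        expiryZset.delete(member); // ZREM scan_results_expiry <member>
    }
    return expired;
}
```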
Each feed source registers its own refresh task with a source-specific interval:
Source        Refresh interval
URLHaus       5 minutes
OpenPhish     15 minutes
PhishTank     60 minutes
PhishStats    90 minutes
Feed freshness is checked on every invocation rather than on a separate per-source timer. The CacheManager loop fires every hour by default, but each source compares Date.now() against its own last-update timestamp and only refetches if its individual interval has elapsed.
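
That per-source gating can be sketched as a small guard. The RefreshTask shape here is an assumption for illustration, not Phisherman's actual type:

```typescript
// Sketch of per-source refresh gating: the hourly loop calls every
// task, but each task only refetches once its own interval elapses.
interface RefreshTask {
    name: string;
    intervalMs: number;
    lastUpdate: number; // epoch ms of the last successful refresh
    refresh: () => Promise<void>;
}

// Returns true if the task actually refetched on this invocation.
async function maybeRefresh(task: RefreshTask, now = Date.now()): Promise<boolean> {
    if (now - task.lastUpdate < task.intervalMs) return false; // interval not yet elapsed
    await task.refresh();
    task.lastUpdate = now;
    return true;
}
```

This keeps a single scheduler loop while still letting URLHaus refresh every 5 minutes and PhishStats only every 90.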
