## Feed overview
| Feed | Source | Redis key(s) | Refresh interval | Match type |
|---|---|---|---|---|
| URLHaus | urlhaus.abuse.ch | urlhaus_blacklist | 5 minutes | Exact URL |
| OpenPhish | openphish.com | openphish_urls, openphish_hosts | 15 minutes | Exact URL or hostname |
| PhishTank | data.phishtank.com | phishtank_urls | 60 minutes | Exact URL |
| PhishStats | api.phishstats.info | phishstats_urls, phishstats_hosts | 90 minutes | Exact URL or hostname |
| Google Safe Browsing | safebrowsing.googleapis.com | gsb_cache_hash | Per-URL (1 hr TTL) | API lookup |
A Google Web Risk checker (`WebRiskChecker`) exists in the source at `src/checkers/googleWebRisk.ts` but is currently disabled: it is commented out in `Scanner.ts` and does not run.

## Feed details
### URLHaus (abuse.ch)

URLHaus publishes a live list of URLs serving active malware. Phisherman streams the CSV feed directly into Redis.

- Feed URL: `https://urlhaus.abuse.ch/downloads/csv-online/`
- Redis key: `urlhaus_blacklist` (Redis Set)
- Refresh interval: every 5 minutes

The feed is a quoted CSV; Phisherman reads column index 2 (the URL) from each line. The feed is consumed as a stream using Node.js `readline`, so no full in-memory buffering is needed. URLs are written to Redis in batches of 1000 to keep request sizes manageable, and an atomic key swap ensures the live set is never partially populated during a refresh.

At scan time, a single `SISMEMBER` lookup is performed against `urlhaus_blacklist`. A match returns a score of 100.
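The per-line handling can be sketched as follows. The function names and the naive quoted-CSV split are illustrative assumptions, not the actual Phisherman source; in the real refresher, batches would be written to a temporary key and atomically renamed over `urlhaus_blacklist`.

```typescript
/**
 * Extract column index 2 (the URL) from one quoted CSV line, or null for
 * comment/blank lines. Sketch only: a URL containing `","` would break this.
 */
function extractUrlhausUrl(line: string): string | null {
  if (!line || line.startsWith("#")) return null; // header/comment lines
  // Fields look like: "id","dateadded","url","status",...
  const fields = line.split('","').map((f) => f.replace(/^"|"$/g, ""));
  return fields.length > 2 ? fields[2] : null;
}

/** Group URLs into batches of `size` for pipelined SADD calls. */
function batch<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}
```

Batching at 1000 keeps each Redis request small while still amortizing round-trip overhead across many `SADD` members.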
### OpenPhish

OpenPhish publishes a plain-text feed of active phishing URLs, one per line.

- Feed URL: `https://openphish.com/feed.txt`
- Redis keys: `openphish_urls` (Set), `openphish_hosts` (Set)
- Refresh interval: every 15 minutes

During refresh, Phisherman populates both a URL set and a hostname set from the feed. This enables two levels of matching at scan time: an exact URL match returns a score of 100, while a hostname-only match returns 80. The refresh uses an atomic rename swap on both sets.
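The two-level match can be sketched like this, using in-memory `Set`s in place of the Redis `SISMEMBER` calls (the function name is illustrative, not the actual source):

```typescript
/**
 * Score a URL against the OpenPhish sets: exact URL match → 100,
 * hostname-only match → 80, otherwise 0.
 */
function openphishScore(
  target: string,
  urlSet: Set<string>,
  hostSet: Set<string>,
): number {
  if (urlSet.has(target)) return 100; // exact URL match
  try {
    const host = new URL(target).hostname; // hostname-only match
    if (hostSet.has(host)) return 80;
  } catch {
    // unparsable URL: no hostname-level match possible
  }
  return 0;
}
```

The hostname fallback is why the refresh maintains a second set: it catches phishing pages that move to a different path on an already-listed host.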
### PhishTank

PhishTank provides a community-verified list of phishing URLs as a gzip-compressed CSV.

- Feed URL (default): `https://data.phishtank.com/data/online-valid.csv.gz`
- Redis key: `phishtank_urls` (Set)
- Refresh interval: every 60 minutes
- Failure cooldown: 15 minutes

Phisherman prefers the CSV.GZ format because it can be decompressed and parsed in a streaming, constant-memory way; the JSON dump is intentionally skipped as the primary source to avoid large in-memory buffering on small instances. The gzip stream is decompressed inline.

If the primary feed fails, Phisherman automatically falls back to the JSON endpoint. If the fallback also fails, a `phishtank_last_fail` timestamp is written to Redis and no further refresh attempts are made for 15 minutes. You can override the default feed URL with an environment variable.

At scan time, only an exact URL match is checked. A match returns a score of 100.
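The streaming decompress-and-parse step can be sketched as below. The column layout is an assumption (PhishTank's CSV puts the URL in column 1 after a header row), and the naive comma split is illustrative only; the real parser would need to handle quoted fields.

```typescript
import { createGunzip, gzipSync } from "node:zlib";
import { Readable } from "node:stream";
import { createInterface } from "node:readline";

/**
 * Parse a gzip-compressed PhishTank-style CSV in constant memory:
 * gunzip is piped straight into readline, so no full buffer is held.
 */
async function parsePhishtankGz(gz: Buffer): Promise<string[]> {
  const urls: string[] = [];
  const rl = createInterface({
    input: Readable.from([gz]).pipe(createGunzip()),
  });
  let first = true;
  for await (const line of rl) {
    if (first) {
      first = false; // skip the header row
      continue;
    }
    const url = line.split(",")[1]; // naive CSV split: sketch only
    if (url) urls.push(url.replace(/^"|"$/g, ""));
  }
  return urls;
}
```

In the real refresher the HTTP response body stream would take the place of `Readable.from([gz])`.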
### PhishStats

PhishStats provides a JSON API of recently reported phishing URLs.

- Feed URL: `https://api.phishstats.info/api/phishing?_sort=-id&_size=20000`
- Redis keys: `phishstats_urls` (Set), `phishstats_hosts` (Set)
- Refresh interval: every 90 minutes

The feed returns an array of entries in the form `{ id, url, ip, ... }`. Phisherman extracts the `url` field from each entry and also extracts the hostname to populate a separate set. As with OpenPhish, matching is attempted at two levels: exact URL (score 100), then hostname (score 80).
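The extraction step can be sketched as follows; the entry shape comes from the docs above, while the interface and function names are illustrative:

```typescript
// Minimal shape of one PhishStats entry, per the feed format above.
interface PhishStatsEntry {
  id: number;
  url: string;
  ip?: string;
}

/** Split feed entries into deduplicated URL and hostname lists. */
function splitPhishstats(entries: PhishStatsEntry[]): {
  urls: string[];
  hosts: string[];
} {
  const urls = new Set<string>();
  const hosts = new Set<string>();
  for (const e of entries) {
    if (!e.url) continue;
    urls.add(e.url);
    try {
      hosts.add(new URL(e.url).hostname);
    } catch {
      // skip malformed URLs: they cannot produce a hostname entry
    }
  }
  return { urls: Array.from(urls), hosts: Array.from(hosts) };
}
```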
### Google Safe Browsing

Google Safe Browsing is the only feed that does not use a bulk-refresh model. Each URL is checked against the API individually at scan time, and the result is cached per URL.

- API endpoint: `https://safebrowsing.googleapis.com/v4/threatMatches:find`
- Redis storage: `gsb_cache_hash` (HashCache, 1-hour TTL per URL)
- Requires: `GOOGLE_SAFE_API_KEY` environment variable

The request checks for four threat types. If the `GOOGLE_SAFE_API_KEY` environment variable is not set, the checker returns `{ score: 0 }` immediately without making a network request. A match returns a score of 50.

Results are cached using the HashCache utility (see below). Valid results are cached for 1 hour; error responses (e.g. billing issues) are cached for 15 minutes to prevent hammering the API on a broken key.
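A request body for `threatMatches:find` might be built as below. The four threat types shown are the standard Safe Browsing v4 values and are an assumption here (the exact list used by Phisherman is not reproduced above), and the client IDs are placeholders:

```typescript
/** Build a Safe Browsing v4 threatMatches:find request body for one URL. */
function buildGsbRequest(url: string) {
  return {
    client: { clientId: "phisherman", clientVersion: "1.0" }, // illustrative
    threatInfo: {
      // Assumed standard v4 threat types; the real checker's list may differ.
      threatTypes: [
        "MALWARE",
        "SOCIAL_ENGINEERING",
        "UNWANTED_SOFTWARE",
        "POTENTIALLY_HARMFUL_APPLICATION",
      ],
      platformTypes: ["ANY_PLATFORM"],
      threatEntryTypes: ["URL"],
      threatEntries: [{ url }],
    },
  };
}

/** Missing-key guard from the docs: { score: 0 } without a network call. */
function gsbScoreWithoutKey(): { score: number } | null {
  return process.env.GOOGLE_SAFE_API_KEY ? null : { score: 0 };
}
```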
## The HashCache utility

Several caches (GSB, Google Web Risk, DNS) use a shared `HashCache` class instead of individual Redis keys. This prevents key explosion when caching a large number of per-URL or per-host results.

How it works:

- Each entry's key (e.g. a URL) is hashed with SHA-256 and truncated to 32 hex characters to produce a stable, short field ID.
- The field ID is used as a field in a single Redis hash (e.g. `gsb_cache_hash`).
- A companion sorted set (e.g. `gsb_cache_expiry`) stores each field ID with its expiry timestamp as the score.
- On `get`, if the entry exists but its `exp` has passed, it is deleted opportunistically.
- On the `CacheManager` cleanup cycle, all entries with `score <= now` in the ZSET are removed in bulk.

Only two Redis keys per `HashCache` instance (`*_hash` and `*_expiry`) exist regardless of how many entries are stored.
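The field-ID scheme above can be sketched in a few lines; the Redis commands in the comments show the shape of a `set`, not the actual class:

```typescript
import { createHash } from "node:crypto";

/**
 * Derive a HashCache field ID: SHA-256 of the entry key, truncated to
 * 32 hex characters. Stable, short, and collision-resistant enough for
 * a per-URL cache.
 */
function fieldId(key: string): string {
  return createHash("sha256").update(key).digest("hex").slice(0, 32);
}

// A set operation would then be roughly (pseudocode, not the real class):
//   HSET  gsb_cache_hash   <fieldId>  <JSON payload including exp>
//   ZADD  gsb_cache_expiry <expiryTs> <fieldId>
```

Truncating to 32 hex characters (128 bits) keeps hash fields short while making accidental collisions between cached URLs vanishingly unlikely.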