The crawler is the first stage in the signal pipeline. It connects to a Telegram channel via the GramJS client, retrieves raw message text for one or more calendar days, hands each message to the screen service for parsing, and then upserts any successfully parsed signals into the parser-items MongoDB collection. Because every upsert is keyed on { channel, messageId }, the crawler is idempotent: you can re-run it over any date range without creating duplicate records.
CrawlerService
CrawlerService (packages/core/src/lib/services/core/CrawlerService.ts) is the low-level workhorse. It delegates scraping to ScraperService and parsing to CryptoYodaScreenService, then persists the results through ParserDbService.
crawlDay(stamp)
Crawls a single calendar day identified by a moment-stamp (an integer like 20260115).
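To make the stamp format concrete, here is a minimal sketch of converting between a Date and a day stamp. It assumes the stamp is a YYYYMMDD integer computed in UTC, as the example value 20260115 suggests; toDayStamp and fromDayStamp are hypothetical helper names, and the project's own getMomentStamp may use a different encoding or timezone.

```typescript
// Hypothetical helper: builds a YYYYMMDD integer from a Date, in UTC.
// Assumption: the real getMomentStamp() may encode days differently.
function toDayStamp(when: Date): number {
  const year = when.getUTCFullYear();
  const month = when.getUTCMonth() + 1; // getUTCMonth() is zero-based
  const day = when.getUTCDate();
  return year * 10000 + month * 100 + day;
}

// Hypothetical inverse: recovers the UTC midnight Date for a stamp.
function fromDayStamp(stamp: number): Date {
  const year = Math.floor(stamp / 10000);
  const month = Math.floor((stamp % 10000) / 100);
  const day = stamp % 100;
  return new Date(Date.UTC(year, month - 1, day));
}
```

Under this assumption, stamps sort in calendar order, but they cannot be incremented arithmetically across month boundaries; stepping to the next day has to go through a Date.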
crawlRange(fromStamp, toStamp)
Crawls an inclusive range of days in parallel. For each day in the range, it calls cryptoYodaScreenService.screenDay(date), then processes every returned message.
Messages whose data is null (failed parsing) are logged and skipped. Only messages whose type equals "crypto_yoda_channel" are written to the database; this guard allows multiple screen services with different channel types to be added later without code changes.
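The flow described above can be sketched as follows. This is an illustration rather than the project's code: the ScreenedMessage shape, the ScreenDay and Upsert function types, listDayStamps, and crawlRangeSketch are all hypothetical names, and the day stamps are assumed to be YYYYMMDD integers.

```typescript
// Illustrative stand-ins, not the project's real types.
interface ScreenedMessage {
  messageId: number;
  type: string;
  data: { symbol: string } | null; // null means parsing failed
}
type ScreenDay = (stamp: number) => Promise<ScreenedMessage[]>;
type Upsert = (channel: string, message: ScreenedMessage) => Promise<void>;

const CHANNEL_TYPE = "crypto_yoda_channel";

// Enumerates every day in [fromStamp, toStamp], assuming YYYYMMDD stamps.
function listDayStamps(fromStamp: number, toStamp: number): number[] {
  const toDate = (s: number): Date =>
    new Date(Date.UTC(Math.floor(s / 10000), Math.floor((s % 10000) / 100) - 1, s % 100));
  const end = toDate(toStamp).getTime();
  const stamps: number[] = [];
  for (let d = toDate(fromStamp); d.getTime() <= end; d = new Date(d.getTime() + 86400000)) {
    stamps.push(d.getUTCFullYear() * 10000 + (d.getUTCMonth() + 1) * 100 + d.getUTCDate());
  }
  return stamps;
}

// Screens all days concurrently and returns how many messages were upserted.
async function crawlRangeSketch(
  fromStamp: number,
  toStamp: number,
  screenDay: ScreenDay,
  upsert: Upsert,
): Promise<number> {
  const perDay = await Promise.all(
    listDayStamps(fromStamp, toStamp).map(async (stamp) => {
      let written = 0;
      for (const message of await screenDay(stamp)) {
        if (message.data === null) {
          console.warn(`Skipping unparsed message ${message.messageId}`);
          continue;
        }
        if (message.type !== CHANNEL_TYPE) continue; // type guard
        await upsert(CHANNEL_TYPE, message);
        written += 1;
      }
      return written;
    }),
  );
  return perDay.reduce((sum, n) => sum + n, 0);
}
```

Promise.all starts every day at once; a real implementation might cap concurrency to stay within Telegram rate limits.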
CrawlerMainService
CrawlerMainService (packages/core/src/lib/services/main/CrawlerMainService.ts) is the orchestration layer. The two strategy crontabs call into it rather than calling CrawlerService directly, so mode-awareness and frame-lookup logic stay in one place.
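As a rough sketch of how such an orchestration layer might look, the example below wires the two entry points to injected dependencies. The MainDeps interface, the FrameSchema shape, and createCrawlerMain are hypothetical; the function names getMode, getContext, listFrameSchema, and getMomentStamp come from this page, but their exact signatures (for example, whether they are synchronous) are assumptions.

```typescript
// Hypothetical dependency shapes; real signatures may differ.
interface FrameSchema {
  frameName: string;
  startDate: Date;
  endDate: Date;
}
interface MainDeps {
  getMode: () => string;
  getContext: () => { frameName: string };
  listFrameSchema: () => FrameSchema[];
  getMomentStamp: (when: Date) => number;
  crawlDay: (stamp: number) => Promise<void>;
  crawlRange: (fromStamp: number, toStamp: number) => Promise<void>;
}

function createCrawlerMain(deps: MainDeps) {
  return {
    // Returns false when the crawl was skipped because of backtest mode.
    async crawlLiveFrame(when: Date): Promise<boolean> {
      if (deps.getMode() === "backtest") return false;
      await deps.crawlDay(deps.getMomentStamp(when));
      return true;
    },
    // The frame, not the `when` argument, decides which days are crawled.
    async crawlBacktestFrame(_when: Date): Promise<void> {
      const { frameName } = deps.getContext();
      const frame = deps.listFrameSchema().find((f) => f.frameName === frameName);
      if (!frame) throw new Error(`Unknown frame: ${frameName}`);
      await deps.crawlRange(
        deps.getMomentStamp(frame.startDate),
        deps.getMomentStamp(frame.endDate),
      );
    },
  };
}
```

Passing the dependencies in as plain functions keeps the mode check and the frame lookup testable without a Telegram connection or a database.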
crawlLiveFrame(when: Date)
Called by the 15-minute live-mode crontab. It:
- Guard against backtest mode: calls getMode() and returns early if the runtime is "backtest". This prevents the live handler from accidentally running when backtest-kit replays history.
- Compute today's stamp: converts the when timestamp to an integer moment-stamp with getMomentStamp(when).
- Crawl today: delegates to crawlerService.crawlDay(stamp), which fetches and upserts all of today’s channel messages.
crawlBacktestFrame(when: Date)
Called once at strategy startup by the backtest-prepare crontab. It:
- Resolve the active frame: reads frameName from getContext() and looks up the matching entry in listFrameSchema() to obtain startDate and endDate.
- Crawl the full range: converts both dates to moment-stamps and passes them to crawlerService.crawlRange(fromStamp, toStamp), which fetches every day in the frame in parallel.
ScraperService
ScraperService (packages/core/src/lib/services/core/ScraperService.ts) owns the raw Telegram I/O. It calls getTelegram() to obtain an authenticated GramJS client, then streams messages from the channel for a specific calendar day.
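The day-scoped streaming could be sketched like this. The GramJS call itself is deliberately left out: the sketch takes any async iterable of messages, so nothing here should be read as the GramJS API. RawMessage, dayBoundsUtc, and scrapeDaySketch are hypothetical names, and the sketch assumes YYYYMMDD stamps, UTC day boundaries, message dates in Unix seconds, and a newest-first message order.

```typescript
// Hypothetical message shape; `date` is assumed to be Unix seconds.
interface RawMessage {
  id: number;
  date: number;
  text: string;
}

// Half-open UTC interval [from, to) in Unix seconds for a YYYYMMDD stamp.
function dayBoundsUtc(stamp: number): { from: number; to: number } {
  const year = Math.floor(stamp / 10000);
  const month = Math.floor((stamp % 10000) / 100);
  const day = stamp % 100;
  const from = Date.UTC(year, month - 1, day) / 1000;
  return { from, to: from + 86400 };
}

// Collects the messages of one day from a newest-first stream.
async function scrapeDaySketch(
  stamp: number,
  source: AsyncIterable<RawMessage>,
): Promise<RawMessage[]> {
  const { from, to } = dayBoundsUtc(stamp);
  const result: RawMessage[] = [];
  for await (const message of source) {
    if (message.date >= to) continue; // newer than the target day, keep going
    if (message.date < from) break; // older than the target day, stop early
    result.push(message);
  }
  return result;
}
```

Stopping at the first message older than the day avoids walking the whole channel history, but it is only correct if the stream really is ordered newest first.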
Channel Configuration
The channel being scraped is identified by the CHANNEL_NAME constant defined in CryptoYodaScreenService. This string is used as both the GramJS channel identifier passed to ScraperService.scrapeDay() and as the type tag on each parsed message.
To target a different Telegram channel, create a new ScreenService following the same pattern as CryptoYodaScreenService:
- Define a CHANNEL_NAME constant with the new channel’s identifier.
- Define a SIGNAL_FORMAT object with regex patterns for the new channel’s message structure (see the Parser guide).
- Implement screenDay(date) and parseDay(messages) methods.
- Inject the new screen service into CrawlerService and add its screenDay call to RUN_CRAWLER_FN.
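The first three steps might look like the sketch below. Everything in it is invented for illustration: the channel name, the message layout matched by SIGNAL_FORMAT, the ParsedSignal fields, and the ScrapeDay function type are assumptions, not the shapes used by CryptoYodaScreenService.

```typescript
// Hypothetical channel and message layout, for illustration only.
const CHANNEL_NAME = "example_signals_channel";

const SIGNAL_FORMAT = {
  symbol: /^Pair:\s*([A-Z0-9]+)$/m,
  side: /^Side:\s*(LONG|SHORT)$/m,
  entry: /^Entry:\s*(\d+(?:\.\d+)?)$/m,
};

interface RawMessage { id: number; text: string; }
interface ParsedSignal { symbol: string; side: string; entry: number; }
interface ScreenedMessage { messageId: number; type: string; data: ParsedSignal | null; }
type ScrapeDay = (channel: string, stamp: number) => Promise<RawMessage[]>;

class ExampleScreenService {
  private readonly scrapeDay: ScrapeDay;

  constructor(scrapeDay: ScrapeDay) {
    this.scrapeDay = scrapeDay;
  }

  // Fetches one day of raw messages and parses them.
  async screenDay(stamp: number): Promise<ScreenedMessage[]> {
    return this.parseDay(await this.scrapeDay(CHANNEL_NAME, stamp));
  }

  // A message that misses any required field gets data = null.
  parseDay(messages: RawMessage[]): ScreenedMessage[] {
    return messages.map((message) => {
      const symbol = SIGNAL_FORMAT.symbol.exec(message.text);
      const side = SIGNAL_FORMAT.side.exec(message.text);
      const entry = SIGNAL_FORMAT.entry.exec(message.text);
      const data: ParsedSignal | null =
        symbol && side && entry
          ? { symbol: symbol[1], side: side[1], entry: Number(entry[1]) }
          : null;
      return { messageId: message.id, type: CHANNEL_NAME, data };
    });
  }
}
```

Keeping parseDay free of I/O means the regex patterns can be tested against saved message text without touching Telegram.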
Data Deduplication
parserDbService.create() performs an upsert keyed on the compound unique index { channel, messageId }. This means:
- Re-crawling the same date range never creates duplicate records.
- If a Telegram message is edited after the first crawl, the updated content is written over the old record on the next crawl.
- Backtest preparation can be re-run safely: only messages that are new or have been edited since the last crawl change the collection.
The messageId field is the Telegram-assigned integer message ID, which is stable and monotonically increasing within each channel. It is not a UUID generated by the application.
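The upsert semantics can be illustrated with an in-memory stand-in for the collection. This is not how ParserDbService is implemented: InMemoryParserStore and the ParserItem shape are hypothetical. Against MongoDB the same effect is typically achieved with an updateOne call using the upsert option, but this page does not state which call the service uses.

```typescript
// Hypothetical record shape; the real parser-items documents hold more fields.
interface ParserItem {
  channel: string;
  messageId: number;
  content: string;
}

type UpsertResult = "inserted" | "updated" | "unchanged";

class InMemoryParserStore {
  private readonly items = new Map<string, ParserItem>();

  // Mirrors a compound unique key on { channel, messageId }.
  upsert(item: ParserItem): UpsertResult {
    const key = `${item.channel}:${item.messageId}`;
    const existing = this.items.get(key);
    this.items.set(key, { ...item });
    if (!existing) return "inserted";
    return existing.content === item.content ? "unchanged" : "updated";
  }

  get size(): number {
    return this.items.size;
  }
}
```

Re-running the same upsert leaves the store with one record per key, and an edited message replaces the earlier content instead of adding a second record.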