Parse and Import UZSE Trade HTML Pages into MongoDB

import-trades.ts is the second step in the data pipeline. It reads every trades_page_*.html file produced by the download step, extracts trade rows from the HTML tables with regular expressions, and inserts them one by one into the trade-results MongoDB collection. Re-importing the same files is safe: each record is fingerprinted with a SHA1 hash, and duplicates are silently skipped.

Usage

npx tsx scripts/import-trades.ts

The script takes no CLI arguments. It automatically discovers all trades_page_*.html files in the tmp/ directory (sorted numerically by page number) and processes them in order.
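The numeric ordering matters because a plain string sort would place trades_page_10.html before trades_page_2.html. A minimal sketch of the discovery step follows. The names PAGE_FILE, sortPageFiles, and discoverPageFiles are hypothetical, and the sketch assumes the wildcard part of the file name is always a number.

```typescript
import { readdirSync } from "node:fs";

// Matches names like trades_page_12.html and captures the page number.
// Assumption: the wildcard part of the name is always numeric.
const PAGE_FILE = /^trades_page_(\d+)\.html$/;

// Keep only matching names and order them by numeric page number,
// so trades_page_10.html sorts after trades_page_2.html.
function sortPageFiles(names: string[]): string[] {
  const pageNumber = (name: string): number => Number(PAGE_FILE.exec(name)![1]);
  return names
    .filter((name) => PAGE_FILE.test(name))
    .sort((a, b) => pageNumber(a) - pageNumber(b));
}

// Hypothetical wrapper: list the directory and fail when nothing matches.
function discoverPageFiles(dir = "tmp"): string[] {
  const files = sortPageFiles(readdirSync(dir));
  if (files.length === 0) {
    throw new Error(`No trades_page_*.html files found in ${dir}/`);
  }
  return files;
}
```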

Run download-trades first to populate tmp/ with HTML pages before executing this script. If no matching files are found, the script exits with an error.

Requirements

MongoDB must be running and accessible. The connection URI is read from the MONGO_URI environment variable, which defaults to mongodb://localhost:27017/backtest when not set.
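The fallback can be pictured as a small helper. This is only a sketch: resolveMongoUri and DEFAULT_MONGO_URI are hypothetical names, and it assumes that "not set" means the variable is undefined (an empty string would be passed through unchanged).

```typescript
// Default taken from the documentation above.
const DEFAULT_MONGO_URI = "mongodb://localhost:27017/backtest";

// Return MONGO_URI when it is defined, otherwise the documented default.
// Assumption: only an undefined variable triggers the fallback.
function resolveMongoUri(
  env: Record<string, string | undefined> = process.env,
): string {
  return env.MONGO_URI ?? DEFAULT_MONGO_URI;
}
```

The resolved string would then be handed to the Mongoose connection call. To point the script at another database, export MONGO_URI before running it.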

Parsed Fields

Each <tr> row in the HTML table is parsed into the following fields:

Field	Type	Description
`time`	`Date`	Trade timestamp, parsed from Russian-language date text
`symbol`	`string`	ISIN code extracted from the security cell, e.g. `UZ7011340005`
`issuer`	`string`	Full name of the issuing company
`securityType`	`string`	Type of security (e.g. ordinary share)
`market`	`string`	Market segment the trade occurred in
`platform`	`string`	Trading platform or session
`tradePrice`	`number`	Execution price per unit
`quantity`	`number`	Number of securities exchanged
`volume`	`number`	Total monetary volume of the trade
`hash`	`string`	SHA1 fingerprint used as the unique deduplication key
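As an illustration of regex-based row extraction, the sketch below pulls the text of every <td> cell out of each <tr> row. It is not the script's actual code: the helper names are hypothetical, the column order of the UZSE table is not shown here, and the number format handled by toNumber (whitespace as thousands separator, comma as decimal mark) is an assumption.

```typescript
// Strip tags and collapse whitespace inside one table cell.
function cellText(html: string): string {
  return html.replace(/<[^>]*>/g, " ").replace(/\s+/g, " ").trim();
}

// Return the text of every <td> cell, grouped by <tr> row.
function extractRows(html: string): string[][] {
  const rows: string[][] = [];
  for (const tr of html.matchAll(/<tr\b[^>]*>([\s\S]*?)<\/tr>/gi)) {
    const cells = Array.from(
      tr[1].matchAll(/<td\b[^>]*>([\s\S]*?)<\/td>/gi),
      (td) => cellText(td[1]),
    );
    // Rows without <td> cells (for example header rows built from <th>) are skipped.
    if (cells.length > 0) rows.push(cells);
  }
  return rows;
}

// Assumed number format: whitespace as thousands separator, comma as decimal mark.
function toNumber(text: string): number {
  return Number(text.replace(/\s/g, "").replace(",", "."));
}
```

Regex parsing like this works for flat, predictable tables; nested tables or unusual markup would need a real HTML parser.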

Russian Date Parsing

UZSE renders trade timestamps in Russian, for example:

17 апреля 2026, 16:02

The script maps Russian month names to zero-based month indices:

const RU_MONTHS: Record<string, number> = {
  января: 0,  февраля: 1,  марта: 2,   апреля: 3,
  мая: 4,     июня: 5,     июля: 6,    августа: 7,
  сентября: 8, октября: 9, ноября: 10, декабря: 11,
};

Rows whose date cannot be parsed are silently dropped before insertion.
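A parser built on this month table might look like the sketch below. The function name parseRuDate and the regular expression are hypothetical, and the sketch treats the timestamp as UTC, which is an assumption: the time zone the script actually applies is not stated here.

```typescript
const RU_MONTHS: Record<string, number> = {
  января: 0, февраля: 1, марта: 2, апреля: 3,
  мая: 4, июня: 5, июля: 6, августа: 7,
  сентября: 8, октября: 9, ноября: 10, декабря: 11,
};

// Parse text such as "17 апреля 2026, 16:02".
// Returns null when the text does not match or the month is unknown.
function parseRuDate(text: string): Date | null {
  const m = /^\s*(\d{1,2})\s+(\S+)\s+(\d{4}),?\s+(\d{1,2}):(\d{2})\s*$/.exec(text);
  if (!m) return null;
  const month = RU_MONTHS[m[2].toLowerCase()];
  if (typeof month !== "number") return null;
  // Assumption: the timestamp is interpreted as UTC.
  return new Date(
    Date.UTC(Number(m[3]), month, Number(m[1]), Number(m[4]), Number(m[5])),
  );
}
```

A caller would drop any row for which parseRuDate returns null, which matches the silent-drop behavior described above.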

Deduplication via SHA1

Before inserting, a SHA1 fingerprint is computed for every row from the concatenation of:

symbol|time (ISO)|tradePrice|quantity|volume|pageIndex|rowIndex|urlKey

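where urlKey is derived from the page's query string as begin|end|search_key. Because pageIndex and rowIndex are part of the fingerprint, two records share a hash only when the same row appears at the same position on the same page of the same query. The hash field carries a unique index in MongoDB. Any insert that triggers error code 11000 (duplicate key) is counted as skipped and is not treated as a failure.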
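The fingerprint can be sketched with Node's built-in crypto module. The HashInput interface and the tradeHash function are hypothetical names. The exact string formatting of the numeric fields in the real script is not documented here, so hashes from this sketch should not be expected to match stored ones.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape holding the fields that feed the fingerprint.
interface HashInput {
  symbol: string;
  time: Date;
  tradePrice: number;
  quantity: number;
  volume: number;
  pageIndex: number;
  rowIndex: number;
  begin: string;     // query-string parameters of the downloaded page
  end: string;
  searchKey: string;
}

// Join the fields with "|" in the documented order and hash the result with SHA1.
function tradeHash(t: HashInput): string {
  const urlKey = [t.begin, t.end, t.searchKey].join("|");
  const payload = [
    t.symbol, t.time.toISOString(), t.tradePrice, t.quantity,
    t.volume, t.pageIndex, t.rowIndex, urlKey,
  ].join("|");
  return createHash("sha1").update(payload).digest("hex");
}
```

The same input always yields the same 40-character hex string, which is what lets the unique index reject a repeated import.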

Output

Progress is printed per file, followed by a final summary:

Found 3 file(s)
MongoDB connected
trades_page_1.html: 42 rows
trades_page_2.html: 42 rows
trades_page_3.html: 17 rows
Done. Inserted: 101, skipped (duplicates): 0
Tmp cleaned.

After all records are inserted, the script automatically deletes every .html file from tmp/ to keep the working directory clean.

Process Flow

Discover HTML files

Reads tmp/ and collects all files matching trades_page_*.html, sorted by page number ascending.

Connect to MongoDB

Opens a Mongoose connection using MONGO_URI and waits for the connection to be ready.

Parse and insert rows

For each file, the HTML is parsed with regex to extract table rows. Each row is hashed and inserted into the trade-results collection. Duplicate hashes are skipped with a counter increment.

Clean up tmp/

All .html files are removed from tmp/ upon successful completion.
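The parse-and-insert step of the flow above can be sketched as a loop that counts duplicate-key errors instead of failing on them. The names insertAll, InsertFn, and isDuplicateKeyError are hypothetical, and the insert callback stands in for whatever Mongoose call the script uses. Rethrowing every other error is an assumption about how real failures are handled.

```typescript
// Hypothetical insert callback; in the real script this would wrap a Mongoose insert.
type InsertFn<T> = (record: T) => Promise<unknown>;

// MongoDB reports a unique-index violation with error code 11000.
function isDuplicateKeyError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === 11000
  );
}

// Insert records one by one. Duplicates are counted as skipped;
// any other error is rethrown (assumed behavior).
async function insertAll<T>(
  records: T[],
  insert: InsertFn<T>,
): Promise<{ inserted: number; skipped: number }> {
  let inserted = 0;
  let skipped = 0;
  for (const record of records) {
    try {
      await insert(record);
      inserted++;
    } catch (err) {
      if (!isDuplicateKeyError(err)) throw err;
      skipped++;
    }
  }
  return { inserted, skipped };
}
```

The two counters correspond to the Inserted and skipped figures in the final summary line.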

Next Step

With trade records in MongoDB, run Build Candles to aggregate them into OHLCV candlestick data across all supported timeframes.
