Overview
Incremental updates allow you to refresh market data daily without re-fetching the entire historical dataset. The pipeline intelligently detects existing data and only fetches what’s new.
Runtime: ~2-5 minutes (vs ~30 minutes for first-time full fetch)
How Incremental Updates Work
Smart OHLCV Incremental Logic
The fetch_all_ohlcv.py script implements intelligent incremental fetching:
Check for existing data
Scans the ohlcv_data/ directory for an existing CSV file per stock:
ohlcv_data/
├── RELIANCE.csv
├── TCS.csv
├── INFY.csv
└── ... (2,775+ files)
Identify last recorded date
Reads the last row of each CSV to find the most recent date:
# Example: RELIANCE.csv last row
Date,Open,High,Low,Close,Volume
...
2026-03-02,2734.50,2756.00,2720.00,2745.30,8523400
Last date: 2026-03-02
Fetch only missing dates
Requests data from (last_date + 1 day) to today:
{
  "START": 1772496000, // 2026-03-03 00:00 UTC
  "END": 1772582400 // 2026-03-04 00:00 UTC (today)
}
Only 1-2 days of new data are fetched instead of 500+ days.
Append new rows
Appends the new data to the existing CSV:
2026-03-03,2745.30,2768.00,2738.50,2761.20,7234500
2026-03-04,2761.20,2780.50,2755.00,2772.80,8901200
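The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the contents of fetch_all_ohlcv.py: the helper names (`last_recorded_date`, `fetch_window`, `append_rows`) are hypothetical, and it assumes each CSV has a `Date` column in `YYYY-MM-DD` format, ends with a newline, and that the API accepts UTC-midnight epoch seconds.

```python
import csv
from datetime import date, datetime, timedelta, timezone


def last_recorded_date(csv_path):
    """Return the date in the last data row of an OHLCV CSV, or None if empty."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    if not rows:
        return None
    return date.fromisoformat(rows[-1]["Date"])


def fetch_window(last_date, today):
    """Build the START/END window covering last_date + 1 day through today.

    Returns None when the file is already up to date.
    Assumption: timestamps are UTC midnight; the real API may expect another zone.
    """
    start = last_date + timedelta(days=1)
    if start > today:
        return None

    def to_epoch(d):
        return int(datetime(d.year, d.month, d.day, tzinfo=timezone.utc).timestamp())

    return {"START": to_epoch(start), "END": to_epoch(today)}


def append_rows(csv_path, rows):
    """Append new rows (lists of column values) to an existing CSV."""
    with open(csv_path, "a", newline="") as f:
        csv.writer(f).writerows(rows)
```

Returning `None` from `fetch_window` gives the caller a cheap way to skip stocks whose file already ends on today's date, which is where most of the time saving in an incremental run would come from.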
Other Data Sources
Most other fetchers re-fetch fresh data daily (lightweight):
| Script | Update Strategy | Runtime |
|---|---|---|
| fetch_fundamental_data.py | Full refresh (quarterly data changes slowly) | ~18s |
| fetch_company_filings.py | Fetches last 100 filings (new ones appear daily) | ~45s |
| fetch_market_news.py | Fetches last 50 news items per stock | ~30s |
| fetch_corporate_actions.py | Fetches upcoming + 2yr history | ~8s |
| fetch_bulk_block_deals.py | Fetches last 30 days | ~5s |
| fetch_circuit_stocks.py | Live snapshot (today’s circuits) | ~3s |
| fetch_surveillance_lists.py | Current ASM/GSM lists | ~4s |
| fetch_incremental_price_bands.py | Today’s band changes CSV | ~2s |
Total Phase 1-2 runtime: ~2 minutes (same as full pipeline)
Running Daily Updates
Ensure OHLCV data exists
Verify that the ohlcv_data/ directory from the previous run is present:
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/
ls ohlcv_data/*.csv | wc -l
Should show ~2,775+ CSV files.
Run the full pipeline
python3 run_full_pipeline.py
The pipeline automatically detects existing OHLCV data and runs incrementally.
Monitor incremental fetch
Watch Phase 2.5 output:
📊 PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────
▶ Running fetch_all_ohlcv.py...
Fetching live snapshots for stocks (Today's data)...
Processing 2775 stocks with 15 threads...
[████████████████████████████] 2775/2775 (100%)
Updated: 2775 | Skipped: 0 | Failed: 0
✅ fetch_all_ohlcv.py (142.5s)
First run: ~30 min (fetching 500+ days per stock)
Incremental run: ~2-5 min (fetching 1-2 days per stock)
Verify updated output
Check the timestamp of the output file:
ls -lh all_stocks_fundamental_analysis.json.gz
-rw-r--r-- 1 user user 2.1M Mar 4 16:15 all_stocks_fundamental_analysis.json.gz
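For unattended runs, the same timestamp check can be scripted. The sketch below is an assumption about how one might automate it, not part of the pipeline; `modified_on` is a hypothetical helper that compares the file's modification time (in the machine's local time zone) with a given date.

```python
from datetime import date, datetime
from pathlib import Path


def modified_on(path, day):
    """Return True if the file at `path` was last modified on the given local date."""
    mtime = datetime.fromtimestamp(Path(path).stat().st_mtime)
    return mtime.date() == day


if __name__ == "__main__":
    output = Path("all_stocks_fundamental_analysis.json.gz")
    if output.exists() and modified_on(output, date.today()):
        print("Output was refreshed today")
    else:
        print("Output is missing or stale")
```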
Adjust Thread Count
For faster incremental updates, increase parallelization:
Edit fetch_all_ohlcv.py line 14:
MAX_THREADS = 20 # Default: 15
Trade-offs:
- Higher threads = faster execution
- Too many threads = rate limiting or connection errors
- Recommended range: 10-25 threads
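To make the trade-off concrete, here is a minimal sketch of how a thread cap like `MAX_THREADS` typically bounds concurrent requests. It assumes the script uses a thread pool, which the "Processing 2775 stocks with 15 threads" output suggests but does not confirm; `update_symbol` and `run_updates` are hypothetical names, and the worker here only simulates a request.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

MAX_THREADS = 15  # the default value quoted in this section


def update_symbol(symbol):
    """Hypothetical per-stock worker; a real one would call the OHLCV API."""
    if not symbol:
        raise ValueError("empty symbol")
    return symbol


def run_updates(symbols, worker=update_symbol, max_threads=MAX_THREADS):
    """Run `worker` for every symbol on a bounded pool and tally the outcomes."""
    updated, failed = [], []
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        futures = {pool.submit(worker, s): s for s in symbols}
        for future in as_completed(futures):
            symbol = futures[future]
            try:
                future.result()
                updated.append(symbol)
            except Exception:
                # A failed request (for example, rate limiting) is counted, not fatal.
                failed.append(symbol)
    return updated, failed
```

At most `max_threads` requests are in flight at once, so raising the value shortens the run only until the remote API starts rejecting or throttling connections.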
Skip OHLCV for Quick Refresh
If you only need fundamental/event updates without price data:
Edit run_full_pipeline.py line 64:
FETCH_OHLCV = False
Result: Runtime drops to ~4 minutes, but these fields will be zero:
- ADR (Average Daily Range)
- RVOL (Relative Volume)
- ATH (All-Time High) and % from ATH
- All returns calculations (1D, 1W, 1M, 3M, 6M, 1Y)
Run with FETCH_OHLCV = True later to backfill.
Selective Script Execution
If you only need specific data updated, run individual scripts:
Update Only Fundamental Data
python3 fetch_dhan_data.py
python3 fetch_fundamental_data.py
python3 bulk_market_analyzer.py
Runtime: ~1 minute
Update Only Technical Indicators
python3 fetch_dhan_data.py # For live prices
python3 fetch_advanced_indicators.py
python3 fetch_all_ohlcv.py # Incremental
python3 advanced_metrics_processor.py
Runtime: ~3-4 minutes
Update Only Events & News
python3 fetch_company_filings.py
python3 fetch_market_news.py
python3 fetch_corporate_actions.py
python3 fetch_bulk_block_deals.py
python3 add_corporate_events.py
Runtime: ~2 minutes
Note: After selective execution, run compression manually:
python3 -c "import gzip
with open('all_stocks_fundamental_analysis.json', 'rb') as f_in:
    with gzip.open('all_stocks_fundamental_analysis.json.gz', 'wb', compresslevel=9) as f_out:
        f_out.write(f_in.read())"
Automated Daily Updates
Using Cron (Linux/Mac)
Schedule automatic execution after market close:
Add pipeline job
Run daily at 4:00 PM IST (after market close). Cron uses the system's local time zone, so this entry assumes the machine is set to IST:
# Mon-Fri at 4:00 PM
0 16 * * 1-5 cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ && /usr/bin/python3 run_full_pipeline.py >> ~/pipeline.log 2>&1
Monitor execution
Check the log file after 4 PM:
tail -n 50 ~/pipeline.log
Using systemd Timer (Linux)
For more control and better logging:
Create service file
sudo nano /etc/systemd/system/edl-pipeline.service
[Unit]
Description=ChartsMaze EDL Pipeline Daily Update
After=network.target
[Service]
Type=oneshot
User=YOUR_USERNAME
WorkingDirectory=/home/YOUR_USERNAME/workspace/source/DO NOT DELETE EDL PIPELINE
ExecStart=/usr/bin/python3 run_full_pipeline.py
StandardOutput=append:/var/log/edl-pipeline.log
StandardError=append:/var/log/edl-pipeline.log
[Install]
WantedBy=multi-user.target
Create timer file
sudo nano /etc/systemd/system/edl-pipeline.timer
[Unit]
Description=Run EDL Pipeline Daily at 4 PM
[Timer]
OnCalendar=Mon-Fri 16:00:00
Persistent=true
[Install]
WantedBy=timers.target
Enable and start timer
sudo systemctl daemon-reload
sudo systemctl enable edl-pipeline.timer
sudo systemctl start edl-pipeline.timer
Check timer status
sudo systemctl status edl-pipeline.timer
systemctl list-timers edl-pipeline.timer
Monitoring & Alerts
Log Analysis
The pipeline outputs structured logs. Parse for key metrics:
# Extract runtime
grep "Total Time:" ~/pipeline.log | tail -1
# Check for failures
grep "Failed:" ~/pipeline.log | tail -1
# List failed scripts
grep "❌" ~/pipeline.log | tail -20
# Verify output file created
grep "Output:" ~/pipeline.log | tail -1
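If the metrics need to feed a script rather than a terminal, the OHLCV summary line can be parsed directly. The sketch below relies only on the `Updated: N | Skipped: N | Failed: N` format shown in the Phase 2.5 output; `last_ohlcv_summary` is a hypothetical helper, and the formats of the other log lines are not assumed here.

```python
import re

SUMMARY_RE = re.compile(
    r"Updated:\s*(\d+)\s*\|\s*Skipped:\s*(\d+)\s*\|\s*Failed:\s*(\d+)"
)


def last_ohlcv_summary(log_text):
    """Return counts from the last 'Updated | Skipped | Failed' line, or None."""
    matches = SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    updated, skipped, failed = (int(n) for n in matches[-1])
    return {"updated": updated, "skipped": skipped, "failed": failed}
```

A caller could read the log with `Path("~/pipeline.log").expanduser().read_text()` and raise an alert whenever the `failed` count is non-zero.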
Error Notifications
Send email if pipeline fails:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ || exit 1
# Email the captured output only if the pipeline exits with a non-zero status
if ! python3 run_full_pipeline.py > /tmp/pipeline_output.log 2>&1; then
  mail -s "EDL Pipeline Failed" YOUR_EMAIL@example.com < /tmp/pipeline_output.log
fi
Slack Webhook Integration
Notify Slack on completion:
# Add to end of run_full_pipeline.py
# (`failed` and `total_time` are assumed to be defined earlier in that script)
import requests
webhook_url = "https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
if failed == 0:
    message = f"✅ EDL Pipeline completed in {total_time/60:.1f} min"
else:
    message = f"⚠️ EDL Pipeline completed with {failed} failures in {total_time/60:.1f} min"
# A timeout keeps a slow webhook from hanging the pipeline
requests.post(webhook_url, json={"text": message}, timeout=10)
Data Validation
Verify Output Integrity
After each update, validate the output:
import gzip
import json
# Decompress and load
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    data = json.load(f)
# Basic checks
assert len(data) > 2700, f"Too few stocks: {len(data)}"
assert all('Symbol' in stock for stock in data), "Missing symbols"
assert all('Stock Price(₹)' in stock for stock in data), "Missing prices"
print(f"✅ Validation passed: {len(data)} stocks")
# Check data freshness
sample = data[0]
if 'Latest Quarter' in sample:
    print(f"Latest quarter data: {sample['Latest Quarter']}")
Compare with Previous Run
import gzip
import json
# Load current and previous
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    current = json.load(f)
with gzip.open('all_stocks_fundamental_analysis.json.gz.backup', 'rb') as f:
    previous = json.load(f)
# Index previous prices by symbol so the comparison does not depend on list order
prev_prices = {s['Symbol']: s.get('Stock Price(₹)') or 0 for s in previous}
# Find stocks with significant changes
for curr in current:
    curr_price = curr.get('Stock Price(₹)') or 0
    prev_price = prev_prices.get(curr['Symbol'], 0)
    if prev_price > 0:
        change_pct = ((curr_price - prev_price) / prev_price) * 100
        if abs(change_pct) > 5:  # 5% threshold
            print(f"{curr['Symbol']}: {prev_price:.2f} → {curr_price:.2f} ({change_pct:+.2f}%)")
Backup Strategy
Archive Previous Versions
Before each update, back up the previous output:
#!/bin/bash
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ || exit 1
# Create dated backup (make sure the backups/ directory exists first)
mkdir -p backups
if [ -f all_stocks_fundamental_analysis.json.gz ]; then
  cp all_stocks_fundamental_analysis.json.gz \
    "backups/all_stocks_$(date +%Y%m%d_%H%M%S).json.gz"
fi
# Keep only last 7 days
find backups/ -name "all_stocks_*.json.gz" -mtime +7 -delete
# Run pipeline
python3 run_full_pipeline.py
OHLCV Data Backup
The ohlcv_data/ directory grows over time (~200 MB). Back it up weekly:
# Weekly backup (run on Sundays)
tar -czf ohlcv_backup_$(date +%Y%m%d).tar.gz ohlcv_data/
mv ohlcv_backup_*.tar.gz ~/backups/
# Keep only last 4 weeks
find ~/backups/ -name "ohlcv_backup_*.tar.gz" -mtime +28 -delete
Troubleshooting Incremental Updates
OHLCV Not Updating Incrementally
Symptom: Phase 2.5 still takes 30 minutes instead of 2-5 minutes
Cause: CSV files may be corrupted or have incorrect last dates
Solutions:
- Check a sample CSV for integrity:
  tail -5 ohlcv_data/RELIANCE.csv
  The last row should carry the most recent trading day's date (today's or yesterday's on a normal weekday).
- Verify last date parsing:
  import csv
  with open('ohlcv_data/RELIANCE.csv', 'r') as f:
      rows = list(csv.DictReader(f))
  print(f"Last date: {rows[-1]['Date']}")
- If corrupted, delete the specific CSV to re-fetch it:
  rm ohlcv_data/RELIANCE.csv
  python3 fetch_all_ohlcv.py # Will re-fetch full history for RELIANCE
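Checking one file by hand does not scale to ~2,775 CSVs, so the same checks can be run across the whole directory. This is a sketch under stated assumptions: `check_csv` and `scan_directory` are hypothetical helpers, every file is expected to have a `Date` column in `YYYY-MM-DD` format, and the `cutoff` date is something you choose (for example, the previous trading day).

```python
import csv
from datetime import date
from pathlib import Path


def check_csv(csv_path, cutoff):
    """Classify one OHLCV CSV as 'ok', 'stale', or 'corrupt'.

    'stale' means the last Date is older than `cutoff`; 'corrupt' means the
    file is empty, lacks a Date column, or its last Date does not parse.
    """
    try:
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        last = date.fromisoformat(rows[-1]["Date"])
    except (IndexError, KeyError, ValueError, TypeError):
        return "corrupt"
    return "ok" if last >= cutoff else "stale"


def scan_directory(directory, cutoff):
    """Map each problematic CSV file name to its status."""
    report = {}
    for path in sorted(Path(directory).glob("*.csv")):
        status = check_csv(path, cutoff)
        if status != "ok":
            report[path.name] = status
    return report
```

Files reported as `corrupt` are candidates for the delete-and-re-fetch step above, while a large number of `stale` files points to a failed or skipped run rather than damaged data.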
Missing Recent Data
Symptom: Latest quarter or news not showing in output
Cause: Source API may not have published data yet
Solutions:
- Wait 1-2 hours after market close for data availability
- Check source manually (Dhan ScanX website)
- Re-run pipeline after delay
Stale Event Markers
Symptom: Old events still showing (e.g., “Results Recently Out” from 10 days ago)
Cause: Event marker logic uses fixed time windows (7 days for results, 15 days for insider trading)
Solution: This is expected behavior. Events auto-expire after their window:
- Results: 7 days
- Insider Trading: 15 days
- Block Deals: 7 days
If marker persists beyond window, check add_corporate_events.py logic.
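The expiry rule can be illustrated with a small sketch. It uses the window lengths listed above, but the function name `marker_is_active` and the inclusive boundary (an event exactly 7 days old still counts) are assumptions; the actual comparison in add_corporate_events.py may differ.

```python
from datetime import date

# Window lengths in days, as listed in this section.
EVENT_WINDOWS = {"Results": 7, "Insider Trading": 15, "Block Deals": 7}


def marker_is_active(event_type, event_date, today):
    """Return True while an event is still inside its display window.

    Assumption: the boundary day is inclusive and future-dated events are ignored.
    """
    age_days = (today - event_date).days
    return 0 <= age_days <= EVENT_WINDOWS[event_type]
```

Under this rule a "Results" event from 10 days ago is already expired, so a marker that still shows at that age would indicate a problem in the script rather than expected behavior.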
Incremental Fetch Skipping Dates
Symptom: Some dates missing in OHLCV (e.g., 2026-03-03 present, 2026-03-04 missing)
Cause: Market holiday or trading halt
Solution: This is normal. OHLCV only contains trading days. Non-trading days (weekends, holidays) are automatically skipped.
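To separate expected gaps from real ones, it helps to list only the weekdays that are missing. The helper below is a hypothetical sketch: it ignores weekends and returns every absent Monday-to-Friday date, which you would then compare against the exchange's holiday calendar (not included here).

```python
from datetime import date, timedelta


def missing_weekdays(dates):
    """Return weekdays (Mon-Fri) absent between the first and last date given.

    Weekends are ignored; any weekday returned is a candidate market holiday
    or trading halt and still needs checking against the exchange calendar.
    """
    present = set(dates)
    if not present:
        return []
    day, end = min(present), max(present)
    gaps = []
    while day <= end:
        if day.weekday() < 5 and day not in present:
            gaps.append(day)
        day += timedelta(days=1)
    return gaps
```

A gap that appears in every stock's file on the same date is almost certainly a holiday; a gap in a single file is more likely a halt for that stock or a failed fetch.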
Best Practices for Incremental Updates
- Run once daily after market close (after 3:30 PM IST)
- Keep FETCH_OHLCV=True for continuous incremental updates
- Monitor first few incremental runs to ensure 2-5 min runtime
- Back up before the first incremental run so you can test rollback
- Validate output after each run with automated checks
- Archive old outputs with date stamps for historical analysis
- Set up failure alerts to catch issues immediately
- Test manual execution before automating with cron/systemd