Overview

The full pipeline runs its fetch, analysis, and enrichment scripts (22 in a full run) in strict dependency order to produce all_stocks_fundamental_analysis.json.gz, a comprehensive dataset of 2,775+ Indian stocks with 86 fields per stock covering fundamentals, technicals, events, and sentiment.

Expected runtime: ~4 minutes without OHLCV | ~34 minutes with the first-time OHLCV fetch

Quick Start

1. Navigate to the pipeline directory

cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

2. Run the pipeline

python3 run_full_pipeline.py
The script will automatically:
  • Fetch data from all sources (Dhan ScanX, NSE)
  • Build the master JSON structure
  • Enrich with technical indicators, events, and news
  • Compress output to .json.gz format
  • Clean up intermediate files

3. Verify output

Check for the final compressed file:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected size: ~2 MB (compressed from ~50-60 MB raw JSON)
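If you want a check that goes beyond ls, the short Python sketch below confirms that the file exists, decompresses cleanly, and contains a non-empty JSON array. The helper name verify_output is hypothetical and not part of the pipeline.

```python
import gzip
import json
import os


def verify_output(path="all_stocks_fundamental_analysis.json.gz"):
    """Return (compressed size in MB, record count); raise if the file is unusable."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"{path} not found - did the pipeline finish?")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    # Raises on a truncated gzip stream or malformed JSON.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array of stock records")
    return size_mb, len(data)
```

Run from the pipeline directory, `size_mb, count = verify_output()` should report roughly 2 MB and 2,775+ records.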

Configuration Options

Edit run_full_pipeline.py to customize behavior:

OHLCV Data Fetching

FETCH_OHLCV = True  # Default: True
When True:
  • First run: Downloads the full OHLCV history, up to 2 years per stock (~30 min for all stocks)
  • Subsequent runs: Incremental update only (~2-5 min)
  • Enables: ADR, RVOL, ATH, % from ATH, returns calculations
When False:
  • Skips OHLCV entirely
  • ADR, RVOL, ATH fields will be 0
  • Runtime: ~4 minutes

Optional Standalone Data

FETCH_OPTIONAL = False  # Default: False
When True: Also fetches (not included in master JSON):
  • all_indices_list.json - 194 market indices
  • etf_data_response.json - 361 ETFs

Auto-Cleanup

CLEANUP_INTERMEDIATE = True  # Default: True
When True: Removes intermediate files after successful completion, keeping only:
  • all_stocks_fundamental_analysis.json.gz
  • sector_analytics.json.gz
  • market_breadth.json.gz
  • ohlcv_data/ directory (if FETCH_OHLCV=True)
When False: Retains all intermediate JSON files for debugging
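The retention rule above can be sketched as follows. This is an illustration, not the pipeline's actual cleanup code: the helper name, the assumption that intermediates are top-level .json/.csv files, and the directory list (taken from the Phase 2 outputs company_filings/ and market_news/) are all assumptions. Source .py files and anything on the keep list are never touched.

```python
import os
import shutil

# Final outputs retained after cleanup (from the list above).
KEEP = {
    "all_stocks_fundamental_analysis.json.gz",
    "sector_analytics.json.gz",
    "market_breadth.json.gz",
    "ohlcv_data",
}
# Assumed intermediate directories, based on the Phase 2 outputs.
INTERMEDIATE_DIRS = {"company_filings", "market_news"}
INTERMEDIATE_EXTS = (".json", ".csv")


def cleanup_intermediate(directory="."):
    """Delete intermediate .json/.csv files and fetch directories; return (files, dirs) removed."""
    files_removed = dirs_removed = 0
    for name in os.listdir(directory):
        if name in KEEP:
            continue
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            if name in INTERMEDIATE_DIRS:
                shutil.rmtree(path)
                dirs_removed += 1
        elif name.endswith(INTERMEDIATE_EXTS):
            # .json.gz does not end with ".json", so compressed outputs survive.
            os.remove(path)
            files_removed += 1
    return files_removed, dirs_removed
```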

Pipeline Phases

The pipeline executes in strict order:

Phase 1: Core Data (Foundation)

1. fetch_dhan_data.py          → dhan_data_response.json + master_isin_map.json
2. fetch_fundamental_data.py   → fundamental_data.json
   NSE CSV download            → nse_equity_list.csv (listing dates)
Critical: fetch_dhan_data.py must succeed - it creates master_isin_map.json which all other scripts need.
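A minimal sketch of how a runner can enforce this ordering, assuming each step is a standalone Python script launched via subprocess. The names run_script and run_phase are hypothetical; the 30-minute per-script timeout and the abort-on-critical-failure behaviour mirror what the Troubleshooting section describes, but this is not the code in run_full_pipeline.py.

```python
import subprocess
import sys
import time

SCRIPT_TIMEOUT = 1800  # 30 minutes per script, as described under Troubleshooting


def run_script(script, critical=False, timeout=SCRIPT_TIMEOUT):
    """Run one pipeline script and return True on success.

    A failing critical script stops the whole pipeline; a failing
    non-critical script is reported and the pipeline moves on.
    """
    start = time.time()
    try:
        result = subprocess.run([sys.executable, script], timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        print(f"  {script} TIMED OUT (>{timeout // 60} min)")
        ok = False
    elapsed = time.time() - start
    print(f"  {'OK' if ok else 'FAILED'} {script} ({elapsed:.1f}s)")
    if not ok and critical:
        raise SystemExit(f"CRITICAL: {script} failed. Cannot continue.")
    return ok


def run_phase(scripts, critical=()):
    """Run scripts strictly in list order; return how many succeeded."""
    return sum(run_script(s, critical=s in critical) for s in scripts)
```

Hypothetical usage from inside the pipeline directory: `run_phase(["fetch_dhan_data.py", "fetch_fundamental_data.py"], critical={"fetch_dhan_data.py"})`.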

Phase 2: Data Enrichment (Fetching)

3.  fetch_company_filings.py         → company_filings/*.json
4.  fetch_new_announcements.py       → all_company_announcements.json
5.  fetch_advanced_indicators.py     → advanced_indicator_data.json
6.  fetch_market_news.py             → market_news/*.json
7.  fetch_corporate_actions.py       → upcoming/history_corporate_actions.json
8.  fetch_surveillance_lists.py      → nse_asm_list.json, nse_gsm_list.json
9.  fetch_circuit_stocks.py          → upper/lower_circuit_stocks.json
10. fetch_bulk_block_deals.py        → bulk_block_deals.json
11. fetch_incremental_price_bands.py → incremental_price_bands.json
12. fetch_complete_price_bands.py    → complete_price_bands.json
13. fetch_all_indices.py             → all_indices_list.json

Phase 2.5: OHLCV History (Smart Incremental)

14. fetch_all_ohlcv.py         → ohlcv_data/*.csv
15. fetch_indices_ohlcv.py     → (indices OHLCV)
Smart Incremental Logic:
  • Checks existing CSV files in ohlcv_data/
  • Only fetches missing dates since last update
  • First run: Fetches up to 2 years of history per stock
  • Daily updates: Only fetches 1-2 days of new data
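The incremental check can be illustrated with a hypothetical helper that decides where to resume for one symbol. It assumes each CSV in ohlcv_data/ has a Date column in YYYY-MM-DD format with rows sorted oldest first, and approximates "2 years" as 730 days; the real fetch_all_ohlcv.py may store and compare dates differently.

```python
import csv
import os
from datetime import date, datetime, timedelta

HISTORY_DAYS = 730  # assumed approximation of "up to 2 years" on a first run


def fetch_start_date(csv_path, today=None):
    """Return the first date still needing a fetch for one symbol, or None if up to date."""
    today = today or date.today()
    if not os.path.exists(csv_path):
        return today - timedelta(days=HISTORY_DAYS)  # first run: full window
    last = None
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            last = row["Date"]  # assumes ISO dates, oldest row first
    if last is None:  # header only, no data rows
        return today - timedelta(days=HISTORY_DAYS)
    next_day = datetime.strptime(last, "%Y-%m-%d").date() + timedelta(days=1)
    return next_day if next_day <= today else None
```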

Phase 3: Base Analysis

16. bulk_market_analyzer.py    → all_stocks_fundamental_analysis.json (BASE)
Creates the master JSON structure with fundamental data for all stocks.

Phase 4: Enrichment (Order Matters!)

17. advanced_metrics_processor.py        → Adds ADR, RVOL, ATH, Turnover
18. process_earnings_performance.py      → Adds post-earnings returns
19. enrich_fno_data.py                   → Adds F&O flag, Lot Size, Next Expiry
20. process_market_breadth.py            → Generates sector analytics
21. process_historical_market_breadth.py → Generates breadth charts
22. add_corporate_events.py              → Adds Events, Announcements, News (LAST!)
Critical: add_corporate_events.py MUST run last as it performs final JSON injection.

Phase 5: Compression

Compress all output files:
- all_stocks_fundamental_analysis.json → .json.gz
- sector_analytics.json → .json.gz
- market_breadth.csv → .json.gz

Compression ratio: ~90-95% size reduction
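A sketch of the compression step for a single .json file using only the standard library. compress_json is a hypothetical name, the compresslevel=9 choice is an assumption, and the sketch covers only the .json to .json.gz case, not the handling of market_breadth.csv.

```python
import gzip
import os
import shutil


def compress_json(src, remove_src=False):
    """Gzip src to src + '.gz' and return the percentage size reduction."""
    dst = src + ".gz"
    with open(src, "rb") as fin, gzip.open(dst, "wb", compresslevel=9) as fout:
        # Streams in chunks, so the raw JSON is never held fully in memory.
        shutil.copyfileobj(fin, fout)
    before, after = os.path.getsize(src), os.path.getsize(dst)
    reduction = 100 * (1 - after / before) if before else 0.0
    print(f"Compressed: {before / 1e6:.1f} MB -> {after / 1e6:.1f} MB ({reduction:.0f}% reduction)")
    if remove_src:
        os.remove(src)
    return reduction
```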

Output Files

Primary Output

Location: ~/workspace/source/DO NOT DELETE EDL PIPELINE/all_stocks_fundamental_analysis.json.gz
Format: Gzip-compressed JSON array
Structure:
[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Market Cap(Cr.)": 1850000,
    "Stock Price(₹)": 2734.50,
    "P/E": 28.5,
    "ROE(%)": 15.2,
    "Latest Quarter": "Dec 2025",
    "Net Profit Latest": 18200,
    "QoQ % Net Profit Latest": 5.3,
    "YoY % Net Profit Latest": 12.7,
    "RSI (14)": 62.5,
    "Event Markers": "💸: Dividend (15-Mar)",
    "Recent Announcements": [...],
    "News Feed": [...]
    // ... 86 total fields
  },
  // ... 2,775+ stocks
]
Decompression:
import gzip
import json

with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rb') as f:
    data = json.load(f)

print(f"Total stocks: {len(data)}")
print(f"Fields per stock: {len(data[0])}")
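Once loaded, the data is a plain list of dicts, so ordinary Python filtering works. The screen below is purely illustrative: the function names and thresholds are hypothetical, and it assumes ROE(%) and P/E are numeric as in the sample record, skipping records where either value is missing or non-numeric.

```python
import gzip
import json


def load(path="all_stocks_fundamental_analysis.json.gz"):
    """Decompress and parse the master JSON array."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def screen(stocks, min_roe=15.0, max_pe=30.0):
    """Return symbols with ROE(%) >= min_roe and 0 < P/E <= max_pe, highest ROE first."""
    def num(value):
        # Fields may be missing or non-numeric if an enrichment step failed.
        try:
            return float(value)
        except (TypeError, ValueError):
            return None

    hits = []
    for s in stocks:
        roe, pe = num(s.get("ROE(%)")), num(s.get("P/E"))
        if roe is not None and pe is not None and roe >= min_roe and 0 < pe <= max_pe:
            hits.append((roe, s["Symbol"]))
    return [sym for _, sym in sorted(hits, reverse=True)]
```

For example, `print(screen(load())[:10])` lists the ten highest-ROE symbols that pass both filters.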

Secondary Outputs

File                        Size      Description
sector_analytics.json.gz    ~500 KB   Sector-wise aggregated metrics
market_breadth.json.gz      ~8 MB     Historical market breadth data
ohlcv_data/*.csv            ~200 MB   Individual stock OHLCV history
all_indices_list.json       ~85 KB    Market indices data (if FETCH_OPTIONAL=True)

Runtime Breakdown

First-Time Execution (with OHLCV)

Phase 1: Core Data                    ~30s
Phase 2: Data Enrichment              ~90s
Phase 2.5: OHLCV History (first)      ~30 min
Phase 3: Base Analysis                ~20s
Phase 4: Enrichment                   ~45s
Phase 5: Compression                  ~15s
─────────────────────────────────────────
Total:                                ~34 min

Daily Update (with incremental OHLCV)

Phase 1: Core Data                    ~30s
Phase 2: Data Enrichment              ~90s
Phase 2.5: OHLCV Incremental          ~2-5 min
Phase 3: Base Analysis                ~20s
Phase 4: Enrichment                   ~45s
Phase 5: Compression                  ~15s
─────────────────────────────────────────
Total:                                ~6-9 min

Without OHLCV

Phase 1: Core Data                    ~30s
Phase 2: Data Enrichment              ~90s
Phase 3: Base Analysis                ~20s
Phase 4: Enrichment                   ~30s
Phase 5: Compression                  ~15s
─────────────────────────────────────────
Total:                                ~4 min

Console Output Example

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  EDL PIPELINE - FULL DATA REFRESH
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

๐Ÿ“ฆ PHASE 1: Core Data (Foundation)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  โ–ถ Running fetch_dhan_data.py...
  โœ… fetch_dhan_data.py (12.3s)
  โ–ถ Running fetch_fundamental_data.py...
  โœ… fetch_fundamental_data.py (18.7s)
  โ–ถ Downloading NSE Listing Dates...
  โœ… NSE Listing Dates downloaded.

๐Ÿ“ก PHASE 2: Data Enrichment (Fetching)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  โ–ถ Running fetch_company_filings.py...
  โœ… fetch_company_filings.py (45.2s)
  ...

๐Ÿ“Š PHASE 2.5: OHLCV History (Smart Incremental)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  โ–ถ Running fetch_all_ohlcv.py...
  โœ… fetch_all_ohlcv.py (142.5s)

๐Ÿ”ฌ PHASE 3: Base Analysis (Building Master JSON)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  โ–ถ Running bulk_market_analyzer.py...
  โœ… bulk_market_analyzer.py (19.8s)

โœจ PHASE 4: Enrichment (Injecting into Master JSON)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  โ–ถ Running advanced_metrics_processor.py...
  โœ… advanced_metrics_processor.py (8.2s)
  ...

๐Ÿ“ฆ PHASE 5: Compression (.json โ†’ .json.gz)
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  ๐Ÿ“ฆ Compressed: 58.3 MB โ†’ 2.1 MB (96% reduction)

๐Ÿงน CLEANUP: Removing intermediate files...
โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
  ๐Ÿ—‘๏ธ  Cleaned: 13 files + 2 dirs (56.2 MB freed)

โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  PIPELINE COMPLETE
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•
  Total Time:  245.7s (4.1 min)
  Successful:  22/22
  Failed:      0/22

  ๐Ÿ“„ Output: all_stocks_fundamental_analysis.json.gz (2.1 MB)
  ๐Ÿ“ฆ Compression: 58.3 MB โ†’ 2.1 MB (96% smaller)
  ๐Ÿงน Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•โ•

Troubleshooting

Pipeline Fails at fetch_dhan_data.py

Error: CRITICAL: fetch_dhan_data.py failed. Cannot continue.
Cause: This script fetches the master stock list and creates master_isin_map.json, which all other scripts need.
Solutions:
  • Check internet connectivity
  • Verify Dhan API endpoint is accessible
  • Check if rate-limited (wait 5 minutes and retry)
  • Inspect error message in console output
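If the failure is a temporary rate limit, the wait-and-retry advice can be automated with a small wrapper like the one below. run_with_retry is hypothetical and not part of the pipeline; the defaults (3 attempts, 300 s between them, 1800 s timeout per attempt) are assumptions chosen to match the advice above.

```python
import subprocess
import sys
import time


def run_with_retry(script, attempts=3, wait_seconds=300, timeout=1800):
    """Run script up to `attempts` times, sleeping wait_seconds between failures."""
    for attempt in range(1, attempts + 1):
        try:
            if subprocess.run([sys.executable, script], timeout=timeout).returncode == 0:
                return True
        except subprocess.TimeoutExpired:
            pass  # treat a timeout like any other failed attempt
        if attempt < attempts:
            print(f"{script} failed (attempt {attempt}/{attempts}); retrying in {wait_seconds}s")
            time.sleep(wait_seconds)
    return False
```

For example, `run_with_retry("fetch_dhan_data.py")` retries the critical fetch twice, five minutes apart, before giving up.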

OHLCV Fetch Takes Too Long

Symptom: Phase 2.5 exceeds 30 minutes
Solutions:
  • First run is expected to take ~30 min for full history
  • Reduce thread count: Edit fetch_all_ohlcv.py, set MAX_THREADS = 10 (line 14)
  • For faster daily updates, keep existing ohlcv_data/ directory - it will only fetch new dates
  • If not needed immediately, set FETCH_OHLCV = False and run later

Script Times Out

Error: โฐ {script_name} TIMED OUT (>30 min) Cause: Individual script timeout is set to 30 minutes (1800 seconds) Solutions:
  • Check network stability
  • Increase timeout in run_full_pipeline.py line 117: timeout=3600 (1 hour)
  • Run the individual script manually to see detailed error

Compression Fails

Error: Files to compress not found
Cause: Phase 3 or Phase 4 failed to produce the expected output files
Solutions:
  • Check the console for which Phase 3 or Phase 4 script failed
  • Run pipeline with CLEANUP_INTERMEDIATE = False to inspect intermediate files
  • Verify all_stocks_fundamental_analysis.json exists before compression
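Before re-running compression, a quick precondition check shows which inputs are missing or empty. The file names come from the Phase 5 compression list; the helper missing_outputs is a hypothetical sketch, not part of the pipeline.

```python
import os

# Inputs to Phase 5, taken from the compression list.
PRE_COMPRESSION_FILES = (
    "all_stocks_fundamental_analysis.json",
    "sector_analytics.json",
    "market_breadth.csv",
)


def missing_outputs(directory=".", required=PRE_COMPRESSION_FILES):
    """Return the required files that are absent or zero bytes long."""
    missing = []
    for name in required:
        path = os.path.join(directory, name)
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            missing.append(name)
    return missing
```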

Memory Issues

Symptom: Process killed or out-of-memory errors
Solutions:
  • Free up system RAM (close other applications)
  • Reduce parallelization: Lower thread counts in fetcher scripts
  • Skip optional downloads: keep FETCH_OPTIONAL = False
  • Pipeline requires ~2-4 GB RAM for full execution

Partial Data in Output

Symptom: Some stocks have missing fields or empty values
Cause: Non-critical enrichment scripts failed, but the pipeline continued
Solutions:
  • Check console output for failed scripts (marked with ❌)
  • The pipeline continues even if an enrichment script fails (run_full_pipeline.py line 126: return True)
  • Re-run pipeline to retry failed fetches
  • Some data sources may be temporarily unavailable (ASM/GSM lists, news feed)
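To see which fields are affected, you can count missing or empty values per field in the decompressed data. This sketch (empty_field_report is a hypothetical name) treats None, "", [] and {} as empty. Zero-valued fields, such as ADR, RVOL, and ATH when FETCH_OHLCV = False, are not flagged, because 0 is a legitimate value.

```python
from collections import Counter

EMPTY = (None, "", [], {})


def empty_field_report(stocks):
    """Count, per field, how many records have it missing or empty."""
    all_fields = set().union(*(s.keys() for s in stocks))
    counts = Counter()
    for s in stocks:
        for field in all_fields:
            # A missing key returns None and is counted as empty.
            if s.get(field) in EMPTY:
                counts[field] += 1
    return counts
```

After loading `data` as in the Decompression example, `empty_field_report(data).most_common(10)` lists the ten fields with the most gaps.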

Manual Script Execution

If you need to run individual scripts for debugging:
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/

# Core data (must run first)
python3 fetch_dhan_data.py
python3 fetch_fundamental_data.py

# Any enrichment script (requires master_isin_map.json)
python3 fetch_company_filings.py
python3 fetch_market_news.py

# OHLCV (requires dhan_data_response.json)
python3 fetch_all_ohlcv.py

# Base analysis (requires all fetched data)
python3 bulk_market_analyzer.py

# Enrichment (requires all_stocks_fundamental_analysis.json to exist)
python3 advanced_metrics_processor.py
python3 add_corporate_events.py  # Must be last!

Best Practices

Daily Updates

  • Run once per day after market close (after 3:30 PM IST)
  • Keep FETCH_OHLCV = True for incremental updates
  • OHLCV incremental fetch only takes 2-5 minutes
  • Set up a cron job for automated weekday execution:
# Run at 16:00 Mon-Fri in the server's local time zone (set it to IST, or adjust the hour)
0 16 * * 1-5 cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ && python3 run_full_pipeline.py >> pipeline.log 2>&1
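Because cron fires in the server's local time zone, a job on a machine set to UTC or another zone will not start at 4 PM IST. One defensive option is to check the Indian market clock at the top of a wrapper script. The sketch below assumes Python 3.9+ (for zoneinfo) with time zone data available; market_closed is a hypothetical helper that treats weekends as closed but ignores exchange holidays.

```python
from datetime import datetime, time
from zoneinfo import ZoneInfo

IST = ZoneInfo("Asia/Kolkata")
MARKET_CLOSE = time(15, 30)  # 3:30 PM IST


def market_closed(now=None):
    """Return True on weekends, or on weekdays at or after 15:30 IST.

    Exchange holidays are not considered.
    """
    now = (now or datetime.now(IST)).astimezone(IST)
    return now.weekday() >= 5 or now.time() >= MARKET_CLOSE
```

In a wrapper, `if not market_closed(): raise SystemExit("Market still open; skipping run")` before launching run_full_pipeline.py keeps an early trigger from fetching intraday data.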

First-Time Setup

  • Allow 30-40 minutes for first run with OHLCV
  • Verify output file exists and is properly formatted
  • Test decompression with a JSON parser
  • Keep intermediate files for first run (CLEANUP_INTERMEDIATE = False)

Production Environment

  • Monitor disk space (OHLCV data grows to ~200 MB)
  • Archive old .json.gz files with timestamps
  • Set up error alerting for pipeline failures
  • Keep logs of each run for debugging
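For the archiving recommendation, a minimal sketch: copy the latest output into an archive directory with a timestamp in the file name, then prune older copies. The function name, the archive/ location, and the 30-copy retention default are assumptions, and str.removesuffix needs Python 3.9+.

```python
import os
import shutil
from datetime import datetime


def archive_output(src="all_stocks_fundamental_analysis.json.gz", archive_dir="archive", keep=30, now=None):
    """Copy src into archive_dir with a timestamp suffix; keep only the newest `keep` (>= 1) copies."""
    os.makedirs(archive_dir, exist_ok=True)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = os.path.basename(src).removesuffix(".json.gz")
    dst = os.path.join(archive_dir, f"{base}_{stamp}.json.gz")
    shutil.copy2(src, dst)
    # Timestamps sort lexicographically, so the oldest archives come first.
    archives = sorted(n for n in os.listdir(archive_dir) if n.startswith(base + "_"))
    for old in archives[:-keep]:
        os.remove(os.path.join(archive_dir, old))
    return dst
```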
