Overview
The full pipeline executes 16 scripts in strict dependency order to produce all_stocks_fundamental_analysis.json.gz - a comprehensive dataset of 2,775+ Indian stocks with 86 fields per stock covering fundamentals, technicals, events, and sentiment.
Expected Runtime: ~4 minutes (without OHLCV) | ~34 minutes (with OHLCV first-time fetch)
Quick Start
Navigate to pipeline directory
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/
Run the pipeline
python3 run_full_pipeline.py
The script will automatically:
- Fetch data from all sources (Dhan ScanX, NSE)
- Build the master JSON structure
- Enrich with technical indicators, events, and news
- Compress output to .json.gz format
- Clean up intermediate files
Verify output
Check for the final compressed file:
ls -lh all_stocks_fundamental_analysis.json.gz
Expected size: ~2 MB (compressed from ~50-60 MB raw JSON)
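Beyond checking the file size, it can help to confirm that the archive actually decompresses and parses. The following is a minimal sketch, not part of the pipeline itself; the function name verify_output is hypothetical, and it assumes only what is stated above (a gzip-compressed JSON array).

```python
import gzip
import json
import os


def verify_output(path="all_stocks_fundamental_analysis.json.gz"):
    """Return (compressed_size_mb, record_count) for a gzip-compressed JSON array.

    Hypothetical helper for illustration; raises if the file is missing,
    unreadable, or does not contain a non-empty JSON array.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"Pipeline output not found: {path}")
    size_mb = os.path.getsize(path) / (1024 * 1024)
    # Text mode with an explicit encoding avoids platform-dependent defaults.
    with gzip.open(path, "rt", encoding="utf-8") as f:
        records = json.load(f)
    if not isinstance(records, list) or not records:
        raise ValueError("Expected a non-empty JSON array of stock records")
    return size_mb, len(records)
```

Calling verify_output() from the pipeline directory after a run should return a size near 2 MB and a record count in the range quoted in the overview, if the run completed normally.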
Configuration Options
Edit run_full_pipeline.py to customize behavior:
OHLCV Data Fetching
FETCH_OHLCV = True # Default: True
When True:
- First run: Downloads complete OHLCV history (~30 min for all stocks)
- Subsequent runs: Incremental update only (~2-5 min)
- Enables: ADR, RVOL, ATH, % from ATH, returns calculations
When False:
- Skips OHLCV entirely
- ADR, RVOL, ATH fields will be 0
- Runtime: ~4 minutes
Optional Standalone Data
FETCH_OPTIONAL = False # Default: False
When True: Also fetches the following standalone files (not included in the master JSON):
- all_indices_list.json: 194 market indices
- etf_data_response.json: 361 ETFs
Auto-Cleanup
CLEANUP_INTERMEDIATE = True # Default: True
When True: Removes intermediate files after successful completion, keeping only:
- all_stocks_fundamental_analysis.json.gz
- sector_analytics.json.gz
- market_breadth.json.gz
- ohlcv_data/ directory (if FETCH_OHLCV=True)
When False: Retains all intermediate JSON files for debugging
Pipeline Phases
The pipeline executes in strict order:
Phase 1: Core Data (Foundation)
1. fetch_dhan_data.py → dhan_data_response.json + master_isin_map.json
2. fetch_fundamental_data.py → fundamental_data.json
- NSE CSV download → nse_equity_list.csv (listing dates); this is a direct download step, not a numbered script
Critical: fetch_dhan_data.py must succeed - it creates master_isin_map.json which all other scripts need.
Phase 2: Data Enrichment (Fetching)
3. fetch_company_filings.py → company_filings/*.json
4. fetch_new_announcements.py → all_company_announcements.json
5. fetch_advanced_indicators.py → advanced_indicator_data.json
6. fetch_market_news.py → market_news/*.json
7. fetch_corporate_actions.py → upcoming/history_corporate_actions.json
8. fetch_surveillance_lists.py → nse_asm_list.json, nse_gsm_list.json
9. fetch_circuit_stocks.py → upper/lower_circuit_stocks.json
10. fetch_bulk_block_deals.py → bulk_block_deals.json
11. fetch_incremental_price_bands.py → incremental_price_bands.json
12. fetch_complete_price_bands.py → complete_price_bands.json
13. fetch_all_indices.py → all_indices_list.json
Phase 2.5: OHLCV History (Smart Incremental)
14. fetch_all_ohlcv.py → ohlcv_data/*.csv
15. fetch_indices_ohlcv.py → (indices OHLCV)
Smart Incremental Logic:
- Checks existing CSV files in ohlcv_data/
- Only fetches missing dates since last update
- First run: Fetches up to 2 years of history per stock
- Daily updates: Only fetches 1-2 days of new data
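The incremental logic above can be sketched as a small date-range calculation. This is an illustration only, not the code in fetch_all_ohlcv.py: the function name next_fetch_range, the 'Date' column name, and the ISO date format are all assumptions about the CSV layout, and the 730-day window simply approximates "up to 2 years".

```python
import csv
from datetime import date, timedelta
from pathlib import Path


def next_fetch_range(csv_path, today, max_history_days=730):
    """Return (start, end) dates still to fetch, or None if the CSV is current.

    Assumed (hypothetical) layout: a header row with a 'Date' column that
    holds ISO dates (YYYY-MM-DD).
    """
    path = Path(csv_path)
    last_seen = None
    if path.exists():
        with path.open(newline="") as f:
            dates = [date.fromisoformat(row["Date"]) for row in csv.DictReader(f)]
        last_seen = max(dates) if dates else None
    if last_seen is None:
        # No usable history: fall back to the full window (about 2 years).
        return today - timedelta(days=max_history_days), today
    if last_seen >= today:
        # Already up to date, nothing to fetch.
        return None
    # Resume from the day after the last stored row.
    return last_seen + timedelta(days=1), today
```

Under these assumptions, a file whose last row is two days old yields a two-day range, which is consistent with the short daily update time quoted above.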
Phase 3: Base Analysis
16. bulk_market_analyzer.py → all_stocks_fundamental_analysis.json (BASE)
Creates the master JSON structure with fundamental data for all stocks.
Phase 4: Enrichment (Order Matters!)
17. advanced_metrics_processor.py → Adds ADR, RVOL, ATH, Turnover
18. process_earnings_performance.py → Adds post-earnings returns
19. enrich_fno_data.py → Adds F&O flag, Lot Size, Next Expiry
20. process_market_breadth.py → Generates sector analytics
21. process_historical_market_breadth.py → Generates breadth charts
22. add_corporate_events.py → Adds Events, Announcements, News (LAST!)
Critical: add_corporate_events.py MUST run last as it performs final JSON injection.
Phase 5: Compression
Compress all output files:
- all_stocks_fundamental_analysis.json → .json.gz
- sector_analytics.json → .json.gz
- market_breadth.csv → .json.gz
Compression ratio: ~90-95% size reduction
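As an illustration of this phase, here is a minimal sketch of gzip-compressing one file and reporting the size reduction. The function name compress_json is hypothetical and this is not the pipeline's own implementation; it uses only the standard library.

```python
import gzip
import os
import shutil


def compress_json(src_path):
    """Gzip src_path to src_path + '.gz' and return the size reduction in percent.

    Hypothetical helper: the source file is left in place; removing it is
    a separate cleanup decision.
    """
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        # Stream in chunks so large JSON files are not loaded into memory.
        shutil.copyfileobj(src, dst)
    raw = os.path.getsize(src_path)
    packed = os.path.getsize(dst_path)
    return 100.0 * (1 - packed / raw) if raw else 0.0
```

The reduction is computed as 1 - compressed/raw. With the sizes shown in the sample console output (58.3 MB raw, 2.1 MB compressed), that gives 1 - 2.1/58.3, or roughly 96%.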
Output Files
Primary Output
Location: ~/workspace/source/DO NOT DELETE EDL PIPELINE/all_stocks_fundamental_analysis.json.gz
Format: Gzip-compressed JSON array
Structure:
[
  {
    "Symbol": "RELIANCE",
    "Name": "Reliance Industries Limited",
    "Market Cap(Cr.)": 1850000,
    "Stock Price(₹)": 2734.50,
    "P/E": 28.5,
    "ROE(%)": 15.2,
    "Latest Quarter": "Dec 2025",
    "Net Profit Latest": 18200,
    "QoQ % Net Profit Latest": 5.3,
    "YoY % Net Profit Latest": 12.7,
    "RSI (14)": 62.5,
    "Event Markers": "💸: Dividend (15-Mar)",
    "Recent Announcements": [...],
    "News Feed": [...]
    // ... 86 total fields
  },
  // ... 2,775+ stocks
]
Decompression:
import gzip
import json

# Open in text mode so json.load receives decoded UTF-8 text
with gzip.open('all_stocks_fundamental_analysis.json.gz', 'rt', encoding='utf-8') as f:
    data = json.load(f)

print(f"Total stocks: {len(data)}")
print(f"Fields per stock: {len(data[0])}")
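Once the array is loaded, a common next step is looking up a single stock. The sketch below assumes the structure shown above (a list of objects, each with a "Symbol" key); the helper names index_by_symbol and pick_fields are hypothetical and not part of the pipeline.

```python
def index_by_symbol(records):
    """Map each record's "Symbol" value to the record itself."""
    return {rec["Symbol"]: rec for rec in records if "Symbol" in rec}


def pick_fields(record, fields):
    """Return only the requested fields; absent fields come back as None."""
    return {name: record.get(name) for name in fields}


# Stand-in for the decompressed array, using field names from the structure above.
sample = [
    {"Symbol": "RELIANCE", "P/E": 28.5, "ROE(%)": 15.2, "RSI (14)": 62.5},
]
by_symbol = index_by_symbol(sample)
summary = pick_fields(by_symbol["RELIANCE"], ["P/E", "RSI (14)", "Latest Quarter"])
```

Using record.get rather than direct indexing keeps the lookup tolerant of records where an enrichment step did not add a field.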
Secondary Outputs
| File | Size | Description |
|---|---|---|
| sector_analytics.json.gz | ~500 KB | Sector-wise aggregated metrics |
| market_breadth.json.gz | ~8 MB | Historical market breadth data |
| ohlcv_data/*.csv | ~200 MB | Individual stock OHLCV history |
| all_indices_list.json | ~85 KB | Market indices data (if FETCH_OPTIONAL=True) |
Runtime Breakdown
First-Time Execution (with OHLCV)
Phase 1: Core Data                 ~30s
Phase 2: Data Enrichment           ~90s
Phase 2.5: OHLCV History (first)   ~30 min
Phase 3: Base Analysis             ~20s
Phase 4: Enrichment                ~45s
Phase 5: Compression               ~15s
─────────────────────────────────────────
Total:                             ~34 min
Daily Update (with incremental OHLCV)
Phase 1: Core Data                 ~30s
Phase 2: Data Enrichment           ~90s
Phase 2.5: OHLCV Incremental       ~2-5 min
Phase 3: Base Analysis             ~20s
Phase 4: Enrichment                ~45s
Phase 5: Compression               ~15s
─────────────────────────────────────────
Total:                             ~6-9 min
Without OHLCV
Phase 1: Core Data                 ~30s
Phase 2: Data Enrichment           ~90s
Phase 3: Base Analysis             ~20s
Phase 4: Enrichment                ~30s
Phase 5: Compression               ~15s
─────────────────────────────────────────
Total:                             ~4 min
Console Output Example
════════════════════════════════════════════════════════════
EDL PIPELINE - FULL DATA REFRESH
════════════════════════════════════════════════════════════
📦 PHASE 1: Core Data (Foundation)
────────────────────────────────────────
▶ Running fetch_dhan_data.py...
✅ fetch_dhan_data.py (12.3s)
▶ Running fetch_fundamental_data.py...
✅ fetch_fundamental_data.py (18.7s)
▶ Downloading NSE Listing Dates...
✅ NSE Listing Dates downloaded.
📡 PHASE 2: Data Enrichment (Fetching)
────────────────────────────────────────
▶ Running fetch_company_filings.py...
✅ fetch_company_filings.py (45.2s)
...
PHASE 2.5: OHLCV History (Smart Incremental)
────────────────────────────────────────
▶ Running fetch_all_ohlcv.py...
✅ fetch_all_ohlcv.py (142.5s)
🔬 PHASE 3: Base Analysis (Building Master JSON)
────────────────────────────────────────
▶ Running bulk_market_analyzer.py...
✅ bulk_market_analyzer.py (19.8s)
✨ PHASE 4: Enrichment (Injecting into Master JSON)
────────────────────────────────────────
▶ Running advanced_metrics_processor.py...
✅ advanced_metrics_processor.py (8.2s)
...
📦 PHASE 5: Compression (.json → .json.gz)
────────────────────────────────────────
📦 Compressed: 58.3 MB → 2.1 MB (96% reduction)
🧹 CLEANUP: Removing intermediate files...
────────────────────────────────────────
🗑️ Cleaned: 13 files + 2 dirs (56.2 MB freed)
════════════════════════════════════════════════════════════
PIPELINE COMPLETE
════════════════════════════════════════════════════════════
Total Time: 245.7s (4.1 min)
Successful: 22/22
Failed: 0/22
Output: all_stocks_fundamental_analysis.json.gz (2.1 MB)
📦 Compression: 58.3 MB → 2.1 MB (96% smaller)
🧹 Only .json.gz + ohlcv_data/ remain. All intermediate data purged.
════════════════════════════════════════════════════════════
Troubleshooting
Pipeline Fails at fetch_dhan_data.py
Error: CRITICAL: fetch_dhan_data.py failed. Cannot continue.
Cause: This script fetches the master stock list and creates master_isin_map.json which all other scripts need.
Solutions:
- Check internet connectivity
- Verify Dhan API endpoint is accessible
- Check if rate-limited (wait 5 minutes and retry)
- Inspect error message in console output
OHLCV Fetch Takes Too Long
Symptom: Phase 2.5 exceeds 30 minutes
Solutions:
- First run is expected to take ~30 min for full history
- Reduce thread count: Edit fetch_all_ohlcv.py, set MAX_THREADS = 10 (line 14)
- For faster daily updates, keep the existing ohlcv_data/ directory; only new dates are fetched
- If OHLCV is not needed immediately, set FETCH_OHLCV = False and run it later
Script Times Out
Error: ⏰ {script_name} TIMED OUT (>30 min)
Cause: Individual script timeout is set to 30 minutes (1800 seconds)
Solutions:
- Check network stability
- Increase timeout in run_full_pipeline.py line 117: timeout=3600 (1 hour)
- Run the individual script manually to see detailed error
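The per-script timeout described above can be approximated with the standard library's subprocess.run and its timeout argument. This is a sketch under assumptions, not the actual code in run_full_pipeline.py; the function name run_script and its return shape are hypothetical.

```python
import subprocess
import sys
import time


def run_script(script_path, timeout_s=1800):
    """Run one script with the current interpreter; return (succeeded, elapsed_seconds).

    Hypothetical helper. subprocess.run kills the child process and raises
    TimeoutExpired once timeout_s seconds have passed.
    """
    start = time.monotonic()
    try:
        result = subprocess.run([sys.executable, script_path], timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"{script_path} TIMED OUT (>{timeout_s / 60:.0f} min)")
        return False, time.monotonic() - start
    # A non-zero exit code counts as a failure.
    return result.returncode == 0, time.monotonic() - start
```

Running a single slow script this way with a larger timeout_s is one way to tell a genuinely hung script apart from one that only needs more time on a slow connection.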
Compression Fails
Error: Files to compress not found
Cause: Phase 3 or Phase 4 failed to produce expected output files
Solutions:
- Check console for which Phase 4 script failed
- Run pipeline with CLEANUP_INTERMEDIATE = False to inspect intermediate files
- Verify all_stocks_fundamental_analysis.json exists before compression
Memory Issues
Symptom: Process killed or out of memory errors
Solutions:
- Free up system RAM (close other applications)
- Reduce parallelization: Lower thread counts in fetcher scripts
- Process in batches: Set FETCH_OPTIONAL = False
- Pipeline requires ~2-4 GB RAM for full execution
Partial Data in Output
Symptom: Some stocks missing fields or empty values
Cause: Non-critical enrichment scripts failed but pipeline continued
Solutions:
- Check console output for failed scripts (marked with ❌)
- Pipeline continues even if enrichment fails (line 126: return True)
- Re-run pipeline to retry failed fetches
- Some data sources may be temporarily unavailable (ASM/GSM lists, news feed)
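To see which fields are affected after a partial run, the decompressed array can be scanned for missing or empty values. The sketch below is illustrative only; audit_missing is a hypothetical name, and the field names in the usage are taken from the sample structure earlier on this page.

```python
from collections import Counter


def audit_missing(records, required_fields):
    """Count, per field, how many records lack it or hold an empty value.

    A value of 0 is not treated as empty, because some fields are
    legitimately 0 (for example OHLCV-derived fields when FETCH_OHLCV = False).
    """
    missing = Counter()
    for rec in records:
        for name in required_fields:
            value = rec.get(name)
            if value is None or value == "" or value == []:
                missing[name] += 1
    return dict(missing)


# Stand-in records; in practice this would be the decompressed master array.
sample = [
    {"Symbol": "AAA", "P/E": 12.0, "News Feed": []},
    {"Symbol": "BBB", "P/E": None, "News Feed": ["item"]},
]
report = audit_missing(sample, ["P/E", "News Feed", "RSI (14)"])
```

A field that is missing from nearly every record usually points at one failed enrichment script rather than at a per-stock data gap.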
Manual Script Execution
If you need to run individual scripts for debugging:
cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/
# Core data (must run first)
python3 fetch_dhan_data.py
python3 fetch_fundamental_data.py
# Any enrichment script (requires master_isin_map.json)
python3 fetch_company_filings.py
python3 fetch_market_news.py
# OHLCV (requires dhan_data_response.json)
python3 fetch_all_ohlcv.py
# Base analysis (requires all fetched data)
python3 bulk_market_analyzer.py
# Enrichment (requires all_stocks_fundamental_analysis.json to exist)
python3 advanced_metrics_processor.py
python3 add_corporate_events.py # Must be last!
Best Practices
Daily Updates
- Run once per day after market close (after 3:30 PM IST)
- Keep FETCH_OHLCV = True for incremental updates
- OHLCV incremental fetch only takes 2-5 minutes
- Set up a cron job for automated daily execution:
# Run at 4 PM on weekdays (Mon-Fri); cron uses the system timezone, so this assumes the host is set to IST
0 16 * * 1-5 cd ~/workspace/source/DO\ NOT\ DELETE\ EDL\ PIPELINE/ && python3 run_full_pipeline.py >> pipeline.log 2>&1
First-Time Setup
- Allow 30-40 minutes for first run with OHLCV
- Verify output file exists and is properly formatted
- Test decompression with a JSON parser
- Keep intermediate files for first run (CLEANUP_INTERMEDIATE = False)
Production Environment
- Monitor disk space (OHLCV data grows to ~200 MB)
- Archive old .json.gz files with timestamps
- Set up error alerting for pipeline failures
- Keep logs of each run for debugging
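Archiving with timestamps can be done with a few lines of standard-library Python. The following is a sketch, not an existing pipeline feature; the function name archive_output, the archive directory name, and the timestamp format are all assumptions.

```python
import shutil
from datetime import datetime
from pathlib import Path


def archive_output(src, archive_dir="archive", now=None):
    """Copy src into archive_dir with a timestamp inserted before its extensions.

    Hypothetical helper: 'name.json.gz' becomes 'name_YYYYMMDD_HHMMSS.json.gz'.
    """
    src = Path(src)
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    suffixes = "".join(src.suffixes)  # e.g. ".json.gz"
    stem = src.name[: len(src.name) - len(suffixes)]
    dest_dir = Path(archive_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"{stem}_{stamp}{suffixes}"
    # copy2 preserves the original modification time alongside the contents.
    shutil.copy2(src, dest)
    return dest
```

Calling this before each run keeps the previous day's output available for comparison; old archives still need to be pruned separately to keep disk usage in check.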