The EDL Pipeline automatically compresses output files to .json.gz format, reducing storage requirements by 60-70% while maintaining full data integrity.
Compression Overview
Compression is performed in Phase 5 of the pipeline using gzip compression level 9 (maximum compression).
Files Compressed
The pipeline compresses three primary output files:
- Stock Analysis: all_stocks_fundamental_analysis.json, ~50 MB → ~2 MB (96% reduction)
- Sector Analytics: sector_analytics.json, performance by sector/industry
- Market Breadth: market_breadth.csv, daily breadth metrics
Compression Implementation
The compression logic is implemented in run_full_pipeline.py (lines 136-166):
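As a rough illustration of this step (not the code from run_full_pipeline.py itself), gzip level 9 compression of a single output file could be sketched with Python's standard library as follows. The helper name `compress_file` and the statistics it returns are assumptions made for this example.

```python
import gzip
import os
import shutil


def compress_file(src_path: str, level: int = 9) -> dict:
    """Gzip src_path to src_path + '.gz' and report the size change.

    Hypothetical helper for illustration; not taken from run_full_pipeline.py.
    """
    dst_path = src_path + ".gz"
    # Stream in chunks so large files are never fully loaded into memory.
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb", compresslevel=level) as dst:
        shutil.copyfileobj(src, dst)

    raw_size = os.path.getsize(src_path)
    gz_size = os.path.getsize(dst_path)
    # Reduction is the share of bytes saved relative to the raw file.
    reduction = 100.0 * (1 - gz_size / raw_size) if raw_size else 0.0
    return {
        "output": dst_path,
        "raw_bytes": raw_size,
        "gz_bytes": gz_size,
        "reduction_pct": reduction,
    }
```

In this sketch the source file is left in place; whether the raw file is deleted afterwards is a separate cleanup decision.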
Compression Statistics
Typical compression results:
| File | Raw Size | Compressed | Reduction |
|---|---|---|---|
| all_stocks_fundamental_analysis.json | ~50 MB | ~2 MB | 96% |
| sector_analytics.json | ~500 KB | ~50 KB | 90% |
| market_breadth.csv | ~100 KB | ~10 KB | 90% |
| Total | ~51 MB | ~2 MB | 96% |
The compression ratio varies based on data structure. JSON with repetitive field names compresses exceptionally well with gzip.
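The reduction figures are the share of bytes saved relative to the raw size. A small sketch of that arithmetic, using the approximate sizes quoted for the stock analysis and sector analytics files (the helper name `reduction_percent` is illustrative):

```python
def reduction_percent(raw_bytes: float, compressed_bytes: float) -> float:
    """Percentage of bytes saved by compression."""
    if raw_bytes <= 0:
        raise ValueError("raw_bytes must be positive")
    return 100.0 * (1.0 - compressed_bytes / raw_bytes)


MB = 1024 * 1024
KB = 1024

# Approximate sizes quoted in this section.
print(round(reduction_percent(50 * MB, 2 * MB)))    # 96
print(round(reduction_percent(500 * KB, 50 * KB)))  # 90
```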
Reading Compressed Files
Python
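A minimal sketch of reading a compressed output file with the standard library. The helper name `load_json_gz` is an assumption for this example, and the shape of the parsed data depends on what the pipeline wrote.

```python
import gzip
import json


def load_json_gz(path: str):
    """Read a gzip-compressed JSON file and return the parsed object."""
    # "rt" opens the stream in text mode and decompresses on the fly.
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)
```

For example, `load_json_gz("all_stocks_fundamental_analysis.json.gz")` would return the parsed stock analysis, assuming the compressed file keeps the original name with a `.gz` suffix.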
Command Line
pandas
Compression Level Comparison
Gzip supports compression levels 1-9:
| Level | Speed | Ratio | Use Case |
|---|---|---|---|
| 1 | Fastest | ~85% | Real-time processing |
| 6 | Balanced | ~92% | Default gzip |
| 9 | Slowest | ~96% | EDL Pipeline (max compression) |
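The trade-off in the table can be checked on your own data. The sketch below compresses one payload at levels 1, 6, and 9 and reports the size reduction for each. The synthetic records and the helper name `compare_levels` are assumptions for illustration, so the percentages it prints will not match the table exactly.

```python
import gzip
import json


def compare_levels(payload: bytes, levels=(1, 6, 9)) -> dict:
    """Return {level: reduction percent} for the given payload."""
    results = {}
    for level in levels:
        compressed = gzip.compress(payload, compresslevel=level)
        results[level] = 100.0 * (1 - len(compressed) / len(payload))
    return results


# Synthetic records with repeated field names, standing in for real output.
records = [
    {"symbol": f"SYM{i}", "sector": "Example", "pe_ratio": i % 40, "market_cap": i * 1000}
    for i in range(5000)
]
payload = json.dumps(records).encode("utf-8")

for level, pct in compare_levels(payload).items():
    print(f"level {level}: {pct:.1f}% smaller")
```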
Disabling Compression
If you need uncompressed output for compatibility:
- Edit run_full_pipeline.py (line 297):
- Keep raw JSON files in cleanup: edit INTERMEDIATE_FILES (line 92) to remove:
Storage Considerations
Disk Space Requirements
- With compression: ~2 MB (final output)
- Without compression: ~50 MB (raw JSON)
- With OHLCV data: +200 MB (CSV files, not compressed)
Backup Strategy
Performance Impact
Compression Performance Benchmarks
- Time to compress: ~2 seconds for 50 MB JSON
- Time to decompress: ~1 second (reading into memory)
- CPU overhead: ~5% of total pipeline runtime
- I/O savings: ~96% less data written to disk and sent over the network
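Timings like these depend on hardware and data, so it can be worth measuring them locally. A minimal sketch, assuming the payload fits in memory; the helper name `time_roundtrip` is hypothetical:

```python
import gzip
import time


def time_roundtrip(payload: bytes, level: int = 9) -> dict:
    """Time gzip compression and decompression of an in-memory payload."""
    start = time.perf_counter()
    compressed = gzip.compress(payload, compresslevel=level)
    compress_s = time.perf_counter() - start

    start = time.perf_counter()
    restored = gzip.decompress(compressed)
    decompress_s = time.perf_counter() - start

    # Confirm the round trip is lossless before trusting the timings.
    if restored != payload:
        raise RuntimeError("round trip mismatch")

    return {
        "compress_s": compress_s,
        "decompress_s": decompress_s,
        "raw_bytes": len(payload),
        "gz_bytes": len(compressed),
    }
```

Calling `time_roundtrip(open("all_stocks_fundamental_analysis.json", "rb").read())` would report both timings and sizes for the raw stock analysis file, if it is still on disk.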
Next Steps
Cleanup Options
Learn about intermediate file cleanup after compression
Working with Output
Parse and analyze compressed output files