The ETL pipeline begins by generating three interdependent synthetic datasets that model real-world candy retail behaviour. All three scripts use random.seed(42), guaranteeing that every run produces identical data, which is essential for reproducible analytics, model training, and dashboard development. Products are created first, then customers, and finally sales (which reference both). The outputs land in data/raw/ as CSV files ready for validation and loading.
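The effect of the fixed seed can be illustrated with a small sketch. The helper name sample_costs and the choice to draw three values are assumptions made for this illustration; only random.seed(42) comes from the generators themselves.

```python
import random


def sample_costs(seed, n=3):
    """Seed the global generator, then draw n pseudo-random unit costs."""
    random.seed(seed)
    return [round(random.uniform(0.20, 1.00), 2) for _ in range(n)]


first_run = sample_costs(42)
second_run = sample_costs(42)

# Re-seeding with the same value replays the same sequence of draws.
print(first_run == second_run)  # True
```

The guarantee holds only while the number and order of random calls stay the same: adding a draw earlier in a script shifts every value generated after it.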
All three generators call random.seed(42) at the top of their script. This fixed seed ensures that the exact same products, customers, and sales records are produced on every run, making the full pipeline deterministic and safe to share across development environments.

- Products
- Customers
- Sales
Products Generator
etl/generate_products.py defines a catalogue of 20 Fini candy products, each with a category, a seasonal affinity, a randomly derived cost and price, and a randomised launch date.

Product catalogue
The 20 products are hardcoded as a list of (name, category, season) tuples, ensuring the catalogue never changes between runs:

| Product | Category | Season |
|---|---|---|
| Tropical Mix | Gummies | All Year |
| Sour Cola Bottles | Gummies | All Year |
| Watermelon Slices | Gummies | Summer |
| Strawberry Belts | Belts | All Year |
| Rainbow Belts | Belts | All Year |
| Halloween Mix | Seasonal | Halloween |
| Christmas Mix | Seasonal | Christmas |
| Marshmallow Twist | Marshmallow | All Year |
| Watermelon Marshmallow | Marshmallow | Summer |
| Regaliz Twist | Licorice | All Year |
| Sour Worms | Gummies | All Year |
| Bubblegum Bottles | Gummies | Summer |
| Candy Bananas | Foam | All Year |
| Jelly Hearts | Gummies | Valentine |
| Mini Burgers | Novelty | All Year |
| Fried Eggs | Foam | All Year |
| Sharks | Gummies | Summer |
| Fruit Rings | Gummies | All Year |
| Spooky Teeth | Seasonal | Halloween |
| Snowflakes | Seasonal | Christmas |
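
As a sketch, the catalogue in the table could be expressed as the list of tuples below. The rows are transcribed from the table; the constant name PRODUCTS and the two summary variables are assumptions and may not match the names used in etl/generate_products.py.

```python
from collections import Counter

# (name, category, season) tuples transcribed from the catalogue table.
# PRODUCTS is a hypothetical name chosen for this sketch.
PRODUCTS = [
    ("Tropical Mix", "Gummies", "All Year"),
    ("Sour Cola Bottles", "Gummies", "All Year"),
    ("Watermelon Slices", "Gummies", "Summer"),
    ("Strawberry Belts", "Belts", "All Year"),
    ("Rainbow Belts", "Belts", "All Year"),
    ("Halloween Mix", "Seasonal", "Halloween"),
    ("Christmas Mix", "Seasonal", "Christmas"),
    ("Marshmallow Twist", "Marshmallow", "All Year"),
    ("Watermelon Marshmallow", "Marshmallow", "Summer"),
    ("Regaliz Twist", "Licorice", "All Year"),
    ("Sour Worms", "Gummies", "All Year"),
    ("Bubblegum Bottles", "Gummies", "Summer"),
    ("Candy Bananas", "Foam", "All Year"),
    ("Jelly Hearts", "Gummies", "Valentine"),
    ("Mini Burgers", "Novelty", "All Year"),
    ("Fried Eggs", "Foam", "All Year"),
    ("Sharks", "Gummies", "Summer"),
    ("Fruit Rings", "Gummies", "All Year"),
    ("Spooky Teeth", "Seasonal", "Halloween"),
    ("Snowflakes", "Seasonal", "Christmas"),
]

# Quick sanity checks on the hardcoded catalogue.
category_counts = Counter(category for _, category, _ in PRODUCTS)
seasonal = [name for name, _, season in PRODUCTS if season != "All Year"]

print(len(PRODUCTS))               # 20
print(category_counts["Gummies"])  # 8
print(len(seasonal))               # 9
```

Because the list is a literal rather than generated data, the product names, categories, and seasons do not depend on the random seed at all; only the prices and launch dates do.
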
Pricing logic
For each product, unit_cost is drawn uniformly from €0.20–€1.00, a markup multiplier is drawn from 1.8–2.8, and unit_price is their product rounded to two decimal places.

The launch date combines a random year (2022, 2023, 2024, or 2025) with a random month (1–12) and day (1–28). Capping the day at 28 keeps every generated date valid, whatever the month.

Full source
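What follows is a minimal sketch of how the pricing and launch-date rules could be combined into a generator; it is not the script's actual source. The names FIELDS, build_product, and products_csv, the sequential integer product_id, the ISO date format, rounding unit_cost before the markup is applied, and the order of the random calls are all assumptions, so the values it prints will not match the real products.csv. It returns the CSV text instead of writing to data/raw/.

```python
import csv
import io
import random
from datetime import date

# Column order taken from the documented products.csv layout.
FIELDS = ["product_id", "product_name", "category", "season",
          "launch_date", "unit_cost", "unit_price"]


def build_product(product_id, name, category, season):
    """Build one catalogue row using the documented pricing and date ranges."""
    unit_cost = round(random.uniform(0.20, 1.00), 2)  # EUR 0.20 to 1.00
    markup = random.uniform(1.8, 2.8)                 # markup multiplier
    unit_price = round(unit_cost * markup, 2)
    launch = date(
        random.choice([2022, 2023, 2024, 2025]),
        random.randint(1, 12),
        random.randint(1, 28),  # day <= 28 exists in every month
    )
    return {
        "product_id": product_id,  # assumed: sequential integers from 1
        "product_name": name,
        "category": category,
        "season": season,
        "launch_date": launch.isoformat(),  # assumed: YYYY-MM-DD
        "unit_cost": unit_cost,
        "unit_price": unit_price,
    }


def products_csv(catalogue):
    """Return CSV text for the catalogue, seeded so repeated calls match."""
    random.seed(42)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for product_id, (name, category, season) in enumerate(catalogue, start=1):
        writer.writerow(build_product(product_id, name, category, season))
    return buffer.getvalue()


# Two catalogue entries are enough to show the shape of the output.
sample = [("Tropical Mix", "Gummies", "All Year"),
          ("Sharks", "Gummies", "Summer")]
print(products_csv(sample))
```

Seeding inside products_csv keeps the sketch self-contained; the real generators are described as seeding once at the top of each script, which has the same effect when the script is run from start to finish.
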
Output: data/raw/products.csv (20 rows; columns: product_id, product_name, category, season, launch_date, unit_cost, unit_price).

Running the generators
Run the scripts in order: the sales generator depends on the product and customer CSVs already existing.
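
The ordering constraint can be sketched as a small runner that refuses to start a step whose input CSVs are missing. Only etl/generate_products.py and data/raw/products.csv are named on this page; the file names etl/generate_customers.py, etl/generate_sales.py, and customers.csv, along with the helpers missing_inputs and run_generators, are hypothetical and should be adjusted to the repository's real names.

```python
import subprocess
import sys
from pathlib import Path

RAW_DIR = Path("data/raw")

# (script, CSVs that must already exist in RAW_DIR before it runs).
# Apart from etl/generate_products.py, these file names are assumptions.
STEPS = [
    ("etl/generate_products.py", []),
    ("etl/generate_customers.py", []),
    ("etl/generate_sales.py", ["products.csv", "customers.csv"]),
]


def missing_inputs(required, raw_dir=RAW_DIR):
    """Return the required CSV names that are not present in raw_dir."""
    return [name for name in required if not (Path(raw_dir) / name).is_file()]


def run_generators(steps=STEPS, raw_dir=RAW_DIR, run=subprocess.run):
    """Run each script in order, stopping if a prerequisite CSV is missing."""
    for script, required in steps:
        missing = missing_inputs(required, raw_dir)
        if missing:
            raise FileNotFoundError(f"{script} needs {missing} in {raw_dir}")
        # check=True raises CalledProcessError if the script exits non-zero.
        run([sys.executable, script], check=True)
```

Calling run_generators() from the repository root would run the three steps in sequence. The run parameter exists so that the ordering logic can be exercised with a stand-in function, without launching real processes.
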
Alternatively, use the full pipeline runner, which executes all steps automatically.