The visualisations notebook (04-visualizations.ipynb) is the deliverable-facing layer of the project. It was produced for the client DataTalent Solutions S.L. and translates the cleaned, EDA-enriched datasets into publication-quality charts, statistical tests, and interactive dashboards. The notebook is version 2.0 and is designed to consume the post-EDA data from data/eda/, with an automatic fallback that generates a realistic simulated dataset if no real data is available.
Libraries
squarify and statsmodels are optional. If squarify is not installed, treemap charts fall back to horizontal bar charts. If statsmodels is absent, OLS trendlines in Plotly scatter plots are disabled and scipy is used for Q-Q plots instead.
Data loading and fallback behaviour
The notebook uses a priority-resolution strategy for loading data:
1. Look for post-EDA data. The notebook first checks for data/eda/jobs_eda.csv (produced by 03_eda.ipynb). If found, it uses the validated, in-memory-enriched version of the dataset.
2. Fall back to clean data. If data/eda/ does not exist or the file is missing, it tries data/clean/jobs_all_clean.csv instead and prints a warning recommending that 03_eda.ipynb be run first.
3. Generate a simulated dataset. If neither file is available, simular_dataset(n=600, seed=42) generates a 600-record synthetic dataset with realistic distributions of roles, cities, modalities, seniority levels, sectors, salary ranges, and binary skill columns. This allows the notebook to run end-to-end for development and demo purposes without any real data.
Analysis blocks
| Block | Title | Content | Output image |
|---|---|---|---|
| 0 | Configuration & Data Quality | Corporate palette setup, null heatmap, data-type distribution | 00_calidad_datos.png |
| 1 | Vacancy Distribution & Volume | Offers by city (bar), work modality (donut), seniority (bar); treemap of roles per city | 01_distribucion_volumen.png, 01b_treemap_roles_ciudad.png |
| 2 | Salary & Compensation Analysis | Boxplot by seniority (Kruskal-Wallis test), violin by modality, salary by city (± σ), salary histogram with normal curve | 02_analisis_salarial.png, 02b_salario_rol.png |
| 3 | Heatmaps & Correlations | Skill co-occurrence heatmap, skill×seniority heatmap, numerical correlation matrix | 03_heatmaps_correlaciones.png |
| 4 | Tech Stack & Used vs Wanted Gap | Lollipop Used vs Wanted gap chart, stacked bar by technology category, dot plot of most demanded skills | 04_tecnologias.png |
| 5 | Advanced Statistical Analysis | Q-Q plots, OLS regression (experience → salary), Mann-Whitney U test (remote vs on-site salary), percentile scatter by role | 05_estadistica_avanzada.png |
| 6 | Interactive Visualisations (Plotly) | Box chart: salary × seniority × role; scatter: experience vs salary with OLS trendline; sunburst: city → modality → role; interactive heatmap: sector × city salary | (rendered in notebook) |
| 7 | Executive Summary & Export | 9-KPI panel card grid; file inventory | 07_panel_kpis.png |
Block details
Block 0 — Configuration and data quality
Block 0 defines the corporate colour palette (PALETA) used consistently across all charts:
| Key | Hex | Usage |
|---|---|---|
| primary | #1A365D | Main bars, titles, annotations |
| secondary | #2B6CB0 | Secondary bars, pie slices |
| accent | #4299E1 | Highlights, IQR markers |
| warm | #ED8936 | Salary histograms, error bars |
| success | #48BB78 | Junior category, positive indicators |
| muted | #A0AEC0 | Subdued labels |
It also renders the null heatmap and data-type distribution, saved as 00_calidad_datos.png.
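A minimal sketch of how the palette could be defined and reused, with the keys and hex values taken from the palette table. The dict layout and the hex_to_rgb helper are assumptions for illustration; the notebook's actual definition of PALETA may differ.

```python
# Keys and hex values copied from the palette table; the dict layout is an assumption.
PALETA = {
    "primary": "#1A365D",
    "secondary": "#2B6CB0",
    "accent": "#4299E1",
    "warm": "#ED8936",
    "success": "#48BB78",
    "muted": "#A0AEC0",
}


def hex_to_rgb(hex_colour: str) -> tuple:
    """Convert '#RRGGBB' into an (r, g, b) tuple of floats in [0, 1]."""
    value = hex_colour.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    return (r / 255, g / 255, b / 255)


# Hypothetical usage: hand the primary colour to a plotting call as RGB.
primary_rgb = hex_to_rgb(PALETA["primary"])
```

Keeping every colour behind a named key means a chart refers to a role such as warm or muted rather than to a literal hex code, so a palette change only touches one place.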
Block 1 — Vacancy distribution and volume
The main figure (01_distribucion_volumen.png) has three panels:
- Top 8 cities: horizontal bar chart of offer count per city_clean.
- Work modality: donut chart showing the percentage split of remote_modality (Presencial / Híbrido / Remoto).
- Seniority: annotated bar chart with count and percentage per level (Junior / Mid / Senior).
A treemap of roles per city is saved as 01b_treemap_roles_ciudad.png. If squarify is not installed, this falls back to a horizontal bar chart.
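The optional-dependency behaviour (squarify for treemaps, statsmodels for OLS trendlines) could be handled along the following lines. This is a sketch, not the notebook's code: the names is_installed, roles_chart_kind, and plotly_trendline are hypothetical, and only the standard library is used to detect the packages.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


HAS_SQUARIFY = is_installed("squarify")
HAS_STATSMODELS = is_installed("statsmodels")


def roles_chart_kind() -> str:
    """Pick the chart type for the roles-per-city figure."""
    return "treemap" if HAS_SQUARIFY else "barh"


def plotly_trendline():
    """Value to pass as the trendline argument of plotly.express.scatter."""
    return "ols" if HAS_STATSMODELS else None
```

Deciding once at the top of the notebook keeps the plotting cells free of repeated try/except blocks.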
Block 2 — Salary and compensation analysis
The salary charts use salary data with outliers excluded (via salary_clean_outlier). The four sub-charts in 02_analisis_salarial.png are:
- Boxplot by seniority: includes a Kruskal-Wallis p-value annotation to show whether salary differences between Junior, Mid, and Senior are statistically significant.
- Violin by modality: shows the full salary distribution shape for each work modality.
- Mean salary by city (±σ): horizontal bar chart with standard-deviation error bars; only cities with ≥ 5 salary observations are included.
- Salary histogram: overlays a normal-distribution curve, median line, and mean line for visual skewness assessment.
A second figure (02b_salario_rol.png) adds a strip-plot overlay on top of boxplots grouped by job_title.
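The Kruskal-Wallis annotation on the seniority boxplot can be computed with scipy.stats.kruskal. The sketch below uses made-up salaries, not project data, and the annotation format is an assumption.

```python
from scipy.stats import kruskal

# Illustrative salaries in EUR/year; these are NOT values from the project.
salaries_by_seniority = {
    "Junior": [24000, 26000, 27000, 28000, 30000],
    "Mid": [34000, 36000, 38000, 40000, 42000],
    "Senior": [50000, 54000, 58000, 62000, 66000],
}

# One sample per seniority level; the test is rank-based and does not assume normality.
statistic, p_value = kruskal(*salaries_by_seniority.values())

# Text that could be placed on the boxplot (format is an assumption).
annotation = f"Kruskal-Wallis H = {statistic:.2f}, p = {p_value:.4f}"
significant = p_value < 0.05
```

A small p-value only says that at least one group differs from the others; it does not identify which pair of levels drives the difference.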
Block 3 — Heatmaps and correlations
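This block draws the skill co-occurrence heatmap, the skill × seniority heatmap, and the numerical correlation matrix, saved as 03_heatmaps_correlaciones.png. Skill data comes from the binary skill columns (or from job_skills_long for real data).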
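One common way to build a skill co-occurrence matrix from binary skill columns is a matrix product of the indicator matrix with its transpose. The sketch below assumes that approach and uses a hypothetical four-offer sample; the notebook may compute the matrix differently.

```python
import numpy as np

# Hypothetical binary skill matrix: one row per offer, one column per skill.
skills = ["python", "sql", "aws"]
offers = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
    [1, 0, 1],
])

# Entry (i, j) counts the offers that list both skill i and skill j;
# the diagonal holds each skill's total frequency.
cooccurrence = offers.T @ offers
```

The resulting square array can be passed directly to a heatmap function, with skills used for both axis labels.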
Block 4 — Tech stack and Used vs Wanted gap
This block reads technology_rankings_used and technology_rankings_wanted (from data/eda/ or data/clean/). The lollipop gap chart shows technologies where the “wanted” percentage exceeds the “used” percentage: a positive gap indicates growing professional interest in a technology that has not yet been fully adopted. A positive gap for AWS and Docker suggests emerging cloud and containerisation demand.
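The gap behind the lollipop chart is simply wanted minus used, per technology. A minimal sketch, assuming both rankings are available as percentage mappings; the numbers are hypothetical and the function name wanted_used_gap is not from the notebook.

```python
# Hypothetical percentages for illustration only; NOT values from the project.
used_pct = {"Python": 60.0, "SQL": 55.0, "AWS": 30.0, "Docker": 25.0}
wanted_pct = {"Python": 58.0, "SQL": 50.0, "AWS": 42.0, "Docker": 38.0}


def wanted_used_gap(used: dict, wanted: dict) -> dict:
    """Return {technology: wanted - used} for technologies present in both rankings."""
    common = used.keys() & wanted.keys()
    return {tech: wanted[tech] - used[tech] for tech in common}


gaps = wanted_used_gap(used_pct, wanted_pct)

# Keep only positive gaps, largest first, in the order a gap chart would plot them.
positive_gaps = sorted(
    ((tech, gap) for tech, gap in gaps.items() if gap > 0),
    key=lambda item: item[1],
    reverse=True,
)
```

Restricting the calculation to technologies present in both rankings avoids treating a missing entry as a 0 % value.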
Block 5 — Advanced statistical analysis
- Q-Q plots: assess salary normality per seniority group.
- OLS regression: experience_years as predictor of salary_clean; rendered with statsmodels if available, otherwise disabled.
- Mann-Whitney U test: compares remote vs on-site salary distributions without assuming normality.
- Percentile scatter by role: shows P10, P25, median, P75, and P90 salary ranges per job_title for a compact inter-role salary comparison.
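The remote vs on-site comparison can be run with scipy.stats.mannwhitneyu. The sketch below uses invented salaries, not project data, and assumes a two-sided alternative; the notebook may use a different one.

```python
from scipy.stats import mannwhitneyu

# Illustrative salaries in EUR/year; these are NOT values from the project.
remote = [42000, 45000, 47000, 50000, 52000, 55000, 58000, 60000]
on_site = [28000, 30000, 31000, 33000, 35000, 36000, 38000, 40000]

# Two-sided test: is one salary distribution shifted relative to the other?
statistic, p_value = mannwhitneyu(remote, on_site, alternative="two-sided")

differs = p_value < 0.05
```

Because the test works on ranks, a few very high salaries influence it far less than they would a t-test on means.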
Block 6 — Interactive Plotly visualisations
- Chart 22 (px.box): salary distribution by seniority, coloured by role. Supports zoom and hover.
- Chart 23 (px.scatter): experience vs salary, coloured by seniority, with OLS trendline (requires statsmodels). Hover shows job title and city.
- Chart 24 (px.sunburst): hierarchical breakdown city → modality → role for the top 6 cities.
- Chart 25 (go.Heatmap): median salary by sector and city, with in-cell text labels.
Block 7 — Executive summary and export
Block 7 renders a KPI panel (07_panel_kpis.png) on a dark background with nine business-critical metrics:
| KPI | Description |
|---|---|
| Total Ofertas | Total offer count |
| Salario Mediano | Median salary (outliers excluded) |
| Salario Medio | Mean salary (outliers excluded) |
| Top Ciudad | City with most offers |
| Modalidad + frecuente | Most common work modality |
| % Remoto/Híbrido | Percentage of remote or hybrid offers |
| Rol más demandado | Most frequent job title |
| P10 Salarial | 10th percentile salary |
| P90 Salarial | 90th percentile salary |
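The nine KPIs reduce to a handful of pandas aggregations. The sketch below is an assumption about how they could be derived: the five-row DataFrame is hypothetical, the column names are those mentioned on this page, and outliers are taken to be already excluded.

```python
import pandas as pd

# Tiny hypothetical sample (outliers assumed already removed); NOT project data.
df = pd.DataFrame({
    "city_clean": ["Madrid", "Madrid", "Barcelona", "Madrid", "Valencia"],
    "remote_modality": ["Híbrido", "Remoto", "Presencial", "Híbrido", "Híbrido"],
    "job_title": ["Data Analyst", "Data Engineer", "Data Analyst",
                  "Data Analyst", "Data Scientist"],
    "salary_clean": [30000.0, 45000.0, 35000.0, 40000.0, 50000.0],
})

salary = df["salary_clean"].dropna()
flexible = df["remote_modality"].isin(["Remoto", "Híbrido"])

kpis = {
    "Total Ofertas": len(df),
    "Salario Mediano": salary.median(),
    "Salario Medio": salary.mean(),
    "Top Ciudad": df["city_clean"].value_counts().idxmax(),
    "Modalidad + frecuente": df["remote_modality"].value_counts().idxmax(),
    "% Remoto/Híbrido": flexible.mean() * 100,
    "Rol más demandado": df["job_title"].value_counts().idxmax(),
    "P10 Salarial": salary.quantile(0.10),
    "P90 Salarial": salary.quantile(0.90),
}
```

Collecting the values in one dict keeps the calculation separate from the drawing code, so the same numbers can feed both the panel and a text summary.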
The file inventory then lists each file exported to images/ with its size in KB.
Exported chart files
All PNG files are saved to images/ at 200 DPI:
| File | Contents |
|---|---|
| 00_calidad_datos.png | Null heatmap and data-type distribution |
| 01_distribucion_volumen.png | City bar chart, modality donut, seniority bars |
| 01b_treemap_roles_ciudad.png | Treemap (or fallback bar chart) of roles per city |
| 02_analisis_salarial.png | Salary boxplot, violin, city bars, histogram |
| 02b_salario_rol.png | Salary boxplot + strip by job title |
| 03_heatmaps_correlaciones.png | Skill co-occurrence and correlation heatmaps |
| 04_tecnologias.png | Used vs Wanted gap lollipop and technology stacks |
| 05_estadistica_avanzada.png | Q-Q plots, OLS regression, Mann-Whitney, percentiles |
| 07_panel_kpis.png | Executive KPI panel (dark background) |
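A file inventory like the one Block 7 prints (each exported file with its size in KB) can be produced with the standard library alone. This is a sketch; the function name file_inventory and the output layout are assumptions.

```python
from pathlib import Path


def file_inventory(folder: str = "images") -> list:
    """Return (file name, size in KB) for every PNG in folder, sorted by name."""
    directory = Path(folder)
    if not directory.is_dir():
        return []  # nothing exported yet
    return [
        (path.name, round(path.stat().st_size / 1024, 1))
        for path in sorted(directory.glob("*.png"))
    ]


# Print one aligned line per exported chart (prints nothing if images/ is missing).
for name, size_kb in file_inventory():
    print(f"{name:<35} {size_kb:>8.1f} KB")
```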
Analytical conclusions
- Madrid & Barcelona dominate
- Hybrid is the new standard
- Python + SQL lead skills
- Salary differences are significant
The clean CSVs are produced by 02_cleaning.ipynb. The notebook is self-contained and reproducible: run the cells in order after placing the clean CSVs in data/clean/ or the post-EDA exports in data/eda/.
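The loading priority described under Data loading and fallback behaviour (post-EDA file, then clean file, then simulated data) can be sketched as a small resolver. The function name resolve_data_source and its return format are hypothetical; only the two file paths and the simular_dataset call come from this page.

```python
from pathlib import Path
from typing import Optional, Tuple

EDA_FILE = Path("data/eda/jobs_eda.csv")
CLEAN_FILE = Path("data/clean/jobs_all_clean.csv")


def resolve_data_source(base_dir: str = ".") -> Tuple[str, Optional[Path]]:
    """Return (source label, path) following the eda -> clean -> simulated priority."""
    base = Path(base_dir)
    if (base / EDA_FILE).is_file():
        return "eda", base / EDA_FILE
    if (base / CLEAN_FILE).is_file():
        print("Warning: post-EDA data not found; run 03_eda.ipynb first.")
        return "clean", base / CLEAN_FILE
    # No real data: the caller would generate the synthetic dataset instead,
    # e.g. with simular_dataset(n=600, seed=42) as defined in the notebook.
    return "simulated", None


source, path = resolve_data_source()
```

In the notebook, a returned path would presumably be read with pandas.read_csv, while a None path would trigger the simulated dataset.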