Phase 4 is where every statistical finding from Phases 1–3 is translated into a visual argument and an actionable recommendation. Working exclusively from the cleaned outputs produced in Phase 2, this phase opens with a bias-orientation panel and then builds 12 purposefully sequenced charts, moving through salary distributions, skill comparisons, and engagement analytics to a reskilling ROI analysis designed for DataTalent Solutions S.L.’s strategic planning. Together, these visualizations form a self-contained analytical narrative that can be presented directly to stakeholders without prior exposure to the technical pipeline.
Notebook
Phase4_Visualization.ipynb
Libraries
| Library | Purpose |
|---|---|
| pandas | Data preparation, filtering, and aggregation for each chart |
| NumPy | Array handling and statistical helpers |
| matplotlib | Figure/axis management, layout control, custom styling |
| seaborn | High-level statistical chart API (boxplots, heatmaps, scatter plots) |
Visualizations
The 12 numbered charts follow an introductory bias panel and are produced in narrative order: each one builds on or contextualizes the one before it.

| # | Title | Chart Type | Key Question Answered |
|---|---|---|---|
| Intro | Critical Biases — MNAR + Geographic | Combined annotation panel | What data limitations must the reader hold in mind? |
| 1 | Annual Salary Distribution | Histogram | What does the salary landscape look like overall? |
| 1.1 | Normal vs. Lognormal Distribution Fit | Overlay comparison | Which theoretical distribution best describes salary data? |
| 2 | Salary by Experience Level | Boxplot | How does compensation scale with seniority? |
| 3 | Skills Comparison — Global Market vs Spain | Grouped bar chart | Which skills matter globally vs. in the Spanish market? |
| 4 | Top 10 Industries by Job Count | Vertical bar chart | Where is data talent demand concentrated? |
| 5 | Views vs Applications per Job | Scatter plot + trend line | Does posting visibility translate into applications? |
| 6 | Correlation Heatmap | Annotated heatmap | Which numerical variables move together? |
| 7 | Job Distribution by Contract Type | Pie / bar chart | How is the market split by employment type? |
| 8 | Mean Salary by Experience Level with Confidence Intervals | Bar chart + error bars | How reliable are the mean salary estimates per level? |
| 9 | Median Salary by Data Role | Horizontal bar chart | Which specific data roles pay the most? |
| 10 | Junior Accessibility vs Median Salary by Role | Scatter plot | Which roles are both well-paid and accessible to juniors? |
| 11 | Reskilling ROI Analysis | Multi-metric bar chart | Which role transitions offer the best salary uplift? |
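
Viz 8 pairs each mean salary with an error bar. The following is a minimal sketch of how those intervals might be computed; it assumes a normal approximation (z = 1.96 for roughly 95 % coverage) rather than whatever interval method the notebook actually uses, and for small groups a t quantile would be more appropriate. The function name is hypothetical, and column names are passed in as parameters because the notebook's own names are not shown here.

```python
import numpy as np
import pandas as pd


def mean_salary_ci(df, level_col, salary_col, z=1.96):
    """Mean salary per level with a normal-approximation confidence interval (z=1.96 ~ 95 %)."""
    summary = df.groupby(level_col)[salary_col].agg(["mean", "std", "count"])
    # Standard error of the mean; pandas' std uses the sample estimator (ddof=1).
    summary["sem"] = summary["std"] / np.sqrt(summary["count"])
    summary["ci_low"] = summary["mean"] - z * summary["sem"]
    summary["ci_high"] = summary["mean"] + z * summary["sem"]
    return summary


# Usage with placeholder values (not results from the analysis). The half-widths can
# feed matplotlib, e.g. ax.bar(summary.index, summary["mean"], yerr=1.96 * summary["sem"], capsize=4).
demo = pd.DataFrame({"level": ["A", "A", "A", "B", "B"], "salary": [10, 20, 30, 5, 15]})
summary = mean_salary_ci(demo, "level", "salary")
```

Keeping the interval computation separate from the plotting call makes it easy to swap in a t-based or bootstrap interval without touching the chart code.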
Sample Visualization Code
In the Salary by Experience Level boxplot (Viz 2), the Blues palette encodes seniority visually, and the interquartile ranges immediately reveal that director and executive levels have substantially wider salary spread than entry or associate levels, a fact that would be invisible in a mean-only bar chart.
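
As a hedged sketch of how a Viz 2-style boxplot might be drawn, the code below uses plain matplotlib with the Blues colormap rather than seaborn, to keep dependencies minimal. The column names `formatted_experience_level` and `normalized_salary`, the labels in `LEVEL_ORDER`, and the function name are assumptions about the cleaned Phase 2 output, not confirmed by this page.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for scripts; omit this line inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed experience-level labels, ordered junior -> senior (hypothetical).
LEVEL_ORDER = ["Entry level", "Associate", "Mid-Senior level", "Director", "Executive"]


def plot_salary_by_experience(df, level_col="formatted_experience_level",
                              salary_col="normalized_salary"):
    """Boxplot of annual salary per experience level, shaded light-to-dark by seniority."""
    present = set(df[level_col])
    levels = [lvl for lvl in LEVEL_ORDER if lvl in present]
    groups = [df.loc[df[level_col] == lvl, salary_col].dropna().to_numpy() for lvl in levels]

    fig, ax = plt.subplots(figsize=(9, 5))
    box = ax.boxplot(groups, patch_artist=True)  # patch_artist=True makes the boxes fillable
    # Sample the Blues colormap so darker boxes correspond to more senior levels.
    for patch, color in zip(box["boxes"], plt.cm.Blues(np.linspace(0.3, 0.9, len(levels)))):
        patch.set_facecolor(color)
    ax.set_xticks(range(1, len(levels) + 1))
    ax.set_xticklabels(levels, rotation=20)
    ax.set_xlabel("Experience level")
    ax.set_ylabel("Annual salary (USD)")
    ax.set_title("Salary by Experience Level")
    fig.tight_layout()
    return fig, ax, levels


# Usage with synthetic placeholder data (not the LinkedIn dataset).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "formatted_experience_level": np.repeat(LEVEL_ORDER, 40),
    "normalized_salary": rng.normal(100_000, 25_000, 200),
})
fig, ax, levels = plot_salary_by_experience(demo)
```

Fixing the category order explicitly, instead of relying on alphabetical sorting, is what lets the color ramp track seniority.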
Reskilling ROI Analysis (Viz 11)
The final visualization is unique to Phase 4 and serves as the bridge between statistical findings and strategic recommendations. It models the potential annual salary uplift for a junior professional (entry-level, with an assumed starting salary equal to the entry-level median) who transitions into each major data role after a reskilling investment. The analysis accounts for:

- Current median salary at entry level as the baseline
- Median salary at mid-senior level for each target role as the destination
- Skill gap complexity (estimated as the number of top-10 required skills not typically held by general IT graduates) as a proxy for reskilling cost
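
The exact uplift formula is not spelled out here, so the following is only a sketch under stated assumptions: uplift is each target role's mid-senior median minus the overall entry-level median, and dividing uplift by the skill-gap count gives a hypothetical "uplift per skill to acquire" ranking metric. The function name, column names, level labels, and the `role_skills` / `grad_skills` inputs are all illustrative.

```python
import pandas as pd


def reskilling_roi(df, role_skills, grad_skills,
                   role_col="role", level_col="experience_level", salary_col="salary"):
    """Hypothetical ROI table: salary uplift per missing skill, by target role."""
    # Baseline: median salary across all entry-level postings (assumed label).
    baseline = df.loc[df[level_col] == "Entry level", salary_col].median()
    # Destination: median mid-senior salary for each target role (assumed label).
    target = df[df[level_col] == "Mid-Senior level"].groupby(role_col)[salary_col].median()
    rows = []
    for role, target_median in target.items():
        # Skill gap: how many of the role's top-10 skills graduates typically lack.
        gap = sum(1 for skill in role_skills.get(role, [])[:10] if skill not in grad_skills)
        uplift = target_median - baseline
        rows.append({"role": role, "target_median": target_median, "uplift": uplift,
                     "skill_gap": gap,
                     "uplift_per_skill": uplift / gap if gap else float("nan")})
    return pd.DataFrame(rows).sort_values("uplift_per_skill", ascending=False, ignore_index=True)


# Usage with toy placeholder values (not results from the analysis).
postings = pd.DataFrame({
    "role": ["Data Engineer", "Data Engineer", "ML Engineer", "ML Engineer", "Data Analyst"],
    "experience_level": ["Entry level", "Mid-Senior level", "Entry level",
                         "Mid-Senior level", "Mid-Senior level"],
    "salary": [60_000, 120_000, 70_000, 140_000, 90_000],
})
role_skills = {
    "Data Engineer": ["python", "sql", "spark", "airflow"],
    "ML Engineer": ["python", "pytorch", "mlops", "sql"],
    "Data Analyst": ["sql", "excel", "tableau"],
}
result = reskilling_roi(postings, role_skills, grad_skills={"python", "sql", "excel"})
```

Using the skill-gap count as a divisor treats every missing skill as equally costly; a weighted gap (for example by training hours) would be a natural refinement if such data were available.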
Key Visual Findings
- Salary distribution is approximately normal (skew = 0.442, kurtosis = −0.248), validating the use of parametric tests in Phase 3. The lognormal fit (Viz 1.1) is marginally inferior to the normal fit within the IQR-cleaned range.
- Director and executive roles show the highest median salary but also the highest variance: the boxplot whiskers extend considerably further than at lower levels, reflecting the wide range of company sizes and industries that carry these titles.
- Tech skills dominate demand across all data roles: IT infrastructure skills and DATA-specific skills (Python, SQL, ML frameworks) account for the top positions in both the global ranking and the Spain-specific ranking shown in Viz 3, though cloud platform tools (particularly Azure) rank higher in the Spanish segment.
- Views and applications are positively correlated (Viz 5), but the relationship is noisy — some postings accumulate views without applications, likely reflecting job-seeker interest without perceived fit or accessibility.
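
The skew, kurtosis, and normal-vs-lognormal comparison in the first finding can be outlined with NumPy alone. This is a sketch, not the notebook's code: it assumes Tukey's 1.5 × IQR fences for the cleaning step, population (biased) moment estimators, and an AIC comparison of maximum-likelihood fits, none of which this page confirms. The function names are hypothetical.

```python
import numpy as np


def iqr_clean(x, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey fences)."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return x[(x >= q1 - k * iqr) & (x <= q3 + k * iqr)]


def shape_stats(x):
    """Population skewness and excess kurtosis (matches scipy.stats defaults)."""
    x = np.asarray(x, dtype=float)
    z = (x - x.mean()) / x.std()
    return float(np.mean(z**3)), float(np.mean(z**4) - 3.0)


def compare_fits(x):
    """AIC of maximum-likelihood normal vs. lognormal (loc fixed at 0) fits; lower is better."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Normal MLE: mu = sample mean, sigma = population std; closed-form log-likelihood.
    sigma = x.std()
    ll_norm = -0.5 * n * np.log(2 * np.pi * sigma**2) - 0.5 * n
    # Lognormal MLE: normal fit on log(x), plus the Jacobian term -sum(log x).
    # Requires strictly positive salaries.
    logx = np.log(x)
    s = logx.std()
    ll_lognorm = -logx.sum() - 0.5 * n * np.log(2 * np.pi * s**2) - 0.5 * n
    k = 2  # both models have two free parameters
    return {"normal": 2 * k - 2 * ll_norm, "lognormal": 2 * k - 2 * ll_lognorm}


# Usage (salaries = cleaned annual salary array from Phase 2):
# clean = iqr_clean(salaries)
# skew, excess_kurtosis = shape_stats(clean)
# aic = compare_fits(clean)
```

Because both models have the same number of parameters, the AIC comparison here reduces to comparing log-likelihoods directly.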
Final Recommendations Summary
Based on all four phases of the HRIA analysis, the following strategic priorities are recommended for DataTalent Solutions S.L.:

- Focus recruitment on mid-senior data engineers and ML engineers: these roles show the strongest combination of market demand and salary premium within the data talent landscape.
- Invest in reskilling programs targeting junior-to-data-role transitions: the ROI analysis (Viz 11) quantifies the salary uplift for each transition, which makes the business case for internal training programs straightforward to present to stakeholders.
- Adjust all salary benchmarks for the Spanish market: the LinkedIn dataset is US-centric (> 95 % US postings). Spanish market salaries for equivalent data roles are typically 40–60 % of the US figures shown here, so applying US benchmarks directly would produce offers and salary budgets that are miscalibrated for the Spanish market.
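
The 40–60 % adjustment in the last recommendation is simple arithmetic. A minimal sketch, assuming US medians are held in a pandas Series indexed by role; the function name is hypothetical, and the output stays in the source currency, since any USD-to-EUR conversion is a separate step not covered here.

```python
import pandas as pd


def spain_salary_band(us_medians: pd.Series, low: float = 0.40, high: float = 0.60) -> pd.DataFrame:
    """Scale US median salaries to the 40-60 % range quoted for equivalent Spanish roles.

    Values remain in the source currency (USD); currency conversion is out of scope.
    """
    return pd.DataFrame({
        "us_median": us_medians,
        "spain_low": us_medians * low,
        "spain_high": us_medians * high,
    })


# Usage with a placeholder figure, not a result from the analysis.
band = spain_salary_band(pd.Series({"Data Engineer": 150_000.0}))
```

Reporting a band rather than a single scaled figure keeps the uncertainty in the 40–60 % rule of thumb visible to whoever sets the offer.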
Key Insights
Consolidated findings from all four phases: salary patterns, skill demand, and structural observations about the LinkedIn data landscape.
Recommendations
Detailed strategic recommendations for DataTalent Solutions S.L. based on the full four-phase HRIA analysis.