Phase 4 is where every statistical finding from Phases 1–3 is translated into a visual argument and an actionable recommendation. Working exclusively from the cleaned outputs produced in Phase 2, this phase constructs 12 purposefully sequenced charts — from a high-level bias orientation through salary distributions, skill comparisons, and engagement analytics, to a reskilling ROI analysis purpose-built for DataTalent Solutions S.L.’s strategic planning. Together, these visualizations form a self-contained analytical narrative that can be presented directly to stakeholders without prior exposure to the technical pipeline.

Notebook

Phase4_Visualization.ipynb

Libraries

| Library | Purpose |
| --- | --- |
| pandas | Data preparation, filtering, and aggregation for each chart |
| NumPy | Array handling and statistical helpers |
| matplotlib | Figure/axis management, layout control, custom styling |
| seaborn | High-level statistical chart API (boxplots, heatmaps, scatter plots) |

Visualizations

The 12 charts are produced in narrative order — each one builds on or contextualizes the one before it.

| # | Title | Chart Type | Key Question Answered |
| --- | --- | --- | --- |
| Intro | Critical Biases — MNAR + Geographic | Combined annotation panel | What data limitations must the reader hold in mind? |
| 1 | Annual Salary Distribution | Histogram | What does the salary landscape look like overall? |
| 1.1 | Normal vs. Lognormal Distribution Fit | Overlay comparison | Which theoretical distribution best describes salary data? |
| 2 | Salary by Experience Level | Boxplot | How does compensation scale with seniority? |
| 3 | Skills Comparison — Global Market vs Spain | Grouped bar chart | Which skills matter globally vs. in the Spanish market? |
| 4 | Top 10 Industries by Job Count | Vertical bar chart | Where is data talent demand concentrated? |
| 5 | Views vs Applications per Job | Scatter plot + trend line | Does posting visibility translate into applications? |
| 6 | Correlation Heatmap | Annotated heatmap | Which numerical variables move together? |
| 7 | Job Distribution by Contract Type | Pie / bar chart | How is the market split by employment type? |
| 8 | Mean Salary by Experience Level with Confidence Intervals | Bar chart + error bars | How reliable are the mean salary estimates per level? |
| 9 | Median Salary by Data Role | Horizontal bar chart | Which specific data roles pay the most? |
| 10 | Junior Accessibility vs Median Salary by Role | Scatter plot | Which roles are both well-paid and accessible to juniors? |
| 11 | Reskilling ROI Analysis | Multi-metric bar chart | Which role transitions offer the best salary uplift? |

Sample Visualization Code

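import seaborn as sns
import matplotlib.pyplot as plt

# df_sal is assumed to be the cleaned salary DataFrame produced in Phase 2
fig, ax = plt.subplots(figsize=(10, 6))
sns.boxplot(
    data=df_sal,
    x='formatted_experience_level',
    y='salary_annual',
    # Fix the category order so the boxes follow the seniority ladder
    order=['entry level', 'associate', 'mid-senior level',
           'director', 'executive'],
    # Note: seaborn >= 0.13 emits a FutureWarning when palette is set without hue
    palette='Blues',
    ax=ax  # draw explicitly on the axes created above
)
ax.set_title('Annual Salary by Experience Level')
ax.set_xlabel('Experience Level')
ax.set_ylabel('Annual Salary (USD)')
plt.tight_layout()
plt.show()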

This boxplot (Viz 2) is one of the most information-dense charts in the notebook. The ordered x-axis enforces the natural compensation ladder, and the Blues palette encodes seniority visually. The interquartile ranges show at a glance that director and executive levels have a substantially wider salary spread than entry or associate levels, which a mean-only bar chart would hide.
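
The spread that the boxplot shows can also be checked numerically. The following is a minimal sketch, not code from the notebook: the toy figures are invented for illustration, and the helper name `spread_by_level` is hypothetical. Only the column names `formatted_experience_level` and `salary_annual` come from the sample above.

```python
import pandas as pd

# Hypothetical toy data; the real notebook would pass the Phase 2 salary frame instead.
toy = pd.DataFrame({
    "formatted_experience_level": ["entry level"] * 4 + ["director"] * 4,
    "salary_annual": [50_000, 55_000, 60_000, 65_000,
                      90_000, 130_000, 170_000, 250_000],
})


def spread_by_level(df, level_col="formatted_experience_level",
                    value_col="salary_annual"):
    """Return mean, median, and interquartile range of value_col per level."""
    grouped = df.groupby(level_col)[value_col]
    q1 = grouped.quantile(0.25)  # lower edge of each box
    q3 = grouped.quantile(0.75)  # upper edge of each box
    return pd.DataFrame({
        "mean": grouped.mean(),
        "median": grouped.median(),
        "iqr": q3 - q1,
    })


summary = spread_by_level(toy)
print(summary)
```

A table of means alone would show that directors earn more; the `iqr` column is what makes the difference in spread visible.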

Reskilling ROI Analysis (Viz 11)

The final visualization is unique to Phase 4 and serves as the bridge between statistical findings and strategic recommendations. It models the potential annual salary uplift for a junior professional (entry-level, assumed starting salary at the entry-level median) who transitions into each major data role after a reskilling investment. The analysis accounts for:
  • Current median salary at entry level as the baseline
  • Median salary at mid-senior level for each target role as the destination
  • Skill gap complexity (estimated as the number of top-10 required skills not typically held by general IT graduates) as a proxy for reskilling cost
Key finding: Data Engineering and Machine Learning Engineering roles offer the highest absolute salary uplift for junior-to-mid transitions. Data Analysis roles offer a lower ceiling but a smaller skill gap, meaning faster time-to-uplift. Data Science roles sit between the two on both dimensions.
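
The three inputs listed above can be combined as in the sketch below. This is an illustration under stated assumptions, not the notebook's implementation: every salary and skill-gap figure is a made-up placeholder, the function `reskilling_roi` is hypothetical, and the `uplift_per_skill` ratio is just one possible way to relate uplift to reskilling effort.

```python
import pandas as pd

# Hypothetical inputs: none of these figures come from the HRIA notebook.
entry_median = 60_000  # assumed entry-level median salary (the baseline)
targets = pd.DataFrame({
    "role": ["Data Analyst", "Data Scientist", "Data Engineer"],
    "mid_senior_median": [85_000, 110_000, 125_000],  # assumed destination medians
    "skill_gap": [2, 4, 5],  # assumed number of top-10 skills still to acquire
})


def reskilling_roi(df, baseline):
    """Add absolute uplift, percentage uplift, and uplift per missing skill."""
    out = df.copy()
    out["uplift"] = out["mid_senior_median"] - baseline
    out["uplift_pct"] = out["uplift"] / baseline * 100
    # Assumes skill_gap > 0 for every role; a zero gap would need special handling.
    out["uplift_per_skill"] = out["uplift"] / out["skill_gap"]
    return out.sort_values("uplift", ascending=False).reset_index(drop=True)


roi = reskilling_roi(targets, entry_median)
print(roi[["role", "uplift", "uplift_pct", "skill_gap", "uplift_per_skill"]])
```

Sorting by `uplift` ranks roles by absolute gain, while `skill_gap` and `uplift_per_skill` indicate how much reskilling effort stands behind that gain.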

Key Visual Findings

  • Salary distribution is approximately normal (skew = 0.442, kurtosis = −0.248), which supports the use of parametric tests in Phase 3. Within the IQR-cleaned range, the lognormal fit (Viz 1.1) is marginally worse than the normal fit.
  • Director and executive levels show the highest median salary but also the highest variance. Their boxplot whiskers extend much further than those of lower levels, reflecting the wide range of company sizes and industries that carry these titles.
  • Tech skills dominate demand across all data roles: IT infrastructure skills and DATA-specific skills (Python, SQL, ML frameworks) account for the top positions in both the global ranking and the Spain-specific ranking shown in Viz 3, though cloud platform tools (particularly Azure) rank higher in the Spanish segment.
  • Views and applications are positively correlated (Viz 5), but the relationship is noisy — some postings accumulate views without applications, likely reflecting job-seeker interest without perceived fit or accessibility.
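
The shape statistics and the normal versus lognormal comparison in the first finding can be reproduced along the following lines. This is a sketch under assumptions: it uses SciPy, which is not in the library table above, it assumes the reported kurtosis is excess kurtosis, and it compares fits by log-likelihood, which may differ from the criterion used in the notebook. The function name `describe_and_compare` is hypothetical.

```python
import numpy as np
from scipy import stats


def describe_and_compare(salaries):
    """Return skew, excess kurtosis, and log-likelihoods of two fitted models."""
    x = np.asarray(salaries, dtype=float)
    x = x[x > 0]  # the lognormal model is only defined for positive values

    mu, sigma = stats.norm.fit(x)                     # maximum-likelihood normal fit
    shape, loc, scale = stats.lognorm.fit(x, floc=0)  # lognormal fit, location fixed at 0

    return {
        "skew": float(stats.skew(x)),
        "excess_kurtosis": float(stats.kurtosis(x)),  # Fisher definition: normal = 0
        "loglik_normal": float(stats.norm.logpdf(x, mu, sigma).sum()),
        "loglik_lognormal": float(stats.lognorm.logpdf(x, shape, loc, scale).sum()),
    }


# Synthetic salaries for illustration only; not HRIA data.
rng = np.random.default_rng(0)
sample = rng.normal(loc=100_000, scale=20_000, size=5_000)
result = describe_and_compare(sample)
print(result)
```

A higher log-likelihood indicates the better-fitting model on the same data. Both models here have two free parameters, so no complexity penalty is needed for the comparison.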

Final Recommendations Summary

Based on all four phases of the HRIA analysis, the following strategic priorities are recommended for DataTalent Solutions S.L.:
  1. Focus recruitment on mid-senior data engineers and ML engineers — these roles show the strongest combination of market demand and salary premium within the data talent landscape.
  2. Invest in reskilling programs targeting junior → data role transitions. The ROI analysis (Viz 11) quantifies the salary uplift, which makes the business case for internal training programs straightforward to present to stakeholders.
  3. Adjust all salary benchmarks for the Spanish market — the LinkedIn dataset is US-centric (> 95 % US postings). Spanish market salaries for equivalent data roles are typically 40–60 % of the US figures shown here; applying US benchmarks directly would result in uncompetitive offers.
Full methodology, supporting statistics, and expanded commentary for each recommendation are available on the Recommendations page.
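
The benchmark adjustment in recommendation 3 is simple arithmetic, sketched below. The helper `spain_adjusted_range` is hypothetical, the 40 and 60 percent defaults are taken from the text above, and the example salary is invented. The result stays in the currency of the input, so any USD to EUR conversion would be a separate step that is not modeled here.

```python
def spain_adjusted_range(us_salary, low_pct=40, high_pct=60):
    """Scale a US salary benchmark to the 40-60 % band cited for Spain.

    Returns a (low, high) tuple in the same currency as us_salary.
    """
    if us_salary < 0:
        raise ValueError("us_salary must be non-negative")
    if not 0 <= low_pct <= high_pct:
        raise ValueError("expected 0 <= low_pct <= high_pct")
    return us_salary * low_pct / 100, us_salary * high_pct / 100


# Hypothetical US benchmark, for illustration only.
low, high = spain_adjusted_range(100_000)
print(f"Adjusted range: {low:,.0f} to {high:,.0f}")
```

Working with a range rather than a single midpoint keeps the uncertainty of the 40 to 60 percent estimate visible in any offer calculation.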

Key Insights

Consolidated findings from all four phases — salary patterns, skill demand, and structural observations about the LinkedIn data landscape.

Recommendations

Detailed strategic recommendations for DataTalent Solutions S.L. based on the full four-phase HRIA analysis.
