HRIA is a structured, reproducible exploratory data analysis project built for DataTalent Solutions S.L., an HR consultancy specializing in tech and data-role recruitment. Across five Jupyter notebooks, the project ingests 11 interrelated CSV files from the LinkedIn Job Postings dataset, merges them into a single master dataset, and delivers statistical analysis, bias detection, and visualization, ready to inform talent strategy and fair hiring decisions.
Introduction
Understand the project goals, methodology, and the business questions it answers.
Quickstart
Clone the repo, install dependencies, and run your first notebook in minutes.
Dataset Overview
Explore the 11-file LinkedIn dataset structure and its 123,849 job postings.
Bias Analysis
Discover the 8 structural biases detected in the data and their business impact.
What HRIA Does
HRIA answers five core business questions for tech-focused HR teams:
Explore the raw data
Phase 1 loads all 11 CSV files, profiles their shape, missing-value patterns, and relational structure — establishing a clear picture of data quality before any transformation.
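The profiling step described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the helper `profile_csv` is hypothetical, the column names (`job_id`, `company_id`, `max_salary`) are assumed for the example, and the rows are made up.

```python
import csv
import io


def profile_csv(handle):
    """Return (row_count, column_names, missing_counts) for one CSV file."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    rows = 0
    for row in reader:
        rows += 1
        for name in columns:
            # Treat empty or whitespace-only cells as missing.
            if not (row.get(name) or "").strip():
                missing[name] += 1
    return rows, columns, missing


# Tiny in-memory stand-in for one source file (assumed columns, made-up values).
sample = io.StringIO(
    "job_id,company_id,max_salary\n"
    "1,10,90000\n"
    "2,,\n"
    "3,12,\n"
)
rows, columns, missing = profile_csv(sample)
print(rows, columns, missing)
```

Running the same helper over each of the 11 files, with `open(path, newline="")` in place of the in-memory sample, would give a per-file view of shape and missing-value counts before any transformation.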
Clean and merge
Phase 2 joins all tables into one master DataFrame, normalizes salaries to annual USD, filters to data-related roles, and removes outliers — producing three publication-ready CSVs.
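As a rough sketch of the salary normalization and outlier removal in this phase: the conversion factors below (a 40-hour week, 52 paid weeks), the pay-period labels, and the 1.5 × IQR rule are all assumptions chosen for illustration. The source does not state which factors or which outlier method the notebooks use, and the helper names are hypothetical.

```python
import statistics

# Assumed conversion factors (not taken from the project): a 40-hour week
# and 52 paid weeks per year.
PERIODS_PER_YEAR = {
    "HOURLY": 40 * 52,
    "WEEKLY": 52,
    "BIWEEKLY": 26,
    "MONTHLY": 12,
    "YEARLY": 1,
}


def to_annual(amount, pay_period):
    """Convert a salary quoted per pay period to an annual figure."""
    return amount * PERIODS_PER_YEAR[pay_period.upper()]


def drop_iqr_outliers(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (an assumed rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Made-up postings: (amount, pay period), all assumed to be in USD already.
raw = [
    (50, "HOURLY"), (10000, "MONTHLY"), (85000, "YEARLY"), (95000, "YEARLY"),
    (110000, "YEARLY"), (130000, "YEARLY"), (2500000, "YEARLY"),
]
annual = [to_annual(amount, period) for amount, period in raw]
clean = drop_iqr_outliers(annual)
print(annual)
print(clean)
```

Normalizing before filtering matters here: an hourly rate of 50 would look like an extreme low outlier next to yearly figures if the two steps were swapped.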
Analyze statistically
Phase 3 computes descriptive statistics, correlation matrices, and groupby summaries by experience level, contract type, industry, and skill — and formally identifies 8 data biases.
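The grouped summaries mentioned above amount to splitting rows by a category and computing statistics per group. A dependency-free sketch, with made-up rows and assumed experience-level labels, might look like this:

```python
from collections import defaultdict
from statistics import mean, median

# Made-up rows: (experience level, annual salary in USD).
rows = [
    ("Entry level", 70000),
    ("Entry level", 80000),
    ("Mid-Senior level", 120000),
    ("Mid-Senior level", 140000),
    ("Mid-Senior level", 160000),
]

# Split salaries by experience level.
groups = defaultdict(list)
for level, salary in rows:
    groups[level].append(salary)

# Compute descriptive statistics for each group.
summary = {
    level: {"count": len(vals), "mean": mean(vals), "median": median(vals)}
    for level, vals in groups.items()
}
for level, stats in summary.items():
    print(level, stats)
```

The same split-then-summarize pattern applies to contract type, industry, or skill by changing the grouping key.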
Analysis at a Glance
| Metric | Value |
|---|---|
| Total job postings | 123,849 |
| Data-role postings | 19,725 |
| Postings with clean salary | 6,108 |
| Median annual salary (data roles) | ~$124,800 |
| Skill categories covered | 35 |
| Industries represented | 422 |
| Biases formally detected | 8 |
| Visualizations produced | 12+ |
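The counts in the table imply how much data survives each filtering step. The short calculation below uses only the figures above and assumes the 6,108 postings with a clean salary are a subset of the 19,725 data-role postings, which the table does not state explicitly.

```python
total_postings = 123_849
data_role_postings = 19_725
clean_salary_postings = 6_108  # assumed to be a subset of data-role postings

data_role_share = data_role_postings / total_postings
clean_salary_share = clean_salary_postings / data_role_postings

print(f"Data roles: {data_role_share:.1%} of all postings")              # 15.9%
print(f"Clean salary: {clean_salary_share:.1%} of data-role postings")   # 31.0%
```

Under that assumption, salary findings rest on roughly a third of the data-role postings, which is worth keeping in mind when reading the median figure.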
Phase 1: Exploration
Load and profile all 11 source files.
Phase 2: Cleaning
Merge, normalize, and filter the dataset.
Phase 3: Statistics
Descriptive stats, correlations, and bias detection.
Phase 4: Visualization
12+ charts and final recommendations.
Key Insights
Top findings across all four phases.
Recommendations
Actionable strategy for DataTalent Solutions.