HRIA is a structured, reproducible exploratory data analysis project built for DataTalent Solutions S.L., an HR consultancy specializing in tech and data-role recruitment. Across five Jupyter notebooks, the project ingests 11 interrelated CSV files from the LinkedIn Job Postings dataset, merges them into a single master dataset, and delivers statistical analysis, bias detection, and visualization to inform talent strategy and fair hiring decisions.

Introduction

Understand the project goals, methodology, and the business questions it answers.

Quickstart

Clone the repo, install dependencies, and run your first notebook in minutes.

Dataset Overview

Explore the 11-file LinkedIn dataset structure and its 123,849 job postings.

Bias Analysis

Discover the 8 structural biases detected in the data and their business impact.

What HRIA Does

HRIA addresses five core business questions for tech-focused HR teams by working through four phases:
1. Explore the raw data

Phase 1 loads all 11 CSV files and profiles their shape, missing-value patterns, and relational structure, establishing a clear picture of data quality before any transformation.

2. Clean and merge

Phase 2 joins all tables into one master DataFrame, normalizes salaries to annual USD, filters to data-related roles, and removes outliers, producing three publication-ready CSVs.

3. Analyze statistically

Phase 3 computes descriptive statistics, correlation matrices, and group-by summaries by experience level, contract type, industry, and skill, and formally identifies 8 data biases.

4. Visualize and conclude

Phase 4 produces 12+ charts (salary distributions, skill comparisons, heatmaps, and reskilling ROI) with written interpretation and actionable recommendations.
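To make the Phase 2 cleaning step more concrete, the following is a minimal sketch of salary annualization followed by outlier removal. It is not the project's actual code. The pay-period labels, the multipliers in `PERIODS_PER_YEAR`, the helper names `to_annual` and `iqr_filter`, and the sample postings are all assumptions made for illustration. The outlier rule shown is Tukey's 1.5 x IQR rule, which may differ from the method the notebooks use, and currency conversion to USD is left out.

```python
from statistics import quantiles

# Assumed multipliers for converting a pay period to a yearly amount.
# 2,080 = 40 hours x 52 weeks; the project may use different factors.
PERIODS_PER_YEAR = {
    "HOURLY": 2080,
    "WEEKLY": 52,
    "BIWEEKLY": 26,
    "MONTHLY": 12,
    "YEARLY": 1,
}


def to_annual(amount, pay_period):
    """Return the yearly equivalent of `amount`, or None if it cannot be converted."""
    factor = PERIODS_PER_YEAR.get(str(pay_period).upper())
    if factor is None or amount is None:
        return None
    return amount * factor


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Hypothetical postings as (salary amount, pay period) pairs.
postings = [
    (60.0, "HOURLY"),
    (10_000.0, "MONTHLY"),
    (125_000.0, "YEARLY"),
    (130_000.0, "YEARLY"),
    (118_000.0, "YEARLY"),
    (5_000_000.0, "YEARLY"),  # implausible value that the filter should drop
]

annual = [to_annual(amount, period) for amount, period in postings]
clean = iqr_filter([a for a in annual if a is not None])
print(len(annual), len(clean))
```

Annualizing before filtering matters: an hourly rate of 60 and a yearly salary of 125,000 are only comparable once both are on the same scale, so an outlier rule applied to the raw amounts would flag the wrong rows.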

Analysis at a Glance

| Metric | Value |
| --- | --- |
| Total job postings | 123,849 |
| Data-role postings | 19,725 |
| Postings with clean salary | 6,108 |
| Median annual salary (data roles) | ~$124,800 |
| Skill categories covered | 35 |
| Industries represented | 422 |
| Biases formally detected | 8 |
| Visualizations produced | 12+ |
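The counts above also show how much of the dataset survives each filter. The short calculation below works through those shares. It assumes that the 6,108 postings with a clean salary are a subset of the 19,725 data-role postings, which the table does not state explicitly; the variable names are illustrative.

```python
# Counts taken from the table above.
total_postings = 123_849
data_role_postings = 19_725
clean_salary_postings = 6_108

# Share of all postings that are data roles.
data_role_share = data_role_postings / total_postings

# Share of data-role postings with a usable salary
# (assumes the clean-salary postings are all data roles).
clean_share_of_data_roles = clean_salary_postings / data_role_postings

# Share of the full dataset that feeds any salary analysis.
clean_share_of_total = clean_salary_postings / total_postings

print(f"Data roles among all postings:   {data_role_share:.1%}")            # 15.9%
print(f"Clean salaries among data roles: {clean_share_of_data_roles:.1%}")  # 31.0%
print(f"Clean salaries among all:        {clean_share_of_total:.1%}")       # 4.9%
```

Under that assumption, salary figures such as the median rest on roughly one in twenty of the original postings, which is worth keeping in mind when reading the bias analysis.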

Phase 1: Exploration

Load and profile all 11 source files.

Phase 2: Cleaning

Merge, normalize, and filter the dataset.

Phase 3: Statistics

Descriptive stats, correlations, and bias detection.

Phase 4: Visualization

12+ charts and final recommendations.

Key Insights

Top findings across all four phases.

Recommendations

Actionable strategy for DataTalent Solutions.
