Data Bias Overview: 8 Structural Biases in HRIA Data

Bias analysis is the practice of systematically identifying, naming, and quantifying the ways a dataset deviates from a true representation of the population it is meant to describe. In HR analytics, bias is more than a statistical inconvenience: it directly determines whether salary benchmarks are fair, whether skill demand signals are trustworthy, and whether hiring recommendation models reinforce or correct existing inequalities. HRIA formally identifies 8 structural biases in the 124K LinkedIn Job Postings dataset, each with a documented mechanism, a quantified impact, and a recommended mitigation strategy. No analysis in this project should be interpreted without understanding these limitations.

The 8 Structural Biases

| # | Bias Name | Type | Affected Column(s) | Impact Level |
|---|-----------|------|--------------------|--------------|
| 1 | MNAR Salary | Missing Not At Random | `salary_annual`, `min_salary`, `max_salary` | 🔴 Critical |
| 2 | Geographic | Representation | `location`, `comp_country` | 🟠 High |
| 3 | Selection | Sampling | All columns | 🟠 High |
| 4 | Gender Proxy | Undisclosed Attribute | Role title (proxy) | 🟡 Medium |
| 5 | Temporal | Time-based | `listed_time`, `original_listed_time` | 🟡 Medium |
| 6 | Skill Aggregation | Granularity | `job_skills_list`, `skill_abr` | 🟡 Medium |
| 7 | Survivorship | Sampling | `postings.csv` (active only) | 🟡 Medium |
| 8 | Applies Undercounting | Measurement | `applies` | 🟢 Low–Medium |

Why This Matters

Any predictive model trained on this dataset without explicit bias correction will produce results that are misleading at best and actively harmful at worst. Concretely:

Salary prediction models trained on the full dataset will be biased toward the 24% of postings that disclose pay, which typically come from larger, higher-paying firms, so salary estimates will be systematically overstated.
Skill demand rankings aggregated at the global level will reflect US market priorities rather than Spanish labor market realities, directly misaligning hiring recommendations for DataTalent Solutions S.L.’s clients.
Application volume metrics built on the `applies` column will systematically undercount demand for roles that use external application links, producing a distorted picture of which roles are competitive.
Hiring fairness models that use role titles as implicit proxies for gender will encode and amplify historical occupational segregation patterns.
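
The first point can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not an analysis of the actual dataset: the log-normal salary shape and the 40% / 8% disclosure probabilities are assumptions chosen only so that overall disclosure lands near 24% and higher-paying postings disclose more often. The function name `simulate_mnar_overstatement` is hypothetical.

```python
import random
import statistics

def simulate_mnar_overstatement(n_postings=20_000, seed=0):
    """Simulate postings whose chance of disclosing salary rises with pay."""
    rng = random.Random(seed)
    # Synthetic "true" annual salaries; the log-normal shape is an assumption.
    true_salaries = [rng.lognormvariate(11.0, 0.4) for _ in range(n_postings)]
    median_salary = statistics.median(true_salaries)

    disclosed = []
    for salary in true_salaries:
        # Assumed mechanism: above-median postings disclose 40% of the time,
        # below-median postings 8%, which gives about 24% disclosure overall.
        p_disclose = 0.40 if salary > median_salary else 0.08
        if rng.random() < p_disclose:
            disclosed.append(salary)

    return {
        "true_mean": statistics.fmean(true_salaries),
        "observed_mean": statistics.fmean(disclosed),
        "disclosure_rate": len(disclosed) / n_postings,
    }

result = simulate_mnar_overstatement()
print(result)
```

Under these assumptions the mean of the disclosed salaries sits above the mean of all salaries, which is the overstatement described above. The size of the gap depends entirely on the assumed disclosure mechanism, so the numbers printed here say nothing about the real magnitude in the LinkedIn data.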

The goal of this bias documentation is not to disqualify the dataset, which remains rich and analytically valuable, but to ensure every downstream conclusion is scoped to the population it actually represents, not the population it is assumed to represent.

Models trained on this dataset without bias mitigation may perpetuate geographic and gender-based salary inequalities. Salary benchmarks derived from this data reflect a US-dominant, high-disclosure subset of the labor market and should not be applied directly to Spanish compensation analysis without explicit geographic filtering and disclosure-rate adjustment.
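
As a rough illustration of geographic filtering, the sketch below scopes a set of postings to one country and reports the salary disclosure rate per country. The column names `comp_country` and `salary_annual` come from the table above; the toy rows, the `"US"` / `"ES"` country codes, and the helper names `disclosure_by_country` and `scoped_rows` are assumptions made for this example. It only measures the disclosure rate and does not perform any adjustment itself.

```python
from collections import defaultdict

# Toy rows for illustration only; the values are made up, not drawn from the dataset.
postings = [
    {"comp_country": "US", "salary_annual": 120000.0},
    {"comp_country": "US", "salary_annual": None},
    {"comp_country": "US", "salary_annual": 95000.0},
    {"comp_country": "US", "salary_annual": None},
    {"comp_country": "ES", "salary_annual": None},
    {"comp_country": "ES", "salary_annual": 38000.0},
    {"comp_country": "ES", "salary_annual": None},
]

def disclosure_by_country(rows):
    """Return {country: (n_postings, n_with_salary, disclosure_rate)}."""
    totals = defaultdict(int)
    disclosed = defaultdict(int)
    for row in rows:
        country = row["comp_country"]
        totals[country] += 1
        if row["salary_annual"] is not None:
            disclosed[country] += 1
    return {c: (totals[c], disclosed[c], disclosed[c] / totals[c]) for c in totals}

def scoped_rows(rows, country):
    """Keep only postings for one country so benchmarks stay in scope."""
    return [row for row in rows if row["comp_country"] == country]

summary = disclosure_by_country(postings)
spain = scoped_rows(postings, "ES")
print(summary)
print(len(spain), "postings scoped to ES")
```

Reporting the posting count and disclosure rate next to any country-level benchmark makes it visible how few salaried postings the figure rests on.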

Individual Bias Pages

MNAR Salary Bias

76% of postings omit salary data, and the omission is not random: less competitive employers strategically leave pay out.

Geographic Bias

The dataset is overwhelmingly US-centric. Spanish salary and skill benchmarks require significant adjustment.

Selection Bias

LinkedIn captures only publicly posted roles, excluding referrals, internal promotions, and agency placements.

Gender Proxy Bias

No gender field exists. Role-title proxies are an imperfect and ethically constrained substitute.

Temporal Bias

The dataset’s time window shapes which industries and salary levels appear most common.

Skill Aggregation Bias

35 broad categories flatten Python vs SQL, PyTorch vs TensorFlow, and every nuance in between.

Survivorship Bias

Only active postings are captured. Quickly filled and never-posted jobs are invisible.

Applies Undercounting

The `applies` column counts only Easy Apply submissions, so jobs with external application links record zero.
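
One way to keep those zeros from distorting averages is to treat `applies` as unobserved for external-link postings instead of as a true zero. The sketch below assumes a hypothetical boolean flag, `uses_external_link`, that marks such postings; the source does not document such a column, and the rows are invented for illustration.

```python
import statistics

# Toy rows; `uses_external_link` is a hypothetical flag, not a documented column.
postings = [
    {"title": "Data Analyst", "applies": 40, "uses_external_link": False},
    {"title": "ML Engineer", "applies": 0, "uses_external_link": True},
    {"title": "HR Specialist", "applies": 12, "uses_external_link": False},
    {"title": "Data Engineer", "applies": 0, "uses_external_link": True},
]

def mask_unobserved_applies(rows):
    """Replace `applies` with None where submissions happen off-platform."""
    masked = []
    for row in rows:
        cleaned = dict(row)
        if row["uses_external_link"]:
            cleaned["applies"] = None  # unobserved, not a true zero
        masked.append(cleaned)
    return masked

def mean_observed_applies(rows):
    """Average `applies` over rows where the value was actually observed."""
    observed = [r["applies"] for r in rows if r["applies"] is not None]
    return statistics.fmean(observed) if observed else None

naive_mean = statistics.fmean([r["applies"] for r in postings])
masked_mean = mean_observed_applies(mask_unobserved_applies(postings))
print(naive_mean, masked_mean)
```

On these toy rows the naive mean is half the masked mean because two of the four postings contribute zeros that were never real measurements. Masking does not recover the missing counts; it only stops them from being read as low demand.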
