Bias analysis is the practice of systematically identifying, naming, and quantifying the ways a dataset deviates from a true representation of the population it is meant to describe. In HR analytics, bias is not merely a statistical inconvenience: it directly determines whether salary benchmarks are fair, whether skill demand signals are trustworthy, and whether hiring recommendation models reinforce or correct existing inequalities. HRIA formally identifies 8 structural biases present in the 124K LinkedIn Job Postings dataset, each with a documented mechanism, a quantified impact, and a recommended mitigation strategy. No analysis in this project should be interpreted without understanding these limitations.
The 8 Structural Biases
| # | Bias Name | Type | Affected Column(s) | Impact Level |
|---|---|---|---|---|
| 1 | MNAR Salary | Missing Not At Random | salary_annual, min_salary, max_salary | 🔴 Critical |
| 2 | Geographic | Representation | location, comp_country | 🟠 High |
| 3 | Selection | Sampling | All columns | 🟠 High |
| 4 | Gender Proxy | Undisclosed Attribute | Role title (proxy) | 🟡 Medium |
| 5 | Temporal | Time-based | listed_time, original_listed_time | 🟡 Medium |
| 6 | Skill Aggregation | Granularity | job_skills_list, skill_abr | 🟡 Medium |
| 7 | Survivorship | Sampling | postings.csv (active only) | 🟡 Medium |
| 8 | Applies Undercounting | Measurement | applies | 🟢 Low–Medium |
Why This Matters
Any predictive model trained on this dataset without explicit bias correction will produce results that are misleading at best and actively harmful at worst. Concretely:
- Salary prediction models trained on the full dataset will be biased toward the 24% of postings that disclose pay (typically from larger, higher-paying firms), causing salary estimates to be systematically overstated.
- Skill demand rankings aggregated at the global level will reflect US market priorities, not Spanish labor market realities, directly misaligning hiring recommendations for DataTalent Solutions S.L.’s clients.
- Application volume metrics built on the `applies` column will systematically undercount demand for roles that use external application links, producing a distorted picture of which roles are competitive.
- Hiring fairness models that use role titles as implicit proxies for gender will encode and amplify historical occupational segregation patterns.
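
The salary disclosure point above can be checked before any modelling by comparing disclosure rates across employer groups. The following is a minimal sketch, not HRIA's own method: it assumes postings are available as a list of dictionaries, that `salary_annual` is the salary column named in the table above, and that a grouping column exists. The `company_size` key and all row values here are hypothetical and invented for illustration.

```python
from collections import defaultdict


def disclosure_rate_by_group(rows, group_key, salary_key="salary_annual"):
    """Return {group: share of postings with a non-missing salary}."""
    totals = defaultdict(int)
    disclosed = defaultdict(int)
    for row in rows:
        group = row.get(group_key) or "unknown"
        totals[group] += 1
        # None and empty strings are both treated as "salary not disclosed".
        if row.get(salary_key) not in (None, ""):
            disclosed[group] += 1
    return {group: disclosed[group] / totals[group] for group in totals}


# Toy rows, invented for illustration only (not real dataset values).
# `company_size` is a hypothetical grouping column.
postings = [
    {"company_size": "large", "salary_annual": 120000},
    {"company_size": "large", "salary_annual": 95000},
    {"company_size": "large", "salary_annual": None},
    {"company_size": "small", "salary_annual": None},
    {"company_size": "small", "salary_annual": None},
    {"company_size": "small", "salary_annual": 60000},
    {"company_size": "small", "salary_annual": ""},
]

rates = disclosure_rate_by_group(postings, "company_size")
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} of postings disclose salary")
```

A gap in disclosure rates between groups shows that missingness is not completely random with respect to observed attributes. It cannot by itself confirm MNAR, because that depends on the undisclosed salary values, which are by definition unobserved.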
Individual Bias Pages
MNAR Salary Bias
76% of postings hide salary data — and it’s not random. Less competitive employers strategically omit pay.
Geographic Bias
The dataset is overwhelmingly US-centric. Spanish salary and skill benchmarks require significant adjustment.
Selection Bias
LinkedIn captures only publicly posted roles, excluding referrals, internal promotions, and agency placements.
Gender Proxy Bias
No gender field exists. Role-title proxies are an imperfect and ethically constrained substitute.
Temporal Bias
The dataset’s time window shapes which industries and salary levels appear most common.
Skill Aggregation Bias
35 broad categories flatten Python vs SQL, PyTorch vs TensorFlow, and every nuance in between.
Survivorship Bias
Only active postings are captured. Quickly filled and never-posted jobs are invisible.
Applies Undercounting
The applies column only counts Easy Apply submissions — external-link jobs record zero.
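
One way to keep those zeros from dragging down application metrics is to treat them as unknown rather than as true zero demand. The sketch below works under stated assumptions: it supposes a column that distinguishes Easy Apply postings from external-link postings, here called `application_type` with the values `OffsiteApply` and `SimpleOnsiteApply`. Those names are assumptions for illustration and should be checked against the actual schema; the rows are invented.

```python
def mask_untracked_applies(rows, type_key="application_type",
                           external_value="OffsiteApply"):
    """Copy rows, replacing `applies` with None on external-link postings."""
    cleaned = []
    for row in rows:
        row = dict(row)  # shallow copy so the caller's data is not mutated
        if row.get(type_key) == external_value:
            row["applies"] = None  # unknown, not zero
        cleaned.append(row)
    return cleaned


def mean_applies(rows):
    """Mean of `applies` over rows where the value is known."""
    values = [r["applies"] for r in rows if r.get("applies") is not None]
    return sum(values) / len(values) if values else None


# Toy rows, invented for illustration only (not real dataset values).
postings = [
    {"title": "Data Analyst", "application_type": "SimpleOnsiteApply", "applies": 40},
    {"title": "ML Engineer", "application_type": "OffsiteApply", "applies": 0},
    {"title": "HR Manager", "application_type": "SimpleOnsiteApply", "applies": 20},
]

naive = mean_applies(postings)
masked = mean_applies(mask_untracked_applies(postings))
print(f"naive mean: {naive}, mean over tracked postings only: {masked}")
```

Masking does not recover the missing application counts. It only restricts the metric to postings where `applies` is actually measured, so any comparison between Easy Apply and external-link roles still needs a separate data source.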