One of the most important questions in HR analytics is whether compensation and opportunity are distributed equitably across gender. It is also one of the questions this dataset is structurally least equipped to answer. LinkedIn job postings contain no gender field: not for the hiring company, not for the role, and not for any applicant. Gender is entirely absent as an explicit variable. This absence is not a minor gap; it is the defining limitation for any fairness or equity analysis built on this data. Understanding why gender is missing, what imperfect proxies exist, and why those proxies carry serious ethical risks is essential before any analysis touches on occupational equity.
Why Gender Is Absent
LinkedIn does not capture or expose applicant gender in its job postings data. This is partly a product of:

- Privacy regulation: EU GDPR and US equal employment opportunity law restrict the collection and use of protected attributes in hiring contexts
- Platform design: LinkedIn’s job posting product is designed for employers to describe roles, not to capture demographic data about applicants
- API and data licensing constraints: even where LinkedIn collects demographic signals internally, this information is not exposed in dataset exports or through the API that produced this dataset
The Proxy Approach
In the absence of direct gender data, some analysts use occupational gender coding as a proxy: mapping job titles to historically male-coded or female-coded occupations based on workforce composition data from labor surveys. This approach, explored in the Phase 3 and Phase 4 analyses of HRIA, works as follows:

Female-coded role proxies (occupations with historically high female workforce representation):

- HR Coordinator, HR Generalist, HR Manager
- Administrative Assistant, Executive Assistant
- Recruiter, Talent Acquisition Specialist
- Marketing Coordinator, Content Writer
- Nurse, Healthcare Coordinator

Male-coded role proxies (occupations with historically high male workforce representation):

- Software Engineer, Data Scientist, Data Engineer
- DevOps Engineer, Machine Learning Engineer
- Financial Analyst, Investment Banker
- Operations Manager, Supply Chain Manager
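
The title-to-proxy mapping described above can be sketched roughly as follows. This is a minimal illustration, not the HRIA implementation: the keyword lists simply reuse the example titles from this page, the second group of titles (Software Engineer through Supply Chain Manager) is assumed to be the male-coded group, and the function name `occupational_proxy` and the label strings are hypothetical. A real coding scheme would be derived from labor-survey workforce composition data rather than a hand-written list.

```python
from typing import Optional

# Hypothetical keyword lists built from the example titles on this page.
# They describe occupations, never the gender of any individual.
FEMALE_CODED = (
    "hr coordinator", "hr generalist", "hr manager",
    "administrative assistant", "executive assistant",
    "recruiter", "talent acquisition specialist",
    "marketing coordinator", "content writer",
    "nurse", "healthcare coordinator",
)
MALE_CODED = (
    "software engineer", "data scientist", "data engineer",
    "devops engineer", "machine learning engineer",
    "financial analyst", "investment banker",
    "operations manager", "supply chain manager",
)


def occupational_proxy(title: str) -> Optional[str]:
    """Return an occupation-level proxy label, or None if the title is uncoded."""
    # Lowercase and collapse repeated whitespace before matching.
    normalized = " ".join(title.lower().split())
    is_female_coded = any(keyword in normalized for keyword in FEMALE_CODED)
    is_male_coded = any(keyword in normalized for keyword in MALE_CODED)
    # Titles matching both lists or neither list are left unclassified
    # rather than forced into a category.
    if is_female_coded == is_male_coded:
        return None
    return "female_coded_occupation" if is_female_coded else "male_coded_occupation"
```

Returning `None` for ambiguous or unknown titles is a deliberate choice in this sketch: it keeps uncoded postings visible as a separate group instead of silently assigning them a label.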
Limitations of Proxy Inference
The proxy approach is analytically useful only for descriptive occupational analysis and carries critical limitations:

- Proxies perpetuate stereotypes: labeling “HR Coordinator” as female-coded encodes existing occupational segregation as a fact of nature rather than a historical artifact of discrimination. Using this proxy in a model amplifies rather than corrects the underlying inequality.
- Proxies do not capture individual gender: a job title tells you the historical gender composition of an occupation, not the gender of the person who holds or applies for a specific role. An individual male HR Coordinator is misclassified; a non-binary Software Engineer is erased entirely.
- Occupational gender coding shifts over time: the proportion of women in data science has grown significantly in recent years. Proxies calibrated on historical data become stale and will produce incorrect classifications for recently integrated occupations.
- Non-binary and gender-diverse identities are invisible: even a perfect proxy system that correctly classified male and female occupations would entirely fail to represent the growing share of the workforce that identifies outside the binary.
- Interaction effects are lost: salary disparities at the intersection of gender and race, gender and disability, or gender and immigration status cannot be detected from title-based gender proxies alone.
Ethical Implications for Salary Gap Analysis
A direct gender pay gap analysis (comparing salary distributions for male-identified vs female-identified workers) cannot be performed from this dataset alone. Performing it using title-based proxies would produce a figure that is:

- Measuring occupational segregation (which roles pay differently), not individual-level pay discrimination
- Potentially actionable as exploratory evidence but not as a compliance or auditing conclusion
- Liable to misinterpretation if presented to clients as a “gender pay gap” figure
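
One way to reduce the risk of misinterpretation is to attach the interpretation to the figure itself. The sketch below assumes each posting is a dict with hypothetical keys `proxy_label` (an occupation-level label, or `None` if uncoded) and `salary` (a number, or `None` if not posted); neither name comes from the dataset schema. It reports a median posted salary per proxy group and labels the result as an occupational comparison, not as a gender pay gap.

```python
from collections import defaultdict
from statistics import median


def occupational_salary_summary(postings):
    """Median posted salary per occupational proxy group.

    Postings with a missing proxy label or a missing salary are skipped.
    """
    salaries_by_group = defaultdict(list)
    for posting in postings:
        label = posting.get("proxy_label")
        salary = posting.get("salary")
        if label is not None and salary is not None:
            salaries_by_group[label].append(salary)

    return {
        # The labels travel with the numbers so the figure cannot be
        # detached from what it actually measures.
        "metric": "median posted salary by occupational proxy group",
        "interpretation": "occupational segregation, not individual-level pay discrimination",
        "medians": {g: median(v) for g, v in salaries_by_group.items()},
        "n_postings": {g: len(v) for g, v in salaries_by_group.items()},
    }
```

The per-group counts are returned alongside the medians so that groups with very few postings can be flagged before the summary is shared.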
Recommended Approach
| Use Case | Recommended Data Source |
|---|---|
| Descriptive occupational gender analysis | LinkedIn data + occupational gender coding (proxy only, clearly labeled) |
| Gender pay gap quantification | INE Encuesta de Estructura Salarial, Eurostat |
| Pay equity audit for a specific company | Internal HR data with self-reported gender, paired with role and compensation data |
| Trend analysis (gender in tech) | LinkedIn + Stack Overflow Developer Survey + Eurostat STEM data |