Selection bias occurs when the process by which data enters a dataset is not random with respect to the population the dataset is meant to represent. In this case, the population of interest is all hiring activity in the labor markets covered by the dataset. The data collection process captures only one narrow slice of that activity: jobs that employers chose to post publicly on LinkedIn during the crawl window. This is not a representative sample of all hiring. It is a sample of a specific hiring behavior — public job advertising on one platform — and that behavior is systematically correlated with employer type, role type, and seniority level in ways that distort every downstream analysis.
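To make the mechanism concrete, here is a minimal sketch with a made-up population of six filled roles. The channel labels and salary figures are hypothetical and chosen only to illustrate the effect; they are not drawn from the dataset. Only publicly posted roles are visible to the crawl, so the observed average differs from the population average.

```python
from statistics import mean

# Hypothetical population of filled roles: (hiring channel, annual salary in EUR).
# Every value here is invented for illustration only.
population = [
    ("public_posting", 42_000),
    ("public_posting", 48_000),
    ("public_posting", 55_000),
    ("referral", 60_000),
    ("internal_promotion", 65_000),
    ("executive_search", 120_000),
]

# The crawl only observes roles that were advertised publicly.
observed = [salary for channel, salary in population if channel == "public_posting"]
all_salaries = [salary for _, salary in population]

population_mean = mean(all_salaries)   # average over all hiring activity
observed_mean = mean(observed)         # average over what the crawl can see
coverage = len(observed) / len(population)

print(f"coverage of hiring activity: {coverage:.0%}")
print(f"mean salary, all roles:      {population_mean:,.0f}")
print(f"mean salary, observed roles: {observed_mean:,.0f}")
```

The gap between the two means comes entirely from which roles were selected into the sample, not from any measurement error in the observed roles.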

What Is Excluded

The LinkedIn dataset is silent on a large share of actual labor market activity. Excluded hiring channels include:

Employee Referrals

Referrals are the single largest hiring channel in many technology companies. Roles filled through internal referral networks are frequently never posted publicly — they move from “open requisition” directly to “candidate referred” without a job ad. The LinkedIn dataset contains zero evidence of this hiring volume.

Internal Promotions and Transfers

When a company promotes a software engineer to senior engineer, or transfers a product manager between divisions, no job posting is created. Internal mobility — a major driver of career progression — is entirely absent from this dataset.

Staffing Agency Placements

Many companies fill contract, temporary, and even permanent roles through staffing agencies that maintain their own candidate pools. These roles may never appear on LinkedIn; the agency matches candidates from its database without posting publicly.

Direct-Apply Company Career Pages

Large employers (especially in regulated industries like healthcare, finance, and government) maintain proprietary applicant tracking systems and career portals. They may or may not cross-post those roles to LinkedIn. Roles posted only on a company’s own careers page are excluded from this dataset.

Executive Search

Senior leadership roles, board positions, and niche specialist roles are frequently filled through executive search firms (headhunters) working entirely outside public job boards. These represent some of the highest-salary, highest-impact hires and are entirely absent from the dataset.

Company Profile Bias

The types of employers who actively post on LinkedIn are not a cross-section of all employers:
  • Large technology companies are over-represented — they have dedicated talent acquisition teams, employer branding budgets, and LinkedIn Recruiter subscriptions
  • SMEs and micro-businesses are under-represented — smaller companies often rely on word-of-mouth, local networks, and direct referrals rather than platform-based sourcing
  • Startups in early stages frequently hire through founder networks and angel investor communities before they have LinkedIn Recruiter accounts
  • Public sector and NGO employers in Spain and the EU use official government employment portals (SEPE, EU Jobs) and are less consistently represented on LinkedIn
This means the dataset’s view of skill demand, salary ranges, and industry activity is disproportionately shaped by the hiring behavior of large, LinkedIn-active companies.
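One way to quantify this skew, sketched here under stated assumptions, is to compare the share of postings per company-size band in the dataset against a reference share taken from an external source such as official labor statistics. The size bands, both sets of shares, and the function name `representation_ratio` are hypothetical placeholders, not values or code from this project.

```python
# Hypothetical share of postings per company-size band in the crawled dataset.
sample_share = {"micro": 0.05, "small": 0.15, "medium": 0.25, "large": 0.55}

# Hypothetical reference share per band (placeholder numbers, not real statistics).
reference_share = {"micro": 0.30, "small": 0.25, "medium": 0.20, "large": 0.25}


def representation_ratio(sample, reference):
    """Return sample share divided by reference share for each band.

    A ratio above 1 means the band is over-represented in the sample,
    a ratio below 1 means it is under-represented.
    """
    return {band: sample[band] / reference[band] for band in reference}


ratios = representation_ratio(sample_share, reference_share)
for band, ratio in ratios.items():
    print(f"{band:>6}: {ratio:.2f}")
```

A ratio far from 1 for a given band is a signal that findings should not be generalized to that band without external data.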

Impact on Analysis

| Dimension | Estimated Impact |
| --- | --- |
| Share of market activity captured | ~50% or less of actual hiring |
| Company size representation | Skewed toward large-cap, LinkedIn-active employers |
| Role type representation | Skewed toward publicly advertised, individual contributor roles |
| Seniority representation | Mid-level roles over-represented; C-suite and entry-level under-represented |
| Skill demand signal | Reflects public job description language, not actual day-to-day role requirements |
Skill demand rankings derived from LinkedIn job descriptions reflect what companies write in public job ads — which is shaped by HR template culture, keyword optimization for LinkedIn’s algorithm, and employer branding goals. This is not the same as what employees actually do on the job.

Mitigation

The most honest mitigation for selection bias is scope discipline: every conclusion drawn from this dataset should be explicitly scoped to “publicly posted LinkedIn roles” rather than “the labor market” in general. In practice:
  • State the crawl time window alongside every finding, since the data covers only roles publicly posted on LinkedIn during that window
  • Do not extrapolate salary or skill demand findings to sectors or company types that are structurally under-represented on LinkedIn
  • Complement LinkedIn data with salary surveys (e.g., Hays Spain Salary Guide, Glassdoor), ATS aggregate data, or government labor statistics (INE, Eurostat) when making market-wide claims
  • Weight or stratify analyses by company size, when that field is available, to partially correct for large-company over-representation
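The weighting idea in the last point can be sketched as simple post-stratification: each posting gets a weight equal to its stratum's reference share divided by its stratum's share in the sample. The records, the field names `company_size` and `salary`, the reference shares, and both helper functions are assumptions made for illustration; they are not part of the dataset or its tooling.

```python
from collections import Counter

# Hypothetical postings; field names and values are illustrative only.
postings = [
    {"company_size": "large", "salary": 60_000},
    {"company_size": "large", "salary": 64_000},
    {"company_size": "large", "salary": 62_000},
    {"company_size": "small", "salary": 40_000},
]

# Hypothetical reference shares per stratum (placeholder numbers).
reference_share = {"large": 0.4, "small": 0.6}


def poststratification_weights(rows, reference, key):
    """Weight each row by reference share / sample share of its stratum."""
    counts = Counter(row[key] for row in rows)
    total = len(rows)
    return [reference[row[key]] / (counts[row[key]] / total) for row in rows]


def weighted_mean(values, weights):
    """Weighted arithmetic mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


salaries = [row["salary"] for row in postings]
weights = poststratification_weights(postings, reference_share, "company_size")

unweighted = sum(salaries) / len(salaries)
weighted = weighted_mean(salaries, weights)

print(f"unweighted mean salary: {unweighted:,.0f}")
print(f"weighted mean salary:   {weighted:,.0f}")
```

This correction is only partial. It rebalances strata that appear in the sample, but it cannot recover strata with no observations at all, and it does nothing about selection within a stratum, such as roles filled by referral at a company that also posts publicly.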
Survivorship bias is a related but distinct issue: even within the universe of publicly posted LinkedIn jobs, only postings that were active during the crawl window are captured. Jobs posted and filled before the crawl, or pulled by the employer before crawl time, are absent. See Survivorship Bias for details.
