Skill demand analysis is one of the primary value propositions of a LinkedIn-based HR intelligence product. Employers want to know which skills are in demand, how demand is shifting, and what skill profiles define competitive candidates in their sector. The LinkedIn dataset provides a job_skills table that links postings to skill tags — but these tags are aggregated into 35 broad categories that collapse meaningfully distinct skills into undifferentiated buckets. The result is a skill demand signal that is real but coarse: useful for macro-level category trends, but blind to the fine-grained tool and technology distinctions that actually determine whether a candidate is qualified for a specific role.

The Numbers

| Metric | Value |
| --- | --- |
| Total skill assignments in dataset | 213,768 |
| Distinct skill categories (skill_abr) | 35 |
| Average skill assignments per posting | ~1.7 |
| skills_desc column (free text) null rate | ~98% |

With 213,768 skill assignments spread across only 35 categories, each category carries an average of ~6,108 skill assignments (213,768 / 35), an enormous amount of information compressed into a single label. The skills_desc column, which was presumably intended to carry free-text skill descriptions and could have provided the granularity the abbreviations lack, is roughly 98% null.
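The metrics above can be recomputed directly from the tables. The following is a minimal sketch on toy data, assuming a job_skills table with job_id and skill_abr columns and a postings table with a skills_desc column; the job_id column name and the `skill_summary` helper are assumptions for illustration, not part of the documented pipeline.

```python
import pandas as pd

def skill_summary(job_skills: pd.DataFrame, postings: pd.DataFrame) -> dict:
    """Recompute headline skill metrics (hypothetical helper)."""
    total = len(job_skills)
    categories = job_skills["skill_abr"].nunique()
    return {
        "total_assignments": total,
        "distinct_categories": categories,
        "avg_assignments_per_category": total / categories,
        # Averages over postings that carry at least one skill tag
        "avg_assignments_per_posting": total / job_skills["job_id"].nunique(),
        "skills_desc_null_rate": postings["skills_desc"].isna().mean(),
    }

# Toy data standing in for the real tables
job_skills = pd.DataFrame({
    "job_id": [1, 1, 2, 3, 3, 4],
    "skill_abr": ["DATA", "IT", "DATA", "MRKT", "SALE", "IT"],
})
postings = pd.DataFrame({
    "job_id": [1, 2, 3, 4],
    "skills_desc": [None, None, "Excel, SQL", None],
})

summary = skill_summary(job_skills, postings)
print(summary)

# The per-category average quoted in the text is plain division
avg_per_category = round(213_768 / 35)
print(avg_per_category)  # 6108
```

Note that the per-posting average here only covers postings that appear in job_skills, so it will overstate the figure if many postings carry no skill tag at all.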

The 35-Category Problem

LinkedIn’s skill category abbreviations include labels such as:

IT · DATA · MRKT · DSGN · WRT · SALE · MGMT · FINC · ENGG · HLTH · LEGL · EDUC · OPER · COMS · MNFG

Consider what collapses into a single IT or DATA tag:

| Collapsed into DATA | Collapsed into IT |
| --- | --- |
| Python | JavaScript |
| SQL | TypeScript |
| R | Java |
| Apache Spark | Go / Rust |
| dbt (data build tool) | Kubernetes |
| Tableau | AWS / Azure / GCP |
| Power BI | Linux system administration |
| Apache Airflow | Network security |
| TensorFlow | Salesforce / SAP |
| PyTorch | API development |

A posting requiring a senior dbt engineer with Snowflake and Airflow experience produces the same DATA tag as a posting requiring a junior Excel analyst. A requirement for PyTorch-specific deep learning expertise is indistinguishable from a requirement for Tableau dashboarding. From skill_abr alone, these roles are identical.
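The collapse can be made concrete with a small sketch. The tool-to-category mapping below is hypothetical and written by hand for illustration; the dataset itself only stores the category tag, which is exactly the problem.

```python
# Hypothetical tool-to-category mapping, for illustration only.
# The dataset does not contain tool names, only the category tag.
TOOL_TO_CATEGORY = {
    "dbt": "DATA",
    "Snowflake": "DATA",
    "Apache Airflow": "DATA",
    "Excel": "DATA",
    "PyTorch": "DATA",
    "Tableau": "DATA",
}

def categories_for(tools):
    """Return the set of category tags a list of tools collapses into."""
    return {TOOL_TO_CATEGORY[tool] for tool in tools}

senior_dbt_engineer = ["dbt", "Snowflake", "Apache Airflow"]
junior_excel_analyst = ["Excel"]

senior_tags = categories_for(senior_dbt_engineer)
junior_tags = categories_for(junior_excel_analyst)

print(senior_tags)                 # {'DATA'}
print(junior_tags)                 # {'DATA'}
print(senior_tags == junior_tags)  # True
```

Once the mapping is applied, no downstream query on skill_abr can recover which of the two postings asked for which tools.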

Consequence for DataTalent Solutions

This aggregation creates a specific problem for client-facing skill gap analysis:
  • Tool-specific hiring signals are invisible: DataTalent cannot determine whether Spanish companies are hiring for Tableau analysts, Power BI developers, or dbt engineers — all appear as “DATA” demand
  • Seniority cannot be inferred from skill tags: a DATA tag on a junior analyst posting looks the same as a DATA tag on a Principal Data Architect posting
  • Adjacent skill combinations are lost: the combination of DATA + MGMT might indicate a data product manager role or a data governance lead — two very different profiles that would require different candidate pipelines
  • Emerging technology demand is invisible: new tools (e.g., dbt, Polars, LangChain) that emerged after LinkedIn’s skill taxonomy was defined may be tagged inconsistently or not at all
Do not present skill_abr category rankings to clients as a fine-grained skill demand analysis. The 35-category taxonomy cannot distinguish between adjacent tools in the same technology family. Frame these rankings as macro-level category trends only.
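Category combinations such as DATA + MGMT can at least be counted, even though the tags cannot say which role profile sits behind them. The sketch below uses toy data and assumes a job_skills table with job_id and skill_abr columns (the job_id column name is an assumption).

```python
from collections import Counter
from itertools import combinations

import pandas as pd

# Toy stand-in for the job_skills table
job_skills = pd.DataFrame({
    "job_id": [1, 1, 2, 2, 3, 4, 4],
    "skill_abr": ["DATA", "MGMT", "DATA", "MGMT", "DATA", "IT", "DATA"],
})

pair_counts = Counter()
for _, tags in job_skills.groupby("job_id")["skill_abr"]:
    # Sort so that (DATA, MGMT) and (MGMT, DATA) count as the same pair
    unique_tags = sorted(set(tags))
    pair_counts.update(combinations(unique_tags, 2))

print(pair_counts.most_common())
```

A high count for a pair shows that the combination is common; separating a data product manager from a data governance lead within that pair still requires the description text.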

Inspecting Skill Categories

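import pandas as pd

# Skill category lookup table shipped with the dataset
skills_map = pd.read_csv('archive/mappings/skills.csv')
print(skills_map.to_string())  # All 35 categories

# Assumes df is the postings DataFrame loaded earlier, with a
# comma-separated job_skills_list column
top_skills = (
    df['job_skills_list'].str.split(', ')  # one list of tags per posting
    .explode()                             # one row per (posting, tag)
    .value_counts()                        # missing values are ignored
    .head(10)
)
print(top_skills)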
This code reveals the full skill category label set and the frequency distribution of skill assignments across postings. Running it establishes the ceiling of what skill_abr-based analysis can tell you before any mitigation is applied.

Mitigation Strategies

Short-term: Description Field NLP

The description field contains free-text job descriptions and is the richest source of fine-grained skill signals in the dataset. NLP-based skill extraction can partially compensate for aggregation bias:
  • Named entity recognition (NER) to extract technology names, tool names, and certification requirements
  • Keyword matching against external skill taxonomies (ESCO, O*NET, Lightcast) to tag individual tools
  • Frequency analysis of technology mentions across description text to reconstruct tool-specific demand rankings
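The keyword matching and frequency analysis steps above can be sketched as follows. This is a minimal illustration on toy descriptions, not a production extractor: the `TOOL_PATTERNS` list and the `extract_tools` helper are hypothetical, and a real keyword list would be derived from one of the external taxonomies.

```python
import re

import pandas as pd

# Hypothetical keyword list; a real one would come from an external taxonomy
TOOL_PATTERNS = {
    "Tableau": r"\btableau\b",
    "Power BI": r"\bpower\s*bi\b",
    "dbt": r"\bdbt\b",
    "Apache Airflow": r"\bairflow\b",
    "PyTorch": r"\bpytorch\b",
}
COMPILED = {
    tool: re.compile(pattern, re.IGNORECASE)
    for tool, pattern in TOOL_PATTERNS.items()
}

def extract_tools(description):
    """Return the tools whose pattern appears in a description."""
    if not isinstance(description, str):
        return []  # missing descriptions yield no tools
    return [tool for tool, pattern in COMPILED.items() if pattern.search(description)]

# Toy stand-in for the description column
postings = pd.DataFrame({"description": [
    "Senior analytics engineer: dbt, Airflow and Snowflake.",
    "Junior analyst building Tableau and Power BI dashboards.",
    "Reporting analyst, PowerBI experience required.",
    None,
]})

tool_demand = (
    postings["description"]
    .apply(extract_tools)
    .explode()        # one row per (posting, tool)
    .dropna()         # postings with no matches
    .value_counts()
)
print(tool_demand)
```

Each tool is counted at most once per posting, so the result reads as "number of postings mentioning the tool". Short or ambiguous names such as R or Go would need context-aware handling, since a plain word-boundary pattern would produce many false positives.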

Long-term: External Skill Taxonomy Integration

Supplement the LinkedIn skill tags with established external taxonomies that provide the granularity the dataset lacks:
| Taxonomy | Coverage | Use Case |
| --- | --- | --- |
| ESCO (European Skills, Competences, Qualifications and Occupations) | EU-focused, multilingual | Spanish market alignment |
| O*NET (US Dept of Labor) | US-focused, highly granular | US benchmark comparisons |
| Lightcast (formerly Burning Glass) | Commercial, real-time | Fine-grained tool demand |
| SFIA (Skills Framework for the Information Age) | IT-specific, leveled | Seniority calibration |

The ESCO taxonomy is available in Spanish and aligns with EU labor market frameworks. It is the most appropriate external taxonomy for DataTalent Solutions’ Spanish client work. ESCO provides over 13,890 skills and competences at a granularity that the 35-category LinkedIn taxonomy cannot match.
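One way to connect extracted tool mentions to an external taxonomy is a simple table join. The sketch below assumes a tool-level demand table and a local taxonomy extract; all counts, identifiers, and labels are invented placeholders and do not correspond to real ESCO entries.

```python
import pandas as pd

# Made-up mention counts, standing in for the output of description parsing
tool_demand = pd.DataFrame({
    "tool": ["Power BI", "Tableau", "dbt", "Polars"],
    "mentions": [120, 95, 40, 5],
})

# Placeholder taxonomy extract; ids and labels are NOT real ESCO identifiers
taxonomy = pd.DataFrame({
    "tool": ["Power BI", "Tableau", "dbt"],
    "taxonomy_id": ["example:001", "example:002", "example:003"],
    "taxonomy_label": [
        "business intelligence tools",
        "business intelligence tools",
        "data transformation tools",
    ],
})

# Left join keeps every extracted tool; _merge flags the unmatched ones
mapped = tool_demand.merge(taxonomy, on="tool", how="left", indicator=True)
unmapped = mapped.loc[mapped["_merge"] == "left_only", "tool"].tolist()

by_label = (
    mapped.dropna(subset=["taxonomy_label"])
    .groupby("taxonomy_label")["mentions"]
    .sum()
)

print(unmapped)
print(by_label)
```

Keeping the unmatched tools visible is deliberate: newer tools that the taxonomy does not yet cover are the ones most likely to signal emerging demand.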
