Skill demand analysis is one of the primary value propositions of a LinkedIn-based HR intelligence product. Employers want to know which skills are in demand, how demand is shifting, and what skill profiles define competitive candidates in their sector. The LinkedIn dataset provides a job_skills table that links postings to skill tags — but these tags are aggregated into 35 broad categories that collapse meaningfully distinct skills into undifferentiated buckets. The result is a skill demand signal that is real but coarse: useful for macro-level category trends, but blind to the fine-grained tool and technology distinctions that actually determine whether a candidate is qualified for a specific role.
The Numbers
| Metric | Value |
|---|---|
| Total skill assignments in dataset | 213,768 |
| Distinct skill categories (skill_abr) | 35 |
| Average skill assignments per posting | ~1.7 |
| skills_desc column (free text) null rate | ~98% |
With 213,768 skill assignments spread across only 35 categories, each category carries an average of ~6,108 skill assignments — an enormous amount of information compressed into a single label. The skills_desc column, which was presumably intended to carry free-text skill descriptions and could have provided the granularity the abbreviations lack, is 98% null.
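The per-category average follows directly from the two figures in the table. The sketch below reproduces that arithmetic and adds a small null-rate helper; the `null_rate` function and the four-element sample list are illustrative assumptions, not part of the dataset or its tooling.

```python
import math

# Figures quoted in the table above.
total_assignments = 213_768
distinct_categories = 35

# Mean number of skill assignments that share a single category label.
avg_per_category = total_assignments / distinct_categories
print(f"{avg_per_category:,.0f} assignments per category on average")


def null_rate(values):
    """Return the share of entries that are None, NaN, or blank strings."""
    values = list(values)
    if not values:
        return 0.0

    def is_missing(value):
        if value is None:
            return True
        if isinstance(value, float) and math.isnan(value):
            return True
        return not str(value).strip()

    return sum(1 for value in values if is_missing(value)) / len(values)


# Hypothetical sample standing in for the skills_desc column.
sample_skills_desc = [None, "", "Python, SQL", float("nan")]
print(f"{null_rate(sample_skills_desc):.0%} of the sample is missing")
```

Applied to the real skills_desc column, the same check should land near the ~98% figure reported above.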
The 35-Category Problem
LinkedIn’s skill category abbreviations include labels such as:
IT · DATA · MRKT · DSGN · WRT · SALE · MGMT · FINC · ENGG · HLTH · LEGL · EDUC · OPER · COMS · MNFG
Consider what collapses into a single IT or DATA tag:
| Collapsed into DATA | Collapsed into IT |
|---|---|
| Python | JavaScript |
| SQL | TypeScript |
| R | Java |
| Apache Spark | Go / Rust |
| dbt (data build tool) | Kubernetes |
| Tableau | AWS / Azure / GCP |
| Power BI | Linux system administration |
| Apache Airflow | Network security |
| TensorFlow | Salesforce / SAP |
| PyTorch | API development |
A posting requiring a senior dbt engineer with Snowflake and Airflow experience produces the same DATA tag as a posting requiring a junior Excel analyst. A requirement for PyTorch-specific deep learning expertise is indistinguishable from a requirement for Tableau dashboarding. From skill_abr alone, these roles are identical.
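A minimal sketch of that information loss, assuming a hypothetical tool-to-category lookup built from the DATA column of the table above. The dataset does not ship such a mapping; `TOOL_TO_CATEGORY`, `category_tags`, and the three example postings exist only for illustration.

```python
# Hypothetical lookup from tool name to category, copied from the DATA column
# of the table above. The dataset itself does not provide this mapping.
TOOL_TO_CATEGORY = {
    "SQL": "DATA",
    "dbt": "DATA",
    "Apache Airflow": "DATA",
    "Tableau": "DATA",
    "PyTorch": "DATA",
}


def category_tags(required_tools):
    """Collapse a posting's required tools into its sorted set of category tags."""
    return sorted({TOOL_TO_CATEGORY[tool] for tool in required_tools})


# Three clearly different roles, described by the tools they require.
analytics_engineer = ["dbt", "Apache Airflow", "SQL"]
dashboard_analyst = ["Tableau"]
deep_learning_engineer = ["PyTorch"]

for tools in (analytics_engineer, dashboard_analyst, deep_learning_engineer):
    print(tools, "->", category_tags(tools))
```

All three postings reduce to the same single-element tag list, which is why no downstream analysis on skill_abr can separate them again.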
Consequence for DataTalent Solutions
This aggregation creates a specific problem for client-facing skill gap analysis:
- Tool-specific hiring signals are invisible: DataTalent cannot determine whether Spanish companies are hiring for Tableau analysts, Power BI developers, or dbt engineers — all appear as “DATA” demand
- Seniority cannot be inferred from skill tags: a DATA tag on a junior analyst posting looks the same as a DATA tag on a Principal Data Architect posting
- Adjacent skill combinations are lost: the combination of DATA + MGMT might indicate a data product manager role or a data governance lead, two very different profiles that would require different candidate pipelines
- Emerging technology demand is invisible: new tools (e.g., dbt, Polars, LangChain) that emerged after LinkedIn’s skill taxonomy was defined may be tagged inconsistently or not at all
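How often a combination such as DATA + MGMT occurs can still be measured, even though the tags cannot say which role sits behind it. The following is a sketch under stated assumptions: each posting is represented as a plain list of category abbreviations, and `category_pair_counts` is a hypothetical helper rather than anything defined by the dataset.

```python
from collections import Counter
from itertools import combinations


def category_pair_counts(postings):
    """Count how often each unordered pair of categories shares a posting."""
    counts = Counter()
    for categories in postings:
        # Deduplicate and sort so ("DATA", "MGMT") and ("MGMT", "DATA") merge.
        unique = sorted(set(categories))
        counts.update(combinations(unique, 2))
    return counts


# Hypothetical postings, each reduced to its list of category tags.
postings = [
    ["DATA", "MGMT"],
    ["DATA"],
    ["MGMT", "DATA", "IT"],
    ["IT"],
]

pair_counts = category_pair_counts(postings)
for pair, count in pair_counts.most_common():
    print(pair, count)
```

The counts show that a pairing is common, not what it means: separating a data product manager from a data governance lead still requires the description text.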
Do not present skill_abr category rankings to clients as a fine-grained skill demand analysis. The 35-category taxonomy cannot distinguish between adjacent tools in the same technology family. Frame these rankings as macro-level category trends only.
Inspecting Skill Categories
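import pandas as pd

skills_map = pd.read_csv('archive/mappings/skills.csv')  # category lookup table
print(skills_map.to_string())  # All 35 categories

top_skills = (
    df['job_skills_list']  # df is assumed to be the postings DataFrame, loaded earlier
    .str.split(', ')  # comma-separated string -> list of skill labels
    .explode()  # one row per (posting, skill) pair
    .value_counts()
    .head(10)  # ten most frequent labels
)
print(top_skills)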
This code reveals the full skill category label set and the frequency distribution of skill assignments across postings. Running it establishes the ceiling of what skill_abr-based analysis can tell you before any mitigation is applied.
Mitigation Strategies
Short-term: Description Field NLP
The description field contains free-text job descriptions and is the richest source of fine-grained skill signals in the dataset. NLP-based skill extraction can partially compensate for aggregation bias:
- Named entity recognition (NER) to extract technology names, tool names, and certification requirements
- Keyword matching against external skill taxonomies (ESCO, O*NET, Lightcast) to tag individual tools
- Frequency analysis of technology mentions across description text to reconstruct tool-specific demand rankings
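The keyword-matching and frequency-analysis steps above can be sketched with the standard library alone. Everything named here is an assumption for illustration: the `TOOL_PATTERNS` list is a hand-picked sample (a real run would draw its terms from a taxonomy such as ESCO), and the example descriptions are invented rather than taken from the dataset.

```python
import re
from collections import Counter

# Hypothetical keyword patterns; a production list would come from a taxonomy.
TOOL_PATTERNS = {
    "dbt": r"\bdbt\b",
    "Power BI": r"\bpower\s*bi\b",
    "Tableau": r"\btableau\b",
    "Apache Airflow": r"\bairflow\b",
    "PyTorch": r"\bpytorch\b",
}
COMPILED = {
    tool: re.compile(pattern, re.IGNORECASE)
    for tool, pattern in TOOL_PATTERNS.items()
}


def tool_mentions(description):
    """Return the set of tools whose pattern matches the description text."""
    if not description:
        return set()
    return {tool for tool, pattern in COMPILED.items() if pattern.search(description)}


def tool_demand(descriptions):
    """Count postings that mention each tool, at most once per posting."""
    counts = Counter()
    for text in descriptions:
        counts.update(tool_mentions(text))
    return counts


# Invented descriptions standing in for the description column.
descriptions = [
    "Senior analytics engineer with dbt and Airflow experience.",
    "BI developer: PowerBI dashboards, some Tableau.",
    "Data analyst, Tableau required.",
    None,
]

demand = tool_demand(descriptions)
print(demand.most_common())
```

Word boundaries keep `dbt` from matching inside longer tokens, but very short names such as R or Go would need stricter context rules than this sketch provides. Counting each tool once per posting keeps a single verbose description from inflating the ranking.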
Long-term: External Skill Taxonomy Integration
Supplement the LinkedIn skill tags with established external taxonomies that provide the granularity the dataset lacks:
| Taxonomy | Coverage | Use Case |
|---|---|---|
| ESCO (European Skills, Competences, Qualifications and Occupations) | EU-focused, multilingual | Spanish market alignment |
| O*NET (US Dept of Labor) | US-focused, highly granular | US benchmark comparisons |
| Lightcast (formerly Burning Glass) | Commercial, real-time | Fine-grained tool demand |
| SFIA (Skills Framework for the Information Age) | IT-specific, leveled | Seniority calibration |
The ESCO taxonomy is available in Spanish and aligns with EU labor market frameworks. It is the most appropriate external taxonomy for DataTalent Solutions’ Spanish client work. ESCO provides over 13,890 skills and competences at a granularity that the 35-category LinkedIn taxonomy cannot match.