The 11 CSV files in the LinkedIn Job Postings dataset are linked through a consistent set of foreign keys — job_id ties every job-level satellite table back to postings.csv, and company_id ties every company-level satellite back to companies/companies.csv. Two lookup tables in mappings/ provide human-readable labels for the abbreviated codes used throughout the relational tables. Together, these files form a normalized relational structure that must be joined and aggregated before any analysis can begin. This page documents every column in every file, including data types, observed null rates, and the relationships that connect tables to one another.
postings.csv — 31 Columns
The main fact table. Each row represents a single job posting as it appeared on LinkedIn. Columns fall into two broad categories: numeric (16) and categorical (15).
Numeric Columns (16)
| Column | Type | Description |
|---|---|---|
| job_id | int64 | Primary key: unique identifier for the posting |
| max_salary | float64 | Upper bound of the advertised salary range |
| min_salary | float64 | Lower bound of the advertised salary range |
| med_salary | float64 | Midpoint salary (rarely populated) |
| normalized_salary | float64 | Platform-normalized annual salary estimate |
| company_id | float64 | Foreign key → companies/companies.csv |
| views | float64 | Number of times the posting was viewed |
| applies | float64 | Number of applications submitted (Easy Apply only) |
| remote_allowed | float64 | Binary flag: 1 = remote permitted, 0 = not (87.7% null) |
| original_listed_time | int64 | Unix timestamp of original listing date |
| listed_time | int64 | Unix timestamp of most recent listing refresh |
| expiry | float64 | Unix timestamp of posting expiry |
| closed_time | float64 | Unix timestamp when posting was closed (99.1% null) |
| sponsored | int64 | Binary flag: 1 = sponsored/promoted listing |
| zip_code | float64 | ZIP code of the posting location |
| fips | float64 | FIPS county code |
Categorical Columns (15)
| Column | Type | Description |
|---|---|---|
| company_name | object | Employer name as it appears on LinkedIn |
| title | object | Job title as posted |
| description | object | Full text of the job description |
| pay_period | object | Salary pay period: HOURLY, MONTHLY, WEEKLY, YEARLY |
| location | object | Freeform location string (city, state, country) |
| formatted_work_type | object | Standardized work arrangement: On-site / Remote / Hybrid |
| work_type | object | Raw work type code |
| formatted_experience_level | object | Standardized seniority: Entry / Mid-Senior / Director / etc. |
| application_type | object | How candidates apply: OffSiteApply or SimpleOnsiteApply |
| application_url | object | External application URL (for off-site applications) |
| job_posting_url | object | Direct URL to the LinkedIn posting |
| skills_desc | object | Free-text skills description embedded in the posting (98% null) |
| posting_domain | object | Domain of the external application URL |
| currency | object | Currency code for salary values (e.g. USD) |
| compensation_type | object | Compensation structure: BASE_SALARY, FIXED, etc. |
Missing Value Rates — postings.csv
| Column | Null Rate | Notes |
|---|---|---|
| closed_time | 99.1% | Practically unusable; dropped in preprocessing |
| skills_desc | 98.0% | Use jobs/job_skills.csv instead |
| med_salary | 94.9% | Low coverage; min/max range preferred |
| remote_allowed | 87.7% | Companies rarely disclose; use formatted_work_type |
| applies | 80.6% | Only Easy Apply submissions are tracked |
| min_salary / max_salary | ~75% | Only 24% of postings include a salary range |
| formatted_experience_level | 23.7% | Imputed with mode (“Mid-Senior Level”) in Phase 2 |
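Null rates like those in the table above can be recomputed directly from the loaded file. The following is a minimal sketch, assuming pandas is available; the helper name `null_rates` and the four-row sample frame are illustrative stand-ins, not part of the dataset or its preprocessing code.

```python
import pandas as pd

def null_rates(df: pd.DataFrame) -> pd.Series:
    """Return the percentage of missing values per column, highest first."""
    return (df.isna().mean() * 100).round(1).sort_values(ascending=False)

# Tiny stand-in for postings.csv (hypothetical values).
# In practice: sample = pd.read_csv("postings.csv")
nan = float("nan")
sample = pd.DataFrame({
    "job_id": [1, 2, 3, 4],
    "min_salary": [50000.0, nan, nan, nan],
    "closed_time": [nan, nan, nan, nan],
})

rates = null_rates(sample)
print(rates)
```

Running the same helper on each satellite table is a quick way to confirm which columns are too sparse to use before any joins are attempted.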
companies/companies.csv — 10 Columns
One row per company. Links to postings.csv via company_id.
| Column | Type | Description |
|---|---|---|
| company_id | int64 | Primary key |
| name | object | Company display name |
| description | object | Company summary text |
| company_size | float64 | Encoded headcount bucket (1 = 1–10 employees … 7 = 10,001+) |
| state | object | US state abbreviation |
| country | object | ISO country code |
| city | object | City name |
| zip_code | object | Postal code |
| address | object | Street address |
| url | object | Company LinkedIn profile URL |
jobs/salaries.csv — 8 Columns
Salary data scraped from individual postings. Covers 40,785 rows — a superset of the postings that have salary data in postings.csv itself. Links to postings.csv via job_id.
| Column | Type | Description |
|---|---|---|
| salary_id | int64 | Primary key for this salary record |
| job_id | int64 | Foreign key → postings.csv |
| max_salary | float64 | Upper bound of the advertised range |
| med_salary | float64 | Midpoint salary |
| min_salary | float64 | Lower bound of the advertised range |
| pay_period | object | Pay cadence: HOURLY, MONTHLY, WEEKLY, or YEARLY |
| currency | object | Currency code (predominantly USD) |
| compensation_type | object | Compensation structure: BASE_SALARY, FIXED, etc. |
The pay_period column is critical for salary normalization. All four values appear in the data:
| pay_period Value | Multiplier to Annual | Example Raw Value → Annual |
|---|---|---|
| HOURLY | × 2,080 (40h × 52w) | 45/hr → 93,600 |
| MONTHLY | × 12 | 8,000/mo → 96,000 |
| WEEKLY | × 52 | 2,000/wk → 104,000 |
| YEARLY | × 1 (no change) | 120,000 → 120,000 |
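The multipliers above translate into a small lookup. This is a sketch only: the names `ANNUAL_MULTIPLIER` and `annualize` are hypothetical, and returning `None` for an unrecognized or missing pay period is an assumption about how such rows might be handled, not a documented rule.

```python
# Multipliers from the table above; HOURLY assumes 40 hours/week for 52 weeks.
ANNUAL_MULTIPLIER = {"HOURLY": 2080, "MONTHLY": 12, "WEEKLY": 52, "YEARLY": 1}

def annualize(amount, pay_period):
    """Convert a raw salary figure to an annual amount.

    Returns None when the amount is missing or the pay period is not
    one of the four documented values (assumed handling).
    """
    if amount is None or not isinstance(pay_period, str):
        return None
    multiplier = ANNUAL_MULTIPLIER.get(pay_period.strip().upper())
    if multiplier is None:
        return None
    return amount * multiplier

print(annualize(45, "HOURLY"))     # 45 * 2080
print(annualize(8000, "MONTHLY"))  # 8000 * 12
```

Applying the same function to min_salary and max_salary puts every row on a common annual scale before salaries are compared across postings.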
jobs/job_skills.csv — 2 Columns
Maps job postings to skill categories. One row per job–skill pair; a single posting can have many skill assignments (213,768 rows total for 123,849 postings).
| Column | Type | Description |
|---|---|---|
| job_id | int64 | Foreign key → postings.csv |
| skill_abr | object | Abbreviated skill code, decoded via mappings/skills.csv |
The skill_abr field stores short codes such as IT, DATA, MRKT, DSGN, WRT, and 30 others. These are not human-readable without the mapping table.
Always join jobs/job_skills.csv with mappings/skills.csv on skill_abr before visualizing or aggregating skill demand. The raw abbreviations (e.g. IT, MNGT, SALE) are opaque and will produce misleading axis labels in charts if used directly. The mappings/skills.csv lookup resolves each code to a full category name like “Information Technology”, “Management”, or “Sales”.
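The join described above might look like the following sketch, assuming pandas. The label column name `skill_name` is an assumption, since this page only documents the `skill_abr` key of mappings/skills.csv, and the in-memory frames are hypothetical stand-ins for the two files.

```python
import pandas as pd

# Stand-ins for jobs/job_skills.csv and mappings/skills.csv (hypothetical rows).
job_skills = pd.DataFrame({
    "job_id": [101, 101, 102],
    "skill_abr": ["IT", "SALE", "IT"],
})
skills = pd.DataFrame({
    "skill_abr": ["IT", "SALE"],
    "skill_name": ["Information Technology", "Sales"],  # assumed column name
})

# many_to_one makes pandas raise if a code appears twice in the lookup table.
labeled = job_skills.merge(skills, on="skill_abr", how="left", validate="many_to_one")

# Count distinct postings per readable skill label.
demand = labeled.groupby("skill_name")["job_id"].nunique().sort_values(ascending=False)
print(demand)
```

A left join keeps any job–skill row whose code is missing from the lookup, so unmapped codes surface as nulls in the label column instead of silently disappearing.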
jobs/benefits.csv — 3 Columns
Lists benefits associated with each posting. One row per job–benefit pair (67,943 rows). Links to postings.csv via job_id.
| Column | Type | Description |
|---|---|---|
| job_id | int64 | Foreign key → postings.csv |
| inferred | bool | True if benefit was inferred from description text; False if explicitly listed |
| type | object | Benefit category label |
Common values for type include: Medical insurance, Vision insurance, Dental insurance, 401(K), Paid maternity leave, Paid paternity leave, Disability insurance, Student loan assistance, Commuter benefits, Tuition assistance.
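Because benefits.csv holds one row per job–benefit pair, joining it straight onto postings.csv multiplies posting rows. One way to avoid that, sketched here under the assumption that pandas is used, is to pivot benefits into one row per job_id first. The sample rows and variable names are illustrative only.

```python
import pandas as pd

# Stand-in for jobs/benefits.csv (hypothetical rows).
benefits = pd.DataFrame({
    "job_id": [101, 101, 102],
    "inferred": [False, True, False],
    "type": ["Medical insurance", "401(K)", "Dental insurance"],
})

# One row per job_id, one 0/1 column per benefit type.
benefit_flags = pd.crosstab(benefits["job_id"], benefits["type"]).clip(upper=1)

# Same shape, restricted to benefits that were explicitly listed.
explicit = benefits[~benefits["inferred"]]
explicit_flags = pd.crosstab(explicit["job_id"], explicit["type"]).clip(upper=1)

print(benefit_flags)
```

The resulting frame is indexed by job_id, so it can be joined to postings.csv one-to-one without changing the posting row count.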
companies/company_industries.csv and jobs/job_industries.csv
Both files share the same 2-column structure:
| Column | Type | Description |
|---|---|---|
| company_id / job_id | int64 | Foreign key to the parent table |
| industry_id | int64 | Foreign key → mappings/industries.csv |
mappings/industries.csv resolves each industry_id to a human-readable industry_name string across 422 industry categories.
Primary / Foreign Key Relationships
The table below summarizes every join path available in the dataset:
| Left Table | Join Key | Right Table | Relationship |
|---|---|---|---|
| postings.csv | job_id | jobs/salaries.csv | 1:1 (some postings have no salary row) |
| postings.csv | job_id | jobs/job_skills.csv | 1:N |
| postings.csv | job_id | jobs/benefits.csv | 1:N |
| postings.csv | job_id | jobs/job_industries.csv | 1:N |
| postings.csv | company_id | companies/companies.csv | N:1 |
| companies/companies.csv | company_id | companies/company_industries.csv | 1:N |
| companies/companies.csv | company_id | companies/company_specialities.csv | 1:N |
| companies/companies.csv | company_id | companies/employee_counts.csv | 1:N |
| jobs/job_skills.csv | skill_abr | mappings/skills.csv | N:1 |
| jobs/job_industries.csv | industry_id | mappings/industries.csv | N:1 |
| companies/company_industries.csv | industry_id | mappings/industries.csv | N:1 |
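The cardinalities in this table can be checked at join time. The sketch below, assuming pandas, joins postings to companies on company_id. Note that company_id is float64 in postings.csv but int64 in companies/companies.csv; casting both to the nullable `Int64` dtype is one possible way to align them, not a step this page prescribes. The sample rows and company names are hypothetical.

```python
import pandas as pd

# Stand-ins for postings.csv and companies/companies.csv (hypothetical rows).
postings = pd.DataFrame({
    "job_id": [101, 102, 103],
    "company_id": [1.0, 1.0, float("nan")],  # float64 because some values are missing
})
companies = pd.DataFrame({"company_id": [1, 2], "name": ["Acme", "Globex"]})

# Align key dtypes; nullable Int64 keeps missing company_id values as <NA>.
postings["company_id"] = postings["company_id"].astype("Int64")
companies["company_id"] = companies["company_id"].astype("Int64")

joined = postings.merge(
    companies,
    on="company_id",
    how="left",
    validate="many_to_one",  # raises if company_id is duplicated in companies
    indicator=True,          # adds a _merge column: "both" or "left_only"
)

# Postings that found no matching company row.
unmatched = int((joined["_merge"] == "left_only").sum())
print(joined[["job_id", "name", "_merge"]])
```

Swapping in `validate="one_to_many"` or `validate="one_to_one"` gives the same kind of check for the other rows of the table, and the `_merge` column shows how many parent rows have no match in the satellite file.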