The 11 CSV files in the LinkedIn Job Postings dataset are linked through a consistent set of foreign keys — job_id ties every job-level satellite table back to postings.csv, and company_id ties every company-level satellite back to companies/companies.csv. Two lookup tables in mappings/ provide human-readable labels for the abbreviated codes used throughout the relational tables. Together, these files form a normalized relational structure that must be joined and aggregated before any analysis can begin. This page documents every column in every file, including data types, observed null rates, and the relationships that connect tables to one another.

postings.csv — 31 Columns

The main fact table. Each row represents a single job posting as it appeared on LinkedIn. Columns fall into two broad categories.

Numeric Columns (16)

| Column | Type | Description |
| --- | --- | --- |
| job_id | int64 | Primary key: unique identifier for the posting |
| max_salary | float64 | Upper bound of the advertised salary range |
| min_salary | float64 | Lower bound of the advertised salary range |
| med_salary | float64 | Midpoint salary (rarely populated) |
| normalized_salary | float64 | Platform-normalized annual salary estimate |
| company_id | float64 | Foreign key → companies/companies.csv |
| views | float64 | Number of times the posting was viewed |
| applies | float64 | Number of applications submitted (Easy Apply only) |
| remote_allowed | float64 | Binary flag: 1 = remote permitted, 0 = not (87.7% null) |
| original_listed_time | int64 | Unix timestamp of original listing date |
| listed_time | int64 | Unix timestamp of most recent listing refresh |
| expiry | float64 | Unix timestamp of posting expiry |
| closed_time | float64 | Unix timestamp when posting was closed (99.1% null) |
| sponsored | int64 | Binary flag: 1 = sponsored/promoted listing |
| zip_code | float64 | ZIP code of the posting location |
| fips | float64 | FIPS county code |
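
The timestamp columns above are stored as plain numbers, so they usually need converting before any date arithmetic. The sketch below assumes pandas and uses made-up sample values; the page does not say whether the timestamps are in seconds or milliseconds, so `TIME_UNIT` is an assumption to verify against the real data.

```python
import pandas as pd

# Hypothetical sample rows; real values would come from postings.csv.
postings = pd.DataFrame({
    "job_id": [1, 2],
    "listed_time": [1_700_000_000_000, 1_700_086_400_000],
    "closed_time": [1_700_050_000_000.0, float("nan")],  # float64 because of nulls
})

# ASSUMPTION: timestamps are in milliseconds. Use "s" if they turn out to be seconds.
TIME_UNIT = "ms"

# Convert each numeric timestamp column to a datetime column; nulls become NaT.
for col in ["listed_time", "closed_time"]:
    postings[col + "_dt"] = pd.to_datetime(postings[col], unit=TIME_UNIT)
```

A quick sanity check is to look at the resulting years: if they land in 1970, the unit is probably wrong.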

Categorical Columns (15)

| Column | Type | Description |
| --- | --- | --- |
| company_name | object | Employer name as it appears on LinkedIn |
| title | object | Job title as posted |
| description | object | Full text of the job description |
| pay_period | object | Salary pay period: HOURLY, MONTHLY, WEEKLY, YEARLY |
| location | object | Freeform location string (city, state, country) |
| formatted_work_type | object | Standardized work arrangement: On-site / Remote / Hybrid |
| work_type | object | Raw work type code |
| formatted_experience_level | object | Standardized seniority: Entry / Mid-Senior / Director / etc. |
| application_type | object | How candidates apply: OffSiteApply or SimpleOnsiteApply |
| application_url | object | External application URL (for off-site applications) |
| job_posting_url | object | Direct URL to the LinkedIn posting |
| skills_desc | object | Free-text skills description embedded in the posting (98% null) |
| posting_domain | object | Domain of the external application URL |
| currency | object | Currency code for salary values (e.g. USD) |
| compensation_type | object | Compensation structure: BASE_SALARY, FIXED, etc. |

Missing Value Rates — postings.csv

| Column | Null Rate | Notes |
| --- | --- | --- |
| closed_time | 99.1% | Practically unusable; dropped in preprocessing |
| skills_desc | 98.0% | Use jobs/job_skills.csv instead |
| med_salary | 94.9% | Low coverage; min/max range preferred |
| remote_allowed | 87.7% | Companies rarely disclose; use formatted_work_type |
| applies | 80.6% | Only Easy Apply submissions are tracked |
| min_salary / max_salary | ~75% | Only 24% of postings include a salary range |
| formatted_experience_level | 23.7% | Imputed with mode (“Mid-Senior Level”) in Phase 2 |
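
Null rates like these can be recomputed directly, and the two treatments mentioned in the notes (dropping closed_time, mode imputation for formatted_experience_level) are short to express. This is a minimal sketch assuming pandas, with a tiny made-up frame standing in for postings.csv; it is not the project's actual preprocessing code.

```python
import pandas as pd

# Hypothetical sample; the real frame would be loaded from postings.csv.
postings = pd.DataFrame({
    "job_id": [1, 2, 3, 4],
    "closed_time": [None, None, None, 1.7e12],
    "formatted_experience_level": [
        "Mid-Senior Level", None, "Entry Level", "Mid-Senior Level",
    ],
})

# Fraction of missing values per column, highest first.
null_rates = postings.isna().mean().sort_values(ascending=False)

# Drop the near-empty column, then fill experience level with its most frequent value.
cleaned = postings.drop(columns=["closed_time"])
mode_level = cleaned["formatted_experience_level"].mode().iloc[0]
cleaned["formatted_experience_level"] = (
    cleaned["formatted_experience_level"].fillna(mode_level)
)
```

Mode imputation inflates the share of the most common level, so it is worth keeping a flag column if the original missingness matters for later analysis.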

companies/companies.csv — 10 Columns

One row per company. Links to postings.csv via company_id.

| Column | Type | Description |
| --- | --- | --- |
| company_id | int64 | Primary key |
| name | object | Company display name |
| description | object | Company summary text |
| company_size | float64 | Encoded headcount bucket (1 = 1–10 employees … 7 = 10,001+) |
| state | object | US state abbreviation |
| country | object | ISO country code |
| city | object | City name |
| zip_code | object | Postal code |
| address | object | Street address |
| url | object | Company LinkedIn profile URL |
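
Because company_id is float64 in postings.csv but int64 in companies/companies.csv, it can help to align the key types before joining. The following is a minimal sketch assuming pandas, with invented sample rows in place of the real files; the company names are placeholders.

```python
import pandas as pd

# Hypothetical samples standing in for postings.csv and companies/companies.csv.
postings = pd.DataFrame({
    "job_id": [10, 11, 12],
    "company_id": [100.0, float("nan"), 200.0],  # float64 because some values are null
})
companies = pd.DataFrame({
    "company_id": [100, 200],
    "name": ["Acme", "Globex"],
    "company_size": [3.0, 7.0],
})

# Cast both keys to pandas' nullable integer type so the key stays an integer
# and missing company IDs are kept as <NA> rather than NaN floats.
postings["company_id"] = postings["company_id"].astype("Int64")
companies["company_id"] = companies["company_id"].astype("Int64")

# Left join keeps every posting; validate raises if company_id is not unique on the right.
merged = postings.merge(companies, on="company_id", how="left", validate="many_to_one")
```

Postings without a company_id survive the left join with empty company fields, which keeps row counts aligned with postings.csv.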

jobs/salaries.csv — 8 Columns

Salary data scraped from individual postings. Covers 40,785 rows, a superset of the postings that have salary data in postings.csv itself. Links to postings.csv via job_id.

| Column | Type | Description |
| --- | --- | --- |
| salary_id | int64 | Primary key for this salary record |
| job_id | int64 | Foreign key → postings.csv |
| max_salary | float64 | Upper bound of the advertised range |
| med_salary | float64 | Midpoint salary |
| min_salary | float64 | Lower bound of the advertised range |
| pay_period | object | Pay cadence: HOURLY, MONTHLY, WEEKLY, or YEARLY |
| currency | object | Currency code (predominantly USD) |
| compensation_type | object | Compensation structure: BASE_SALARY, FIXED, etc. |

The pay_period column is critical for salary normalization. All four values appear in the data:

| pay_period Value | Multiplier to Annual | Example Raw Value → Annual |
| --- | --- | --- |
| HOURLY | × 2,080 (40h × 52w) | 45/hr → 93,600 |
| MONTHLY | × 12 | 8,000/mo → 96,000 |
| WEEKLY | × 52 | 2,000/wk → 104,000 |
| YEARLY | × 1 (no change) | 120,000 → 120,000 |
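
The multipliers above translate into a small lookup. The sketch below is one way to write it in plain Python; `ANNUAL_MULTIPLIER` and `annualize` are hypothetical names introduced here, and returning `None` for a missing or unrecognized pay period is an assumption about how such rows should be handled.

```python
# Multipliers from the pay_period table (HOURLY assumes 40 hours x 52 weeks).
ANNUAL_MULTIPLIER = {
    "HOURLY": 2080,
    "WEEKLY": 52,
    "MONTHLY": 12,
    "YEARLY": 1,
}

def annualize(amount, pay_period):
    """Convert a salary amount to its annual equivalent.

    Returns None when the amount is missing or the pay period is not
    one of the four documented values.
    """
    if amount is None:
        return None
    multiplier = ANNUAL_MULTIPLIER.get(pay_period)
    if multiplier is None:
        return None
    return amount * multiplier

# The worked examples from the table.
examples = [
    annualize(45, "HOURLY"),
    annualize(8000, "MONTHLY"),
    annualize(2000, "WEEKLY"),
    annualize(120000, "YEARLY"),
]
```

Applying the same function to min_salary, med_salary, and max_salary puts all three on one annual scale before comparison.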

jobs/job_skills.csv — 2 Columns

Maps job postings to skill categories. One row per job–skill pair; a single posting can have many skill assignments (213,768 rows total for 123,849 postings).

| Column | Type | Description |
| --- | --- | --- |
| job_id | int64 | Foreign key → postings.csv |
| skill_abr | object | Abbreviated skill code, decoded via mappings/skills.csv |

The skill_abr field stores short codes such as IT, DATA, MRKT, DSGN, WRT, and 30 others. These are not human-readable without the mapping table.

Always join jobs/job_skills.csv with mappings/skills.csv on skill_abr before visualizing or aggregating skill demand. The raw abbreviations (e.g. ITSM, MNGT, SALE) are opaque and will produce misleading axis labels in charts if used directly. The mappings/skills.csv lookup resolves each code to a full category name like “Information Technology”, “Management”, or “Sales”.
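
The join described above can be sketched as follows, assuming pandas. The sample rows are invented, and the label column name `skill_name` is an assumption, since this page does not list the columns of mappings/skills.csv.

```python
import pandas as pd

# Hypothetical samples standing in for jobs/job_skills.csv and mappings/skills.csv.
job_skills = pd.DataFrame({
    "job_id": [1, 1, 2],
    "skill_abr": ["ITSM", "SALE", "ITSM"],
})
skills = pd.DataFrame({
    "skill_abr": ["ITSM", "MNGT", "SALE"],
    # ASSUMPTION: the readable label lives in a column called skill_name.
    "skill_name": ["Information Technology", "Management", "Sales"],
})

# Decode each abbreviation; validate raises if a code appears twice in the lookup.
labeled = job_skills.merge(skills, on="skill_abr", how="left", validate="many_to_one")

# Count postings per readable skill category, ready for plotting.
skill_demand = labeled["skill_name"].value_counts()
```

A left join is used so that any code missing from the lookup shows up as a null label instead of silently dropping rows.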

jobs/benefits.csv — 3 Columns

Lists benefits associated with each posting. One row per job–benefit pair (67,943 rows). Links to postings.csv via job_id.

| Column | Type | Description |
| --- | --- | --- |
| job_id | int64 | Foreign key → postings.csv |
| inferred | bool | True if benefit was inferred from description text; False if explicitly listed |
| type | object | Benefit category label |

Common values for type include: Medical insurance, Vision insurance, Dental insurance, 401(K), Paid maternity leave, Paid paternity leave, Disability insurance, Student loan assistance, Commuter benefits, Tuition assistance.
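
Since the file holds one row per job–benefit pair, it typically has to be collapsed to one row per posting before it can sit alongside postings.csv. The sketch below assumes pandas and uses invented sample rows; it shows a per-job count, a 0/1 indicator matrix, and a filter for explicitly listed benefits.

```python
import pandas as pd

# Hypothetical sample standing in for jobs/benefits.csv.
benefits = pd.DataFrame({
    "job_id": [1, 1, 2],
    "inferred": [False, True, False],
    "type": ["Medical insurance", "401(K)", "Dental insurance"],
})

# Keep only benefits that were explicitly listed rather than inferred from text.
explicit = benefits[~benefits["inferred"]]

# Number of distinct benefits per posting.
benefit_counts = benefits.groupby("job_id")["type"].nunique().rename("n_benefits")

# One row per posting, one 0/1 column per benefit type.
wide = pd.crosstab(benefits["job_id"], benefits["type"]).clip(upper=1)
```

Whether to include inferred benefits is an analysis choice; filtering on `inferred` first keeps that decision visible.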

companies/company_industries.csv and jobs/job_industries.csv

Both files share the same 2-column structure:

| Column | Type | Description |
| --- | --- | --- |
| company_id / job_id | int64 | Foreign key to the parent table |
| industry_id | int64 | Foreign key → mappings/industries.csv |

mappings/industries.csv resolves each industry_id to a human-readable industry_name string across 422 industry categories.

Primary / Foreign Key Relationships

The table below summarizes every join path available in the dataset:

| Left Table | Join Key | Right Table | Relationship |
| --- | --- | --- | --- |
| postings.csv | job_id | jobs/salaries.csv | 1:1 (some postings have no salary row) |
| postings.csv | job_id | jobs/job_skills.csv | 1:N |
| postings.csv | job_id | jobs/benefits.csv | 1:N |
| postings.csv | job_id | jobs/job_industries.csv | 1:N |
| postings.csv | company_id | companies/companies.csv | N:1 |
| companies/companies.csv | company_id | companies/company_industries.csv | 1:N |
| companies/companies.csv | company_id | companies/company_specialities.csv | 1:N |
| companies/companies.csv | company_id | companies/employee_counts.csv | 1:N |
| jobs/job_skills.csv | skill_abr | mappings/skills.csv | N:1 |
| jobs/job_industries.csv | industry_id | mappings/industries.csv | N:1 |
| companies/company_industries.csv | industry_id | mappings/industries.csv | N:1 |
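
The 1:N paths are where joins tend to go wrong: merging a satellite table straight onto postings.csv repeats each posting once per child row, which inflates counts and averages. A minimal sketch of the aggregate-then-join pattern, assuming pandas and invented sample rows:

```python
import pandas as pd

# Hypothetical samples standing in for postings.csv and jobs/job_skills.csv.
postings = pd.DataFrame({
    "job_id": [1, 2, 3],
    "title": ["Analyst", "Engineer", "Writer"],
})
job_skills = pd.DataFrame({
    "job_id": [1, 1, 2],
    "skill_abr": ["IT", "DATA", "IT"],
})

# Naive 1:N join: posting 1 appears twice, so the result has more rows than postings.
naive = postings.merge(job_skills, on="job_id", how="left")

# Aggregate the satellite table to one row per job_id first...
skills_per_job = (
    job_skills.groupby("job_id")["skill_abr"]
    .agg(lambda codes: ",".join(sorted(codes)))
    .rename("skills")
    .reset_index()
)

# ...then join; validate raises if either side still has duplicate job_id values.
flat = postings.merge(skills_per_job, on="job_id", how="left", validate="one_to_one")
```

The `validate` argument is a cheap guard: it turns an accidental fan-out into an immediate error instead of a silently larger table.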
