Full Dataset Schema: Tables, Columns, and Data Types

The 11 CSV files in the LinkedIn Job Postings dataset are linked through a consistent set of foreign keys: job_id ties every job-level satellite table back to postings.csv, and company_id ties every company-level satellite table back to companies/companies.csv. Two lookup tables in mappings/ translate the skill abbreviations and numeric industry IDs used throughout the relational tables into human-readable labels. Together, these files form a normalized relational structure that must be joined and aggregated before any analysis can begin. This page documents the columns of the core files, including data types and observed null rates, along with every join path that connects the tables.

`postings.csv` — 31 Columns

The main fact table. Each row represents a single job posting as it appeared on LinkedIn. Its columns fall into two groups: 16 numeric and 15 categorical.

Numeric Columns (16)

Column	Type	Description
`job_id`	`int64`	Primary key — unique identifier for the posting
`max_salary`	`float64`	Upper bound of the advertised salary range
`min_salary`	`float64`	Lower bound of the advertised salary range
`med_salary`	`float64`	Midpoint salary (rarely populated)
`normalized_salary`	`float64`	Platform-normalized annual salary estimate
`company_id`	`float64`	Foreign key → `companies/companies.csv`
`views`	`float64`	Number of times the posting was viewed
`applies`	`float64`	Number of applications submitted (Easy Apply only)
`remote_allowed`	`float64`	Binary flag: 1 = remote permitted, 0 = not (87.7% null)
`original_listed_time`	`int64`	Unix timestamp of original listing date
`listed_time`	`int64`	Unix timestamp of most recent listing refresh
`expiry`	`float64`	Unix timestamp of posting expiry
`closed_time`	`float64`	Unix timestamp when posting was closed (99.1% null)
`sponsored`	`int64`	Binary flag: 1 = sponsored/promoted listing
`zip_code`	`float64`	ZIP code of the posting location
`fips`	`float64`	FIPS county code
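The four timestamp columns (`original_listed_time`, `listed_time`, `expiry`, `closed_time`) hold raw epoch numbers that must be converted before they can be used for date arithmetic. Below is a minimal sketch of that conversion. This page does not say whether the values are in seconds or milliseconds, so the helper guesses from magnitude. That threshold is a heuristic assumption, and you should check it against a few sample rows. The names `epoch_to_utc` and `days_between` are hypothetical helpers.

```python
import math
from datetime import datetime, timezone

# Heuristic threshold (assumption): 1e11 seconds lies far beyond the year 5000,
# while millisecond timestamps from recent years are around 1.7e12.
_MS_THRESHOLD = 1e11


def epoch_to_utc(value):
    """Convert an epoch value (seconds or milliseconds) to an aware UTC datetime.

    Accepts ints, floats, or CSV strings such as "1700000000000.0".
    Returns None for missing values (None, empty string, or NaN).
    """
    if value is None or value == "":
        return None
    ts = float(value)
    if math.isnan(ts):
        return None
    if abs(ts) > _MS_THRESHOLD:
        ts /= 1000.0  # treat large magnitudes as milliseconds
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def days_between(start, end):
    """Days from `start` to `end` (e.g. listed_time -> expiry), or None."""
    a, b = epoch_to_utc(start), epoch_to_utc(end)
    if a is None or b is None:
        return None
    return (b - a).total_seconds() / 86400


print(epoch_to_utc(1700000000))  # 2023-11-14 22:13:20+00:00
```

Returning `None` instead of raising keeps sparse columns such as `closed_time` usable in bulk conversions.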

Categorical Columns (15)

Column	Type	Description
`company_name`	`object`	Employer name as it appears on LinkedIn
`title`	`object`	Job title as posted
`description`	`object`	Full text of the job description
`pay_period`	`object`	Salary pay period: HOURLY, MONTHLY, WEEKLY, YEARLY
`location`	`object`	Freeform location string (city, state, country)
`formatted_work_type`	`object`	Standardized work arrangement: On-site / Remote / Hybrid
`work_type`	`object`	Raw work type code
`formatted_experience_level`	`object`	Standardized seniority: Entry / Mid-Senior / Director / etc.
`application_type`	`object`	How candidates apply: OffSiteApply or SimpleOnsiteApply
`application_url`	`object`	External application URL (for off-site applications)
`job_posting_url`	`object`	Direct URL to the LinkedIn posting
`skills_desc`	`object`	Free-text skills description embedded in the posting (98% null)
`posting_domain`	`object`	Domain of the external application URL
`currency`	`object`	Currency code for salary values (e.g. USD)
`compensation_type`	`object`	Compensation structure: BASE_SALARY, FIXED, etc.

Missing Value Rates — `postings.csv`

Column	Null Rate	Notes
`closed_time`	99.1%	Practically unusable — dropped in preprocessing
`skills_desc`	98.0%	Use `jobs/job_skills.csv` instead
`med_salary`	94.9%	Low coverage; `min`/`max` range preferred
`remote_allowed`	87.7%	Companies rarely disclose; use `formatted_work_type`
`applies`	80.6%	Only Easy Apply submissions are tracked
`min_salary` / `max_salary`	~75%	Only 24% of postings include a salary range
`formatted_experience_level`	23.7%	Imputed with mode (“Mid-Senior Level”) in Phase 2
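A null rate is the share of rows where a field is empty. The minimal standard-library sketch below shows how such rates can be computed and applies the mode imputation described above for `formatted_experience_level`. It assumes the rows come from `csv.DictReader`, which yields empty strings for blank fields. The helper names are hypothetical, and the toy rows are made up rather than drawn from the dataset.

```python
import csv
import io
from collections import Counter


def is_missing(value):
    """csv.DictReader yields '' for blank fields and None for absent ones."""
    return value is None or value == ""


def null_rates(rows, columns):
    """Return {column: fraction of rows where the value is missing}."""
    n = len(rows)
    return {
        col: (sum(is_missing(r.get(col)) for r in rows) / n if n else 0.0)
        for col in columns
    }


def impute_mode(rows, column):
    """Fill missing values in `column` with its most common observed value."""
    observed = [r[column] for r in rows if not is_missing(r.get(column))]
    if observed:
        mode = Counter(observed).most_common(1)[0][0]
        for r in rows:
            if is_missing(r.get(column)):
                r[column] = mode
    return rows


# Made-up rows standing in for postings.csv.
sample = io.StringIO(
    "job_id,formatted_experience_level,med_salary\n"
    "1,Mid-Senior Level,\n"
    "2,,\n"
    "3,Mid-Senior Level,50000\n"
    "4,Entry Level,\n"
)
rows = list(csv.DictReader(sample))
print(null_rates(rows, ["formatted_experience_level", "med_salary"]))
# {'formatted_experience_level': 0.25, 'med_salary': 0.75}
impute_mode(rows, "formatted_experience_level")
```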

`companies/companies.csv` — 10 Columns

One row per company. Links to postings.csv via company_id.

Column	Type	Description
`company_id`	`int64`	Primary key
`name`	`object`	Company display name
`description`	`object`	Company summary text
`company_size`	`float64`	Encoded headcount bucket (1 = 1–10 employees … 7 = 10,001+)
`state`	`object`	US state abbreviation
`country`	`object`	ISO country code
`city`	`object`	City name
`zip_code`	`object`	Postal code
`address`	`object`	Street address
`url`	`object`	Company LinkedIn profile URL

`jobs/salaries.csv` — 8 Columns

Salary data scraped from individual postings. The file contains 40,785 rows, a superset of the postings that carry salary data in postings.csv itself. Links to postings.csv via job_id.

Column	Type	Description
`salary_id`	`int64`	Primary key for this salary record
`job_id`	`int64`	Foreign key → `postings.csv`
`max_salary`	`float64`	Upper bound of the advertised range
`med_salary`	`float64`	Midpoint salary
`min_salary`	`float64`	Lower bound of the advertised range
`pay_period`	`object`	Pay cadence: HOURLY, MONTHLY, WEEKLY, or YEARLY
`currency`	`object`	Currency code (predominantly `USD`)
`compensation_type`	`object`	Compensation structure: BASE_SALARY, FIXED, etc.

The pay_period column is critical for salary normalization. All four values appear in the data:

`pay_period` Value	Multiplier to Annual	Example Raw Value → Annual
`HOURLY`	× 2,080 (40h × 52w)	$45/hr → $93,600
`MONTHLY`	× 12	$8,000/mo → $96,000
`WEEKLY`	× 52	$2,000/wk → $104,000
`YEARLY`	× 1 (no change)	$120,000 → $120,000
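The multiplier table translates directly into a lookup. This is a minimal sketch, not the logic behind the dataset's own `normalized_salary` column, which this page does not describe. The names `annualize` and `annual_range` are hypothetical helpers. An unrecognized pay period returns `None` rather than a guessed multiplier.

```python
import math

# Multipliers from the table above; HOURLY assumes 40 h x 52 weeks.
ANNUAL_MULTIPLIER = {"HOURLY": 2080, "WEEKLY": 52, "MONTHLY": 12, "YEARLY": 1}


def annualize(amount, pay_period):
    """Convert a raw salary figure to an annual equivalent.

    Returns None when the amount is missing/NaN or the pay period is not one
    of the four documented values, instead of guessing a multiplier.
    """
    if amount is None or amount == "":
        return None
    value = float(amount)
    if math.isnan(value):
        return None
    multiplier = ANNUAL_MULTIPLIER.get(str(pay_period).strip().upper())
    if multiplier is None:
        return None
    return value * multiplier


def annual_range(min_salary, max_salary, pay_period):
    """Annualize both ends of an advertised range as (low, high)."""
    return annualize(min_salary, pay_period), annualize(max_salary, pay_period)


print(annualize(45, "HOURLY"))       # 93600.0
print(annualize("8000", "MONTHLY"))  # 96000.0
```

Apply the same multiplier to `min_salary` and `max_salary` so that an annualized range stays internally consistent.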

`jobs/job_skills.csv` — 2 Columns

Maps job postings to skill categories. One row per job–skill pair; a single posting can have many skill assignments (213,768 rows total for 123,849 postings).

Column	Type	Description
`job_id`	`int64`	Foreign key → `postings.csv`
`skill_abr`	`object`	Abbreviated skill code — decoded via `mappings/skills.csv`

The skill_abr field stores short codes such as IT, DATA, MRKT, DSGN, WRT, and 30 others. These are not human-readable without the mapping table.

Always join jobs/job_skills.csv with mappings/skills.csv on skill_abr before visualizing or aggregating skill demand. The raw abbreviations (e.g. ITSM, MNGT, SALE) are opaque and will produce misleading axis labels in charts if used directly. The mappings/skills.csv lookup resolves each code to a full category name like “Information Technology”, “Management”, or “Sales”.
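A minimal standard-library sketch of that join-then-aggregate step follows. It assumes the full-name column in mappings/skills.csv is called `skill_name`, because this page does not name it. `load_skill_names` and `skill_demand` are hypothetical helpers, and the toy rows are made up. Any code missing from the mapping is kept under its raw abbreviation, so gaps in the lookup stay visible instead of silently dropping rows.

```python
import csv
import io
from collections import Counter


def load_skill_names(mapping_file):
    """Build {skill_abr: full name}; assumes a `skill_name` header (unverified)."""
    return {r["skill_abr"]: r["skill_name"] for r in csv.DictReader(mapping_file)}


def skill_demand(job_skills_file, skill_names):
    """Count job-skill rows per decoded category name.

    With one row per job-skill pair, this equals postings per skill as long
    as no pair is duplicated. Unmapped codes fall back to the raw abbreviation.
    """
    counts = Counter()
    for r in csv.DictReader(job_skills_file):
        abr = r["skill_abr"]
        counts[skill_names.get(abr, abr)] += 1
    return counts


# Made-up inputs using the documented job_skills headers.
names = load_skill_names(io.StringIO(
    "skill_abr,skill_name\nIT,Information Technology\nSALE,Sales\n"))
job_skills = io.StringIO("job_id,skill_abr\n1,IT\n1,SALE\n2,IT\n3,XYZ\n")
print(skill_demand(job_skills, names).most_common())
# [('Information Technology', 2), ('Sales', 1), ('XYZ', 1)]
```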

`jobs/benefits.csv` — 3 Columns

Lists benefits associated with each posting. One row per job–benefit pair (67,943 rows). Links to postings.csv via job_id.

Column	Type	Description
`job_id`	`int64`	Foreign key → `postings.csv`
`inferred`	`bool`	`True` if benefit was inferred from description text; `False` if explicitly listed
`type`	`object`	Benefit category label

Common values for type include: Medical insurance, Vision insurance, Dental insurance, 401(K), Paid maternity leave, Paid paternity leave, Disability insurance, Student loan assistance, Commuter benefits, Tuition assistance.
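The `inferred` flag separates benefits that were listed explicitly from those derived from description text, so it is often worth reporting the two separately. A minimal sketch follows. This page does not say how the boolean is serialized in the CSV, so `parse_flag` assumes either True/False or 1/0 and rejects anything else. The helper names and toy rows are hypothetical.

```python
import csv
import io
from collections import Counter, defaultdict


def parse_flag(text):
    """Interpret a CSV boolean; accepts True/False and 1/0 spellings (assumed)."""
    value = text.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"unrecognized boolean: {text!r}")


def benefit_counts(benefits_file):
    """Return {benefit type: Counter({'explicit': n, 'inferred': m})}."""
    result = defaultdict(Counter)
    for r in csv.DictReader(benefits_file):
        source = "inferred" if parse_flag(r["inferred"]) else "explicit"
        result[r["type"]][source] += 1
    return result


# Made-up rows using the documented benefits.csv headers.
sample = io.StringIO(
    "job_id,inferred,type\n"
    "1,0,Medical insurance\n"
    "1,1,Dental insurance\n"
    "2,1,Medical insurance\n"
    "3,False,401(K)\n"
)
counts = benefit_counts(sample)
print(dict(counts["Medical insurance"]))  # {'explicit': 1, 'inferred': 1}
```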

`companies/company_industries.csv` and `jobs/job_industries.csv`

Both files share the same 2-column structure:

Column	Type	Description
`company_id` / `job_id`	`int64`	Foreign key to the parent table
`industry_id`	`int64`	Foreign key → `mappings/industries.csv`

mappings/industries.csv resolves each industry_id to a human-readable industry_name string across 422 industry categories.
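Both link tables share one structure, so a single helper can decode either of them. The minimal sketch below takes its column names from the description above. The helper names are hypothetical, and the ids and industry names in the example are made up rather than taken from the real mapping.

```python
import csv
import io
from collections import defaultdict


def load_industry_names(mapping_file):
    """Build {industry_id: industry_name} from mappings/industries.csv."""
    return {int(r["industry_id"]): r["industry_name"] for r in csv.DictReader(mapping_file)}


def industries_by_parent(link_file, key, names):
    """Group decoded industry names by parent id.

    Pass key="company_id" for companies/company_industries.csv or
    key="job_id" for jobs/job_industries.csv. Ids missing from the mapping
    are kept as "<unknown:ID>" so they remain visible.
    """
    grouped = defaultdict(list)
    for r in csv.DictReader(link_file):
        industry_id = int(r["industry_id"])
        grouped[int(r[key])].append(names.get(industry_id, f"<unknown:{industry_id}>"))
    return dict(grouped)


# Made-up ids and names, not values from the real mapping.
names = load_industry_names(io.StringIO(
    "industry_id,industry_name\n10,Example Industry A\n20,Example Industry B\n"))
links = io.StringIO("job_id,industry_id\n1,10\n1,20\n2,99\n")
print(industries_by_parent(links, "job_id", names))
# {1: ['Example Industry A', 'Example Industry B'], 2: ['<unknown:99>']}
```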

Primary / Foreign Key Relationships

The table below summarizes every join path available in the dataset:

Left Table	Join Key	Right Table	Relationship
`postings.csv`	`job_id`	`jobs/salaries.csv`	1:0..1 (at most one salary row per posting; some postings have none)
`postings.csv`	`job_id`	`jobs/job_skills.csv`	1:N
`postings.csv`	`job_id`	`jobs/benefits.csv`	1:N
`postings.csv`	`job_id`	`jobs/job_industries.csv`	1:N
`postings.csv`	`company_id`	`companies/companies.csv`	N:1
`companies/companies.csv`	`company_id`	`companies/company_industries.csv`	1:N
`companies/companies.csv`	`company_id`	`companies/company_specialities.csv`	1:N
`companies/companies.csv`	`company_id`	`companies/employee_counts.csv`	1:N
`jobs/job_skills.csv`	`skill_abr`	`mappings/skills.csv`	N:1
`jobs/job_industries.csv`	`industry_id`	`mappings/industries.csv`	N:1
`companies/company_industries.csv`	`industry_id`	`mappings/industries.csv`	N:1
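These cardinalities determine which side of each join must have unique keys. The minimal standard-library sketch below covers the N:1 join from `postings.csv` to `companies/companies.csv`. It first checks that `company_id` is unique on the company side. It then normalizes ids before matching, because `company_id` is typed `float64` in postings but `int64` in companies. Whether the CSV text actually stores ids with a trailing `.0` is an assumption, and the normalization handles both forms. Helper names and toy rows are hypothetical. In pandas, `DataFrame.merge(..., validate="many_to_one")` performs a comparable uniqueness check.

```python
import csv
import io


def to_int_key(text):
    """Normalize an id such as '1016' or '1016.0' to int; blank becomes None.

    Converting through float is exact for integers below 2**53.
    """
    if text is None or text.strip() == "":
        return None
    return int(float(text))


def index_unique(rows, key):
    """Index rows by normalized key, raising if a key repeats.

    This enforces the "1" side of an N:1 relationship before joining.
    """
    index = {}
    for row in rows:
        k = to_int_key(row[key])
        if k is None:
            continue  # rows without a key cannot be matched anyway
        if k in index:
            raise ValueError(f"duplicate {key}: {k}")
        index[k] = row
    return index


def left_join(left_rows, right_index, key):
    """Keep every left row; pair it with its match or None."""
    return [(row, right_index.get(to_int_key(row[key]))) for row in left_rows]


# Made-up rows using the documented key columns.
companies = list(csv.DictReader(io.StringIO(
    "company_id,name\n1016,Example Co\n2000,Other Co\n")))
postings = list(csv.DictReader(io.StringIO(
    "job_id,company_id\n1,1016.0\n2,\n3,2000.0\n")))
joined = left_join(postings, index_unique(companies, "company_id"), "company_id")
print([(p["job_id"], c["name"] if c else None) for p, c in joined])
# [('1', 'Example Co'), ('2', None), ('3', 'Other Co')]
```

A left join keeps postings without a `company_id` rather than dropping them, which matters when null rates are reported afterwards.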
