Each job returned by scrape_jobs() is represented internally as a JobPost Pydantic model. When the output DataFrame is built, the model is flattened into one row per job: the nested location object becomes a single display string, and the nested compensation object is expanded into individual columns.
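As a rough illustration of that flattening, here is a minimal sketch that uses standard-library dataclasses as stand-ins for the Pydantic models. The Location attribute names (city, state, country), the display() helper, and the flatten() function are assumptions made for this example and are not JobSpy's actual implementation; only the output column names are taken from this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Location:
    # Hypothetical stand-in: these attribute names are assumptions.
    city: Optional[str] = None
    state: Optional[str] = None
    country: Optional[str] = None

    def display(self) -> str:
        # Join whichever parts are present into one display string.
        return ", ".join(p for p in (self.city, self.state, self.country) if p)


@dataclass
class Compensation:
    interval: Optional[str] = None
    min_amount: Optional[float] = None
    max_amount: Optional[float] = None
    currency: Optional[str] = None


@dataclass
class JobPost:
    # Only a handful of the documented fields, to keep the sketch short.
    title: str
    job_url: str
    company_name: Optional[str] = None
    location: Optional[Location] = None
    compensation: Optional[Compensation] = None


def flatten(job: JobPost) -> Dict[str, Any]:
    """Turn one nested JobPost into a flat row keyed by DataFrame column name."""
    comp = job.compensation or Compensation()  # missing compensation -> all None
    return {
        "title": job.title,
        "job_url": job.job_url,
        "company": job.company_name,  # company_name is renamed to company
        "location": job.location.display() if job.location else None,
        "interval": comp.interval,
        "min_amount": comp.min_amount,
        "max_amount": comp.max_amount,
        "currency": comp.currency,
    }


row = flatten(
    JobPost(
        title="Data Engineer",
        job_url="https://example.com/jobs/1",
        company_name="Acme",
        location=Location("San Francisco", "CA", "USA"),
        compensation=Compensation("yearly", 100000, 150000, "USD"),
    )
)
```

A list of such rows can be passed to pandas.DataFrame() to obtain a table shaped like the one scrape_jobs() returns.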

JobPost fields

id
str | None
Platform-specific job identifier. May be None for some job boards.
title
str
required
Job title as listed by the employer.
company_name
str | None
Name of the hiring company. Maps to the company column in the DataFrame.
job_url
str
required
Canonical URL of the job listing on the job board.
job_url_direct
str | None
Direct URL to the employer’s own job application page, when available. For LinkedIn, this requires linkedin_fetch_description=True.
location
Location | None
Structured location of the job. Flattened to a display string in the DataFrame (e.g., "San Francisco, CA, USA").
description
str | None
Full job description. The format is controlled by the description_format parameter passed to scrape_jobs() — one of "markdown", "html", or "plain".
company_url
str | None
URL of the company’s profile page on the job board.
company_url_direct
str | None
URL of the company’s own website, when available.
job_type
list[JobType] | None
Employment type(s) associated with the posting (e.g., FULL_TIME, CONTRACT). In the DataFrame, this is serialized as a comma-separated string of the primary values (e.g., "fulltime"). See JobType.
compensation
Compensation | None
Salary or compensation data. In the DataFrame, this object is expanded into four separate columns: interval, min_amount, max_amount, and currency.
date_posted
date | None
The date the job was posted, as a Python datetime.date object.
emails
list[str] | None
Email addresses extracted from the job description. In the DataFrame, serialized as a comma-separated string.
is_remote
bool | None
Whether the job is remote. True for remote, False for on-site, None if not specified.
listing_type
str | None
The type of listing (e.g., organic vs. sponsored), when available from the job board.
job_level
str | None
Seniority or experience level (e.g., "Entry level", "Mid-Senior level"). LinkedIn-specific.
company_industry
str | None
The industry the company operates in. Available from LinkedIn and Indeed.
company_addresses
str | None
Physical address(es) of the company. Indeed-specific.
company_num_employees
str | None
Company headcount range as a string (e.g., "1001-5000 employees"). Indeed-specific.
company_revenue
str | None
Company annual revenue as a string (e.g., "$1B to $5B"). Indeed-specific.
company_description
str | None
Short description of the company from its job board profile. Indeed-specific.
company_logo
str | None
URL of the company’s logo image. Indeed-specific.
banner_photo_url
str | None
URL of the company’s banner photo. Indeed-specific.
job_function
str | None
Functional area of the role (e.g., "Engineering", "Product Management"). LinkedIn-specific.
skills
list[str] | None
Skills and technologies listed in the posting. In the DataFrame, serialized as a comma-separated string. Naukri-specific.
experience_range
str | None
Required years of experience as a range string (e.g., "2-5 Yrs"). Naukri-specific.
company_rating
float | None
Aggregate company rating from AmbitionBox. Naukri-specific.
company_reviews_count
int | None
Number of company reviews on AmbitionBox. Naukri-specific.
vacancy_count
int | None
Number of open positions for this listing. Naukri-specific.
work_from_home_type
str | None
Work arrangement type (e.g., "Hybrid", "Remote", "Work from Office"). Naukri-specific.
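
Several list-valued fields above (job_type, emails, skills) arrive in the DataFrame as comma-separated strings. The following is a minimal sketch of turning such a cell back into a Python list; split_csv_field is a hypothetical helper written for this example and is not part of JobSpy. It assumes individual values contain no commas themselves.

```python
import math
from typing import Any, List


def split_csv_field(value: Any) -> List[str]:
    """Split a comma-separated DataFrame cell into a list of trimmed strings.

    Missing cells (None or a float NaN, which is how pandas usually marks
    missing values) yield an empty list.
    """
    if value is None:
        return []
    if isinstance(value, float) and math.isnan(value):
        return []
    # Split on commas, trim whitespace, and drop empty fragments.
    return [part.strip() for part in str(value).split(",") if part.strip()]
```

With a DataFrame named jobs, the helper could be applied per column, for example jobs["skills"].apply(split_csv_field).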

DataFrame column mapping

The table below maps each JobPost model field to its corresponding column name in the scrape_jobs() output DataFrame.
| Model field | DataFrame column | Notes |
| --- | --- | --- |
| id | id | |
| (added by scraper) | site | Source job board name |
| job_url | job_url | |
| job_url_direct | job_url_direct | |
| title | title | |
| company_name | company | Renamed in DataFrame |
| location | location | Flattened to display string |
| date_posted | date_posted | |
| job_type | job_type | Comma-separated string |
| (from compensation) | salary_source | "direct_data" or "description" |
| compensation.interval | interval | |
| compensation.min_amount | min_amount | |
| compensation.max_amount | max_amount | |
| compensation.currency | currency | |
| is_remote | is_remote | |
| job_level | job_level | LinkedIn only |
| job_function | job_function | LinkedIn only |
| listing_type | listing_type | |
| emails | emails | Comma-separated string |
| description | description | |
| company_industry | company_industry | LinkedIn & Indeed |
| company_url | company_url | |
| company_logo | company_logo | Indeed only |
| company_url_direct | company_url_direct | |
| company_addresses | company_addresses | Indeed only |
| company_num_employees | company_num_employees | Indeed only |
| company_revenue | company_revenue | Indeed only |
| company_description | company_description | Indeed only |
| skills | skills | Naukri only, comma-separated |
| experience_range | experience_range | Naukri only |
| company_rating | company_rating | Naukri only |
| company_reviews_count | company_reviews_count | Naukri only |
| vacancy_count | vacancy_count | Naukri only |
| work_from_home_type | work_from_home_type | Naukri only |
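
Because many columns are populated by only one or two job boards, it can help to know which ones will be empty for a given search. The sketch below transcribes the site-specific notes from the mapping above into a lookup; SITE_SPECIFIC_COLUMNS and unavailable_columns are hypothetical names introduced for this example, and the lowercase key "naukri" is assumed to follow the same convention as "indeed" and "linkedin".

```python
from typing import Dict, FrozenSet, Iterable, List

# Column -> job boards that populate it, transcribed from the notes in the
# column mapping. The "naukri" key is an assumption about the site name.
SITE_SPECIFIC_COLUMNS: Dict[str, FrozenSet[str]] = {
    "job_level": frozenset({"linkedin"}),
    "job_function": frozenset({"linkedin"}),
    "company_industry": frozenset({"linkedin", "indeed"}),
    "company_logo": frozenset({"indeed"}),
    "company_addresses": frozenset({"indeed"}),
    "company_num_employees": frozenset({"indeed"}),
    "company_revenue": frozenset({"indeed"}),
    "company_description": frozenset({"indeed"}),
    "skills": frozenset({"naukri"}),
    "experience_range": frozenset({"naukri"}),
    "company_rating": frozenset({"naukri"}),
    "company_reviews_count": frozenset({"naukri"}),
    "vacancy_count": frozenset({"naukri"}),
    "work_from_home_type": frozenset({"naukri"}),
}


def unavailable_columns(sites: Iterable[str]) -> List[str]:
    """Return site-specific columns that none of the given sites populate."""
    requested = {site.lower() for site in sites}
    return sorted(
        column
        for column, providers in SITE_SPECIFIC_COLUMNS.items()
        if not (providers & requested)
    )
```

One possible use is trimming the result of a search, for example jobs.drop(columns=unavailable_columns(["indeed", "linkedin"]), errors="ignore").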

Example

from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin"],
    search_term="data engineer",
    results_wanted=10,
)

# Access individual columns
print(jobs[["title", "company", "location", "min_amount", "max_amount", "currency"]])

# Filter remote jobs with salary data
remote_with_salary = jobs[
    (jobs["is_remote"] == True) & (jobs["min_amount"].notna())
]
