scrape_jobs() scrapes job data from one or more job boards concurrently and returns the results as a sorted pandas DataFrame.

Signature

import pandas as pd
from jobspy import scrape_jobs

jobs: pd.DataFrame = scrape_jobs(
    site_name=None,
    search_term=None,
    google_search_term=None,
    location=None,
    distance=50,
    is_remote=False,
    job_type=None,
    easy_apply=None,
    results_wanted=15,
    country_indeed="usa",
    proxies=None,
    ca_cert=None,
    description_format="markdown",
    linkedin_fetch_description=False,
    linkedin_company_ids=None,
    offset=0,
    hours_old=None,
    enforce_annual_salary=False,
    verbose=0,
    user_agent=None,
)

Parameters

site_name
str | list[str] | Site | list[Site] | None
default:"None"
The job board(s) to scrape. You can pass a single site as a string or Site enum, or a list of either. When None, all supported sites are scraped.
Accepted string values: "linkedin", "indeed", "zip_recruiter", "glassdoor", "google", "bayt", "naukri", "bdjobs".
# Single site
scrape_jobs(site_name="indeed")

# Multiple sites
scrape_jobs(site_name=["indeed", "linkedin", "zip_recruiter"])

# Using the enum
from jobspy.model import Site
scrape_jobs(site_name=[Site.INDEED, Site.LINKEDIN])
search_term
str | None
default:"None"
The job title or keyword to search for. Used by all sites except Google (which uses google_search_term instead).
scrape_jobs(search_term="software engineer")
google_search_term
str | None
default:"None"
The search query used exclusively for Google Jobs. Google requires a very specific query syntax, so apply your desired filters in the Google Jobs search box in your browser and copy the resulting query string directly.
scrape_jobs(
    site_name="google",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
)
The search_term parameter has no effect on Google searches. You must use google_search_term to filter Google results.
location
str | None
default:"None"
The geographic location to search within. Used by LinkedIn, Indeed, Glassdoor, and ZipRecruiter. For Indeed and Glassdoor, pair this with country_indeed to narrow results to a specific city or state.
scrape_jobs(location="San Francisco, CA")
distance
int | None
default:"50"
Search radius in miles from the specified location. Defaults to 50 miles.
is_remote
bool
default:"False"
When True, filters results to remote jobs only.
For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply. Combining them is not supported.
job_type
str | None
default:"None"
Filters results by employment type. Accepted values: "fulltime", "parttime", "contract", "temporary", "internship".
scrape_jobs(job_type="fulltime")
For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply.
easy_apply
bool | None
default:"None"
When True, filters for jobs that are hosted directly on the job board site (one-click apply). Note that LinkedIn Easy Apply filtering no longer works reliably.
For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply. For LinkedIn, you can only use one of: hours_old or easy_apply.
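The filter limits above can be checked before a search is launched. The following is a minimal sketch, not part of JobSpy: active_filter_groups and check_filters are hypothetical helper names, and treating easy_apply=False as "not set" is an assumption.

```python
from typing import Optional


def active_filter_groups(
    hours_old: Optional[int] = None,
    job_type: Optional[str] = None,
    is_remote: bool = False,
    easy_apply: Optional[bool] = None,
) -> list[str]:
    """Return the names of the mutually exclusive filter groups that are set."""
    groups = []
    if hours_old is not None:
        groups.append("hours_old")
    if job_type is not None or is_remote:
        groups.append("job_type + is_remote")
    if easy_apply:  # assumption: only easy_apply=True counts as a filter
        groups.append("easy_apply")
    return groups


def check_filters(site: str, **filters) -> None:
    """Raise ValueError if a site is given more than one exclusive group."""
    groups = active_filter_groups(**filters)
    if site == "linkedin":
        # For LinkedIn, only hours_old and easy_apply exclude each other.
        groups = [g for g in groups if g != "job_type + is_remote"]
    elif site != "indeed":
        return  # the documented limits cover only Indeed and LinkedIn
    if len(groups) > 1:
        raise ValueError(f"{site}: use only one of {groups}")


check_filters("indeed", job_type="fulltime", is_remote=True)  # one group: passes
```

Calling check_filters("indeed", hours_old=72, easy_apply=True) raises ValueError, because two groups are active for Indeed.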
results_wanted
int
default:"15"
Number of job results to retrieve per site. For example, if you pass results_wanted=20 with three sites, you may receive up to 60 total results.
All job board endpoints are capped at approximately 1,000 jobs per search, regardless of this value.
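As a rough illustration of the per-site arithmetic, the sketch below computes the largest number of rows a call could return. max_total_results is a hypothetical helper, and PER_SITE_CAP treats the approximate 1,000-job limit as exact, which is an assumption.

```python
PER_SITE_CAP = 1000  # approximate cap stated above, treated as exact here


def max_total_results(results_wanted: int, num_sites: int) -> int:
    """Upper bound on rows returned across all requested sites."""
    per_site = min(results_wanted, PER_SITE_CAP)
    return per_site * num_sites


print(max_total_results(20, 3))    # 60
print(max_total_results(5000, 2))  # 2000
```

Actual counts can be lower, since a site may have fewer matching jobs than requested.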
country_indeed
str
default:"usa"
The country to search on Indeed and Glassdoor. Pass the country name as a string (e.g., "usa", "uk", "canada", "germany"). See the full list of supported countries.
scrape_jobs(site_name="indeed", country_indeed="uk")
LinkedIn searches globally and ignores this parameter. ZipRecruiter only searches the US and Canada.
proxies
list[str] | str | None
default:"None"
One or more proxy addresses. Each scraper rotates through the list in a round-robin fashion. Supports HTTP, HTTPS, and SOCKS5 proxies.
scrape_jobs(
    proxies=[
        "user:pass@host:port",
        "208.195.175.46:65095",
        "localhost",
    ]
)
Proxies are essential for LinkedIn, which rate-limits aggressively (typically after the 10th page per IP).
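Round-robin rotation means each request takes the next proxy in the list and wraps back to the first after the last. The sketch below shows the idea with itertools.cycle; it is not JobSpy's internal code, and the variable names are illustrative.

```python
from itertools import cycle

proxies = ["user:pass@host:port", "208.195.175.46:65095", "localhost"]
rotation = cycle(proxies)

# Five consecutive requests: the fourth wraps around to the first proxy.
assigned = [next(rotation) for _ in range(5)]
for number, proxy in enumerate(assigned, start=1):
    print(f"request {number} -> {proxy}")
```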
ca_cert
str | None
default:"None"
Path to a CA certificate file, used to verify SSL connections when routing through a proxy.
scrape_jobs(ca_cert="/path/to/ca-bundle.crt")
description_format
str
default:"markdown"
The format in which job descriptions are returned. Accepted values: "markdown", "html", "plain".
scrape_jobs(description_format="html")
linkedin_fetch_description
bool | None
default:"False"
When True, fetches the full job description and direct job URL for each LinkedIn result. This makes one additional HTTP request per job, so a search that returns n jobs issues n extra requests and runs significantly slower.
linkedin_company_ids
list[int] | None
default:"None"
Restricts the LinkedIn search to jobs posted by specific companies, identified by their LinkedIn company IDs.
scrape_jobs(
    site_name="linkedin",
    search_term="engineer",
    linkedin_company_ids=[1441, 9441],
)
offset
int | None
default:"0"
Starts the search from a specific result index. For example, an offset of 25 skips the first 25 results and begins from result 26. Useful for paginating through large result sets.
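One way to paginate is to step offset by the page size and request one page per call. The sketch below only computes the offsets. page_offsets is a hypothetical helper, and the scrape_jobs call is shown as a comment because it needs network access.

```python
def page_offsets(total_wanted: int, page_size: int) -> list[int]:
    """Offsets that cover total_wanted results in chunks of page_size."""
    return list(range(0, total_wanted, page_size))


for offset in page_offsets(total_wanted=75, page_size=25):
    # A real run would call, for example:
    #   scrape_jobs(site_name="indeed", search_term="software engineer",
    #               results_wanted=25, offset=offset)
    print(f"offset={offset}: results {offset + 1} to {offset + 25}")
```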
hours_old
int | None
default:"None"
Filters jobs by how recently they were posted, in hours. For example, hours_old=72 returns jobs posted within the last 3 days.
ZipRecruiter and Glassdoor round up to the next full day, so results may include jobs slightly older than the specified threshold.
For Indeed, you can only use one of: hours_old, job_type + is_remote, or easy_apply. For LinkedIn, you can only use one of: hours_old or easy_apply.
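To see the effect of rounding up to whole days, the sketch below computes the window a day-granular site would effectively cover. effective_hours is a hypothetical helper, and the exact rounding inside each scraper is assumed rather than taken from the library.

```python
import math


def effective_hours(hours_old: int) -> int:
    """Hours covered when the threshold is rounded up to whole days."""
    days = math.ceil(hours_old / 24)
    return days * 24


print(effective_hours(30))  # 48: a 30-hour window becomes 2 full days
print(effective_hours(72))  # 72: already a whole number of days
```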
enforce_annual_salary
bool
default:"False"
When True, converts all salary values to annual equivalents. Hourly wages are multiplied by 2,080, monthly by 12, weekly by 52, and daily by 260.
scrape_jobs(enforce_annual_salary=True)
# hourly $50 → yearly $104,000
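The multipliers above can be written out as a small lookup table. This is a sketch of the arithmetic only, not the library's implementation; to_annual and the interval labels used as dictionary keys are assumptions.

```python
ANNUAL_MULTIPLIERS = {
    "hourly": 2080,  # 40 hours x 52 weeks
    "daily": 260,    # 5 days x 52 weeks
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def to_annual(amount: float, interval: str) -> float:
    """Convert a salary amount to its annual equivalent."""
    return amount * ANNUAL_MULTIPLIERS[interval]


print(to_annual(50, "hourly"))     # 104000
print(to_annual(4000, "monthly"))  # 48000
```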
verbose
int
default:"0"
Controls how much logging output is printed at runtime.
Value  Level
0      Errors only
1      Errors and warnings
2      All logs (info, warnings, errors)
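The three values correspond naturally to standard Python logging levels. The mapping below is an illustrative sketch. level_for is a hypothetical helper, and falling back to INFO for unlisted values is an assumption.

```python
import logging

VERBOSE_TO_LEVEL = {
    0: logging.ERROR,    # errors only
    1: logging.WARNING,  # errors and warnings
    2: logging.INFO,     # info, warnings, and errors
}


def level_for(verbose: int) -> int:
    """Translate a verbose value into a logging level."""
    return VERBOSE_TO_LEVEL.get(verbose, logging.INFO)  # assumed fallback


logger = logging.getLogger("verbose_demo")
logger.setLevel(level_for(1))
print(logger.isEnabledFor(logging.WARNING))  # True
print(logger.isEnabledFor(logging.INFO))     # False
```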
user_agent
str | None
default:"None"
Overrides the default HTTP User-Agent string used when making requests. Useful if the default agent has been blocked or is outdated.
scrape_jobs(user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...")

Return value

Returns a pd.DataFrame sorted by site ascending and date_posted descending. Each row represents a single job posting. If no jobs are found across all specified sites, an empty pd.DataFrame() is returned. The DataFrame columns are ordered as follows:
[
    "id",
    "site",
    "job_url",
    "job_url_direct",
    "title",
    "company",
    "location",
    "date_posted",
    "job_type",
    "salary_source",
    "interval",
    "min_amount",
    "max_amount",
    "currency",
    "is_remote",
    "job_level",
    "job_function",
    "listing_type",
    "emails",
    "description",
    "company_industry",
    "company_url",
    "company_logo",
    "company_url_direct",
    "company_addresses",
    "company_num_employees",
    "company_revenue",
    "company_description",
    # Naukri-specific
    "skills",
    "experience_range",
    "company_rating",
    "company_reviews_count",
    "vacancy_count",
    "work_from_home_type",
]
See JobPost schema for a description of each field.
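The sort order described above (site ascending, then date_posted descending) can be reproduced on plain Python data. The rows below are made-up illustration values, not scraper output. On a DataFrame, the equivalent pandas call would be jobs.sort_values(by=["site", "date_posted"], ascending=[True, False]).

```python
from datetime import date

# Made-up rows that carry only the two sort keys.
rows = [
    {"site": "linkedin", "date_posted": date(2024, 5, 1)},
    {"site": "indeed", "date_posted": date(2024, 5, 1)},
    {"site": "indeed", "date_posted": date(2024, 5, 3)},
    {"site": "linkedin", "date_posted": date(2024, 5, 2)},
]

# Python's sort is stable, so sort by the secondary key first
# (date_posted, newest first) and then by the primary key (site, A to Z).
rows.sort(key=lambda r: r["date_posted"], reverse=True)
rows.sort(key=lambda r: r["site"])

ordered = [(r["site"], r["date_posted"].isoformat()) for r in rows]
for site, posted in ordered:
    print(site, posted)
```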

Example

import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="usa",
    description_format="markdown",
    enforce_annual_salary=True,
    verbose=2,
)

print(f"Found {len(jobs)} jobs")
print(jobs.head())

# Export to CSV
jobs.to_csv(
    "jobs.csv",
    quoting=csv.QUOTE_NONNUMERIC,
    escapechar="\\",
    index=False,
)

# Export to Excel (pandas needs the openpyxl package to write .xlsx files)
jobs.to_excel("jobs.xlsx", index=False)
results_wanted is applied per site. With results_wanted=20 and four sites, you may receive up to 80 total results.