scrape_jobs()

scrape_jobs() scrapes job data from one or more job boards concurrently and returns the results as a sorted pandas DataFrame.

Signature

import pandas as pd
from jobspy import scrape_jobs

jobs: pd.DataFrame = scrape_jobs(
    site_name=None,
    search_term=None,
    google_search_term=None,
    location=None,
    distance=50,
    is_remote=False,
    job_type=None,
    easy_apply=None,
    results_wanted=15,
    country_indeed="usa",
    proxies=None,
    ca_cert=None,
    description_format="markdown",
    linkedin_fetch_description=False,
    linkedin_company_ids=None,
    offset=0,
    hours_old=None,
    enforce_annual_salary=False,
    verbose=0,
    user_agent=None,
)

Parameters

site_name

str | list[str] | Site | list[Site] | None

default:"None"

The job board(s) to scrape. You can pass a single site as a string or Site enum, or a list of either. When None, all supported sites are scraped. Accepted string values: "linkedin", "indeed", "zip_recruiter", "glassdoor", "google", "bayt", "naukri", "bdjobs".

# Single site
scrape_jobs(site_name="indeed")

# Multiple sites
scrape_jobs(site_name=["indeed", "linkedin", "zip_recruiter"])

# Using the enum
from jobspy.model import Site
scrape_jobs(site_name=[Site.INDEED, Site.LINKEDIN])
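
The accepted input shapes can be illustrated with a small normalizer. This is only a sketch of the idea, not the library's code: SiteName is a hypothetical stand-in for the Site enum, and normalize_sites is a made-up helper name.

```python
from enum import Enum


class SiteName(Enum):
    """Hypothetical stand-in for jobspy.model.Site, for illustration only."""
    LINKEDIN = "linkedin"
    INDEED = "indeed"
    ZIP_RECRUITER = "zip_recruiter"
    GLASSDOOR = "glassdoor"
    GOOGLE = "google"
    BAYT = "bayt"
    NAUKRI = "naukri"
    BDJOBS = "bdjobs"


def normalize_sites(site_name=None):
    """Turn None, a single value, or a list into a list of SiteName members."""
    if site_name is None:
        return list(SiteName)  # None means "all supported sites"
    items = site_name if isinstance(site_name, list) else [site_name]
    # SiteName("indeed") looks a member up by value; enum members pass through.
    return [s if isinstance(s, SiteName) else SiteName(s.lower()) for s in items]
```

In this sketch an unrecognized string raises ValueError, which is how Enum lookups by value behave in Python.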

search_term

str | None

default:"None"

The job title or keyword to search for. Used by all sites except Google (which uses google_search_term instead).

scrape_jobs(search_term="software engineer")

google_search_term

str | None

default:"None"

The search query used exclusively for Google Jobs. Google requires a very specific query syntax — copy the query string directly from the Google Jobs search box in your browser after applying your desired filters.

scrape_jobs(
    site_name="google",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
)

The search_term parameter has no effect on Google searches. You must use google_search_term to filter Google results.

location

str | None

default:"None"

The geographic location to search within. Used by LinkedIn, Indeed, Glassdoor, and ZipRecruiter. For Indeed and Glassdoor, pair this with country_indeed to narrow results to a specific city or state.

scrape_jobs(location="San Francisco, CA")

distance

int | None

default:"50"

Search radius in miles from the specified location. Defaults to 50 miles.

is_remote

bool

default:"False"

When True, filters results to remote jobs only.

For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply. Combining them is not supported.

job_type

str | None

default:"None"

Filters results by employment type. Accepted values: "fulltime", "parttime", "contract", "temporary", "internship".

scrape_jobs(job_type="fulltime")

For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply.

easy_apply

bool | None

default:"None"

When True, filters for jobs that are hosted directly on the job board site (one-click apply). Note that LinkedIn Easy Apply filtering no longer works reliably.

For Indeed, you can only use one of the following per search: hours_old, job_type + is_remote, or easy_apply. For LinkedIn, you can only use one of: hours_old or easy_apply.

results_wanted

int

default:"15"

Number of job results to retrieve per site. For example, if you pass results_wanted=20 with three sites, you may receive up to 60 total results.

All job board endpoints are capped at approximately 1,000 jobs per search, regardless of this value.
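
As a rough illustration of the per-site arithmetic, the upper bound on returned rows can be estimated as below. The helper name max_total_results is hypothetical, and the 1,000 figure is the approximate cap from the note above, not an exact limit.

```python
def max_total_results(results_wanted, sites, per_search_cap=1000):
    """Upper bound on rows returned across all sites (assumed cap of ~1,000 per search)."""
    return min(results_wanted, per_search_cap) * len(sites)


print(max_total_results(20, ["indeed", "linkedin", "zip_recruiter"]))  # 60
```

The actual count is often lower, since a board may have fewer matching postings than requested.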

country_indeed

str

default:"usa"

The country to search on Indeed and Glassdoor. Pass the country name as a string (e.g., "usa", "uk", "canada", "germany"). See the full list of supported countries.

scrape_jobs(site_name="indeed", country_indeed="uk")

LinkedIn searches globally and ignores this parameter. ZipRecruiter only searches the US and Canada.

proxies

list[str] | str | None

default:"None"

One or more proxy addresses. Each scraper rotates through the list in a round-robin fashion. Supports HTTP, HTTPS, and SOCKS5 proxies.

scrape_jobs(
    proxies=[
        "user:pass@host:port",
        "208.195.175.46:65095",
        "localhost",
    ]
)

Proxies are essential for LinkedIn, which rate-limits aggressively (typically after the 10th page per IP).
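
Round-robin rotation means each request takes the next proxy in the list and wraps around at the end. The snippet below is a minimal sketch of that pattern using itertools.cycle. It is not taken from the library's internals.

```python
from itertools import cycle

proxies = ["user:pass@host:port", "208.195.175.46:65095", "localhost"]
rotation = cycle(proxies)

# Each simulated request takes the next proxy; after the last entry the
# rotation wraps back to the first one.
picked = [next(rotation) for _ in range(5)]
print(picked)
```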

ca_cert

str | None

default:"None"

Path to a CA certificate file, used to verify SSL connections when routing through a proxy.

scrape_jobs(ca_cert="/path/to/ca-bundle.crt")

description_format

str

default:"markdown"

The format in which job descriptions are returned. Accepted values: "markdown", "html", "plain".

scrape_jobs(description_format="html")

linkedin_fetch_description

bool | None

default:"False"

When True, fetches the full job description and direct job URL for each LinkedIn result. This makes one additional HTTP request per job, so fetching n results costs n extra requests and is significantly slower.

linkedin_company_ids

list[int] | None

default:"None"

Restricts the LinkedIn search to jobs posted by specific companies, identified by their LinkedIn company IDs.

scrape_jobs(
    site_name="linkedin",
    search_term="engineer",
    linkedin_company_ids=[1441, 9441],
)

offset

int | None

default:"0"

Starts the search from a specific result index. For example, an offset of 25 skips the first 25 results and begins from result 26. Useful for paginating through large result sets.
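
One way to paginate is to compute an (offset, results_wanted) pair for each call ahead of time. The generator below is a sketch under that assumption. page_offsets is a hypothetical helper, not part of jobspy.

```python
def page_offsets(total_wanted, page_size, start=0):
    """Yield (offset, results_wanted) pairs that together cover total_wanted results."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    fetched = 0
    while fetched < total_wanted:
        size = min(page_size, total_wanted - fetched)
        yield start + fetched, size
        fetched += size


pages = list(page_offsets(total_wanted=60, page_size=25))
print(pages)  # [(0, 25), (25, 25), (50, 10)]
```

Each pair can then be passed as offset= and results_wanted= in successive scrape_jobs calls.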

hours_old

int | None

default:"None"

Filters jobs by how recently they were posted, in hours. For example, hours_old=72 returns jobs posted within the last 3 days.

ZipRecruiter and Glassdoor round up to the next full day, so results may include jobs slightly older than the specified threshold.

For Indeed, you can only use one of: hours_old, job_type + is_remote, or easy_apply. For LinkedIn, you can only use one of: hours_old or easy_apply.
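
A pre-flight check can catch conflicting Indeed filters before a search runs. The sketch below encodes the rule stated above. check_indeed_filters is a hypothetical helper, and raising an error is a choice made for this example. How the library itself reacts to a combination is not stated here.

```python
def indeed_filter_groups(hours_old=None, job_type=None, is_remote=False, easy_apply=None):
    """Return the names of the mutually exclusive Indeed filter groups in use."""
    groups = []
    if hours_old is not None:
        groups.append("hours_old")
    if job_type is not None or is_remote:
        groups.append("job_type + is_remote")
    if easy_apply:
        groups.append("easy_apply")
    return groups


def check_indeed_filters(**kwargs):
    """Raise ValueError when more than one filter group is requested."""
    groups = indeed_filter_groups(**kwargs)
    if len(groups) > 1:
        raise ValueError(f"Indeed supports only one of: {', '.join(groups)}")
    return groups
```

For example, check_indeed_filters(job_type="fulltime", is_remote=True) passes because both belong to the same group, while combining hours_old=72 with easy_apply=True raises.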

enforce_annual_salary

bool

default:"False"

When True, converts all salary values to annual equivalents. Hourly wages are multiplied by 2,080, monthly by 12, weekly by 52, and daily by 260.

scrape_jobs(enforce_annual_salary=True)
# hourly $50 → yearly $104,000
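
The conversion reduces to one multiplier per pay interval. The sketch below uses the multipliers listed above. The interval labels and the annualize helper are assumptions made for illustration.

```python
# Multipliers from the description above; the interval names are assumed labels.
ANNUAL_MULTIPLIERS = {
    "hourly": 2080,
    "daily": 260,
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,
}


def annualize(amount, interval):
    """Convert a salary amount for the given interval to an annual figure."""
    return amount * ANNUAL_MULTIPLIERS[interval]


print(annualize(50, "hourly"))  # 104000
```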

verbose

int

default:"0"

Controls how much logging output is printed at runtime.

Value	Level
`0`	Errors only
`1`	Errors and warnings
`2`	All logs (info, warnings, errors)
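
The table lines up naturally with Python's standard logging levels. The mapping below is a sketch under that assumption. The logger name jobs_demo and the level_for helper are hypothetical.

```python
import logging

# Assumed correspondence between verbose values and logging levels.
VERBOSE_TO_LEVEL = {
    0: logging.ERROR,    # errors only
    1: logging.WARNING,  # errors and warnings
    2: logging.INFO,     # info, warnings, errors
}


def level_for(verbose):
    """Look up the logging level for a verbose value, rejecting unknown values."""
    if verbose not in VERBOSE_TO_LEVEL:
        raise ValueError(f"verbose must be one of {sorted(VERBOSE_TO_LEVEL)}")
    return VERBOSE_TO_LEVEL[verbose]


logger = logging.getLogger("jobs_demo")
logger.setLevel(level_for(1))
```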

user_agent

str | None

default:"None"

Overrides the default HTTP User-Agent string used when making requests. Useful if the default agent has been blocked or is outdated.

scrape_jobs(user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...")

Return value

Returns a pd.DataFrame sorted by site ascending and date_posted descending. Each row represents a single job posting. If no jobs are found across all specified sites, an empty pd.DataFrame() is returned. The DataFrame columns are ordered as follows:

[
    "id",
    "site",
    "job_url",
    "job_url_direct",
    "title",
    "company",
    "location",
    "date_posted",
    "job_type",
    "salary_source",
    "interval",
    "min_amount",
    "max_amount",
    "currency",
    "is_remote",
    "job_level",
    "job_function",
    "listing_type",
    "emails",
    "description",
    "company_industry",
    "company_url",
    "company_logo",
    "company_url_direct",
    "company_addresses",
    "company_num_employees",
    "company_revenue",
    "company_description",
    # Naukri-specific
    "skills",
    "experience_range",
    "company_rating",
    "company_reviews_count",
    "vacancy_count",
    "work_from_home_type",
]

See JobPost schema for a description of each field.
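
The sort order (site ascending, then date_posted descending) can be shown on plain dictionaries. The rows below are made-up sample data. On a DataFrame the equivalent pandas call is sort_values(by=["site", "date_posted"], ascending=[True, False]).

```python
from datetime import date

# Made-up sample rows, for illustration only.
rows = [
    {"site": "linkedin", "title": "A", "date_posted": date(2024, 5, 1)},
    {"site": "indeed", "title": "B", "date_posted": date(2024, 5, 2)},
    {"site": "indeed", "title": "C", "date_posted": date(2024, 5, 3)},
]

# Python's sort is stable, so sort by the secondary key first (newest first),
# then by the primary key (site, A to Z).
rows.sort(key=lambda r: r["date_posted"], reverse=True)
rows.sort(key=lambda r: r["site"])

order = [r["title"] for r in rows]
print(order)  # ['C', 'B', 'A']
```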

Example

import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="usa",
    description_format="markdown",
    enforce_annual_salary=True,
    verbose=2,
)

print(f"Found {len(jobs)} jobs")
print(jobs.head())

# Export to CSV
jobs.to_csv(
    "jobs.csv",
    quoting=csv.QUOTE_NONNUMERIC,
    escapechar="\\",
    index=False,
)

# Export to Excel
jobs.to_excel("jobs.xlsx", index=False)

results_wanted is applied per site. With results_wanted=20 and four sites, you may receive up to 80 total results.
