scrape_jobs() is the single entry point for all job scraping in JobSpy. It accepts parameters for every supported job board and returns the results as a single, unified pandas DataFrame.

How concurrent scraping works

Internally, scrape_jobs() uses a ThreadPoolExecutor to scrape all requested sites at the same time. Each site runs in its own thread, so scraping five boards takes roughly as long as scraping the slowest one — not the sum of all five.
from concurrent.futures import ThreadPoolExecutor, as_completed
# JobSpy manages this for you — no setup required on your end
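
The timing claim above can be illustrated with a minimal sketch that uses only the standard library. The fake_scrape function, the scrape_all helper, and the delay values are hypothetical stand-ins for real scrapers. This is not JobSpy's internal code, only the same fan-out pattern.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-site delays in seconds, standing in for network time.
SITE_DELAYS = {"indeed": 0.2, "linkedin": 0.3, "zip_recruiter": 0.1}


def fake_scrape(site: str) -> tuple[str, int]:
    """Pretend to scrape one site: sleep, then return a made-up result count."""
    time.sleep(SITE_DELAYS[site])
    return site, 10


def scrape_all(sites: list[str]) -> dict[str, int]:
    """Run one worker thread per site and collect results as they finish."""
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=len(sites)) as pool:
        futures = [pool.submit(fake_scrape, site) for site in sites]
        for future in as_completed(futures):
            site, count = future.result()
            results[site] = count
    return results


start = time.perf_counter()
results = scrape_all(list(SITE_DELAYS))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

With these delays the run should finish in roughly the time of the slowest fake site (about 0.3 seconds) rather than the 0.6-second sum.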

Basic usage

from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="software engineer",
    location="San Francisco, CA",
    results_wanted=20,
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())

Choosing which sites to scrape

The site_name parameter accepts a string, a list of strings, or a Site enum (or list of Site enums).
from jobspy import scrape_jobs

# Omitting site_name scrapes all supported boards
jobs = scrape_jobs(
    search_term="data scientist",
    location="New York, NY",
)
The supported values for site_name are:
String value       Board
"linkedin"         LinkedIn
"indeed"           Indeed
"glassdoor"        Glassdoor
"zip_recruiter"    ZipRecruiter
"google"           Google Jobs
"bayt"             Bayt
"naukri"           Naukri
"bdjobs"           BDJobs
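
To show how a single parameter can accept a string, a list, or enum members, here is a self-contained sketch. The Site class below is an illustrative stand-in covering only three boards, and normalize_sites is a hypothetical helper. Neither is taken from JobSpy's source, so treat the details as assumptions.

```python
from enum import Enum
from typing import Union


class Site(Enum):
    """Illustrative stand-in for a site enum (subset of boards only)."""
    LINKEDIN = "linkedin"
    INDEED = "indeed"
    ZIP_RECRUITER = "zip_recruiter"


SiteArg = Union[str, Site, list[Union[str, Site]], None]


def normalize_sites(site_name: SiteArg = None) -> list[Site]:
    """Coerce any accepted site_name form into a list of Site members."""
    if site_name is None:
        return list(Site)  # omitted: every board in the enum
    if isinstance(site_name, (str, Site)):
        site_name = [site_name]  # wrap a single value in a list
    # Strings are looked up by value; an unknown string raises ValueError.
    return [s if isinstance(s, Site) else Site(s.lower()) for s in site_name]


print(normalize_sites("indeed"))
print(normalize_sites(["linkedin", Site.INDEED]))
print(len(normalize_sites()))
```

Normalizing early like this means the rest of a program only ever deals with one shape, a list of enum members.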

Controlling the number of results

The results_wanted parameter sets how many job results to retrieve per site. If you scrape three sites with results_wanted=20, you may receive up to 60 results total.
jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="product manager",
    location="Austin, TX",
    results_wanted=25,  # up to 25 results from each site
)
All job board endpoints are capped at around 1,000 jobs per search, regardless of results_wanted.
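
The per-site arithmetic can be sketched as a small helper. The name max_total_results and the exact cap of 1,000 are assumptions for illustration; the real cap is approximate and may differ by board.

```python
PER_SITE_CAP = 1000  # assumed round figure; the documented cap is "around 1,000"


def max_total_results(num_sites: int, results_wanted: int) -> int:
    """Upper bound on rows returned: the per-site request, clipped to the cap."""
    return num_sites * min(results_wanted, PER_SITE_CAP)


print(max_total_results(3, 20))    # 60
print(max_total_results(3, 25))    # 75
print(max_total_results(2, 5000))  # 2000
```

This is only an upper bound: a board may return fewer results than requested.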

Filtering by recency

Use hours_old to limit results to jobs posted within the last N hours.
jobs = scrape_jobs(
    site_name=["indeed", "linkedin"],
    search_term="devops engineer",
    location="Seattle, WA",
    results_wanted=30,
    hours_old=24,  # only jobs posted in the last 24 hours
)
ZipRecruiter and Glassdoor round hours_old up to the next full day.
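
As a rough illustration of what rounding up to a full day means, the sketch below assumes the rounding is a simple ceiling over 24-hour blocks. The helper name is hypothetical and the actual behavior of each board may differ.

```python
import math


def hours_to_days_rounded_up(hours_old: int) -> int:
    """Convert an hours_old value to whole days, rounding any remainder up."""
    return math.ceil(hours_old / 24)


print(hours_to_days_rounded_up(24))  # 1
print(hours_to_days_rounded_up(25))  # 2
print(hours_to_days_rounded_up(72))  # 3
```

Under this assumption, hours_old=25 on these two boards would behave like a two-day window, so results may be older than the value suggests.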

Controlling log output

The verbose parameter controls how much JobSpy prints during scraping.
Value    Behavior
0        Errors only (default)
1        Errors and warnings
2        All logs
jobs = scrape_jobs(
    site_name="indeed",
    search_term="backend engineer",
    location="Chicago, IL",
    verbose=0,  # silent except for errors
)
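
One plausible way to picture the three verbosity levels is as a mapping onto the standard logging levels. The sketch below is an assumption about the general technique, not a copy of JobSpy's code. The logger name, the helper make_logger, and the fallback to INFO for unknown values are all illustrative choices.

```python
import logging

# Assumed mapping from the verbose values in the table to stdlib levels.
VERBOSE_TO_LEVEL = {0: logging.ERROR, 1: logging.WARNING, 2: logging.INFO}


def make_logger(verbose: int, name: str = "jobspy_sketch") -> logging.Logger:
    """Return a logger whose threshold matches the given verbose value."""
    logger = logging.getLogger(name)
    # Unknown values fall back to INFO here; that choice is arbitrary.
    logger.setLevel(VERBOSE_TO_LEVEL.get(verbose, logging.INFO))
    return logger


logger = make_logger(0)
print(logger.isEnabledFor(logging.WARNING))  # False
print(logger.isEnabledFor(logging.ERROR))    # True
```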

Full example

import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="USA",
    verbose=1,
)

print(f"Found {len(jobs)} jobs")
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)
The google_search_term parameter is the only way to filter Google Jobs results. Copy the query string from the Google Jobs search box after applying filters in your browser.
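
The quoting argument in the full example can be easier to follow with a small standard-library demonstration. The sketch below uses Python's csv module directly rather than pandas, and the two sample rows are made up. It shows only how csv.QUOTE_NONNUMERIC treats text and numbers.

```python
import csv
import io

# Made-up sample rows: a header and one record with a comma in the title.
rows = [
    ["title", "min_amount"],
    ["Engineer, Backend", 120000],
]

buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_NONNUMERIC, escapechar="\\")
writer.writerows(rows)
text = buffer.getvalue()
print(text)

# Reading with the same quoting mode converts unquoted fields to float.
parsed = list(csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC))
print(parsed[1])
```

Text fields are wrapped in quotes, so a comma inside a job title does not split the column, while numeric fields are written bare.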
