Scraping jobs

scrape_jobs() is the single entry point for all job scraping in JobSpy. It accepts parameters for every supported job board and returns the combined results as one pandas DataFrame.

How concurrent scraping works

Internally, scrape_jobs() uses a ThreadPoolExecutor to scrape all requested sites at the same time. Each site runs in its own thread, so scraping five boards takes roughly as long as scraping the slowest board, not the sum of all five.

from concurrent.futures import ThreadPoolExecutor, as_completed
# JobSpy manages this for you — no setup required on your end
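
To make the threading model concrete, here is a minimal sketch of the same fan-out pattern using only the standard library. It is not JobSpy's actual implementation: scrape_site, scrape_all, and the simulated latencies are hypothetical stand-ins chosen for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical per-site latencies in seconds; time.sleep stands in for network I/O.
SIMULATED_LATENCY = {"indeed": 0.10, "linkedin": 0.20, "zip_recruiter": 0.15}


def scrape_site(site: str) -> tuple[str, list[str]]:
    """Pretend to scrape one board and return three fake job IDs."""
    time.sleep(SIMULATED_LATENCY[site])
    return site, [f"{site}-job-{i}" for i in range(3)]


def scrape_all(sites: list[str]) -> dict[str, list[str]]:
    """Run one scraper per thread and collect results as each one finishes."""
    results: dict[str, list[str]] = {}
    with ThreadPoolExecutor(max_workers=len(sites)) as executor:
        futures = [executor.submit(scrape_site, site) for site in sites]
        for future in as_completed(futures):
            site, jobs = future.result()
            results[site] = jobs
    return results


start = time.perf_counter()
results = scrape_all(list(SIMULATED_LATENCY))
elapsed = time.perf_counter() - start
total = sum(len(jobs) for jobs in results.values())
print(f"{total} jobs from {len(results)} sites in {elapsed:.2f}s")
```

Because the three simulated scrapers sleep in parallel, the elapsed time should sit close to the slowest latency rather than the sum of all three.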

Basic usage

from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="software engineer",
    location="San Francisco, CA",
    results_wanted=20,
)
print(f"Found {len(jobs)} jobs")
print(jobs.head())

Choosing which sites to scrape

The site_name parameter accepts a string, a list of strings, a Site enum, or a list of Site enums. If you omit it, JobSpy scrapes all supported boards.

from jobspy import scrape_jobs

# Omitting site_name scrapes all supported boards
jobs = scrape_jobs(
    search_term="data scientist",
    location="New York, NY",
)

The supported values for site_name are:

String value	Board
`"linkedin"`	LinkedIn
`"indeed"`	Indeed
`"glassdoor"`	Glassdoor
`"zip_recruiter"`	ZipRecruiter
`"google"`	Google Jobs
`"bayt"`	Bayt
`"naukri"`	Naukri
`"bdjobs"`	BDJobs
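
The accepted input shapes can be illustrated with a small normalizer. This is a sketch under assumptions, not JobSpy's code: Board is a local stand-in for the library's Site enum, and normalize_sites is a hypothetical helper that shows how a string, an enum member, a mixed list, or an omitted value could all reduce to one list.

```python
from enum import Enum
from typing import Optional, Sequence, Union


class Board(Enum):
    """Local stand-in for a site enum; values mirror the table above."""
    LINKEDIN = "linkedin"
    INDEED = "indeed"
    GLASSDOOR = "glassdoor"
    ZIP_RECRUITER = "zip_recruiter"
    GOOGLE = "google"
    BAYT = "bayt"
    NAUKRI = "naukri"
    BDJOBS = "bdjobs"


SiteInput = Optional[Union[str, Board, Sequence[Union[str, Board]]]]


def normalize_sites(site_name: SiteInput = None) -> list[Board]:
    """Turn any accepted input shape into a list of Board members."""
    if site_name is None:
        return list(Board)  # omitted: every supported board
    if isinstance(site_name, (str, Board)):
        site_name = [site_name]  # single value: wrap it in a list
    # Board("...") raises ValueError for a name that is not in the table.
    return [s if isinstance(s, Board) else Board(s.lower()) for s in site_name]


print(normalize_sites("indeed"))
print(normalize_sites(["linkedin", Board.GOOGLE]))
print(len(normalize_sites()))
```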

Controlling the number of results

The results_wanted parameter sets how many job results to retrieve per site. If you scrape three sites with results_wanted=20, you may receive up to 60 results total.

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter"],
    search_term="product manager",
    location="Austin, TX",
    results_wanted=25,  # up to 25 results from each site
)

All job board endpoints are capped at around 1,000 jobs per search, regardless of results_wanted.
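
The arithmetic behind the per-site limit and the endpoint cap can be written out as a short helper. max_total_results is a hypothetical function for illustration, and the cap of 1,000 is the approximate figure quoted above rather than an exact constant.

```python
PER_SEARCH_CAP = 1000  # approximate per-search cap from the text, not an exact value


def max_total_results(num_sites: int, results_wanted: int, cap: int = PER_SEARCH_CAP) -> int:
    """Upper bound on rows returned: each site yields at most min(results_wanted, cap)."""
    return num_sites * min(results_wanted, cap)


print(max_total_results(3, 20))    # three sites, 20 each
print(max_total_results(2, 5000))  # the request exceeds the cap on both sites
```

This is an upper bound only. A board may return fewer results than requested when the search has fewer matches.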

Filtering by recency

Use hours_old to limit results to jobs posted within the last N hours.

jobs = scrape_jobs(
    site_name=["indeed", "linkedin"],
    search_term="devops engineer",
    location="Seattle, WA",
    results_wanted=30,
    hours_old=24,  # only jobs posted in the last 24 hours
)

ZipRecruiter and Glassdoor round hours_old up to the next full day.
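
To see what rounding up to the next full day means in practice, the sketch below converts hours to days with a ceiling division. hours_to_days_rounded_up is a hypothetical helper. The exact conversion inside JobSpy is an assumption here.

```python
import math


def hours_to_days_rounded_up(hours_old: int) -> int:
    """Round an hour window up to whole days, e.g. 25 hours becomes 2 days."""
    return math.ceil(hours_old / 24)


for hours in (1, 24, 25, 72):
    print(hours, "hours ->", hours_to_days_rounded_up(hours), "day(s)")
```

Under this assumption, hours_old=25 on these two boards behaves like a 48-hour window, so you may want to filter the returned DataFrame by posting date if you need a strict cutoff.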

Controlling log output

The verbose parameter controls how much JobSpy prints during scraping.

Value	Behavior
`0`	Errors only (default)
`1`	Errors and warnings
`2`	All logs

jobs = scrape_jobs(
    site_name="indeed",
    search_term="backend engineer",
    location="Chicago, IL",
    verbose=0,  # silent except for errors
)
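
One way to picture the three verbose values is as thresholds on Python's standard logging levels. The mapping below is a sketch and an assumption, not JobSpy's confirmed internals: level_for and the logger name jobspy_demo are hypothetical, and "all logs" is taken to mean INFO and above.

```python
import logging

# Assumed mapping from the table above to standard logging levels.
VERBOSE_TO_LEVEL = {
    0: logging.ERROR,    # errors only
    1: logging.WARNING,  # errors and warnings
    2: logging.INFO,     # all logs (assumed to mean INFO and above)
}


def level_for(verbose: int) -> int:
    """Return the logging threshold for a verbose value, rejecting unknown values."""
    if verbose not in VERBOSE_TO_LEVEL:
        raise ValueError(f"verbose must be 0, 1, or 2, got {verbose!r}")
    return VERBOSE_TO_LEVEL[verbose]


logger = logging.getLogger("jobspy_demo")
logger.setLevel(level_for(1))
print(logger.isEnabledFor(logging.WARNING))  # warnings pass at verbose=1
print(logger.isEnabledFor(logging.INFO))     # info messages are filtered out
```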

Full example

import csv
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["indeed", "linkedin", "zip_recruiter", "google"],
    search_term="software engineer",
    google_search_term="software engineer jobs near San Francisco, CA since yesterday",
    location="San Francisco, CA",
    results_wanted=20,
    hours_old=72,
    country_indeed="USA",
    verbose=1,
)

print(f"Found {len(jobs)} jobs")
jobs.to_csv("jobs.csv", quoting=csv.QUOTE_NONNUMERIC, escapechar="\\", index=False)

The google_search_term parameter is the only way to filter Google Jobs results. Copy the query string from the Google Jobs search box after applying filters in your browser.
