scrape_jobs() scrapes job data from one or more job boards concurrently and returns the results as a sorted pandas DataFrame.
Signature
Parameters
The job board(s) to scrape. You can pass a single site as a string or Site enum, or a list of either. When None, all supported sites are scraped.
Accepted string values: "linkedin", "indeed", "zip_recruiter", "glassdoor", "google", "bayt", "naukri", "bdjobs".
The job title or keyword to search for. Used by all sites except Google (which uses google_search_term instead).
The search query used exclusively for Google Jobs. Google requires a very specific query syntax, so copy the query string directly from the Google Jobs search box in your browser after applying your desired filters.
The search_term parameter has no effect on Google searches. You must use google_search_term to filter Google results.
The geographic location to search within. Used by LinkedIn, Indeed, Glassdoor, and ZipRecruiter. For Indeed and Glassdoor, pair this with country_indeed to narrow results to a specific city or state.
Search radius in miles from the specified location. Defaults to 50 miles.
When True, filters results to remote jobs only.
Filters results by employment type. Accepted values: "fulltime", "parttime", "contract", "temporary", "internship".
When True, filters for jobs that are hosted directly on the job board site (one-click apply). Note that LinkedIn Easy Apply filtering no longer works reliably.
Number of job results to retrieve per site. For example, if you pass results_wanted=20 with three sites, you may receive up to 60 total results.
All job board endpoints are capped at approximately 1,000 jobs per search, regardless of this value.
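The per-site arithmetic above can be sketched as a small helper. The function name and the constant are hypothetical and not part of JobSpy, and the sketch treats the "approximately 1,000" cap as an exact limit, which is a simplifying assumption.

```python
# Hypothetical helper, not part of JobSpy: estimates the upper bound on
# the number of rows a search can return.
APPROX_SITE_CAP = 1000  # assumption: the approximate per-search cap, treated as exact


def max_total_results(results_wanted: int, num_sites: int) -> int:
    """Upper bound on total results when results_wanted applies per site."""
    per_site = min(results_wanted, APPROX_SITE_CAP)
    return per_site * num_sites


print(max_total_results(20, 3))    # 60
print(max_total_results(5000, 2))  # 2000
```

The actual count can be lower, since a site may simply have fewer matching postings than requested.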
The country to search on Indeed and Glassdoor. Pass the country name as a string (e.g., "usa", "uk", "canada", "germany"). See the full list of supported countries.
LinkedIn searches globally and ignores this parameter. ZipRecruiter only searches the US and Canada.
One or more proxy addresses. Each scraper rotates through the list in a round-robin fashion. Supports HTTP, HTTPS, and SOCKS5 proxies.
Path to a CA certificate file, used to verify SSL connections when routing through a proxy.
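To illustrate what round-robin rotation over a proxy list means, here is a minimal sketch using itertools.cycle. It shows only the general technique; how JobSpy rotates proxies internally is not shown here. The helper name and the proxy addresses are placeholders.

```python
from itertools import cycle, islice


def round_robin(proxies: list[str], num_requests: int) -> list[str]:
    """Return the proxy used for each request, cycling through the list in order."""
    return list(islice(cycle(proxies), num_requests))


# Placeholder addresses for illustration only.
example_proxies = [
    "http://proxy-a.example.com:8080",
    "https://proxy-b.example.com:8443",
    "socks5://proxy-c.example.com:1080",
]

for i, proxy in enumerate(round_robin(example_proxies, 5), start=1):
    print(f"request {i} -> {proxy}")
```

With three proxies and five requests, the fourth request wraps around to the first proxy again.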
The format in which job descriptions are returned. Accepted values: "markdown", "html", "plain".
When True, fetches the full job description and direct job URL for each LinkedIn result. This makes an additional HTTP request per job, so it increases total requests by O(n) and is significantly slower.
Restricts the LinkedIn search to jobs posted by specific companies, identified by their LinkedIn company IDs.
Starts the search from a specific result index. For example, an offset of 25 skips the first 25 results and begins from result 26. Useful for paginating through large result sets.
Filters jobs by how recently they were posted, in hours. For example, hours_old=72 returns jobs posted within the last 3 days.
ZipRecruiter and Glassdoor round up to the next full day, so results may include jobs slightly older than the specified threshold.
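The offset and day-rounding behavior described above can be sketched with two small helpers. Both function names are hypothetical. The rounding helper assumes that "round up to the next full day" means taking the ceiling of hours divided by 24; the exact rule each site applies is not stated here.

```python
import math


def page_offsets(total_wanted: int, page_size: int) -> list[int]:
    """Offsets for fetching total_wanted results in pages of page_size."""
    return list(range(0, total_wanted, page_size))


def effective_days(hours_old: int) -> int:
    """Whole days covering hours_old, rounding any partial day up (assumed rule)."""
    return math.ceil(hours_old / 24)


print(page_offsets(75, 25))  # [0, 25, 50]
print(effective_days(72))    # 3
print(effective_days(30))    # 2
```

Under that assumption, a 30-hour threshold would behave like a 2-day (48-hour) window on sites that round up, which is why slightly older postings can appear.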
When True, converts all salary values to annual equivalents. Hourly wages are multiplied by 2,080, monthly by 12, weekly by 52, and daily by 260.
Controls how much logging output is printed at runtime.
| Value | Level |
|---|---|
| 0 | Errors only |
| 1 | Errors and warnings |
| 2 | All logs (info, warnings, errors) |
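For the annual salary conversion described earlier in this parameter list, the multipliers can be written out as a lookup table. This is a sketch of the arithmetic only: the interval labels ("hourly", "daily", and so on) and the helper name are assumptions for illustration, and the "yearly" entry is added here so that annual values pass through unchanged.

```python
# Multipliers taken from the parameter description; key names are assumed labels.
ANNUAL_MULTIPLIERS = {
    "hourly": 2080,
    "daily": 260,
    "weekly": 52,
    "monthly": 12,
    "yearly": 1,  # assumption: annual values are left as-is
}


def to_annual(amount: float, interval: str) -> float:
    """Convert a salary amount for the given pay interval to an annual figure."""
    return amount * ANNUAL_MULTIPLIERS[interval]


print(to_annual(25.0, "hourly"))     # 52000.0
print(to_annual(4000.0, "monthly"))  # 48000.0
```

The 2,080 figure corresponds to 40 hours a week for 52 weeks, and 260 to 5 working days a week for 52 weeks.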
Overrides the default HTTP User-Agent string used when making requests. Useful if the default agent has been blocked or is outdated.
Return value
Returns a pd.DataFrame sorted by site ascending and date_posted descending. Each row represents a single job posting.
If no jobs are found across all specified sites, an empty pd.DataFrame() is returned.
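To make the sort order concrete, the following plain-Python sketch applies the same ordering (site ascending, then date_posted descending) to a few made-up rows. It does not call JobSpy or pandas; the helper name and the sample data are illustrative only.

```python
from datetime import date


def order_rows(rows: list[dict]) -> list[dict]:
    """Sort by site ascending, then date_posted descending within each site."""
    # Python's sort is stable, so sort by the secondary key first.
    newest_first = sorted(rows, key=lambda r: r["date_posted"], reverse=True)
    return sorted(newest_first, key=lambda r: r["site"])


# Made-up sample rows for illustration.
sample = [
    {"site": "linkedin", "date_posted": date(2024, 5, 1)},
    {"site": "indeed", "date_posted": date(2024, 5, 1)},
    {"site": "indeed", "date_posted": date(2024, 5, 3)},
    {"site": "linkedin", "date_posted": date(2024, 5, 2)},
]

for row in order_rows(sample):
    print(row["site"], row["date_posted"])
```

On a DataFrame, the equivalent ordering can be expressed with pandas as sort_values(by=["site", "date_posted"], ascending=[True, False]).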
The DataFrame columns are ordered as follows:
Example
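A minimal sketch of a four-site search, assuming the package is importable as jobspy and that the site list is passed through a parameter named site_name (that name does not appear in the text above, so treat it as an assumption). The helper functions and all search values are placeholders for illustration.

```python
def build_search_kwargs() -> dict:
    """Keyword arguments for a four-site search (placeholder values)."""
    return {
        # Assumed parameter name for the job board list.
        "site_name": ["indeed", "linkedin", "zip_recruiter", "google"],
        "search_term": "software engineer",
        # Google ignores search_term, so it gets its own query string.
        "google_search_term": "software engineer jobs near San Francisco, CA since yesterday",
        "location": "San Francisco, CA",
        "results_wanted": 20,  # applied per site
        "hours_old": 72,       # posted within the last 3 days
        "country_indeed": "usa",
    }


def run_search():
    """Run the search; needs the jobspy package and network access."""
    from jobspy import scrape_jobs  # imported lazily so the sketch loads without it

    jobs = scrape_jobs(**build_search_kwargs())
    if jobs.empty:
        print("No jobs found")
    else:
        print(f"Found {len(jobs)} jobs")
    return jobs
```

Calling run_search() performs live requests against the job boards, so results and timing will vary between runs.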
results_wanted is applied per site. With results_wanted=20 and four sites, you may receive up to 80 total results.