Job board websites rate-limit requests from the same IP address. When you exceed their threshold, you receive a 429 Too Many Requests response and scraping stops. Proxies let you rotate IP addresses to reduce this risk.

Which boards need proxies

| Board | Rate limiting | Recommendation |
|---|---|---|
| Indeed | Minimal | Not required |
| ZipRecruiter | Moderate | Optional |
| Glassdoor | Moderate | Optional |
| LinkedIn | Highly restrictive | Strongly recommended |

LinkedIn typically rate-limits around the 10th page of results from a single IP address. If you need more than ~100 LinkedIn results per run, proxies are effectively required.
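That threshold translates into a simple page count. The sketch below assumes about 10 LinkedIn results per page, a figure only inferred from the "10th page, about 100 results" rule of thumb above rather than taken from JobSpy; `linkedin_pages_needed` and `proxies_recommended` are hypothetical helpers.

```python
import math

# Assumption: about 10 LinkedIn results per page, inferred from the
# "10th page is about 100 results" rule of thumb, not a JobSpy constant.
RESULTS_PER_PAGE = 10
# Assumption: page at which a single IP is typically rate-limited.
LINKEDIN_PAGE_LIMIT = 10


def linkedin_pages_needed(results_wanted: int, per_page: int = RESULTS_PER_PAGE) -> int:
    """Estimate how many result pages a LinkedIn run requests (hypothetical helper)."""
    return math.ceil(results_wanted / per_page)


def proxies_recommended(results_wanted: int) -> bool:
    """Recommend proxies once a run would go past the per-IP page limit."""
    return linkedin_pages_needed(results_wanted) > LINKEDIN_PAGE_LIMIT


for wanted in (50, 100, 250):
    print(wanted, linkedin_pages_needed(wanted), proxies_recommended(wanted))
```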

The proxies parameter

Pass a list of proxy strings to scrape_jobs() via the proxies parameter. Each scraper rotates through the list in round-robin order.

```python
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin", "indeed"],
    search_term="software engineer",
    location="New York, NY",
    results_wanted=50,
    proxies=[
        "user:pass@host1:port",
        "user:pass@host2:port",
        "localhost",  # direct connection (no proxy) as one slot in the rotation
    ],
)
```

Proxy format

Proxies are strings in the format user:pass@host:port. You can also use "localhost" to represent a direct (no-proxy) connection slot in the rotation.

```python
proxies = [
    "alice:secret@192.168.1.10:8080",
    "bob:pass123@10.0.0.5:3128",
    "localhost",
]
```
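Typos in proxy strings (a missing port, a stray character in the host) are easiest to catch before a long run. The sketch below checks each entry with Python's standard `urllib.parse.urlsplit`; `parse_proxy` is a hypothetical helper, not part of JobSpy, which does its own parsing.

```python
from urllib.parse import urlsplit


def parse_proxy(proxy: str) -> dict:
    """Split 'user:pass@host:port' (or 'localhost') into its parts.

    Hypothetical pre-flight check; JobSpy parses proxies on its own.
    Raises ValueError if the host or a numeric port is missing.
    """
    if proxy == "localhost":
        # Direct-connection slot in the rotation: nothing to parse.
        return {"direct": True}
    # urlsplit only fills in the netloc after "//", so add it for scheme-less strings.
    parts = urlsplit(proxy if "://" in proxy else "//" + proxy)
    # .port itself raises ValueError if the port is not an integer.
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected user:pass@host:port, got {proxy!r}")
    return {
        "direct": False,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
    }


for p in ["alice:secret@192.168.1.10:8080", "bob:pass123@10.0.0.5:3128", "localhost"]:
    print(parse_proxy(p))
```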
JobSpy also accepts URLs with explicit schemes:

```python
proxies = [
    "http://user:pass@proxy.example.com:8080",
    "https://user:pass@proxy.example.com:8443",
    "socks5://user:pass@proxy.example.com:1080",
]
```
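For an HTTP client, each proxy string eventually has to become a mapping from target-URL scheme to proxy URL, the shape the `requests` library uses for its `proxies` argument. The sketch below shows one plausible normalization, assuming scheme-less strings are plain HTTP proxies and "localhost" means no explicit proxy; `to_requests_proxies` is a hypothetical name, and JobSpy's internal handling may differ.

```python
def to_requests_proxies(proxy: str) -> dict:
    """Map one proxy string to a requests-style proxies dict (hypothetical helper).

    Assumptions: "localhost" means no explicit proxy, and strings without a
    scheme are plain HTTP proxies.
    """
    if proxy == "localhost":
        # Empty mapping: no explicit proxy for this request.
        return {}
    if "://" not in proxy:
        proxy = "http://" + proxy
    # Keys are the scheme of the *target* URL; the same proxy serves both.
    return {"http": proxy, "https": proxy}


print(to_requests_proxies("user:pass@proxy.example.com:8080"))
print(to_requests_proxies("socks5://user:pass@proxy.example.com:1080"))
```

If you reuse such a mapping with `requests` directly, `socks5://` URLs additionally need its optional SOCKS support (`pip install "requests[socks]"`).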

Single proxy

You can pass a single proxy as a string instead of a list:

```python
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",
    search_term="backend engineer",
    location="San Francisco, CA",
    proxies="user:pass@proxy.example.com:8080",
)
```
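A single string is most naturally read as a one-element rotation: every request goes through the same proxy, so it changes your IP but does not spread requests across several. The sketch below shows that normalization under the assumption that JobSpy treats both forms the same way; `normalize_proxies` is a hypothetical name.

```python
from typing import List, Optional, Union


def normalize_proxies(proxies: Optional[Union[str, List[str]]]) -> List[str]:
    """Return the proxies argument as a list (hypothetical helper).

    Assumption: a single string is treated like a one-element list.
    """
    if proxies is None:
        return []  # no proxies configured
    if isinstance(proxies, str):
        return [proxies]  # one proxy: every request goes through it, no rotation
    return list(proxies)


print(normalize_proxies("user:pass@proxy.example.com:8080"))
print(normalize_proxies(["user:pass@host1:port", "localhost"]))
```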

How rotation works

Each scraper instance gets its own rotating proxy session, and every request it makes advances to the next proxy in the cycle. If one proxy is blocked, the next request automatically goes out through a different one, provided the list contains more than one entry. Because each site has its own session, scraping several sites at once means each site rotates through the proxy list independently.
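This rotation can be modeled with `itertools.cycle`. The sketch is a simplified simulation rather than JobSpy's own session code: `make_rotations` and `pick_proxies` are hypothetical names, and each "request" is just a call to `next()`.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def make_rotations(proxies: List[str], sites: List[str]) -> Dict[str, Iterator[str]]:
    """Give every site its own independent round-robin over the same proxy list."""
    return {site: cycle(proxies) for site in sites}


def pick_proxies(rotations: Dict[str, Iterator[str]], site: str, n: int) -> List[str]:
    """Simulate n requests from one site; each request advances only that site's cycle."""
    return [next(rotations[site]) for _ in range(n)]


rotations = make_rotations(
    ["user:pass@host1:port", "user:pass@host2:port", "localhost"],
    ["linkedin", "indeed"],
)
print(pick_proxies(rotations, "linkedin", 4))  # wraps around to host1 on the 4th request
print(pick_proxies(rotations, "indeed", 1))    # unaffected by LinkedIn's requests
```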

CA certificate for proxies

Some corporate or SSL-intercepting proxies require a custom CA certificate for HTTPS inspection. Pass the path to the certificate file via ca_cert.

```python
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name=["linkedin", "indeed"],
    search_term="devops engineer",
    location="Austin, TX",
    proxies=["user:pass@corporate-proxy.example.com:8080"],
    ca_cert="/path/to/ca-bundle.crt",
)
```
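A wrong path or a certificate in an unexpected encoding may only show up later as an SSL verification error partway through a run. A small pre-flight check can catch both early. This sketch assumes the bundle is PEM-encoded (text blocks starting with a BEGIN CERTIFICATE header); `check_ca_bundle` and `count_pem_blocks` are hypothetical helpers, not part of JobSpy.

```python
from pathlib import Path

PEM_BEGIN = "-----BEGIN CERTIFICATE-----"


def count_pem_blocks(text: str) -> int:
    """Count PEM certificate blocks in the text of a CA bundle."""
    return text.count(PEM_BEGIN)


def check_ca_bundle(ca_cert: str) -> int:
    """Hypothetical pre-flight check for the ca_cert path; returns the certificate count."""
    path = Path(ca_cert)
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {ca_cert}")
    count = count_pem_blocks(path.read_text(encoding="ascii", errors="ignore"))
    if count == 0:
        # DER-encoded (binary) certificates have no PEM header.
        raise ValueError(f"no PEM certificates found in {ca_cert}")
    return count
```

Call `check_ca_bundle` with the same path you pass as ca_cert before starting a long scrape.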

Overriding the user agent

The default user agent string may become outdated as job boards update their bot detection. Use user_agent to override it with a current browser user agent.

```python
from jobspy import scrape_jobs

jobs = scrape_jobs(
    site_name="linkedin",
    search_term="machine learning engineer",
    location="Seattle, WA",
    user_agent=(
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
        "AppleWebKit/537.36 (KHTML, like Gecko) "
        "Chrome/124.0.0.0 Safari/537.36"
    ),
)
```
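Since the point of overriding user_agent is to look like a current browser, it can help to check how old a candidate string is before a run. The sketch below reads the Chrome major version from the string. `chrome_major_version` and `is_stale` are hypothetical helpers, you supply the current stable Chrome major version yourself, and the two-release tolerance is an arbitrary assumption.

```python
import re
from typing import Optional

CHROME_VERSION = re.compile(r"Chrome/(\d+)\.")


def chrome_major_version(user_agent: str) -> Optional[int]:
    """Extract the Chrome major version from a user agent string, or None."""
    match = CHROME_VERSION.search(user_agent)
    return int(match.group(1)) if match else None


def is_stale(user_agent: str, current_major: int, max_lag: int = 2) -> bool:
    """Hypothetical heuristic: stale if not Chrome-based or more than max_lag majors behind."""
    version = chrome_major_version(user_agent)
    return version is None or current_major - version > max_lag


ua = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/124.0.0.0 Safari/537.36"
)
print(chrome_major_version(ua))  # 124
```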

Handling rate limit errors

A 429 response means the job board has temporarily blocked your IP. When this happens:

1. Wait before scraping again. The required wait time is site-dependent; LinkedIn may need several minutes, while other boards recover faster.
2. Add more proxies to the rotation to distribute requests across more IP addresses.
3. Reduce results_wanted to make fewer requests per run.

Even boards with lenient limits will eventually block sustained high-volume traffic, so never scrape in a tight loop without delays.
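Step 1 can be automated with exponential backoff: wait, retry, and double the wait after every further 429. The sketch below is generic and does not assume how scrape_jobs itself reports a 429. `fetch` is a hypothetical callable that performs one request and returns its HTTP status code, and the 30-second base and 10-minute cap are placeholder values, not site-specific recommendations.

```python
import random
import time
from typing import Callable, List


def backoff_delays(attempts: int, base: float = 30.0, cap: float = 600.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def fetch_with_backoff(
    fetch: Callable[[], int],
    attempts: int = 4,
    base: float = 30.0,
    cap: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call fetch() until it returns something other than 429 or attempts run out."""
    status = fetch()
    for delay in backoff_delays(attempts - 1, base, cap):
        if status != 429:
            break
        # Up to 10% random jitter so parallel runs do not retry in lockstep.
        sleep(delay + random.uniform(0, delay * 0.1))
        status = fetch()
    return status
```

Injecting `sleep` keeps the loop testable without real waiting; in normal use the default `time.sleep` applies.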
