GET /searchpdfs/ — find PDF documents on the web

The /searchpdfs/ endpoint searches for publicly accessible PDF files and returns an array of direct PDF URLs. It works by prepending filetype:pdf to your query before passing it to the underlying DDGS search, so you do not need to include the operator yourself. This is useful for collecting PDF datasets — research reports, whitepapers, textbook chapters, technical manuals, and similar documents.

The query parameter for this endpoint is named limits (with a trailing s), not limit. Using the wrong parameter name will cause the request to fail.

Request

GET /searchpdfs/?query={query}&limits={limits}

Parameters

query (string, required)

The search query. Do not include filetype:pdf; it is added automatically. URL-encode spaces as + or %20.

limits (integer, required)

Maximum number of PDF URLs to return. Note that the parameter name is limits, not limit.
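
As a rough illustration of the two required parameters, the sketch below builds a request URL with Python's standard library. The helper name build_searchpdfs_url is hypothetical, and the base URL is assumed to match the localhost address used in the example on this page.

```python
from urllib.parse import urlencode

# Assumption: the API is served from the same host as the curl example on this page.
BASE_URL = "http://localhost:8000"


def build_searchpdfs_url(query: str, limits: int, base_url: str = BASE_URL) -> str:
    """Return a /searchpdfs/ URL with both required parameters URL-encoded."""
    # urlencode() encodes spaces as "+", one of the two encodings the endpoint accepts.
    # The key is "limits" (with a trailing s), as the endpoint requires.
    params = urlencode({"query": query, "limits": limits})
    return f"{base_url}/searchpdfs/?{params}"


if __name__ == "__main__":
    print(build_searchpdfs_url("machine learning survey", 5))
```

Letting urlencode handle the encoding avoids hand-escaping spaces and other reserved characters in the query.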

Response

On success, returns a JSON array of URL strings pointing to PDF files.

[*] (string)

A direct URL to a PDF document. URLs typically end with .pdf but may include query strings or redirects.
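
Because a returned URL may carry a query string, checking the raw string for a .pdf suffix can miss valid results. The sketch below is one possible heuristic that inspects only the path component; the function name looks_like_pdf is hypothetical. It cannot detect redirects, so checking the Content-Type header at download time is likely to be more reliable.

```python
from urllib.parse import urlparse


def looks_like_pdf(url: str) -> bool:
    """Heuristic: True when the URL's path component ends in .pdf (case-insensitive)."""
    # urlparse() separates the path from any query string or fragment,
    # so "report.pdf?download=1" is still recognised.
    return urlparse(url).path.lower().endswith(".pdf")


if __name__ == "__main__":
    for candidate in [
        "https://example.com/report.pdf?download=1",
        "https://example.com/contents/intro.html",
    ]:
        print(candidate, looks_like_pdf(candidate))
```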

Example

curl "http://localhost:8000/searchpdfs/?query=machine+learning+survey&limits=5"

Response

[
  "https://arxiv.org/pdf/1811.12560.pdf",
  "https://jmlr.org/papers/volume17/15-538/15-538.pdf",
  "https://www.cs.toronto.edu/~rsalakhu/papers/science_rbm.pdf",
  "https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf",
  "https://www.deeplearningbook.org/contents/intro.html"
]

Error response

500

{ "error": "PDF search failed" }
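
A client might surface this error body as follows. This is a minimal sketch using only the standard library; the function names error_message and search_pdfs are hypothetical, and the only error shape assumed is the one shown above.

```python
import json
import urllib.error
import urllib.request


def error_message(body: bytes) -> str:
    """Return the "error" field of a JSON error body, or the raw text as a fallback."""
    try:
        payload = json.loads(body)
    except ValueError:  # not valid JSON (or not valid UTF-8)
        return body.decode("utf-8", errors="replace")
    if isinstance(payload, dict) and "error" in payload:
        return str(payload["error"])
    return body.decode("utf-8", errors="replace")


def search_pdfs(url: str, timeout: float = 30.0) -> list:
    """GET a /searchpdfs/ URL and return the decoded JSON array of PDF URLs."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read())
    except urllib.error.HTTPError as exc:
        # For a 500 response, urlopen raises HTTPError; its body holds the JSON error.
        raise RuntimeError(f"HTTP {exc.code}: {error_message(exc.read())}") from exc
```

Raising a single exception type with the status code and message keeps retry or logging logic in the caller simple.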

Use case

Combine this endpoint with a document parser to build a corpus of PDF text for LLM pre-training or fine-tuning. For example, collect 100 PDFs on a topic, download them, extract text with pdfminer or pypdf, and write the results to JSONL. For other filetypes (DOCX, PPTX, XLSX, and more), use the /search/specific/ endpoint instead.
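
The extract-and-write step of that workflow could look roughly like the sketch below. It assumes the PDFs have already been downloaded into (url, bytes) pairs, uses the third-party pypdf package for extraction, and writes one JSON object per line. The function names and the {"url", "text"} record shape are illustrative choices, not something the API prescribes.

```python
import io
import json
from typing import Callable, Iterable, TextIO, Tuple


def extract_text_pypdf(pdf_bytes: bytes) -> str:
    """Extract text from PDF bytes. Requires the third-party pypdf package."""
    from pypdf import PdfReader  # imported lazily so write_jsonl works without pypdf

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() may yield nothing for image-only pages, hence the "or ''".
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def write_jsonl(
    docs: Iterable[Tuple[str, bytes]],
    out: TextIO,
    extract: Callable[[bytes], str],
) -> int:
    """Write one {"url", "text"} object per line; return how many were written."""
    written = 0
    for url, pdf_bytes in docs:
        try:
            text = extract(pdf_bytes)
        except Exception:
            continue  # skip downloads that are not parseable PDFs
        out.write(json.dumps({"url": url, "text": text}, ensure_ascii=False) + "\n")
        written += 1
    return written
```

A caller would open the output file in text mode and pass extract_text_pypdf as the extract argument. Skipping unparseable files keeps one bad download from aborting the whole run.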
