The /searchpdfs/ endpoint searches for publicly accessible PDF files and returns an array of direct PDF URLs. It works by prepending filetype:pdf to your query before passing it to the underlying DDGS search, so you do not need to include the operator yourself. This is useful for collecting PDF datasets — research reports, whitepapers, textbook chapters, technical manuals, and similar documents.
The result-count parameter for this endpoint is named limits (with a trailing s), not limit. Sending limit instead will cause the request to fail.

Request

GET /searchpdfs/?query={query}&limits={limits}

Parameters

query
string
required
The search query. Do not include filetype:pdf — this is added automatically. URL-encode spaces as + or %20.
limits
integer
required
Maximum number of PDF URLs to return. Note the parameter name is limits, not limit.
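A request URL for this endpoint can be built with only the Python standard library. The sketch below uses `build_searchpdfs_url`, a hypothetical helper that is not part of this API. It shows the `limits` spelling and space encoding, and it leaves `filetype:pdf` out of the query because the server adds it. It also rejects non-positive `limits` on the client side. That check is an assumption, because this page does not state a minimum value.

```python
from urllib.parse import urlencode


def build_searchpdfs_url(base_url: str, query: str, limits: int) -> str:
    """Build a GET URL for /searchpdfs/ (hypothetical helper).

    The count parameter is spelled `limits`, not `limit`. `filetype:pdf`
    is not added here because the server prepends it itself.
    """
    # Assumption: the API expects a positive count; this is not documented.
    if not isinstance(limits, int) or limits < 1:
        raise ValueError("limits must be a positive integer")
    # urlencode uses quote_plus by default, so spaces become '+'.
    params = urlencode({"query": query, "limits": limits})
    return f"{base_url.rstrip('/')}/searchpdfs/?{params}"


print(build_searchpdfs_url("http://localhost:8000", "machine learning survey", 5))
# → http://localhost:8000/searchpdfs/?query=machine+learning+survey&limits=5
```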

Response

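On success, returns a JSON array of URL strings pointing to PDF files.
[*]
string
A direct URL to a PDF document. URLs typically end with .pdf but may include query strings or redirects. Filtering relies on the filetype:pdf search operator, so a non-PDF URL can occasionally slip through (the example below includes an .html page). Check the extension or Content-Type before parsing.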

Example

curl "http://localhost:8000/searchpdfs/?query=machine+learning+survey&limits=5"
Response
[
  "https://arxiv.org/pdf/1811.12560.pdf",
  "https://jmlr.org/papers/volume17/15-538/15-538.pdf",
  "https://www.cs.toronto.edu/~rsalakhu/papers/science_rbm.pdf",
  "https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf",
  "https://www.deeplearningbook.org/contents/intro.html"
]
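The example response above contains an .html page, so a client may want to drop non-PDF entries before downloading anything. The following sketch uses a cheap suffix check on the URL path and ignores query strings and fragments. `keep_pdf_urls` is a hypothetical helper. Because the check only looks at the suffix, it also drops URLs that serve a PDF without a .pdf ending, such as redirects. Checking the `Content-Type` header after download is the more reliable option.

```python
from urllib.parse import urlsplit


def keep_pdf_urls(urls: list[str]) -> list[str]:
    """Keep only URLs whose path ends in .pdf (case-insensitive).

    The query string and fragment are ignored, so '.../paper.pdf?dl=1'
    is kept. This is a heuristic, not a content check.
    """
    return [u for u in urls if urlsplit(u).path.lower().endswith(".pdf")]


results = [
    "https://arxiv.org/pdf/1811.12560.pdf",
    "https://www.deeplearningbook.org/contents/intro.html",
]
print(keep_pdf_urls(results))
# → ['https://arxiv.org/pdf/1811.12560.pdf']
```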

Error response

Status 500, returned when the search fails:
{ "error": "PDF search failed" }
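A client can turn the documented 500 body into an exception instead of treating it as a result list. The sketch below uses only the standard library. It assumes the base URL from the curl example and JSON error bodies with an `error` key, as shown above. `search_pdfs` and `PdfSearchError` are hypothetical names, not part of this API.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode


class PdfSearchError(RuntimeError):
    """Raised when /searchpdfs/ returns an HTTP error status."""


def search_pdfs(base_url: str, query: str, limits: int, timeout: float = 30.0) -> list[str]:
    """Call /searchpdfs/ and return the list of URLs (hypothetical client)."""
    params = urlencode({"query": query, "limits": limits})
    url = f"{base_url.rstrip('/')}/searchpdfs/?{params}"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Documented failure: status 500 with {"error": "PDF search failed"}.
        body = exc.read().decode("utf-8", errors="replace")
        try:
            message = json.loads(body).get("error", body)
        except (json.JSONDecodeError, AttributeError):
            # Body was not JSON, or was JSON without a dict at the top level.
            message = body
        raise PdfSearchError(f"HTTP {exc.code}: {message}") from exc
```

Connection failures such as a stopped server surface as `urllib.error.URLError` and are left to the caller.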

Use case

Combine this endpoint with a document parser to build a corpus of PDF text for LLM pre-training or fine-tuning. For example, collect 100 PDFs on a topic, download them, extract text with pdfminer or pypdf, and write the results to JSONL. For other filetypes (DOCX, PPTX, XLSX, and more), use the /search/specific/ endpoint instead.
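The download, extract, and write-to-JSONL pipeline described above could look like the following sketch. It assumes the third-party `pypdf` package for extraction (`pip install pypdf`) and a hypothetical record schema of one `{"url": ..., "text": ...}` object per line. `build_jsonl_corpus`, `fetch_bytes`, and `extract_text_pypdf` are illustrative names. The download and extraction steps are passed in as functions, so either one can be swapped out.

```python
import io
import json
import urllib.request
from typing import Callable, Iterable


def fetch_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def extract_text_pypdf(pdf_bytes: bytes) -> str:
    """Extract text with pypdf (third-party; assumed installed)."""
    from pypdf import PdfReader  # lazy import so the rest runs without pypdf

    reader = PdfReader(io.BytesIO(pdf_bytes))
    # extract_text() may yield little or nothing for image-only scans.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def build_jsonl_corpus(
    urls: Iterable[str],
    out_path: str,
    fetch: Callable[[str], bytes] = fetch_bytes,
    extract: Callable[[bytes], str] = extract_text_pypdf,
) -> int:
    """Write one JSON record per successfully parsed PDF; return the count.

    Failed downloads or parses are skipped instead of aborting the run.
    """
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            try:
                text = extract(fetch(url))
            except Exception as exc:  # broad on purpose: network and parse errors vary
                print(f"skipping {url}: {exc}")
                continue
            if not text.strip():
                continue  # no usable text, e.g. an image-only scan
            record = {"url": url, "text": text}
            out.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written
```

To collect 100 PDFs, pass the array returned by a /searchpdfs/ call with limits=100 as `urls`. To use pdfminer.six instead of pypdf, run `import pdfminer.high_level` and pass `extract=lambda b: pdfminer.high_level.extract_text(io.BytesIO(b))`.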
