The Parse API converts documents into structured content, extracting text, tables, images, and layouts with high accuracy.

Basic Usage

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf"
)
print(response)

Method Signature

client.parse.run(
    input: str | list[str],
    enhance: Enhance | None = None,
    formatting: Formatting | None = None,
    retrieval: Retrieval | None = None,
    settings: Settings | None = None,
    spreadsheet: Spreadsheet | None = None,
    async_: ConfigV3AsyncConfig | None = None
) -> ParseRunResponse

Parameters

input
string
required
The URL of the document to parse. You can provide:
  • A publicly available URL
  • A presigned S3 URL
  • A reducto:// prefixed URL from the /upload endpoint
  • A jobid:// prefixed URL from a previous parse invocation
  • A list of URLs (for multi-document pipelines, V3 API only)
enhance
Enhance
Enhancement options for improving extraction accuracy.
formatting
Formatting
Control output formatting and structure.
retrieval
Retrieval
Configure chunking for retrieval-optimized output.
settings
Settings
Processing settings and preferences.
spreadsheet
Spreadsheet
Spreadsheet-specific parsing options.
async_
ConfigV3AsyncConfig
Configuration for asynchronous processing. When provided, the request returns immediately with a job ID.
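
The accepted forms of input listed above can be told apart by their prefix. The following is a minimal sketch of that distinction. classify_parse_input and the labels it returns are hypothetical names chosen for illustration and are not part of the SDK. The sketch also assumes that public and presigned S3 URLs both begin with http:// or https://.

```python
def classify_parse_input(value):
    """Label an `input` value by the forms listed in the parameter description.

    Hypothetical helper for illustration only; not part of the Reducto SDK.
    """
    # A list of URLs (described as V3 API only in the parameter list).
    if isinstance(value, (list, tuple)):
        return "url_list"
    if not isinstance(value, str) or not value:
        raise ValueError("input must be a non-empty string or a list of strings")
    # reducto:// URLs come from the /upload endpoint.
    if value.startswith("reducto://"):
        return "uploaded_file"
    # jobid:// URLs refer to a previous parse invocation.
    if value.startswith("jobid://"):
        return "previous_job"
    # Public and presigned URLs cannot be told apart by prefix alone.
    if value.startswith(("http://", "https://")):
        return "url"
    raise ValueError(f"unrecognized input scheme: {value!r}")
```

A check like this can catch a malformed value before a request is sent, but the API itself remains the authority on what it accepts.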

Advanced Example

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf",
    enhance={
        "summarize_figures": True,
        "agentic": ["table", "figure"]
    },
    formatting={
        "add_page_markers": True,
        "table_output_format": "json",
        "merge_tables": False
    }
)

# Access the parsed content
print(response.content)

Async Job Processing

For long-running documents, use run_job() to process asynchronously:

from reducto import Reducto

client = Reducto()

# Start an async job
job = client.parse.run_job(
    input="https://example.com/large-document.pdf",
    async_={"webhook": {"url": "https://example.com/webhook"}}
)

print(f"Job ID: {job.job_id}")

# Fetch the job's current state; repeat this call to poll until the job finishes
result = client.job.get(job.job_id)
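
A single client.job.get() call returns the job as it stands at that moment, so waiting for completion needs a loop. The sketch below shows one way to write that loop without depending on the SDK. poll_until_done, fetch_job, and is_done are hypothetical names, and the interval and timeout defaults are arbitrary.

```python
import time


def poll_until_done(fetch_job, is_done, interval=2.0, timeout=600.0, sleep=time.sleep):
    """Call fetch_job() repeatedly until is_done(job) is true, then return the job.

    Hypothetical helper for illustration only; not part of the Reducto SDK.
    Raises TimeoutError if the job is still unfinished after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if is_done(job):
            return job
        # Give up once the deadline has passed rather than waiting forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        sleep(interval)
```

With the client from the example above, this might be called as poll_until_done(lambda: client.job.get(job.job_id), lambda j: j.status in ("Completed", "Failed")). The status attribute and its values are assumptions here and should be checked against the SDK's job response type. When a webhook is configured, polling may be unnecessary.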

Input Formats

The Parse API supports multiple input methods:

Direct URL

response = client.parse.run(
    input="https://example.com/document.pdf"
)

File Upload

from pathlib import Path

# First upload the file
upload_response = client.upload(
    file=Path("/path/to/document.pdf")
)

# Then parse using the reducto:// URL
response = client.parse.run(
    input=upload_response.url
)

Reuse Previous Parse

# Reuse the output of a previous parse job; previous_job_id is that job's ID
response = client.parse.run(
    input=f"jobid://{previous_job_id}"
)
