Document Parsing

The Parse API converts documents into structured content, extracting text, tables, images, and layouts with high accuracy.

Basic Usage

Sync

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf"
)
print(response)

Async

import asyncio
from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    response = await client.parse.run(
        input="https://example.com/document.pdf"
    )
    print(response)

asyncio.run(main())

Method Signature

client.parse.run(
    input: str,
    enhance: Enhance | None = None,
    formatting: Formatting | None = None,
    retrieval: Retrieval | None = None,
    settings: Settings | None = None,
    spreadsheet: Spreadsheet | None = None,
    async_: ConfigV3AsyncConfig | None = None
) -> ParseRunResponse

Parameters

input

string

required

The URL of the document to parse. You can provide:

A publicly available URL
A presigned S3 URL
A reducto:// prefixed URL from the /upload endpoint
A jobid:// prefixed URL from a previous parse invocation
A list of URLs (for multi-document pipelines, V3 API only)
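The input kinds listed above can be told apart by their URL scheme. The following is a minimal sketch of a client-side check you might run before calling the API. The function name `classify_input` and the labels it returns are hypothetical and not part of the Reducto SDK.

```python
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Return a coarse label for a single Parse API input string.

    Hypothetical helper: the labels are illustrative, not SDK values.
    """
    scheme = urlparse(value).scheme.lower()
    if scheme == "reducto":
        # URL returned by the /upload endpoint
        return "upload"
    if scheme == "jobid":
        # Reference to a previous parse invocation
        return "previous_job"
    if scheme in ("http", "https"):
        # Public URLs and presigned S3 URLs both look like this
        return "url"
    raise ValueError(f"unsupported input: {value!r}")


if __name__ == "__main__":
    for example in (
        "https://example.com/document.pdf",
        "reducto://some-upload-id",
        "jobid://some-job-id",
    ):
        print(example, "->", classify_input(example))
```

A check like this only catches obviously malformed strings early. The API remains the authority on what it accepts.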

enhance

Enhance

Enhancement options for improving extraction accuracy.

Properties:

agentic

array

Uses vision language models to enhance accuracy for tables, figures, or text. Increases cost and latency.

summarize_figures

boolean

default: true

If true, summarize figures using a small vision language model.

formatting

Formatting

Control output formatting and structure.

Properties:

add_page_markers

boolean

default: false

Add page markers to the output. Useful for extracting data with page-specific information.

table_output_format

enum

default: "dynamic"

Format for table output: html, json, md, jsonbbox, dynamic, or csv. Dynamic returns markdown for simple tables and HTML for complex tables.

merge_tables

boolean

default: false

Merge consecutive tables with the same number of columns.

include

array

List of formatting elements to include: change_tracking, highlight, comments, hyperlinks, signatures.

retrieval

Retrieval

Configure chunking for retrieval-optimized output.

settings

Settings

Processing settings and preferences.

spreadsheet

Spreadsheet

Spreadsheet-specific parsing options.

async_

ConfigV3AsyncConfig

Configuration for asynchronous processing. When provided, the request returns immediately with a job ID.

Advanced Example

Sync

from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf",
    enhance={
        "summarize_figures": True,
        "agentic": ["table", "figure"]
    },
    formatting={
        "add_page_markers": True,
        "table_output_format": "json",
        "merge_tables": False
    }
)

# Access the parsed content
print(response.content)

Async

import asyncio
from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    response = await client.parse.run(
        input="https://example.com/document.pdf",
        enhance={
            "summarize_figures": True,
            "agentic": ["table", "figure"]
        },
        formatting={
            "add_page_markers": True,
            "table_output_format": "json",
            "merge_tables": False
        }
    )

    # Access the parsed content
    print(response.content)

asyncio.run(main())
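The async client is most useful when several documents are parsed at once. The following sketch shows one way to bound the number of concurrent requests with `asyncio.Semaphore` and `asyncio.gather`. `parse_many` and its `parse_one` argument are hypothetical names. `parse_one` stands in for any coroutine function that parses a single URL, so the sketch runs without the SDK installed.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable


async def parse_many(
    parse_one: Callable[[str], Awaitable[Any]],
    urls: Iterable[str],
    max_concurrency: int = 4,
) -> list[Any]:
    """Run parse_one over urls with at most max_concurrency calls in flight.

    Results come back in the same order as the input URLs.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url: str) -> Any:
        async with semaphore:
            return await parse_one(url)

    return await asyncio.gather(*(worker(url) for url in urls))


if __name__ == "__main__":
    async def fake_parse(url: str) -> str:
        # Stand-in for a real parse call; sleeps instead of hitting the network
        await asyncio.sleep(0.01)
        return f"parsed:{url}"

    print(asyncio.run(parse_many(fake_parse, ["a.pdf", "b.pdf", "c.pdf"], 2)))
```

With the client from the example above, `parse_one` could be written as `lambda url: client.parse.run(input=url)`. A suitable `max_concurrency` depends on your account's rate limits, which this page does not state.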

Async Job Processing

For documents that take a long time to process, use run_job() to submit the work as a background job and retrieve the result later:

from reducto import Reducto

client = Reducto()

# Start an async job
job = client.parse.run_job(
    input="https://example.com/large-document.pdf",
    async_={"webhook": {"url": "https://example.com/webhook"}}
)

print(f"Job ID: {job.job_id}")

# Check on the job (repeat this call until it finishes, or wait for the webhook)
result = client.job.get(job.job_id)
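A single `client.job.get` call returns the job's state at that moment, so polling needs a loop. Below is a minimal sketch of such a loop. It assumes the returned object exposes a `status` attribute whose values include "Completed" and "Failed". Both the attribute name and those values are assumptions not confirmed by this page, so adjust them to match the actual response. `wait_for_job` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Any, Callable


def wait_for_job(
    get_job: Callable[[str], Any],
    job_id: str,
    *,
    done_statuses: tuple = ("Completed",),   # assumed status value
    failed_statuses: tuple = ("Failed",),    # assumed status value
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call get_job(job_id) until the job finishes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        job = get_job(job_id)
        status = getattr(job, "status", None)  # assumed attribute name
        if status in done_statuses:
            return job
        if status in failed_statuses:
            raise RuntimeError(f"job {job_id} ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

With the client from the example above, this would be called as `wait_for_job(client.job.get, job.job_id)`. If a webhook is configured, polling is optional, since the webhook is notified when the job completes.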

Input Formats

The Parse API supports multiple input methods:

Direct URL

response = client.parse.run(
    input="https://example.com/document.pdf"
)

File Upload

from pathlib import Path

# First upload the file
upload_response = client.upload(
    file=Path("/path/to/document.pdf")
)

# Then parse using the reducto:// URL
response = client.parse.run(
    input=upload_response.url
)

Reuse Previous Parse

# Reuse the output of a previous parse job.
# previous_job_id is the job ID returned by that earlier call.
response = client.parse.run(
    input=f"jobid://{previous_job_id}"
)
