The Parse API converts documents into structured content, extracting text, tables, images, and layouts with high accuracy.
Basic Usage
from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf"
)
print(response)
# The same request using the asynchronous client
import asyncio

from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    response = await client.parse.run(
        input="https://example.com/document.pdf"
    )
    print(response)

asyncio.run(main())
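One reason to reach for the async client is to parse several documents concurrently. The following is a minimal sketch, not part of the SDK: `parse_many` and its `limit` parameter are hypothetical names, and the helper assumes only that the coroutine it is given accepts an `input` keyword argument, as `client.parse.run` does above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def parse_many(
    parse: Callable[..., Awaitable[Any]],
    urls: Sequence[str],
    limit: int = 4,
) -> list:
    """Run parse(input=url) for every URL, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def parse_one(url: str) -> Any:
        # The semaphore caps how many requests are in flight at once.
        async with semaphore:
            return await parse(input=url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(parse_one(url) for url in urls))
```

With an `AsyncReducto` client this could be called from inside a coroutine as `await parse_many(client.parse.run, urls)`. The concurrency limit of 4 is an arbitrary illustrative default, not a documented rate limit.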
Method Signature
client.parse.run(
    input: str,
    enhance: Enhance | None = None,
    formatting: Formatting | None = None,
    retrieval: Retrieval | None = None,
    settings: Settings | None = None,
    spreadsheet: Spreadsheet | None = None,
    async_: ConfigV3AsyncConfig | None = None
) -> ParseRunResponse
Parameters
input: The URL of the document to parse. You can provide:
A publicly available URL
A presigned S3 URL
A reducto:// prefixed URL from the /upload endpoint
A jobid:// prefixed URL from a previous parse invocation
A list of URLs (for multi-document pipelines, V3 API only)
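The single-string input forms listed above can be told apart by their URL scheme. As an illustration only, here is a small client-side check one might run before calling the API. `classify_input` and the labels it returns are hypothetical and not part of the SDK; presigned S3 URLs are assumed to be ordinary `https` URLs.

```python
from urllib.parse import urlparse


def classify_input(value: str) -> str:
    """Return a coarse label for a Parse API input string (hypothetical helper)."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "reducto":
        return "uploaded-file"   # URL returned by the /upload endpoint
    if scheme == "jobid":
        return "previous-job"    # result of an earlier parse invocation
    if scheme in ("http", "https"):
        return "remote-url"      # public or presigned URL
    raise ValueError(f"unsupported input: {value!r}")
```

A check like this catches a local filesystem path passed by mistake, which needs to go through the upload step first.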
enhance: Enhancement options for improving extraction accuracy.
enhance.agentic: Uses vision language models to enhance accuracy for tables, figures, or text. Increases cost and latency.
enhance.summarize_figures: If true, summarize figures using a small vision language model.
formatting: Control output formatting and structure.
formatting.add_page_markers: Add page markers to the output. Useful for extracting data with page-specific information.
formatting.table_output_format: Format for table output: html, json, md, jsonbbox, dynamic, or csv. dynamic returns markdown for simple tables and HTML for complex tables.
formatting.merge_tables: Merge consecutive tables with the same number of columns.
List of formatting elements to include: change_tracking, highlight, comments, hyperlinks, signatures.
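Because the formatting options are passed as a plain dictionary, a typo in the table format is easy to miss until the request is sent. The sketch below builds that dictionary and validates the format locally. `build_formatting` is a hypothetical helper; the key names are taken from the Advanced Example on this page, and the `False` defaults are illustrative choices rather than documented API defaults.

```python
# Values listed in the parameter description above.
TABLE_OUTPUT_FORMATS = {"html", "json", "md", "jsonbbox", "dynamic", "csv"}


def build_formatting(
    table_output_format: str,
    add_page_markers: bool = False,
    merge_tables: bool = False,
) -> dict:
    """Build a `formatting` dict, rejecting unknown table formats early."""
    if table_output_format not in TABLE_OUTPUT_FORMATS:
        allowed = ", ".join(sorted(TABLE_OUTPUT_FORMATS))
        raise ValueError(
            f"table_output_format must be one of: {allowed}; "
            f"got {table_output_format!r}"
        )
    return {
        "add_page_markers": add_page_markers,
        "table_output_format": table_output_format,
        "merge_tables": merge_tables,
    }
```

The result can be passed straight through, for example `client.parse.run(input=url, formatting=build_formatting("json", add_page_markers=True))`.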
retrieval: Configure chunking for retrieval-optimized output.
settings: Processing settings and preferences.
spreadsheet: Spreadsheet-specific parsing options.
async_: Configuration for asynchronous processing. When provided, the request returns immediately with a job ID.
Advanced Example
from reducto import Reducto

client = Reducto()

response = client.parse.run(
    input="https://example.com/document.pdf",
    enhance={
        "summarize_figures": True,
        "agentic": ["table", "figure"]
    },
    formatting={
        "add_page_markers": True,
        "table_output_format": "json",
        "merge_tables": False
    }
)

# Access the parsed content
print(response.content)
# The same request using the asynchronous client
import asyncio

from reducto import AsyncReducto

client = AsyncReducto()

async def main():
    response = await client.parse.run(
        input="https://example.com/document.pdf",
        enhance={
            "summarize_figures": True,
            "agentic": ["table", "figure"]
        },
        formatting={
            "add_page_markers": True,
            "table_output_format": "json",
            "merge_tables": False
        }
    )

    # Access the parsed content
    print(response.content)

asyncio.run(main())
Async Job Processing
For long-running documents, use run_job() to process asynchronously:
from reducto import Reducto

client = Reducto()

# Start an async job
job = client.parse.run_job(
    input="https://example.com/large-document.pdf",
    async_={"webhook": {"url": "https://example.com/webhook"}}
)
print(f"Job ID: {job.job_id}")

# Fetch the job's current state; repeat this call to poll until it finishes
result = client.job.get(job.job_id)
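A single `client.job.get` call returns the job's state at that moment, so polling means repeating it until the job finishes. The loop below is one way to sketch that. `wait_for_job` is a hypothetical helper, and it assumes the returned object exposes a `status` attribute whose terminal values are `"Completed"` and `"Failed"`; check the job response type in your SDK version and adjust `done` accordingly.

```python
import time
from typing import Any, Callable, Iterable


def wait_for_job(
    fetch: Callable[[], Any],
    *,
    done: Iterable[str] = ("Completed", "Failed"),  # assumed status values
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], Any] = time.sleep,
) -> Any:
    """Call fetch() until its result has a terminal status or time runs out."""
    terminal = set(done)
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if getattr(job, "status", None) in terminal:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

With the job started above, this might be used as `result = wait_for_job(lambda: client.job.get(job.job_id))`. If a webhook is configured, polling is optional, since the webhook is presumably notified when the job completes.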
The Parse API supports multiple input methods:
Direct URL
response = client.parse.run(
    input="https://example.com/document.pdf"
)
File Upload
from pathlib import Path

# First upload the file
upload_response = client.upload(
    file=Path("/path/to/document.pdf")
)

# Then parse using the reducto:// URL
response = client.parse.run(
    input=upload_response.url
)
Reuse Previous Parse
# Use output from a previous parse job
response = client.parse.run(
input = f "jobid:// { previous_job_id } "
)