split.run()

Splits a document into categorized sections synchronously.
client.split.run(
    input="https://example.com/document.pdf",
    split_description=[
        {"category": "summary", "description": "Executive summary sections"},
        {"category": "financials", "description": "Financial data and tables"}
    ],
    parsing={...},
    settings={...},
    split_rules="Additional rules for splitting"
)

Parameters

input
string | list[string]
required
The URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation
  5. A list of URLs (for multi-document pipelines, V3 API only)
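The input forms listed above can be told apart by URL scheme. The following is a minimal sketch of that idea; `describe_input` and the labels it returns are hypothetical and not part of the SDK, and the function only inspects the string without contacting Reducto.

```python
from urllib.parse import urlparse


def describe_input(value):
    """Label a split input by its URL scheme (hypothetical helper, not an SDK call)."""
    if isinstance(value, (list, tuple)):
        # A list of URLs is the multi-document form (V3 API only).
        return [describe_input(item) for item in value]
    scheme = urlparse(value).scheme.lower()
    if scheme in ("http", "https"):
        # Public URLs and presigned S3 URLs look the same at this level.
        return "url"
    if scheme == "reducto":
        return "upload"  # obtained from the /upload endpoint
    if scheme == "jobid":
        return "parse_job"  # obtained from a previous /parse invocation
    raise ValueError(f"unsupported input: {value!r}")
```

A check like this can catch a malformed input string before a request is sent.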
split_description
Iterable[SplitCategory]
required
The categories to split the document into. Each entry defines a category and a description of the content that belongs to it.
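When categories are kept in a plain mapping, they can be converted to the list-of-dicts shape used in the example call. This is a small sketch; `make_split_description` is a hypothetical helper, and only the `category` and `description` keys come from this page.

```python
def make_split_description(categories):
    """Turn {category: description} into a list of category dicts."""
    items = []
    for category, description in categories.items():
        if not category or not description:
            raise ValueError("each category needs a non-empty name and description")
        items.append({"category": category, "description": description})
    return items


split_description = make_split_description({
    "summary": "Executive summary sections",
    "financials": "Financial data and tables",
})
```

The resulting list can be passed as the `split_description` argument.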
parsing
ParseOptions
The configuration options for parsing the document. This configuration is ignored when input is a jobid:// URL.
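Because parsing options have no effect on a jobid:// input, a caller may prefer to leave them out of the request in that case. The sketch below shows one way to do so; `build_split_kwargs` is a hypothetical helper, and the contents of the parsing options are left to the caller.

```python
def build_split_kwargs(document, split_description, parsing=None):
    """Assemble keyword arguments for client.split.run (hypothetical helper)."""
    kwargs = {"input": document, "split_description": split_description}
    reuses_parse_job = isinstance(document, str) and document.startswith("jobid://")
    if parsing is not None and not reuses_parse_job:
        # Parsing options are ignored for jobid:// inputs, so send them only otherwise.
        kwargs["parsing"] = parsing
    return kwargs
```

The result can be unpacked into the call, for example `client.split.run(**build_split_kwargs(...))`.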
settings
object
The settings for split processing.
split_rules
string
The prompt that describes rules for splitting the document.

Response

SplitResponse
object
Returns the document split into categorized sections.
result
object
The categorized sections of the document.

split.run_job()

Splits a document into categorized sections asynchronously and returns a job ID immediately.
response = client.split.run_job(
    input="https://example.com/document.pdf",
    split_description=[
        {"category": "summary", "description": "Executive summary sections"},
        {"category": "financials", "description": "Financial data and tables"}
    ],
    async_={"webhook": {"url": "https://example.com/webhook"}},
    parsing={...},
    settings={...},
    split_rules="Additional rules for splitting"
)

print(response.job_id)  # Use this to check job status later

Parameters

input
string | list[string]
required
The URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation
  5. A list of URLs (for multi-document pipelines, V3 API only)
split_description
Iterable[SplitCategory]
required
The categories to split the document into. Each entry defines a category and a description of the content that belongs to it.
async_
ConfigV3AsyncConfig
The configuration options for asynchronous processing (default synchronous).
parsing
ParseOptions
The configuration options for parsing the document. This configuration is ignored when input is a jobid:// URL.
settings
object
The settings for split processing.
split_rules
string
The prompt that describes rules for splitting the document.

Response

SplitRunJobResponse
object
job_id
string
The ID of the asynchronous job. Use client.job.get(job_id) to retrieve the result when the job completes.
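Retrieving the result usually means polling until the job completes. The following is a generic polling sketch, not an SDK feature: `wait_for_job`, its parameters, and its defaults are hypothetical. The completion check is passed in by the caller because this page does not document the fields of the object returned by client.job.get.

```python
import time


def wait_for_job(fetch, is_finished, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch() until is_finished(job) is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch()
        if is_finished(job):
            return job
        if time.monotonic() + interval > deadline:
            # Another wait would run past the deadline, so give up here.
            raise TimeoutError("job did not finish before the timeout")
        sleep(interval)
```

With a real client, `fetch` could be `lambda: client.job.get(response.job_id)`, and `is_finished` would inspect whatever status information the returned job object carries.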
