Extract

extract.run()

Extracts structured data from a document synchronously based on provided instructions.

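from reducto import Reducto

client = Reducto()  # assumes the API key is available in the environment

response = client.extract.run(
    input="https://example.com/document.pdf",
    instructions={"schema": {...}, "prompt": "Extract all contact information"},
    parsing={...},
    settings={...}
)
print(response.result)  # sync mode returns an ExtractResponse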

Parameters

input

string | list[string]

required

The URL of the document to process. You can provide any of the following:

A publicly available URL
A presigned S3 URL
A reducto:// URL returned by the /upload endpoint after you upload a document directly
A jobid:// URL from a previous /parse invocation
A list of URLs (for multi-document pipelines, V3 API only)
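The accepted input forms above can be told apart by their URL scheme. The sketch below uses a hypothetical helper, `classify_input`, which is not part of the SDK; it only illustrates how a caller might check an input before sending it. The labels it returns ("upload", "job", "url") are made up for this example.

```python
from typing import List, Union


def classify_input(value: Union[str, List[str]]) -> Union[str, List[str]]:
    """Classify an extract input by its URL form (hypothetical helper)."""
    if isinstance(value, list):
        # Multi-document input: classify each URL independently.
        return [classify_input(item) for item in value]
    if value.startswith("reducto://"):
        return "upload"  # document sent earlier via the /upload endpoint
    if value.startswith("jobid://"):
        return "job"  # output of a previous /parse invocation
    if value.startswith(("https://", "http://")):
        return "url"  # public URL or presigned S3 URL
    raise ValueError(f"unsupported input: {value!r}")
```

A presigned S3 URL and a public URL look identical to this check, since both are ordinary HTTPS URLs.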

instructions

object

The extraction instructions: a schema that defines the structure of the output and a prompt that guides the extraction.
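As a sketch of what the instructions object might contain, the example below fills in the schema with a JSON Schema describing contact details. Two things here are assumptions: that "schema" accepts JSON Schema, and the field names (contacts, name, email, phone), which are illustrative only. The prompt is the one used in the examples on this page.

```python
import json

# Assumption: "schema" takes a JSON Schema object. Field names are illustrative.
contact_schema = {
    "type": "object",
    "properties": {
        "contacts": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "email": {"type": "string"},
                    "phone": {"type": "string"},
                },
                "required": ["name"],
            },
        }
    },
    "required": ["contacts"],
}

instructions = {
    "schema": contact_schema,
    "prompt": "Extract all contact information",
}

# The object is plain JSON-serializable data, so it can be logged or stored as-is.
print(json.dumps(instructions, indent=2))
```

The resulting dict would be passed as `instructions=instructions` to `extract.run()` or `extract.run_job()`.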

parsing

ParseOptions

The configuration options for parsing the document. These options are ignored when input is a jobid:// URL, because that document has already been parsed.
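Because parsing options have no effect on jobid:// inputs, a caller might leave them out of the request entirely. The helper below, `build_extract_kwargs`, is hypothetical and not part of the SDK. How mixed lists (some jobid:// URLs, some not) are handled is not stated above, so the sketch assumes parsing options are still needed if any entry must be parsed.

```python
from typing import Any, Dict, List, Optional, Union


def build_extract_kwargs(
    source: Union[str, List[str]],
    instructions: Dict[str, Any],
    parsing: Optional[Dict[str, Any]] = None,
    settings: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.extract.run() (hypothetical helper)."""
    kwargs: Dict[str, Any] = {"input": source, "instructions": instructions}
    urls = source if isinstance(source, list) else [source]
    # Parsing options are ignored for jobid:// inputs, so only send them
    # when at least one input still has to be parsed (assumption for mixed lists).
    needs_parsing = any(not url.startswith("jobid://") for url in urls)
    if parsing is not None and needs_parsing:
        kwargs["parsing"] = parsing
    if settings is not None:
        kwargs["settings"] = settings
    return kwargs
```

Usage would look like `client.extract.run(**build_extract_kwargs(url, instructions, parsing=...))`.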

settings

object

The settings to use for the extraction.

async_

ConfigV3AsyncConfig

Configuration options for asynchronous processing. Extraction runs synchronously by default; these options apply only in async mode.

Response

ExtractRunResponse

ExtractResponse | AsyncExtractResponse

Returns either an ExtractResponse with the extracted data (sync mode) or an AsyncExtractResponse containing a job_id (async mode).
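Since `run()` can return either response type, calling code has to branch on which one it received. The sketch below dispatches on the documented fields: `result` for ExtractResponse and `job_id` for AsyncExtractResponse. The two stub classes exist only so the example runs without the SDK; they are not the SDK's own types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


# Stand-ins so the sketch runs without the SDK; real objects come from
# client.extract.run().
@dataclass
class ExtractResponseStub:
    result: Dict[str, Any] = field(default_factory=dict)


@dataclass
class AsyncExtractResponseStub:
    job_id: str = ""


def unpack_extract_response(response: Any) -> Tuple[str, Any]:
    """Return ("result", data) in sync mode or ("job_id", id) in async mode."""
    # Check for `result` first: only ExtractResponse is documented to carry it.
    result = getattr(response, "result", None)
    if result is not None:
        return ("result", result)
    return ("job_id", response.job_id)
```

An `isinstance` check against the SDK's response classes would be stricter, but their import paths are not given on this page, so the sketch relies on field names instead.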

ExtractResponse

result

object

The extracted data in the schema format you specified.

AsyncExtractResponse

job_id

string

The ID of the asynchronous job. Use it to retrieve the result later with client.job.get(job_id).

extract.run_job()

Extracts structured data from a document asynchronously and returns a job ID immediately.

response = client.extract.run_job(
    input="https://example.com/document.pdf",
    instructions={"schema": {...}, "prompt": "Extract all contact information"},
    async_={"webhook": {"url": "https://example.com/webhook"}},
    parsing={...},
    settings={...}
)

print(response.job_id)  # Use this to check job status later

Parameters

input

string | list[string]

required

The URL of the document to process. You can provide any of the following:

A publicly available URL
A presigned S3 URL
A reducto:// URL returned by the /upload endpoint after you upload a document directly
A jobid:// URL from a previous /parse invocation
A list of URLs (for multi-document pipelines, V3 API only)

instructions

object

The extraction instructions: a schema that defines the structure of the output and a prompt that guides the extraction.

async_

ConfigV3AsyncConfig

Configuration options for asynchronous processing, for example a webhook, as in async_={"webhook": {"url": ...}}.

parsing

ParseOptions

The configuration options for parsing the document. These options are ignored when input is a jobid:// URL, because that document has already been parsed.

settings

object

The settings to use for the extraction.

Response

ExtractRunJobResponse

object

job_id

string

The ID of the asynchronous job. Use client.job.get(job_id) to retrieve the result when the job completes.
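Without a webhook, a caller can poll `client.job.get(job_id)` until the job completes. The fields of the job response are not described on this page, so the sketch below keeps them abstract: `fetch_job` would wrap the `client.job.get` call, and `is_done` is a caller-supplied check whose logic depends on the actual response shape. `poll_until_done` is a hypothetical helper, not an SDK function.

```python
import time
from typing import Any, Callable


def poll_until_done(
    fetch_job: Callable[[], Any],
    is_done: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_job until is_done(job) is true or the timeout (seconds) expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if is_done(job):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Wait between requests to avoid hammering the API.
        sleep(interval)
```

In practice `fetch_job` might be `lambda: client.job.get(response.job_id)`, with `is_done` inspecting a status field if the job response exposes one. The injectable `sleep` argument keeps the loop testable without real delays.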
