Output

OutputResponse

The OutputResponse object contains the processed results of a document analysis task.

chunks

Chunk[]

required

Collection of document chunks, where each chunk contains one or more segments. See Chunk below.

file_name

string

The name of the file.

page_count

integer

The number of pages in the file.

pdf_url

string

The presigned URL of the PDF file.

extracted_json

object

deprecated

DEPRECATED: The extracted JSON from the document.

Chunk

A Chunk represents a logical grouping of segments from the document. Chunks are created based on the target_length configuration.

chunk_id

string

required

The unique identifier for the chunk.

chunk_length

integer

required

The total number of tokens in the chunk, as calculated by the configured tokenizer.

segments

Segment[]

required

Collection of document segments that form this chunk. When target_chunk_length > 0, contains the maximum number of segments that fit within that length (segments remain intact). Otherwise, contains exactly one segment. See Segment below.

embed

string

Suggested text to be embedded for the chunk. This text is generated by combining the embed content from each segment according to the configured embed sources (HTML, Markdown, LLM, or Content). Can be configured using embed_sources in the SegmentProcessing configuration.
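The grouping rule described for segments can be illustrated with a small sketch. This is not the service's actual implementation: the `Seg` class, the `pack_segments` function, and the greedy left-to-right strategy are all assumptions made for illustration, as is the choice to give a segment that is longer than the target its own chunk. It only shows the documented behavior that segments stay intact and that a non-positive target yields one segment per chunk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Seg:
    """Hypothetical stand-in for a Segment; only the token count matters here."""
    segment_id: str
    tokens: int


def pack_segments(segments: List[Seg], target_length: int) -> List[List[Seg]]:
    """Group segments into chunks without ever splitting a segment (assumed greedy rule)."""
    if target_length <= 0:
        # No target length configured: exactly one segment per chunk.
        return [[s] for s in segments]

    chunks: List[List[Seg]] = []
    current: List[Seg] = []
    current_len = 0
    for seg in segments:
        # Close the current chunk if adding this segment would exceed the target.
        if current and current_len + seg.tokens > target_length:
            chunks.append(current)
            current, current_len = [], 0
        # An oversized segment still lands in a chunk of its own (assumption).
        current.append(seg)
        current_len += seg.tokens
    if current:
        chunks.append(current)
    return chunks


segs = [Seg("a", 100), Seg("b", 120), Seg("c", 300), Seg("d", 50)]
packed = pack_segments(segs, 256)
print([[s.segment_id for s in chunk] for chunk in packed])
```

With a target of 256 tokens, the first two segments share a chunk, and the 300-token segment is kept whole rather than split.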

Segment

A Segment represents a logical element within a document page (e.g., title, paragraph, table, image).

segment_id

string

required

Unique identifier for the segment.

segment_type

SegmentType

required

The type of the segment. See Segment Types for all possible values.

bbox

BoundingBox

required

Bounding box coordinates for the segment.

BoundingBox properties

left

number

required

The left coordinate of the bounding box.

top

number

required

The top coordinate of the bounding box.

width

number

required

The width of the bounding box.

height

number

required

The height of the bounding box.

page_number

integer

required

Page number of the segment (1-indexed).

page_width

number

required

Width of the page containing the segment.

page_height

number

required

Height of the page containing the segment.

content

string

required

Content of the segment, in either HTML or Markdown, depending on the format chosen in the segment processing configuration.

html

string

required

HTML representation of the segment.

markdown

string

required

Markdown representation of the segment.

text

string

required

Text content of the segment. Calculated from the OCR results.

llm

string

LLM-generated representation of the segment. Only present if LLM processing is configured for this segment type.

image

string

Presigned URL to the cropped image of the segment. Only present if cropping is enabled for this segment type.

confidence

number

Confidence score of the layout analysis model for this segment (0.0 to 1.0).

ocr

OCRResult[]

OCR results for the segment.

OCRResult properties

text

string

required

The recognized text of the OCR result.

bbox

BoundingBox

required

Bounding box for this OCR result.

confidence

number

The confidence score of the recognized text (0.0 to 1.0).
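Because each Segment carries its bbox together with page_width and page_height, a bounding box can be expressed as fractions of the page, which is handy when drawing overlays on a rendered page of a different size. The sketch below assumes that the bbox values and the page dimensions share the same units and that the origin is the top-left corner of the page; the reference above does not state either, so treat both as assumptions. The helper name `normalized_bbox` is hypothetical.

```python
from typing import Dict, Tuple


def normalized_bbox(segment: Dict) -> Tuple[float, float, float, float]:
    """Return (x0, y0, x1, y1) as fractions of the page size.

    Assumes bbox and page dimensions use the same units (not confirmed by the docs).
    """
    box = segment["bbox"]
    page_w = segment["page_width"]
    page_h = segment["page_height"]
    x0 = box["left"] / page_w
    y0 = box["top"] / page_h
    x1 = (box["left"] + box["width"]) / page_w
    y1 = (box["top"] + box["height"]) / page_h
    return (x0, y0, x1, y1)


# Values taken from the example response on this page.
segment = {
    "bbox": {"left": 72.0, "top": 100.0, "width": 450.0, "height": 36.0},
    "page_width": 612.0,
    "page_height": 792.0,
}
x0, y0, x1, y1 = normalized_bbox(segment)
print(round(x0, 4), round(y0, 4), round(x1, 4), round(y1, 4))
```

Multiplying the returned fractions by the pixel size of a rendered page gives overlay coordinates for that rendering.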

Example Response

{
  "chunks": [
    {
      "chunk_id": "550e8400-e29b-41d4-a716-446655440000",
      "chunk_length": 256,
      "segments": [
        {
          "segment_id": "660e8400-e29b-41d4-a716-446655440001",
          "segment_type": "Title",
          "bbox": {
            "left": 72.0,
            "top": 100.0,
            "width": 450.0,
            "height": 36.0
          },
          "page_number": 1,
          "page_width": 612.0,
          "page_height": 792.0,
          "content": "# Document Title",
          "html": "<h1>Document Title</h1>",
          "markdown": "# Document Title",
          "text": "Document Title",
          "confidence": 0.95
        }
      ],
      "embed": "# Document Title"
    }
  ],
  "file_name": "example.pdf",
  "page_count": 10,
  "pdf_url": "https://s3.amazonaws.com/..."
}
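As a rough illustration of how a client might consume a response of this shape, the sketch below parses a trimmed copy of the example, collects the embed text of each chunk, and flags segments whose layout confidence falls below a threshold. The helper `iter_segments` and the 0.5 threshold are hypothetical choices for this example, not part of the API. Optional fields (embed, confidence) are read defensively since they may be absent.

```python
import json
from typing import Dict, Iterator, Tuple

# Trimmed copy of the example response above.
raw = """
{
  "chunks": [
    {
      "chunk_id": "550e8400-e29b-41d4-a716-446655440000",
      "chunk_length": 256,
      "segments": [
        {
          "segment_id": "660e8400-e29b-41d4-a716-446655440001",
          "segment_type": "Title",
          "page_number": 1,
          "content": "# Document Title",
          "text": "Document Title",
          "confidence": 0.95
        }
      ],
      "embed": "# Document Title"
    }
  ],
  "file_name": "example.pdf",
  "page_count": 10
}
"""


def iter_segments(response: Dict) -> Iterator[Tuple[str, Dict]]:
    """Yield (chunk_id, segment) pairs in document order."""
    for chunk in response["chunks"]:
        for segment in chunk["segments"]:
            yield chunk["chunk_id"], segment


response = json.loads(raw)

# embed is optional, so skip chunks that do not carry it.
embed_texts = [chunk["embed"] for chunk in response["chunks"] if "embed" in chunk]

# confidence is optional too; segments without a score are not flagged.
THRESHOLD = 0.5  # arbitrary cut-off chosen for this example
low_confidence = [
    seg["segment_id"]
    for _, seg in iter_segments(response)
    if seg.get("confidence") is not None and seg["confidence"] < THRESHOLD
]

print(embed_texts)
print(low_confidence)
```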
