Segment Types
Chunkr identifies the following segment types during layout analysis:
Title: Main title or heading of the document.
SectionHeader: Section headers and subheadings within the document.
Text: Regular paragraph text.
ListItem: Individual items in bulleted or numbered lists.
Table: Tabular data.
Picture: Images, diagrams, and figures.
Caption: Captions for images, tables, or other elements.
Formula: Mathematical formulas and equations.
Footnote: Footnotes and endnotes.
PageHeader: Headers that appear at the top of pages.
PageFooter: Footers that appear at the bottom of pages.
Page: An entire page treated as a single segment (when using the Page segmentation strategy).
Segment Processing
The SegmentProcessing configuration allows you to control how each segment type is processed and which content representations are generated.
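As a rough illustration of the per-segment-type idea, the sketch below maps each segment type named on this page to the configuration type this page says it uses. The enum, dictionary, and helper are hypothetical Python names chosen for illustration; they are not Chunkr's SDK.

```python
from enum import Enum


class SegmentType(Enum):
    """Segment types named on this page (hypothetical Python model)."""
    TITLE = "Title"
    SECTION_HEADER = "SectionHeader"
    TEXT = "Text"
    LIST_ITEM = "ListItem"
    TABLE = "Table"
    PICTURE = "Picture"
    CAPTION = "Caption"
    FORMULA = "Formula"
    FOOTNOTE = "Footnote"
    PAGE_HEADER = "PageHeader"
    PAGE_FOOTER = "PageFooter"
    PAGE = "Page"


# Configuration type per segment type, as described on this page.
# Start with the common case, then override the special ones.
CONFIG_TYPE_BY_SEGMENT = {t: "AutoGenerationConfig" for t in SegmentType}
CONFIG_TYPE_BY_SEGMENT[SegmentType.FORMULA] = "LlmGenerationConfig"
CONFIG_TYPE_BY_SEGMENT[SegmentType.PAGE] = "LlmGenerationConfig"
CONFIG_TYPE_BY_SEGMENT[SegmentType.TABLE] = "TableGenerationConfig"
CONFIG_TYPE_BY_SEGMENT[SegmentType.PICTURE] = "PictureGenerationConfig"


def config_type_for(segment_type: SegmentType) -> str:
    """Look up which configuration type applies to a segment type."""
    return CONFIG_TYPE_BY_SEGMENT[segment_type]
```

For example, `config_type_for(SegmentType.TABLE)` returns `"TableGenerationConfig"`, while `config_type_for(SegmentType.TEXT)` returns `"AutoGenerationConfig"`.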
Configuration Structure
Each segment type can have its own processing configuration.
AutoGenerationConfig
Used for most segment types (Title, SectionHeader, Text, ListItem, Caption, Footnote, PageHeader, PageFooter).
Specifies the output format.
Determines how the content is generated.
Controls whether to crop the page image to the segment’s bounding box.
Custom prompt for LLM-based processing of this segment. Only used when LLM processing is enabled for the segment.
Defines which content sources will be included in the chunk’s embed field and counted towards the chunk length. The array’s order determines the sequence in which content appears.
Use the full page image as context for LLM generation.
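To make the ordering rule for embed sources concrete, here is a minimal sketch. It is not Chunkr's implementation: the source names ("Content", "LLM"), the `build_embed` helper, and the newline separator are assumptions made for illustration. Only the behavior that the array's order determines the order of the output comes from the description above.

```python
from typing import Dict, List


def build_embed(sources: List[str], available: Dict[str, str]) -> str:
    """Join the selected content sources in the order they are listed.

    `sources` plays the role of the embed-sources array; `available` maps a
    hypothetical source name to the text generated for that source.
    Sources that are missing or empty are skipped.
    """
    parts = [available[name] for name in sources if available.get(name)]
    return "\n".join(parts)


# Hypothetical outputs for one segment.
segment_outputs = {
    "Content": "Revenue grew 12% year over year.",
    "LLM": "Summary: revenue is up.",
}

# Listing "LLM" first puts the LLM text first in the embed field.
print(build_embed(["LLM", "Content"], segment_outputs))
```

Sources left out of the array do not appear in the result, which matches the idea that only listed sources count towards the chunk length.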
Deprecated Fields
DEPRECATED: Use format: Html and strategy instead.
DEPRECATED: Use format: Markdown and strategy instead.
LlmGenerationConfig
Used for Formula and Page segment types. Has the same fields as AutoGenerationConfig but with strategy defaulting to LLM.
Output format (Html or Markdown).
Generation strategy (Auto or LLM).
Image cropping strategy.
Custom LLM prompt.
Content sources for embedding.
Use full page image as context.
TableGenerationConfig
Used specifically for Table segments. Has the same fields as AutoGenerationConfig but with different defaults.
Output format (Html or Markdown). Tables default to Html for better structure preservation.
Generation strategy. Tables default to LLM for higher accuracy.
Image cropping strategy.
Custom LLM prompt.
Content sources for embedding.
Use full page image as context.
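The relationship between the three configuration types described so far (same fields, different defaults) can be sketched with dataclasses. This is a hypothetical model, not Chunkr's SDK: only the format and strategy fields are modeled, and the AutoGenerationConfig defaults (Markdown, Auto) are assumptions because this page does not state them. The LLM default for LlmGenerationConfig and the Html and LLM defaults for TableGenerationConfig follow the text above.

```python
from dataclasses import dataclass


@dataclass
class AutoGenerationConfig:
    format: str = "Markdown"  # assumed default; not stated on this page
    strategy: str = "Auto"    # assumed default; not stated on this page


@dataclass
class LlmGenerationConfig(AutoGenerationConfig):
    strategy: str = "LLM"     # stated above: strategy defaults to LLM


@dataclass
class TableGenerationConfig(AutoGenerationConfig):
    format: str = "Html"      # stated above: tables default to Html
    strategy: str = "LLM"     # stated above: tables default to LLM


# Explicit values override a default without affecting the other field.
markdown_table = TableGenerationConfig(format="Markdown")
print(markdown_table)
```

Modeling the variants as subclasses keeps the field set identical and makes the differing defaults the only visible change.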
PictureGenerationConfig
Used specifically for Picture segments.
Output format (Html or Markdown).
Generation strategy. When set to Auto, generates image tags:
- HTML format: <img src="{url}" />
- Markdown format: a Markdown image tag pointing to {url}
Controls image cropping for pictures.
Custom LLM prompt for describing or processing the image.
Content sources for embedding.
Use full page image as context.
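The Auto strategy for pictures described above amounts to wrapping the image URL in a format-specific tag. The following is a minimal sketch under stated assumptions: `render_picture_auto` is a hypothetical helper, and the Markdown branch uses standard Markdown image syntax with empty alt text because the exact Markdown output is not shown on this page.

```python
def render_picture_auto(url: str, fmt: str) -> str:
    """Return an image tag for the given output format ("Html" or "Markdown")."""
    if fmt == "Html":
        # Matches the HTML template shown on this page.
        return f'<img src="{url}" />'
    if fmt == "Markdown":
        # Assumption: standard Markdown image syntax with empty alt text.
        return f"![]({url})"
    raise ValueError(f"unsupported format: {fmt}")


print(render_picture_auto("https://example.com/figure.png", "Html"))
```

With the LLM strategy, the custom prompt would presumably produce a description instead of a bare tag; that path is not modeled here.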