Class Overview
pageindex/utils.py:681
Description
The ConfigLoader class manages configuration for PageIndex. It loads defaults from config.yaml and merges them with user-provided options, giving a centralized way to handle all PageIndex settings with validation and type checking.
Constructor
__init__(default_path=None)
default_path: Path to the YAML configuration file. If None, defaults to pageindex/config.yaml in the package directory.
Methods
load(user_opt=None)
Loads configuration by merging user options with default values.
user_opt: User-provided configuration options. Can be:
- None: Use all defaults
- dict: Dictionary of config keys/values
- config (SimpleNamespace): Existing config object
Raises ValueError if unknown keys are provided.
Raises TypeError if an invalid type is passed.
Returns a configuration object with all settings as attributes. Contains:
- model: OpenAI model name
- toc_check_page_num: Pages to check for TOC
- max_page_num_each_node: Max pages per node
- max_token_num_each_node: Max tokens per node
- if_add_node_id: Add node IDs ("yes"/"no")
- if_add_node_summary: Generate summaries ("yes"/"no")
- if_add_doc_description: Generate doc description ("yes"/"no")
- if_add_node_text: Include text ("yes"/"no")
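The merge rules described for load() can be sketched without PageIndex installed. The following is a minimal stand-in, not the library's implementation: the function name `merge_config`, the `DEFAULTS` dictionary, and its numeric values are assumptions made for illustration only.

```python
from types import SimpleNamespace

# Placeholder defaults for illustration; the shipped values live in
# pageindex/config.yaml and may differ.
DEFAULTS = {
    "model": "gpt-4o-2024-11-20",
    "toc_check_page_num": 20,
    "max_page_num_each_node": 10,
    "max_token_num_each_node": 20000,
    "if_add_node_id": "yes",
    "if_add_node_summary": "yes",
    "if_add_doc_description": "no",
    "if_add_node_text": "no",
}


def merge_config(user_opt=None, defaults=DEFAULTS):
    """Hypothetical stand-in for load(): merge user options over defaults."""
    if user_opt is None:
        user_dict = {}
    elif isinstance(user_opt, SimpleNamespace):
        user_dict = vars(user_opt)
    elif isinstance(user_opt, dict):
        user_dict = user_opt
    else:
        raise TypeError("user_opt must be None, a dict, or a SimpleNamespace")

    # Reject keys that have no default, mirroring the documented ValueError.
    unknown = set(user_dict) - set(defaults)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")

    # User-supplied values take precedence over defaults.
    return SimpleNamespace(**{**defaults, **user_dict})


opt = merge_config({"model": "gpt-4o", "if_add_node_text": "yes"})
print(opt.model, opt.toc_check_page_num, opt.if_add_node_text)
```

In this sketch, a misspelled key such as `{"modle": "gpt-4o"}` raises ValueError, and passing something like an integer raises TypeError, matching the two error conditions documented for load().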
Default Configuration
The default config.yaml contains:
pageindex/config.yaml:1-8
Configuration Parameters
- model: OpenAI model for processing. Supported models: "gpt-4o-2024-11-20" (recommended), "gpt-4o", "gpt-4.1", and other OpenAI chat models.
- toc_check_page_num: Number of pages to scan for a table of contents. Increase this for documents whose TOC appears later.
- max_page_num_each_node: Maximum pages per node. Larger nodes are recursively subdivided.
- max_token_num_each_node: Maximum token count per node. Used with max_page_num_each_node to trigger subdivision.
- if_add_node_id: Add sequential node IDs ("0001", "0002", etc.). Values: "yes" or "no".
- if_add_node_summary: Generate AI summaries for each node. Values: "yes" or "no".
- if_add_doc_description: Generate a one-sentence document description. Values: "yes" or "no". Only works if if_add_node_summary="yes".
- if_add_node_text: Include full text content in each node. Values: "yes" or "no". Including text increases memory usage and output file size significantly.
Example Usage
Basic Usage - All Defaults
Partial Override
Custom Config File
Validation
Accessing Configuration
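Since the loaded configuration exposes its settings as attributes (the load() description names SimpleNamespace), access can be illustrated with a hand-built object. This is a minimal sketch with placeholder values; the `is_enabled` helper is hypothetical and not part of PageIndex.

```python
from types import SimpleNamespace

# Placeholder object standing in for the result of load().
opt = SimpleNamespace(
    model="gpt-4o-2024-11-20",
    max_page_num_each_node=10,  # placeholder value
    if_add_node_summary="yes",
    if_add_node_text="no",
)

# Settings are plain attributes.
print(opt.model)

# vars() returns the underlying dict, handy for logging or serialization.
as_dict = vars(opt)
print(sorted(as_dict))


def is_enabled(config, flag):
    """Hypothetical helper: read a "yes"/"no" string flag as a bool."""
    return getattr(config, flag, "no") == "yes"


print(is_enabled(opt, "if_add_node_summary"))
print(is_enabled(opt, "if_add_node_text"))
```

Because the feature flags are the strings "yes" and "no" rather than booleans, a bare truthiness check such as `if opt.if_add_node_text:` would succeed for both values; comparing against "yes" avoids that trap.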
Creating Config from Scratch
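Because load() accepts an existing config object, one can also be assembled by hand. The sketch below assumes that types.SimpleNamespace is the config type, as the load() description indicates; the `EXPECTED_KEYS` check and the numeric values are illustrative assumptions, not shipped defaults.

```python
from types import SimpleNamespace

# The eight settings documented for the configuration object.
EXPECTED_KEYS = {
    "model",
    "toc_check_page_num",
    "max_page_num_each_node",
    "max_token_num_each_node",
    "if_add_node_id",
    "if_add_node_summary",
    "if_add_doc_description",
    "if_add_node_text",
}

# Placeholder values chosen for illustration.
settings = {
    "model": "gpt-4o",
    "toc_check_page_num": 30,
    "max_page_num_each_node": 8,
    "max_token_num_each_node": 16000,
    "if_add_node_id": "yes",
    "if_add_node_summary": "no",
    "if_add_doc_description": "no",
    "if_add_node_text": "no",
}

# Check completeness before building the object, so a missing or
# misspelled key fails here rather than deep inside processing.
missing = EXPECTED_KEYS - set(settings)
extra = set(settings) - EXPECTED_KEYS
if missing or extra:
    raise ValueError(f"missing={sorted(missing)} extra={sorted(extra)}")

opt = SimpleNamespace(**settings)
print(opt)
```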
Creating Custom config.yaml
You can create your own configuration file and pass its path as default_path when constructing ConfigLoader.
Performance Recommendations
Fast Processing (Structure Only)
Balanced (With Summaries)
Full Features (Slowest)
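The three profiles above can be expressed as override dictionaries of the kind load() accepts. This is a sketch under assumptions: the profile names, the `PROFILES` mapping, and the exact flag combinations are inferred from the headings rather than taken from PageIndex itself.

```python
# Hypothetical override dictionaries, one per profile named above.
PROFILES = {
    # Structure only: skip every optional generation step.
    "fast": {
        "if_add_node_summary": "no",
        "if_add_doc_description": "no",
        "if_add_node_text": "no",
    },
    # Summaries on, but no document description and no embedded text.
    "balanced": {
        "if_add_node_summary": "yes",
        "if_add_doc_description": "no",
        "if_add_node_text": "no",
    },
    # Everything on; the description depends on summaries being enabled.
    "full": {
        "if_add_node_summary": "yes",
        "if_add_doc_description": "yes",
        "if_add_node_text": "yes",
    },
}


def overrides_for(profile):
    """Return a copy of the override dict for a named profile."""
    if profile not in PROFILES:
        raise KeyError(f"Unknown profile: {profile!r}")
    return dict(PROFILES[profile])


print(overrides_for("balanced"))
```

Each dictionary touches only the feature flags, so the model and the page and token limits would still come from the defaults when the overrides are merged.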
Error Handling
See Also
- page_index() - Uses ConfigLoader internally
- page_index_main() - Accepts config object
- CLI Reference - Command-line configuration options