Prerequisites
Before installing PageIndex, ensure you have:
- Python 3.8+ installed on your system
- pip or pip3 package manager
- An OpenAI API key with access to GPT-4o models
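The first two prerequisites can be checked from Python itself. The following is a minimal sketch using only the standard library; the function name `check_prerequisites` is made up for illustration and is not part of PageIndex. It cannot confirm that an API key has GPT-4o access, which requires a real API call.

```python
import importlib.util
import sys


def check_prerequisites():
    """Return a dict mapping each locally checkable prerequisite to a bool."""
    return {
        # The guide asks for Python 3.8 or newer.
        "python_3_8_plus": sys.version_info >= (3, 8),
        # pip counts as available if its package can be located.
        "pip_available": importlib.util.find_spec("pip") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```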
PageIndex is designed for self-hosting and local deployment. For cloud-based solutions, see the Chat Platform or API.
Install from Source
Install Dependencies
Install the required Python packages from requirements.txt, for example with pip install -r requirements.txt. This installs the dependencies listed under Package Dependencies.
Configure Environment Variables
Create a .env file in the root directory and add your OpenAI API key.
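To make the .env format concrete, here is a minimal sketch of how KEY=VALUE lines can be read into the process environment using only the standard library. In practice the python-dotenv package listed under Package Dependencies handles this through `load_dotenv()`. The variable name `OPENAI_API_KEY` is an assumption used for illustration; check the project's own example for the name PageIndex expects.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_env_file(path=".env", environ=None):
    """Copy variables from a .env file into environ without overwriting."""
    environ = os.environ if environ is None else environ
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            environ.setdefault(key, value)
    return environ


# Hypothetical file contents; the variable name is an assumption.
sample = "# PageIndex settings\nOPENAI_API_KEY=sk-your-key-here\n"
print(parse_env(sample))
```

Existing environment variables take precedence in this sketch, which matches the default behavior of `load_dotenv()`.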
Package Dependencies
PageIndex relies on several key Python packages:
OpenAI
Version: 1.101.0
Official OpenAI Python client for GPT-4o API access. Used for LLM-powered reasoning and tree generation.
PyMuPDF
Version: 1.26.4
High-performance PDF parsing library. Extracts text content and page structure from PDF documents.
PyPDF2
Version: 3.0.1
Additional PDF utilities for metadata extraction and document manipulation.
python-dotenv
Version: 1.1.0
Loads environment variables from .env files for secure API key management.
tiktoken
Version: 0.11.0
OpenAI’s token counting library. Ensures nodes stay within token limits for optimal LLM processing.
PyYAML
Version: 6.0.2
Configuration file parser for loading user settings and default parameters.
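After installing, the versions above can be compared with what is actually present in the environment. This is a small sketch using the standard library's `importlib.metadata`; the helper name `compare_versions` is illustrative, and the distribution names are assumed to match the package names as published on PyPI.

```python
from importlib import metadata

# Pinned versions as listed in this guide.
PINNED = {
    "openai": "1.101.0",
    "PyMuPDF": "1.26.4",
    "PyPDF2": "3.0.1",
    "python-dotenv": "1.1.0",
    "tiktoken": "0.11.0",
    "PyYAML": "6.0.2",
}


def compare_versions(pinned):
    """Map each distribution name to (expected, installed or None)."""
    report = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the distribution is not installed
        report[name] = (expected, installed)
    return report


if __name__ == "__main__":
    for name, (expected, installed) in compare_versions(PINNED).items():
        status = "ok" if installed == expected else "check"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```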
Python Module Structure
After installation, you can import PageIndex in your Python code. The functionality is exposed through run_pageindex.py, which provides:
- PDF Processing: page_index_main() - Generate tree from PDF
- Markdown Processing: md_to_tree() - Generate tree from markdown
- Configuration: config() - Customize tree generation parameters
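A quick way to confirm the installation exposes these functions is to import the module and look for the names. The sketch below assumes the three names are reachable from a top-level module called `pageindex`; that location is an assumption, and the helper `missing_names` is illustrative only. It deliberately does not call the functions, since their signatures are not given in this guide.

```python
import importlib


def missing_names(module_name, expected):
    """Return the expected names a module lacks, or None if import fails."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return [name for name in expected if not hasattr(module, name)]


# Names taken from the list above; whether they live in the top-level
# package rather than a submodule is an assumption.
EXPECTED = ["page_index_main", "md_to_tree", "config"]

if __name__ == "__main__":
    result = missing_names("pageindex", EXPECTED)
    if result is None:
        print("pageindex could not be imported")
    elif result:
        print("not found at top level:", ", ".join(result))
    else:
        print("all expected names are available")
```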
Alternative Installation Methods
Virtual Environment (Recommended)
For isolated package management, use a virtual environment, such as one created with Python's built-in venv module.
Docker (Coming Soon)
Dockerized deployment will be available in a future release. For now, use the source installation method.
Troubleshooting
ImportError: No module named 'pageindex'
Make sure you’re running commands from the PageIndex root directory and that all dependencies from requirements.txt are installed.
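If it is unclear whether the current directory is the project root, the check can be scripted. The sketch below walks upward from a starting directory looking for a folder named `pageindex` and, if found, puts its parent on `sys.path`. The assumption that the package lives in a `pageindex/` folder at the repository root is inferred from the error message, and both helper names are illustrative.

```python
import sys
from pathlib import Path


def find_package_root(start, package="pageindex"):
    """Walk upward from start; return the first directory holding package/."""
    start = Path(start).resolve()
    for directory in [start, *start.parents]:
        if (directory / package).is_dir():
            return directory
    return None


def ensure_importable(start=".", package="pageindex"):
    """Prepend the package root to sys.path if it is found and not listed."""
    root = find_package_root(start, package)
    if root is not None and str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


if __name__ == "__main__":
    root = ensure_importable()
    print("project root:", root if root else "not found from here")
```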
OpenAI API Error: Authentication Failed
Verify your API key is correctly set in the .env file. Ensure there are no extra spaces or quotes around the key.
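The two formatting problems mentioned here are easy to test for mechanically. This is a minimal sketch that inspects the raw text after the `=` sign on a .env line; the function name `key_problems` is illustrative, and it says nothing about whether the key itself is valid.

```python
def key_problems(raw_value):
    """List formatting problems in a raw value read from a .env line."""
    problems = []
    if raw_value != raw_value.strip():
        problems.append("leading or trailing whitespace")
    stripped = raw_value.strip()
    if not stripped:
        problems.append("empty value")
    elif stripped[0] in "'\"" or stripped[-1] in "'\"":
        problems.append("surrounding quotes")
    return problems


print(key_problems("sk-your-key-here"))      # []
print(key_problems(' "sk-your-key-here" '))  # whitespace and quotes
```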
PDF Parsing Errors
Some complex PDFs may have parsing issues. Try:
- Ensure the PDF is not password-protected
- Check that the PDF contains extractable text (not just images)
- For scanned documents, consider using PageIndex OCR
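The first two checks can be sketched with PyMuPDF, which is already among the dependencies. The helper names and the 20-character threshold below are arbitrary choices for illustration, not PageIndex behavior. PyMuPDF is imported inside `diagnose_pdf` so that the pure helper can be used without it.

```python
def has_extractable_text(page_texts, min_chars=20):
    """True if any page yields at least min_chars of non-whitespace text."""
    return any(len("".join(text.split())) >= min_chars for text in page_texts)


def diagnose_pdf(path):
    """Return 'password-protected', 'no-text-layer', or 'ok' for a PDF."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no deps

    with fitz.open(path) as doc:
        if doc.needs_pass:
            # Pages cannot be read until the document is authenticated.
            return "password-protected"
        texts = [page.get_text() for page in doc]
    return "ok" if has_extractable_text(texts) else "no-text-layer"
```

A result of 'no-text-layer' suggests a scanned document, which is the case the OCR suggestion above is meant for.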
Token Limit Exceeded
If nodes exceed token limits, adjust the tree generation parameters, for example through config().
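To build intuition for what a per-node token limit does, the sketch below packs consecutive pages into nodes under a budget. It is an illustration only and not PageIndex's actual algorithm; the parameter name `max_tokens_per_node` and the page token counts are hypothetical.

```python
def group_pages(page_tokens, max_tokens_per_node):
    """Greedily pack consecutive pages into nodes under a token budget."""
    nodes, current, used = [], [], 0
    for page, tokens in enumerate(page_tokens, start=1):
        # Start a new node when adding this page would exceed the budget.
        if current and used + tokens > max_tokens_per_node:
            nodes.append(current)
            current, used = [], 0
        current.append(page)
        used += tokens
    if current:
        nodes.append(current)
    return nodes


pages = [900, 1200, 400, 3000, 800]  # hypothetical token counts per page
print(group_pages(pages, max_tokens_per_node=2500))  # [[1, 2, 3], [4], [5]]
print(group_pages(pages, max_tokens_per_node=5000))  # [[1, 2, 3], [4, 5]]
```

A smaller budget produces more, smaller nodes. Note that page 4 alone exceeds the 2500-token budget in the first call, so a single oversized page can still go over the limit.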
Next Steps
Quick Start Guide
Generate your first PageIndex tree structure
API Reference
Explore configuration options and Python API