Overview
Parses Shamela HTML content into structured lines while preserving headings. This is the primary function for processing raw Shamela page content into a format that preserves title hierarchy and Arabic punctuation.
Signature
Parameters
The raw HTML markup representing a page
Returns
An array of Line objects, each containing the line text and an optional ID (present on lines that come from title spans)
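The exact Line definition is not reproduced on this page. As a rough sketch, assuming a shape with a text string and an optional id (both field names are assumptions), a caller can use the optional ID to pick the title lines out of the result. The helper collectTitles is hypothetical and only illustrates how the ID might be consumed.

```typescript
// Hypothetical shape of a returned line; the field names are assumptions.
interface Line {
  text: string; // the line's text
  id?: string; // set only when the line came from a title span (assumed)
}

// Builds a simple outline from the title lines, i.e. those that carry an id.
function collectTitles(lines: Line[]): { id: string; text: string }[] {
  return lines.flatMap((line) =>
    line.id === undefined ? [] : [{ id: line.id, text: line.text }],
  );
}

const outline = collectTitles([
  { text: "باب الطهارة", id: "t1" },
  { text: "نص الباب" },
]);
// outline: [{ id: "t1", text: "باب الطهارة" }]
```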
Behavior
- Normalizes line endings to Unix-style (\n) before processing
- Takes a fast path when no <span> tags are present
- Preserves title hierarchy from <span data-type="title" id="..."> elements
- Merges punctuation-only lines into the preceding title
- Handles nested spans and maintains title context across line breaks
- Filters out empty lines from the result
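The punctuation-merging rule can be illustrated in isolation. The sketch below is not the library's code: it assumes a Line shape of { text, id? }, treats a line as a title when it has an id, and uses a hand-picked set of Latin and Arabic punctuation marks (including ، ؛ ؟). The function name mergePunctuationIntoTitles is hypothetical.

```typescript
// Hypothetical line shape; field names are assumptions, not the library's definition.
type Line = { text: string; id?: string };

// Whitespace plus common Latin and Arabic punctuation (assumed set, for illustration only).
const PUNCTUATION_ONLY = /^[\s.,:;!?()«»"'،؛؟…-]+$/u;

// Appends punctuation-only lines to the preceding title line and drops empty lines.
function mergePunctuationIntoTitles(lines: Line[]): Line[] {
  const result: Line[] = [];
  for (const line of lines) {
    const previous = result[result.length - 1];
    if (line.id === undefined && previous?.id !== undefined && PUNCTUATION_ONLY.test(line.text)) {
      previous.text += line.text.trim(); // e.g. a dangling ":" joins its heading
      continue;
    }
    if (line.text.trim() !== "") result.push({ ...line }); // copy so the input is not mutated
  }
  return result;
}

const merged = mergePunctuationIntoTitles([
  { text: "باب الطهارة", id: "t1" },
  { text: ":" },
  { text: "" },
  { text: "نص الباب" },
]);
// merged: [{ text: "باب الطهارة:", id: "t1" }, { text: "نص الباب" }]
```

A punctuation-only line that follows a body line (no id) is kept as its own line in this sketch; only titles absorb dangling punctuation.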
Example
Processing Pipeline
- Normalize line endings: convert all line endings to \n
- Fast path check: skip tokenization if no spans are present
- Tokenize HTML: break the HTML into structural tokens
- Process tokens: extract text and title metadata
- Merge punctuation: combine dangling punctuation with titles
- Filter empties: remove empty lines
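The six pipeline steps can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the name parseLinesSketch, the Line shape ({ text, id? }), the token types, and the punctuation set are all hypothetical. The input is assumed to have already had non-span tags stripped (the job removeTagsExceptSpan() is described as doing), and HTML entities are not decoded.

```typescript
// Minimal pipeline sketch. Names, the Line shape, and the punctuation set are assumptions.
type Line = { text: string; id?: string };

type Token =
  | { kind: "text"; value: string }
  | { kind: "open"; isTitle: boolean; id?: string }
  | { kind: "close" };

// Step 3: split the markup into <span> open/close tokens and the text between them.
function tokenize(html: string): Token[] {
  const tokens: Token[] = [];
  let last = 0;
  for (const match of html.matchAll(/<(\/?)span\b([^>]*)>/gi)) {
    const index = match.index ?? 0;
    if (index > last) tokens.push({ kind: "text", value: html.slice(last, index) });
    if (match[1] === "/") {
      tokens.push({ kind: "close" });
    } else {
      const attrs = match[2] ?? "";
      const id = /(?:^|\s)id="([^"]*)"/.exec(attrs)?.[1];
      tokens.push({ kind: "open", isTitle: /data-type="title"/.test(attrs), id });
    }
    last = index + match[0].length;
  }
  if (last < html.length) tokens.push({ kind: "text", value: html.slice(last) });
  return tokens;
}

const PUNCTUATION_ONLY = /^[.,:;!?،؛؟]+$/u; // assumed set

function parseLinesSketch(html: string): Line[] {
  // Step 1: normalize \r\n and \r to \n.
  const input = html.replace(/\r\n?/g, "\n");

  // Step 2: fast path. Without spans there are no titles, so split, trim, and drop empties.
  if (!/<span\b/i.test(input)) {
    return input
      .split("\n")
      .map((text) => ({ text: text.trim() }))
      .filter((line) => line.text !== "");
  }

  // Step 4: walk the tokens, keeping the active title id on a stack so that
  // nested spans and line breaks inside a title keep the title context.
  const raw: Line[] = [];
  const titleStack: (string | undefined)[] = [];
  let current: Line = { text: "" };
  for (const token of tokenize(input)) {
    if (token.kind === "open") {
      titleStack.push(token.isTitle ? token.id : titleStack[titleStack.length - 1]);
    } else if (token.kind === "close") {
      titleStack.pop();
    } else {
      token.value.split("\n").forEach((part, i) => {
        if (i > 0) {
          raw.push(current); // a newline ends the current line
          current = { text: "" };
        }
        current.text += part;
        const titleId = titleStack[titleStack.length - 1];
        if (titleId !== undefined && part.trim() !== "") current.id = titleId;
      });
    }
  }
  raw.push(current);

  // Steps 5 and 6: fold punctuation-only lines into a preceding title, then drop empty lines.
  const result: Line[] = [];
  for (const line of raw) {
    const text = line.text.trim();
    const previous = result[result.length - 1];
    if (line.id === undefined && previous?.id !== undefined && PUNCTUATION_ONLY.test(text)) {
      previous.text += text;
    } else if (text !== "") {
      result.push(line.id === undefined ? { text } : { text, id: line.id });
    }
  }
  return result;
}
```

For example, parseLinesSketch('<span data-type="title" id="t1">باب الطهارة</span>\r\n:\r\nنص الباب') yields a title line "باب الطهارة:" with id "t1" followed by the body line "نص الباب". Keeping a stack of title ids, rather than a single flag, is what lets a non-title span nested inside a title inherit the title's id in this sketch.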
Related Functions
- removeTagsExceptSpan(): strip all tags except spans before parsing
- normalizeLineEndings(): normalize line endings
- convertContentToMarkdown(): full pipeline that includes this function
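For orientation, the first two related helpers might look roughly like this. These are hypothetical stand-ins built on simple regular expressions (normalizeLineEndingsSketch and stripTagsExceptSpanSketch are not library names); the library's own normalizeLineEndings() and removeTagsExceptSpan() may handle more cases.

```typescript
// Hypothetical stand-ins for the related helpers; not the library's implementations.

// Converts Windows (\r\n) and old Mac (\r) line endings to Unix-style \n.
function normalizeLineEndingsSketch(text: string): string {
  return text.replace(/\r\n?/g, "\n");
}

// Removes every HTML tag except <span ...> and </span>, keeping the text between tags.
function stripTagsExceptSpanSketch(html: string): string {
  return html.replace(/<(?!\/?span\b)[^>]*>/gi, "");
}

const cleaned = stripTagsExceptSpanSketch(
  normalizeLineEndingsSketch('<p><span data-type="title" id="t1">باب</span>\r\nنص</p>'),
);
// cleaned: '<span data-type="title" id="t1">باب</span>\nنص'
```

Being regex-based, this sketch drops a <br> tag rather than turning it into a newline, and a ">" inside an attribute value would end the tag match early.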