Overview

Sanitizes page content by applying regex-based replacement rules tuned for Shamela sources. This function normalizes Arabic text and removes common artifacts from the Shamela export format.

Signature

mapPageCharacterContent(
  text: string,
  rules?: Record<string, string>
): string

Parameters

text: string (required)
The text to clean.

rules: Record<string, string> (optional)
Custom replacement rules as regex pattern/replacement pairs: each key is a regex pattern and each value is the replacement for its matches. Defaults to DEFAULT_MAPPING_RULES, which includes:
  • Footnote marker removal
  • Arabic character normalization
  • Whitespace cleanup
  • Diacritical mark processing

Returns

string
The sanitized content with all rules applied.

Example

import { mapPageCharacterContent } from 'shamela';

const rawContent = 'النص مع الحواشي[١] والرموز';
const cleaned = mapPageCharacterContent(rawContent);
console.log(cleaned); // Normalized text

Custom Rules

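You can extend the default rules with custom mappings:

import { mapPageCharacterContent } from 'shamela/content';
import { DEFAULT_MAPPING_RULES } from 'shamela/constants';

// Keys are regex patterns, values are their replacements.
// Spreading the defaults first keeps them and adds the custom entries on top.
const customRules = {
  ...DEFAULT_MAPPING_RULES,
  'pattern1': 'replacement1',
  'pattern2': 'replacement2',
};

const rawContent = 'النص مع الحواشي[١] والرموز';
const processed = mapPageCharacterContent(rawContent, customRules);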
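
To make the pattern/replacement format more concrete, here is a minimal standalone sketch of how a Record<string, string> of regex rules could be applied to a string. It does not use the shamela package: applyRulesSketch and exampleRules are hypothetical names, the two patterns are illustrative and are not taken from DEFAULT_MAPPING_RULES, and applying rules in key order with the global flag is an assumption rather than the library's documented behavior.

```typescript
// Hypothetical rules for illustration only (not the library's defaults).
const exampleRules: Record<string, string> = {
  // Remove bracketed Arabic-Indic digit markers such as [١]
  '\\[[\\u0660-\\u0669]+\\]': '',
  // Collapse runs of whitespace into a single space
  '\\s+': ' ',
};

// Apply each pattern/replacement pair in insertion order.
// Assumption: every pattern is treated as a global regex.
function applyRulesSketch(text: string, rules: Record<string, string>): string {
  let result = text;
  for (const [pattern, replacement] of Object.entries(rules)) {
    result = result.replace(new RegExp(pattern, 'g'), replacement);
  }
  return result;
}

console.log(applyRulesSketch('النص مع الحواشي[١] والرموز', exampleRules));
// → 'النص مع الحواشي والرموز'
```

Because the rules are plain strings, backslashes in a pattern have to be doubled inside a string literal, as in '\\s+' above.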

Performance

  • Rules are compiled into RegExp objects and cached for reuse
  • Default rules are pre-compiled for optimal performance
  • Custom rules are compiled on first use
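
The caching behavior described above can be sketched as follows. This is an illustration under stated assumptions, not the library's actual implementation: regexCache, getCompiledRule, and applyCachedRules are hypothetical names, and keying the cache by the pattern string is a guess at one reasonable design.

```typescript
// Hypothetical cache: pattern string -> compiled RegExp.
const regexCache = new Map<string, RegExp>();

// Compile a pattern on first use, then reuse the same RegExp object.
function getCompiledRule(pattern: string): RegExp {
  let compiled = regexCache.get(pattern);
  if (compiled === undefined) {
    compiled = new RegExp(pattern, 'g');
    regexCache.set(pattern, compiled);
  }
  return compiled;
}

// Apply rules using cached regexes. String.prototype.replace resets
// lastIndex on global regexes, so sharing them across calls is safe.
function applyCachedRules(text: string, rules: Record<string, string>): string {
  let result = text;
  for (const [pattern, replacement] of Object.entries(rules)) {
    result = result.replace(getCompiledRule(pattern), replacement);
  }
  return result;
}

const whitespaceRule = { '\\s+': ' ' };
console.log(applyCachedRules('first   call', whitespaceRule));  // compiles '\\s+'
console.log(applyCachedRules('second   call', whitespaceRule)); // reuses the cached RegExp
console.log(getCompiledRule('\\s+') === getCompiledRule('\\s+')); // true
```

With a cache like this, the cost of compiling a custom rule is paid once per distinct pattern rather than once per page.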
