Documents
PDF Files
.pdf - Portable Document Format files with text extraction
Word Documents
.docx - Microsoft Word documents (Office Open XML)
Text Files
.txt - Plain text files
Rich Text
.rtf - Rich Text Format documents
Presentations & Spreadsheets
PowerPoint
.pptx - PowerPoint presentations
Excel
.xlsx - Excel spreadsheet files
CSV
.csv - Comma-separated values
Web & Data Formats
HTML
.html - HTML web pages
JSON
.json - JSON data files
XML
.xml - XML documents
Images & Media
Images
PNG, JPG formats with OCR text extraction
Audio
.wav, .mp3 - Audio files for transcription
File Size Limitations
src/App.tsx
Format Categories
The UI (src/components/SupportedFormats/SupportedFormats.tsx) organizes formats into four main categories: Documents, Presentations & Spreadsheets, Web & Data Formats, and Images & Media.
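As a rough illustration, the four categories could be modeled as a lookup table keyed by category name. This is only a sketch: `FORMAT_CATEGORIES` and `categoryFor` are hypothetical names, the `.png` and `.jpg` extensions are assumed from the "PNG, JPG" entry, and the actual component may be structured differently.

```typescript
// Hypothetical sketch; not taken from SupportedFormats.tsx.
type FormatCategory =
  | "Documents"
  | "Presentations & Spreadsheets"
  | "Web & Data Formats"
  | "Images & Media";

// Extensions as listed in this page (".png"/".jpg" assumed for "PNG, JPG").
const FORMAT_CATEGORIES: Record<FormatCategory, readonly string[]> = {
  "Documents": [".pdf", ".docx", ".txt", ".rtf"],
  "Presentations & Spreadsheets": [".pptx", ".xlsx", ".csv"],
  "Web & Data Formats": [".html", ".json", ".xml"],
  "Images & Media": [".png", ".jpg", ".wav", ".mp3"],
};

// Return the category for a file name, or undefined when the
// extension is missing or not listed above. Matching is case-insensitive.
function categoryFor(fileName: string): FormatCategory | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = fileName.slice(dot).toLowerCase();
  const categories = Object.keys(FORMAT_CATEGORIES) as FormatCategory[];
  for (const category of categories) {
    if (FORMAT_CATEGORIES[category].indexOf(ext) !== -1) return category;
  }
  return undefined;
}
```

A lookup like this could also be used to reject unsupported files before they are uploaded, for example by checking that `categoryFor(file.name)` is not `undefined`.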
Backend Requirements
MkDowner relies on Microsoft MarkItDown for conversion. The backend must have MarkItDown installed and properly configured. The conversion engine handles format detection automatically.
API Endpoint
The backend exposes a single upload endpoint.
Response Format
- Single file: Returns a .md file directly
- Multiple files: Returns a .zip archive containing all converted Markdown files
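The response rule above can be sketched on the client side as a small helper that predicts what the download will be. Everything here is an assumption for illustration: `expectedDownload` is a hypothetical function, the single-file name is derived by swapping the extension for `.md`, and `converted.zip` is a placeholder because the archive name is not given in this page.

```typescript
// Hypothetical helper; the real client code may differ.
interface ExpectedDownload {
  kind: "markdown" | "zip";
  fileName: string;
}

function expectedDownload(uploadedNames: string[]): ExpectedDownload {
  if (uploadedNames.length === 0) {
    throw new Error("At least one file is required");
  }
  if (uploadedNames.length === 1) {
    // Single file: the backend returns one .md file directly.
    const name = uploadedNames[0];
    const dot = name.lastIndexOf(".");
    const stem = dot > 0 ? name.slice(0, dot) : name;
    return { kind: "markdown", fileName: stem + ".md" };
  }
  // Multiple files: the backend returns a .zip of Markdown files.
  // "converted.zip" is a placeholder name, not taken from the source.
  return { kind: "zip", fileName: "converted.zip" };
}
```

Branching on the number of uploaded files keeps the download logic independent of the response headers, though a real client would likely also check the response's content type.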
Format-Specific Notes
PDF Files
Text is extracted from PDFs. Image-based PDFs may require OCR capabilities on the backend for best results.
Office Documents
DOCX, PPTX, and XLSX files are parsed for structure and content. Formatting is converted to Markdown equivalents where possible.
Images
PNG and JPG images can have text extracted via OCR. This feature requires additional backend configuration.
Audio Files
WAV and MP3 files can be transcribed to text. Transcription capabilities depend on backend setup.
Web Formats
HTML is converted to clean Markdown. JSON and XML are formatted as structured text.
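Since image OCR and audio transcription depend on optional backend setup, a client might want to flag those uploads before sending them. The following is a minimal sketch under that assumption: `EXTRA_CAPABILITY` and `requiredCapability` are hypothetical names, and image-based PDFs are not covered because they cannot be identified from the file name alone.

```typescript
// Hypothetical sketch; capability names are illustrative only.
type BackendCapability = "ocr" | "transcription";

// Extensions whose conversion relies on optional backend features,
// per the format-specific notes in this page.
const EXTRA_CAPABILITY: Record<string, BackendCapability> = {
  ".png": "ocr",
  ".jpg": "ocr",
  ".wav": "transcription",
  ".mp3": "transcription",
};

// Return the optional capability a file needs, or undefined when the
// format is handled without extra backend configuration.
function requiredCapability(fileName: string): BackendCapability | undefined {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return undefined;
  return EXTRA_CAPABILITY[fileName.slice(dot).toLowerCase()];
}
```

A UI could use the result to show a notice such as "requires OCR on the backend" next to the affected files.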
AI-Enhanced Conversion
AI-enhanced conversion enables:
- Intelligent table detection and conversion
- Header hierarchy preservation
- List structure recognition
- Code block formatting
- Link and reference handling