Overview
The Html reader class loads HTML files containing tables and converts them into spreadsheet format. This is useful for importing data from HTML reports, web pages, or HTML-formatted data exports.
Namespace: PhpOffice\PhpSpreadsheet\Reader\Html
Extends: BaseReader
Implements: IReader
Source: src/PhpSpreadsheet/Reader/Html.php:32
Basic Usage
Simple File Loading
use PhpOffice\PhpSpreadsheet\Reader\Html;
$reader = new Html();
$spreadsheet = $reader->load('data.html');
// Access worksheet data
$sheet = $spreadsheet->getActiveSheet();
$data = $sheet->toArray();
Using IOFactory
use PhpOffice\PhpSpreadsheet\IOFactory;
// Auto-detect and load
$spreadsheet = IOFactory::load('data.html');
// Or create specific reader
$reader = IOFactory::createReader('Html');
$spreadsheet = $reader->load('data.html');
Key Methods
__construct()
Creates a new Html reader instance.
public function __construct();
Example:
$reader = new Html();
canRead()
Checks if the file can be read by this reader.
public function canRead(string $filename): bool;
$filename: Path to the file to check
Returns: bool - True if the file appears to be HTML
Example:
$reader = new Html();
if ($reader->canRead('data.html')) {
    $spreadsheet = $reader->load('data.html');
}
load()
Loads a spreadsheet from an HTML file.
public function load(string $filename, int $flags = 0): Spreadsheet;
$filename: Path to the HTML file to load
$flags: Optional flags (limited support for HTML format)
Returns: Spreadsheet object
Example:
$reader = new Html();
$spreadsheet = $reader->load('data.html');
HTML-Specific Configuration
setInputEncoding()
Sets the input character encoding for the HTML file.
public function setInputEncoding(string $encoding): self;
$encoding: Character encoding (e.g., 'UTF-8', 'ANSI', 'ISO-8859-1')
Example:
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$spreadsheet = $reader->load('data.html');
setSheetIndex()
Sets which worksheet index to use when loading (for multiple tables).
public function setSheetIndex(int $sheetIndex): self;
$sheetIndex: The 0-based worksheet index
Example:
$reader = new Html();
$reader->setSheetIndex(0);
$spreadsheet = $reader->load('data.html');
setSuppressLoadWarnings()
Controls whether to suppress libxml load warnings.
public function setSuppressLoadWarnings(?bool $suppressLoadWarnings): self;
$suppressLoadWarnings: True to suppress warnings, false to show them, null for default behavior
Example:
$reader = new Html();
$reader->setSuppressLoadWarnings(true);
$spreadsheet = $reader->load('data.html');
// Inspect any libxml messages collected during the load
$warnings = $reader->getLibxmlMessages();
foreach ($warnings as $warning) {
    echo $warning->message;
}
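As a rough illustration of what "libxml load warnings" are, the sketch below parses a sloppy HTML fragment with PHP's own DOMDocument and collects the messages libxml reports, instead of letting them surface as PHP warnings. It is not the reader's implementation. The function name collectLibxmlMessages and the sample markup are assumptions made for this example, and which messages appear (if any) depends on the installed libxml2 version.

```php
<?php
declare(strict_types=1);

/**
 * Parse an HTML string and return the libxml messages raised while parsing.
 * Hypothetical helper for illustration; not part of PhpSpreadsheet.
 *
 * @return string[]
 */
function collectLibxmlMessages(string $html): array
{
    // Buffer libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = sprintf('line %d: %s', $error->line, trim($error->message));
    }

    // Restore the previous error-handling mode so other code is unaffected.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $messages;
}

// <bogus> is not a known HTML element, so some libxml versions report it.
$messages = collectLibxmlMessages('<table><tr><td>Value</td></tr></table><bogus>');
foreach ($messages as $message) {
    echo $message, PHP_EOL;
}
```

Each entry returned by libxml_get_errors() is a LibXMLError object, which is why the reader example above reads the message property of every warning.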
Supported HTML Features
The Html reader recognizes and converts the following HTML elements:
Table Structure
<table> - Converted to worksheet
<tr> - Converted to row
<td> - Converted to cell
<th> - Converted to cell (typically bold)
<thead>, <tbody>, <tfoot> - Structural elements
Text Formatting
<b>, <strong> - Bold text
<i>, <em> - Italic text
<u> - Underlined text
<s>, <strike> - Strikethrough text
<sup> - Superscript
<sub> - Subscript
<h1> to <h6> - Headers with different font sizes
<a> - Hyperlinks (blue, underlined)
<hr> - Horizontal rule (bottom border)
Table Attributes
colspan - Cell spanning multiple columns
rowspan - Cell spanning multiple rows
width - Column width
height - Row height
Style Attributes
The reader parses inline CSS styles:
font-family - Font name
font-size - Font size
font-weight - Bold text
font-style - Italic text
text-decoration - Underline, strikethrough
color - Text color
background-color - Cell background color
border - Cell borders
text-align - Horizontal alignment
vertical-align - Vertical alignment
width - Column width
height - Row height
Simple HTML Table
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <title>Sales Report</title>
</head>
<body>
    <table>
        <thead>
            <tr>
                <th>Product</th>
                <th>Quantity</th>
                <th>Price</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Widget</td>
                <td>100</td>
                <td>$10.00</td>
            </tr>
            <tr>
                <td>Gadget</td>
                <td>50</td>
                <td>$20.00</td>
            </tr>
        </tbody>
    </table>
</body>
</html>
$reader = new Html();
$spreadsheet = $reader->load('report.html');
HTML with Inline Styles
<table style="border: 1px solid black;">
    <tr>
        <td style="font-weight: bold; background-color: #cccccc;">Header</td>
        <td style="color: red;">Value</td>
    </tr>
    <tr>
        <td style="text-align: center;">Center</td>
        <td style="font-style: italic;">Italic</td>
    </tr>
</table>
$reader = new Html();
$spreadsheet = $reader->load('styled.html');
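To check which inline styles actually survived the import, you can read them back from the loaded cells. The sketch below is a minimal example under stated assumptions: PhpSpreadsheet is installed through Composer with the autoloader at vendor/autoload.php, and the scratch file name styled-example.html is hypothetical. It prints the values without asserting what they should be, because the result depends on how the installed version maps each CSS property.

```php
<?php
declare(strict_types=1);

// Assumption: PhpSpreadsheet is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Html;

$html = <<<'HTML'
<html><body>
<table>
  <tr>
    <td style="font-weight: bold; background-color: #cccccc;">Header</td>
    <td style="color: red;">Value</td>
  </tr>
</table>
</body></html>
HTML;

// Hypothetical scratch file; any readable path works.
$path = sys_get_temp_dir() . '/styled-example.html';
file_put_contents($path, $html);

$reader = new Html();
$spreadsheet = $reader->load($path);
$sheet = $spreadsheet->getActiveSheet();

// Read the resulting cell styles back from the worksheet.
$headerStyle = $sheet->getStyle('A1');
echo 'A1 bold: ', var_export($headerStyle->getFont()->getBold(), true), PHP_EOL;
echo 'A1 fill: ', $headerStyle->getFill()->getStartColor()->getRGB(), PHP_EOL;
echo 'B1 font color: ', $sheet->getStyle('B1')->getFont()->getColor()->getRGB(), PHP_EOL;

unlink($path);
```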
HTML with Colspan and Rowspan
<table>
    <tr>
        <td colspan="2">Merged across 2 columns</td>
    </tr>
    <tr>
        <td rowspan="2">Merged across 2 rows</td>
        <td>Cell 1</td>
    </tr>
    <tr>
        <td>Cell 2</td>
    </tr>
</table>
$reader = new Html();
$spreadsheet = $reader->load('merged.html');
// Colspan and rowspan are converted to merged cells
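One way to confirm the conversion is to list the merged ranges on the loaded worksheet. This is a minimal sketch, assuming PhpSpreadsheet is installed through Composer with the autoloader at vendor/autoload.php; the scratch file name merged-example.html is hypothetical. The ranges are printed as reported by the worksheet rather than compared against expected values.

```php
<?php
declare(strict_types=1);

// Assumption: PhpSpreadsheet is installed via Composer next to this script.
require __DIR__ . '/vendor/autoload.php';

use PhpOffice\PhpSpreadsheet\Reader\Html;

$html = <<<'HTML'
<html><body>
<table>
  <tr><td colspan="2">Merged across 2 columns</td></tr>
  <tr><td rowspan="2">Merged across 2 rows</td><td>Cell 1</td></tr>
  <tr><td>Cell 2</td></tr>
</table>
</body></html>
HTML;

// Hypothetical scratch file; any readable path works.
$path = sys_get_temp_dir() . '/merged-example.html';
file_put_contents($path, $html);

$reader = new Html();
$spreadsheet = $reader->load($path);
$sheet = $spreadsheet->getActiveSheet();

// getMergeCells() returns the merged ranges as strings such as "A1:B1".
$mergedRanges = $sheet->getMergeCells();
foreach ($mergedRanges as $range) {
    echo $range, PHP_EOL;
}

unlink($path);
```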
Multiple Tables
If an HTML file contains multiple <table> elements, each table is loaded as a separate worksheet:
$reader = new Html();
$spreadsheet = $reader->load('multi-table.html');
// Walk the sheets that were actually created; getSheet() throws for an out-of-range index
foreach ($spreadsheet->getAllSheets() as $index => $sheet) {
    echo "Sheet {$index}: {$sheet->getTitle()}\n";
}
echo "Loaded {$spreadsheet->getSheetCount()} tables\n";
Handling Encoding
UTF-8 HTML
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$spreadsheet = $reader->load('utf8.html');
Other Encodings
// ISO-8859-1 (Latin-1)
$reader = new Html();
$reader->setInputEncoding('ISO-8859-1');
$spreadsheet = $reader->load('latin1.html');
// Windows-1252
$reader->setInputEncoding('CP1252');
$spreadsheet = $reader->load('windows.html');
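If you would rather normalize a file to UTF-8 yourself before handing it to the reader, the conversion can be done with PHP's mbstring extension. The sketch below is one possible approach, not something the reader requires. The helper name htmlToUtf8 is hypothetical, and the source encoding must be known in advance.

```php
<?php
declare(strict_types=1);

/**
 * Convert HTML bytes from a known source encoding to UTF-8.
 * Hypothetical helper for illustration; not part of PhpSpreadsheet.
 */
function htmlToUtf8(string $html, string $sourceEncoding): string
{
    // Input that is declared as UTF-8 and already valid needs no conversion.
    if (strcasecmp($sourceEncoding, 'UTF-8') === 0 && mb_check_encoding($html, 'UTF-8')) {
        return $html;
    }

    return mb_convert_encoding($html, 'UTF-8', $sourceEncoding);
}

// "café" encoded as ISO-8859-1: the é is the single byte 0xE9.
$latin1 = "<table><tr><td>caf\xE9</td></tr></table>";
$utf8 = htmlToUtf8($latin1, 'ISO-8859-1');

var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)
```

The converted markup can then be written to a file and loaded as in the UTF-8 example. If the document carries a charset meta tag, it should be updated to match.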
Working with Images
The Html reader can load images from HTML:
$reader = new Html();
// Allow external images (use with caution)
$reader->setAllowExternalImages(true);
$spreadsheet = $reader->load('report.html');
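Be cautious when enabling external images: the reader may then fetch URLs referenced by the HTML, which can expose your application to risks such as server-side request forgery (SSRF) when the input is untrusted.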
Error Handling
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
use PhpOffice\PhpSpreadsheet\Reader\Html;
$reader = new Html();
$reader->setSuppressLoadWarnings(true);
try {
    if (!$reader->canRead('data.html')) {
        // Fully qualified so this resolves to the global class even inside a namespace
        throw new \Exception('File is not valid HTML');
    }
    $spreadsheet = $reader->load('data.html');
    // Check for warnings
    $warnings = $reader->getLibxmlMessages();
    if (!empty($warnings)) {
        echo "Warnings during load:\n";
        foreach ($warnings as $warning) {
            echo "- {$warning->message}\n";
        }
    }
} catch (ReaderException $e) {
    echo 'Error loading HTML file: ' . $e->getMessage();
} catch (\Exception $e) {
    echo 'General error: ' . $e->getMessage();
}
Security Considerations
XML External Entity (XXE) Protection
The Html reader uses the XmlScanner security scanner to protect against XXE attacks.
External Resources
Be careful with external images and stylesheets:
$reader = new Html();
// Restrict external images to a whitelist of trusted locations
$reader->setIsWhitelisted(function (string $path): bool {
    return str_starts_with($path, 'https://trusted-domain.com/');
});
$reader->setAllowExternalImages(true);
$spreadsheet = $reader->load('report.html');
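When more than one host is trusted, a prefix check becomes awkward to maintain. The sketch below shows one possible alternative that parses the URL and compares scheme and host exactly. The function name isTrustedImageUrl and the host list are hypothetical, and it assumes the callback receives the image URL as a string, as in the example above.

```php
<?php
declare(strict_types=1);

// Hypothetical allow-list of hosts; replace with your own.
const TRUSTED_IMAGE_HOSTS = ['trusted-domain.com', 'cdn.trusted-domain.com'];

/**
 * Return true only for https URLs whose host is on the allow-list.
 * Hypothetical helper for illustration; not part of PhpSpreadsheet.
 */
function isTrustedImageUrl(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return false; // relative paths and malformed URLs are rejected
    }

    if (strtolower($parts['scheme']) !== 'https') {
        return false;
    }

    // Exact host comparison, so "trusted-domain.com.evil.example" does not match.
    return in_array(strtolower($parts['host']), TRUSTED_IMAGE_HOSTS, true);
}

var_dump(isTrustedImageUrl('https://trusted-domain.com/logo.png'));              // bool(true)
var_dump(isTrustedImageUrl('http://trusted-domain.com/logo.png'));               // bool(false)
var_dump(isTrustedImageUrl('https://trusted-domain.com.evil.example/logo.png')); // bool(false)
```

A function like this could be passed to setIsWhitelisted() in place of the closure, for example as 'isTrustedImageUrl'.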
Complete Example
use PhpOffice\PhpSpreadsheet\Reader\Html;
use PhpOffice\PhpSpreadsheet\Reader\Exception as ReaderException;
// Create and configure reader
$reader = new Html();
$reader->setInputEncoding('UTF-8');
$reader->setSuppressLoadWarnings(true);
try {
    // Verify file; throw a ReaderException so the catch block below handles it
    if (!$reader->canRead('report.html')) {
        throw new ReaderException('Invalid HTML file');
    }
    // Load file
    $spreadsheet = $reader->load('report.html');
    echo "Loaded {$spreadsheet->getSheetCount()} table(s)\n";
    // Process each table
    foreach ($spreadsheet->getAllSheets() as $index => $sheet) {
        echo "\nTable " . ($index + 1) . ":\n";
        $highestRow = $sheet->getHighestRow();
        $highestColumn = $sheet->getHighestColumn();
        echo "Rows: {$highestRow}, Columns: {$highestColumn}\n";
        // Process data
        $data = $sheet->toArray();
        foreach ($data as $row) {
            // Process row
            print_r($row);
        }
    }
    // Check for warnings
    $warnings = $reader->getLibxmlMessages();
    if (!empty($warnings)) {
        echo "\n" . count($warnings) . " warning(s) encountered\n";
    }
} catch (ReaderException $e) {
    echo 'Reader error: ' . $e->getMessage();
}
Limitations
- Only processes <table> elements; other HTML content is ignored
- CSS stylesheets are not fully supported (only inline styles)
- Complex HTML structures may not parse correctly
- JavaScript-generated content is not processed
- Some advanced CSS properties are not supported
- No support for formulas (everything is read as values)
- No support for charts
Tips for Best Results
- Use well-formed HTML - Valid HTML5 markup produces best results
- Use inline styles - External CSS stylesheets are not processed
- Specify encoding - Always set the correct character encoding
- Use simple table structures - Complex nested tables may not parse correctly
- Include charset meta tag - Add <meta charset="UTF-8"> to HTML
- Test with sample data - Test the reader with a small sample first