PHP Simple HTML DOM is an HTML DOM parser written in PHP 5+. This class lets you manipulate HTML easily and find tags on an HTML page with selectors, just like jQuery.
- PHP Simple HTML DOM 1.5.
API Reference
Helper functions:
- str_get_html ( string $content ) - Creates a DOM object from a string.
- file_get_html ( string $filename ) - Creates a DOM object from a file or a URL.
DOM methods & properties:
- __construct ( [string $filename] ) - Constructor. If the filename parameter is set, the contents are loaded automatically, either from text or from a file/URL.
- plaintext - Returns the plain text contents extracted from the HTML.
- clear () - Clean up memory.
- load ( string $content ) - Load contents from a string.
- save ( [string $filename] ) - Dumps the internal DOM tree back into a string. If $filename is set, the resulting string is also saved to that file.
- load_file ( string $filename ) - Load contents from a file or a URL.
- set_callback ( string $function_name ) - Set a callback function that is called for each element when the DOM is dumped to a string.
- find ( string $selector [, int $index] ) - Find elements by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
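As a rough sketch of how the DOM methods above fit together: the markup and variable names are made up for illustration, and the include path is the one assumed by the examples on this page.

```php
<?php
// Assumed location of the library, as in the other examples on this page
include('simplehtmldom/simple_html_dom.php');

// Create an empty DOM object, then load contents from a string
$html = new simple_html_dom();
$html->load('<div id="box"><p>First</p><p>Second</p></div>');

// With an index, find() returns a single element (0 is the first match)
$secondText = $html->find('p', 1)->plaintext;   // "Second"

// Without an index, find() returns an array of elements
$count = count($html->find('p'));               // 2

// save() returns the DOM tree as a string
$markup = $html->save();

echo $secondText . ' / ' . $count . '<br/>';

// Clean up memory when the DOM object is no longer needed
$html->clear();
unset($html);
?>
```

Calling clear() matters mostly in long-running scripts or loops that parse many pages, where the DOM objects would otherwise keep accumulating in memory.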
Element methods & properties:
- [attribute] - Read or write element's attribute value.
- tag - Read or write the tag name of element.
- outertext - Read or write the outer HTML text of element.
- innertext - Read or write the inner HTML text of element.
- plaintext - Read or write the plain text of element.
- find ( string $selector [, int $index] ) - Find children by the CSS selector. Returns the Nth element object if index is set, otherwise returns an array of objects.
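The element properties listed above can be tried out with a short script like the one below. It is only a sketch: the markup, the file names in the href, and the variable names are invented for the illustration.

```php
<?php
// Assumed location of the library, as in the other examples on this page
include('simplehtmldom/simple_html_dom.php');

// Hypothetical markup used only for this illustration
$html = str_get_html('<div id="main"><a href="old.html">Old <b>link</b></a></div>');
$link = $html->find('a', 0);

// Read the tag name, an attribute, and the inner HTML
$tag   = $link->tag;        // "a"
$href  = $link->href;       // "old.html"
$inner = $link->innertext;  // "Old <b>link</b>"

// Write an attribute and the inner HTML
$link->href = 'new.html';
$link->innertext = 'New link';

// Dump the modified DOM back into a string
$result = $html->save();
echo $result;   // <div id="main"><a href="new.html">New link</a></div>
?>
```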
DOM traversing:
- $e->children ( [int $index] ) - Returns the Nth child object if index is set, otherwise returns an array of children.
- $e->parent () - Returns the parent of element.
- $e->first_child () - Returns the first child of element, or null if not found.
- $e->last_child () - Returns the last child of element, or null if not found.
- $e->next_sibling () - Returns the next sibling of element, or null if not found.
- $e->prev_sibling () - Returns the previous sibling of element, or null if not found.
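A minimal sketch of the traversing methods above, applied to a small list. The markup and variable names are assumptions made for the example.

```php
<?php
// Assumed location of the library, as in the other examples on this page
include('simplehtmldom/simple_html_dom.php');

// Hypothetical markup used only for this illustration
$html = str_get_html('<ul id="menu"><li>One</li><li>Two</li><li>Three</li></ul>');
$ul = $html->find('ul', 0);

$first  = $ul->first_child();        // <li>One</li>
$last   = $ul->last_child();         // <li>Three</li>
$second = $first->next_sibling();    // <li>Two</li>

// children() without an index returns all child elements
$total = count($ul->children());     // 3

// children() with an index returns a single child (0-based)
$alsoSecond = $ul->children(1);      // <li>Two</li>

// parent() goes back up to the UL
$parentId = $second->parent()->id;   // "menu"

// The first child has no previous sibling, so null is returned
$none = $first->prev_sibling();

echo $first->innertext . ', ' . $second->innertext . ', ' . $last->innertext;
?>
```

Because the sibling methods return null at the ends of the list, it is safer to check the returned value before reading properties from it.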
You can also call methods that follow the W3C standard camelCase naming convention. Each method below is listed with its equivalent:
- $e->getAllAttributes () - $e->attr
- $e->getAttribute ( $name ) - $e->attribute
- $e->setAttribute ( $name, $value ) - $e->attribute = $value
- $e->hasAttribute ( $name ) - isset($e->attribute)
- $e->removeAttribute ( $name ) - $e->attribute = null
- $e->getElementById ( $id ) - $e->find ( "#$id", 0 )
- $e->getElementsById ( $id [,$index] ) - $e->find ( "#$id" [, int $index] )
- $e->getElementByTagName ($name ) - $e->find ( $name, 0 )
- $e->getElementsByTagName ( $name [, $index] ) - $e->find ( $name [, int $index] )
- $e->parentNode () - $e->parent ()
- $e->childNodes ( [$index] ) - $e->children ( [int $index] )
- $e->firstChild () - $e->first_child ()
- $e->lastChild () - $e->last_child ()
- $e->nextSibling () - $e->next_sibling ()
- $e->previousSibling () - $e->prev_sibling ()
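The camelCase methods can be mixed freely with the short property syntax. The following sketch uses a made-up paragraph to show a few of them side by side.

```php
<?php
// Assumed location of the library, as in the other examples on this page
include('simplehtmldom/simple_html_dom.php');

// Hypothetical markup used only for this illustration
$html = str_get_html('<div><p id="intro" class="lead">Hello</p></div>');

// Same element as $html->find('#intro', 0)
$p = $html->getElementById('intro');

$hasClass = $p->hasAttribute('class');   // same as isset($p->class)
$before   = $p->getAttribute('class');   // same as $p->class, "lead"

// Same as $p->class = 'note'
$p->setAttribute('class', 'note');
$after = $p->class;                      // "note"

// Same as $p->parent()->tag
$parentTag = $p->parentNode()->tag;      // "div"

echo $before . ' -> ' . $after;
?>
```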
Examples:
1. Find all links, and their text, in a page from a URL:
<?php
include('simplehtmldom/simple_html_dom.php');
// Create DOM from URL or file
$html = file_get_html('http://coursesweb/');
// Find all links, and their text
foreach($html->find('a') as $elm) {
  echo $elm->href .' ('.$elm->plaintext. ')<br/>';
}
?>
Result:
html/ (HTML)
css/ (CSS)
javascript/ (JavaScript)
php-mysql/ (PHP-MySQL)
ajax/ (AJAX)
flash/ (Flash - ActionScript)
ex/contact (Contact)
2. Find all images with a specified class attribute, in HTML content defined in a PHP script:
<?php
include('simplehtmldom/simple_html_dom.php');
// Create a DOM object from a string
$html = str_get_html('<div><img src="image1.jpg" alt="Img1" class="cls" /><br/>
<img src="image2.png" alt="Img2" /></div><p>Some text</p>
<img src="image3.gif" alt="Img3" class="cls" />');
// Find all images with class="cls"
foreach($html->find('img.cls') as $elm) {
  echo $elm->src. '<br/>';
}
?>
Result:
image1.jpg
image3.gif
3. Get the id of the first LI in a UL list, change its text, and output the new content:
<?php
include('simplehtmldom/simple_html_dom.php');
// Create a DOM object from a string
$html = str_get_html('<nav><ul>
<li id="idli1" class="cls">List 1</li><li>List 2</li><li class="cls">List 3</li>
</ul></nav>');
// Get the id of the first LI in UL, and change its content
$idli = $html->find('li', 0)->id;
if($idli) echo 'First LI id: '. $idli;
$html->find('ul li', 0)->innertext = '<b>PHP Simple HTML DOM</b>';
echo $html;
?>
Result (HTML code):
First LI id: idli1
<nav><ul>
<li id="idli1" class="cls"><b>PHP Simple HTML DOM</b></li>
<li>List 2</li>
<li class="cls">List 3</li>
</ul></nav>
4. Use a callback function that is applied to each element in the DOM (here it changes the class attribute):
<?php
include('simplehtmldom/simple_html_dom.php');
// Create a DOM object from a HTML file
$html = file_get_html('test.htm');
// Write a function with parameter "$elm"
function changeCls($elm) {
  // if LI with class="cls", change the class
  if ($elm->tag=='li' && $elm->class=='cls') {
    $elm->setAttribute('class', 'class_2');
  }
}
$html->set_callback('changeCls');
echo $html;
?>
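Since the contents of test.htm are not shown, here is a small variant of the same idea that loads the HTML from a string, so the effect of the callback can be seen directly. The function name renameCls and the markup are hypothetical, chosen only for this sketch.

```php
<?php
// Assumed location of the library, as in the other examples on this page
include('simplehtmldom/simple_html_dom.php');

// Hypothetical markup used only for this illustration
$html = str_get_html('<ul><li class="cls">List 1</li><li>List 2</li></ul>');

// Hypothetical callback: receives each element while the DOM is being dumped
function renameCls($elm) {
  if ($elm->tag == 'li' && $elm->class == 'cls') {
    $elm->setAttribute('class', 'class_2');
  }
}

$html->set_callback('renameCls');

// The callback runs for every element when the DOM is converted to a string
$result = $html->save();
echo $result;   // <ul><li class="class_2">List 1</li><li>List 2</li></ul>
?>
```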
- The archive containing the "PHP Simple HTML DOM" class also includes more examples and the documentation (in a directory that can be accessed from the server).
- PHP Simple HTML DOM Web Site.