Posted by Aleksandar Gichevski
Today I will write about a great PHP parser that made my coding easier last week. Last Wednesday I had to scrape some HTML code, extract all the necessary data (around 50 different values scattered all over the HTML) and store it in a PHP array for further use. When I started writing my script, I thought how easy it would be if this were coded in JavaScript/jQuery instead of PHP: we could use predefined JS functions and get the data based on ids, class attributes and other info... I did a quick search online to check whether there was a good PHP parser that could do exactly what JS can do, and I found PHP Simple HTML DOM Parser.

What is PHP Simple HTML DOM Parser?

This is an HTML DOM parser written in PHP that gives you all the functions you need to easily manipulate any HTML.
  • Supports invalid HTML.
  • Find tags on an HTML page with selectors just like jQuery.
  • Extract contents from HTML in a single line.
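PHP Simple HTML DOM Parser is a separate library, so the sketch below swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) to illustrate the same ideas: tolerating invalid HTML, selecting elements by id or class, and collecting the values into a PHP array. It is not the library's API. The markup and the array keys are hypothetical, and the XPath expressions stand in for the jQuery-style selectors #price and .item.

```php
<?php
// Sketch using PHP's built-in DOM extension (DOMDocument + DOMXPath),
// NOT PHP Simple HTML DOM Parser; it only illustrates the same idea.

// Hypothetical, deliberately sloppy markup: unquoted attribute, unclosed <p>.
$markup = '<div id=price>19.99</div><p class="note">Ships today'
        . '<p class="item">A</p><p class="item">B</p>';

$doc = new DOMDocument();
// libxml repairs invalid HTML; collect its warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// XPath equivalent of the jQuery-style selector '#price'.
$price = $xpath->query('//*[@id="price"]')->item(0)->textContent;

// XPath equivalent of '.item' (class attribute contains the token "item").
$items = [];
$query = '//*[contains(concat(" ", normalize-space(@class), " "), " item ")]';
foreach ($xpath->query($query) as $node) {
    $items[] = trim($node->textContent);
}

// Store the scraped values in a plain PHP array for further use.
$data = ['price' => $price, 'items' => $items];
print_r($data);
```

The class lookup needs a fairly verbose XPath expression where a jQuery-like selector would just say .item, which is the kind of convenience the post is praising.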

What can you do with PHP Simple HTML DOM Parser?

With HTML DOM Parser you can do much of what you can do with JavaScript when working with HTML, such as:
  • Extract HTML content directly from a URL, a string or an HTML file; the DOM object can be created in two ways, the normal way and the object-oriented way
  • Get HTML elements by tag, attribute or by using descendant and nested selectors
  • Modify HTML elements: Get/Set/Remove attributes and Insert/Append/Remove elements
and more...
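To give a feel for the modification operations listed above, this rough sketch performs them with PHP's built-in DOMDocument rather than with Simple HTML DOM. The markup is hypothetical, and the method names used (getAttribute, setAttribute, removeAttribute, appendChild, removeChild) belong to PHP's DOM extension, not to the parser this post describes.

```php
<?php
// Sketch using PHP's built-in DOM extension, NOT the Simple HTML DOM API,
// showing get/set/remove attribute and append/remove element operations.

$doc = new DOMDocument();
$doc->loadHTML('<div id="box"><a href="/old" title="Old link">Link</a><span>drop me</span></div>');
$xpath = new DOMXPath($doc);

$box  = $xpath->query('//*[@id="box"]')->item(0);
$link = $xpath->query('//a')->item(0);

// Get, set and remove attributes on the anchor.
$oldHref = $link->getAttribute('href');
$link->setAttribute('href', '/new');
$link->removeAttribute('title');

// Append a new <p> to the div and remove the existing <span>.
$p = $doc->createElement('p', 'Appended');
$box->appendChild($p);
$span = $xpath->query('//span')->item(0);
$span->parentNode->removeChild($span);

// Print only the modified <div>.
echo $doc->saveHTML($box), "\n";
```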

A Few Examples

Let me show you what this powerful script can do with a few examples.

Create a DOM Object, Quick Way:

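// Load the library (simple_html_dom.php must be in your include path)
require_once 'simple_html_dom.php';

// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');

// Create a DOM object from an HTML file
$html = file_get_html('test.htm');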
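Object-Oriented Way:

// Create an empty DOM object, then load HTML from a URL
$html = new simple_html_dom();
$html->load_file('http://www.google.com/');

// Load HTML from an HTML file
$html->load_file('test.htm');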
Find HTML Elements
// Find all anchors, returns an array of element objects
$ret = $html->find('a');

// Find the (N)th anchor, returns an element object or null if not found (zero based)
$ret = $html->find('a', 0);

// Find the last anchor, returns an element object or null if not found
$ret = $html->find('a', -1);

// Find all <div> with the id attribute
$ret = $html->find('div[id]');

// Find all <div> whose id attribute is foo
$ret = $html->find('div[id=foo]');

// Find all elements with id=foo
$ret = $html->find('#foo');

// Find all elements with class=foo
$ret = $html->find('.foo');

// Find all elements that have an id attribute
$ret = $html->find('*[id]');

// Find all anchors and images
$ret = $html->find('a, img');

// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');

// Find all <li> in