Parsing HTML tags with PHP Simple HTML DOM Parser


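I tried out PHP Simple HTML DOM Parser for parsing HTML pages and found it quite good. It builds a DOM tree that makes it easy to pick apart the contents of a page, which makes it handy for scraping.

An example article follows; you can also download the package from SourceForge and look at the examples it ships with: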

Scraping data with PHP Simple HTML DOM Parser

PHP Simple HTML DOM Parser, written for PHP 5+, lets you manipulate HTML in a very easy way. Because it tolerates invalid HTML, it is a better choice than PHP scripts that rely on complicated regexes to extract information from web pages.

Before extracting anything, create a DOM from either a URL or a file. The following script prints the links and image sources found on a website:

// Create DOM from URL or file
$html = file_get_html('http://www.microsoft.com/');

// Extract links
foreach($html->find('a') as $element)
       echo $element->href . '<br>'; 

// Extract images
foreach($html->find('img') as $element)
       echo $element->src . '<br>';

The parser can also be used to modify HTML elements:

// Create DOM from string
$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

$html->find('div', 1)->class = 'bar';

$html->find('div[id=simple]', 0)->innertext = 'Foo';

// Output: <div id="simple">Foo</div><div id="parser" class="bar">Parser</div>
echo $html;
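
The same property-assignment pattern should work for other attributes as well. The following is a minimal sketch, assuming `simple_html_dom.php` sits next to the script; the `title` value is made up for illustration, and the null check assumes a library version in which `find()` with an index returns `null` when nothing matches:

```php
<?php
require_once 'simple_html_dom.php';

$html = str_get_html('<div id="simple">Simple</div><div id="parser">Parser</div>');

// With an index, find() returns a single element rather than an array.
$second = $html->find('div', 1);
if ($second !== null) {
    // Assigning to an attribute that does not exist yet adds it to the tag.
    $second->title = 'second block';
}

// save() returns the modified document as a string instead of echoing it.
$result = $html->save();
echo $result;
```

Capturing the result with `save()` rather than echoing the DOM directly makes it easier to write the modified markup to a file or pass it to other code.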

To retrieve a page's content without any tags, read the plaintext property:

echo file_get_html('http://www.yahoo.com/')->plaintext;

The package files of this parser (http://simplehtmldom.sourceforge.net/) include scraping examples for Digg, IMDb, and Slashdot. Let’s create one that extracts the titles of the first 10 results for the keyword “php” from Google:

$url = 'http://www.google.com/search?hl=en&q=php&btnG=Search';

// Create DOM from URL
$html = file_get_html($url);

// Match all 'a' tags whose class attribute equals 'l'
foreach ($html->find('a[class=l]') as $key => $info) {
    // $key is zero-based, so add 1 to number the results from 1
    echo ($key + 1) . '. ' . $info->plaintext . "<br />\n";
}

NOTE: Make sure to include the parser before using any of its functions:

include 'simple_html_dom.php';
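
Putting the pieces together, here is a sketch of a complete script that can be run without network access. It assumes `simple_html_dom.php` is in the same directory; the markup in `$source` is made up to stand in for a downloaded page, and the failure check assumes a library version in which `str_get_html()` returns `false` when it cannot build a DOM:

```php
<?php
// The parser must be loaded before any of its functions are called.
require_once 'simple_html_dom.php';

// Made-up markup standing in for a downloaded page.
$source = '<ul>'
        . '<li><a class="l" href="/one">First</a></li>'
        . '<li><a href="/two">Second</a></li>'
        . '<li><a class="l" href="/three">Third</a></li>'
        . '</ul>';

$html = str_get_html($source);
if (!$html) {
    exit("Could not parse the document\n");
}

// Collect the text of every link whose class attribute equals 'l'.
$titles = array();
foreach ($html->find('a[class=l]') as $info) {
    $titles[] = $info->plaintext;
}

// Print a numbered list, one title per line.
foreach ($titles as $key => $title) {
    echo ($key + 1) . '. ' . $title . "\n";
}
```

`require_once` is used here instead of `include` because a missing file then stops the script immediately, rather than raising a warning and failing later on the first call to an undefined function.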

For more information on using the parser, see the ‘PHP Simple HTML DOM Parser’ manual. To download the package files, use the following URL: http://sourceforge.net/project/showfiles.php?group_id=218559 .
