Gossamer Forum: General: Perl Programming: parsing HTML

Jan 26, 2003, 12:13 PM

adrockjames

Novice (35 posts)

Post #1 of 3

parsing HTML

hi,

i'm having trouble parsing a bunch of html held in a scalar. i've read through the docs for HTML::Parser and HTML::TreeBuilder and i still can't tell exactly what i need to do. so, i have a var with tons of html code in it, and let's say i want to print out everything contained between <td> and </td> tags.

can someone give me a quick example? thanks!

i started with

Code:

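use HTML::TreeBuilder;

my $data = "lots_of_html";

# new_from_content() expects the HTML as its argument, so use plain new() here,
# then feed the data to parse() and signal the end of input with eof()
my $parser = HTML::TreeBuilder->new;
$parser->parse($data);
$parser->eof;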

now what do i do to get all content between all <td> and </td> tags?

thanks!

Jan 31, 2003, 2:09 AM

Wil

Veteran / Moderator (4108 posts)

Post #2 of 3

Re: [adrockjames] parsing HTML

I've always found HTML::TokeParser the best module for this sort of thing. Its docs have a good example:

http://www.perldoc.com/...HTML/TokeParser.html

Code:
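  use HTML::TokeParser;
  # Parse the file named on the command line, or index.html by default
  my $p = HTML::TokeParser->new(shift||"index.html")
      or die "Can't open file: $!";
  # Skip ahead to the first <title> tag and print the text that follows it
  if ($p->get_tag("title")) {
      my $title = $p->get_trimmed_text;
      print "Title: $title\n";
  }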
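
For the <td> case in the original question, the same approach might be adapted roughly like this. It's only a sketch: it assumes the HTML is already sitting in a scalar (passed to the parser by reference) and that the tables aren't nested. The helper name td_texts and the sample data are made up for illustration.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper (not from the module docs): collect the trimmed
# text of every <td> cell found in an HTML string.
sub td_texts {
    my ($html) = @_;

    # A reference to a scalar tells HTML::TokeParser to parse the string itself
    my $p = HTML::TokeParser->new(\$html)
        or die "Can't create parser";

    my @cells;
    # get_tag("td") skips forward to the next <td> start tag, if there is one
    while ($p->get_tag("td")) {
        # Gather the text up to the closing </td>; markup inside the cell is dropped
        push @cells, $p->get_trimmed_text("/td");
    }
    return @cells;
}

# Made-up sample data standing in for the real HTML
my $data = "<table><tr><td>one</td><td> two <b>bold</b></td></tr></table>";
print "$_\n" for td_texts($data);
```

Note that this only gives you the text of each cell. If you need the markup inside the cells as well, or have tables inside tables, a tree-based module like HTML::TreeBuilder is probably a better fit.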

- wil

Jan 31, 2003, 3:17 AM

Paul

Veteran (19537 posts)

Post #3 of 3

Re: [adrockjames] parsing HTML

Do you just want to match <td></td> all the time, or is that just an example of what you want? Will you be changing to other tags?