Gossamer Forum

parsing HTML

hi,

i'm having trouble parsing a bunch of html held in a scalar. i've read through the docs for HTML::Parser and HTML::TreeBuilder and i still can't tell exactly what i need to do. so, i have a variable with tons of html in it, and let's say i want to print out everything contained between <td> and </td> tags.

can someone give me a quick example? thanks!

i started with

Code:


use HTML::TreeBuilder;

my $data = "lots_of_html";

# new_from_content() parses the string it is given and calls eof() itself
my $parser = HTML::TreeBuilder->new_from_content($data);


now what do i do to get all content between all <td> and </td> tags?

thanks!
Re: [adrockjames] parsing HTML
I've always found HTML::TokeParser the best module for this sort of thing. Its docs have a good example:

http://www.perldoc.com/...HTML/TokeParser.html

Code:
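use strict;
use warnings;
use HTML::TokeParser;

# Parse the file named on the command line, defaulting to index.html
my $file = shift || "index.html";
my $p = HTML::TokeParser->new($file) or die "Can't open $file: $!";
if ($p->get_tag("title")) {
    my $title = $p->get_trimmed_text;
    print "Title: $title\n";
}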
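
The same approach should carry over to the <td> question. Here's a minimal sketch, not a definitive solution: it assumes HTML::TokeParser is installed, that the HTML sits in a scalar (the sample string in $data is made up for illustration), and that the tables aren't nested, since a nested table would end a cell early.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical sample input standing in for the real HTML held in a scalar
my $data = '<table><tr><td>first cell</td><td> second <b>cell</b> </td></tr></table>';

# Passing a reference to a scalar makes TokeParser parse the string
# instead of treating the argument as a file name
my $p = HTML::TokeParser->new(\$data) or die "Can't create parser";

my @cells;
while ($p->get_tag("td")) {
    # Collect the text up to the closing </td>; tags inside the cell
    # are skipped and whitespace is trimmed and collapsed
    push @cells, $p->get_trimmed_text("/td");
}

print "$_\n" for @cells;
```

This only gives you the text of each cell. If you need the markup inside the cells as well, walking the tokens with get_token, or using HTML::TreeBuilder, is probably a better fit.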

- wil
Re: [adrockjames] parsing HTML
Do you just want to match <td></td> all the time, or is that just an example of what you want? Will you be changing to other tags?