Since you need to crawl a site recursively, it probably makes sense to use existing modules such as WWW::Spyder or WWW::Robot together with HTML::Parser.
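For the parsing half, HTML::Parser lets you collect links with a start-tag handler. A minimal sketch (the inline HTML string is just a stand-in for a page you would actually fetch):

```perl
use strict;
use warnings;
use HTML::Parser;

my @links;
my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ($tagname, $attr) = @_;
            # Collect the href of every anchor tag we see.
            push @links, $attr->{href}
                if $tagname eq 'a' && defined $attr->{href};
        },
        'tagname, attr',
    ],
);

# Stand-in for content you would fetch with LWP::UserAgent or similar.
$parser->parse('<p><a href="/one">one</a> <a href="/two">two</a></p>');
$parser->eof;

print "$_\n" for @links;
```

From there, spidering is a matter of fetching each collected link and feeding it back through the same parser.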
Retrieving a URL and parsing the document is fairly simple, but recursively crawling a whole site is a bit trickier. It would be a lot more useful if that website provided simplified (machine-readable) results. It might.
If you want to write the spidering code yourself, keep in mind that a 'for' loop is just a 'while' loop in disguise:
Code:
for (my $i = 0; $i < 10; $i++) {
    # do something
}
is the same as
Code:
my $i = 0;
while ($i++ < 10) {
    # do something
}
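A quick way to convince yourself the two forms are equivalent is to count iterations of each (a throwaway sketch, nothing site-specific):

```perl
use strict;
use warnings;

# Count how many times each loop body runs.
my $for_count = 0;
for (my $i = 0; $i < 10; $i++) {
    $for_count++;
}

my $while_count = 0;
my $i = 0;
while ($i++ < 10) {
    $while_count++;
}

# Both bodies execute exactly 10 times.
print "for: $for_count, while: $while_count\n";
```

(The value of `$i` inside the body differs between the two, since the `while` form increments before the body runs, but the number of iterations is the same.)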
Recursion, on the other hand, usually relies on subroutines that call each other (or themselves) until some condition is met, for example until you have crawled the part of the site you wanted.
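As a sketch of that idea, here is a recursive crawl over a hypothetical in-memory map of page-to-links (the `%site` data and the `crawl` name are made up for illustration; a real spider would fetch and parse each URL instead):

```perl
use strict;
use warnings;

# Hypothetical site: each URL maps to the links found on that page.
my %site = (
    '/'  => ['/a', '/b'],
    '/a' => ['/b', '/c'],
    '/b' => ['/'],          # cycle back to the start page
    '/c' => [],
);

my %visited;

sub crawl {
    my ($url) = @_;
    return if $visited{$url}++;   # stop condition: page already seen
    # In a real spider you would fetch $url and parse out its links here.
    crawl($_) for @{ $site{$url} || [] };
}

crawl('/');
print join(",", sort keys %visited), "\n";
```

The `%visited` hash is what keeps the recursion from looping forever on sites that link back to themselves.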
Or would using the same data locally be more useful? If it really is the same data, I'm betting yes.