OK... there was just a problem in the code where I had missed a 'my' declaration.
Fixed that, and now it works fine. I'm trying to use the following code to get URLs from a specific page:
Code:
print $IN->header();
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;
my $ua = LWP::UserAgent->new;
# Set up a callback that collects link URLs
my @urls = ();
my ($p, $res);
sub callback {
    my ($tag, %attr) = @_;
    return if $tag eq 'img'; # skip <img ...> tags; links from every other tag are kept
    push(@urls, values %attr);
}
# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $page)
$p = HTML::LinkExtor->new(\&callback);
# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $page),
sub {$p->parse($_[0])});
# Expand all collected URLs to absolute ones
my $base = $res->base;
@urls = map { url($_, $base)->abs } @urls;
# Print them out
print join("<BR>", @urls), "\n";
The problem I'm having is that it returns images too. For example, http://www.gossamer-threads.com returns:
Quote:
http://www.gossamer-threads.com/includes/threads.css
http://www.gossamer-threads.com/index.htm
http://www.gossamer-threads.com/scripts/index.htm
http://www.gossamer-threads.com/services/index.htm
http://www.gossamer-threads.com/support/index.htm
http://www.gossamer-threads.com/contact/index.htm
http://www.gossamer-threads.com/jobs/index.htm
http://www.gossamer-threads.com/images/black.gif
http://www.gossamer-threads.com/corporate/index.htm
http://www.gossamer-threads.com/perl/gforum/gforum.cgi?forum=16
http://www.gossamer-threads.com/forum/Gossamer_AutoRespond_1.1.1_Security_Fix_P216296/
http://www.gossamer-threads.com/forum/Forum_Guidelines_P216271/
http://www.gossamer-threads.com/forum/Gossamer_Forum_1.1.8_Released!_P213168/
http://www.gossamer-threads.com/forum/Gossamer_AutoRespond_1.1.0_Now_Available!_P211856/
http://www.gossamer-threads.com/forum/Gossamer_Forum_1.1.7_Released!_P205091/
http://www.gossamer-threads.com/scripts/index.htm
http://www.gossamer-threads.com/scripts/webmail/index.htm
http://www.gossamer-threads.com/scripts/links-sql/index.htm
http://www.gossamer-threads.com/scripts/gforum/index.htm
http://www.gossamer-threads.com/scripts/autores/index.htm
http://www.gossamer-threads.com/scripts/mysqlman/index.htm
http://www.gossamer-threads.com/scripts/dbman-sql/index.htm
http://www.gossamer-threads.com/scripts/dbman/index.htm
http://www.gossamer-threads.com/scripts/links/index.htm
http://www.gossamer-threads.com/scripts/fileman/index.htm
http://www.gossamer-threads.com/scripts/register/index.htm
http://www.gossamer-threads.com/services/index.htm
http://www.gossamer-threads.com/support/index.htm
http://www.gossamer-threads.com/perl/gforum/
http://www.gossamer-threads.com/scripts/resources/
http://www.gossamer-threads.com/images/foot_bkgd.gif
http://www.gossamer-threads.com/index.htm
http://www.gossamer-threads.com/scripts/index.htm
http://www.gossamer-threads.com/services/index.htm
http://www.gossamer-threads.com/support/index.htm
http://www.gossamer-threads.com/contact/index.htm
http://www.gossamer-threads.com/jobs/index.htm
http://www.gossamer-threads.com/images/foot_bkgd.gif
I'm still trying to work out how to edit this sub so that it doesn't pass on certain things, such as .gif, .jpg and .css files.
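One direction that might work is to filter the collected URLs by file extension once they have been gathered. The following is only a rough sketch: the helper names is_unwanted and filter_urls and the extension list are made up for illustration, and it only looks at the end of the URL path, so an image served from a URL with no extension would still get through.

```perl
use strict;
use warnings;

# Hypothetical list of extensions to drop; adjust to taste.
my @skip_ext = qw(gif jpg jpeg png css);
my $skip_re  = join '|', map { quotemeta($_) } @skip_ext;

# Returns 1 when the URL's path ends in one of the skipped extensions.
# Any query string or fragment is stripped before the check.
sub is_unwanted {
    my ($url) = @_;
    (my $path = "$url") =~ s/[?#].*//s;
    return $path =~ /\.(?:$skip_re)$/i ? 1 : 0;
}

# Returns only the URLs that do not look like images or stylesheets.
sub filter_urls {
    return grep { !is_unwanted($_) } @_;
}
```

With something like this in place, a line such as @urls = filter_urls(@urls); just before the final print should drop the .gif and .css entries. Another option might be to change the callback to return unless $tag eq 'a', which would keep only the links from anchor tags and skip stylesheets and background images at the source.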
Any ideas?
Cheers
Andy (mod)
andy@ultranerds.co.uk