Gossamer Forum
Quote Reply
Spider
Does anyone remember/use Bmxer's(?) (I'm not sure it is Bmxer's) spider mod for Links2?

It was about 1 million lines long if I remember correctly (Smile).

Well, if anyone is interested, I have made a mod that does the same job but is currently only 20 lines long.

Example:

http://www.wiredon.net/test/spider.cgi (currently spidering Yahoo)

It is currently a standalone script but would only take 5 minutes to make it work with a Links2 template.

Interested?

Installs:http://wiredon.net/gt
FAQ:http://www.perlmad.com

Quote Reply
Re: Spider In reply to
I have looked at your demo; maybe it's me being silly, but what is the purpose of this script and where would it be used?

This isn't meant to sound narky, I genuinely can't see the reason for it.

Quote Reply
Re: Spider In reply to
Well it can be used to SPIDER sites (as the name suggests), allowing people to stay on your directory page whilst being able to see all links on the other site.

It can also be quickly altered to spider images only, which can be used for many different purposes, including image galleries (with permission) etc. It can also be altered to spider each link collected, which can then be added to links.db.
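For anyone curious, here is a minimal sketch of that image-only variant. It filters the parser callback on img tags and src attributes instead of a tags, and parses a fixed HTML string so the idea is clear without a live fetch (the sample markup is just illustrative):

```perl
#!/usr/bin/perl
# Sketch: collect <img> src values instead of <a> href values.
use strict;
use warnings;
use HTML::LinkExtor;

my @images;
my $parser = HTML::LinkExtor->new(sub {
    my ($tag, %attr) = @_;
    return if $tag ne 'img';            # keep only image tags
    push @images, $attr{src} if $attr{src};
});

# A fixed page stands in for a fetched one here
$parser->parse(<<'HTML');
<a href="/page.html">a link</a>
<img src="/pics/one.gif">
<img src="/pics/two.jpg" alt="two">
HTML
$parser->eof;

print join("\n", @images), "\n";        # the two src values, in page order
```

Swap the string for the response body from LWP::UserAgent and you have the image spider.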

Installs:http://wiredon.net/gt
FAQ:http://www.perlmad.com

Quote Reply
Re: Spider In reply to
If it's 20 lines long then it doesn't do what mine did. What does yours do? Features? etc...

Quote Reply
Re: Spider In reply to
Yeah I'm interested ... open source it here :)

http://www.ActiveWebmaster.com - Webmaster Resources Cool
Quote Reply
Re: Spider In reply to
Bmxer - I'm still not sure it is yours I'm thinking of. It is the one that allows you to add:

<a href="<%db_cgi_url%>/spider.cgi?URL=<%URL%>">Spider</a>

...into link.html; clicking it will go to that URL, parse the page, and return the list of links found on it.

At the moment mine just does exactly that but it should be easy to get it to add stuff into links.db and I've already tried getting it to grab image tags and that worked well. It can be made to parse any tag you want.

Installs:http://wiredon.net/gt
FAQ:http://www.perlmad.com

Quote Reply
Re: Spider In reply to
Ok here's the code but I don't have time to alter it to work with a Links2 template. So at the moment it is a standalone script.

Code:
#!/usr/bin/perl

use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;

# URL to spider
my $url   = "http://www.yahoo.com";
my $agent = LWP::UserAgent->new;

# Collect the href value from every <a> tag as the page streams in
my @links = ();
my $page  = HTML::LinkExtor->new(\&scan);
my $results = $agent->request(HTTP::Request->new(GET => $url),
                              sub { $page->parse($_[0]) });

# Resolve relative links against the page's base URL
my $base = $results->base;
@links = map { url($_, $base)->abs } @links;

sub scan {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';    # only anchor tags
    push @links, values %attr;
}

# Output the links as a simple HTML list
my @ahref = map { qq|<a href="$_">$_</a>| } @links;
print "Content-type: text/html\n\n";
print join("<br>", @ahref);

Installs:http://wiredon.net/gt
FAQ:http://www.perlmad.com

Quote Reply
Re: Spider In reply to
Paul,

Bmxer's mod is called GoFetch (in the Resource Center). It saves spidered URLs in a database rather than just displaying them. Bmxer was working on integrating DBMan as the admin script last summer.

Thomas
http://links.japanref.com
Quote Reply
Re: Spider In reply to
Yeah, I didn't think it was Bmxer's; that's why I sounded unsure in my first post. I can't think whose mod I'm thinking of, though. Anyway, the code above can easily be altered to spider each URL it finds and add it into links.db (I only created it as a quick little thing to show all the URLs on a particular page).

To make the Links2 add page act like a spider is also fairly easy. You just need a text box for URL/email on add.html, then get add.cgi to visit the URL and fetch the page title and description. Since you already have the URL and email from the form, it would be fairly easy to collect all the info you need to put into links.db.
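A rough sketch of that add.cgi step. parse_title_desc() is a made-up helper name and the regexes are deliberately simple; a fixed string stands in for the page add.cgi would fetch with LWP::UserAgent:

```perl
#!/usr/bin/perl
# Sketch: pull the title and meta description out of a fetched page,
# the two fields add.cgi needs on top of the URL/email the user typed.
use strict;
use warnings;

sub parse_title_desc {
    my ($html) = @_;
    my ($title) = $html =~ m{<title[^>]*>\s*(.*?)\s*</title>}is;
    my ($desc)  = $html =~ m{<meta\s+name=["']?description["']?\s+content=["']([^"']*)}is;
    return ($title, $desc);
}

# In add.cgi you would feed $response->content from LWP here instead
my $html = <<'HTML';
<html><head>
<title>Example Site</title>
<meta name="description" content="A demo page.">
</head><body></body></html>
HTML

my ($title, $desc) = parse_title_desc($html);
print "Title: $title\nDescription: $desc\n";
```

From there you write the title, description, URL, and email into links.db as a normal add.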

Installs:http://wiredon.net/gt
FAQ:http://www.perlmad.com