Gossamer Forum

Selective Spider?

Hello,

Is there a spider program that can bring home just what Links wants to see? It sounds like pie in the sky, but if there are already spiders that do this for all of the big engines, why not for Links? That would rock!

Rod


Re: Selective Spider?
1) Spider Pro - located in the Resource Center.

2) goFetch - located at http://lookhard.hypermart.net/links-mods

Good luck!

Regards,

Eliot Lee

Re: Selective Spider?
This discussion is better suited to the Links 2.0 Customization forum, but I will discuss it here anyway.
Please bear with me: this is the first code modification I have submitted, and I am still learning Perl on my own.
I use goFetch in a modified form.

1. Everywhere the script says $db_spider_id_file_name, I replaced it with $db_links_id_file_name.
2. Everywhere the script says $db_spider_name, I replaced it with $db_valid_name.

OK, so now when it spiders a page, it reads the current link count from the Links ID file and updates the validate.db file.
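
The two renames above can also be applied mechanically instead of by hand. This is only a minimal sketch, assuming the goFetch source has been read into a string; the sub name rename_config_vars is hypothetical and is not part of goFetch or Links.

```perl
use strict;
use warnings;

# Hypothetical helper: point goFetch at the Links ID counter and validate.db
# by renaming the two config variables wherever they appear in the source text.
sub rename_config_vars {
    my ($source) = @_;
    # \b stops the match from touching longer names that merely start the same way.
    $source =~ s/\$db_spider_id_file_name\b/\$db_links_id_file_name/g;
    $source =~ s/\$db_spider_name\b/\$db_valid_name/g;
    return $source;
}

# Single quotes keep the variable names as literal text.
my $line = 'open (SPIDER, ">>$db_spider_name")';
print rename_config_vars($line), "\n";
```

Keep a backup of the original script before overwriting it with the renamed text.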

But it isn't that simple. My database is pipe-delimited, so I had to change:
print SPIDER "$ID%%$mytitle%%$myurl%%$mydescrip%%$mykeywords%%$mysize%%$lastupd\n";
to:
print SPIDER "$ID|$mytitle|$myurl|$mydescrip|$mykeywords|$mysize|$lastupd\n";

The line must have the same number of fields as your database. Mine has 17 fields, so I had to change it to:
print SPIDER "$ID|$mytitle|$myurl|$date||$mydescrip|Name Here|your\@email.com|||||||||$date\n";
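
Counting pipes by hand is easy to get wrong, so the record can also be built with join. This is a sketch under assumptions: the field positions mirror the print line above, 17 is the field count of my database only, and build_record is a hypothetical name, not a Links routine.

```perl
use strict;
use warnings;

my $FIELD_COUNT = 17;   # my database; change this to match your own definition

# Hypothetical helper: put known values at their zero-based positions and leave
# every other field empty, so the record always has exactly $FIELD_COUNT fields.
sub build_record {
    my (%at) = @_;
    my @fields = ('') x $FIELD_COUNT;
    $fields[$_] = $at{$_} for keys %at;
    return join('|', @fields);
}

my $date = '1-Jan-2000';          # placeholder value; the post uses &get_date
print build_record(
    0  => 42,                     # $ID
    1  => 'Example title',        # $mytitle
    2  => 'http://example.com/',  # $myurl
    3  => $date,
    5  => 'Example description',  # $mydescrip
    6  => 'Name Here',
    7  => 'your@email.com',
    16 => $date,
), "\n";
```

Because the empty fields come from the array rather than from typed pipes, changing $FIELD_COUNT is enough if the database layout grows.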

Also note that $lastupd is replaced by $date so the date comes out in the right format. To do that, you also have to change:
use HTTP::Date;
$lastupd = time2str($res->last_modified);

to:
$date = &get_date;

(Note, I still get one problem with this, though. The first 2 links always have the infamous 1969 date!)
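
For anyone chasing the 1969 date, here is a sketch of a day-Mon-year formatter. It is written as an assumption about the date layout, not as a copy of the Links get_date routine. It shows one way a 1969 date can appear: a timestamp of 0 is the Unix epoch, which falls on 31-Dec-1969 in time zones west of UTC. Whether that is what happens to my first 2 links is only a guess. The name format_date and the $offset parameter are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch time as day-Mon-year.
# $offset is the zone's offset from UTC in seconds (for example -5*3600),
# so the result does not depend on the server's own time zone setting.
sub format_date {
    my ($epoch, $offset) = @_;
    my @months = qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec);
    $epoch  = time unless defined $epoch;
    $offset = 0    unless defined $offset;
    my ($day, $mon, $year) = (gmtime($epoch + $offset))[3, 4, 5];
    return sprintf('%d-%s-%d', $day, $months[$mon], $year + 1900);
}

print format_date(0), "\n";             # the epoch itself, in UTC
print format_date(0, -5 * 3600), "\n";  # same instant, five hours west of UTC
print format_date(), "\n";              # today, in UTC
```

If a date variable is empty or 0 when the record is written, this is the kind of output to expect, so checking that $date is set before the first print is a reasonable first step.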


3. Now you have to eliminate some characters from the title and description fields.
Beneath:
# Update the counter.
open (ID, ">$db_links_id_file_name") or &cgierr("error in get_defaults. unable to open id file: $db_links_id_file_name. Reason: $!");
flock(ID, 2) unless (!$db_use_flock);
print ID $ID; # update counter.
close ID; # automatically removes file lock
open (SPIDER, ">>$db_valid_name") or &cgierr ("Can't open for output counter file. Reason: $!");
if ($db_use_flock) { flock (SPIDER, 2) or &cgierr ("Can't get file lock. Reason: $!"); }


I added this:
$mydescrip =~ tr/|\n//d;   # strip pipes and line breaks from the description
$mytitle =~ tr/|\n//d;     # strip pipes and line breaks from the title

This removes the | character and line breaks from both the title and the description.

Now it should enter everything into validate.db in the right slots, with no pipes or line breaks to screw up the fields.
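
The same cleanup can be wrapped in a small sub and applied to any field before it is written. This is a sketch only; clean_field is a hypothetical name, and stripping \r is my addition for pages saved with CRLF line endings.

```perl
use strict;
use warnings;

# Hypothetical helper: delete the field delimiter and any line breaks so a
# value can never split a record across fields or across lines.
sub clean_field {
    my ($value) = @_;
    $value = '' unless defined $value;
    $value =~ tr/|\r\n//d;   # \r is an addition; the lines above only remove | and \n
    return $value;
}

my $mytitle   = clean_field("Sweepstakes | Contests\n");
my $mydescrip = clean_field("Win prizes\r\nevery day | free");
print "$mytitle\n$mydescrip\n";
```

Deleting a line break joins the words on either side of it. Translating line breaks to a space instead (tr/\r\n/ /) is an alternative if that matters for your descriptions.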

Now I need a page to spider. I use a shareware program called UrlSearch. I find a page with links I would like to add and save it to my hard drive. Then I open it with UrlSearch, eliminate the irrelevant links, save it, upload it to my server, and spider it there.
I usually only do 10 or 15 at a time because I still have to validate them to check the title and description. It is not often that both turn out OK: too many pages have no meta tag for the description, and when the spider builds a description from the content it puts in JavaScript and other things it doesn't understand.
Sometimes it is easier to use the bookmarklet tool to add links, because you can highlight your description on the page.
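
One way to keep JavaScript out of a generated description is to prefer the meta description and, when it is missing, drop script and style blocks before stripping the tags. The following is a rough regex-based sketch and not how goFetch builds its descriptions. The sub name describe_page and the 200-character limit are assumptions, and a real HTML parser would be more robust.

```perl
use strict;
use warnings;

# Hypothetical helper: return a short plain-text description for a page.
sub describe_page {
    my ($html, $max) = @_;
    $max = 200 unless defined $max;

    # Prefer the page's own summary. This simple pattern only handles
    # name before content, with the content in double quotes.
    if ($html =~ /<meta\s+name\s*=\s*["']description["']\s+content\s*=\s*"([^"]*)"/i) {
        return substr($1, 0, $max);
    }

    my $text = $html;
    $text =~ s/<(script|style|title)\b.*?<\/\1\s*>//gis;  # drop code, CSS and the title
    $text =~ s/<!--.*?-->//gs;                            # drop HTML comments
    $text =~ s/<[^>]*>/ /g;                               # replace remaining tags with spaces
    $text =~ s/\s+/ /g;                                   # collapse whitespace and line breaks
    $text =~ s/^ | $//g;                                  # trim the ends
    return substr($text, 0, $max);
}

print describe_page("<body><script>var a = 1;</script><p>Hello\n  world</p></body>"), "\n";
```

The result still needs the pipe cleanup before it goes into the database, and it is no substitute for checking each description during validation.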

Enough rambling. Anyway, I have it working on my site and am now adding 20 or 30 relevant links a week this way.

Mike
http://www.sweepstalk.com

Re: Selective Spider?
Hello,

Please mail me the limits for the spider:
- how many links
- domain limit
- stop words
- spider words