Gossamer Forum

goFetch Spider

Sorry I haven't put it out. I was going to release it right before finals, but I found out I needed to study. Then I went and looked at it and decided a more AltaVista/add.cgi type of thing would be better: the links would be in validate.db, then you go to spider.cgi, which spiders the sites and puts them (the main site that was in validate.db, and the sub-sites it just got) in links.db. That way you can review them before you spider them. It will also send you a letter saying someone added their URL and that it needs to be reviewed and spidered. Then, when it's spidered, it sends a letter to them. It may take some time, about a week, because I'm out of school in 2 days but I still have to work. Plus I'm also working on a relevancy sort mod, since I found code by one user that does the scoring but still needs to be put into the way Links sorts results.

Re: New Links Spider In reply to
Thanks for the update, Bmxer.

Regards,

Eliot Lee
Re: New Links Spider In reply to
You rock!!!!! I am sure I speak for everyone else here when I say, I can't wait to see the spider mod.

RVCorner.com - RV and Camping resource guide
http://www.rvcorner.com
Re: New Links Spider In reply to
I haven't gotten that far in my newer spider efforts, but if you want to check out the relevancy sorting thing, you can see half of it at http://lookhard.hypermart.net/...n/Look/sortrelwp.cgi
It takes the URL you give and the query you put in, and matches the query against the description, keywords, and title. Then it scores the page, and you see the points at the bottom. The scoring is hard to follow because you can't tell whether the script matched something case-insensitively or not, but just know that it works. So basically, all I have to do is make Links require this script, then sort by the highest numeric value. But I still have to add something that opens the db and loads the links into an array. So just know that if you want this, your searches will probably take longer.
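A minimal sketch of that kind of scoring, just to make the idea concrete. It is not the code behind sortrelwp.cgi: the field weights, the %WEIGHT table, and the score_record name are all made up for illustration, and matching here is always case-insensitive.

```perl
use strict;
use warnings;

# Hypothetical weights; the real script's weighting is not described here.
my %WEIGHT = (title => 3, keywords => 2, description => 1);

# Score one record: count case-insensitive hits of each query word in each
# field and multiply by that field's weight.
sub score_record {
    my ($query, $rec) = @_;
    my $score = 0;
    for my $word (grep { length } split /\s+/, $query) {
        for my $field (keys %WEIGHT) {
            my $text = defined $rec->{$field} ? $rec->{$field} : '';
            my $hits = () = $text =~ /\Q$word\E/gi;   # number of matches
            $score += $hits * $WEIGHT{$field};
        }
    }
    return $score;
}

# Made-up sample page data.
my %page = (
    title       => 'RV Camping Guide',
    keywords    => 'rv, camping, parks',
    description => 'Camping resources for RV owners.',
);
print score_record('camping rv', \%page), "\n";
```

Because every record has to be scored on every query, this is also where the extra search time would come from.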

Re: New Links Spider In reply to
Very nice, Bmxer...similar components of the relevancy search I have in my search.cgi.

Good job!

Regards,

Eliot Lee
Re: New Links Spider In reply to
I can't even get straight in my head what to do with the sorting part. I can't find a new sort routine that will coexist with the scoring. I know I have to run the link results through the scoring routine in a loop, then basically sort as usual with $score first, but I could use any guidance you've got. Think you can help?
I'm in need of some.
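One way the "sort with $score first" step could look, as a sketch only. The @hits list and its score values are invented sample data standing in for links that have already been run through a scoring routine; nothing here is taken from Links' own sort code.

```perl
use strict;
use warnings;

# Hypothetical search hits, each already carrying a score from the loop.
my @hits = (
    { title => 'Alpha',   score => 2 },
    { title => 'Bravo',   score => 9 },
    { title => 'Charlie', score => 9 },
    { title => 'Delta',   score => 0 },
);

# Highest score first; ties fall back to title so the order is predictable.
my @sorted = sort {
    $b->{score} <=> $a->{score}
        or
    $a->{title} cmp $b->{title}
} @hits;

print "$_->{score}\t$_->{title}\n" for @sorted;
```

Putting the score comparison first and the usual comparison second keeps the existing ordering as the tie-breaker, so the two can coexist in a single sort block.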

Re: New Links Spider In reply to
http://lookhard.hypermart.net/...bin/Look/goFetch.cgi
I'm releasing it tonight, around 11 Eastern time. It will be zipped and have an instruction file with it telling you where to make code changes. The zip will also contain spider.db, spidered.db, and spiderid.txt. It's not a hard mod to install, since it doesn't rely on templates. You'll have to know how to write HTML in Perl if you want to change the header part: just open an HTML editor, take the existing code, and rework it, leaving variables like $links in place so all you have to do is copy and paste it back into the script. The name is weird, oh well. It's faster, and I'm putting in a feature that blocks URLs from being spidered again once they have been spidered. It emails the admin after a main site is submitted to let him/her know, and it shows the size of the current database, as you will see. Anyway, tonight.

Re: New Links Spider In reply to
Pretty cool, Bmxer...it seems to take a while...I am unsure about the email field, since I never received an email message.

Also...the spider seems to just grab tons of links from the same domain...which may be problematic in editing the spider.db file.

Example: I entered a directory off of www.yahoo.com and most of the subsites were just directories under www.yahoo.com. Only 4 out of 58 links were unique links.

Regards,

Eliot Lee
Re: New Links Spider In reply to
The email goes to me, the admin, not you, the user. It takes a while because it does a lot for those 58 links. Also, I can't do anything about how many internal links Yahoo has on that page. You should go deeper if you want more external links. I could parse out internal links, but for that directory you'd only end up with 4 links. Anyway, sorry, no release at 11, connection problems. I'm doing it now.
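For anyone curious what parsing out internal links might involve, here is a rough sketch. It is not part of goFetch: host_of and external_links are hypothetical helper names, the host match is a simple regex rather than a full URL parser, and relative links are treated as internal and dropped.

```perl
use strict;
use warnings;

# Pull the lowercased host out of an absolute http/https URL; undef otherwise.
sub host_of {
    my ($url) = @_;
    return $url =~ m{^https?://([^/:?#]+)}i ? lc($1) : undef;
}

# Keep only links whose host differs from the page they were found on.
# Relative links have no host of their own, so they are dropped as internal.
sub external_links {
    my ($base, @links) = @_;
    my $base_host = host_of($base);
    return grep {
        my $h = host_of($_);
        defined $h && defined $base_host && $h ne $base_host;
    } @links;
}

# Made-up sample of links found on one spidered page.
my @found = (
    'http://www.yahoo.com/Recreation/',
    'http://www.yahoo.com/Recreation/Outdoors/',
    '/Recreation/Travel/',
    'http://www.rvcorner.com',
);
print "$_\n" for external_links('http://www.yahoo.com/Recreation/', @found);
```

A filter like this would cut the 58-link result down to only the off-site links, which is exactly the trade-off described above.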

Re: goFetch LinkSpider released! In reply to
OK, the goFetch1.0 Beta1 program is at
http://lookhard.hypermart.net/...s/goFetch1.0_Beta-1/
I added an email to the user. Thanks to Eliot for bringing that to my attention. It looks better that way. Pretty much the only 'bug' I see is that some sites do not send a last modified date in their page/header, so the spider automatically puts in today's date and time. That's the page's fault, though, because the last modified lookup does work. Anyway, enjoy. I'm glad I finally got it out the door.
Email me if you find bugs.
Released 6/19/00 1:18pm Eastern
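The last modified fallback mentioned above can be sketched roughly like this. It is an illustration, not the goFetch code: page_date is a hypothetical helper, the date format is arbitrary, and it assumes the header value has already been turned into an epoch time (with LWP that would typically come from the response's Last-Modified header).

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# $last_modified is an epoch time, or undef when the server sent no
# Last-Modified header. Returns the formatted date (UTC) plus a flag that
# says whether the current time had to be used as a fallback.
sub page_date {
    my ($last_modified) = @_;
    my $is_fallback = defined $last_modified ? 0 : 1;
    my $epoch = $is_fallback ? time() : $last_modified;
    return (strftime('%Y-%m-%d %H:%M:%S', gmtime($epoch)), $is_fallback);
}

my ($date, $fallback) = page_date(undef);
print $date, ($fallback ? ' (no Last-Modified header, used current time)' : ''), "\n";
```

Returning the flag alongside the date makes it possible to mark those records, so a fallback date is not mistaken for a real one when reviewing the database.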

Re: goFetch LinkSpider released! In reply to
I found one bug: the sub-sites' sizes weren't being printed, they showed up as 0. So I updated that. It seems like no one is testing, because I got no replies back here, so I just need one reply to know it works for y'all, or I may pull the zip off my site.

Re: goFetch LinkSpider released! In reply to
Wait! Don't pull it! This is one of the most exciting scripts I've ever encountered. When it comes out of beta I will be among the first to PAY you for it.

I was not able to run the script yet because I don't have LWP::UserAgent on my server. I have a virtual server, so I have no control over how Perl is configured. I e-mailed my system administrator about it, but he hasn't responded yet. Can I upload it myself? A folder called LWP with Perl modules was distributed with Links 2.0. Can I put LWP::UserAgent there and if I do will your script find it?

This is the directory structure I'm talking about:

links2:cgi-bin:admin:LWP:Parallel:Protocol



Thanks for your help and for all the thought and energy you've put into your script. I hope I can test it soon!



Re: goFetch LinkSpider released! In reply to
Just put this above the LWP use call:
use lib '/full/path/to/links2/cgi-bin/admin/';
The path stops at admin, because the LWP folder is inside admin.
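A small sketch of how that line fits together with the module load, in case it helps to see it in one piece. The path is the same placeholder as above, and the run-time check is an addition for illustration, not something goFetch does: it only reports whether Perl could find and load LWP::UserAgent (and whatever that module loads in turn) from the directories it searches.

```perl
use strict;
use warnings;

# Placeholder path from the post above; point it at the directory that
# CONTAINS the LWP folder, not at the LWP folder itself.
use lib '/full/path/to/links2/cgi-bin/admin';

# Load the module at run time so a missing file produces a readable message
# instead of stopping the script at compile time.
my $have_lwp = eval { require LWP::UserAgent; 1 } ? 1 : 0;

print $have_lwp
    ? "LWP::UserAgent loaded from $INC{'LWP/UserAgent.pm'}\n"
    : "LWP::UserAgent not found: $@";
```

If the message names some other module as missing, that module also has to be somewhere Perl searches, which the use lib line alone will not fix unless the file is under that same directory.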

Re: goFetch LinkSpider released! In reply to
It works great! I had some problems getting it to work the first time, but I just refreshed the browser and it worked perfectly from then on. I am very excited about this, bmxer. Thanks a whole lot!

Re: goFetch LinkSpider released! In reply to
Thanks for your fast response, but it didn't work for me.

Now your script finds UserAgent.pm, but it seems that UserAgent calls other modules such as debug.pm and expects them to be in the same directory. I don't know enough about Perl to try anything else on my own.

Since posting my first message I discovered that the file UserAgent.pm is, in fact, distributed with Links 2.0. Would it be possible for you to modify your script to call it from the location that Alex put it in? That way all Links users would be able to use your script. Here is the full path:

Links2:cgi-bin:admin:LWP:Parallel:UserAgent.pm



Re: goFetch LinkSpider released! In reply to
Poil, can you give the link to where it's at?

Floristan,
get rid of that use lib line, take out the two
use LWP::UserAgent lines, and replace them with this:
require "/directory/path/to/Links2/cgi-bin/admin/LWP/Parallel/UserAgent.pm";

Re: goFetch LinkSpider released! In reply to
Thanks bmxer.

I tried your new suggestion, but my server still returns an error message as follows:

BEGIN failed--compilation aborted at goFetch.cgi line 625.

I guess I will have to insist that my system administrator install UserAgent.pm. Where should I tell him to put it? I really want to run your script.

Thanks in advance.

Re: goFetch LinkSpider released! In reply to
Tell him to just put it in the Perl directory.

Re: goFetch LinkSpider released! In reply to
Hi,
This is a GREAT program! I've been waiting for it for many months!
Two questions though:
- Why does the program email the link owner as well as the admin of Links 2? How do I turn those emails off?
- How do I import spider.db into links.db?
Thanks!


Re: goFetch LinkSpider released! In reply to
http://www.poilsports.net/spider/goFetch.cgi
Uhm...yeah, haven't had any time to really play with it, but it works...

Re: goFetch LinkSpider released! In reply to
Thanks, poil. I'd be glad if everyone who uses it could comment and give their URL, and I won't run up your database or anything.
Luxo,
wow, I thought about putting in an option to turn that off, but I forgot to put it in.
I will post code tonight that takes only two lines of editing in goFetch to do this. I had the email sent to the user in case they didn't know why their site was spidered when they looked at their logs.


Re: goFetch LinkSpider released! In reply to
OK, to turn emailing to the user or the admin (you) on or off, do this:

Under
Code:
use Socket;
put
Code:
$sendadminmail = 1;   # 1 = email the admin, 0 = off
$sendusermail  = 1;   # 1 = email the user, 0 = off
0 means emailing is off, 1 means on.
Then lower down, above the sub webEncode, you will see
Code:
&send_email();
&send_useremail();
Replace them with this:
Code:
if ($sendadminmail == 1) {
    &send_email();
}
if ($sendusermail == 1) {
    &send_useremail();
}
As of now, you import manually. It says right there in the instructions that you will have to be able to edit goFetch so you can put your Links fields into the code. Right now the only fields are ID, Title, Description, Keywords, Size, and Date.
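To give a feel for what the manual import involves, here is a rough sketch of remapping one record. Everything specific in it is an assumption: that spider.db is pipe-delimited with one record per line, that its fields come in the order listed above, and that the target field list shown matches your links.def. The convert_line name and the sample record are made up.

```perl
use strict;
use warnings;

# ASSUMPTIONS: both files are pipe-delimited, one record per line, and the
# field orders below are illustrative only. Match them to your own
# spider.db layout and links.def before using anything like this.
my @spider_fields = qw(ID Title Description Keywords Size Date);
my @links_fields  = qw(ID Title URL Date Category Description);

# Turn one spider.db line into one links.db line. Fields the spider does
# not supply are left empty so they can be filled in during review.
sub convert_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    @rec{@spider_fields} = split /\|/, $line, -1;
    return join '|', map { defined $rec{$_} ? $rec{$_} : '' } @links_fields;
}

# Made-up sample record.
print convert_line("12|RV Corner|Camping guide|rv,camping|5120|19-Jun-2000\n"), "\n";
```

The IDs would also need checking against the ones already in links.db so that imported records do not collide with existing links.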
Re: goFetch LinkSpider released! In reply to
Thanks bmxer!
Great job!

Re: goFetch LinkSpider released! In reply to
I am new to Perl and all. How do you do the above, and how do you import the spidered database into the Links database?

Re: goFetch LinkSpider released! In reply to
This sucks....once again, a mod I can't use because I don't have LWP!! Floristan, did you ever get it working with the files that come with LINKS? Bmxer, do you know the best way to get this working without having to have my hosting company install LWP? Thanks.
