goFetch Spider

Re: goFetch LinkSpider released!
I just said you import manually, and I don't know what you mean by "the above," because the post above just says "Thanks Bmxer."

Domenic, I really don't know what to say; it uses LWP. I may eventually make it all sockets at some point, but not now, because I'm working on another mod, a translation mod. Maybe I'll do something for goFetch in the next release, which will be in a month or two.
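
For anyone wondering what the LWP part actually amounts to, this is roughly the kind of fetch the spider does. It's just a stripped-down sketch, not the real goFetch code, and the URL is only a placeholder:

Code:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical example URL -- goFetch uses whatever URL the admin types in.
my $url = 'http://www.example.com/';

my $ua = LWP::UserAgent->new;
$ua->timeout(15);                    # don't hang forever on a dead server
$ua->agent('goFetch-sketch/0.1');    # identify the robot to the remote site

my $res = $ua->request(HTTP::Request->new(GET => $url));
if ($res->is_success) {
    my $html = $res->content;        # raw HTML to scan for links, title, etc.
    print "Fetched ", length($html), " bytes from $url\n";
} else {
    print "Fetch failed: ", $res->status_line, "\n";
}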

Re: goFetch LinkSpider released!
Oh, hey, that's OK Bmxer... don't do it on my account! :) It looks like a cool program, though. BTW, great job on it; I'm sure it will get tons of use.

Re: goFetch LinkSpider released!
I was going to do it eventually anyway; there's no need to require both Sockets and LWP. So I might end up doing it, or I might not.

Re: goFetch LinkSpider released!
OK, as it stands, I will probably get a freeware database manager and make it handle the goFetch admin tasks, like mass-importing links, deleting, and searching.

Re: goFetch LinkSpider released!
DBMAN is an excellent database manager program (it does everything you are attempting to do except importing, but that can be added on).

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Yeah, I went through many managers and saw DBMAN, but I was put off because it looks like it is made for multiple users, rather than just reading a db for one user and doing all that stuff. I mean, maybe being able to log in and have multiple admins who do the same thing would be fine, but I don't need users. Correct me if I'm wrong, please.
-------------------------------------------
Never mind, it looks good. I'll probably take the modify feature out and make the add feature coexist with spidering dmoz categories, so you just specify a category. Delete is delete, and I like the 'view all' feature, because I'm sure it will display records in sets and show the rest on other pages, which is what I wanted to accomplish with readdb.pl but won't have to with this. Thanks AnthroRules for getting me to finally look at DBMAN.
Re: goFetch LinkSpider released!
Regarding the multiple users... that is incorrect. It can be set up for admin-only access OR multiple-user access.

The only thing you will have to add is the import feature (but as you can probably see in the DBMAN Forums and in the Resource Center, there are many modifications to the script), which should not be too hard, since there are already export mods (for Excel and PostScript files). And there are a few mods for IMPORTING password files from other programs into DBMAN.

;)

In Reply To:
Thanks AnthroRules for getting me to finally look at DBMAN.
You're welcome!

I use DBMAN in various sections of my LINKS site, as you probably have noticed. In addition, I use DBMAN for other projects in my site (like the Career Connection, Alumni Database, etc.).

Regards,


Eliot Lee
Re: goFetch LinkSpider released!
I have several static HTML pages in one section of my site, which I created before I discovered Links.

The question is: using this spider, is it possible to spider these pages and bring those links into the db to build a Links 2.0 directory?

Thanks

Re: goFetch LinkSpider released!
In theory, it should... all you have to do is type in the URL where these "static" pages are located. Of course, you will have to delete all the extraneous links that you do not want in your new links.db file (like local links to other pages on your site).
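
If it helps, something along these lines could weed out the local links afterwards. It's only a quick sketch; the host name and the list of URLs are placeholders you would swap for your own:

Code:
#!/usr/bin/perl
use strict;

# Quick sketch: drop spidered links that point back at your own site.
# $my_host and @spidered are placeholders, not goFetch variables.
my $my_host  = 'www.mysite.com';
my @spidered = (
    'http://www.mysite.com/other-page.html',      # local, should be dropped
    'http://www.somewhere-else.com/cool.html',    # external, should be kept
);

my @keep;
foreach my $link (@spidered) {
    my ($host) = $link =~ m{^https?://([^/]+)}i;
    next if defined $host and lc($host) eq lc($my_host);
    push @keep, $link;
}
foreach my $link (@keep) {
    print "$link\n";
}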

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Oh, great - thanks Eliot and Bmxer. I will try and see.

Re: goFetch LinkSpider released!
You know what, I don't think this spider blocks .cgi or .pl URLs. I'll have to do that and release an update soon.
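
If anyone wants to try it before the next release, the test would be something like this. The extension list and the @found_links name are just examples for the sketch, not the actual goFetch code:

Code:
#!/usr/bin/perl
use strict;

# Sketch of a skip test for dynamic URLs.  @found_links stands in for
# whatever list of URLs the spider has collected; the extension list is
# only an example -- trim it to taste.
my @found_links = (
    'http://www.example.com/index.html',
    'http://www.example.com/cgi-bin/search.cgi?q=links',
    'http://www.example.com/scripts/page.pl',
);

my @skip_ext = qw(cgi pl php asp jsp jhtml);
my $skip_re  = join '|', @skip_ext;

foreach my $url (@found_links) {
    if ($url =~ /\.(?:$skip_re)\b/i) {
        print "skipping  $url\n";    # looks like a script
        next;
    }
    print "spidering $url\n";        # plain page, carry on
}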

Re: goFetch LinkSpider released!
I don't know if it is necessarily a good idea to block dynamic URLs (php, asp, cgi, pl, jsp, jhtml). As web sites move more and more towards dynamic structures, it would be a shame to block those types of sites from the "spidering" process.

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Hi BMXer...

I've sent you a couple of emails already about what you plan to do with this MOD... but can I suggest that you think about the following ideas for enhancing it slightly...

- The script matches the spidered text against the categories/keywords in the Links category.db and selects the most relevant one for that page.
- The script looks for an email address (a href="mailto:...") on the page that is likely to be the most relevant (matching against a set of norms, e.g. 'admin', 'webmaster', etc.), selects the top two candidates (if available), and when verifying, the admin has to select one of them or input one instead.

Those are just a couple of ideas I'd really like to see added to the script if possible. If you don't plan to do it yourself, I'd appreciate a little insight into where any mods would need to hook in to get the above working; a rough sketch of the first idea follows below.
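
To make the first idea a bit more concrete, here is a very rough sketch of the kind of matching I have in mind. I'm assuming the category name sits in the first pipe-delimited field of category.db (check your own field layout first); the path and sample text are placeholders:

Code:
#!/usr/bin/perl
use strict;

# Rough sketch of idea #1: score the page text against the category names
# in category.db and guess the best match.
my $page_text   = lc 'Sample spidered page about BMX bikes and cycling gear';
my $category_db = '/path/to/links/data/category.db';

my ($best_cat, $best_score) = ('', 0);
open CAT, "<$category_db" or die "Can't open $category_db: $!";
while (<CAT>) {
    chomp;
    my ($name) = split /\|/;
    next unless $name;
    # score = how many words of the category name appear in the page text
    my $score = 0;
    foreach my $word (split m{[/\s_]+}, lc $name) {
        $score++ if length($word) > 2 and index($page_text, $word) >= 0;
    }
    ($best_cat, $best_score) = ($name, $score) if $score > $best_score;
}
close CAT;

print $best_cat ? "Best guess: $best_cat (score $best_score)\n"
                : "No category matched.\n";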

Thanks and great work on creating this little gem
:)

Re: goFetch LinkSpider released!
#1 will really slow the script down. The thing I'm thinking about is integrating it with dmoz categories, which will basically be like the RDF importing, except it's spidering and won't get links from dmoz itself, only external links, and it'll maybe have a warning saying there are only so-and-so many links in the directory, try a deeper category.

#2 is redundant, I think, when you can just have a form with an email field in it. But I see that many people just put none@none.com anyway, so maybe I'll have two email addresses show up in the admin: first the one they submitted, and if that looks relevant I'll take it; then I'll parse out the email addresses on the spidered page and take the one whose domain matches the site, and that one will show up under it. Then the admin can click a checkbox to choose which one he wants.
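
Something like this is what I mean for the email part. It's only a sketch, and the variable names are placeholders rather than anything from goFetch:

Code:
#!/usr/bin/perl
use strict;

# Parse mailto: addresses out of the spidered page and prefer one whose
# domain matches the site itself.  $html, $site_url and $submitted_email
# are just placeholders here.
my $site_url        = 'http://www.example.com/';
my $submitted_email = 'none@none.com';
my $html            = '<a href="mailto:webmaster@example.com">Contact</a>';

my ($site_host) = $site_url =~ m{^https?://(?:www\.)?([^/]+)}i;

my @found = $html =~ /mailto:\s*([\w.+-]+\@[\w.-]+\.\w+)/ig;

my $best = '';
foreach my $email (@found) {
    if ($site_host and $email =~ /\Q$site_host\E$/i) {
        $best = $email;              # same domain as the site -- best guess
        last;
    }
    $best ||= $email;                # otherwise remember the first one found
}

# Show both in the admin; a checkbox decides which one gets used.
print "Submitted:     $submitted_email\n";
print "Found on page: $best\n" if $best;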

Re: goFetch LinkSpider released!
Thanks for the reply..

I'd still be interested in anything that pulls a lot of info out of a page and highlights it, if nothing else. The email address issue is one that tends to drive me up the wall, as my site is not really likely to garner tons of attention and people clamouring to register until I've found and added something like 2,000 links %-/

Please keep me on the list for any updates, idea bouncing, etc. It's something I'd be more than interested in keeping up to date on and helping with (where I can).

Cheers again..
:)
Martin

Re: goFetch LinkSpider released!
Is there anyone else besides Catflap who would want to know and be kept updated, maybe every other day, about what I'm doing with this?

Re: goFetch LinkSpider released!
Yeah, I would enjoy updates. I was also wondering if there was a way to make the spider transparent to users. Have you ever been to Google or HotBot? They have you submit your information and forward you to another page, where you never see any of the information being processed. I've looked at your code and see where all the printing comes in, but am afraid to mess with it much. Anyway, just a thought.
Thanks,
Paul

http://www.fullmoonshining.com for Pearl Jam Fans
Re: goFetch LinkSpider released!
I can't add you to the list without an email address, so you can private message me if you want and give it to me.
That would maybe be another release, because it wouldn't be hard; it's just a matter of changing the HTML, which would indeed speed up the spider. My tests show that while the page is still loading and displaying link 42, the spider is already on link 60. Anyway, that'll be a private release for select users.
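
The change itself would be roughly this. The $quiet flag and the sub name are made up for the sketch, not the real goFetch names:

Code:
#!/usr/bin/perl
use strict;

# Sketch of the idea: skip the per-link progress HTML while spidering and
# just print a summary at the end.
my $quiet = 1;
my @links_to_spider = ('http://www.example.com/', 'http://www.example.org/');

my $count = 0;
foreach my $link (@links_to_spider) {
    spider_one_link($link);                        # whatever goFetch does per link
    $count++;
    print "Fetched: $link<br>\n" unless $quiet;    # this per-link print is what lags
}
print "Done. Spidered $count links.<br>\n";

sub spider_one_link { my ($url) = @_; return; }    # stub so the sketch runs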

Re: goFetch LinkSpider released!
Oh, I definitely want to be put on that list as well, please...

Re: goFetch LinkSpider released!
I have installed this great mod (thanks, Bmxer) and I have a couple of questions.

I'm using Hypermart as my web host, and the mod seems to be working OK, but when I check the Hypermart CGI error logs, I see the error messages below repeated dozens of times.

Code:
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 642.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 674.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 675.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 676.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 677.
[Mon Jul 17 05:58:57 2000] readdb.pl: Name "main::ad" used only once: possible typo at readdb.pl line 18.
[Mon Jul 17 05:58:57 2000] readdb.pl: Can't exec "/data1/hypermart.net/lookhard/look-bin/look-ads/click.cgi": Permission denied at readdb.pl line 89.
Here are the lines mentioned above:

Code:
642 $mydescrip = substr($mydescrip,0,225);
674 $mykeywords =~ s/
//isg;
675 $mykeywords =~ s/<p>//isg;
676 $mykeywords =~ s,\n,,g;
677 $mykeywords = substr($mykeywords,0,145);
and lines from readdb.pl

Code:
18 $ad = '';
89 $banner = `/data1/hypermart.net/lookhard/look-bin/look-ads/click.cgi`;
It would be great if someone could tell me what these are all about.


Also, I have tested this mod on a few sites, I mean sites that even have the Links program installed. I was expecting it to fetch all the links on those pages, but instead it only brought back the category pages. Am I doing something wrong, or is this mod supposed to do this?




Re: goFetch LinkSpider released!
I have just figured out why I wasn't getting the links that are registered on other Links sites. If a link looks like

Code:
http://www.anysite.com/cgi-bin/jump.cgi?ID=639
meaning it is not a direct link but goes through some sort of jump CGI or ASP script, then the spider doesn't go and fetch the info from that page, which I would really love it to do. Could someone help me out with this? Bmxer, if you're reading this, would it be too much trouble to add this feature to this great mod of yours?
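
I'm guessing the spider would have to actually request each jump link and record where it redirects to, maybe something like this. It's untested on my part and assumes the jump script sends an ordinary HTTP redirect:

Code:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTTP::Request;

# Request the jump link and see where it ends up.  LWP::UserAgent follows
# redirects on GET by default, so the final location is in $res->request->uri.
my $jump = 'http://www.anysite.com/cgi-bin/jump.cgi?ID=639';

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => $jump));

if ($res->is_success) {
    my $real_url = $res->request->uri;    # the page the jump script pointed at
    print "jump.cgi really points at: $real_url\n";
} else {
    print "Could not resolve $jump: ", $res->status_line, "\n";
}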

Thanks.

Re: LWP
Domenic,

Sorry for taking so long to answer your question. Yes, my hosting company did install LWP at my request. Now goFetch is running beautifully. Bmxer's script calls several other modules besides LWP::UserAgent, so it is best to have the entire LWP library installed. The library is called libwww-perl-5.48 and is available on CPAN.

My hosting company, by the way, is Intersessions. You can find them on the web at http://www.intersessions.com. They are usually very responsive to all of my requests. This LWP request of mine was the first that took longer than usual.

Good luck in your quest for LWP!

Floristan

Re: goFetch LinkSpider released!
I don't see why you would think it would fetch the jump.cgi link, since the part that gets sub-sites is done through LWP, and that's a lot harder to do in LWP than with sockets.

Also, the reason all those messages are showing up is the -w on the shebang line; just delete that and they'll go away. The shebang line is the top line, i.e. #!/usr/bin/perl -w
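
If you'd rather keep the -w, another way to quiet just the "uninitialized value" lines would be to give those variables a value before the lines the log complains about, something like this (a sketch only; adapt it to where they're actually set in goFetch.cgi):

Code:
# Alternative to dropping -w: make sure the values are defined before the
# substr/substitution lines the log complains about.
$mydescrip  = '' unless defined $mydescrip;
$mykeywords = '' unless defined $mykeywords;

$mydescrip  = substr($mydescrip, 0, 225);
$mykeywords =~ s/<p>//isg;
$mykeywords = substr($mykeywords, 0, 145);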

Re: goFetch LinkSpider released!
Hi,
First of all, great mod for Links 2.0.
A nice add-on would be a connector from spider.db to validate.db.
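
Even something as crude as appending the spidered records onto validate.db would do for a start. This sketch assumes both files use the same pipe-delimited record layout and that the IDs won't clash, which is worth checking before using anything like it:

Code:
#!/usr/bin/perl
use strict;

# Naive connector sketch: append every record from spider.db onto
# validate.db so the spidered links go through the normal validation screen.
my $spider_db   = '/path/to/links/data/spider.db';
my $validate_db = '/path/to/links/data/validate.db';

open SPIDER, "<$spider_db"    or die "Can't read $spider_db: $!";
open VAL,    ">>$validate_db" or die "Can't append to $validate_db: $!";
while (<SPIDER>) {
    next if /^\s*$/ or /^#/;    # skip blank and comment lines
    print VAL $_;
}
close SPIDER;
close VAL;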

Re: goFetch LinkSpider released!
Have I not said that that was in store for this? I can't do a billion things at once, so that's not coming out now, unless one of you geniuses makes it.
