goFetch Spider

Re: goFetch LinkSpider released!
I just said you import manually, and I don't know what you mean by "the above," because the post above just says "Thanks Bmxer."

Domenic, I really don't know what to say; it uses LWP. I may eventually make it all sockets at some point, but not now, because I'm working on another mod, a translation mod. Maybe I'll do something for goFetch in the next release, which will be in a month or two.
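
For anyone wondering what the LWP part actually amounts to, this is roughly the kind of fetch the spider does. It's just a stripped-down sketch, not the real goFetch code, and the URL is only a placeholder:

Code:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTTP::Request;

# Hypothetical example URL -- goFetch uses whatever URL the admin types in.
my $url = 'http://www.example.com/';

my $ua = LWP::UserAgent->new;
$ua->timeout(15);                    # don't hang forever on a dead server
$ua->agent('goFetch-sketch/0.1');    # identify the robot to the remote site

my $res = $ua->request(HTTP::Request->new(GET => $url));
if ($res->is_success) {
    my $html = $res->content;        # raw HTML to scan for links, title, etc.
    print "Fetched ", length($html), " bytes from $url\n";
} else {
    print "Fetch failed: ", $res->status_line, "\n";
}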

Re: goFetch LinkSpider released!
Oh, hey, that's OK Bmxer... don't do it on my account! :) It looks like a cool program, though. BTW, great job on it; I'm sure it will get tons of use.

Re: goFetch LinkSpider released!
I was going to do it eventually anyway; there's no need to require both Sockets and LWP. So I might end up doing it, or I might not.

Re: goFetch LinkSpider released!
OK, as it stands, I will probably get a freeware database manager and make it handle the goFetch admin tasks, like mass-importing links, deleting, and searching.

Re: goFetch LinkSpider released!
DBMAN is an excellent database manager program (it does everything you are attempting to do except importing, but that can be added on).

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Yeah, I went through many managers and saw DBMAN, but I was put off because it looks like it is made for multiple users, rather than just reading a db for one user and doing all that stuff. I mean, maybe being able to log in and have multiple admins who do the same thing would be fine, but I don't need users. Correct me if I'm wrong, please.
-------------------------------------------
Never mind, it looks good. I'll probably take the modify feature out and make the add feature coexist with spidering dmoz categories, so you just specify a category. Delete is delete, and I like the 'view all' feature, because I'm sure it will display records in sets and show the rest on other pages, which is what I wanted to accomplish with readdb.pl but won't have to with this. Thanks AnthroRules for getting me to finally look at DBMAN.
Re: goFetch LinkSpider released!
Regarding the multiple users... that is incorrect. It can be set up for admin-only access OR multiple-user access.

The only thing you will have to add is the import feature (but as you can probably see in the DBMAN Forums and in the Resource Center, there are many modifications to the script), which should not be too hard, since there are already export mods (for Excel and PostScript files). And there are a few mods for IMPORTING password files from other programs into DBMAN.

;)

In Reply To:
Thanks AnthroRules for getting me to finally look at DBMAN.
You're welcome!

I use DBMAN in various sections of my LINKS site, as you probably have noticed. In addition, I use DBMAN for other projects in my site (like the Career Connection, Alumni Database, etc.).

Regards,


Eliot Lee
Re: goFetch LinkSpider released!
I have several static HTML pages in one section of my site, which I created before I discovered Links.

The question is: using this spider, is it possible to spider these pages and bring those links into the db to build a Links 2.0 directory?

Thanks

Re: goFetch LinkSpider released!
In theory, it should... all you have to do is type in the URL where these "static" pages are located. Of course, you will have to delete all the extraneous links that you do not want in your new links.db file (like local links to other pages on your site).
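
If it helps, something along these lines could weed out the local links afterwards. It's only a quick sketch; the host name and the list of URLs are placeholders you would swap for your own:

Code:
#!/usr/bin/perl
use strict;

# Quick sketch: drop spidered links that point back at your own site.
# $my_host and @spidered are placeholders, not goFetch variables.
my $my_host  = 'www.mysite.com';
my @spidered = (
    'http://www.mysite.com/other-page.html',      # local, should be dropped
    'http://www.somewhere-else.com/cool.html',    # external, should be kept
);

my @keep;
foreach my $link (@spidered) {
    my ($host) = $link =~ m{^https?://([^/]+)}i;
    next if defined $host and lc($host) eq lc($my_host);
    push @keep, $link;
}
foreach my $link (@keep) {
    print "$link\n";
}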

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Oh, great - thanks Eliot and Bmxer. I will try and see.

Re: goFetch LinkSpider released!
You know what, I don't think this spider blocks .cgi or .pl URLs. I'll have to do that and release an update soon.
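
If anyone wants to try it before the next release, the test would be something like this. The extension list and the @found_links name are just examples for the sketch, not the actual goFetch code:

Code:
#!/usr/bin/perl
use strict;

# Sketch of a skip test for dynamic URLs.  @found_links stands in for
# whatever list of URLs the spider has collected; the extension list is
# only an example -- trim it to taste.
my @found_links = (
    'http://www.example.com/index.html',
    'http://www.example.com/cgi-bin/search.cgi?q=links',
    'http://www.example.com/scripts/page.pl',
);

my @skip_ext = qw(cgi pl php asp jsp jhtml);
my $skip_re  = join '|', @skip_ext;

foreach my $url (@found_links) {
    if ($url =~ /\.(?:$skip_re)\b/i) {
        print "skipping  $url\n";    # looks like a script
        next;
    }
    print "spidering $url\n";        # plain page, carry on
}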

Re: goFetch LinkSpider released!
I don't know if it is necessarily a good idea to block dynamic URLs (php, asp, cgi, pl, jsp, jhtml). As web sites move more and more towards dynamic structures, it would be a shame to block those types of sites from the "spidering" process.

Regards,

Eliot Lee
Re: goFetch LinkSpider released!
Hi BMXer...

I've sent you a couple of emails already about what you plan to do with this MOD... but can I suggest that you think about the following ideas for enhancing it slightly...

- The script matches the spidered text against the categories/keywords in the Links category.db and selects the most relevant one for that page.
- The script looks for an email address (a href="mailto:...") on the page that is likely to be the most relevant (matching against a set of norms, e.g. 'admin', 'webmaster', etc.), selects the top two candidates (if available), and when verifying, the admin has to select one of them or input one instead.

Those are just a couple of ideas I'd really like to see added to the script if possible. If you don't plan to do it yourself, I'd appreciate a little insight into where any mods would need to hook in to get the above working; a rough sketch of the first idea follows below.
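
To make the first idea a bit more concrete, here is a very rough sketch of the kind of matching I have in mind. I'm assuming the category name sits in the first pipe-delimited field of category.db (check your own field layout first); the path and sample text are placeholders:

Code:
#!/usr/bin/perl
use strict;

# Rough sketch of idea #1: score the page text against the category names
# in category.db and guess the best match.
my $page_text   = lc 'Sample spidered page about BMX bikes and cycling gear';
my $category_db = '/path/to/links/data/category.db';

my ($best_cat, $best_score) = ('', 0);
open CAT, "<$category_db" or die "Can't open $category_db: $!";
while (<CAT>) {
    chomp;
    my ($name) = split /\|/;
    next unless $name;
    # score = how many words of the category name appear in the page text
    my $score = 0;
    foreach my $word (split m{[/\s_]+}, lc $name) {
        $score++ if length($word) > 2 and index($page_text, $word) >= 0;
    }
    ($best_cat, $best_score) = ($name, $score) if $score > $best_score;
}
close CAT;

print $best_cat ? "Best guess: $best_cat (score $best_score)\n"
                : "No category matched.\n";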

Thanks and great work on creating this little gem
:)

Re: goFetch LinkSpider released!
#1 will really slow the script down. The thing I'm thinking about is integrating it with dmoz categories, which will basically be like the RDF importing, except it's spidering and won't get links from dmoz itself, only external links, and it'll maybe have a warning saying there are only so-and-so many links in the directory, try a deeper category.

#2 is redundant, I think, when you can just have a form with an email field in it. But I see that many people just put none@none.com anyway, so maybe I'll have two email addresses show up in the admin: first the one they submitted, and if that looks relevant I'll take it; then I'll parse out the email addresses on the spidered page and take the one whose domain matches the site, and that one will show up under it. Then the admin can click a checkbox to choose which one he wants.
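
Something like this is what I mean for the email part. It's only a sketch, and the variable names are placeholders rather than anything from goFetch:

Code:
#!/usr/bin/perl
use strict;

# Parse mailto: addresses out of the spidered page and prefer one whose
# domain matches the site itself.  $html, $site_url and $submitted_email
# are just placeholders here.
my $site_url        = 'http://www.example.com/';
my $submitted_email = 'none@none.com';
my $html            = '<a href="mailto:webmaster@example.com">Contact</a>';

my ($site_host) = $site_url =~ m{^https?://(?:www\.)?([^/]+)}i;

my @found = $html =~ /mailto:\s*([\w.+-]+\@[\w.-]+\.\w+)/ig;

my $best = '';
foreach my $email (@found) {
    if ($site_host and $email =~ /\Q$site_host\E$/i) {
        $best = $email;              # same domain as the site -- best guess
        last;
    }
    $best ||= $email;                # otherwise remember the first one found
}

# Show both in the admin; a checkbox decides which one gets used.
print "Submitted:     $submitted_email\n";
print "Found on page: $best\n" if $best;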

Re: goFetch LinkSpider released!
Thanks for the reply..

I'd still be interested in anything that pulls a lot of info out of a page and highlights it, if nothing else. The email address issue is one that tends to drive me up the wall, as my site is not really likely to garner tons of attention and people clamouring to register until I've found and added something like 2,000 links %-/

Please keep me on the list for any updates, idea bouncing, etc. It's something I'd be more than interested in keeping up to date on and helping with (where I can).

Cheers again..
:)
Martin

Re: goFetch LinkSpider released!
Is there anyone else besides Catflap who would want to know and be kept updated, maybe every other day, about what I'm doing with this?

Re: goFetch LinkSpider released!
Yeah, I would enjoy updates. I was also wondering if there was a way to make the spider transparent to users. Have you ever been to Google or HotBot? They have you submit your information and forward you to another page, where you never see any of the information being processed. I've looked at your code and see where all the printing comes in, but am afraid to mess with it much. Anyway, just a thought.
Thanks,
Paul

http://www.fullmoonshining.com for Pearl Jam Fans
Re: goFetch LinkSpider released!
I can't add you to the list without an email address, so you can private message me if you want and give it to me.
That would maybe be another release, because it wouldn't be hard; it's just a matter of changing the HTML, which would indeed speed up the spider. My tests show that while the page is still loading and displaying link 42, the spider is already on link 60. Anyway, that'll be a private release for select users.
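
The change itself would be roughly this. The $quiet flag and the sub name are made up for the sketch, not the real goFetch names:

Code:
#!/usr/bin/perl
use strict;

# Sketch of the idea: skip the per-link progress HTML while spidering and
# just print a summary at the end.
my $quiet = 1;
my @links_to_spider = ('http://www.example.com/', 'http://www.example.org/');

my $count = 0;
foreach my $link (@links_to_spider) {
    spider_one_link($link);                        # whatever goFetch does per link
    $count++;
    print "Fetched: $link<br>\n" unless $quiet;    # this per-link print is what lags
}
print "Done. Spidered $count links.<br>\n";

sub spider_one_link { my ($url) = @_; return; }    # stub so the sketch runs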

Re: goFetch LinkSpider released!
Oh, I definitely want to be put on that list as well, please...

Re: goFetch LinkSpider released!
I have installed this great mod (thanks, Bmxer) and I have a couple of questions.

I'm using Hypermart as my web host, and the mod seems to be working OK, but when I check the Hypermart CGI error logs, I see the error messages below repeated dozens of times.

Code:
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 642.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 674.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 675.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 676.
[Mon Jul 17 05:58:42 2000] goFetch.cgi: Use of uninitialized value at goFetch.cgi line 677.
[Mon Jul 17 05:58:57 2000] readdb.pl: Name "main::ad" used only once: possible typo at readdb.pl line 18.
[Mon Jul 17 05:58:57 2000] readdb.pl: Can't exec "/data1/hypermart.net/lookhard/look-bin/look-ads/click.cgi": Permission denied at readdb.pl line 89.
Here are the lines mentioned above:

Code:
642 $mydescrip = substr($mydescrip,0,225);
674 $mykeywords =~ s/
//isg;
675 $mykeywords =~ s/<p>//isg;
676 $mykeywords =~ s,\n,,g;
677 $mykeywords = substr($mykeywords,0,145);
and lines from readdb.pl

Code:
18 $ad = '';
89 $banner = `/data1/hypermart.net/lookhard/look-bin/look-ads/click.cgi`;
It would be great if someone could tell me what these are all about.


Also, I have tested this mod on a few sites, I mean sites that even have the Links program installed. I was expecting it to fetch all the links on those pages, but instead it only brought back the category pages. Am I doing something wrong, or is this mod supposed to do this?




Re: goFetch LinkSpider released!
I have just figured out why I wasn't getting the links that are registered on other Links sites. If a link looks like

Code:
http://www.anysite.com/cgi-bin/jump.cgi?ID=639
meaning it is not a direct link but goes through some sort of jump CGI or ASP script, then the spider doesn't go and fetch the info from that page, which I would really love it to do. Could someone help me out with this? Bmxer, if you're reading this, would it be too much trouble to add this feature to this great mod of yours?
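
I'm guessing the spider would have to actually request each jump link and record where it redirects to, maybe something like this. It's untested on my part and assumes the jump script sends an ordinary HTTP redirect:

Code:
#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTTP::Request;

# Request the jump link and see where it ends up.  LWP::UserAgent follows
# redirects on GET by default, so the final location is in $res->request->uri.
my $jump = 'http://www.anysite.com/cgi-bin/jump.cgi?ID=639';

my $ua  = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => $jump));

if ($res->is_success) {
    my $real_url = $res->request->uri;    # the page the jump script pointed at
    print "jump.cgi really points at: $real_url\n";
} else {
    print "Could not resolve $jump: ", $res->status_line, "\n";
}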

Thanks.

Re: LWP
Domenic,

Sorry for taking so long to answer your question. Yes, my hosting company did install LWP at my request. Now goFetch is running beautifully. Bmxer's script calls several other modules besides LWP::UserAgent, so it is best to have the entire LWP library installed. The library is called libwww-perl-5.48 and is available on CPAN.

My hosting company, by the way, is Intersessions. You can find them on the web at http://www.intersessions.com. They are usually very responsive to all of my requests. This LWP request of mine was the first that took longer than usual.

Good luck in your quest for LWP!

Floristan

Re: goFetch LinkSpider released!
I don't see why you would think it would fetch the jump.cgi link, since the part that gets sub-sites is done through LWP, and that's a lot harder to do in LWP than with sockets.

Also, the reason all those messages are showing up is the -w on the shebang line; just delete that and they'll go away. The shebang line is the top line, i.e. #!/usr/bin/perl -w
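
If you'd rather keep the -w, another way to quiet just the "uninitialized value" lines would be to give those variables a value before the lines the log complains about, something like this (a sketch only; adapt it to where they're actually set in goFetch.cgi):

Code:
# Alternative to dropping -w: make sure the values are defined before the
# substr/substitution lines the log complains about.
$mydescrip  = '' unless defined $mydescrip;
$mykeywords = '' unless defined $mykeywords;

$mydescrip  = substr($mydescrip, 0, 225);
$mykeywords =~ s/<p>//isg;
$mykeywords = substr($mykeywords, 0, 145);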

Re: goFetch LinkSpider released!
Hi,
First of all, great mod for Links 2.0.
A nice add-on would be a connector from spider.db to validate.db.
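
Even something as crude as appending the spidered records onto validate.db would do for a start. This sketch assumes both files use the same pipe-delimited record layout and that the IDs won't clash, which is worth checking before using anything like it:

Code:
#!/usr/bin/perl
use strict;

# Naive connector sketch: append every record from spider.db onto
# validate.db so the spidered links go through the normal validation screen.
my $spider_db   = '/path/to/links/data/spider.db';
my $validate_db = '/path/to/links/data/validate.db';

open SPIDER, "<$spider_db"    or die "Can't read $spider_db: $!";
open VAL,    ">>$validate_db" or die "Can't append to $validate_db: $!";
while (<SPIDER>) {
    next if /^\s*$/ or /^#/;    # skip blank and comment lines
    print VAL $_;
}
close SPIDER;
close VAL;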

Re: goFetch LinkSpider released!
Have I not said that that was in store for this? I can't do a billion things at once, so that's not coming out now, unless one of you geniuses makes it.
