Gossamer Forum

over zealous spidering

I have a dedicated server set up with much of the DMOZ data imported. It's running in dynamic mode, but with a mod_rewrite rule to make the links look static, and the spiders are eating it up nicely. There's one problem: the spiders are accessing all the other links too. All the CGI being run was putting a load on my old server and bringing it down. Yes, spidering alone was overloading it; who would think that was a thing to complain about :)
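For context, the rewrite is along these lines (simplified; the page.cgi name and paths here are illustrative, not my exact config):

```apache
# Map static-looking directory URLs onto the dynamic Links CGI
RewriteEngine On
RewriteRule ^/Directory/(.*)$ /cgi-bin/links/page.cgi?g=$1 [PT,L]
```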

Upgrading to a P4 2.8 (HT, 1 MB cache) with dual 160 GB SATA RAID drives seems to have given it enough power to cope. Nevertheless, I would like to minimize the demand on resources.

I have put rules in robots.txt to exclude the cgi-bin and all the CGIs in question, such as user.cgi and review.cgi, but they continue to be spidered!
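The relevant robots.txt entries are basically this (paths illustrative):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /cgi-bin/links/user.cgi
Disallow: /cgi-bin/links/review.cgi
```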

What I would like is for the directory pages and the links to be spidered, but nothing else. Does anyone know a way?

I read that Google recently added support for a nofollow attribute on links, but other than that I've found nothing to give me a clue what to try next.
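From what I've read, the idea is an attribute on each individual link, something like:

```html
<!-- Hint to the spider not to follow this CGI link -->
<a href="/cgi-bin/links/user.cgi" rel="nofollow">Log in</a>
```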

Anyone able to suggest something?
Re: [roman365] over zealous spidering
For help with robot exclusion, try this:

http://www.robotstxt.org/wc/exclusion.html

Make sure your robots.txt file is in the proper place (the top-level of your server's document space) and is in the proper format:

http://www.robotstxt.org/wc/exclusion-admin.html

Besides Google's nofollow tag, there is a robots meta tag:

http://www.robotstxt.org/wc/exclusion.html#meta
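For example, a generated page can opt out of indexing and link-following entirely with:

```html
<!-- Goes in the <head> of the page -->
<meta name="robots" content="noindex,nofollow">
```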

Since your objective is to minimize resource use by CGIs, why not consider installing mod_perl or FastCGI?

http://perl.apache.org/

http://www.fastcgi.com/
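As a rough sketch of the mod_perl route (this assumes mod_perl 1.x on Apache 1.3; directory paths are illustrative), existing CGI scripts can be run in persistent Perl interpreters via Apache::Registry:

```apache
# Run unmodified CGI scripts inside persistent Perl interpreters
Alias /perl-bin/ /usr/local/apache/cgi-bin/
<Location /perl-bin>
    SetHandler perl-script
    PerlHandler Apache::Registry
    Options +ExecCGI
    PerlSendHeader On
</Location>
```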

good luck!
Re: [iam] over zealous spidering
Thanks,

I'm familiar with the guidelines; my robots.txt complies, but it's being ignored. I've just moved servers and I'm not totally familiar with the new setup. I'll look into mod_perl and FastCGI.
Re: [roman365] over zealous spidering
There's also Speedy CGI (persistent perl):

http://www.daemoninc.com/SpeedyCGI/

I'm curious to know how you finally solve your problem. Please post an update when you can.
Re: [iam] over zealous spidering
Try FCGI
Re: [iam] over zealous spidering
Or check mod_dosevasive

Thanks
HyTC
==================================
Mail Me If Contacting Privately Is That Necessary.
==================================
Re: [HyperTherm] over zealous spidering
Ended up using mod_perl.

I have a WHM/cPanel server, and mod_perl was in its list of installable RPMs, so I went with that.


Thanks for the advice folks
Re: [roman365] over zealous spidering
If it worked out of the box, then cPanel must have improved a bit. It never worked a year back... we had to hand-compile before finally switching to a mod_perl proxied httpd.

Thanks
HyTC