Gossamer Forum

LinksSQL/Logging/Googlebot

Hi,

This has more to do with Googlebot than anything else, but it relates to my LinksSQL installation. It's something I've been battling with for months with no result, so hopefully someone here can provide an answer. Apologies if it's slightly off topic, but with no answer from the people at Google I'm a little lost. Hopefully someone here has already gone through it?

I want to exclude all of the Googlebot IPs from the logging software that's looking after my LinksSQL site. The only thing is, I don't have a complete list of all their crawler IPs. I have searched the net and tried the different lists I have found, but then the next day... another 300 Googlebot visits.

Does anyone have a complete list, or know if there is an accurate one posted on a web site anywhere? AOL provide one in their webmasters section, but Google don't give you anything as far as I can see. Failing that, could someone recommend a message board where I could ask?

Any help would be appreciated.

Cheers,
Regan.

Re: LinksSQL/Logging/Googlebot
Why don't you ban the Googlebot User-Agent rather than its IPs?

Installations:http://www.wiredon.net/gt/

Re: LinksSQL/Logging/Googlebot
Hi

Not sure what you are trying to do or achieve. Have you looked into excluding googlebot using the robots directive?

- w

Re: LinksSQL/Logging/Googlebot
Sounds like Regan's not trying to actually stop Googlebot from crawling her Links pages, but just stop the crawl from being logged... Is it Links logging (I have no clue how Links works ;) or is it your HTTP daemon logging? If it's your httpd, then which httpd are you using (Apache, IIS, etc.)?

Rereading your post, it looks like you're using some other software to do the actual logging. What software are you using? I'd think a host ignore would work better? Like *.google.com?
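A host ignore of that kind could be sketched roughly as below. This is only an illustration in Python, not something Urchin or Apache provides: the function name `host_is_ignored` and the pattern list are assumptions made up for the example. It also assumes the log already contains resolved hostnames rather than bare IPs (for instance with Apache's `HostnameLookups On`, or a separate reverse lookup step).

```python
from fnmatch import fnmatchcase

# Hypothetical ignore list; these patterns are assumptions for illustration,
# not an official list of crawler hostnames.
IGNORED_HOSTS = ("*.googlebot.com", "*.google.com")

def host_is_ignored(hostname, patterns=IGNORED_HOSTS):
    """Return True if hostname matches any wildcard pattern, ignoring case."""
    # Normalise case and drop a trailing dot so "Host.Example.COM." still matches.
    name = hostname.strip().lower().rstrip(".")
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

A bare IP address never matches these patterns, so entries logged without a resolved hostname would pass through untouched.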

Adrian

Re: LinksSQL/Logging/Googlebot
Hi,

Thanks for the replies! :)

Adrian's got it right - I want to allow Googlebot to crawl my pages as much as it can (the more the better!). But my stats software is logging it, and I believe the only way I can block it from appearing is by entering its IP addresses - and I can't find them all for Googlebot. I'm using Urchin (www.urchin.com) to do my logging, and as far as I know I can only enter IPs in there, although I will ask about the Googlebot User-Agent idea.

The host ignore sounds exactly like what I'm after though! I'm using Apache as my web server. I will have a read up about it, but if you can offer some advice on exactly where to put it and in what format, it would be appreciated!

Thanks all!

Regan. (he) :)

Re: LinksSQL/Logging/Googlebot
Hi Adrian,

I've been looking through my Apache books, and found a directive called:

RefererIgnore

which looks like it does what you're talking about. So what I have done is enter the directive into my httpd.conf file:

RefererIgnore *googlebot.com

which will hopefully do the trick! :)

Does that sound right to you?

Cheers,
Regan.

----------------

Nope, that didn't work. It's giving me this error:

Invalid command 'RefererIgnore', perhaps mis-spelled or defined by a module not included in the server configuration

Is the host ignore directive something else?

Cheers,
Regan.

Re: LinksSQL/Logging/Googlebot
Looks like the Urchin software just parses Apache log files and generates the stats from them. I actually don't think it's possible to get Apache to not log requests from an IP/host (someone correct me if I'm wrong). Looking at Urchin's documentation (http://www.urchin.com/support/MANUAL_P3300.pdf, page 24), it looks like you should be able to set up a filter that ignores entries either by host (the %h field) or by User-Agent (the %{User-Agent}i field); see http://httpd.apache.org/docs/mod/mod_log_config.html for what the log format fields mean. If that doesn't work, you could always contact Urchin for help :)
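To make the User-Agent idea concrete, here is a rough Python sketch of the same kind of filtering applied to the log before any stats tool sees it. It is not Urchin's filter mechanism. It assumes the log uses Apache's "combined" format (where the User-Agent is the last quoted field), and the names `is_googlebot` and `strip_googlebot` are invented for the example.

```python
import re

# Apache "combined" format:
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Assumption: no escaped quotes inside the quoted fields; lines that do not
# fit this pattern are treated as non-Googlebot and kept.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"\s*$'
)

def is_googlebot(line):
    """Return True if the line's User-Agent field mentions Googlebot."""
    match = COMBINED.match(line)
    return bool(match) and "googlebot" in match.group("agent").lower()

def strip_googlebot(lines):
    """Yield every log line except those whose User-Agent is Googlebot."""
    for line in lines:
        if not is_googlebot(line):
            yield line
```

Matching on the User-Agent avoids having to keep an IP list up to date, at the cost of trusting whatever the client claims to be.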

Adrian

Re: LinksSQL/Logging/Googlebot
You might have just solved my problem. :)

I'm trying out the User-Agent filtering directive.

Thanks!

R.