
andrew at zope
Mar 10, 2005, 6:27 AM
Post #7 of 10
I need to read up on the robots.txt spec. Excellent Mark, thanks.

Andrew
--
Zope Managed Hosting
Software Engineer
Zope Corporation
(540) 361-1700

> -----Original Message-----
> From: zope-web-bounces [at] zope [mailto:zope-web-bounces [at] zope] On
> Behalf Of Mark Pratt
> Sent: Thursday, March 10, 2005 6:16 AM
> To: Jens Vagelpohl
> Cc: zope-web [at] zope
> Subject: Re: [ZWeb] Zope.org currently unusable
>
> Hi,
>
> I recommend setting crawl delays for all bots but Google to something like:
>
> User-agent: Slurp
> Crawl-delay: 120
>
> This is for the Yahoo bot but should also be applied to msnbot.
>
> It's crazy how some of these bots love to hit your site at the same
> time. A 120 second delay should be more than enough time between
> hits even if they all come at the same time.
>
> Cheers,
>
> Mark
>
>
> On Mar 10, 2005, at 10:33 AM, Jens Vagelpohl wrote:
>
> >
> > On Mar 10, 2005, at 2:18, Andrew Sawyers wrote:
> >
> >> It's a little of both; there's a group of people working on this - we
> >> hope to have something real soon now :) as a fix. Jens, do you have
> >> the time to check the zope.org robots.txt? A lot of the problems I've
> >> seen recently were due to several robots spidering zope.org at the
> >> same time. I'm working on additional hardware and we should see more
> >> traction on the project sooner than later.
> >
> > I don't believe all that much in robots.txt. The nasty bots completely
> > ignore it, anyway. The only way to deal with them is to block them
> > with e.g. iptables.
> >
> > What's currently there looks odd:
> >
> > """
> > User-agent: wget
> > Disallow: /
> >
> > User-agent: Wget
> > Disallow: /
> >
> > # Ask Google to skip search queries and the like.
> > User-agent: Googlebot
> > Disallow: /*?
> > """
> >
> > Looking at the spec, case-insensitive matching of the User-agent value
> > is only "recommended", but you could shorten that into the following,
> > because multiple User-agent values are allowed per rule set:
> >
> > """
> > User-agent: wget
> > User-agent: Wget
> > Disallow: /
> > """
> >
> > Otherwise there really isn't much in there, and from seeing googlebots
> > myself often enough I have my doubts whether the line "Disallow: /*?"
> > works at all.
> >
> > jens
> >
> > _______________________________________________
> > Zope-web maillist - Zope-web [at] zope
> > http://mail.zope.org/mailman/listinfo/zope-web
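For anyone wanting to sanity-check the rules discussed in this thread before deploying them, here is a minimal sketch using Python's standard urllib.robotparser module. The combined robots.txt below is an assumption pieced together from Mark's and Jens's suggestions, not the actual zope.org file, and the "Wget/1.9" agent string and the URL are only illustrative. Note that this shows how Python's parser reads the file; real crawlers may interpret it differently or, as Jens points out, ignore it entirely.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt combining the suggestions from this thread:
# one rule set blocking wget under both spellings, and one rule set
# asking the Yahoo (Slurp) and MSN bots to wait 120 seconds between hits.
ROBOTS_TXT = """\
User-agent: wget
User-agent: Wget
Disallow: /

User-agent: Slurp
User-agent: msnbot
Crawl-delay: 120
"""

parser = RobotFileParser()
# parse() takes an iterable of lines, so no network access is needed.
parser.parse(ROBOTS_TXT.splitlines())

# Illustrative URL and agent strings; neither comes from real logs.
url = "http://www.zope.org/Products"
wget_allowed = parser.can_fetch("Wget/1.9", url)
google_allowed = parser.can_fetch("Googlebot", url)

# crawl_delay() returns the Crawl-delay value for a matching rule set,
# or None when no rule set applies to the agent.
slurp_delay = parser.crawl_delay("Slurp")
msnbot_delay = parser.crawl_delay("msnbot")
google_delay = parser.crawl_delay("Googlebot")

print("wget allowed:", wget_allowed)
print("Googlebot allowed:", google_allowed)
print("Slurp delay:", slurp_delay)
print("msnbot delay:", msnbot_delay)
print("Googlebot delay:", google_delay)
```

Crawl-delay is not part of the original robots.txt spec, so whether a given bot honours it has to be checked against that bot's own documentation.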