
jon at endpoint
Nov 16, 2009, 4:17 PM
Post #2 of 2
On Sun, 15 Nov 2009, DB wrote:

> Recently I saw Yahoo's shopping bot hitting my site pretty hard. Apache
> access log has lines like this every 20 seconds or so:
>
> ...HTTP/1.0" 200 19391 "-" "YahooSeeker/1.2 (compatible; Mozilla 4.0;
> MSIE 5.5; yahooseeker at yahoo-inc dot com ;
> http://help.yahoo.com/help/us/shop/merchant/)"
>
> I saw no entry for this in my system's robots.cfg and I suspect (can't
> prove) that this robot was obtaining a session which grew *very* large.
> So I have two questions:
>
> What exactly should I add to my robots.cfg

Are you sure that it was not being flagged as a robot? I'm pretty sure
that the "Yahoo" entry in the default robots.cfg will catch "YahooSeeker"
as well.

Take a look at your interchange.structure file with debug enabled, and you
can see the regex created for the RobotUA directive, and Yahoo isn't
anchored so should match YahooSeeker too. (In this case that's good, but
in other cases you may find a RobotUA setting matches too loosely, such as
"Google" matching "GoogleToolbar" or similar.)

> Is there a way to set a maximum size for sessions so that the next time
> a robot that's not in my robots.cfg file comes along this problem won't
> repeat?

I don't know of a way to limit the size of a session proactively.

Jon

--
Jon Jensen
End Point Corporation
http://www.endpoint.com/
_______________________________________________
interchange-users mailing list
interchange-users [at] icdevgroup
http://www.icdevgroup.org/mailman/listinfo/interchange-users
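
To illustrate the point about unanchored matching, here is a rough sketch in Python. It is not Interchange's actual code (Interchange is Perl, and the real regex is whatever appears in interchange.structure). The entry list, the case-insensitive flag, the is_robot helper, and the GoogleToolbar and Firefox user-agent strings are all assumptions made up for the example.

```python
import re

# Hypothetical subset of RobotUA entries; the real list lives in robots.cfg.
robot_ua_entries = ["Yahoo", "Google"]

# Assumption: the entries end up as one alternation with no anchors,
# so each entry matches anywhere inside the User-Agent string.
unanchored = re.compile(
    "|".join(re.escape(entry) for entry in robot_ua_entries),
    re.IGNORECASE,
)

def is_robot(user_agent: str) -> bool:
    """Return True if any entry occurs anywhere in the User-Agent."""
    return unanchored.search(user_agent) is not None

# User-Agent from the quoted log line (shortened).
yahoo_seeker = "YahooSeeker/1.2 (compatible; Mozilla 4.0; MSIE 5.5)"
# Made-up browser strings for illustration only.
toolbar = "Mozilla/4.0 (compatible; GoogleToolbar 4.0; Windows XP)"
plain = "Mozilla/5.0 (X11; Linux x86_64) Firefox/3.5"

print(is_robot(yahoo_seeker))  # "Yahoo" matches inside "YahooSeeker"
print(is_robot(toolbar))       # "Google" also matches "GoogleToolbar"
print(is_robot(plain))         # no entry occurs in this string

# A tighter pattern with word boundaries avoids the loose match.
strict = re.compile(r"\bGooglebot\b")
print(strict.search(toolbar) is None)
```

The same idea applies when tightening a RobotUA entry: a more specific string such as "Googlebot" is less likely to catch ordinary browsers than a bare "Google".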