Gossamer Forum

Rewrite rules for protection

Hi,

Does anyone have an opinion about something like this:

Code:
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^BlackWidow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Bot\ mailto:craftbot@yahoo.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^ChinaClaw [OR]
RewriteCond %{HTTP_USER_AGENT} ^DISCo [OR]
RewriteCond %{HTTP_USER_AGENT} ^Download\ Demon [OR]
RewriteCond %{HTTP_USER_AGENT} ^eCatch [OR]
RewriteCond %{HTTP_USER_AGENT} ^EirGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^Express\ WebPictures [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
RewriteCond %{HTTP_USER_AGENT} ^EyeNetIE [OR]
RewriteCond %{HTTP_USER_AGENT} ^FlashGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^GetRight [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go!Zilla [OR]
RewriteCond %{HTTP_USER_AGENT} ^Go-Ahead-Got-It [OR]
RewriteCond %{HTTP_USER_AGENT} ^GrabNet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Grafula [OR]
RewriteCond %{HTTP_USER_AGENT} ^HMView [OR]
RewriteCond %{HTTP_USER_AGENT} ^HTTrack [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Stripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^Image\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^InterGET [OR]
RewriteCond %{HTTP_USER_AGENT} ^Internet\ Ninja [OR]
RewriteCond %{HTTP_USER_AGENT} ^JetCar [OR]
RewriteCond %{HTTP_USER_AGENT} ^JOC\ Web\ Spider [OR]
RewriteCond %{HTTP_USER_AGENT} ^larbin [OR]
RewriteCond %{HTTP_USER_AGENT} ^LeechFTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mass\ Downloader [OR]
RewriteCond %{HTTP_USER_AGENT} ^MIDown\ tool [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mister\ PiX [OR]
RewriteCond %{HTTP_USER_AGENT} ^Navroad [OR]
RewriteCond %{HTTP_USER_AGENT} ^NearSite [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetAnts [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Net\ Vampire [OR]
RewriteCond %{HTTP_USER_AGENT} ^NetZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Octopus [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Explorer [OR]
RewriteCond %{HTTP_USER_AGENT} ^Offline\ Navigator [OR]
RewriteCond %{HTTP_USER_AGENT} ^PageGrabber [OR]
RewriteCond %{HTTP_USER_AGENT} ^Papa\ Foto [OR]
RewriteCond %{HTTP_USER_AGENT} ^pcBrowser [OR]
RewriteCond %{HTTP_USER_AGENT} ^RealDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^ReGet [OR]
RewriteCond %{HTTP_USER_AGENT} ^Siphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^SiteSnagger [OR]
RewriteCond %{HTTP_USER_AGENT} ^SmartDownload [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperBot [OR]
RewriteCond %{HTTP_USER_AGENT} ^SuperHTTP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Surfbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^tAkeOut [OR]
RewriteCond %{HTTP_USER_AGENT} ^Teleport\ Pro [OR]
RewriteCond %{HTTP_USER_AGENT} ^VoidEYE [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Image\ Collector [OR]
RewriteCond %{HTTP_USER_AGENT} ^Web\ Sucker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebAuto [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebCopier [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebFetch [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebReaper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebSauger [OR]
RewriteCond %{HTTP_USER_AGENT} ^Website\ eXtractor [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebStripper [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebWhacker [OR]
RewriteCond %{HTTP_USER_AGENT} ^WebZIP [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Widow [OR]
RewriteCond %{HTTP_USER_AGENT} ^Xaldon\ WebSpider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]

Is it worth using? Does it help? Does it slow down your server?

Cheers
Klaus

http://www.ameinfo.com
Re: [klauslovgreen] Rewrite rules for protection
Hi,

I've moved the thread here, as rewrite rules aren't really Perl programming related, although some contain regexes.
Re: [klauslovgreen] Rewrite rules for protection
mod_rewrite is CPU-intensive as it is; that sure looks like a *serious* amount of CPU overhead for a busy site.

Couldn't you have done an .htaccess "Deny from domain1 domain2", or maybe a robots.txt file or something?

- wil
Re: [klauslovgreen] Rewrite rules for protection
I personally would just block the bulk of those robots using a robots.txt file. Obedient robots will read the robots.txt file and keep away. If you find some disobedient ones, then add a rewrite rule for them. I would also add rewrite rules for things like wget and webcopier, as they are run by humans.
Re: [Paul] Rewrite rules for protection
What is the correct format of robots.txt? I have had some problems making it work in the past (I don't want to stop Google, for example).

cheers
Klaus

http://www.ameinfo.com
Re: [klauslovgreen] Rewrite rules for protection
http://www.searchengineworld.com/.../robots_tutorial.htm
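
For a rough idea of the format, here is a small Python sketch that feeds a sample robots.txt to the standard library's urllib.robotparser and checks who gets through. The two blocked names are taken from the list at the top of the thread, www.example.com is a placeholder, and the sample file is only my assumption of what you might want, not a tested recommendation. The empty Disallow line under "User-agent: *" means nothing is off limits, so Google and other well-behaved crawlers are not stopped.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt (an assumption for illustration, not a vetted list).
# Each record names one robot; "Disallow: /" shuts it out of the whole site.
# The final "*" record with an empty Disallow allows everyone else.
ROBOTS_TXT = """\
User-agent: WebCopier
Disallow: /

User-agent: HTTrack
Disallow: /

User-agent: *
Disallow:
"""


def allowed(agent, url="http://www.example.com/index.html"):
    """Return True if a robot that honours ROBOTS_TXT may fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "WebCopier", "HTTrack"):
        print(agent, "allowed" if allowed(agent) else "blocked")
```

Keep in mind this only affects robots that choose to read the file.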
Re: [klauslovgreen] Rewrite rules for protection
In Reply To:
Is it worth using? Does it help? Does it slow down your server?

Yes, it will slow down your server. Whether the server becomes noticeably slow depends on what else the server has to do.

It may help to put all the conditions on one line, e.g.:
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} (EmailSiphon|EmailWolf|ExtractorPro| ... |Teleport|Telesoft|WebStripper)
RewriteRule ^.*$ http://www.site-where-you-want-to-send-the-bot [L,R]

... but your list of undesirables is very long. It might be better just to block the agents that are actually, rather than potentially, giving you problems. Most, perhaps all, of the known nuisances can present phoney user agent strings, so your scheme won't catch any that bother to do so.
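
To see why a phoney user agent string defeats this, here is a minimal Python sketch of what the one-line condition does. It assumes the RewriteCond pattern behaves like an unanchored regex search, it uses only the names spelled out in the rule above (the elided part of the list is left out), and the sample User-Agent strings are made up for illustration.

```python
import re

# Only the names written out in the one-line rule; the "..." stays elided.
BLOCKED = [
    "EmailSiphon", "EmailWolf", "ExtractorPro",
    "Teleport", "Telesoft", "WebStripper",
]

# One alternation, no ^ anchor, so a name matches anywhere in the string.
PATTERN = re.compile("|".join(re.escape(name) for name in BLOCKED))


def is_blocked(user_agent):
    """Return True if any blocked name occurs in the User-Agent string."""
    return PATTERN.search(user_agent) is not None


if __name__ == "__main__":
    honest = "Teleport Pro/1.29"  # hypothetical honest grabber
    spoofed = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"  # same tool, lying
    print(is_blocked(honest))
    print(is_blocked(spoofed))
```

The same tool is caught when it announces itself and sails through when it claims to be a browser, which is why blocking by name only stops the honest ones.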
Re: [klauslovgreen] Rewrite rules for protection
Block them in robots.txt, not in mod_rewrite. If they are a bad robot, they will probably mask their name anyway, and you'll end up having to ban their IP.

Cheers,

Alex
--
Gossamer Threads Inc.
Re: [Alex] Rewrite rules for protection
The bad robots also ignore robots.txt.

To be more exact -- the robots are just computer programs; if the users of the robots are bad, it is almost impossible to catch them.
Re: [Paul] Rewrite rules for protection
Thanks Paul - what's with the Rambo thingy? :)

Klaus

http://www.ameinfo.com
Re: [Alex] Rewrite rules for protection
Thanks guys - someone suggested I look at cnn.com's robots.txt, as they might know what they are doing, but I guess it is impossible to guard 100%. So perhaps the best thing is just to keep an eye on things and catch the problems as they come.

Thanks again

Klaus

http://www.ameinfo.com
Re: [klauslovgreen] Rewrite rules for protection
>>
what's with the Rambo thingy?
<<

It's the face of G.W. Bush =)
Re: [Paul] Rewrite rules for protection
Didn't see that at first ;)

Good one!

Klaus

http://www.ameinfo.com