Gossamer Forum

Web hosting account deactivated - Baiduspider - SPAM

This morning I received a very unpleasant email with this content:

Quote:
Web hosting account deactivated for "mywebsite.org"

Your web hosting account for mywebsite.org has been deactivated, as of 03/11/2016. (reason: site causing performance problems)


etc. etc. etc. ...

What???

After I contacted the hosting company, a short "investigation" made it clear that a monster named "Baiduspider" had caused all the problems.

Here are a few lines from the access log:

"GET /forum/gforum.cgi?username=bozo202;guest=2662521&t=search_engine HTTP/1.0"
"GET /forum/gforum.cgi?username=ANICO;guest=1737188&t=search_engine HTTP/1.0"
"GET /forum/gforum.cgi?do=user_list;sb=user_username;so=ASC;first=D;guest=3208431&t=search_engine HTTP/1.0"
"GET /forum/gforum.cgi?do=search;guest=2184932&t=search_engine HTTP/1.0"

etc. etc. etc. ...

Please note that "guest=*******" is present in every single line.

Now, ..

The bottom line is that I (read "we") need to stop: "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

How?

Many people have the same problem as me, but it seems that "Baiduspider" is unstoppable!

Any idea how to solve this efficiently? Unfortunately, it seems that in this case neither robots.txt nor .htaccess is very helpful.

Thanks in advance :)
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
Hi,

Yeah, they do have a habit of doing that :( They also seem to ignore the crawl rate/robots.txt stuff!

Your best bet is to send them away! Something like:

Code:
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteRule .* http://google.com [L]

Hopefully that'll then direct the traffic away from your site and stop the issues. Hope that helps.

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
Hi Andy and thanks!

I already have these lines:

Quote:
##############################
##########S P A M
##############################
order allow,deny
deny from env=bad_bot
SetEnvIfNoCase User-Agent "^libwww" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase user-agent "Indy Library" bad_bot
SetEnvIfNoCase user-agent "noxtrumbot" bad_bot
#####12-3-2016
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
deny from *.baidu.com
Deny from 180.76.
Deny from 119.63.
Deny from 123.125.
Deny from 220.181.
#####12-3-2016
allow from all

but it seems that it does not help :(

I've added your solution, and we'll see.

Many thanks again ;)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
Unfortunately, as I said before, they are unstoppable :/

They are really a bunch of idiots. The attacks do not stop.

I am forced to block access to the entire forum with .htpasswd, otherwise the account will be suspended again.

Any help is welcome, thanks.
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
Have you put it right at the top of your .htaccess file? What are the log files saying now? That should have worked.

Cheers

Andy (mod)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
Hi Andy!

Maybe you're right, that part was not at the top of .htaccess.

I have now changed it, and here's how it looks:

Code:
##############################
<Files .htaccess>
order allow,deny
deny from all
</Files>
##############################
##########S P A M
##############################
order allow,deny
#####13-3-2016
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteRule .* http://google.com [L]
#####13-3-2016
deny from env=bad_bot
SetEnvIfNoCase User-Agent "^libwww" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase user-agent "Indy Library" bad_bot
SetEnvIfNoCase user-agent "noxtrumbot" bad_bot
#####12-3-2016
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
deny from *.baidu.com
Deny from 180.76.
Deny from 119.63.
Deny from 123.125.
Deny from 220.181.
#####12-3-2016
allow from all
##############################

All of this resulted in the following email, which I received this morning:

Quote:
Googlebot found an increase in authorization permission errors on ...

Very nice :(

Thanks Andy :)
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
Hi,

So at least we know it works now :) One small tweak I'd suggest is maybe:

Code:
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteRule .* http://google.com [F]

I'm a bit concerned that Google is complaining about auth errors (it shouldn't be).

Also, in robots.txt you could set a crawl-delay:

Code:
User-agent: *
Crawl-delay: 10

This should limit each search engine to 1 request every 10 seconds (assuming the robot obeys the rules!)

Cheers

Andy (mod)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
Hi Andy!

I made this change a few minutes ago, so I have to wait a few hours to see if everything is right.

.htaccess now looks like this:

Code:
##############################
<Files .htaccess>
order allow,deny
deny from all
</Files>
##############################
##########S P A M
##############################
order allow,deny
#####13-3-2016
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]
RewriteRule .* http://google.com [F]
#####13-3-2016
deny from env=bad_bot
SetEnvIfNoCase User-Agent "^libwww" bad_bot
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot
SetEnvIfNoCase user-agent "Indy Library" bad_bot
SetEnvIfNoCase user-agent "noxtrumbot" bad_bot
#####12-3-2016
SetEnvIfNoCase User-Agent "^Baiduspider" bad_bot
deny from *.baidu.com
Deny from 180.76.
Deny from 119.63.
Deny from 123.125.
Deny from 220.181.
#####12-3-2016
allow from all
##############################

and robots.txt looks like this:

Code:
##############################
User-agent: *
Crawl-delay: 10
Disallow:
#####12-3-2016
User-agent: Baiduspider
User-agent: Baiduspider-image
User-agent: Baiduspider-video
Disallow: /
##############################
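
For anyone who wants to check how a rule-abiding crawler would read a robots.txt like the one above, here is a minimal Python sketch using the standard-library urllib.robotparser. It only shows what a compliant parser concludes from the file; it says nothing about whether Baiduspider actually honours it. The file content is copied from this post, and the test path /forum/gforum.cgi comes from the access log in the first post. The helper name load_rules is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# robots.txt exactly as posted above (lines starting with # are comments).
ROBOTS_TXT = """\
##############################
User-agent: *
Crawl-delay: 10
Disallow:
#####12-3-2016
User-agent: Baiduspider
User-agent: Baiduspider-image
User-agent: Baiduspider-video
Disallow: /
##############################
"""

def load_rules(text):
    """Parse robots.txt text in memory (hypothetical helper, no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)
path = "/forum/gforum.cgi"

# Report what each crawler token is allowed to do according to the file.
for agent in ("Baiduspider", "Googlebot"):
    print(agent,
          "| may fetch:", rules.can_fetch(agent, path),
          "| crawl delay:", rules.crawl_delay(agent))
```

If the parser reports that Baiduspider may not fetch the path while the access log still shows its requests, the file itself is fine and the crawler is simply not obeying it, which is when a server-side block becomes necessary.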

BTW, Google complains because I am still blocking access to the forum with .htpasswd.

I must be sure that .htaccess and robots.txt work 100% before I remove .htpasswd, otherwise I risk suspension again.

One other thing: I see that Googlebot also reads pages whose URLs contain "guest=1131854".

Can it be prevented somehow?
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
Hi,

OK, sounds good. What you could do is log in via SSH and then "tail" the access_log to see if there are any references to Baiduspider:

Code:
cd /path/to/logs
tail -n 20 -f access_log | grep "Baidu"

That would be a more proactive way to see if they are still getting through :)
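
If you would rather get a summary than watch the log scroll by, something along these lines could count the status codes returned to Baidu. This is only a rough Python sketch: it assumes the log is in Apache's combined format, and the sample lines are made up for illustration (documentation IP addresses, invented timestamps, sizes and status codes), with only the request path and user agent taken from the first post. The function name count_statuses is hypothetical.

```python
import re
from collections import Counter

# Tail of a combined-format line: "request" status size "referer" "user-agent"
LINE_RE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

def count_statuses(lines, needle="baidu"):
    """Count HTTP status codes for requests whose user agent contains `needle`."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line.strip())
        if match and needle in match.group("agent").lower():
            counts[match.group("status")] += 1
    return counts

BAIDU_UA = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

# Hypothetical sample lines; a real run would read them from access_log instead.
sample = [
    '192.0.2.10 - - [13/Mar/2016:10:00:00 -0600] "GET /forum/gforum.cgi?do=search;guest=2184932&t=search_engine HTTP/1.0" 403 210 "-" "%s"' % BAIDU_UA,
    '192.0.2.11 - - [13/Mar/2016:10:00:02 -0600] "GET /forum/gforum.cgi?do=search;guest=2184932&t=search_engine HTTP/1.0" 403 210 "-" "%s"' % BAIDU_UA,
    '192.0.2.20 - - [13/Mar/2016:10:00:03 -0600] "GET /forum/gforum.cgi HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(count_statuses(sample))
```

With a rule that ends in [F], blocked requests should show up as 403, so any 200 responses counted for Baidu would suggest the rule is not matching.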

Quote:
One other thing: I see that Googlebot also reads pages whose URLs contain "guest=1131854".

Can it be prevented somehow?

I'm not sure how accurate this article is (it is from 2006), but you should certainly be able to exclude the "guest" parameter from being read by Google:

http://cutroni.com/...ry-string-variables/
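
To illustrate why that parameter matters to a crawler, here is a small Python sketch. It rests on an assumption I have not verified against Gossamer Forum itself: that "guest" is only a session ID and the page is otherwise the same. Under that assumption, every new guest ID looks like a brand-new URL to a robot, and dropping the parameter collapses them into one page. The function name strip_param is made up, and the second query string is a hypothetical variant of the one in the access log.

```python
import re

def strip_param(query, name="guest"):
    """Remove every `name=value` pair from a query string split on ';' or '&'."""
    pairs = re.split(r"[;&]", query)
    kept = [p for p in pairs if p and p.split("=", 1)[0] != name]
    # Rejoin with ';' only, so the result is a comparison key, not the original URL.
    return ";".join(kept)

# First query is from the access log in the opening post; the second is a
# hypothetical variant that differs only in its guest ID.
seen = [
    "do=search;guest=2184932&t=search_engine",
    "do=search;guest=1131854&t=search_engine",
]

unique_pages = {strip_param(q) for q in seen}
print(unique_pages)
```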

Cheers

Andy (mod)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
As I said previously, they are really a bunch of idiots :(

Quote:
180.76.15.158 - - [14/Mar/2016:07:18:51 -0600] "GET /forum/gforum.cgi?do=login;guest=2357333&t=search_engine HTTP/1.0" 401 398 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.144 - - [14/Mar/2016:07:18:53 -0600] "GET /forum/gforum.cgi?do=search;guest=2074911&t=search_engine HTTP/1.0" 401 398 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.31 - - [14/Mar/2016:07:18:54 -0600] "GET /forum/gforum.cgi?do=search;guest=2067612&t=search_engine HTTP/1.0" 401 398 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.151 - - [14/Mar/2016:07:18:54 -0600] "GET /forum/gforum.cgi?do=search;guest=1500589&t=search_engine HTTP/1.0" 401 398 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

etc. etc. ...

BTW, you were right about Google, Andy. I found a solution, thanks :)
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
Hi,

Try editing:

Code:
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]

to:

Code:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

All we need it to do is match "Baiduspider" anywhere in the user agent (so no need for the ^ at the start or the .* at the end).
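
A quick way to see the difference between the two patterns is to try them against the user-agent string from the access log. This is a sketch in Python's re module rather than Apache itself, on the assumption that both engines treat these simple patterns the same way: an unanchored search unless ^ is given, with [NC] corresponding to re.IGNORECASE.

```python
import re

# User-agent string exactly as it appears in the access log quoted in this thread.
USER_AGENT = "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

# Old pattern: ^ demands that the string START with "Baiduspider".
anchored = re.compile(r"^Baiduspider.*", re.IGNORECASE)
# New pattern: matches "Baiduspider" anywhere in the string.
unanchored = re.compile(r"Baiduspider", re.IGNORECASE)

print("anchored matches:  ", bool(anchored.search(USER_AGENT)))    # False
print("unanchored matches:", bool(unanchored.search(USER_AGENT)))  # True
```

The string begins with "Mozilla/5.0", so the anchored pattern can never match it. The same reasoning would apply to a SetEnvIfNoCase line that uses "^Baiduspider".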

You are not alone with them though, lots of people are frustrated with them:

https://www.saotn.org/...ider-bot-user-agent/

Quote:
BTW, you were right about Google, Andy. I found a solution, thanks :)

Cool :)

Cheers

Andy (mod)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
Andy wrote:
Hi,

Try editing:

Code:
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider.* [NC]

to:

Code:
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC]

All we need it to do is match "Baiduspider" anywhere in the user agent (so no need for the ^ at the start or the .* at the end).

You are not alone with them though, lots of people are frustrated with them:

https://www.saotn.org/...ider-bot-user-agent/

I changed this part the way you said. Now I can only wait and hope.

Anyway, thank you very much for your help :)
Re: [Andy] Web hosting account deactivated - Baiduspider - SPAM
YIPIII ... WELL DONE Andy!!!!!!!!!!

Ha ha ha ha ha .. Finally :D

That was the right solution! Great!

I'll wait 24 hours to be sure that everything is OK. After that I will delete .htpasswd.

Once again, Andy, you are a superstar, thank you ;)

Last edited by:

katakombe: Mar 14, 2016, 10:03 AM
Re: [katakombe] Web hosting account deactivated - Baiduspider - SPAM
WAHOO!!!! Glad I could help!

Cheers

Andy (mod)