Gossamer Forum

A tough robots.txt question

We have a test server which maps to the URL http://abc.sitename.com. We move all our files up there for testing, and once we are done, we mirror the content from the test server to our live server, http://www.sitename.com.

My problem is that I need the test server to be visible to the outside world, but I can't allow any spiders to index it. On the live site, however, I do want spiders to index everything. The problem comes from the mirroring process: every file on the test server is mirrored to the live server. So if I just used a simple robots.txt file and said:

User-agent: *
Disallow: /

that file would end up on the live server upon mirroring and block spiders there as well. Is there any way to specify an exact URL in a robots.txt file, like:

User-agent: *
Disallow: http://abc.sitename.com/

I'm hoping there's a way to disallow everything on the test server, yet allow everything on the live server.

Re: [ngoodman] A tough robots.txt question
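I'm not absolutely positive, but I'm pretty confident you can't use full URLs in robots.txt files. A Disallow value is a path, and spiders fetch robots.txt separately from each host, so the same mirrored file applies to both servers. You might consider a somewhat more aggressive approach: using mod_rewrite to ban specific robots by user agent on the testing domain.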
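
As a rough sketch of what that could look like (untested against your setup, and assuming Apache with mod_rewrite enabled and .htaccess overrides allowed), something along these lines might work. The host name is the one from your post; the list of bot names is only illustrative, and robots-test.txt is a hypothetical file name I made up for the second variant.

```apache
# Sketch only: assumes mod_rewrite is available and this lives in the
# document root's .htaccess (or the matching virtual host config).
RewriteEngine On

# Variant 1: refuse known spiders on the test host.
# Both conditions must match (RewriteCond lines are ANDed by default).
RewriteCond %{HTTP_HOST} ^abc\.sitename\.com$ [NC]
# Illustrative, incomplete list of crawler user agents; extend as needed.
RewriteCond %{HTTP_USER_AGENT} (googlebot|slurp|msnbot) [NC]
# "-" leaves the URL untouched; [F] answers with 403 Forbidden.
RewriteRule ^ - [F]

# Variant 2 (a different technique from the user-agent ban): on the test
# host only, answer requests for robots.txt with a second file that
# contains "User-agent: *" / "Disallow: /".
# robots-test.txt is a hypothetical name; it would be mirrored too, but
# nothing on the live host points at it.
RewriteCond %{HTTP_HOST} ^abc\.sitename\.com$ [NC]
RewriteRule ^/?robots\.txt$ /robots-test.txt [L]
```

Because every rule is guarded by the HTTP_HOST condition, the file can be mirrored to the live server without changing anything there. Variant 1 only stops bots that identify themselves honestly, so variant 2 is probably the gentler place to start.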

Here's a thread (on another forum) that talks about doing something similar for spambots, but you could just as easily apply the principle to other bots as well:

http://www.webmasterworld.com/forum13/687-1-15.htm

Hope that helps.
