Gossamer Forum

robots.txt and cgi-bin

Hi there.

When setting up a robots.txt file what are the advantages and disadvantages of disallowing a cgi-bin?

It seems like all the sample robots.txt files I've found disallow the cgi-bin directory, but a lot of my content is dynamically generated, and it would be nice if the search spiders could index it. Is it that the robots aren't able to read the CGI-generated content properly? Do they try to read the .cgi or .pl files themselves? Does it tax the server too much? I'm just not sure I understand enough about how these robots work to know what I should be expressly disallowing.

Thanks in advance for any advice.

Best,
Adam

Fractured Atlas :: Liberate the Artist
Services: Healthcare, Fiscal Sponsorship, Marketing, Education, The Emerging Artists Fund
Re: [adamuforestasan] robots.txt and cgi-bin In reply to
The main reason is to stop spiders from overloading your server with requests, as some spiders do with this forum. Hence:

http://www.gossamer-threads.com/robots.txt
Re: [Paul] robots.txt and cgi-bin In reply to
Are Teleport and TeleportPro especially nasty ones or something?

And the GT robots.txt file disallows the whole perl directory, which would seem to be their equivalent of a cgi-bin, right? Or is that just for this forum?


Last edited by: adamuforestasan, Jun 20, 2002, 12:54 PM
Re: [adamuforestasan] robots.txt and cgi-bin In reply to
>>Are Teleport and TeleportPro especially nasty ones or something? <<

Yeah, they are the ones that tend to cause the who's online page to say things like "800 visitors in the last 15 minutes".

The /perl/ directory is where GT put their mod_perl scripts, so essentially yes, it is the same as your cgi-bin.
Re: [Paul] robots.txt and cgi-bin In reply to
What does this line mean?

User-agent: *
Disallow: /perl/


Keep everyone (*) out of the perl directory? I guess that would make sense...


http://www.iuni.com/...tware/web/index.html
Links Plugins
Re: [Ian] robots.txt and cgi-bin In reply to
You answered your own question :)
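
To see that reading of the rule in action, here is a minimal sketch using Python's standard `urllib.robotparser`. The two-line rule is the one quoted above; the robot name `ExampleBot` and the path `/perl/forum.cgi` are made up for illustration and are not taken from the real GT site.

```python
from urllib.robotparser import RobotFileParser

# The rule quoted in the thread: every robot (*) is kept out of /perl/.
RULES = """\
User-agent: *
Disallow: /perl/
"""


def allowed(rules: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "ExampleBot" and both paths are hypothetical, for illustration only.
    for path in ("/perl/forum.cgi", "/index.html"):
        print(path, allowed(RULES, "ExampleBot", path))
```

Anything under /perl/ comes back as not allowed, while the rest of the site stays open to well-behaved robots. Note that robots.txt is only a request: a robot that ignores it is not stopped by it.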
Re: [Paul] robots.txt and cgi-bin In reply to
That happens now and then! Sometimes BEFORE I press send, too. (blush)


Re: [adamuforestasan] robots.txt and cgi-bin In reply to
Teleport is known to abuse the robots 'standard', which is generally taken to mean that a robot should not request more than one page from your server within a 5 second interval. Teleport instead retrieves pages in parallel, at many times that rate, and so brings some badly configured servers to their knees.

- wil
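
For contrast, here is a rough sketch of what a polite robot does on its side: it spaces out its requests per host instead of firing them in parallel. The 5 second default is simply the figure from the post above, and the `RequestSpacer` class and its method names are invented for this example; it only computes the waiting time and does no actual fetching.

```python
import time


class RequestSpacer:
    """Tracks, per host, when the next request may be sent (hypothetical helper)."""

    def __init__(self, min_interval: float = 5.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between requests to one host
        self.clock = clock                # injectable so the logic can be tested
        self._next_allowed = {}           # host -> earliest time of next request

    def delay_for(self, host: str) -> float:
        """Reserve the next slot for `host` and return how long to wait for it."""
        now = self.clock()
        ready_at = max(now, self._next_allowed.get(host, now))
        self._next_allowed[host] = ready_at + self.min_interval
        return ready_at - now


if __name__ == "__main__":
    spacer = RequestSpacer()
    for page in ("/a.html", "/b.html", "/c.html"):
        wait = spacer.delay_for("www.example.com")
        print(f"{page}: wait {wait:.1f}s before requesting")
```

A robot would sleep for the returned delay before each request. A downloader that skips this step and opens many connections at once is what produces the load described above.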
Re: [hennagaijin] robots.txt and cgi-bin In reply to
I came across a good program today for creating the robots.txt file. It's called RobotPack: www.soho-it.com/robotpack/

It's freeware, so...



DarkShadow
Re: [darkshadow] robots.txt and cgi-bin In reply to
Hmm, why download a 6MB file when it takes a couple of seconds to make one manually? :)
Re: [Paul] robots.txt and cgi-bin In reply to
Because it let me easily add robots that I don't want crawling my website, like sitecheck.internetseer.com, to my robots.txt without needing to know the agent name of every robot. You just select the robot and the directory and that's it. And it's free, so why bother doing it by hand? I saw one priced at $179.00 US. Haha, incredible.
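
For anyone doing it by hand, a per-robot entry is just another User-agent block. The sketch below is only an illustration: it shuts out TeleportPro (mentioned earlier in the thread) completely and keeps everyone else out of /cgi-bin/, then checks the result with Python's standard `urllib.robotparser`. The version string `TeleportPro/1.29`, the name `ExampleBot` and the paths are assumptions, not values from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named robot is banned outright,
# all other robots are only kept out of /cgi-bin/.
PER_AGENT_RULES = """\
User-agent: TeleportPro
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def can_crawl(agent: str, path: str) -> bool:
    """Check `path` for `agent` against PER_AGENT_RULES."""
    parser = RobotFileParser()
    parser.parse(PER_AGENT_RULES.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Agent strings and paths below are made up for the example.
    checks = [
        ("TeleportPro/1.29", "/index.html"),
        ("ExampleBot", "/index.html"),
        ("ExampleBot", "/cgi-bin/search.cgi"),
    ]
    for agent, path in checks:
        print(agent, path, can_crawl(agent, path))
```

The catch is the one raised above: you still need the name each robot sends in its User-agent header, which is what a prebuilt list saves you from looking up.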
Re: [darkshadow] robots.txt and cgi-bin In reply to
For reference..

http://www.robotstxt.org/...tive/html/index.html
Re: [Paul] robots.txt and cgi-bin In reply to
Yes, it's interesting, but apart from a few entries that list has not been updated since 1996, and none of the "new robots" are in it.

DS
Re: [darkshadow] robots.txt and cgi-bin In reply to
I didn't see 1996?

I can't currently find a better list.
Re: [Paul] robots.txt and cgi-bin In reply to
Look around the website and you will find, with few exceptions, no information dated after 1996. For example:

http://www.robotstxt.org/wc/articles.html
Re: [darkshadow] robots.txt and cgi-bin In reply to
Well, I don't think the explanation of robots and the syntax of robots.txt files have changed much over the years, so there has probably been no need to update those pages.