Gossamer Forum
TLD's and spider
I have a client that is about to be very angry with me if I cannot get this to work:

I am trying to modify the spider so that it stays on, and only indexes, .edu domains. The problem is that when I queue http://www.searchedu.com, the casino banners get indexed, and they lead to other sites I don't need. Is there any way to modify the spider so that it only stays on .edu domains?

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
I don't know the source of it, but it shouldn't be hard to add a regex to see if the domain ends in a .edu TLD. Something like:

next if $var !~ /^[a-z]+\.[a-z]+\.edu$/i;   # skip anything that does not look like name.name.edu

Not definite about the format though, as it is only a rough guide :P

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [AndyNewby] TLD's and spider In reply to
It's not that easy, unfortunately, I expect. Also, that regex wouldn't work. What if the domain was:

aa1.edu ?
Re: [RedRum] TLD's and spider In reply to
Just add [a-z0-9] then ;)

Andy (mod)
Re: [AndyNewby] TLD's and spider In reply to
And if it is aa-1.edu?
Re: [RedRum] TLD's and spider In reply to
Ah well, like you said, that method wouldn't work anyway ;)
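
For reference, here is a minimal sketch of a check that does cope with hosts such as aa1.edu and aa-1.edu. Rather than spelling out which characters each label may contain, it pulls the host out of the URL and tests only how the host ends. The name is_edu_host is hypothetical and not part of the spider's code, and the sketch assumes plain http/https URLs.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the spider): true when the URL's host ends in .edu.
sub is_edu_host {
    my ($url) = @_;
    # Capture the host: skip the scheme and any user info, stop at port, path, query or fragment.
    my ($host) = $url =~ m{^https?://(?:[^/?#@]*@)?([^/?#:]+)}i
        or return 0;
    $host =~ s/\.$//;                  # tolerate a trailing dot (fully qualified form)
    return $host =~ /\.edu$/i ? 1 : 0;
}

for my $url (qw(http://aa1.edu/ http://aa-1.edu/index.html http://www.searchedu.com/)) {
    printf "%-32s %s\n", $url, is_edu_host($url) ? 'keep' : 'skip';
}
```

Because only the host is tested, a name like www.searchedu.com, or a path that merely ends in ".edu", is not mistaken for a .edu domain.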

Andy (mod)
Re: [AndyNewby] TLD's and spider In reply to
That's similar to what I tried before:

if $var !~ /^[a-z]\.[a-z]\*.edu/;
then next;

Didn't work either...
Still working on it..

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
Where is the "if $var" code you are referring to?
Re: [RedRum] TLD's and spider In reply to
The whole idea I had was to modify the rules form, adding a variable/field to it so that when I enter ".edu", the spider would only search those particular TLDs. So far, I have totally screwed up the script and will have to reload it...

Still working...

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
Hi Kilroy,

Can you email me directly? aki@gossamer-threads.com.

If not, the easiest thing you can do is to open up the "Robot.pm" module in the admin/GT/ directory. There, in sub check_urls, you can put a regex test to see whether a domain is allowed or not.
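
As a rough illustration of that kind of test, here is a standalone sketch. The real check_urls is not shown in this thread, so its arguments and return values are not assumed here; url_allowed, filter_urls and @ALLOWED_TLDS are hypothetical names, and the logic would need adapting to whatever check_urls actually receives.

```perl
use strict;
use warnings;

# Assumption: the TLDs the spider is allowed to follow.
my @ALLOWED_TLDS = ('edu');

# Hypothetical test: true when the URL's host ends in one of the allowed TLDs.
sub url_allowed {
    my ($url) = @_;
    my ($host) = $url =~ m{^https?://(?:[^/?#@]*@)?([^/?#:]+)}i
        or return 0;                        # not an absolute http(s) URL
    my ($tld) = lc($host) =~ /\.([a-z0-9-]+)\.?$/
        or return 0;                        # host has no dot, e.g. "localhost"
    return scalar grep { $_ eq $tld } @ALLOWED_TLDS;
}

# Drop every queued URL whose host is outside the allowed TLDs.
sub filter_urls {
    my @urls = @_;
    return grep { url_allowed($_) } @urls;
}

my @queue = filter_urls(
    'http://www.example.edu/courses',
    'http://casino.example.com/banner?id=1',
    'http://aa-1.edu/',
);
print "$_\n" for @queue;
```

Keeping the allowed TLDs in a list means the same test could later be driven by a field on the rules form instead of a hard-coded value.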

I'm not always on the forum and the easiest way to reach me is through my email. I'm very sorry I didn't catch this earlier.

Aki