Gossamer Forum
TLD's and spider
I have a client that is about to be very angry with me if I cannot get this to work:

I am trying to modify the spider so that it stays on, and only indexes, .edu domains. The problem is that when I queue http://www.searchedu.com, the casino banners get indexed too, and they lead to other sites I don't need. Is there any way to modify the spider so that it only stays on .edu domains?

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
I don't know the source of it, but it shouldn't be hard to add a regex that checks whether the domain ends in the .edu TLD. Something like:

next if $var !~ /^[a-z]\.[a-z]\.edu/i;

Not definite about the format though, as it is only a rough guide :P

Andy (mod)
andy@ultranerds.co.uk


IMPORTANT: I've now moved to ultranerds.co.uk, and the .com will no longer work!
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package (plugins total "value" $3,325 & rising, for just $350)| GLinks ULTRA Package PRO (plugins total "value" $5,625 & rising, for just $500)
Support Forum | Links SQL Plugins | DMOZ Dumps | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Compare our different Plugin packages *new* Free CSS Templates
Re: [AndyNewby] TLD's and spider In reply to
It's not that easy, unfortunately, I expect. Also, that regex wouldn't work. What if the domain was:

aa1.edu ?
Re: [RedRum] TLD's and spider In reply to
Just add [a-z0-9] then ;)

Andy (mod)
andy@ultranerds.co.uk
Re: [AndyNewby] TLD's and spider In reply to
And if it is aa-1.edu?
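
For reference, a pattern that copes with both cases (digits and hyphens in a label, with or without subdomains) might look like the sketch below. The name is_edu_host is made up for illustration and is not part of the spider, and the sketch assumes it is handed a bare hostname with no scheme, port or path.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the spider): returns 1 if $host is a
# hostname under the .edu TLD, 0 otherwise.
# Assumes a bare hostname: no scheme, port or path.
sub is_edu_host {
    my ($host) = @_;
    return 0 unless defined $host;
    # One or more labels made of letters, digits and hyphens (a label may
    # not start or end with a hyphen), each followed by a dot, and then a
    # final "edu" label at the very end of the string.
    return $host =~ /\A(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+edu\z/i ? 1 : 0;
}

for my $host (qw(aa1.edu aa-1.edu www.cs.example.edu education.com casino.com)) {
    printf "%-20s %s\n", $host, is_edu_host($host) ? 'keep' : 'skip';
}
```

Anchoring with \A and \z matters here: without the end anchor, a host such as education.com could slip through on a partial match.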
Re: [RedRum] TLD's and spider In reply to
Ah well, like you said, that method wouldn't work anyway ;)

Andy (mod)
andy@ultranerds.co.uk
Re: [AndyNewby] TLD's and spider In reply to
That's similar to what I tried before:

if $var !~ /^[a-z]\.[a-z]\*.edu/;
then next;

Didn't work either...
Still working on it..

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
Where is the "if $var" code you are referring to?
Re: [RedRum] TLD's and spider In reply to
The whole idea I had was to modify the rules form by adding a field to it, so that when I enter ".edu" the spider would only search those particular TLDs. So far, I have totally screwed up the script, and will have to reload it...

Still working...

</not a clue>
Re: [Kilroy] TLD's and spider In reply to
Hi Kilroy,

Can you email me directly? aki@gossamer-threads.com.

If not, the easiest thing you can do is to open up the "Robot.pm" module in the admin/GT/ directory. There, in sub check_urls, you can put a regex test to see if a domain is allowed, or not.
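
As a rough illustration of that kind of test, the sketch below filters a list of URLs down to a set of allowed TLDs. It is not the actual check_urls code, whose internals aren't shown in this thread; @allowed_tlds, url_allowed and the sample URLs are all assumptions made for the example.

```perl
use strict;
use warnings;

# Assumed configuration: TLDs the spider may follow (no leading dot).
my @allowed_tlds = ('edu');

# Hypothetical helper: pull the hostname out of an http/https URL and
# check whether it ends in one of the allowed TLDs.
sub url_allowed {
    my ($url, @tlds) = @_;
    # Capture the host: after the scheme, skipping any user:pass@ part,
    # and stopping at the port, path, query string or fragment.
    my ($host) = $url =~ m{\Ahttps?://(?:[^/?#@]*@)?([^/?#:]+)}i
        or return 0;
    $host =~ s/\.\z//;    # tolerate a trailing dot on the hostname
    for my $tld (@tlds) {
        return 1 if $host =~ /\.\Q$tld\E\z/i;
    }
    return 0;
}

my @queue = (
    'http://www.searchedu.com/',
    'http://www.example.edu/admissions',
    'http://casino.example.com/banner?ref=edu',
    'http://aa-1.edu:8080/index.html',
);

# Keep only the URLs whose host falls under an allowed TLD.
my @kept = grep { url_allowed($_, @allowed_tlds) } @queue;
print "$_\n" for @kept;
```

The check is made against the hostname only, so "edu" appearing in a path, a query string, or the middle of a .com name does not let a URL through. Keeping the TLDs in a list also means more could be added later without touching the test itself.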

I'm not always on the forum and the easiest way to reach me is through my email. I'm very sorry I didn't catch this earlier.

Aki