Gossamer Forum
Gossamer Links : Version 1.x

How do I....

How can I check whether a site has added my link to their site? I want to do this check in add.cgi and, if the link does not exist, return the error page.
I know where I would add the code, and I know some of the coding.

I currently check whether the link they are adding is good (not misspelled, dead, returning a server error, etc.) with the following code:

use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->agent("MySuperDuperChecker/2.0 (http://www.fooee.com)");
$ua->timeout(30);    # seconds to wait before giving up; set this to what you want.

# Request the URL the visitor submitted and show the error page if it doesn't load.
my $req = HTTP::Request->new(GET => $rec->{'URL'});
my $res = $ua->request($req);
unless ($res->is_success) {
    $name     = &get_category_name($in->param('CategoryID'));
    $category = "$name <input type=hidden name='CategoryID' value='" . $in->param('CategoryID') . "'>";
    &site_html_add_failure( { error => "URL: $rec->{'URL'} - " . $res->code . " " . $res->message, Category => $category, %in });
    return;
}

I am sure this is most of the code I need. If the user lists, say, www.foobar.com as the location of my return link, I want to be able to determine (true or false) whether www.fooee.com is found on that page.
What would I need to add to that code to do this?

Chris
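A minimal sketch of the true/false test Chris asks for, assuming the other site's page is already in a string (for example from `$res->content` after the request in his snippet succeeds). `page_mentions_domain` is a hypothetical helper name. It is a plain substring search, so it also fires when the domain is written as ordinary text, a limitation the replies below come back to.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $domain appears anywhere in $html.
# index() is a plain substring search, so no regex escaping is needed;
# lc() on both sides makes the comparison case-insensitive.
sub page_mentions_domain {
    my ($html, $domain) = @_;
    return 0 unless defined $html && length $html;
    return index(lc $html, lc $domain) >= 0 ? 1 : 0;
}

my $page = '<p>Partners: <a href="http://www.fooee.com/">Fooee</a></p>';
print page_mentions_domain($page, 'www.fooee.com') ? "found\n" : "missing\n";
```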
Re: How do I....
$content =~ m,<a.+href=("|')http://www\.domain\.com/[^"']*("|').*>,i;

jerry
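jerry's one-liner, wrapped in a small true/false sub so it can be tried on sample HTML. `links_to_domain_root` is a hypothetical name and `www.domain.com` stands in for your own domain. The `.+` after `<a` lets other attributes such as `target` come before `href`, and `("|')` accepts either quote style. Note that the `/` after `.com` is required, so `href="http://www.domain.com"` with no trailing slash does not match.

```perl
use strict;
use warnings;

# Hypothetical wrapper around jerry's pattern; www.domain.com is a placeholder.
sub links_to_domain_root {
    my ($content) = @_;
    return $content =~ m,<a.+href=("|')http://www\.domain\.com/[^"']*("|').*>,i ? 1 : 0;
}

print links_to_domain_root(q{<a target="_top" href='http://www.domain.com/'>Home</a>}), "\n";
```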
Re: How do I....
Sorry to be a pain, but where in that code would I put it?
Could you explain a little what that line of code is doing? It helps my learning.

Chris
Re: How do I....
Oh yeah, and does it care whether or not there is a trailing slash on the URL?
e.g. www.domain.com
or www.domain.com/
or www.domain.com/foo.html

Chris
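On the trailing-slash question: jerry's first pattern requires the `/` right after `.com`, so it rejects a bare `www.domain.com`. One way to accept all three forms listed above is to make the slash and any path optional with `(?:/[^"']*)?`. This is a sketch; `domain.com` is a placeholder, and jerry's trailing `.*>` is dropped here for brevity.

```perl
use strict;
use warnings;

# jerry's pattern with the "/path" part made optional by (?: ... )?, so the
# URL may end right after .com, after a slash, or after a longer path.
my $link_re = qr{<a.+href=("|')http://www\.domain\.com(?:/[^"']*)?("|')}i;

for my $html (
    '<a href="http://www.domain.com">x</a>',
    '<a href="http://www.domain.com/">x</a>',
    '<a href="http://www.domain.com/foo.html">x</a>',
    '<a href="http://www.domain.community.org/">x</a>',   # look-alike host
) {
    print(($html =~ $link_re ? "match   " : "no match"), "  $html\n");
}
```

Because the pattern must reach a closing quote right after `.com` or after the path, a look-alike host such as `domain.community.org` is rejected.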
Re: How do I....
Code:
$content =~ m,<a.+href=("|').*domain\.com[^"']*("|').*>,i;

would check for a link to domain.com, with or without www., a trailing slash, or a path.

It's just pattern matching. You'd use it like this:

Code:
if ($content =~ m,<a.+href=("|').*domain\.com[^"']*("|').*>,i) {
    # YES: the page links to domain.com
}
else {
    # NO: no link found, so show the error page
}

.+ means one or more of any character (other than a newline), so they can have things like <a target="blah" href=...

("|') means " or ', because sometimes people use single quotes.

.* means zero or more of any character, so anything (or nothing) can go there.

\. is a literal period; an unescaped . would match any character.

[^"']* means zero or more of any character except " and '.

The i after the closing comma makes the match case-insensitive.

jerry
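The same pattern can be written with Perl's `/x` modifier, which ignores whitespace inside the pattern and allows `#` comments, so each piece jerry describes carries its own note. This is only a restatement of his regex, not a stricter one; `domain.com` is a placeholder.

```perl
use strict;
use warnings;

# jerry's pattern, spread out with /x so each piece can carry a comment.
# domain.com is a placeholder for your own domain.
my $return_link_re = qr{
    <a .+           # "<a" and then one or more characters, e.g. a target attribute
    href= ("|')     # href= followed by a double or single quote
    .* domain\.com  # anything (or nothing), then the domain; \. is a literal period
    [^"']*          # zero or more characters that are not quotes
    ("|')           # the closing quote
    .* >            # anything, then a closing >
}xi;                # x: whitespace and comments allowed; i: case-insensitive

for my $html (
    '<a target="blah" href="http://www.domain.com/x.html">x</a>',
    q{<A HREF='http://DOMAIN.COM'>x</A>},
    '<a href=http://domain.com>x</a>',
) {
    print((($html =~ $return_link_re) ? "YES" : "NO "), "  $html\n");
}
```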
Re: How do I....
What I don't understand is how to get content into $content.
Would I do it like this:
$content = new HTTP::Request 'GET' => (${$rec}{'URL'});

In other words...how do I grab the other guy's page into $content?

Chris
Re: How do I....
I got it!!

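Here it is:

use LWP::Simple;    # exports get(), which returns the page or undef on failure

# Fetch the page the visitor listed as the location of the return link.
my $page = get(${$rec}{'Return_Link'});

# Fail if the page could not be fetched or does not link to sexlynx.com.
if (!$page || $page !~ m/<a.+href=("|').*sexlynx\.com[^"']*("|').*>/i) {
    &site_html_add_failure( { error => "Return link not found at " . $in->param('Return_Link'), Category => $category, %in });
    return;
}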

[This message has been edited by Digital Concepts (edited November 01, 1999).]
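`get` from LWP::Simple returns the page body, or `undef` if the fetch fails, which is why testing `$page` catches dead return-link URLs. If you would rather reuse the agent string and 30-second timeout from the first snippet, the fetch can go through LWP::UserAgent instead. A sketch under that assumption: `fetch_page` is a hypothetical helper that takes the user-agent object as an argument, so the logic can also be exercised with a stand-in object and no network access.

```perl
use strict;
use warnings;

# Hypothetical helper: fetch $url with an LWP::UserAgent-style object and
# return the decoded body, or undef if the request failed (DNS error,
# timeout, 404, 500, ...). Any object with get/is_success/decoded_content works.
sub fetch_page {
    my ($ua, $url) = @_;
    my $res = $ua->get($url);
    return $res->is_success ? $res->decoded_content : undef;
}

# Typical use in add.cgi (requires libwww-perl):
#   require LWP::UserAgent;
#   my $ua = LWP::UserAgent->new(
#       agent   => 'MySuperDuperChecker/2.0 (http://www.fooee.com)',
#       timeout => 30,
#   );
#   my $page = fetch_page($ua, $rec->{'Return_Link'});
#   # ...then fail unless defined $page and it contains the return link.
```

Passing the agent in also means one LWP::UserAgent object can serve both the URL check and the return-link check.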
Re: How do I....
And you might as well just change your regexp to /sexlynx\.com/, as all the .*'s in there mean it will match anywhere, not just in an href.

Cheers,

Alex
Re: How do I....
But if he did that, it wouldn't search just for links; it would also find things like images.

jerry
Re: How do I....
Right, but what I meant was that because the regexp he used had all these .*'s, it would match sexlynx.com anywhere, not necessarily in an href tag. So he might as well just get rid of the other parts.

Cheers,

Alex
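Alex's point can be checked directly. Because `.` matches any character except a newline, the `.*` after the opening quote can run past the end of the href, so the pattern from Chris's return-link check also fires when the domain only appears in a later `img src` on the same line. The plain `/sexlynx\.com/` gives the same answer here, which is jerry's objection: neither version restricts the match to a link. The sample HTML is made up for illustration.

```perl
use strict;
use warnings;

# The pattern from Chris's return-link check, and Alex's simpler version.
my $chris_re = qr/<a.+href=("|').*sexlynx\.com[^"']*("|').*>/i;
my $alex_re  = qr/sexlynx\.com/i;

# Made-up page: the only link goes elsewhere; sexlynx.com appears only in an image.
my $html = '<a href="/links.html">Links</a> <img src="http://www.sexlynx.com/banner.gif">';

print "chris: ", ($html =~ $chris_re ? "match" : "no match"), "\n";
print "alex:  ", ($html =~ $alex_re  ? "match" : "no match"), "\n";
```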
Re: How do I....
Good point. Plus, the regular expression wouldn't match an unquoted href such as <a href=http://sexlynx.com>

So it's a little "not that powerful."

jerry
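jerry's counterexample is easy to confirm: the pattern insists on `"` or `'` right after `href=`, but HTML also allows unquoted attribute values, so a working link is rejected. A sketch of one fix, making the quote optional with `["']?`; this is still a hand-rolled regex rather than an HTML parser, and the sample links are made up.

```perl
use strict;
use warnings;

# The quoted-only pattern from Chris's return-link check.
my $quoted_only = qr/<a.+href=("|').*sexlynx\.com[^"']*("|').*>/i;

# Sketch of a fix: ["']? makes the quote optional, and [^"'\s>]* stops at a
# quote, whitespace or >, so the domain must sit inside the href value itself.
my $any_quoting = qr/<a.+href=["']?[^"'\s>]*sexlynx\.com/i;

for my $html ('<a href=http://sexlynx.com>Visit</a>', '<a href="http://www.sexlynx.com/">Visit</a>') {
    printf "%-45s quoted-only: %-8s optional-quote: %s\n", $html,
        ($html =~ $quoted_only ? 'match' : 'no match'),
        ($html =~ $any_quoting ? 'match' : 'no match');
}
```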
Re: How do I....
I am not good with regexes, so what would I need to match? I don't want to match just the domain name, because that would also match plain text on the page.
I want to match <a href="domain.com">
or <a href="www.domain.com">

Would I do it like this:
/<a\shref=("|')(?:www\.)domain\.com(?:\/)("|')>/i

That's my best guess.

Chris

[This message has been edited by Digital Concepts (edited November 01, 1999).]
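In a regex, `(?:...)` only groups; it is a `?` placed after a group (or a single character) that makes it optional. A quick check of the difference, using Chris's guess as written and the same pattern with `?` added after `(?:www\.)` and the slash; `domain.com` is a placeholder.

```perl
use strict;
use warnings;

my $required = qr/<a\shref=("|')(?:www\.)domain\.com(?:\/)("|')>/i;   # groups must appear
my $optional = qr/<a\shref=("|')(?:www\.)?domain\.com\/?("|')>/i;     # ? makes them optional

for my $html ('<a href="domain.com">', '<a href="www.domain.com">', '<a href="www.domain.com/">') {
    printf "%-28s required: %-8s optional: %s\n", $html,
        ($html =~ $required ? 'match' : 'no match'),
        ($html =~ $optional ? 'match' : 'no match');
}
```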
Re: How do I....
Yours only matches when href comes right after <a, so it would miss something like <a target="something" href="...">.

I'll take some time and think about this one...

m,<a[^>]+href=["']?[^"'>]*domain\.com[^"'>]*["']?,i

Hehe...

Oh well...

jerry
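The key change in jerry's version is `<a[^>]+` instead of `<a.+`: `[^>]` cannot cross a `>`, so the search for `href` stays inside one `<a ...>` tag, and `target` may appear before or after it. The sketch below writes the pattern with the period escaped and `*` (not `+`) on either side of the domain, so that `href="domain.com"` and `href="http://domain.com"` also match; `domain.com` is a placeholder and the samples are made up.

```perl
use strict;
use warnings;

# [^>]+ keeps the match inside a single <a ...> tag; ["']? allows any quoting.
my $tag_re = qr,<a[^>]+href=["']?[^"'>]*domain\.com[^"'>]*["']?,i;

my @samples = (
    '<a target="_blank" href="http://www.domain.com/">x</a>',           # target first
    '<a href="domain.com" target="_blank">x</a>',                       # target after
    '<a href=http://domain.com>x</a>',                                  # unquoted
    '<a href="/links.html">x</a> <img src="http://domain.com/b.gif">',  # image only
);
for my $html (@samples) {
    print(($html =~ $tag_re ? "match   " : "no match"), "  $html\n");
}
```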
Re: How do I....
Here we go. This will do it perfectly, or at least well enough for what I need:

/href=("|')(?:www\.)?domain\.com\/?("|')/i

This will match:
href="domain.com"
href="www.domain.com"

If someone uses a target="", it won't matter, and it won't matter whether the target="" comes before or after the href="" either.
All I care about is finding href="domain.com" on the page; more than likely that means they have added the return link.
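Chris's final approach can be exercised against the forms he lists, with `target` on either side of the `href`. This sketch writes the groups as `(?:www\.)?` and `\/?` so that both really are optional (without the `?`, `www.` and a trailing slash would be required); that reading of his intent is an assumption, and `domain.com` is a placeholder.

```perl
use strict;
use warnings;

# Anchors only on href=..., so attributes around it do not matter.
my $href_re = qr/href=("|')(?:www\.)?domain\.com\/?("|')/i;

my @samples = (
    '<a href="domain.com">x</a>',
    '<a target="_top" href="www.domain.com">x</a>',
    q{<a href='www.domain.com/' target="_top">x</a>},
    '<a href="http://www.domain.com/">x</a>',
);
for my $html (@samples) {
    print(($html =~ $href_re ? "match   " : "no match"), "  $html\n");
}
```

The last sample shows a limit of the pattern as written: a link with an `http://` prefix, which most real links carry, is not matched.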
Re: How do I....
Oh, OK. But it won't match domain.com/index.html.

jerry
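To cover jerry's `domain.com/index.html` case, the optional slash can become an optional path, `(?:/[^"']*)?`. The sketch below also allows an optional `http://` prefix, since real links usually carry one. Both extensions are assumptions layered on Chris's pattern, and `domain.com` is a placeholder.

```perl
use strict;
use warnings;

# Optional scheme, optional www., then an optional /path, up to the closing quote.
my $href_re = qr{href=["'](?:http://)?(?:www\.)?domain\.com(?:/[^"']*)?["']}i;

for my $html (
    '<a href="domain.com/index.html">x</a>',
    '<a href="http://www.domain.com/">x</a>',
    '<a href="http://domain.com">x</a>',
    '<a href="http://www.domain.community.org/">x</a>',
) {
    print(($html =~ $href_re ? "match   " : "no match"), "  $html\n");
}
```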
Re: How do I....
People like me would do this:

<!--href="domain.com"-->

but it'd take a while to figure it out.

That is why I'd rather do something like what LinkExchange does: make sure their page generates at least one impression of an image, say once a week or so. Dunno.

jerry
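jerry's commented-out link can at least be filtered: strip HTML comments before matching, so an `href` that only appears inside `<!-- ... -->` no longer counts. A sketch, assuming comments are well-formed; `strip_html_comments` is a hypothetical helper, the pattern is a simplified placeholder for whichever one you use, and a determined site owner could still hide the link in other ways.

```perl
use strict;
use warnings;

# Hypothetical helper: remove <!-- ... --> blocks. The non-greedy .*? stops at
# the first -->, and /s lets . match newlines so multi-line comments go too.
sub strip_html_comments {
    my ($html) = @_;
    (my $clean = $html) =~ s/<!--.*?-->//gs;
    return $clean;
}

my $href_re = qr{href=["'](?:http://)?(?:www\.)?domain\.com}i;

my $page = "<p>Links</p>\n<!--href=\"domain.com\"-->\n";
print "raw:     ", ($page =~ $href_re ? "found" : "not found"), "\n";
print "cleaned: ", (strip_html_comments($page) =~ $href_re ? "found" : "not found"), "\n";
```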
Re: How do I....
Yeah, I know. It's got flaws, but it's something. Maybe later I will do something more advanced.
I'm mainly doing it because I've had a big problem with people misspelling URLs (e.g. domain.co/page.htl), or putting the return link on some links page but listing their main page as the return link, so then I have to search for it, etc.

Chris