Gossamer Forum
Precheck URL
I just want to know whether there is any global or add-on that prechecks a URL while it is being added.
If not, maybe someone has some ideas on how to check it?

Take websites like www.websitebuilder.com/sabrinaswebseite/home.htm:
I can't check only websitebuilder.com, but checking the whole URL is not enough either, because we also don't want to have www.websitebuilder.com/sabrinaswebseite/homeandhome.htm.

Maybe I have to send back everything I can find and let the user decide whether they still want to add it?
And maybe accept everything that is not clearly a duplicate.
Re: [Robert] Precheck URL In reply to
Hi,

You could do it with curl:

Code:
curl -o /dev/null --silent --head --write-out '%{http_code}\n' https://www.websitebuilder.com/sabrinaswebseite/home.htm
404

So as a perl script, something like:

Code:
# $URL is passed to the shell, so validate it before using it here
my $status = `curl -o /dev/null --silent --head --write-out '%{http_code}' '$URL'`;
if ($status eq "404") {
# it's a 404 page
}

Obviously this relies on the server sending the correct 404 status header back :)
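
If shelling out to curl is not an option, the same HEAD check can be sketched in pure Perl. This is only a rough sketch, assuming the core HTTP::Tiny module is available; the names precheck_url and classify_status are made up for illustration, and treating 410 the same as 404 is my own assumption.

```perl
use strict;
use warnings;
use HTTP::Tiny;    # core module since Perl 5.14

# Map an HTTP status code to a coarse verdict for the submission form.
sub classify_status {
    my ($status) = @_;
    return 'ok'      if $status >= 200 && $status < 400;
    return 'missing' if $status == 404 || $status == 410;
    return 'unknown';    # timeouts, 5xx, servers that refuse HEAD, etc.
}

# Send a HEAD request and classify the result.
sub precheck_url {
    my ($url) = @_;
    my $http = HTTP::Tiny->new( timeout => 10 );
    my $res  = $http->head($url);    # HTTP::Tiny reports its own failures as status 599
    return classify_status( $res->{status} );
}
```

A result of 'missing' from precheck_url($URL) could then reject the submission, while 'unknown' is probably better passed on to an admin than rejected outright.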

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Precheck URL In reply to
Thank you.

Checking the URL is one thing, but I want to know whether a URL was added before.

Then I have to parse the URL with all the different possibilities:

https://
http://
www.domain
domain
domain.xxx (we accept it, but the admin should know that the same domain already exists with a different ending)
and
www.domain.com/ and www.domain.com
and maybe www.domain.com/index.php, index.html, home.php and so on.
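
The variants listed above can be collapsed into one comparison key before anything is looked up. The following is only a sketch under a few assumptions: the function name normalize_url is invented, the list of "default page" file names is a guess that would need tuning, and dropping the query string is my own choice. It does not detect the same domain under a different ending (domain.xxx).

```perl
use strict;
use warnings;

# Reduce a submitted URL to a comparison key so that common spellings
# of the same address compare equal.
sub normalize_url {
    my ($url) = @_;
    $url =~ s/^\s+|\s+$//g;      # trim surrounding whitespace
    $url =~ s{^https?://}{}i;    # drop the scheme
    $url =~ s{^www\.}{}i;        # drop a leading www.
    $url =~ s{[?#].*$}{};        # drop query string and fragment (assumption)

    my ( $host, $path ) = split m{/}, $url, 2;
    $host = lc( $host // '' );   # host names are case-insensitive
    $path //= '';

    # Hypothetical list of default pages: index/home/default + php/htm/html
    $path =~ s{(?:^|/)(?:index|home|default)\.(?:php|html?)$}{}i;
    $path =~ s{/+$}{};           # drop trailing slashes

    return length $path ? "$host/$path" : $host;
}
```

With this, https://www.domain.com/index.php, www.domain.com/ and domain.com all reduce to domain.com, so the stored key and the submitted key can be compared with a plain equality test.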

I would like to know if there is anything, maybe a Perl or PHP library, an article or something else about this problem.

I see some approaches, but still no solution that fits everything.

What is the query to test the URL?
Should I just show everything that is more or less equal and ask the user whether they still want to submit the new URL?
Like: Hey, we have

....
....
....

Do you still want to add your url? Be careful not to add twice, because then we ... send a rocket to your home town ...


What do I do with lala.wordpress.com or websitebuilder.com/annabelle? Do I need lists of such domains?

Last edited by: Robert: Aug 8, 2016, 2:55 AM
Re: [Robert] Precheck URL In reply to
Hi,

It's tricky, because as you say, if someone is on xxx.wordpress.com and someone else adds yyy.wordpress.com, a check on the domain alone could still treat it as a duplicate. You would need to create a whitelist of domains where you *don't* care if they have multiple sub-domains.

Here is something to get you started. The easiest way to handle the www versions in Perl would be something like:

Code:
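sub check_dup {

my $url = $_[0];

# strip the scheme, a leading www. and any trailing slash
$url =~ s/^https?:\/\///i;
$url =~ s/^www\.//i;
$url =~ s/\/+$//;

# URL would now be foo.com (or foo.com/test if a path was given)

# check to see if this URL exists...
# note: the leading % also matches hosts that merely end in $url, e.g. barfoo.com
my $cond = new GT::SQL::Condition;
$cond->add("URL","LIKE","%$url");
$cond->add("URL","LIKE","%$url/");
$cond->add("URL","LIKE","%$url/%");
$cond->bool("OR");
my $check = $DB->table("Links")->count( $cond );

# $check will be 0 if there are no matches, or > 0 if we found a match
return $check;

}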

This would work with:

https://foo.com
http://foo.com
https://foo.com/test
http://foo.com/test
https://www.foo.com
https://www.foo.com/foo
etc
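
For shared hosts like the ones mentioned earlier in this thread, one option is to decide per host which part of the URL counts as "one site". This is only a rough sketch: the function name site_key and the %PATH_HOSTS list are invented for illustration, and the list would have to be maintained by hand. Sub-domain hosts such as lala.wordpress.com need no entry here, because the full host name is kept as the key.

```perl
use strict;
use warnings;

# Hypothetical list of hosts where each site lives under its own first
# path segment, e.g. websitebuilder.com/annabelle
my %PATH_HOSTS = map { $_ => 1 } qw(websitebuilder.com);

# Return the part of a URL that identifies "one site" for duplicate checks.
sub site_key {
    my ($url) = @_;
    $url =~ s{^https?://}{}i;    # drop the scheme
    $url =~ s{^www\.}{}i;        # drop a leading www.

    my ( $host, $path ) = split m{/}, $url, 2;
    $host = lc( $host // '' );
    $path //= '';

    # Listed hosts: the key is the host plus the first path segment
    if ( $PATH_HOSTS{$host} ) {
        my ($first) = split m{/}, $path;
        return "$host/$first" if defined $first && length $first;
    }

    # Everything else: one entry per host name (sub-domains stay distinct)
    return $host;
}
```

Both home.htm and homeandhome.htm under www.websitebuilder.com/sabrinaswebseite/ then share the key websitebuilder.com/sabrinaswebseite, while xxx.wordpress.com and yyy.wordpress.com keep separate keys.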

Cheers

Andy (mod)