Gossamer Forum
Home : Products : Links 2.0 : Customization :

AutoKill Bad Links Mod for nph-verify

Quote Reply
AutoKill Bad Links Mod for nph-verify
 
Here's a mod you can run thru cron to autokill any 404 or 403, etc. file not found links in the database.

I've tried it a bit, and it looks like it 90% works. I havent really been able to test it much, but I was hoping someone would be able to get it in 100% working order. Please mail me if you do get it working 100%!!

In nph-verify.cgi:

in sub verify_links, just add a few entries to the variables:

sub verify_links {
# -----------------------------------------------------

use strict;
use vars qw(%urls %code %msg $method $db_url $db_key_pos $db_key $db_script_url $db_file_name $dbj @dblines);


now, in validator.pm add the following in the failure category, so it looks like this:

sub on_failure {
# ---------------------------------------------------------------
# Get's called whenever we fail right away.
#
my ($self, $request, $response, $entry) = @_;
if ($response) {
my $url = $request->url;
my $id = $main::urls{$url};
my $code = $response->code;
my $msg = $response->message;
chomp ($msg); chomp ($url);

$main::code{$url} = $code;
$main::msg {$url} = $msg;





# autokill code

@dblines = <DB>;
close(DB);

foreach ($dbj = 0;$dbj <= @dblines;$dbj++) {
if ($dblines[$dbj] =~ /$id|/) {
splice(@dblines, $dbj, 1);
$dbj -= 1;
}
}

open (DB, "<$db_file_name") or &cgierr("error in delete_records. unable to open db file: $db_file_name.\nReason: $!");
flock(DB, 2);
print DB @dblines;
close(DB);
open (DB, "<$db_file_name") or &cgierr("error in delete_records. unable to open db file: $db_file_name.\nReason: $!");
flock(DB, 2);
### end autokill


Now, I have to make sure you know there are some errors with this - but I cant see where. That's basically why I'm posting it. I'm not currently sure by looking at it whether it is working 100% or not. ie. this is completely alpha code - just wanted to make that clear.


Quote Reply
Re: AutoKill Bad Links Mod for nph-verify In reply to
Nice Mod...Haven't tested it...yet.

One suggestion (if I may)...I would suggest rather than automatically killing the link, that the link is taken out of the production database (links.db) and put into a temporary holding file (check.db) similar to what Bmxer is working on in terms of putting links on hold (for advertising/marketing purposes). Of course, this would increase the time frame for executing the nph-verify.cgi (but done through Cron, it would work just fine).

Rather than automatically deleting the link, it would be nice to have a method of manually checking the links deleted or at least have the links available to look at later. But in terms of pulling the links out of the production database (live), I like that a lot.

Great job!

Regards,

------------------
Eliot Lee
Anthro TECH,L.L.C
www.anthrotech.com
----------------------


Quote Reply
Re: AutoKill Bad Links Mod for nph-verify In reply to
 
Ya, the point is - I dont want to have to go through each 404 File Not Found link and manually delete it....

I've got a few thousand links and I always have to put a day aside just to verify and delete the bad links...

It's just good to do it automagically... like Yahoo, ie. has a crawler that goes thru all the links. If it finds a 404 in its databse, it just gets wiped out...
Quote Reply
Re: AutoKill Bad Links Mod for nph-verify In reply to
Hmmm,

is this the latest info?

1) Do you have found the last 10% ?
2) I can't understand: You kill only the
4xx messages, correct?

Another thing:
A autokill feature for *exaktly* same URLs. Any ideas?

(It should work for databases with 100.000 entries.

Martin Ebert