Gossamer Forum
Annoying stuff!
Argh... this is really annoying me! I'm using the following code:

Code:
sub redo_checks {
# -------------------------------------------------------------------
# This subroutine will get called whenever the user clicks
# on 'Redo Checks' in the admin menu. Remember, you need to print
# your own content-type headers; you should use print $IN->header().
#

    print $IN->header();
    my $opts = Links::Plugins->get_plugin_user_cfg('Recip_Link');

    my ($db_con, $error_show, $exists, $html_page, $got);

    # grab the URL that we need to find in the page...
    my $user_url = $opts->{'URL_To_Check_For'};

    my $table = $DB->table('Links');
    $table->select_options('ORDER BY ID DESC');
    my $sth = $table->select;

    while (my $hit = $sth->fetchrow_hashref) {

        # do the checks for each link here!
        my @page = get($hit->{URL});

        # see if @page holds anything... if not, then we didn't get the page
        foreach (@page) {
            $got = 1; last;
        }

        # if $got is false, then we didn't get the page! So report this...
        if (!$got) { print "Could not connect to $hit->{URL} <BR>"; next; }

        # gulp the whole page from the array...
        foreach (@page) { $html_page .= $_; }

        unless ($html_page =~ /$user_url/) {

            my $report = "yes";

            # set their recip status to 'no', if no link was found...
            my $table  = $DB->table('Links');
            my $result = $table->update({ Has_Priority => "0" }, { URL => $hit->{'URL'} }) or die $GT::SQL::error;

            # show a message, as no reciprocal link was found on the page...
            print "$report - <font color=blue>$hit->{URL}</font> - Could not find a reciprocal link on this page. Setting to NULL in database...<BR>";

        } else {

            # update the database if a recip link exists...
            my $table  = $DB->table('Links');
            my $result = $table->update({ Has_Priority => "1" }, { URL => $hit->{'URL'} }) or die $GT::SQL::error;

            # show them a message...
            print "<font color=red>$hit->{URL}</font> - Link was found... setting database to 'yes'<BR>";
        }

    } # end the while...

    print "<BR><BR> <font color=red>COMPLETE!</font>";

}

The idea of this code is that it will go through the entire database, grab each link's URL, stick all of that page's HTML into a variable, and then check whether the reciprocal URL appears in that variable. The URL to check for is grabbed via $opts. The problem I'm having is that it seems to be verifying almost everything as a reciprocal link page, although when visiting the pages this is not true. I'm completely stumped as to why this is happening.

At first I thought the URL to check for was non-existent, but I tried printing it at the beginning of the sub, and it shows up fine.

Any ideas?

Cheers

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Annoying stuff!
I think this here is the culprit.

Code:
# gulp the whole page from the array...
foreach (@page) { $html_page .= $_; }

Make sure you clear out the $html_page variable first; adding $html_page = ''; right before the foreach loop may do the trick.

So what is probably happening is that you're snowballing all the HTML documents into that one variable and checking against the whole thing. Once you download one page that does contain the link, every page after it will appear to as well.
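In other words, the reset needs to happen inside the while loop, right before the page is gulped. This is just a sketch using the variables from the code above (resetting $got as well is my own addition, since it leaks across iterations in the same way):

Code:
# reset the per-link accumulators so pages and flags from earlier
# links don't leak into this iteration's check
$html_page = '';
$got       = 0;

# gulp the whole page from the array...
foreach (@page) { $html_page .= $_; }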
Re: [Aki] Annoying stuff!
Can't you use:

my $string = get($url);

instead of looping it?

Last edited by Paul: Aug 22, 2002, 12:27 PM
Re: [Paul] Annoying stuff!
I don't know enough about LWP to say whether that works. Usually I head for $buf = GT::URI::HTTP->get( $url );
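Dropped into Andy's loop, that would look something like this; just a sketch, and it assumes GT::URI::HTTP->get() returns the page body (as in the one-liner above) and something false when the fetch fails:

Code:
# inside the while loop, replacing the LWP fetch and the array-gulping...
my $html_page = GT::URI::HTTP->get( $hit->{URL} );

# bail out on this link if nothing came back (assumes a failed fetch returns false)
unless ($html_page) {
    print "Could not connect to $hit->{URL} <BR>";
    next;
}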
Re: [Aki] Annoying stuff!
Thanks Aki, you got it in one! I just cleared out $html_page before the foreach, and it worked great :)

Paul, I'll give your idea a go tomorrow (I'm off out now)... thanks for the suggestion.

Andy (mod)
andy@ultranerds.co.uk
Re: [Andy] Annoying stuff!
You should look at doing something like this... index() will be a lot faster than lots of regexes...

Code:
#!/perl/bin/perl
#==========================================================

use strict;
use LWP::Simple;

main();

sub main {
#----------------------------------------------------------
# Paul's method.

    my $url  = 'http://www.gossamer-threads.com';
    my $code = get($url);

    print "Content-type: text/plain\n\n";
    print index($code, 'AutoRespond') > -1 ? 'YEHAWWW' : 'HUH?';
}

This is tested and works.
Re: [Paul] Annoying stuff!
I've implemented a RecipChecker once :) and I found that although index is much faster than a regex, the network latency trivializes the speed difference you gain from index. I found it was an alright compromise, since with the regex I was able to easily add multiple-URL checking, case insensitivity... among other tests.

Code:
my $regex = ref $wanted ? join "|", @$wanted : $wanted;
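Applied to a page it would look roughly like this; only a sketch, and the $wanted list, the quotemeta escaping and the sample $html_page are my own illustration rather than the plugin's actual code:

Code:
my $html_page = '... <a href="HTTP://WWW.EXAMPLE.COM/links/">Links</a> ...';

# accept either a single URL or an arrayref of URLs to look for
my $wanted = [ 'http://www.example.com/links/', 'http://example.com/links/' ];

# escape regex metacharacters, then join the candidates into one alternation
my $regex = ref $wanted ? join "|", map { quotemeta } @$wanted : quotemeta $wanted;

# a single case-insensitive match covers every accepted URL
print $html_page =~ /$regex/i ? "Found a reciprocal link\n" : "No reciprocal link\n";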
Re: [Aki] Annoying stuff!
Yeah I suppose you are right, it sure will take a long time to grab a few thousand pages.

There's no really great solution that I know of :(

>>
my $regex = ref $wanted ? join "|", @$wanted : $wanted;
<<

Let's hope you didn't do:

my $wanted = \'http://www.url.com';

;)
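(ref() is true for any kind of reference, so a scalar ref like that would take the join branch and die on the dereference; roughly:)

Code:
my $wanted = \'http://www.url.com';                      # a scalar ref, not an array ref
my $regex  = ref $wanted ? join "|", @$wanted : $wanted; # dies: "Not an ARRAY reference"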

Last edited by Paul: Aug 22, 2002, 1:32 PM