Regex not matching basic word?

I've got a bit of a weird problem here. Basically, I'm trying to grab the contents of a page (in the example below, it's google.com). I then try to find ALL of the words in the $RULES variable. The problem is that it's only matching 2 of the words (job and help). The code is:

Code:
#!/usr/bin/perl

print "Content-type: text/html \n\n";

my $RULES = qq|job
help
tools|;

use LWP::Simple;
my $html = get("http://www.google.com");
$html =~ s/\n//g;

# do counting of rules...needed later.
my @c_rules = split("\n",$RULES);
my $c_rules_cnt = $#c_rules; # get the number of entries..

# just so we can see the entries of the array..
print join(",",@c_rules) . " Words...<BR>";

# see if we can find *ALL* the words...
my $count = 0;
foreach (@c_rules) {
    chomp;
    print "Word: $_ <BR>";
    if ($html =~ /$_/sig) {
        $count++;
        print "<font color=blue>Match good word.. $_ </font><BR>";
    }
} # end 'foreach' for @c_rules

# if we didn't get *ALL* the rules, then we need to skip....

if ($count < $c_rules_cnt) { print "Bad...<BR>"; } else { print "Good..<BR>"; }

If I add something like this:

Code:
if ($html =~ /tools/i) { print "yay!"; } else { print "damn"; }

...it shows "yay!" fine. Can anyone see why my code would do this? Is it a problem with my code, or a problem with the method I'm using?

TIA.

Andy (mod)
andy@ultranerds.co.uk
Want to give me something back for my help? Please see my Amazon Wish List
GLinks ULTRA Package | GLinks ULTRA Package PRO
Links SQL Plugins | Website Design and SEO | UltraNerds | ULTRAGLobals Plugin | Pre-Made Template Sets | FREE GLinks Plugins!
Re: [Andy] Regex not matching basic word? In reply to
Because your chomp is not removing the linefeed... you must have other characters there.

Besides, you're doing extra work: there's no need to make $RULES a string and then split it into an array.

Code:
my @RULES = qw|job help tools|;

# do counting of rules...needed later.
my @c_rules = @RULES;
my $c_rules_cnt = @c_rules;

Defining @c_rules is redundant, though; just use @RULES (or don't use @RULES at all).

--mark

Last edited by: Mark Badolato, Jul 19, 2003, 10:28 AM
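
A side note on the counting line in the snippet above: `$#array` gives the index of the last element, whereas an array evaluated in scalar context gives the number of elements. A minimal sketch of the difference follows; the variable names `$last_index` and `$count` are illustrative only and do not appear in the original posts.

```perl
use strict;
use warnings;

my @rules = qw(job help tools);

my $last_index = $#rules;   # index of the last element (one less than the count)
my $count      = @rules;    # array in scalar context: number of elements

print "last index: $last_index, count: $count\n";
```

With three words the last index is 2 and the count is 3, so a comparison against `$#rules` is off by one.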
Re: [Mark Badolato] Regex not matching basic word? In reply to
Aaah...good point :p Maybe there is a \t in there somewhere?

>>>Besides you're doing extra work, no need to make $RULES a string then split it to an array. <<<

That data was actually coming in from a text box, so I was putting it straight into an array. I only used a string to prove some stuff out.

I'll give that a play, and see how it works.

Cheers

Andy (mod)
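
If the word list really does arrive from a text box, the lines may carry carriage returns, spaces or tabs that `chomp` alone will not remove. The following is only a rough sketch of one way to clean such input; `$textarea` and its contents are made-up stand-ins for the submitted form value.

```perl
use strict;
use warnings;

# Hypothetical form input: CRLF line endings, stray whitespace, a blank line.
my $textarea = "job\r\nhelp \r\n\ttools\r\n\r\n";

my @rules;
foreach my $line (split /\r?\n/, $textarea) {
    $line =~ s/^\s+//;                   # strip leading whitespace (including tabs)
    $line =~ s/\s+$//;                   # strip trailing whitespace (including \r)
    push @rules, $line if length $line;  # skip empty lines
}

print join(",", @rules), "\n";
```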
Re: [Mark Badolato] Regex not matching basic word? In reply to
It's still missing 'tools'. The code is now:

Code:
#!/usr/bin/perl

use strict;
use CGI::Carp qw(fatalsToBrowser);

print "Content-type: text/html \n\n";

my @RULES = qw(job help tools);

use LWP::Simple;
my $html = get("http://www.google.com");
$html =~ s/\n//g;

if ($html =~ /tools/sig) { print "yay!"; } else { print "damn"; }

# do counting of rules...needed later.
my $c_rules_cnt = $#RULES; # get the number of entries..

# now we have the count, let's do some checking to see if we have all
# the required words...if not, then we need to skip this URL, and pass
# back a false value, so we can report that it wasn't added...
my $count = 0;
foreach (@RULES) {
    $_ =~ s/\t\n//g;
    print "Word: $_ <BR>";
    if ($html =~ /$_/sig) {
        $count++;
        print "<font color=blue>Match good word.. $_ </font><BR>";
    }
} # end 'foreach' for @RULES

# if we didn't get *ALL* the rules, then we need to skip this one...



if ($count == $c_rules_cnt) { print "<BR>Bad...<BR>"; } else { print "<BR>Good..<BR>"; }

This is soooo weird!

Cheers

Andy (mod)

Last edited by: Andy, Jul 19, 2003, 11:50 AM
Re: [Andy] Regex not matching basic word? In reply to
Pull the g out of the regex:

if ($html =~ /$_/si) {

The g isn't needed in this case and seems to be having a weird effect. My regex-fu is rusty, though, so consult Friedl's book for details :)
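
For what it's worth, the likely cause is documented Perl behaviour: in scalar context, a successful `m//g` match records where it ended on the target string (see `pos`), and the next `//g` match against that same string starts from there, whatever the pattern is. A failed match resets the position. A small sketch, using a made-up string in place of the fetched page:

```perl
use strict;
use warnings;

# Hypothetical page text: "tools" appears before "job" and "help".
my $html = "webmaster tools, job listings and help pages";

my (@with_g, @without_g);

foreach my $word (qw(job help tools)) {
    # Scalar-context //g: each success leaves pos($html) after the match,
    # so the next word is only searched for from that point onwards.
    push @with_g, $word if $html =~ /$word/ig;
}

pos($html) = undef;   # clear any leftover search position

foreach my $word (qw(job help tools)) {
    # Without /g every match starts at the beginning of the string.
    push @without_g, $word if $html =~ /$word/i;
}

print "with /g:    @with_g\n";
print "without /g: @without_g\n";
```

Here the /g loop finds only job and help, because the search for tools begins after the position where help matched; the loop without /g finds all three.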
Re: [Mark Badolato] Regex not matching basic word? In reply to
WAHOO! Thanks Mark, it's working great now. Much appreciated :)

Cheers

Andy (mod)
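
Pulling the thread's pieces together, a minimal sketch of the all-words check might look like the following. It assumes a hard-coded `$html` string in place of the LWP::Simple fetch, and the helper name `has_all_words` is invented for this example. The `\Q...\E` escaping is an addition, so that any regex metacharacters in a word are matched literally.

```perl
use strict;
use warnings;

# Returns true when every word occurs somewhere in $text (case-insensitive).
# has_all_words is a hypothetical helper, not part of the original script.
sub has_all_words {
    my ($text, @words) = @_;
    # grep in scalar context returns how many words matched; no /g is used.
    my $matched = grep { $text =~ /\Q$_\E/i } @words;
    return $matched == @words;   # compare against the element count, not $#words
}

my $html = "Webmaster Tools, job listings and help pages";   # stand-in for get(...)

print has_all_words($html, qw(job help tools)) ? "Good..\n" : "Bad...\n";
print has_all_words($html, qw(job help news))  ? "Good..\n" : "Bad...\n";
```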