
Mailing List Archive: Perl: porters

[BUG] [Regex] Is it a bug? Doing /\G(?=\<)/g twice.

 

 



shlomif at iglu

Nov 4, 2009, 5:43 AM

Post #1 of 3

Hi all!

Today I ran into a problem with some parsing code of mine, and I was able to
reduce it into the following testcase:

----------------------
#!/usr/bin/perl

use strict;
use warnings;

my $text = "<title>";
pos($text) = 0;
print (($text =~ m{\G(?=<)}cg) ? "True" : "False");
print "\n";
print 'pos($text) = ', pos($text), "\n";
print (($text =~ m{\G(?=<)}cg) ? "True" : "False");
print "\n";
----------------------

This prints "True" and then "False". The latter surprised me. After chatting
about it on irc://irc.perl.org/p5p , nothingmuch gave the following
explanation:

---------------------
#!/usr/bin/perl

use strict;
use warnings;

use Devel::Peek;
no warnings 'uninitialized';

local $\ = "\n";

my $text = "<title>";

print "pristine ", pos($text);
Dump($text);

pos($text) = 0;
print "reset pos ", pos($text);
Dump($text);

print (($text =~ m{\G}cg) ? "True" : "False");
print "after successful match pos is ", pos($text);
Dump($text);

print (($text =~ m{\G}g) ? "True" : "False");
print "after failed match ", pos($text);
Dump($text);
---------------------

However, it seems to me that, implementation details aside, the code above
should just work and print "True" twice.

I've tried it with the Mandriva Cooker perl, perl-5.8.8 and perl-5.10.x-latest,
and they all exhibit the same bug.

Could anyone comment on it? Is this behaviour (arguably mis-behaviour)
documented anywhere or can anyone give me a good reason for why it should not
work?

This example is a little simplified - my real code is somewhat more complex
and less silly.

Regards,

Shlomi Fish

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Freecell Solver - http://fc-solve.berlios.de/

Chuck Norris read the entire English Wikipedia in 24 hours. Twice.


zefram at fysh

Nov 4, 2009, 5:58 AM

Post #2 of 3

Shlomi Fish wrote:
>print (($text =~ m{\G(?=<)}cg) ? "True" : "False");

Zero-width match. perlre(1):

# Repeated Patterns Matching a Zero-length Substring
...
# Thus Perl allows such constructs, by forcefully breaking the infinite
# loop. The rules for this are different for lower-level loops given by
# the greedy quantifiers "*+{}", and for higher-level ones like the "/g"
# modifier or split() operator.
#
# The lower-level loops are interrupted (that is, the loop is broken)
# when Perl detects that a repeated expression matched a zero-length
# substring.
...
# Similarly, for repeated "m/()/g" the second-best match is the match at
# the position one notch further in the string.

Not a bug.

-zefram
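
The rule quoted above can be checked with a small variation of the original
testcase. This is only a sketch: lookahead_twice is a made-up helper name, and
the workaround relies on the statement in the pos entry of perlfunc that
setting pos also resets the "matched with zero-length" flag.

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): run the zero-width match twice
# at offset 0 and return both results as 1 or 0.
# If $reset is true, pos() is reassigned between the two matches.
sub lookahead_twice {
    my ($reset) = @_;

    my $text = "<title>";
    pos($text) = 0;

    # First match: zero-length, succeeds and leaves pos($text) at 0.
    my $first = ($text =~ m{\G(?=<)}cg) ? 1 : 0;

    # Assigning to pos() clears the zero-length-match flag, so the next
    # //g match may again match the empty string at the same offset.
    pos($text) = pos($text) if $reset;

    my $second = ($text =~ m{\G(?=<)}cg) ? 1 : 0;

    return ($first, $second);
}

print join(",", lookahead_twice(0)), "\n";    # 1,0 (the behaviour reported)
print join(",", lookahead_twice(1)), "\n";    # 1,1 (flag cleared in between)
```

Without the reset, the second match cannot be zero-length at offset 0 again,
and \G prevents it from matching anywhere else, so it fails.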


demerphq at gmail

Nov 4, 2009, 6:01 AM

Post #3 of 3

2009/11/4 Shlomi Fish <shlomif[at]iglu.org.il>:
> Could anyone comment on it? Is this behaviour (arguably mis-behaviour)
> documented anywhere or can anyone give me a good reason for why it should not
> work?

If it didn't work this way, then s///g or m//g might result in infinite
loops, and split //, $string wouldn't work either.

It is arguable that the *exact* rules for how it works are not so sane, but
at this point it is too late to change them.

cheers,
yves


--
perl -Mre=debug -e "/just|another|perl|hacker/"
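
The infinite-loop concern raised in the last reply can be seen with a pattern
that is able to match the empty string. The following is a minimal sketch;
empty_match_offsets is a made-up helper name, and /x*/ was picked only
because it matches zero characters at every offset of "abc".

```perl
#!/usr/bin/perl

use strict;
use warnings;

# Hypothetical helper (not from the thread): collect pos() after every
# successful //g match of a pattern that can match the empty string.
sub empty_match_offsets {
    my ($text) = @_;

    my @offsets;

    # Each match here is zero-length. Because a second zero-length match at
    # the same offset is refused, the engine moves one character forward
    # each time and the loop terminates at the end of the string.
    while ($text =~ /x*/g) {
        push @offsets, pos($text);
    }

    return @offsets;
}

print join(",", empty_match_offsets("abc")), "\n";    # 0,1,2,3

# split with an empty pattern depends on the same rule.
print join("|", split(//, "abc")), "\n";              # a|b|c
```

If a zero-length match were allowed to repeat at the same offset, the while
loop above would never advance past offset 0.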
