Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

Why should these two regexes behave differently?

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


public at khwilliamson

Aug 4, 2012, 3:13 PM

Post #1 of 4 (60 views)
Permalink
Why should these two regexes behave differently?

I'm trying to start to understand the backtracking control verbs. I
don't understand why these two things differ:

blead -E 'say "foo" =~ / (?: (?! foo | bar ) ) . /x || 0;'
1

blead -E 'say "foo" =~ / (?: (?= foo | bar ) (*F) ) . /x || 0;'
0


l.mai at web

Aug 4, 2012, 4:24 PM

Post #2 of 4 (54 views)
Permalink
Re: Why should these two regexes behave differently? [In reply to]

On 2012-08-04 Karl Williamson wrote:

> I'm trying to start to understand the backtracking control verbs. I
> don't understand why these two things differ:
>
> blead -E 'say "foo" =~ / (?: (?! foo | bar ) ) . /x || 0;'
> 1
>
> blead -E 'say "foo" =~ / (?: (?= foo | bar ) (*F) ) . /x || 0;'
> 0

I don't understand why you think these two regexes should be similar.


[Regex #1]

Let's simplify a bit:

/ (?: (?! foo | bar ) ) . /x

# eliminate /x
/(?:(?!foo|bar))./

# remove redundant (?: )
/(?!foo|bar)./

=> "at a position not followed by foo or bar, match a non-newline
character"

Our string is "foo".

First we attempt to match at offset 0. This fails because offset 0 is
indeed followed by "foo".

Next we attempt to match at offset 1. This succeeds because offset 1 is
followed by "oo", which is matched by neither "foo" nor "bar". Then we
match a non-newline character ("o"). We've run out of regex: success.

Complete success at offset 1.


[Regex #2]

Simplify:

/ (?: (?= foo | bar ) (*F) ) . /x

# eliminate /x
/(?:(?=foo|bar)(*F))./

# remove redundant (?: )
/(?=foo|bar)(*F)./

=> "at a position followed by foo or bar, fail, then match a
non-newline character"

This regex can never succeed because it contains a non-optional (*F)
(and by that I mean every possible path through the regex reaches
(*F)). Reaching (*F) makes the current match attempt fail.

With our string of "foo":

Attempt to match at offset 0. (?=foo|bar) succeeds because offset 0 is
indeed followed by "foo". Then we hit (*F) and fail. There are no
recorded backtracking positions, so this offset fails completely.

Attempt to match at offset 1, 2, 3. We immediately fail because
(?=foo|bar) fails.

We've run out of possible match positions and fail the whole match.


public at khwilliamson

Aug 4, 2012, 8:10 PM

Post #3 of 4 (51 views)
Permalink
Re: Why should these two regexes behave differently? [In reply to]

On 08/04/2012 05:24 PM, Lukas Mai wrote:
> On 2012-08-04 Karl Williamson wrote:
>
>> I'm trying to start to understand the backtracking control verbs. I
>> don't understand why these two things differ:
>>
>> blead -E 'say "foo" =~ / (?: (?! foo | bar ) ) . /x || 0;'
>> 1
>>
>> blead -E 'say "foo" =~ / (?: (?= foo | bar ) (*F) ) . /x || 0;'
>> 0
>
> I don't understand why you think these two regexes should be similar.
>
>
> [Regex #1]
>
> Let's simplify a bit:
>
> / (?: (?! foo | bar ) ) . /x
>
> # eliminate /x
> /(?:(?!foo|bar))./
>
> # remove redundant (?: )
> /(?!foo|bar)./
>
> => "at a position not followed by foo or bar, match a non-newline
> character"
>
> Our string is "foo".
>
> First we attempt to match at offset 0. This fails because offset 0 is
> indeed followed by "foo".
>
> Next we attempt to match at offset 1. This succeeds because offset 1 is
> followed by "oo", which is matched by neither "foo" nor "bar". Then we
> match a non-newline character ("o"). We've run out of regex: success.
>
> Complete success at offset 1.
>
>
> [Regex #2]
>
> Simplify:
>
> / (?: (?= foo | bar ) (*F) ) . /x
>
> # eliminate /x
> /(?:(?=foo|bar)(*F))./
>
> # remove redundant (?: )
> /(?=foo|bar)(*F)./

I don't believe the (?: ) is redundant. The *F applies to its subpattern.

>
> => "at a position followed by foo or bar, fail, then match a
> non-newline character"
>
> This regex can never succeed because it contains a non-optional (*F)
> (and by that I mean every possible path through the regex reaches
> (*F)). Reaching (*F) makes the current match attempt fail.
>
> With our string of "foo":
>
> Attempt to match at offset 0. (?=foo|bar) succeeds because offset 0 is
> indeed followed by "foo". Then we hit (*F) and fail. There are no
> recorded backtracking positions, so this offset fails completely.
>
> Attempt to match at offset 1, 2, 3. We immediately fail because
> (?=foo|bar) fails.
>
> We've run out of possible match positions and fail the whole match.
>


public at khwilliamson

Aug 4, 2012, 9:23 PM

Post #4 of 4 (52 views)
Permalink
Re: Why should these two regexes behave differently? [In reply to]

Never mind. My folly is becoming apparent to me.

On 08/04/2012 09:10 PM, Karl Williamson wrote:
> On 08/04/2012 05:24 PM, Lukas Mai wrote:
>> On 2012-08-04 Karl Williamson wrote:
>>
>>> I'm trying to start to understand the backtracking control verbs. I
>>> don't understand why these two things differ:
>>>
>>> blead -E 'say "foo" =~ / (?: (?! foo | bar ) ) . /x || 0;'
>>> 1
>>>
>>> blead -E 'say "foo" =~ / (?: (?= foo | bar ) (*F) ) . /x || 0;'
>>> 0
>>
>> I don't understand why you think these two regexes should be similar.
>>
>>
>> [Regex #1]
>>
>> Let's simplify a bit:
>>
>> / (?: (?! foo | bar ) ) . /x
>>
>> # eliminate /x
>> /(?:(?!foo|bar))./
>>
>> # remove redundant (?: )
>> /(?!foo|bar)./
>>
>> => "at a position not followed by foo or bar, match a non-newline
>> character"
>>
>> Our string is "foo".
>>
>> First we attempt to match at offset 0. This fails because offset 0 is
>> indeed followed by "foo".
>>
>> Next we attempt to match at offset 1. This succeeds because offset 1 is
>> followed by "oo", which is matched by neither "foo" nor "bar". Then we
>> match a non-newline character ("o"). We've run out of regex: success.
>>
>> Complete success at offset 1.
>>
>>
>> [Regex #2]
>>
>> Simplify:
>>
>> / (?: (?= foo | bar ) (*F) ) . /x
>>
>> # eliminate /x
>> /(?:(?=foo|bar)(*F))./
>>
>> # remove redundant (?: )
>> /(?=foo|bar)(*F)./
>
> I don't believe the (?: ) is redundant. The *F applies to its subpattern.
>
>>
>> => "at a position followed by foo or bar, fail, then match a
>> non-newline character"
>>
>> This regex can never succeed because it contains a non-optional (*F)
>> (and by that I mean every possible path through the regex reaches
>> (*F)). Reaching (*F) makes the current match attempt fail.
>>
>> With our string of "foo":
>>
>> Attempt to match at offset 0. (?=foo|bar) succeeds because offset 0 is
>> indeed followed by "foo". Then we hit (*F) and fail. There are no
>> recorded backtracking positions, so this offset fails completely.
>>
>> Attempt to match at offset 1, 2, 3. We immediately fail because
>> (?=foo|bar) fails.
>>
>> We've run out of possible match positions and fail the whole match.
>>
>

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.