
Mailing List Archive: Perl: porters

[Slaven Rezic <slaven@rezic.de>] Another regexp failure with utf8-flagged string and byte-flagged pattern

 

 



slaven at rezic

Nov 4, 2007, 9:34 AM

Post #1 of 3 (74 views)
[Slaven Rezic <slaven@rezic.de>] Another regexp failure with utf8-flagged string and byte-flagged pattern

The attached mail seems to have gotten lost the first time.

Regards,
Slaven
Attachments: message-rfc822.eml (4.72 KB)


andreas.koenig.7os6VVqR at franz

Nov 14, 2007, 11:12 PM

Post #2 of 3 (55 views)
Re: [Slaven Rezic <slaven@rezic.de>] Another regexp failure with utf8-flagged string and byte-flagged pattern

>>>>> On 04 Nov 2007 18:34:57 +0100, Slaven Rezic <slaven[at]rezic.de> said:

> The attached mail seems to have gotten lost the first time.

Not only lost the first time, also warnocked the second time. And no
bugnumber. Could some kind soul with more RT fu than Slaven and me get
this thing perlbugged?

What is interesting about this bug is that it has actually been fixed in
the maint track:

Change 25568 by nicholas[at]nicholas-saigo on 2005/09/22 12:22:36

Integrate:
(the tests from)
[ 24044]
Subject: Re: Reworked Trie Patch
From: demerphq <demerphq[at]gmail.com>
Date: Mon, 14 Mar 2005 08:55:39 +0100
Message-ID: <9b18b31105031323557019ae1[at]mail.gmail.com>

Subject: Re: Reworked Trie Patch
From: demerphq <demerphq[at]gmail.com>
Date: Wed, 16 Mar 2005 19:48:18 +0100
Message-ID: <9b18b31105031610481025a080[at]mail.gmail.com>

Plus minor nits in the documentation of re.pm,
a version bump, and addition of an OPTIMIZE alias

[ 25095]
[perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
$utf8 =~ /latin/i didn't match.
Also added TODO for $latin =~ /utf8/i which also fails

[ 25106]
Re: [perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
From: demerphq <demerphq[at]gmail.com>
Message-ID: <9b18b3110507080807f16d1eb[at]mail.gmail.com>
Date: Fri, 8 Jul 2005 17:07:26 +0200

Fix trie codepath of mixed utf8/latin1 pattern matches


But it is not fixed in any of my bleadperl versions.


--
andreas


nick at ccl4

Nov 17, 2007, 8:56 AM

Post #3 of 3 (53 views)
Re: [Slaven Rezic <slaven@rezic.de>] Another regexp failure with utf8-flagged string and byte-flagged pattern

On Thu, Nov 15, 2007 at 10:14:29PM +0100, Andreas J. Koenig wrote:
> >>>>> On Thu, 15 Nov 2007 10:15:09 +0000, Nicholas Clark <nick[at]ccl4.org> said:
>
> > Do we know what change caused the regression?
>
> I posted it this morning. The term regression is misleading. It has been
> (accidentally?) fixed in maintperl with
>
> Change 25568 by nicholas[at]nicholas-saigo on 2005/09/22 12:22:36

On Thu, Nov 15, 2007 at 08:12:25AM +0100, Andreas J. Koenig wrote:

> What is interesting about this bug is that it has actually been fixed in
> the maint track:
>
> Change 25568 by nicholas[at]nicholas-saigo on 2005/09/22 12:22:36
>
> Integrate:
> (the tests from)
> [ 24044]
> Subject: Re: Reworked Trie Patch
> From: demerphq <demerphq[at]gmail.com>
> Date: Mon, 14 Mar 2005 08:55:39 +0100
> Message-ID: <9b18b31105031323557019ae1[at]mail.gmail.com>
>
> Subject: Re: Reworked Trie Patch
> From: demerphq <demerphq[at]gmail.com>
> Date: Wed, 16 Mar 2005 19:48:18 +0100
> Message-ID: <9b18b31105031610481025a080[at]mail.gmail.com>
>
> Plus minor nits in the documentation of re.pm,
> a version bump, and addition of an OPTIMIZE alias
>
> [ 25095]
> [perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
> $utf8 =~ /latin/i didn't match.
> Also added TODO for $latin =~ /utf8/i which also fails
>
> [ 25106]
> Re: [perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
> From: demerphq <demerphq[at]gmail.com>
> Message-ID: <9b18b3110507080807f16d1eb[at]mail.gmail.com>
> Date: Fri, 8 Jul 2005 17:07:26 +0200
>
> Fix trie codepath of mixed utf8/latin1 pattern matches
>
>
> But it is not fixed in any of my bleadperl versions.

Because, as Slaven deduced, it's not in any code that was ever in maint.
I guess it's two bugs.

On Sat, Nov 17, 2007 at 04:29:29PM +0100, Slaven Rezic wrote:
> Slaven Rezic <slaven[at]rezic.de> writes:
>
> > I'd like to remind everyone that there's still a regression when
> > matching a utf8-flagged string with a byte-flagged pattern.
> >
> > The following script works fine on 5.8.8 but shows a couple of "not
> > ok"s on 5.10.0. The "not ok"s seem to have in common that they occur
> > only for uppercase characters, probably because these have a folded
> > lowercase equivalent (the mu is also on the list and probably also has
> > some kind of folded equivalent).
> >
> > #!/usr/bin/perl -w
> > use strict;
> > for my $chr (160 .. 255) {
> >     my $chr_byte = chr($chr);
> >     my $chr_utf8 = chr($chr); utf8::upgrade($chr_utf8);
> >     my $rx = qr{$chr_byte|X}i;
> >     print $chr . " " . ($chr_utf8 =~ $rx ? "ok" : "not ok") . "\n";
> > }
> > __END__
> >
>
> I have a patch for this problem, including a new test case for
> t/op/pat.t. Please try and comment.

Works for me. But it's Rafael's call.

Specifically, once I integrate the regression tests, but before rebuilding
perl, I can also make the tests pass with

./perl -e '${^RE_TRIE_MAXBUF} = -1; do shift' t/op/pat.t

so it has to be a bug in the trie code.
(Your script I quoted also passes when run in that fashion)
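
For anyone who wants to repeat that comparison inside a single process, here is a rough sketch rather than anything taken from the patch. It runs the loop from the quoted script twice, once with the default settings and once with ${^RE_TRIE_MAXBUF} set to a negative value, which perlvar documents as preventing the trie optimisation. The helper name failing_code_points is made up for illustration. The pattern is interpolated, so it is compiled at run time, after the variable has been changed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run the match loop from the quoted script and
# return the code points for which the match fails.
sub failing_code_points {
    my @failed;
    for my $cp (160 .. 255) {
        my $chr_byte = chr($cp);
        my $chr_utf8 = chr($cp);
        utf8::upgrade($chr_utf8);         # same character, utf8-flagged
        my $rx = qr{$chr_byte|X}i;        # recompiled on every iteration
        push @failed, $cp unless $chr_utf8 =~ $rx;
    }
    return @failed;
}

# First pass: default settings, trie optimisation allowed.
my @with_trie = failing_code_points();

# Second pass: a negative value asks the regex compiler to skip the
# trie optimisation (see ${^RE_TRIE_MAXBUF} in perlvar).
my $saved = ${^RE_TRIE_MAXBUF};
${^RE_TRIE_MAXBUF} = -1;
my @without_trie = failing_code_points();
${^RE_TRIE_MAXBUF} = $saved;

print "trie enabled:  ", (@with_trie    ? "@with_trie"    : "none"), "\n";
print "trie disabled: ", (@without_trie ? "@without_trie" : "none"), "\n";
```

If the two lines of output differ on a given perl, that points at the trie code path, which is the same reasoning as with the one-liner above.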

Nicholas Clark
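
Slaven's guess in the quoted mail, that the failing characters are the ones with a folded equivalent, can be checked separately from the regex engine. The following is only a sketch and assumes a much newer perl than the ones discussed in this thread: fc() requires perl 5.16 or later and does not exist in 5.8.8 or 5.10.0. It lists the code points in 160..255 whose Unicode case fold differs from the character itself, which can then be compared by hand with the "not ok" lines from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature 'fc';    # assumption: perl 5.16 or later

# Collect the code points in 160..255 whose case fold is not the
# character itself.
my @folding;
for my $cp (160 .. 255) {
    my $chr = chr($cp);
    utf8::upgrade($chr);    # utf8-flagged, so Unicode folding rules apply
    push @folding, $cp if fc($chr) ne $chr;
}

print "code points with a distinct case fold: @folding\n";
```

The utf8::upgrade call matters here: without it (and without the unicode_strings feature) perl applies only ASCII case rules to a byte string, so nothing in this range would be reported.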
