
nick at ccl4
Nov 17, 2007, 8:56 AM
Post #3 of 3
Re: [Slaven Rezic <slaven@rezic.de>] Another regexp failure with utf8-flagged string and byte-flagged pattern
On Thu, Nov 15, 2007 at 10:14:29PM +0100, Andreas J. Koenig wrote:
> >>>>> On Thu, 15 Nov 2007 10:15:09 +0000, Nicholas Clark <nick[at]ccl4.org> said:
>
>   > Do we know what change caused the regression?
>
> I posted it this morning. The term regression is misleading. It been
> (accidentally?) fixed in maintperl with
>
> Change 25568 by nicholas[at]nicholas-saigo on 2005/09/22 12:22:36

On Thu, Nov 15, 2007 at 08:12:25AM +0100, Andreas J. Koenig wrote:
> Interesting about this bug is that it has been actually fixed in maint
> track:
>
> Change 25568 by nicholas[at]nicholas-saigo on 2005/09/22 12:22:36
>
>     Integrate:
>     (the tests from)
>     [ 24044]
>     Subject: Re: Reworked Trie Patch
>     From: demerphq <demerphq[at]gmail.com>
>     Date: Mon, 14 Mar 2005 08:55:39 +0100
>     Message-ID: <9b18b31105031323557019ae1[at]mail.gmail.com>
>
>     Subject: Re: Reworked Trie Patch
>     From: demerphq <demerphq[at]gmail.com>
>     Date: Wed, 16 Mar 2005 19:48:18 +0100
>     Message-ID: <9b18b31105031610481025a080[at]mail.gmail.com>
>
>     Plus minor nits in the documentation of re.pm,
>     a version bump, and addition of an OPTIMIZE alias
>
>     [ 25095]
>     [perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
>     $utf8 =~ /latin/i didn't match.
>     Also added TODO for $latin =~ /utf8/i which also fails
>
>     [ 25106]
>     Re: [perl #36207] UTF8/Latin 1/i regexp "Malformed character" warning
>     From: demerphq <demerphq[at]gmail.com>
>     Message-ID: <9b18b3110507080807f16d1eb[at]mail.gmail.com>
>     Date: Fri, 8 Jul 2005 17:07:26 +0200
>
>     Fix trie codepath of mixed utf8/latin1 pattern matches
>
>
> But is not fixed in any of my bleadperl versions.

Because, as Slaven deduced, it's not in any code ever in maint.
I guess it's two bugs.

On Sat, Nov 17, 2007 at 04:29:29PM +0100, Slaven Rezic wrote:
> Slaven Rezic <slaven[at]rezic.de> writes:
>
> > I'd like to remind that there's still a regression when matching a
> > utf8-flagged string with a byte-flagged pattern.
> >
> > The following script works fine on 5.8.8 but shows a couple of "not
> > ok"s on 5.10.0. The "not ok"s seem to have in common that it's only
> > for uppercase characters, probably because they have a folded
> > lowercase equivalent (the mu is also on the list and has probably also
> > some kind of folded equivalent).
> >
> > #!/usr/bin/perl -w
> > use strict;
> > for my $chr (160 .. 255) {
> >     my $chr_byte = chr($chr);
> >     my $chr_utf8 = chr($chr); utf8::upgrade($chr_utf8);
> >     my $rx = qr{$chr_byte|X}i;
> >     print $chr . " " . ($chr_utf8 =~ $rx ? "ok" : "not ok") . "\n";
> > }
> > __END__
> >
>
> I have a patch for this problem, including a new test case for
> t/op/pat.t. Please try and comment.

Works for me. But it's Rafael's call.

Specifically also once I integrate the regression tests, but before rebuilding
perl, I can make the tests pass with

./perl -e '${^RE_TRIE_MAXBUF} = -1; do shift' t/op/pat.t

so it has to be a bug in the trie code.
(Your script I quoted also passes when run in that fashion)

Nicholas Clark
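
For anyone wanting to repeat the comparison described above in a single run, here is a minimal sketch that wraps Slaven's loop in a helper and runs it twice, once with the trie optimisation available and once with it disabled. The helper name failing_codepoints is hypothetical, and the value 65536 is assumed to be the default of ${^RE_TRIE_MAXBUF} as documented in perlvar; it is not taken from this thread. It relies on the pattern being compiled at run time inside the helper, so the localised setting is in effect when qr{} is built.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the code points in 160..255 whose
# utf8-upgraded form fails to match a case-insensitive alternation
# built from the byte form (the loop from the quoted script).
sub failing_codepoints {
    my ($trie_maxbuf) = @_;

    # A negative value prevents the trie optimisation; the setting is
    # read when a pattern is compiled, so localise it before qr{}.
    local ${^RE_TRIE_MAXBUF} = $trie_maxbuf;

    my @failed;
    for my $chr (160 .. 255) {
        my $chr_byte = chr($chr);
        my $chr_utf8 = chr($chr);
        utf8::upgrade($chr_utf8);

        # Interpolation forces compilation here, on every iteration.
        my $rx = qr{$chr_byte|X}i;
        push @failed, $chr unless $chr_utf8 =~ $rx;
    }
    return @failed;
}

# 65536 is assumed to be the documented default; -1 disables tries.
for my $setting (65536, -1) {
    my @failed = failing_codepoints($setting);
    printf "RE_TRIE_MAXBUF=%d: %d failing code point(s)%s\n",
        $setting, scalar(@failed),
        @failed ? " (@failed)" : "";
}
```

If failures appear only in the first line of output, that points at the trie code path, which is the same conclusion reached with the t/op/pat.t run above.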