
steve.m.hay at googlemail
Aug 2, 2012, 2:38 PM
Post #5 of 20
On 2 August 2012 19:18, Craig A. Berry <craig.a.berry [at] gmail> wrote:
> On Thu, Aug 2, 2012 at 12:02 PM, Steve Hay <Steve.Hay [at] verosoftware> wrote:
>
>> I will start with bisecting to find where it was introduced
>> (unfortunately I've not been doing many smokes recently, and I think it
>> started while I wasn't smoking...).
>
> My bet would be on:
>
> <http://perl5.git.perl.org/perl.git/commitdiff/613c63b465f01af4e535fdc6c1c17e7470be5aad>

Yes, you're spot on. The failing tests were added by that commit, and all
tests are successful in the previous commit.

As I found before, the results depend on the cmd.exe shell's current code
page, which is the OEM one by default and therefore doesn't match the ANSI
code page which perl.exe uses as its native character set...

The following samples reproduce and explain the first test failure (#154 -
'ENV store downgrades utf8 in setenv'):

First, with my system's default (OEM) code page (cp850). The byte \x{A0} in
$b is manually upgraded to utf8 in $c. It is interpreted as the character
U+00A0 because that's the character found at byte point 0xA0 in perl.exe's
native character set, cp1252, on my system -- see
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT. It is
then downgraded again when stored in %ENV, so $f comes out identical to $b
(so test #153 'ENV store downgrades utf8 in SV' passes). But when the output
of cmd.exe's 'set' command is inspected, it is found to contain the byte
\x{FF} instead of \x{A0}. This is because cmd.exe is using cp850, in which
character U+00A0 is found at byte point 0xFF (see
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP850.TXT):

    perl.exe -MDevel::Peek -e "$b = qq[\x{A0}]; utf8::upgrade($c = $b); $f = $ENV{foo} = $c; $e = `set foo`; Dump($b); Dump($c); Dump($f); Dump($e)"

    SV = PV(0x8f61c4) at 0x22a6d24
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x22ae0dc "\240"\0
      CUR = 1
      LEN = 12
    SV = PV(0x8f61ec) at 0x22a6d64
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x22ae1cc "\302\240"\0 [UTF8 "\x{a0}"]
      CUR = 2
      LEN = 12
    SV = PVMG(0x8ffbac) at 0x22a6d54
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x22dd334 "\240"\0
      CUR = 1
      LEN = 12
    SV = PV(0x8f6274) at 0x22a7064
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x22dd244 "foo=\377\n"\0
      CUR = 6
      LEN = 12

If I switch the cmd.exe code page to cp1252 to match perl.exe then it all
works fine, because then there is no mix-up between perl.exe and cmd.exe
about which character is meant by byte \x{A0}:

    perl.exe -MDevel::Peek -e "$b = qq[\x{A0}]; utf8::upgrade($c = $b); $f = $ENV{foo} = $c; $e = `set foo`; Dump($b); Dump($c); Dump($f); Dump($e)"

    SV = PV(0x9f61c4) at 0x5c6d24
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x5ce0dc "\240"\0
      CUR = 1
      LEN = 12
    SV = PV(0x9f61ec) at 0x5c6d64
      REFCNT = 1
      FLAGS = (POK,pPOK,UTF8)
      PV = 0x5ce1cc "\302\240"\0 [UTF8 "\x{a0}"]
      CUR = 2
      LEN = 12
    SV = PVMG(0x9ffbac) at 0x5c6d54
      REFCNT = 1
      FLAGS = (POK,pPOK)
      IV = 0
      NV = 0
      PV = 0x5fd334 "\240"\0
      CUR = 1
      LEN = 12
    SV = PV(0x9f6274) at 0x5c7064
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x5fd244 "foo=\240\n"\0
      CUR = 6
      LEN = 12

That's what I think is going on, and probably the other failure is similar,
since switching the cmd.exe code page also fixes that.

However, I'm not sure what is best to do about this. It is possible to get
the OEM and ANSI code pages on Windows and convert the OEM data received
from 'set' back to ANSI before comparing with the expected value, but that
clearly won't fix failures on other OSes...
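For illustration only, here is a rough sketch of the mismatch and of the OEM-to-ANSI conversion mentioned above, using the core Encode module. The code pages are hard-coded to the cp1252/cp850 pair from my system rather than queried from Windows, and oem_to_ansi() is a hypothetical helper name, not something that exists in the test suite.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: code pages hard-coded to match the system described above.
my $ansi_cp = 'cp1252';   # perl.exe's native character set
my $oem_cp  = 'cp850';    # cmd.exe's default code page

# The byte stored in %ENV, the character it stands for in the ANSI code
# page, and the byte that represents that same character in the OEM one.
my $stored = "\xA0";
my $char   = decode($ansi_cp, $stored);
my $seen   = encode($oem_cp, $char);

printf "stored in %%ENV: \\x{%02X}\n", ord $stored;
printf "character:      U+%04X\n",    ord $char;
printf "seen via 'set': \\x{%02X}\n", ord $seen;

# Hypothetical helper: convert bytes received from 'set' (OEM) back to
# ANSI bytes so they can be compared with the expected value.
sub oem_to_ansi {
    my ($bytes) = @_;
    return encode($ansi_cp, decode($oem_cp, $bytes));
}

my $from_set  = "foo=\xFF\n";
my $converted = oem_to_ansi($from_set);
print $converted eq "foo=\xA0\n"
    ? "matches expected value after conversion\n"
    : "still differs after conversion\n";
```

A test would need the real code pages instead of the hard-coded pair, and, as noted, this approach only addresses Windows.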