
perlbug-followup at perl
Aug 12, 2013, 2:20 PM
Post #9 of 10
[perl #117355] [lu]cfirst don't respect 'use bytes'
Sorry, RT corrupted this character in the code (even RT doesn't like Latin-1 chars, unlike "wide" chars such as Cyrillic!). I meant this one:
http://www.fileformat.info/info/unicode/char/da/index.htm

On Mon Aug 12 14:17:19 2013, vsespb wrote:
> On Mon Aug 12 13:57:25 2013, ikegami [at] adaelis wrote:
> > On Mon, Aug 12, 2013 at 4:08 PM, Victor Efimov via RT <
> > perlbug-followup [at] perl> wrote:
> >
> > > sub try_drop_utf8_flag
> > > {
> > >     Encode::_utf8_off($_[0]) if utf8::is_utf8($_[0]) &&
> > >         (bytes::length($_[0]) == length($_[0]));
> > > }
> >
> > That's just C<< utf8::downgrade($_[0], 1) >>
>
> Yes, you are right, except for one small difference.
> For characters > 127 but <= 255 it works differently.
> Thus it cannot be used when the strings are filenames (as in the example
> above, and in another example below).
>
> (By the way, that's exactly how I work with it in my program
> https://github.com/vsespb/mt-aws-glacier - read millions of filenames,
> split, try to drop the UTF-8 flags, and process with regexps.)
>
> use bytes ();
> use utf8;
> binmode STDOUT, ":encoding(utf-8)";
> use Devel::Peek;
> sub try_drop_utf8_flag
> {
>     Encode::_utf8_off($_[0]) if utf8::is_utf8($_[0]) &&
>         (bytes::length($_[0]) == length($_[0]));
> }
> sub do_downgrade
> {
>     utf8::downgrade($_[0], 1)
> }
> my $s = "�";
> my $s1 = $s;
> try_drop_utf8_flag($s1);
> my $s2 = $s;
> do_downgrade($s2);
> Dump($s1);
> Dump($s2);
>
> die unless $s1 eq $s2;
>
> open my $f, ">", "$s1.tmp";
> binmode $f;
> syswrite $f, "TEST";
> close $f;
>
> open $f, "<", "$s2.tmp" or die "file not found $!";
>
> __END__
> SV = PVMG(0xfc00a0) at 0xfc1440
>   REFCNT = 1
>   FLAGS = (PADMY,SMG,POK,pPOK,UTF8)
>   IV = 0
>   NV = 0
>   PV = 0x1042b90 "\303\272"\0 [UTF8 "\x{fa}"]
>   CUR = 2
>   LEN = 8
>   MAGIC = 0x1094090
>     MG_VIRTUAL = &PL_vtbl_utf8
>     MG_TYPE = PERL_MAGIC_utf8(w)
>     MG_LEN = 1
> SV = PV(0xfd6538) at 0xfc1488
>   REFCNT = 1
>   FLAGS = (PADMY,POK,pPOK)
>   PV = 0xfdccd0 "\372"\0
>   CUR = 1
>   LEN = 8
> file not found No such file or directory at bench3-poc.pl line 29.

---
via perlbug: queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=117355
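
For readers who want to see the difference without Devel::Peek or touching the filesystem, here is a minimal sketch of the same comparison. It makes some assumptions that are not in the message: the character is written as "\x{fa}" (chosen only because the Dump output above shows \x{fa}), utf8::upgrade is used to force the UTF8-flagged form instead of relying on `use utf8` and a source literal, and internal_hex is a hypothetical helper added purely for illustration.

```perl
use strict;
use warnings;
use bytes ();
use Encode ();

# Same helper as in the quoted message: drop the UTF8 flag only when the
# internal byte length equals the character length (pure ASCII content).
sub try_drop_utf8_flag {
    Encode::_utf8_off($_[0])
        if utf8::is_utf8($_[0]) && bytes::length($_[0]) == length($_[0]);
}

# Hypothetical helper (not from the message): hex dump of the internal buffer.
sub internal_hex {
    my ($str) = @_;             # work on a copy, leave the caller's string alone
    Encode::_utf8_off($str);    # expose the internal buffer as raw bytes
    return unpack("H*", $str);
}

my $s = "\x{fa}";               # assumption: one character in the 128..255 range
utf8::upgrade($s);              # force the UTF8-flagged internal form

my $s1 = $s; try_drop_utf8_flag($s1);    # no-op here: 2 bytes != 1 character
my $s2 = $s; utf8::downgrade($s2, 1);    # re-encodes to a single byte

printf "eq: %s\n", ($s1 eq $s2 ? "yes" : "no");
printf "s1: utf8=%d internal=%s\n", (utf8::is_utf8($s1) ? 1 : 0), internal_hex($s1);
printf "s2: utf8=%d internal=%s\n", (utf8::is_utf8($s2) ? 1 : 0), internal_hex($s2);

# For pure-ASCII content the two approaches agree.
my $a = "abc";
utf8::upgrade($a);
my $a1 = $a; try_drop_utf8_flag($a1);
my $a2 = $a; utf8::downgrade($a2, 1);
printf "ascii: %s vs %s\n", internal_hex($a1), internal_hex($a2);
```

The two strings compare equal with `eq`, yet their internal buffers differ (c3ba versus fa). That matches the Dump output in the quoted message and is consistent with the "file not found" failure when the strings are used as filenames.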