
Mailing List Archive: Perl: porters

[perl #70732] range operator does not use locale to create alphabet

 

 



perlbug-followup at perl

Nov 22, 2009, 3:06 PM

Post #1 of 7 (339 views)
[perl #70732] range operator does not use locale to create alphabet

# New Ticket Created by WK
# Please include the string: [perl #70732]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=70732 >


This is a bug report for perl from wanradt [at] gmail,
generated with the help of perlbug 1.36 running under perl 5.10.0.

-----------------------------------------------------------------
[Please enter your report here]

When I build a list of letters with the range operator (..), it does
not seem to use the locale (et_EE.UTF-8) information. In the Estonian
alphabet the correct sequence from R to Z would be R S Š Z, whereas in
English it is R S T U V W X Y Z.

In the example code I print the LC_CTYPE setting to make sure that
locale information is picked up for character operations:

#!/usr/bin/perl

use strict;
use utf8;
use locale;
use POSIX;
use open ':std', ':locale';

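print setlocale( LC_CTYPE ), "\n\n"; # show the locale currently in effect

my @real = qw(R S Š Z); # expected Estonian order
my @fake = 'R'..'Z'; # what the range operator actually produces
print "@real\n";
print "@fake\n";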

__END__
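
For context, here is a minimal sketch of why the range above comes out in English order. For string operands, .. steps through values with Perl's magical string increment, which perlop documents for strings matching /^[a-zA-Z]*[0-9]*\z/ and which does not consult the locale. The variable $s is illustrative only and not part of the original report.

```perl
use strict;
use warnings;

# For string operands, the range operator steps with the magical
# string increment (the same one ++ uses), not with locale collation.
my @fake = ('R' .. 'Z');
print "@fake\n";

# The same increment applied by hand; it carries like an odometer.
my $s = 'Az';
$s++;
print "$s\n";
```

Because the increment only knows the ASCII letters and digits, a letter such as Š can never appear in the generated sequence, whatever the locale is.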


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=medium
---
Site configuration information for perl 5.10.0:

Configured by Debian Project at Thu Oct 1 22:38:45 UTC 2009.

Summary of my perl5 (revision 5 version 10 subversion 0) configuration:
Platform:
osname=linux, osvers=2.6.24-23-server, archname=i486-linux-gnu-thread-multi
uname='linux vernadsky 2.6.24-23-server #1 smp wed apr 1 22:22:14
utc 2009 i686 gnulinux '
config_args='-Dusethreads -Duselargefiles -Dccflags=-DDEBIAN
-Dcccdlflags=-fPIC -Darchname=i486-linux-gnu -Dprefix=/usr
-Dprivlib=/usr/share/perl/5.10 -Darchlib=/usr/lib/perl/5.10
-Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5
-Dvendorarch=/usr/lib/perl5 -Dsiteprefix=/usr/local
-Dsitelib=/usr/local/share/perl/5.10.0
-Dsitearch=/usr/local/lib/perl/5.10.0 -Dman1dir=/usr/share/man/man1
-Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1
-Dsiteman3dir=/usr/local/man/man3 -Dman1ext=1 -Dman3ext=3perl
-Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio
-Uusenm -DDEBUGGING=-g -Doptimize=-O2 -Duseshrplib
-Dlibperl=libperl.so.5.10.0 -Dd_dosuid -des'
hint=recommended, useposix=true, d_sigaction=define
useithreads=define, usemultiplicity=define
useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
use64bitint=undef, use64bitall=undef, uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN
-fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE
-D_FILE_OFFSET_BITS=64',
optimize='-O2 -g',
cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing
-pipe -I/usr/local/include'
ccversion='', gccversion='4.4.1', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t',
lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib /usr/lib64
libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
perllibs=-ldl -lm -lpthread -lc -lcrypt
libc=/lib/libc-2.10.1.so, so=so, useshrplib=true, libperl=libperl.so.5.10.0
gnulibc_version='2.10.1'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fPIC', lddlflags='-shared -O2 -g -L/usr/local/lib'

Locally applied patches:


---
@INC for perl 5.10.0:
/etc/perl
/usr/local/lib/perl/5.10.0
/usr/local/share/perl/5.10.0
/usr/lib/perl5
/usr/share/perl5
/usr/lib/perl/5.10
/usr/share/perl/5.10
/usr/local/lib/site_perl
.

---
Environment for perl 5.10.0:
HOME=/home/wanradt
LANG=et_EE.UTF-8
LANGUAGE=
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/home/wanradt/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games
PERL_BADLANG (unset)
SHELL=/bin/bash


--
All the best,

G


rgs at consttype

Nov 23, 2009, 4:35 AM

Post #2 of 7 (327 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

2009/11/23 WK <perlbug-followup [at] perl>:
> When I build a list of letters with the range operator (..), it does
> not seem to use the locale (et_EE.UTF-8) information. In the Estonian
> alphabet the correct sequence from R to Z would be R S Š Z, whereas in
> English it is R S T U V W X Y Z.

I would say, let's not go there: locales are a dying system, and that
functionality could go in a CPAN module. Most languages use a
different alphabet anyway; diacritics are sometimes considered
separate letters and sometimes not, and the ordering sometimes
changes, too.


ikegami at adaelis

Nov 23, 2009, 9:07 AM

Post #3 of 7 (319 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

On Mon, Nov 23, 2009 at 7:35 AM, Rafael Garcia-Suarez <rgs [at] consttype> wrote:

> 2009/11/23 WK <perlbug-followup [at] perl>:
> > When I build a list of letters with the range operator (..), it does
> > not seem to use the locale (et_EE.UTF-8) information. In the Estonian
> > alphabet the correct sequence from R to Z would be R S Š Z, whereas in
> > English it is R S T U V W X Y Z.
>
> I would say, let's not go there: locales are a dying system, and that
> functionality could go in a CPAN module. Most languages use a
> different alphabet anyway; diacritics are sometimes considered
> separate letters and sometimes not, and the ordering sometimes
> changes, too.
>

Is that his point? Or are you saying that locale != language?


wanradt at gmail

Jan 17, 2010, 7:52 AM

Post #4 of 7 (267 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

> rgs [at] consttype wrote:
>
>> I would say, let's not go there: locales are a dying system, and that
>> functionality could go in a CPAN module.

I'm sorry, but I had hoped to get some feedback on my report. Until
today's letter from Joe I had no idea that there were any answers to
it.

So I'd like to ask what to use instead of this "dying system". For me
locales are one simple solution, but I am open to any other
(systematic) approach.

Perl already has (had?) support for locales. Why break it? I'd rather
fix it.

--
Wbr,
All the best,

WK


rgs at consttype

Jan 18, 2010, 5:31 AM

Post #5 of 7 (268 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

2010/1/17 WK <wanradt [at] gmail>:
>> rgs [at] consttype wrote:
>>
>>> I would say, let's not go there: locales are a dying system, and that
>>> functionality could go in a CPAN module.
>
> I'm sorry, but I had hoped to get some feedback on my report. Until
> today's letter from Joe I had no idea that there were any answers to
> it.
>
> So I'd like to ask what to use instead of this "dying system". For me
> locales are one simple solution, but I am open to any other
> (systematic) approach.

You should write code to handle alphabetical ordering respecting the
languages and contexts you want to process. This does not exist in
Perl yet, as far as I know, except for a few languages.

> Perl already has (had?) support for locales. Why break it? I'd rather
> fix it.

Locales are broken by design. They were invented at a time when all
strings were char*, and all letters were one byte in a given character
set. That got a bit better afterwards, but in a hackish way. So, for a
start, locales won't properly handle languages with letters
represented by more than one character.
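
As a rough illustration of what "writing code to handle alphabetical ordering" could look like for a single language, the sketch below sorts words by an explicitly listed alphabet instead of relying on the locale. The alphabet is assumed to be the Estonian order; the names %rank, sort_key and by_alphabet are hypothetical, the input is assumed to be lowercase, and a language with letters made of more than one character would need a tokenizer in place of the plain split.

```perl
use strict;
use warnings;
use utf8;                               # the source contains non-ASCII literals
binmode STDOUT, ':encoding(UTF-8)';

# Assumed Estonian alphabet order, lowercase only.
my @alphabet = qw(a b c d e f g h i j k l m n o p q r s š z ž
                  t u v w õ ä ö ü x y);
my %rank = map { $alphabet[$_] => $_ } 0 .. $#alphabet;

# Turn a word into a list of letter ranks; characters outside the
# alphabet are ranked after every known letter.
sub sort_key {
    my ($word) = @_;
    return [ map { exists $rank{$_} ? $rank{$_} : scalar @alphabet }
             split //, $word ];
}

# Compare two words rank by rank; on a common prefix the shorter wins.
sub by_alphabet {
    my ($ka, $kb) = (sort_key($_[0]), sort_key($_[1]));
    my $last = $#$ka < $#$kb ? $#$ka : $#$kb;
    for my $i (0 .. $last) {
        my $cmp = $ka->[$i] <=> $kb->[$i];
        return $cmp if $cmp;
    }
    return @$ka <=> @$kb;
}

my @sorted = sort { by_alphabet($a, $b) } qw(zebra šokk tuba sild);
print "@sorted\n";
```

The table-driven approach trades generality for predictability: it ignores the environment entirely, so the result is the same on every machine, but each language needs its own table.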


davidnicol at gmail

Jan 18, 2010, 1:46 PM

Post #6 of 7 (260 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

On Sun, Jan 17, 2010 at 9:52 AM, WK <wanradt [at] gmail> wrote:
>
>
> Perl already has (had?) support for locales. Why break it? I'd rather
> fix it.
>
> --
> Wbr,
> All the best,
>
> WK

The following has not been tested in any way:

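# Build a "range" function over an explicitly ordered list of letters.
sub ranger(@) {
    my @letters = @_;
    # Map each letter to its position in the list.
    my %lookup = map { $letters[$_] => $_ } 0 .. $#letters;
    # Return a closure that slices the list between two letters.
    sub($$){
        @letters[$lookup{$_[0]} .. $lookup{$_[1]}]
    }
}

my $EErange = ranger qw { A B C D E F G H I J K L M N O P Q R S Š Z Ž
                          T U V W Õ Ä Ö Ü X Y };

print "$_\n" for $EErange->('S','Z');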


wanradt at gmail

Jan 19, 2010, 7:31 AM

Post #7 of 7 (256 views)
Re: [perl #70732] range operator does not use locale to create alphabet [In reply to]

2010/1/18 Rafael Garcia-Suarez <rgs [at] consttype>:

> Locales are broken by design. They were invented at a time when all
> strings were char*, and all letters were one byte in a given character
> set. That got a bit better afterwards, but in a hackish way. So, for a
> start, locales won't properly handle languages with letters
> represented by more than one character.

You are probably right in many respects, but I'd like to add a few
points from my side:

- as long as we don't have a better system to replace locales, we are
stuck without them

- latin1 is also broken, or a hack, from today's point of view, so
maybe let's throw it away too and have everyone use Unicode ;)
(meaning: we deliberately don't do anything like that)

- sorting respects my locale (including UTF-8 and multibyte characters),
so why not the range operator? This little example works fine:

#!/usr/bin/perl

use strict;
use warnings;
use locale;

use utf8;
binmode STDIN, ":utf8";
binmode STDOUT, ":utf8";

my @a = qw(x y ü ö ä õ ž z š s); # characters in reverse alphabetical order

print "$_ " foreach sort @a; # gives: s š z ž õ ä ö ü x y
print "\n";

{
    no locale;
    print "$_ " foreach sort @a; # gives: s x y z ä õ ö ü š ž
    print "\n";
}

- although I have just average knowledge of Perl, I avoid hackish
approaches whenever possible. I prefer a systematic approach. Using
locales is systematic from my point of view, even if it may still be a
hack on the inside.

- locales may be broken by design, but from a user's point of view it
is very convenient to read the user's preferred data formatting rules
from the environment. So, if we keep such a mechanism, it will be easy
to swap locales for some future solution one day. And a locale contains
a lot of information beyond the problem with multi-byte characters...

For my little problem there are certainly workarounds, but those just
prove the point: the shortcut is broken.
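
One such workaround, sketched here under stated assumptions, is to let the locale-aware sort do the ordering and then slice the result. The helper name locale_range is hypothetical, the caller has to supply the candidate letters, and the outcome depends on the LC_COLLATE locale that is installed and active on the system. The demo uses plain ASCII letters so that it does not rely on et_EE.UTF-8 being available.

```perl
use strict;
use warnings;

# Hypothetical helper: a "range" over letters ordered by the current
# collation locale instead of by the magical string increment.
sub locale_range {
    my ($from, $to, @candidates) = @_;
    use locale;                          # sort below follows LC_COLLATE
    my @ordered = sort @candidates;
    my %pos = map { $ordered[$_] => $_ } 0 .. $#ordered;
    # Unknown endpoints yield an empty list rather than a warning.
    return () unless exists $pos{$from} && exists $pos{$to};
    return @ordered[ $pos{$from} .. $pos{$to} ];
}

my @letters = locale_range('B', 'D', 'A' .. 'E');
print "@letters\n";
```

Unlike the built-in range, this needs the full candidate list up front, which is exactly the information the range operator has no way to obtain from the locale.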

--
Wbr,
All the best,

WK
