
bret at pectopah
Mar 7, 2010, 10:11 AM
Post #4 of 7
Re: Locales, sorting, and character encodings
Hi Dawn,

Yes, it works just fine. I was just confused about why.

When you use the ".utf8" locales, characters are sorted in the wrong
order, so that accented letters come at the end of the alphabet.

Drop the extension, though, and just use something like "fr_FR", and
your sorting comes out perfect.

I had sort of expected the opposite to be the case and wondered if
anyone knew why.

But the happy news is that locale-based alphabetical sorting works just
great, provided the locales you need are installed. (Thanks Alex!)

Cheers,

Bret

On Sat, 2010-03-06 at 22:02 -0500, Dawn Buie wrote:
> Hi Bret - this looks complicated.
>
> Did you ever get it to work?
>
> Dawn
>
> On 15-Feb-10, at 1:37 PM, Bret Dawson wrote:
>
> > Hi everybody,
> >
> > I've just been fighting with sorting and alphabetical ordering in
> > multiple languages, and I've got things to work, but I'm a little
> > puzzled about how. So if anybody has any insight, I'd be grateful.
> >
> > This is for IFEX, on something called the "Digest." It's a
> > regularly-published list of items recently published on the site. You
> > can see an example here:
> >
> > http://www.ifex.org/2010/02/12/digest/
> >
> > It's a big alphabetical list of regions (OK, "International" is at the
> > top), and within each region is an alphabetical list of countries.
> >
> > I had been doing the alphabetization with the Schwartz, looking up the
> > name of each country according to the output channel:
> >
> > my @alphabetized_cats =
> >     map { $_->[0] }
> >     sort { $a->[1] cmp $b->[1] }
> >     map { [ $_ => $m->scomp('/util/translations.mc', word => $_) ] }
> >     keys(%all_cats);
> >
> > (translations.mc maps category URIs to country names based on the
> > current OC).
> >
> > This was mostly fine, except that the vanilla Perl sort is really only
> > good for asciibetical order. In Friday's Digest, "Rwanda" was coming
> > before "République démocratique du Congo."
> >
> > So I've been trying to use locales, like this:
> >
> > my %ocs_to_locales = (
> >     'Web (French)'  => 'fr_FR.utf8',
> >     'Web (Spanish)' => 'es_ES.utf8',
> >     'Web (Russian)' => 'ru_RU.utf8',
> >     'Web (Arabic)'  => 'ar_EG.utf8',
> > );
> >
> > use POSIX;
> > use locale;
> > if ($ocs_to_locales{$burner->get_oc->get_name}) {
> >     POSIX::setlocale(LC_COLLATE,
> >         $ocs_to_locales{$burner->get_oc->get_name});
> > }
> >
> > ...then do the sort, and then add this line afterward:
> >
> > no locale;
> >
> > Sadly, the utf8 locales seem to have the characters in completely nutty
> > order. "Rwanda" still came before "République démocratique du Congo."
> >
> > Dropping the ".utf8" from the French locale name, and using just "fr_FR"
> > works, though. So I'm full of hope for Spanish and Arabic.
> >
> > Now, everything in the site is all UTF8, so I'm puzzled about why the
> > ".utf8" locales turned out to be bad choices. Does anybody have any
> > idea?
> >
> > Thanks,
> >
> > Bret
> >
> > --
> > Bret Dawson
> > Producer
> > Pectopah Productions Inc.
> > (416) 895-7635
> > bret [at] pectopah
> > www.pectopah.com
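
For anyone following the thread, here is a minimal standalone sketch of the locale-based sort described above. It is an illustration under assumptions, not the code running on the site: the helper name sort_by_translated_name and the plain hash of names are hypothetical stand-ins for the translations.mc lookup. The one thing it adds is a check on the return value of setlocale, which is undef when the requested locale is not installed. It does not explain the ".utf8" puzzle.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort keys by their display names, using the
# collation rules of $locale_name when that locale is installed.
# $name_for is a hashref of key => display name (a stand-in for the
# translations.mc lookup in the original code).
sub sort_by_translated_name {
    my ($locale_name, $name_for) = @_;

    my $previous = setlocale(LC_COLLATE);                 # remember the current setting
    my $applied  = setlocale(LC_COLLATE, $locale_name);   # undef if not installed
    warn "Locale $locale_name is not available; keeping current collation\n"
        unless defined $applied;

    my @sorted;
    {
        use locale;   # lexically scoped: cmp follows LC_COLLATE in this block only
        @sorted =
            map  { $_->[0] }
            sort { $a->[1] cmp $b->[1] }
            map  { [ $_ => $name_for->{$_} ] }
            keys %$name_for;
    }

    # Put the previous collation back so later sorts are unaffected.
    setlocale(LC_COLLATE, $previous) if defined $previous;
    return @sorted;
}
```

Calling it with a locale name such as 'fr_FR' and a hashref of names returns the keys ordered by name. Because "use locale" sits inside its own block, there is no need for a separate "no locale" afterward.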