
dawn at thetyee
Mar 6, 2010, 7:02 PM
Re: Locales, sorting, and character encodings
Hi Bret - this looks complicated. Did you ever get it to work?

Dawn

On 15-Feb-10, at 1:37 PM, Bret Dawson wrote:

> Hi everybody,
>
> I've just been fighting with sorting and alphabetical ordering in
> multiple languages, and I've got things to work, but I'm a little
> puzzled about how. So if anybody has any insight, I'd be grateful.
>
> This is for IFEX, on something called the "Digest." It's a
> regularly-published list of items recently published on the site. You
> can see an example here:
>
> http://www.ifex.org/2010/02/12/digest/
>
> It's a big alphabetical list of regions (OK, "International" is at the
> top), and within each region is an alphabetical list of countries.
>
> I had been doing the alphabetization with the Schwartz, looking up the
> name of each country according to the output channel:
>
>     my @alphabetized_cats =
>         map { $_->[0] }
>         sort { $a->[1] cmp $b->[1] }
>         map { [ $_ => $m->scomp('/util/translations.mc', word => $_) ] }
>         keys(%all_cats);
>
> (translations.mc maps category URIs to country names based on the
> current OC).
>
> This was mostly fine, except that the vanilla Perl sort is really only
> good for asciibetical order. In Friday's Digest, "Rwanda" was coming
> before "République démocratique du Congo."
>
> So I've been trying to use locales, like this:
>
>     my %ocs_to_locales = (
>         'Web (French)'  => 'fr_FR.utf8',
>         'Web (Spanish)' => 'es_ES.utf8',
>         'Web (Russian)' => 'ru_RU.utf8',
>         'Web (Arabic)'  => 'ar_EG.utf8',
>     );
>
>     use POSIX;
>     use locale;
>     if ($ocs_to_locales{$burner->get_oc->get_name}) {
>         POSIX::setlocale(LC_COLLATE,
>             $ocs_to_locales{$burner->get_oc->get_name});
>     }
>
> ...then do the sort, and then add this line afterward:
>
>     no locale;
>
> Sadly, the utf8 locales seem to have the characters in completely
> nutty order. "Rwanda" still came before "République démocratique du
> Congo."
>
> Dropping the ".utf8" from the French locale name, and using just
> "fr_FR" works, though. So I'm full of hope for Spanish and Arabic.
>
> Now, everything in the site is all UTF8, so I'm puzzled about why the
> ".utf8" locales turned out to be bad choices. Does anybody have any
> idea?
>
> Thanks,
>
> Bret
>
> --
> Bret Dawson
> Producer
> Pectopah Productions Inc.
> (416) 895-7635
> bret [at] pectopah
> www.pectopah.com
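
For anyone who finds this thread later: one possible way to sidestep the locale puzzle in the quoted message is to sort decoded character strings with the core Unicode::Collate module instead of relying on LC_COLLATE. This is only a sketch of a swapped-in technique, not what Bret actually did, and it uses default Unicode collation rather than per-language rules. The %name_for hash is a hypothetical stand-in for the translations.mc lookup, and the category URIs are made up for illustration.

```perl
use strict;
use warnings;
use utf8;               # the string literals below are UTF-8 source text
use Unicode::Collate;   # core module implementing the Unicode Collation Algorithm

# Hypothetical stand-in for the translations.mc lookup:
# category URI => display name in the current output channel.
my %name_for = (
    '/africa/rwanda' => 'Rwanda',
    '/africa/drc'    => 'République démocratique du Congo',
    '/africa/kenya'  => 'Kenya',
);

my $collator = Unicode::Collate->new;

# Same Schwartzian transform as in the quoted post, but the comparison
# goes through the collator instead of the built-in cmp operator.
my @alphabetized_cats =
    map  { $_->[0] }
    sort { $collator->cmp($a->[1], $b->[1]) }
    map  { [ $_ => $name_for{$_} ] }
    keys %name_for;

# Encode on output so the accented characters print correctly.
binmode STDOUT, ':encoding(UTF-8)';
print "$name_for{$_}\n" for @alphabetized_cats;
```

With these three names, the default collation puts "République démocratique du Congo" ahead of "Rwanda", because "é" is compared as "e" at the primary level. The strings have to be decoded character strings (not raw UTF-8 bytes) for this to work, which may or may not match how the names arrive from the templates.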