
Mailing List Archive: Bricolage: users
Re: Locales, sorting, and character encodings
 



dawn at thetyee

Mar 6, 2010, 7:02 PM



Hi Bret - this looks complicated.

Did you ever get it to work?

Dawn

On 15-Feb-10, at 1:37 PM, Bret Dawson wrote:

> Hi everybody,
>
> I've just been fighting with sorting and alphabetical ordering in
> multiple languages, and I've got things to work, but I'm a little
> puzzled about how. So if anybody has any insight, I'd be grateful.
>
> This is for IFEX, on something called the "Digest." It's a
> regularly-published list of items recently published on the site. You
> can see an example here:
>
> http://www.ifex.org/2010/02/12/digest/
>
> It's a big alphabetical list of regions (OK, "International" is at the
> top), and within each region is an alphabetical list of countries.
>
> I had been doing the alphabetization with the Schwartz, looking up the
> name of each country according to the output channel:
>
> my @alphabetized_cats =
>     map  { $_->[0] }
>     sort { $a->[1] cmp $b->[1] }
>     map  { [ $_ => $m->scomp('/util/translations.mc', word => $_) ] }
>     keys(%all_cats);
>
> (translations.mc maps category URIs to country names based on the
> current OC).
>
> This was mostly fine, except that the vanilla Perl sort is really only
> good for asciibetical order. In Friday's Digest, "Rwanda" was coming
> before "République démocratique du Congo."
>
> So I've been trying to use locales, like this:
>
> my %ocs_to_locales = (
>     'Web (French)'  => 'fr_FR.utf8',
>     'Web (Spanish)' => 'es_ES.utf8',
>     'Web (Russian)' => 'ru_RU.utf8',
>     'Web (Arabic)'  => 'ar_EG.utf8',
> );
>
> use POSIX;
> use locale;
> if ($ocs_to_locales{$burner->get_oc->get_name}) {
>     POSIX::setlocale(LC_COLLATE,
>         $ocs_to_locales{$burner->get_oc->get_name});
> }
>
> ...then do the sort, and then add this line afterward:
>
> no locale;
>
>
> Sadly, the utf8 locales seem to have the characters in completely
> nutty order. "Rwanda" still came before "République démocratique du
> Congo."
>
> Dropping the ".utf8" from the French locale name, and using just
> "fr_FR" works, though. So I'm full of hope for Spanish and Arabic.
>
> Now, everything in the site is all UTF8, so I'm puzzled about why the
> ".utf8" locales turned out to be bad choices. Does anybody have any
> idea?
>
>
> Thanks,
>
> Bret
>
>
>
> --
> Bret Dawson
> Producer
> Pectopah Productions Inc.
> (416) 895-7635
> bret [at] pectopah
> www.pectopah.com
>
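
For comparison, here is a minimal sketch of the same Schwartzian transform that sorts on Unicode collation keys instead of relying on the system locale. This is not what Bret ended up using, just one possible alternative. It swaps in the core Unicode::Collate module with its default collation table. The %name_for hash is a hypothetical stand-in for %all_cats plus the translations.mc lookup, and the sketch assumes the names are already decoded character strings rather than raw UTF-8 bytes, which may not hold inside a Bricolage template.

```perl
use strict;
use warnings;
use utf8;                 # string literals in this file are UTF-8
use Unicode::Collate;     # core module; default Unicode collation table

# Hypothetical stand-in for %all_cats and the translations.mc lookup:
# category URI => translated country name (URIs are made up).
my %name_for = (
    '/africa/rwanda/' => 'Rwanda',
    '/africa/drc/'    => 'République démocratique du Congo',
    '/africa/niger/'  => 'Niger',
);

my $collator = Unicode::Collate->new;

# Same shape as the original transform, but the cached second element
# is a collation sort key, so the plain cmp in the sort step orders by
# collation weight rather than by code point.
my @alphabetized_cats =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_ => $collator->getSortKey($name_for{$_}) ] }
    keys %name_for;

# For contrast: comparing the names directly with cmp goes by code
# point, so "w" (U+0077) sorts ahead of "é" (U+00E9).
my @codepoint_order =
    sort { $name_for{$a} cmp $name_for{$b} } keys %name_for;

binmode STDOUT, ':encoding(UTF-8)';
print "$name_for{$_}\n" for @alphabetized_cats;
```

With the default table, "République démocratique du Congo" sorts ahead of "Rwanda", while the plain cmp version reproduces the ordering from Friday's Digest. Language-specific tailoring, if it turns out to be needed, is available through Unicode::Collate::Locale on newer Perls; whether the defaults are good enough for Spanish, Russian, and Arabic is untested here.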

