
Mailing List Archive: Bricolage: users
Re: Locales, sorting, and character encodings
 



bret at pectopah

Mar 7, 2010, 10:11 AM



Hi Dawn,


Yes, it works just fine. I was just confused about why.

When you use the ".utf8" locales (such as "fr_FR.utf8"), characters are
sorted in the wrong order, so accented letters come at the end of the
alphabet.

Drop the extension, though, and use something like "fr_FR", and your
sorting comes out perfect.

I had sort of expected the opposite to be the case and wondered if
anyone knew why.

But the happy news is that locale-based alphabetical sorting works just
great, provided the locales you need are installed. (Thanks Alex!)
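Because this only works when the locales are installed, it can help to check before sorting. A minimal sketch, not code from this thread: `POSIX::setlocale` returns undef when it cannot switch to the requested locale, so a hypothetical helper, `first_installed_locale`, can try several candidate names and report which one took effect. The candidate names below are examples only; on most Linux systems `locale -a` lists what is actually installed.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: try each candidate name in turn and return the
# first one that setlocale() accepts for LC_COLLATE. On success the
# collation locale is left switched to that name; a failed call leaves
# the current setting unchanged. Returns undef if nothing worked.
sub first_installed_locale {
    my @candidates = @_;
    for my $name (@candidates) {
        my $result = setlocale(LC_COLLATE, $name);
        return $name if defined $result;
    }
    return;
}

# Example candidates; adjust them to the locales on your system.
my $chosen = first_installed_locale('fr_FR', 'fr_FR.ISO-8859-1', 'C');
print defined $chosen ? "Collating with $chosen\n" : "No candidate locale found\n";
```

Falling back to "C" keeps the burn working (with plain byte-order sorting) when the preferred locale is missing, instead of failing silently.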


Cheers,

Bret



On Sat, 2010-03-06 at 22:02 -0500, Dawn Buie wrote:
> Hi Bret, this looks complicated.
>
> Did you ever get it to work?
>
> Dawn
>
> On 15-Feb-10, at 1:37 PM, Bret Dawson wrote:
>
> > Hi everybody,
> >
> > I've just been fighting with sorting and alphabetical ordering in
> > multiple languages, and I've got things working, but I'm a little
> > puzzled about how. So if anybody has any insight, I'd be grateful.
> >
> > This is for IFEX, on something called the "Digest." It's a
> > regularly issued list of items recently published on the site. You
> > can see an example here:
> >
> > http://www.ifex.org/2010/02/12/digest/
> >
> > It's a big alphabetical list of regions (OK, "International" is at the
> > top), and within each region is an alphabetical list of countries.
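Pinning one entry above an otherwise alphabetical list can be done in the sort comparator itself. A minimal sketch with hypothetical region names; the real list comes from the site's categories, and this is not necessarily how the Digest template does it:

```perl
use strict;
use warnings;

# Hypothetical region names for illustration only.
my @regions = ('Europe', 'Africa', 'International', 'Americas', 'Asia-Pacific');

# Put "International" first, then sort the rest alphabetically.
# The first comparison is -1 when only $a is "International" and +1 when
# only $b is; when neither (or both) is pinned it is 0, so cmp decides.
my @ordered = sort {
    ($b eq 'International') <=> ($a eq 'International')
        or $a cmp $b
} @regions;

print join(', ', @ordered), "\n";
# International, Africa, Americas, Asia-Pacific, Europe
```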
> >
> > I had been doing the alphabetization with a Schwartzian Transform,
> > looking up the name of each country according to the output channel:
> >
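> > # Schwartzian Transform: pair each category key with its translated
> > # name, sort on the name, then keep only the key.
> > my @alphabetized_cats =
> >     map  { $_->[0] }
> >     sort { $a->[1] cmp $b->[1] }
> >     map  { [ $_ => $m->scomp('/util/translations.mc', word => $_) ] }
> >     keys(%all_cats);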
> >
> > (translations.mc maps category URIs to country names based on the
> > current output channel, or OC.)
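A self-contained version of the Schwartzian Transform above, for experimenting outside Bricolage. The `%french_name` hash and `translate` sub are hypothetical stand-ins for the `$m->scomp('/util/translations.mc', ...)` call, which needs a running Mason environment, and the category URIs are made up:

```perl
use strict;
use warnings;
use utf8;

# Hypothetical stand-in for translations.mc: category URI => French name.
my %french_name = (
    '/rwanda/' => 'Rwanda',
    '/drc/'    => 'République démocratique du Congo',
    '/benin/'  => 'Bénin',
);

# Hypothetical lookup; falls back to the URI if no name is known.
sub translate {
    my ($uri) = @_;
    return $french_name{$uri} // $uri;
}

my %all_cats = map { $_ => 1 } keys %french_name;

# Compute each sort key once, sort on it, then drop the key.
my @alphabetized_cats =
    map  { $_->[0] }
    sort { $a->[1] cmp $b->[1] }
    map  { [ $_ => translate($_) ] }
    keys %all_cats;

print "@alphabetized_cats\n";    # /benin/ /rwanda/ /drc/
```

Because this still uses plain `cmp`, the output puts "Rwanda" ahead of "République démocratique du Congo", reproducing the ordering problem described next.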
> >
> > This was mostly fine, except that the vanilla Perl sort is really only
> > good for asciibetical order. In Friday's Digest, "Rwanda" was coming
> > before "République démocratique du Congo."
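To see why, here is a minimal sketch of the same comparison without `use locale`. Perl's default `cmp` compares strings code point by code point, so the result depends only on character numbers, not on French alphabetical rules:

```perl
use strict;
use warnings;
use utf8;

binmode STDOUT, ':encoding(UTF-8)';

my @names  = ('République démocratique du Congo', 'Rwanda');
my @sorted = sort @names;    # default sort: string cmp, no locale

# Both names start with "R". At the second character, "w" (U+0077)
# is lower than "é" (U+00E9), so "Rwanda" sorts first.
printf "w = %d, é = %d\n", ord('w'), ord('é');    # w = 119, é = 233
print join(' < ', @sorted), "\n";
```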
> >
> > So I've been trying to use locales, like this:
> >
> > my %ocs_to_locales = (
> >     'Web (French)'  => 'fr_FR.utf8',
> >     'Web (Spanish)' => 'es_ES.utf8',
> >     'Web (Russian)' => 'ru_RU.utf8',
> >     'Web (Arabic)'  => 'ar_EG.utf8',
> > );
> >
> > use POSIX;
> > use locale;
> > my $oc_name = $burner->get_oc->get_name;
> > if ($ocs_to_locales{$oc_name}) {
> >     POSIX::setlocale(LC_COLLATE, $ocs_to_locales{$oc_name});
> > }
> >
> > ...then do the sort, and then add this line afterward:
> >
> > no locale;
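A related detail: `setlocale` changes the collation locale for the whole process, not just for one sort, so in a long-running process it may be worth putting the old setting back afterward. A minimal sketch, assuming a hypothetical helper named `sort_in_locale` (not part of Bricolage or POSIX):

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_COLLATE);

# Hypothetical helper: sort @items under $locale_name, then restore the
# LC_COLLATE setting the process had before, so one output channel's
# locale does not leak into later sorts in the same process.
sub sort_in_locale {
    my ($locale_name, @items) = @_;

    my $previous = setlocale(LC_COLLATE);    # one argument: query only
    my $switched = setlocale(LC_COLLATE, $locale_name);
    warn "Locale '$locale_name' unavailable; using current collation\n"
        unless defined $switched;

    my @sorted;
    {
        use locale;    # lexically scoped: cmp uses the LC_COLLATE locale
        @sorted = sort { $a cmp $b } @items;
    }

    setlocale(LC_COLLATE, $previous) if defined $previous;
    return @sorted;
}

my @names = sort_in_locale('C', 'pear', 'apple', 'fig');
print "@names\n";    # apple fig pear
```

Scoping `use locale` to a bare block ends it at the closing brace, which has the same effect as the `no locale` line above.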
> >
> >
> > Sadly, the utf8 locales seem to have the characters in completely
> > nutty order. "Rwanda" still came before "République démocratique du
> > Congo."
> >
> > Dropping the ".utf8" from the French locale name and using just
> > "fr_FR" works, though. So I'm full of hope for Spanish and Arabic.
> >
> > Now, everything on the site is UTF-8, so I'm puzzled about why the
> > ".utf8" locales turned out to be bad choices. Does anybody have any
> > idea?
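One locale-independent alternative, not what this thread ended up using, is the core module `Unicode::Collate`, which implements the Unicode Collation Algorithm directly on decoded Perl strings and so does not depend on which system locales are installed. A minimal sketch, assuming the country names are already decoded character strings:

```perl
use strict;
use warnings;
use utf8;
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

my @countries = ('Rwanda', 'République démocratique du Congo', 'Bénin', 'Burundi');

# The default collation compares base letters first and treats accents
# as secondary differences, so "Ré..." sorts before "Rw...".
my $collator = Unicode::Collate->new();
my @sorted   = $collator->sort(@countries);

print "$_\n" for @sorted;
```

For language-specific tailorings, newer releases of the same distribution also include `Unicode::Collate::Locale`; check that your Perl ships it before relying on it.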
> >
> >
> > Thanks,
> >
> > Bret
> >
> >
> >
> > --
> > Bret Dawson
> > Producer
> > Pectopah Productions Inc.
> > (416) 895-7635
> > bret [at] pectopah
> > www.pectopah.com
> >
>
>



