
rgarciasuarez at gmail
May 21, 2008, 1:01 AM
Post #22 of 30
Re: On the problem of strings and binary data in Perl.
2008/5/21 Glenn Linderman <perl[at]nevcal.com>:
>> But calculus operates on homogenous quantities. Text can be multi-lingual.
>> How about uc($chunk_of_a_german_turkish_dictionary) ? It seems that the
>> rules to apply belong to the operator and not to the string.
>
> I'm not sure what you are saying here... it seems to me that the rules apply
> to the operator, but are affected by the knowledge of the string you wish to
> operate on.
>
> And regarding calculus, I could envision doing vector and/or matrix
> operations on vectors and/or matrices that contain different units for each
> element... which is about the same thing as multi-lingual strings...

But down to the simple element calculus units should match. And to
keep things complex, a true unit system will know that when you
multiply, say, a mass by a surface and divide by the square of a
duration, you get an energy. That's why unit type systems are best
kept as specialized modules, right ? :)

> So for your uc($chunk_of_a_german_turkish_dictionary), I assume that
> $chunk_of_a_german_turkish_dictionary is a "multi-lingual string object",

Actually not what I had in mind. No size fits all. In those cases I
expect the careful programmer to use his knowledge of the format of
the string (XML, whatever) to split it in chunks that are monolingual
and thus upper-caseable separately. But then of course CPAN could hold
classes that implement multi-lingual strings :

> containing an ordered collection of text fragments represented as a Perl
> string, and that each fragment has corresponding meta data describing its
> language (and possibly dialect). I assume that uc can be and is overloaded
> by the "multi-lingual string class", and that the native Perl uc operator is
> invoked for the Unicode version of the string and including the appropriate
> pragma, option, or parameter to the operator to make it use the appropriate
> rules from SpecialCasing.txt
>
> The multi-lingual string class may understand how to parse data in some XML
> format such as
>
> <locale language="German">Sprechen Sie Deutsch?</locale>
> <locale language="Turkish">Türkçe biliyor musun?</locale>
>
> and/or other formats such as those defined by ODF, or perhaps even some
> proprietary word processor systems.
>
> But most programs and programmers aren't going to need to deal with that;
> they'll set the values of the lexical pragmas to the appropriate language at
> the top of their applications, and their program will appropriately handle
> the language for the default (or configured) locale. No need for
> multi-lingual string classes, overloaded string operators, or
> internally-tagged, tree-structured Perl strings. People that need them can
> make them, or (maybe someday) obtain them from CPAN. CPAN already has tools
> to parse the XML, and even some for ODF, so the job of creating the
> multi-lingual string class is partially complete.
>
> The important job for Perl-the-language is to fix the bugs with the
> presently exposed string storage format, to fix the other bugs where
> character semantics are inappropriately provided (chr, for example), provide
> features and facilities to normalize and validate Unicode for various
> purposes, and enhance the available operators with options, parameters or
> pragmas, that allow them to implement the necessary Unicode semantics for
> the specified language.
>
> If, after all those building blocks are in place, the language wants to
> implement a "multilingual string type" that has internal tags that override
> the pragmas, that might be nice, but it seems to me a poor use of CORE
> development time, when at the moment, we can't even write programs to handle
> one language at a time with proper semantics, and we can't even open and
> read all the files under Windows, because of them having non-ANSI characters
> in their names.
>
> That said, if there is something about multi-lingual strings that can't be
> done as an object, I'm all ears.
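
For illustration only, the chunk-by-chunk approach discussed above (split the text into monolingual pieces, then upper-case each with the rules of its own language) might look roughly like the sketch below. Everything named here is hypothetical: the %uc_for table, the uc_chunks helper, and the hand-written Turkish rule are assumptions made for the example, not an existing core or CPAN interface, and the chunks are assumed to have been extracted from the XML already. It also assumes Perl 5.12 or later for the unicode_strings feature.

```perl
use strict;
use warnings;
use utf8;                        # the string literals below are UTF-8
use feature 'unicode_strings';   # Unicode semantics for uc (Perl 5.12+)

# Hypothetical table of per-language upper-casing rules.
# Only Turkish needs tailoring in this small example.
my %uc_for = (
    German  => sub { uc $_[0] },
    Turkish => sub {
        my ($text) = @_;
        $text =~ tr/iı/İI/;      # dotted i -> İ, dotless ı -> I
        return uc $text;         # default rules handle the rest
    },
);

# Upper-case a list of [language, text] chunks, each with its own rule.
# Unknown languages fall back to the plain built-in uc.
sub uc_chunks {
    my @chunks = @_;
    return map {
        my ($lang, $text) = @$_;
        my $rule = $uc_for{$lang} || sub { uc $_[0] };
        [ $lang, $rule->($text) ];
    } @chunks;
}

# Assumed to be the already-extracted content of the <locale> elements.
my @upper = uc_chunks(
    [ German  => 'Sprechen Sie Deutsch?' ],
    [ Turkish => 'Türkçe biliyor musun?' ],
);

binmode STDOUT, ':encoding(UTF-8)';
print "$_->[0]: $_->[1]\n" for @upper;
```

The rule lives with the operation and is selected per chunk, so a plain uc on the whole dictionary entry (which would turn the Turkish "i" into "I" rather than "İ") is never applied to text whose language it does not match.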