
mal at egenix
Jul 2, 2008, 1:39 AM
Re: convert unicode characters to visibly similar ascii characters
On 2008-07-01 20:31, Peter Bulychev wrote:
> Hello.
>
> I want to convert unicode character into ascii one.
> The method ".encode('ASCII')" can convert only those unicode characters,
> which fit into 0..127 range.
>
> But there are still lots of characters beyond this range, which can be
> manually converted to some visibly similar ascii characters. For instance,
> there are several quotation marks in unicode, which can be converted into
> ascii quotation mark.
>
> Can this conversion be performed in automatic manner? After googling I've
> only found that there exists Unicode database, which stores human-readable
> information on notation of all unicode characters (
> ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt). And there also exists
> the Python adapter for this database (
> http://docs.python.org/lib/module-unicodedata.html). Using this database I
> can do something like `if notation.find('QUOTATION')!=-1:\n\treturn "'"`. I
> believe there is more elegant way. Am I right?

You could write a codec which translates Unicode into ASCII lookalike
characters, but AFAIK there is no standard for doing this.

I guess the best choice is to use the Unicode code point names as a basis.
These can be accessed via unicodedata.name(). You can then create a mapping
from code points to ASCII replacements, which can be processed by the
character map codec.

--
Marc-Andre Lemburg
eGenix.com

--
http://mail.python.org/mailman/listinfo/python-list
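For illustration, the name-based mapping suggested in the reply might look roughly like the sketch below. It is not a full codec: it feeds the mapping to str.translate() instead of the character map codec, and the keyword table, the helper names (NAME_KEYWORDS, ascii_lookalike, build_table, to_ascii) and the "?" fallback are all assumptions made up for this example, not an established standard.

```python
import unicodedata

# Hypothetical keyword table: substring of the Unicode character name
# -> ASCII replacement. The first matching keyword wins, so the more
# specific "DOUBLE QUOTATION MARK" must come before "QUOTATION MARK".
NAME_KEYWORDS = [
    ("DOUBLE QUOTATION MARK", '"'),
    ("QUOTATION MARK", "'"),
    ("DASH", "-"),
    ("HYPHEN", "-"),
    ("ELLIPSIS", "..."),
]


def ascii_lookalike(ch):
    """Return an ASCII stand-in for ch, or None if no rule applies."""
    if ord(ch) < 128:
        return ch
    # unicodedata.name() raises ValueError for unnamed characters
    # unless a default is given, so pass an empty string.
    name = unicodedata.name(ch, "")
    for keyword, replacement in NAME_KEYWORDS:
        if keyword in name:
            return replacement
    return None


def build_table(text):
    """Build a translate() table for the non-ASCII characters in text."""
    table = {}
    for ch in set(text):
        if ord(ch) >= 128:
            replacement = ascii_lookalike(ch)
            # Assumption: characters without a rule become "?".
            table[ord(ch)] = replacement if replacement is not None else "?"
    return table


def to_ascii(text):
    """Replace lookalike characters, then encode the result as ASCII."""
    return text.translate(build_table(text)).encode("ascii")
```

With this table, curly quotes, dashes and the ellipsis character collapse to their plain ASCII forms, while anything the table does not cover (accented letters, for example) ends up as "?". A real solution would need a much larger, hand-checked keyword list.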