
Mailing List Archive: Python: Bugs

[issue14304] Implement utf-8-bmp codec

 

 



report at bugs

Mar 31, 2012, 4:07 PM

Post #1 of 24 (312 views)
Permalink
[issue14304] Implement utf-8-bmp codec

Antoine Pitrou <pitrou [at] free> added the comment:

The solution outlined in the issue title ("utf-8-bmp codec") sounds like a rather dubious idea.

----------
nosy: +loewis, pitrou

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue14304>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Mar 31, 2012, 6:35 PM

Post #2 of 24 (299 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

pitrou: can you elaborate?

----------



report at bugs

Apr 1, 2012, 12:38 AM

Post #3 of 24 (300 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

''.join(c if ord(c) < 0x10000 else escape(c) for c in s)

----------
nosy: +storchaka



report at bugs

Apr 13, 2012, 9:31 PM

Post #4 of 24 (295 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Changes by Ezio Melotti <ezio.melotti [at] gmail>:


----------
nosy: +ezio.melotti



report at bugs

Apr 15, 2012, 2:59 PM

Post #5 of 24 (287 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

STINNER Victor <victor.stinner [at] gmail> added the comment:

What is this codec? What do you mean by "escape non-ascii"?

----------
nosy: +haypo



report at bugs

Apr 16, 2012, 5:45 AM

Post #6 of 24 (285 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

This codec is equivalent to UTF-8 but restricted to the BMP. For non-BMP characters, the error handler is called. It will be the stdout codec for the IDLE interactive shell, causing non-BMP results to be ascii() escaped.
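As a rough illustration of the behavior described above, here is a minimal sketch of such an encoder. The function name `utf8_bmp_encode` and the encoding label "utf-8-bmp" are hypothetical; nothing like this exists in the standard library. The sketch only assumes the documented `codecs.lookup_error` protocol, in which a handler receives a `UnicodeEncodeError` and returns a replacement plus the position at which to resume.

```python
import codecs

BMP_MAX = 0xFFFF  # highest code point in the Basic Multilingual Plane


def utf8_bmp_encode(text, errors="strict"):
    """Hypothetical encoder: UTF-8 for BMP characters, error handler otherwise."""
    handler = codecs.lookup_error(errors)
    chunks = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ord(ch) <= BMP_MAX:
            chunks.append(ch.encode("utf-8"))
            pos += 1
            continue
        # Non-BMP character: let the chosen error handler decide what to do.
        exc = UnicodeEncodeError(
            "utf-8-bmp", text, pos, pos + 1, "non-BMP character not supported"
        )
        replacement, pos = handler(exc)
        if isinstance(replacement, str):
            replacement = replacement.encode("utf-8")
        chunks.append(replacement)
    return b"".join(chunks)


print(utf8_bmp_encode("\u0100\U00010000", "backslashreplace"))
```

With "strict" the non-BMP character raises UnicodeEncodeError, while "replace", "ignore" and "backslashreplace" substitute, drop or escape it, in the same way as for the built-in codecs.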

----------



report at bugs

Apr 16, 2012, 5:50 AM

Post #7 of 24 (289 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Andrew Svetlov <andrew.svetlov [at] gmail> added the comment:

Tkinter (like Tcl itself) does not support non-BMP characters in any form.
In effect, it supports UTF-16 without surrogates.
I would like to implement a codec for that, one that handles the different error modes (strict, replace, ignore, etc.) the way other codecs do.

That would let us support the BMP well and control how non-BMP characters are processed in IDLE.

About your second question:
IDLE has an interactive shell. In its REPL, the shell tries to print the result of an expression. If the result contains non-BMP characters, the whole result is converted to ASCII with escaping. This differs from the standard Python console. From my perspective, the expected behavior is to pass BMP characters through and escape only the non-BMP ones.

----------



report at bugs

Apr 16, 2012, 7:56 AM

Post #8 of 24 (287 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

Example:

>>> '\u0100'
'Ā'
>>> '\u0100\U00010000'
'\u0100\U00010000'
>>> print('\u0100')
Ā
>>> print('\u0100\U00010000')
Traceback (most recent call last):
  File "<pyshell#33>", line 1, in <module>
    print('\u0100\U00010000')
UnicodeEncodeError: 'UCS-2' codec can't encode characters in position 1-1: Non-BMP character not supported in Tk

But I think this is too specific a problem with too specific a solution. It would be better if IDLE itself escaped the string in the most appropriate way.

def utf8bmp_encode(s):
    # Escape each non-BMP character as \UXXXXXXXX, then encode as UTF-8.
    return ''.join(c if ord(c) <= 0xffff else '\\U%08x' % ord(c) for c in s).encode('utf-8')

or

import re

def utf8bmp_encode(s):
    return re.sub('[^\x00-\uffff]', lambda m: '\\U%08x' % ord(m.group()), s).encode('utf-8')

----------



report at bugs

Apr 16, 2012, 8:28 AM

Post #9 of 24 (284 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Andrew Svetlov <andrew.svetlov [at] gmail> added the comment:

That way is called a 'codec'.

----------



report at bugs

Apr 16, 2012, 8:35 AM

Post #10 of 24 (286 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

> But I think this is too specific a problem with too specific a
> solution. It would be better if IDLE itself escaped the string in the
> most appropriate way.

That is not implementable correctly. If you think otherwise, please
submit a patch. If not, please trust me on that judgment.

----------



report at bugs

Apr 16, 2012, 10:13 AM

Post #11 of 24 (286 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

Maybe I did not understand the problem correctly, but I believe
this patch solves it.

'Агов!\U00010000'

----------
keywords: +patch
Added file: http://bugs.python.org/file25244/idle_escape_nonbmp.patch

Attachments: idle_escape_nonbmp.patch (0.64 KB)


report at bugs

Apr 16, 2012, 10:32 AM

Post #12 of 24 (284 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

Sorry, the mail daemon has eaten a piece of example.

>>> '\u0410\u0433\u043e\u0432!\U00010000'
'Агов!\U00010000'

----------



report at bugs

Apr 27, 2012, 2:48 PM

Post #13 of 24 (270 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

Andrew, does the patch solve your issue?

----------



report at bugs

Apr 28, 2012, 10:43 AM

Post #14 of 24 (268 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

The patch is incorrect, i.e. it deviates from what the command line interface does. When you try to write to sys.stdout and the characters are not supported, you get a UnicodeError. ASCII escaping happens only in interactive mode, when the interpreter tries to display a result.
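The first half of that distinction can be reproduced without IDLE. The sketch below writes to an in-memory text stream whose encoding cannot represent the character; the helper name `write_strict` and the choice of "latin-1" as a stand-in for a limited stdout encoding are assumptions made for illustration only.

```python
import io


def write_strict(text, encoding="latin-1"):
    """Print text to an in-memory stream that uses the strict error handler."""
    stream = io.TextIOWrapper(
        io.BytesIO(), encoding=encoding, errors="strict", newline="\n"
    )
    print(text, file=stream)  # raises UnicodeEncodeError if text is not encodable
    stream.flush()
    return stream.buffer.getvalue()


print(write_strict("L\xf6wis"))  # encodable in latin-1

try:
    write_strict("L\xf6wis\U00010000")  # U+10000 is not encodable in latin-1
except UnicodeEncodeError as exc:
    print("write failed:", exc.reason)
```

An explicit write fails outright instead of being silently escaped, which is the behavior of print() on the command line that the comment refers to.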

----------



report at bugs

Apr 28, 2012, 11:32 AM

Post #15 of 24 (268 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

I don't see how the patch is worse than the current behavior.

Unpatched:
>>> ''.join(map(chr, [76, 246, 119, 105, 115]))
'Löwis'
>>> ''.join(map(chr, [76, 246, 119, 105, 115, 65536]))
'L\xf6wis\U00010000'

Patched:
>>> ''.join(map(chr, [76, 246, 119, 105, 115]))
'Löwis'
>>> ''.join(map(chr, [76, 246, 119, 105, 115, 65536]))
'Löwis\U00010000'

In the case of the Cyrillic alphabet all text becomes unreadable, if there are some non-bmp characters in it.
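The difference between the two transcripts can be sketched in a few lines. This is not the attached patch, whose contents are not shown in this thread; the function names below are hypothetical, and the sketch only contrasts escaping the whole repr with ascii() against escaping just the characters above U+FFFF.

```python
def escape_everything(value):
    """Whole result converted to ASCII, as in the 'Unpatched' transcript."""
    return ascii(value)


def escape_non_bmp_only(value):
    """Keep BMP characters readable and escape only non-BMP ones."""
    return "".join(
        c if ord(c) <= 0xFFFF else "\\U%08x" % ord(c) for c in repr(value)
    )


sample = "\u0410\u0433\u043e\u0432!\U00010000"
print(escape_everything(sample))
print(escape_non_bmp_only(sample))
```

With the Cyrillic sample, the first call escapes every letter, while the second leaves the word readable and escapes only the trailing non-BMP character.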

----------



report at bugs

Apr 28, 2012, 1:34 PM

Post #16 of 24 (269 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

> In the case of the Cyrillic alphabet all text becomes unreadable, if
> there are some non-bmp characters in it.

And indeed, that's the correct, desired behavior, as it models what the
interactive shell does.

If you want to change this, you need to also change the interactive console,
which is an issue independent of this one.

----------



report at bugs

Apr 28, 2012, 1:58 PM

Post #17 of 24 (270 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

I take that back; the interactive shell uses the backslashreplace error handler.

Still, I don't think IDLE should set up a displayhook in the first place. What if an application replaces the displayhook?
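For reference, the documented fallback of sys.displayhook is to encode repr(value) with the stdout encoding and the 'backslashreplace' error handler when a plain write fails. The sketch below mimics only that encoding step; `render_result` is a hypothetical helper, and the encoding is passed in explicitly instead of being read from sys.stdout.

```python
def render_result(value, encoding):
    """Mimic how the default displayhook renders text the stream cannot encode."""
    text = repr(value)
    # Characters the encoding cannot represent become \xNN, \uNNNN or \UNNNNNNNN.
    data = text.encode(encoding, "backslashreplace")
    return data.decode(encoding)


print(render_result("L\xf6wis\U00010000", "latin-1"))
print(render_result("L\xf6wis\U00010000", "ascii"))
```

Only the characters that the target encoding cannot represent are escaped, so the amount of escaping depends on the stream's encoding and not on whether the string contains a non-BMP character.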

----------



report at bugs

Apr 28, 2012, 2:39 PM

Post #18 of 24 (267 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

> Still, I don't think IDLE should set up a displayhook in the first place. What if an application replaces the displayhook?

IDLE *is* the application.

If another application that uses idlelib replaces the displayhook, it
must itself take care of correct encoding and escaping.

----------



report at bugs

Apr 28, 2012, 3:07 PM

Post #19 of 24 (268 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Andrew Svetlov <andrew.svetlov [at] gmail> added the comment:

Serhiy, I would like to fix tkinter itself, not only IDLE.
There are other problems, such as IDLE crashing when a non-BMP character is pasted from the clipboard.
Moreover, non-BMP behavior differs from one Tk widget to another.
I still want to write a codec for it and then try to solve the Tk problems.
The solution may require extending the tkinter interface to process codec errors, with a reasonably well-specified default behavior.
Sorry for my silence. I hope to make some progress in the coming weeks.

----------



report at bugs

Apr 28, 2012, 3:30 PM

Post #20 of 24 (266 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Martin v. Löwis <martin [at] v> added the comment:

> IDLE *is* the application.

No, IDLE is the development environment. The application is
whatever is being developed with IDLE.

----------



report at bugs

Apr 28, 2012, 3:51 PM

Post #21 of 24 (271 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

I don't understand how the utf-8-bmp codec will help to fix tkinter. To fix tkinter, you need to fix Tcl/Tk, which is outside of Python. As long as Tcl does not support non-BMP characters, correct and unambiguous handling of non-BMP characters is not possible. You have to choose a method of encoding non-BMP characters, and that method will differ from application to application.
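To make the point about application-specific choices concrete, here are three possible ways of squeezing a non-BMP character into BMP-only text. All names are hypothetical and none of these strategies is proposed anywhere in this thread as the one tkinter should use; the sketch only shows that each choice yields a different, lossy or ambiguous result.

```python
def to_surrogate_pair(ch):
    """Split a non-BMP character into its UTF-16 surrogate pair."""
    offset = ord(ch) - 0x10000
    return chr(0xD800 + (offset >> 10)) + chr(0xDC00 + (offset & 0x3FF))


def to_escape(ch):
    """Spell the character as a \\UXXXXXXXX escape sequence."""
    return "\\U%08x" % ord(ch)


def to_replacement(ch):
    """Give up on the character and use U+FFFD REPLACEMENT CHARACTER."""
    return "\ufffd"


def squeeze_into_bmp(text, strategy):
    """Apply the chosen strategy to every character above U+FFFF."""
    return "".join(c if ord(c) <= 0xFFFF else strategy(c) for c in text)


for strategy in (to_surrogate_pair, to_escape, to_replacement):
    print(strategy.__name__, ascii(squeeze_into_bmp("a\U0001F600", strategy)))
```

The escape form cannot be told apart from text that literally contains a backslash sequence, the replacement form loses the character, and the surrogate form relies on the consumer reassembling the pair.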

----------



report at bugs

Apr 28, 2012, 4:04 PM

Post #22 of 24 (269 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

> No, IDLE is the development environment. The application is
> whatever is being developed with IDLE.

If the application replaces the displayhook, then it is a development
environment too.

----------



report at bugs

Apr 28, 2012, 10:52 PM

Post #23 of 24 (268 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Serhiy Storchaka <storchaka [at] gmail> added the comment:

Andrew, imagine that the utf-8-bmp codec is already there (I will write it
for you if I see the need for it). How are you going to use it? Show a
patch that fixes IDLE and tkinter using this codec. It seems to me that
any result can be achieved without the codec, at no higher cost. And
that's not counting the cost of the codec itself.

----------



report at bugs

Apr 30, 2012, 6:38 AM

Post #24 of 24 (269 views)
Permalink
[issue14304] Implement utf-8-bmp codec [In reply to]

Changes by Arfrever Frehtes Taifersar Arahesis <Arfrever.FTA [at] GMail>:


----------
nosy: +Arfrever

