
Mailing List Archive: Python: Bugs

[issue2630] repr() should not escape non-ASCII characters

 

 



report at bugs

May 27, 2008, 5:56 AM

Post #1 of 20 (503 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters

atsuo ishimoto <ishimoto[at]users.sourceforge.net> added the comment:

I updated the patch as per the latest PEP.

- io.TextIOWrapper doesn't provide an API to change the error handler
at this time. I will update this patch once that API is
provided.

- This patch contains a fix for Tools/unicode/makeunicodedata.py
in rev 63378.
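
For readers following the first point: the error handler of an io.TextIOWrapper is selected through the `errors` argument when the wrapper is constructed. The sketch below is an illustration only and is not taken from the patch; the helper name `encode_with_handler` is made up for this example.

```python
import io


def encode_with_handler(text, encoding="ascii", errors="strict"):
    """Write text through a TextIOWrapper and return the encoded bytes.

    Hypothetical helper for illustration; not part of the patch.
    """
    buffer = io.BytesIO()
    # The error handler is chosen here, at construction time.
    wrapper = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    wrapper.write(text)
    wrapper.flush()
    data = buffer.getvalue()
    wrapper.detach()  # keep the wrapper from closing the buffer on cleanup
    return data


if __name__ == "__main__":
    # backslashreplace turns unencodable characters into escape sequences.
    print(encode_with_handler("caf\u00e9", errors="backslashreplace"))
    # strict raises UnicodeEncodeError for the same input.
    try:
        encode_with_handler("caf\u00e9", errors="strict")
    except UnicodeEncodeError as exc:
        print("strict failed:", exc.reason)
```

Later Python releases (3.7 and newer) added TextIOWrapper.reconfigure(), which can change `errors` on an existing stream.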

Added file: http://bugs.python.org/file10447/diff4.txt

__________________________________
Tracker <report[at]bugs.python.org>
<http://bugs.python.org/issue2630>
__________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

May 28, 2008, 12:39 AM

Post #2 of 20 (486 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

docdiff1.txt contains documentation for the functions I added.

Added file: http://bugs.python.org/file10456/docdiff1.txt



report at bugs

Jun 1, 2008, 5:53 AM

Post #3 of 20 (467 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

diff5.txt contains both the code and documentation patches for PEP 3138.

- In this patch, the default error handler of sys.stdout is always 'strict'.

Added file: http://bugs.python.org/file10491/diff5.txt



report at bugs

Jun 3, 2008, 3:13 AM

Post #4 of 20 (466 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Georg Brandl <georg[at]python.org> added the comment:

Review:

* Why is an empty string not printable? In any case, the empty string
should be among the test cases for isprintable().

* Why not use PyUnicode_DecodeASCII instead of
PyUnicode_FromEncodedObject? It should be a bit faster.

* If old-style string formatting gets "%a", .format() must get a "!a"
specifier.

* The ascii() and repr() tests should be expanded so that both test the
same set of objects, and the expected differences. Are there tests for
failing cases?

* This is just "return ascii" (in builtin_ascii):
+ if (ascii == NULL)
+ return NULL;
+
+ return ascii;

* For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE
and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c
seems to use them too, probably a legacy.)

* There appear to be some space indentations in tab-indented files like
bltinmodule.c and vice versa (unicodeobject.c).

* C docs/isprintable() docs: The spec
+ Characters defined in the Unicode character database as "Other"
+ or "Separator" other than ASCII space(0x20) are not considered
+ printable.
is unclear, better say "All characters except those ... are considered
printable".

* ascii() docs:
+ the non-ASCII
+ characters in the string returned by :func:`ascii`() are hex-escaped
+ to generate a same string as :func:`repr` in Python 2.

should be

"the non-ASCII characters in the string returned by :func:`repr` are
backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".

* makeunicodedata: len(list(n for n in names if n is not None)) could
better be expressed as sum(1 for n in names if n is not None).

Otherwise, the patch is fine IMO. (I'm surprised that only so few tests
needed adaptation, that's a sign that we're not testing Unicode enough.)

----------
nosy: +georg.brandl



report at bugs

Jun 3, 2008, 3:31 AM

Post #5 of 20 (464 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Georg Brandl <georg[at]python.org> added the comment:

One more thing: with r63891 the encoding and errors arguments for the
creation of sys.stderr were made configurable; you'll have to adapt the
patch so that it defaults to backslashreplace but can be overridden by
PYTHONIOENCODING.
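
As a sketch of how the PYTHONIOENCODING override behaves in released Python 3 (an illustration, not code from the patch or from r63891), the snippet below starts a child interpreter with the variable set to `encoding:errorhandler` and captures what it writes to stdout. The helper name `run_with_io_encoding` is invented for this example.

```python
import os
import subprocess
import sys


def run_with_io_encoding(code, io_encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set.

    Hypothetical helper for illustration; returns the raw stdout bytes.
    """
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    child_code = "import sys; sys.stdout.write('\\u00e9')"
    # With an ASCII stdout and backslashreplace, the character is escaped
    # instead of raising UnicodeEncodeError.
    print(run_with_io_encoding(child_code, "ascii:backslashreplace"))
```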



report at bugs

Jun 3, 2008, 3:33 AM

Post #6 of 20 (462 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

This patch contains the following changes.

- Added the new C API PyObject_ASCII() for consistency.
- Added the new string formatting operator for str.format() and
PyUnicode_FromFormat.

Added file: http://bugs.python.org/file10507/diff6.txt



report at bugs

Jun 3, 2008, 4:00 AM

Post #7 of 20 (460 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

Thank you for your review!
I filed a new patch just before I saw your comments.

On Tue, Jun 3, 2008 at 7:13 PM, Georg Brandl <report[at]bugs.python.org> wrote:
>
> Georg Brandl <georg[at]python.org> added the comment:
>
> Review:
>
> * Why is an empty string not printable? In any case, the empty string
> should be among the test cases for isprintable().

Well, my intuition, which came from str.islower(), was wrong. An empty string is
printable, of course.
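
A quick check of this behaviour as it ended up in released Python 3 (illustrative only, not taken from the patch's test suite). The sample strings are chosen for this example.

```python
# Compare isprintable() with islower() on the empty string and a few others.
samples = ["", "abc", " ", "\n", "\u3042", "\u200b"]

# Map each sample to the result of str.isprintable().
printable = {s: s.isprintable() for s in samples}

if __name__ == "__main__":
    for s in samples:
        # ascii() keeps the report readable even for invisible characters.
        print(ascii(s), "isprintable:", printable[s], "islower:", s.islower())
```

The empty string is the case where the two predicates disagree: isprintable() is true for it, while islower() is false because there is no cased character.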

> * Why not use PyUnicode_DecodeASCII instead of
> PyUnicode_FromEncodedObject? It should be a bit faster.
>

Okay, thank you.

> * If old-style string formatting gets "%a", .format() must get a "!a"
> specifier.
>
I added the format string in my latest patch.
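
For reference, a small sketch of the two spellings as they exist in released Python 3 (not copied from the patch): the "!a" conversion in str.format() and the "%a" code in old-style formatting both apply ascii() to the argument.

```python
text = "caf\u00e9"

# New-style formatting: the "!a" conversion calls ascii() on the value.
new_style = "{!a}".format(text)

# Old-style formatting: "%a" does the same.
old_style = "%a" % text

# For comparison, "!r" calls repr(), which keeps printable non-ASCII text.
repr_style = "{!r}".format(text)

if __name__ == "__main__":
    print(new_style, old_style, repr_style)
```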

> * The ascii() and repr() tests should be expanded so that both test the
> same set of objects, and the expected differences. Are there tests for
> failing cases?
>

Okay, thank you.
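
One possible shape for such a test, sketched here under the assumption that a shared table of objects is acceptable (this is not the test code from the patch; `BrokenRepr`, `cases`, `check_cases` and `ascii_propagates_errors` are names invented for the example):

```python
class BrokenRepr:
    """Object whose __repr__ fails, used for the failing case."""

    def __repr__(self):
        raise ValueError("repr failed")


# (object, expected repr(), expected ascii()) for the same set of objects.
cases = [
    ("abc", "'abc'", "'abc'"),
    ("caf\u00e9", "'caf\u00e9'", "'caf\\xe9'"),
    ("\u3042", "'\u3042'", "'\\u3042'"),
    (["\u00e9", 1], "['\u00e9', 1]", "['\\xe9', 1]"),
    (42, "42", "42"),
]


def check_cases():
    """Assert that repr() and ascii() differ only in non-ASCII escaping."""
    for obj, expected_repr, expected_ascii in cases:
        assert repr(obj) == expected_repr
        assert ascii(obj) == expected_ascii


def ascii_propagates_errors():
    """Return True if an exception raised by __repr__ escapes from ascii()."""
    try:
        ascii(BrokenRepr())
    except ValueError:
        return True
    return False


if __name__ == "__main__":
    check_cases()
    print("failing case propagates:", ascii_propagates_errors())
```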

> * This is just "return ascii" (in builtin_ascii):
> + if (ascii == NULL)
> + return NULL;
> +
> + return ascii;

Fixed in my latest patch.

>
> * For PyBool_FromLong(1) and PyBool_FromLong(0) there is Py_RETURN_TRUE
> and Py_RETURN_FALSE. (You're not to blame, the rest of unicodeobject.c
> seems to use them too, probably a legacy.)

Okay, thank you.

>
> * There appear to be some space indentations in tab-indented files like
> bltinmodule.c and vice versa (unicodeobject.c).
>

I think bltinmodule.c is fixed in the latest patch, but I don't know what
the correct indentation for unicodeobject.c is. I guess the latest patch is
acceptable.

> * C docs/isprintable() docs: The spec
> + Characters defined in the Unicode character database as "Other"
> + or "Separator" other than ASCII space(0x20) are not considered
> + printable.
> is unclear, better say "All characters except those ... are considered
> printable".
>
> * ascii() docs:
> + the non-ASCII
> + characters in the string returned by :func:`ascii`() are hex-escaped
> + to generate a same string as :func:`repr` in Python 2.
>
> should be
>
> "the non-ASCII characters in the string returned by :func:`repr` are
> backslash-escaped (with ``\x``, ``\u`` or ``\U``) to generate ...".
>

Okay, thank you.
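
The isprintable() wording discussed above can be restated as a per-character rule. The sketch below is my reading of that rule using the unicodedata module, not the C implementation from the patch; `is_printable_char` is a hypothetical helper.

```python
import unicodedata


def is_printable_char(ch):
    """Apply the documented rule to a single character.

    Characters whose general category is "Other" (C*) or "Separator" (Z*)
    are not printable, with ASCII space (0x20) as the one exception.
    """
    if ch == " ":
        return True
    return unicodedata.category(ch)[0] not in ("C", "Z")


if __name__ == "__main__":
    for ch in ["a", " ", "\t", "\u00a0", "\u2028", "\u3042"]:
        # Show the category next to both answers for comparison.
        print(ascii(ch), unicodedata.category(ch),
              is_printable_char(ch), ch.isprintable())
```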

> * makeunicodedata: len(list(n for n in names if n is not None)) could
> better be expressed as sum(1 for n in names if n is not None).

I don't want to change this, because it is a reversion of rev 63378.
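
For context, the two spellings under discussion give the same count; the generator form just avoids building a temporary list. The `names` list below is made-up sample data, not the table from makeunicodedata.py.

```python
# Sample data standing in for the character-name table (hypothetical).
names = ["LATIN SMALL LETTER A", None, "DIGIT ZERO", None, "SPACE"]

# Original spelling: materialise a list, then take its length.
count_with_list = len(list(n for n in names if n is not None))

# Suggested spelling: add 1 for each matching item, no intermediate list.
count_with_sum = sum(1 for n in names if n is not None)

if __name__ == "__main__":
    print(count_with_list, count_with_sum)
```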

> One more thing: with r63891 the encoding and errors arguments for the
> creation of sys.stderr were made configurable; you'll have to adapt the
> patch so that it defaults to backslashreplace but can be overridden by
> PYTHONIOENCODING.

I think sys.stderr should always default to 'backslashreplace'. I'll
post a message to the Py3k list later.

>
> Otherwise, the patch is fine IMO. (I'm surprised that only so few tests
> needed adaptation, that's a sign that we're not testing Unicode enough.)
>

Thank you very much! I'll file a new patch soon.



report at bugs

Jun 3, 2008, 4:06 AM

Post #8 of 20 (463 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

BTW, should the new C APIs and functions be backported to Python 2.6 for
compatibility, without modifying repr() itself? If so, I'll prepare a
patch for Python 2.6.



report at bugs

Jun 3, 2008, 4:10 AM

Post #9 of 20 (469 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Georg Brandl <georg[at]python.org> added the comment:

ascii() should probably be in future_builtins.

Whether the C API stuff and .isprintable() should be backported to 2.6
is something for Guido to decide.



report at bugs

Jun 3, 2008, 10:50 AM

Post #10 of 20 (462 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

I updated the patch as per Georg's advice.

Added file: http://bugs.python.org/file10511/diff7.txt



report at bugs

Jun 3, 2008, 10:57 AM

Post #11 of 20 (468 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Changes by Atsuo Ishimoto <ishimoto[at]gembook.org>:


Removed file: http://bugs.python.org/file10511/diff7.txt



report at bugs

Jun 3, 2008, 11:05 AM

Post #12 of 20 (462 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

I'm sorry, I left a file out of the upload. diff7_1.txt is the correct file.

Added file: http://bugs.python.org/file10512/diff7_1.txt



report at bugs

Jun 3, 2008, 11:48 AM

Post #13 of 20 (462 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Guido van Rossum <guido[at]python.org> added the comment:

> Whether the C API stuff and .isprintable() should be backported to 2.6
> is something for Guido to decide.

No way -- while all of this makes sense in Py3k, where all strings are
Unicode, it would cause no end of problems in 2.6, and it would break
backward compatibility badly.



report at bugs

Jun 3, 2008, 12:06 PM

Post #14 of 20 (462 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Changes by Eric Smith <eric[at]trueblade.com>:


----------
nosy: +eric.smith



report at bugs

Jun 4, 2008, 10:52 AM

Post #15 of 20 (447 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

stringlib can be compiled for Python 2.6 now, but the '!a' converter is
disabled by #ifdef for now.

Added file: http://bugs.python.org/file10518/diff8.patch



report at bugs

Jun 4, 2008, 2:30 PM

Post #16 of 20 (441 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Antoine Pitrou <pitrou[at]free.fr> added the comment:

Shall the method be called isprintable() or simply printable()? For the
record, in the io classes, the writable()/readable() convention was chosen.

----------
nosy: +pitrou



report at bugs

Jun 4, 2008, 2:34 PM

Post #17 of 20 (446 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Georg Brandl <georg[at]python.org> added the comment:

I would expect "abc".isprintable() to give me a bool and "abc".printable()
to return a printable string, as with "abc".lower() and "abc".islower().
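
The two naming conventions mentioned in this exchange can be seen side by side in released Python 3 (a small illustration, not part of the patch): on str, the bare verb returns a new string and the is-prefixed name returns a bool, while the io classes use bare names for their boolean queries.

```python
import io

word = "ABC"

lowered = word.lower()            # bare verb: returns a new str
is_lower = lowered.islower()      # "is" prefix: returns a bool
is_printable = word.isprintable() # follows the same "is" convention

stream = io.StringIO()
can_read = stream.readable()      # io convention: bare name, returns a bool
can_write = stream.writable()

if __name__ == "__main__":
    print(lowered, is_lower, is_printable, can_read, can_write)
```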



report at bugs

Jun 4, 2008, 2:36 PM

Post #18 of 20 (443 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Antoine Pitrou <pitrou[at]free.fr> added the comment:

You are right, I had forgotten about lower()/islower().



report at bugs

Jun 11, 2008, 11:38 AM

Post #19 of 20 (397 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Georg Brandl <georg[at]python.org> added the comment:

Patch committed to Py3k branch in r64138. Thanks all!

----------
resolution: -> accepted
status: open -> closed



report at bugs

Jun 11, 2008, 7:44 PM

Post #20 of 20 (396 views)
Permalink
[issue2630] repr() should not escape non-ASCII characters [In reply to]

Atsuo Ishimoto <ishimoto[at]gembook.org> added the comment:

Great, thank you!
