
Mailing List Archive: Python: Dev

deleting setdefaultencoding iin site.py is evil

 

 



chris at simplistix

Aug 25, 2009, 9:08 AM

Post #1 of 26 (2256 views)
Permalink
deleting setdefaultencoding iin site.py is evil

Hi All,

Would anyone object if I removed the deletion of
sys.setdefaultencoding in site.py?

I'm guessing "yes!" so thought I'd state my reasons now:

This deletion appears to be pretty flimsy; reload(sys) and you have it
back. Which is lucky, because I need it after it's been deleted...

Why? Well, because you can no longer put sitecustomize.py in a
project-specific location (http://bugs.python.org/issue1734860) and
because for some projects the only way I can deal with encoded strings
sensibly is to use setdefaultencoding, in my case at the start of a
script generated by zc.buildout's zc.recipe.egg (I *know* all the
encodings in this project are utf-8, but I don't want to go playing
whack-a-mole with whatever modules this rather large project uses that
haven't been made properly unicode aware).

Yes, it needs to be used as early as possible, and the docs should say
this, but deleting it seems to be petty in terms of stopping its use
when sitecustomize.py is too early and too system-wide and spraying
.decode('utf-8')'s all over a code base made up of a load of eggs
managed by buildout simply isn't feasible...

Thoughts?

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


exarkun at twistedmatrix

Aug 25, 2009, 9:23 AM

Post #2 of 26 (2205 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On 04:08 pm, chris [at] simplistix wrote:
>Hi All,
>
>Would anyone object if I removed the deletion of
>sys.setdefaultencoding in site.py?
>
>I'm guessing "yes!" so thought I'd state my reasons now:
>
>This deletion appears to be pretty flimsy; reload(sys) and you have it
>back. Which is lucky, because I need it after it's been deleted...

The ability to change the default encoding is a misfeature. There's
essentially no way to write correct Python code in the presence of this
feature.

Using setdefaultencoding is never the sensible way to deal with encoded
strings. Actually exposing this function in the sys module would lead
all kinds of people who haven't fully grasped the way str, unicode, and
encodings work to doing horrible things to create broken programs. It's
bad enough that it's already possible to get this function back with the
reload(sys) trick.
>
>Why? Well, because you can no longer put sitecustomize.py in a project-
>specific location (http://bugs.python.org/issue1734860) and because for
>some projects the only way I can deal with encoded strings sensibly is
>to use setdefaultencoding, in my case at the start of a script
>generated by zc.buildout's zc.recipe.egg (I *know* all the encodings in
>this project are utf-8, but I don't want to go playing whack-a-mole
>with whatever modules this rather large project uses that haven't been
>made properly unicode aware).
>
>Yes, it needs to be used as early as possible, and the docs should say
>this, but deleting it seems to be petty in terms of stopping its use
>when sitecustomize.py is too early and too system-wide and spraying
>.decode('utf-8')'s all over a code base made up of a load of eggs
>managed by buildout simply isn't feasible...
>
>Thoughts?

It may be a major task, but the best thing you can do is find each str
and unicode operation in the software you're working with and make them
correct with respect to your inputs and outputs. Flipping a giant
switch for the entire process is just going to change which things are
wrong.
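
As a rough illustration of that advice, here is a minimal sketch of decoding and encoding explicitly at the program's boundaries. It is written for Python 3 (where bytes is the byte string and str is text) so that it can be run today; the helper names to_text and to_bytes and the UTF-8 default are illustrative assumptions, not anything taken from the software under discussion.

```python
# Sketch only: to_text, to_bytes and the UTF-8 default are hypothetical helpers.
def to_text(value, encoding="utf-8"):
    """Return text, decoding byte strings explicitly at the boundary."""
    if isinstance(value, bytes):
        # Decode once, with a known encoding, where the bytes enter the program.
        return value.decode(encoding)
    return value


def to_bytes(value, encoding="utf-8"):
    """Return bytes, encoding text explicitly where it leaves the program."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


raw = "caf\u00e9".encode("utf-8")  # bytes as they might arrive from a file or socket
text = to_text(raw)                # text everywhere inside the program
wire = to_bytes(text)              # bytes again on the way out
```

The point of the design is that the encoding is named at each conversion, so nothing depends on a process-wide default.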

Jean-Paul


mal at egenix

Aug 25, 2009, 9:49 AM

Post #3 of 26 (2208 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Chris Withers wrote:
> Hi All,
>
> Would anyone object if I removed the deletion of
> sys.setdefaultencoding in site.py?
>
> I'm guessing "yes!" so thought I'd state my reasons now:
>
> This deletion appears to be pretty flimsy; reload(sys) and you have it
> back. Which is lucky, because I need it after it's been deleted...
>
> Why? Well, because you can no longer put sitecustomize.py in a
> project-specific location (http://bugs.python.org/issue1734860) and
> because for some projects the only way I can deal with encoded strings
> sensibly is to use setdefaultencoding, in my case at the start of a
> script generated by zc.buildout's zc.recipe.egg (I *know* all the
> encodings in this project are utf-8, but I don't want to go playing
> whack-a-mole with whatever modules this rather large project uses that
> haven't been made properly unicode aware).
>
> Yes, it needs to be used as early as possible, and the docs should say
> this, but deleting it seems to be petty in terms of stopping its use
> when sitecustomize.py is too early and too system-wide and spraying
> .decode('utf-8')'s all over a code base made up of a load of eggs
> managed by buildout simply isn't feasible...
>
> Thoughts?

Let's look at this from another angle: sys.setdefaultencoding()
is only made available for use in site.py. This is documented
and by design (since a site may want to set the default encoding
based on the locale or to "utf-8").

If you use it anywhere else, you're on your own. Such usage
is not supported and may very well break your interpreter or
cause data corruption (the default encoded versions of Unicode
objects are cached inside the objects).

Now, in your particular case, you're probably better off just
tweaking site.py directly in your custom Python interpreter
rather than relying on sitecustomize.py (see setencoding() in
site.py).

To answer your question: yes, this particular API may not be
used outside site.py.

--
Marc-Andre Lemburg
eGenix.com



guido at python

Aug 25, 2009, 10:37 AM

Post #4 of 26 (2207 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

In retrospect, it should have been called sys._setdefaultencoding().
That sends an extra signal that it's not meant for general use.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


robert.kern at gmail

Aug 25, 2009, 11:10 AM

Post #5 of 26 (2208 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On 2009-08-25 12:37 PM, Guido van Rossum wrote:
> In retrospect, it should have been called sys._setdefaultencoding().
> That sends an extra signal that it's not meant for general use.

Considering all of the sys._getframe() hacks out there, I suspect that this
would encourage more abuse of the function than the current situation.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco



guido at python

Aug 25, 2009, 11:29 AM

Post #6 of 26 (2209 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On Tue, Aug 25, 2009 at 11:10 AM, Robert Kern<robert.kern [at] gmail> wrote:
> On 2009-08-25 12:37 PM, Guido van Rossum wrote:
>>
>> In retrospect, it should have been called sys._setdefaultencoding().
>> That sends an extra signal that it's not meant for general use.
>
> Considering all of the sys._getframe() hacks out there, I suspect that this
> would encourage more abuse of the function than the current situation.

Why? It would still be deleted by site.py. The abuse of
sys._getframe() exists because it fills a real need. (As does abuse of
sys.setdefaultencoding(). However abusing it is actually more
troublesome, because the problems are much less theoretical.)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


robert.kern at gmail

Aug 25, 2009, 11:35 AM

Post #7 of 26 (2210 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On 2009-08-25 13:29 PM, Guido van Rossum wrote:
> On Tue, Aug 25, 2009 at 11:10 AM, Robert Kern<robert.kern [at] gmail> wrote:
>> On 2009-08-25 12:37 PM, Guido van Rossum wrote:
>>>
>>> In retrospect, it should have been called sys._setdefaultencoding().
>>> That sends an extra signal that it's not meant for general use.
>>
>> Considering all of the sys._getframe() hacks out there, I suspect that this
>> would encourage more abuse of the function than the current situation.
>
> Why? It would still be deleted by site.py.

Ah, yes. You're right. For whatever reason I thought it lived as
site.setdefaultencoding() when I read your message and thought that you were
proposing to move it to sys. Never mind me.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it had
an underlying truth."
-- Umberto Eco



chris at simplistix

Aug 26, 2009, 4:51 PM

Post #8 of 26 (2183 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

exarkun [at] twistedmatrix wrote:
> The ability to change the default encoding is a misfeature. There's
> essentially no way to write correct Python code in the presence of this
> feature.

How so? If every single piece of text in your project is encoded in a
superset of ascii (such as utf-8), why would this be a problem?
Even if you were evil/stupid and mixed encodings, surely all you'd get
is different unicode errors or maybe the odd strange character during
display?

> It may be a major task, but the best thing you can do is find each str
> and unicode operation in the software you're working with and make them
> correct with respect to your inputs and outputs. Flipping a giant
> switch for the entire process is just going to change which things are
> wrong.

Well, flipping that giant switch has worked in production for the past 5
years, so I'm afraid I'll respectfully disagree. I'd suspect the
pragmatics of real world software are why that function even exists,
and it's extremely useful when used correctly...

Chris



chris at simplistix

Aug 26, 2009, 4:59 PM

Post #9 of 26 (2180 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

M.-A. Lemburg wrote:
> Let's look at this from another angle: sys.setdefaultencoding()
> is only made available for use in site.py.

...see this:

http://mail.python.org/pipermail/python-dev/2009-August/091391.html

I would like to use sitecustomize.py for all the very good reasons given
in this thread:

- I don't want to change the default encoding for every project that
uses the python installation in question

- I don't even want to change the default encoding for every python
script run by the current user

- I only want to change the default encoding for one particular project.

Sadly, for the reasons I describe in the thread, site.py won't find a
sitecustomize.py in this situation...

> If you use it anywhere else, you're on your own.

No problem with that. To be specific, this is a Zope 2.12 instance
driven by this buildout:

[instance]
recipe = zc.recipe.egg
eggs = ${buildout:eggs}
interpreter = py
entry-points=
  runzope=Zope2.Startup.run:run
  zopectl=Zope2.Startup.zopectl:main
scripts = runzope zopectl
initialization =
  import sys
  reload(sys)
  sys.setdefaultencoding('utf-8')
  sys.argv[1:1] = ['-C','${buildout:directory}/etc/instance.conf']

The call to sys.setdefaultencoding is *very* early in the scheme of
things... The runzope script that gets run only has some sys.path
manipulation before sys.setdefaultencoding gets called. What problems
could there be by calling sys.setdefaultencoding there?

> Such usage
> is not supported and may very well break your interpreter

Can you give an example?

> or
> cause data corruption (the default encoded versions of Unicode
> objects are cached inside the objects).

When called as early as in the above script, what objects would have
encoded strings cached in them?

> Now, in your particular case, you're probably better off just
> tweaking site.py directly in your custom Python interpreter
> rather than relying on sitecustomize.py (see setencoding() in
> site.py).

Why?

Chris



chris at simplistix

Aug 26, 2009, 5:00 PM

Post #10 of 26 (2183 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Guido van Rossum wrote:
> In retrospect, it should have been called sys._setdefaultencoding().
> That sends an extra signal that it's not meant for general use.

Crazy idea: how about mutating it into sys._setdefaultencoding rather
than deleting it?

Chris



martin at v

Aug 26, 2009, 11:47 PM

Post #11 of 26 (2181 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

>> The ability to change the default encoding is a misfeature. There's
>> essentially no way to write correct Python code in the presence of
>> this feature.
>
> How so? If every single piece of text in your project is encoded in a
> superset of ascii (such as utf-8), why would this be a problem?

What is "every single piece of text"? Every string occurring in source
code? or also every single string that may be read from a file, a
socket, out of a database, or from a user interface?

How can you be certain that any string is UTF-8 when doing any
reasonable IO?

> Even if you were evil/stupid and mixed encodings, surely all you'd get
> is different unicode errors or maybe the odd strange character during
> display?

One specific problem is dictionaries will stop working correctly if you
set the default encoding to anything but ASCII. The reason is that
with UTF-8 as the default encoding, you get

py> u"\u20ac" == u"\u20ac".encode("utf-8")
True
py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
False

So objects that compare equal will not hash equal. As a consequence, you
may have two different values for what should be the same key in a
dictionary.
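
The effect on dictionaries can be reproduced in miniature with any type whose equality and hash disagree. The sketch below is a Python 3 analogue, not the Python 2 str/unicode case itself: the Key class and its fake_hash field are hypothetical, and the class is broken on purpose to show what a dict does when two equal objects hash differently.

```python
class Key:
    """Toy key type whose __eq__ and __hash__ disagree (deliberately broken)."""

    def __init__(self, text, fake_hash):
        self.text = text
        self.fake_hash = fake_hash

    def __eq__(self, other):
        # Equality looks only at the text.
        return isinstance(other, Key) and self.text == other.text

    def __hash__(self):
        # Broken on purpose: the hash ignores the text that equality uses.
        return self.fake_hash


a = Key("euro", fake_hash=1)
b = Key("euro", fake_hash=2)

table = {a: "first"}
table[b] = "second"               # lands in a separate slot despite a == b

equal = (a == b)
same_hash = (hash(a) == hash(b))
size = len(table)
```

Here equal is True, same_hash is False, and the dictionary ends up holding two entries for what should be a single key.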

> Well, flipping that giant switch has worked in production for the past 5
> years, so I'm afraid I'll respectfully disagree. I'd suspect the
> pragmatics of real world software are why that function even exists,
> and it's extremely useful when used correctly...

It has worked in your application. See my example above: it is very easy
to create applications that stop working correctly if you use
setdefaultencoding (at all - the only supported value is "latin-1",
since Unicode strings hash the same as byte strings if all characters
are in row 0).

Regards,
Martin


martin at v

Aug 26, 2009, 11:53 PM

Post #12 of 26 (2176 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

>> In retrospect, it should have been called sys._setdefaultencoding().
>> That sends an extra signal that it's not meant for general use.
>
> Crazy idea: how about mutating it into sys._setdefaultencoding rather
> than deleting it?

Please don't post crazy ideas unless you really mean them.

This specific crazy idea must be rejected; it would break backwards
compatibility, for no good reason.

Regards,
Martin


chris at simplistix

Aug 27, 2009, 12:27 AM

Post #13 of 26 (2179 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Martin v. Löwis wrote:
>>> In retrospect, it should have been called sys._setdefaultencoding().
>>> That sends an extra signal that it's not meant for general use.
>> Crazy idea: how about mutating it into sys._setdefaultencoding rather
>> than deleting it?
>
> Please don't post crazy ideas unless you really mean them.
>
> This specific crazy idea must be rejected; it would break backwards
> compatibility, for no good reason.

How is it breaking backwards compatibility?

- If people were somehow relying on sys.setdefaultencoding to be
deleted, that's fine, it's still gone

- If people were somehow relying on sys not having an attribute called
_setdefaultencoding, or were relying on stuffing an attribute into sys
called _setdefaultencoding then... well... that seems pretty unlikely ;-)

Chris



chris at simplistix

Aug 27, 2009, 12:42 AM

Post #14 of 26 (2181 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Martin v. Löwis wrote:
>>> The ability to change the default encoding is a misfeature. There's
>>> essentially no way to write correct Python code in the presence of
>>> this feature.
>> How so? If every single piece of text in your project is encoded in a
>> superset of ascii (such as utf-8), why would this be a problem?

I guess I should have said "every single piece of text in your project
is encoded in a superset of ascii (such as utf-8) or is decoded into a
unicode object at the application boundaries, such as an incoming http
request or in the process of parsing a file off disk", in which case:

> What is "every single piece of text"? Every string occurring in source
> code?

Yes.

> or also every single string that may be read from a file,

Yes.

> a
> socket,

Yes.

> out of a database,

Yes.

> or from a user interface?

Yes.

Any others I can say Yes to? ;-)

> How can you be certain that any string is UTF-8 when doing any
> reasonable IO?

Careful checking, and an awareness among the people working on the app's
development that anything else will result in severe pain, both physical
and mental ;-)

>> Even if you were evil/stupid and mixed encodings, surely all you'd get
>> is different unicode errors or maybe the odd strange character during
>> display?
>
> One specific problem is dictionaries will stop working correctly if you
> set the default encoding to anything but ASCII.

...except they haven't.

> The reason is that
> with UTF-8 as the default encoding, you get
>
> py> u"\u20ac" == u"\u20ac".encode("utf-8")
> True
> py> hash(u"\u20ac") == hash(u"\u20ac".encode("utf-8"))
> False
>
> So objects that compare equal will not hash equal. As a consequence, you
> may have two different values for what should be the same key in a
> dictionary.

Indeed, but this doesn't happen because the app never has a situation
where strings and unicodes are put in the same dict. However, it does
have plenty of situations where lists containing a mixture of utf-8
encoded strings and unicodes exist, where changing the default encoding
removes a *lot* of pain.

> It has worked in your application. See my example above: it is very easy
> to create applications that stop working correctly if you use
> setdefaultencoding (at all - the only supported value is "latin-1",
> since Unicode strings hash the same as byte strings if all characters
> are in row 0).

Would anyone object if I added this snippet to the .rst that generates:
http://docs.python.org/library/sys.html

It doesn't seem to be recorded anywhere anyone who's likely to use
setdefaultencoding is likely to find it...

Chris



martin at v

Aug 27, 2009, 12:53 AM

Post #15 of 26 (2180 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Chris Withers wrote:
> Martin v. Löwis wrote:
>>>> In retrospect, it should have been called sys._setdefaultencoding().
>>>> That sends an extra signal that it's not meant for general use.
>>> Crazy idea: how about mutating it into sys._setdefaultencoding rather
>>> than deleting it?
>>
>> Please don't post crazy ideas unless you really mean them.
>>
>> This specific crazy idea must be rejected; it would break backwards
>> compatibility, for no good reason.
>
> How is it breaking backwards compatibility?
>
> - If people were somehow relying on sys.setdefaultencoding to be
> deleted, that's fine, it's still gone
>
> - If people were somehow relying on sys not having an attribute called
> _setdefaultencoding, or were relying on stuffing an attribute into sys
> called _setdefaultencoding then... well... that seems pretty unlikely ;-)

If people were using the reload trickery, that would break if the
function changed its name.

Regards,
Martin


chris at simplistix

Aug 27, 2009, 1:01 AM

Post #16 of 26 (2184 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Martin v. Löwis wrote:
>> - If people were somehow relying on sys not having an attribute called
>> _setdefaultencoding, or were relying on stuffing an attribute into sys
>> called _setdefaultencoding then... well... that seems pretty unlikely ;-)
>
> If people were using the reload trickery, that would break if the
> function changed its name.

No it doesn't:

$ svn diff
Index: Lib/site.py
===================================================================
--- Lib/site.py (revision 74552)
+++ Lib/site.py (working copy)
@@ -540,6 +540,7 @@
     if hasattr(sys, "setdefaultencoding"):
+        sys._setdefaultencoding = sys.setdefaultencoding
         del sys.setdefaultencoding

>>> import sys
>>> sys._setdefaultencoding
<built-in function setdefaultencoding>
>>> sys.setdefaultencoding
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'setdefaultencoding'

>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding
<built-in function setdefaultencoding>
>>> sys._setdefaultencoding
<built-in function setdefaultencoding>

Chris



martin at v

Aug 27, 2009, 1:02 AM

Post #17 of 26 (2179 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

>> One specific problem is dictionaries will stop working correctly if you
>> set the default encoding to anything but ASCII.
>
> ...except they haven't.

In your application. Can you please agree that this is a semantic problem
that is completely unacceptable for language design?

> Indeed, but this doesn't happen because the app never has a situation
> where strings and unicodes are put in the same dict. However, it does
> have plenty of situations where lists containing a mixture of utf-8
> encoded strings and unicodes exist, where changing the default encoding
> removes a *lot* of pain.

So you should convert all byte strings to UTF-8 before adding them
to the list. Assuming you have used proper encapsulation and
object-oriented design, it shouldn't be too difficult to find, for each
such list, where the places are that modify the list.
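
One reading of this advice is to normalise every item to a single type at the point where it enters the list. The sketch below decodes byte strings to text on the way in, which is an assumption about the intent. It is written for Python 3, the TextList name is hypothetical, and only append and extend are covered (insert, item assignment and += would need the same treatment).

```python
class TextList(list):
    """List that decodes byte strings to text as they are added (illustrative)."""

    def __init__(self, items=(), encoding="utf-8"):
        super().__init__()
        self.encoding = encoding
        self.extend(items)

    def _coerce(self, item):
        # Byte strings are decoded with the list's known encoding; text passes through.
        if isinstance(item, bytes):
            return item.decode(self.encoding)
        return item

    def append(self, item):
        super().append(self._coerce(item))

    def extend(self, items):
        super().extend(self._coerce(item) for item in items)


names = TextList(["Zo\u00eb", "Ren\u00e9e".encode("utf-8")])
names.append("Bj\u00f6rk".encode("utf-8"))
joined = ", ".join(names)  # safe: every element is text by construction
```

With the conversion encapsulated in one place, code that consumes the list no longer needs an implicit default encoding to mix the items.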

> Would anyone object if I added this snippet to the .rst that generates:
> http://docs.python.org/library/sys.html

The snippet explaining the problem? I don't mind, but Raymond is on
record for objecting to any addition of a warning box to the
documentation, because it gives the impression that Python is full of
problems, when many of these warnings really refer to boundary cases only.

Regards,
Martin


martin at v

Aug 27, 2009, 1:06 AM

Post #18 of 26 (2174 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

>      if hasattr(sys, "setdefaultencoding"):
> +        sys._setdefaultencoding = sys.setdefaultencoding
>          del sys.setdefaultencoding

Ah, so you didn't want to rename the function. I agree that this
would not break backwards compatibility.

I guess the basic objection remains: making it so would make
_setdefaultencoding a supported feature, which would then mean
that we should fix all the bugs that it causes - when we already
know (because we thought many years about this) that it is not
possible to implement setdefaultencoding correctly and efficiently
(so the current implementation is only efficient, but not correct).

Regards,
Martin


mal at egenix

Aug 27, 2009, 1:34 AM

Post #19 of 26 (2175 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Chris Withers wrote:
> M.-A. Lemburg wrote:
>> Let's look at this from another angle: sys.setdefaultencoding()
>> is only made available for use in site.py.
>
> ...see this:
>
> http://mail.python.org/pipermail/python-dev/2009-August/091391.html
>
> I would like to use sitecustomize.py for all the very good reasons given
> in this thread:
>
> - I don't want to change the default encoding for every project that
> uses the python installation in question
>
> - I don't even want to change the default encoding for every python
> script run by the current user
>
> - I only want to change the default encoding for one particular project.
>
> Sadly, for the reasons I describe in the thread, site.py won't find a
> sitecustomize.py in this situation...
>
>> If you use it anywhere else, you're on your own.
>
> No problem with that. To be specific, this is a Zope 2.12 instance
> driven by this buildout:
>
> [instance]
> recipe = zc.recipe.egg
> eggs = ${buildout:eggs}
> interpreter = py
> entry-points=
>   runzope=Zope2.Startup.run:run
>   zopectl=Zope2.Startup.zopectl:main
> scripts = runzope zopectl
> initialization =
>   import sys
>   reload(sys)
>   sys.setdefaultencoding('utf-8')
>   sys.argv[1:1] = ['-C','${buildout:directory}/etc/instance.conf']
>
> The call to sys.setdefaultencoding is *very* early in the scheme of
> things... The runzope script that gets run only has some sys.path
> manipulation before sys.setdefaultencoding gets called. What problems
> could there be by calling sys.setdefaultencoding there?
>
>> Such usage
>> is not supported and may very well break your interpreter
>
> Can you give an example?

You can get strange effects caused by the fact that some
string objects will now compare equal while not necessarily
having the same hash value.

Unicode objects and strings have the same hash value provided
that they are both ASCII.

With the ASCII default encoding, a non-ASCII string cannot
be compared to a Unicode object, so the problem does not
occur.

>> or
>> cause data corruption (the default encoded versions of Unicode
>> objects are cached inside the objects).
>
> When called as early as in the above script, what objects would have
> encoded strings cached in them?

Difficult to say. This depends a lot on the environment
where you are running the script.

Note that the codecs are loaded at a very early stage in
the interpreter startup and a lot of them do use Unicode
strings. This wasn't the case in Python 1.6 when
the whole site.py approach to setting the default
encoding was designed, but was added later on, in Python 2.1
IIRC, when no one really considered using a different
default encoding anymore.

Using UTF-8 as new default encoding will not cause much
trouble with this, since it is an ASCII superset.

However, changing it more than once will cause the earlier
Unicode objects to still use the old default encoding
value.

Using a different non-ASCII compatible encoding, such
as UTF-16, will cause breakage for the same reason.

The default encoded string version of a Unicode object is
cached in the object and never recreated after it has
first been successfully encoded.
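
The caching behaviour can be modelled with a small sketch. This is Python 3 code that imitates the idea only; CachingText and stale_cache_demo are hypothetical names, and the real cache lives inside the Python 2 Unicode object implementation.

```python
class CachingText:
    """Hypothetical model of a Unicode object that caches its default-encoded form."""

    # Stand-in for the interpreter-wide default encoding.
    default_encoding = "ascii"

    def __init__(self, text):
        self.text = text
        self._cached = None

    def default_encoded(self):
        # Encode once; a failed encode raises and leaves the cache empty,
        # a successful one is never recomputed.
        if self._cached is None:
            self._cached = self.text.encode(self.default_encoding)
        return self._cached


def stale_cache_demo():
    """Change the default encoding twice and compare old and new objects."""
    CachingText.default_encoding = "utf-8"
    early = CachingText("\u00e9")
    first = early.default_encoded()

    CachingText.default_encoding = "latin-1"
    late = CachingText("\u00e9")
    return first, early.default_encoded(), late.default_encoded()
```

The object created before the second change keeps returning its UTF-8 bytes, while an object created afterwards encodes as Latin-1, so the same text yields two different byte strings in one process.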

When only changing the default encoding once and using
UTF-8 as the new default encoding, you'll only run into
the hash value problem.

If that's not an issue for your
application, e.g. you don't mix Unicode and string key
objects in your dictionaries and don't rely on the special
relationship between hashes and comparisons elsewhere,
you should be fine.

>> Now, in your particular case, you're probably better off just
>> tweaking site.py directly in your custom Python interpreter
>> rather than relying on sitecustomize.py (see setencoding() in
>> site.py).
>
> Why?

To get the job done :-)

You could rewrite setencoding() to get the encoding information
from e.g. an os.environ variable or some config file.
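
A rough sketch of what such a rewritten setencoding() might look like. The environment variable name PYTHON_DEFAULT_ENCODING and the helper choose_default_encoding are assumptions made up for this example; the stock site.py has no such variable. The call to sys.setdefaultencoding is guarded because that function only exists on Python 2.

```python
import codecs
import sys

# Hypothetical variable name, chosen only for this illustration.
ENV_VAR = "PYTHON_DEFAULT_ENCODING"


def choose_default_encoding(environ):
    """Return the normalised codec name requested in environ, or "ascii"."""
    requested = environ.get(ENV_VAR, "ascii")
    try:
        # codecs.lookup validates the name and returns its canonical form.
        return codecs.lookup(requested).name
    except LookupError:
        # Unknown codec: keep the safe default.
        return "ascii"


def setencoding(environ):
    """Apply the chosen encoding where the interpreter still allows it."""
    encoding = choose_default_encoding(environ)
    # sys.setdefaultencoding exists only on Python 2, and only before
    # site.py deletes it; on Python 3 this branch is skipped.
    if encoding != "ascii" and hasattr(sys, "setdefaultencoding"):
        sys.setdefaultencoding(encoding)
    return encoding
```

In a patched site.py this would be called as setencoding(os.environ), so the encoding is set once at startup and never changed again.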

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source (#1, Aug 27 2009)
>>> Python/Zope Consulting and Support ... http://www.egenix.com/
>>> mxODBC.Zope.Database.Adapter ... http://zope.egenix.com/
>>> mxODBC, mxDateTime, mxTextTools ... http://python.egenix.com/

eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/


stephen at xemacs

Aug 27, 2009, 3:29 AM

Post #20 of 26 (2175 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Chris Withers writes:

> > How can you be certain that any string is UTF-8 when doing any
> > reasonable IO?
>
> Careful checking, and making sure the people working on the app's
> development know that anything else will result in severe pain, both
> physical and mental ;-)

If you're *that* careful, the additional effort to hack around this is
negligible. The problem is that most people are *never* that careful,
and *all* people are rarely that careful.

I understand your use case, but I don't see a case for exposing this
to the general public.


exarkun at twistedmatrix

Aug 27, 2009, 6:08 AM

Post #21 of 26 (2167 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On 26 Aug, 11:51 pm, chris [at] simplistix wrote:
>exarkun [at] twistedmatrix wrote:
>>The ability to change the default encoding is a misfeature. There's
>>essentially no way to write correct Python code in the presence of
>>this feature.
>
>How so? If every single piece of text in your project is encoded in a
>superset of ascii (such as utf-8), why would this be a problem?
>Even if you were evil/stupid and mixed encodings, surely all you'd get
>is different unicode errors or maybe the odd strange character during
>display?

This is what I meant when I said what I said about correct code. If
you're happy to have encoding errors and corrupt data, then I guess
you're happy to have a function like setdefaultencoding.

>>It may be a major task, but the best thing you can do is find each str
>>and unicode operation in the software you're working with and make
>>them correct with respect to your inputs and outputs. Flipping a
>>giant switch for the entire process is just going to change which
>>things are wrong.
>
>Well, flipping that giant switch has worked in production for the past
>5 years, so I'm afraid I'll respectfully disagree. I'd suspect the
>pragmatics of real world software are why that function even exists,
>and it's extremely useful when used correctly...

I suppose it's fortunate for you that the function exists, then. For my
part, I have managed to write and operate a lot of code in production
for at least as long without ever touching it. Generally speaking, I
also don't find that I encounter lots of unicode errors or corrupted
data (*sometimes* I do; in those cases, I fix the broken code and it
doesn't happen again).

Jean-Paul


barry at python

Aug 27, 2009, 7:17 AM

Post #22 of 26 (2161 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

On Aug 27, 2009, at 9:08 AM, exarkun [at] twistedmatrix wrote:

> This is what I meant when I said what I said about correct code. If
> you're happy to have encoding errors and corrupt data, then I guess
> you're happy to have a function like setdefaultencoding.

Whatever happened to "we're all adults here"[1]? I have no problem
with making it difficult but possible to write buggy but practical
code. Software engineering is a messy business.

-Barry

[1] That may not be literally true any more, but still :)


guido at python

Aug 27, 2009, 9:50 AM

Post #23 of 26 (2168 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

2009/8/27 Barry Warsaw <barry [at] python>:
> On Aug 27, 2009, at 9:08 AM, exarkun [at] twistedmatrix wrote:
>
>> This is what I meant when I said what I said about correct code.  If you're happy to have encoding errors and corrupt data, then I guess you're happy to have a function like setdefaultencoding.
>
> Whatever happened to "we're all adults here"[1]?  I have no problem with making it difficult but possible to write buggy but practical code.  Software engineering is a messy business.

Being adults about it also means knowing when to give up. Chris, please
stop arguing about this. There are plenty of techniques you can use to get
what you want without changing Python, for example virtualenv, which
allows you to create a custom Python environment for each project. Or
you could switch to Python 3.1, whose different approach to
distinguishing between encoded and decoded strings means that you won't
have to worry about the default encoding quite as much (and you are
free to change the default *filesystem* encoding in Py3k). Or you
could invoke python -S, which skips site.py and sitecustomize.py, so
you are free to mess up any way you want.
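
As a small illustration of the -S route, the sketch below starts a child interpreter with -S and checks sys.flags.no_site to confirm that the site import was skipped. The helper name site_skipped_with_dash_s is invented for this example, and it assumes sys.executable points at a usable interpreter.

```python
import subprocess
import sys


def site_skipped_with_dash_s():
    """Run a child interpreter with -S and report whether site was skipped."""
    result = subprocess.run(
        [sys.executable, "-S", "-c", "import sys; print(sys.flags.no_site)"],
        capture_output=True,
        text=True,
        check=True,
    )
    # sys.flags.no_site is 1 when the interpreter was started with -S.
    return result.stdout.strip() == "1"
```

On Python 2, an interpreter started this way never runs the site.py code that deletes sys.setdefaultencoding, so a script would be free to call it directly.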

The fundamental reason the designers of Python's 2.x standard library
don't want you to be able to set the default encoding in your app, is
that the standard library is written with the assumption that the
default encoding is fixed, and no guarantees about the correct
workings of the standard library can be made when you change it. There
are no tests for this situation. Nobody knows what will fail when. And
you (or worse, your users) *will* come back to us with complaints if
the standard library suddenly starts doing things you didn't expect.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


chris at simplistix

Aug 31, 2009, 5:23 AM

Post #24 of 26 (1994 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Guido van Rossum wrote:
> Being adults about it also means knowing when to give up. Chris, please
> stop arguing about this.

Sure. Even if people had agreed to this change, it wouldn't end up in a
python release I could use for this project.

> There are plenty of techniques you can use to get
> what you want without changing Python, for example virtualenv, which
> allows you to create a custom Python environment for each project.

Yep, I'll resort to wrapping the buildout in a virtualenv iff the
reload(sys) hack ends up causing problems...

> Or
> you could switch to Python 3.1,

I would love to, once Python 3 has a viable web app story...

cheers,

Chris

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk


fumanchu at aminus

Aug 31, 2009, 7:49 AM

Post #25 of 26 (1994 views)
Permalink
Re: deleting setdefaultencoding iin site.py is evil [In reply to]

Chris Withers wrote:
> Guido van Rossum wrote:
> > Being adults about it also means knowing when to give up. Chris, please
> > stop arguing about this.
>
> Sure. Even if people had agreed to this change, it wouldn't end up in a
> python release I could use for this project.
>
> > There are plenty of techniques you can use to get
> > what you want without changing Python, for example virtualenv, which
> > allows you to create a custom Python environment for each project.
>
> Yep, I'll resort to wrapping the buildout in a virtualenv iff the
> reload(sys) hack ends up causing problems...
>
> > Or
> > you could switch to Python 3.1,
>
> I would love to, once Python 3 has a viable web app story...

CherryPy 3.2 is now in beta, and mod_wsgi is nearly ready as well. Both
support Python 3. :)


Robert Brewer
fumanchu [at] aminus

