
Mailing List Archive: Python: Dev

Logging, Unicode and sockets

 

 



vinay_sajip at yahoo

Oct 7, 2009, 11:11 PM

Post #1 of 4
Logging, Unicode and sockets

Thanks to

http://bugs.python.org/issue7077

I've noticed that the socket-based logging handlers - SocketHandler,
DatagramHandler and SysLogHandler - aren't Unicode-aware and can break in the
presence of Unicode messages. I'd like to fix this by giving these handlers an
optional (encoding=None) parameter in their __init__, and then using this to
encode on output. If no encoding is specified, is it best to use
locale.getpreferredencoding(), sys.getdefaultencoding(),
sys.getfilesystemencoding(), 'utf-8' or something else? On my system:

>>> sys.getdefaultencoding()
'ascii'
>>> sys.getfilesystemencoding()
'mbcs'
>>> locale.getpreferredencoding()
'cp1252'

which suggests to me that the locale.getpreferredencoding() should be the
default. However, as I'm not a Unicode maven, any suggestions would be welcome.

Regards,


Vinay Sajip

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


python at mrabarnett

Oct 8, 2009, 8:57 AM

Post #2 of 4
Re: Logging, Unicode and sockets [In reply to]

Vinay Sajip wrote:
> Thanks to
>
> http://bugs.python.org/issue7077
>
> I've noticed that the socket-based logging handlers - SocketHandler,
> DatagramHandler and SysLogHandler - aren't Unicode-aware and can break in the
> presence of Unicode messages. I'd like to fix this by giving these handlers an
> optional (encoding=None) parameter in their __init__, and then using this to
> encode on output. If no encoding is specified, is it best to use
> locale.getpreferredencoding(), sys.getdefaultencoding(),
> sys.getfilesystemencoding(), 'utf-8' or something else? On my system:
>
> >>> sys.getdefaultencoding()
> 'ascii'
> >>> sys.getfilesystemencoding()
> 'mbcs'
> >>> locale.getpreferredencoding()
> 'cp1252'
>
> which suggests to me that the locale.getpreferredencoding() should be the
> default. However, as I'm not a Unicode maven, any suggestions would be welcome.
>
Well, encodings can vary from machine to machine, and if the encoding
doesn't cover all the Unicode codepoints then you could get an encoding
exception. For these reasons I'd vote for UTF-8.
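
To illustrate the point about coverage, here is a minimal sketch (written for
current Python 3, with a made-up sample message and a hypothetical helper
named try_encode) that encodes the same text with three candidate encodings.
Only UTF-8 can represent every character in it:

```python
# Hypothetical sample text: Latin-1 letter, a snowman (U+2603) and CJK characters.
message = "caf\u00e9 \u2603 \u65e5\u672c\u8a9e"


def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


# Try each candidate default encoding on the same message.
results = {enc: try_encode(message, enc) for enc in ("ascii", "cp1252", "utf-8")}

for enc, data in results.items():
    print(enc, "->", "cannot encode" if data is None else data)
```

A locale-dependent default such as cp1252 would therefore raise for some
messages on some machines, whereas UTF-8 would not.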


martin at v

Oct 8, 2009, 12:00 PM

Post #3 of 4
Re: Logging, Unicode and sockets [In reply to]

> I've noticed that the socket-based logging handlers - SocketHandler,
> DatagramHandler and SysLogHandler - aren't Unicode-aware and can break in the
> presence of Unicode messages.

I can't understand what the problem with SocketHandler/DatagramHandler
is. As they use pickle, they should surely be able to send records with
Unicode strings in them, no?

OTOH, why is SMTPHandler not in your list?

> I'd like to fix this by giving these handlers an
> optional (encoding=None) parameter in their __init__, and then using this to
> encode on output.

For syslog, I don't think that's appropriate. I presume this is meant to
follow RFC 5424? If so, it SHOULD send the data in UTF-8, in which case
it MUST include a BOM also. A.8 then says that if you are not certain
that it is UTF-8 (which you wouldn't be if the application passes a byte
string), you MAY omit the BOM.

Regards,
Martin


vinay_sajip at yahoo

Oct 8, 2009, 12:55 PM

Post #4 of 4
Re: Logging, Unicode and sockets [In reply to]

Martin v. Löwis <martin <at> v.loewis.de> writes:

> I can't understand what the problem with SocketHandler/DatagramHandler
> is. As they use pickle, they should surely be able to send records with
> Unicode strings in them, no?

Of course you are right. When I posted that it was a knee-jerk reaction to the
issue that was raised for SysLogHandler configured to use UDP. I did realise a
bit later that the issue didn't apply to the other two handlers but I was hoping
nobody would notice ;-)
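
A quick way to check the pickle point is a round trip through
SocketHandler.makePickle(). The sketch below assumes current Python 3; the
host, port, logger name and message are placeholders, and nothing is actually
sent because constructing the handler does not open a connection:

```python
import logging
import logging.handlers
import pickle
import struct

# Placeholder endpoint; no socket is opened until a record is emitted.
handler = logging.handlers.SocketHandler("localhost", 9020)

# A record whose message contains non-ASCII characters.
record = logging.LogRecord(
    "demo", logging.INFO, "demo.py", 1, "gr\u00fc\u00df dich \u2603", None, None
)

# makePickle() returns a 4-byte big-endian length prefix followed by the pickle.
payload = handler.makePickle(record)
(length,) = struct.unpack(">L", payload[:4])
attrs = pickle.loads(payload[4:4 + length])

# Rebuild the record as a receiving log server would.
received = logging.makeLogRecord(attrs)
print(received.getMessage())

handler.close()
```

The text survives unchanged because pickle serialises the string itself, so no
explicit encoding step is needed for these two handlers.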

> OTOH, why is SMTPHandler not in your list?

I assumed smtp.sendmail() would deal with it, as it deals with the wire
protocol, but perhaps I was wrong to do so. I noticed that Issue 521270 (SMTP
does not handle Unicode) was closed, but I didn't look at it closely. I now see
it was perhaps only a partial solution. I did a bit of searching and found this
post by Marius Gedminas:

http://mg.pov.lt/blog/unicode-emails-in-python.html

Now if that's the right approach, shouldn't it be catered for in a more general
part of the stdlib than logging - perhaps in smtplib itself? Or, seeing that
Marius' post is five years old, is there a better way of doing it using the
stdlib as it is now?
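
For reference, one way to build a Unicode-safe message with the email package
in current Python 3 is sketched below. This is not necessarily the approach
from the linked post; the helper name build_unicode_email and the addresses
are made up for illustration, and nothing is sent:

```python
from email.header import Header
from email.mime.text import MIMEText


def build_unicode_email(sender, recipient, subject, body):
    """Hypothetical helper: build a text/plain message declared as UTF-8."""
    msg = MIMEText(body, "plain", "utf-8")
    msg["From"] = sender
    msg["To"] = recipient
    # Header() applies RFC 2047 encoding so the subject is ASCII on the wire.
    msg["Subject"] = Header(subject, "utf-8")
    return msg


subject = "Fehler: Gr\u00f6\u00dfe"
body = "Zeile 42: unerwartetes Zeichen \u2603"
msg = build_unicode_email("logger@example.com", "admin@example.com", subject, body)

# The serialised form contains only ASCII, so it can be handed to smtplib.
wire_text = msg.as_string()
print(wire_text)
```

The resulting string could then be passed to smtplib's sendmail(); whether
SMTPHandler should do this itself is the open question above.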

> For syslog, I don't think that's appropriate. I presume this is meant to
> follow RFC 5424? If so, it SHOULD send the data in UTF-8, in which case
> it MUST include a BOM also. A.8 then says that if you are not certain
> that it is UTF-8 (which you wouldn't be if the application passes a byte
> string), you MAY omit the BOM.

So ISTM that the right thing to do on 2.x would be: if a str is to be sent, send
it as is; if a unicode object is to be sent, encode it using utf-8 and send it
with a BOM. For 3.x, just encode using utf-8 and send with a BOM.
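
A minimal sketch of that rule, written for Python 3 where bytes stands in for
the 2.x str case. The helper name encode_syslog_text is hypothetical, and the
sketch covers only the message text, not the priority or header fields of a
syslog packet:

```python
import codecs


def encode_syslog_text(msg):
    """Hypothetical helper: encode message text following the rule above."""
    if isinstance(msg, bytes):
        # Encoding unknown: pass the bytes through unchanged, without a BOM.
        return msg
    # Text: encode as UTF-8 and prefix the UTF-8 byte order mark.
    return codecs.BOM_UTF8 + msg.encode("utf-8")


print(encode_syslog_text("caf\u00e9"))
print(encode_syslog_text(b"already encoded"))
```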

Does that seem right?

Thanks and regards,

Vinay Sajip
