
__peter__ at web
Nov 6, 2009, 12:59 AM
Post #2 of 2
Re: decoding a byte array that is unicode escaped?
sam wrote:

> I have a byte stream read over the internet:
>
> responseByteStream = urllib.request.urlopen( httpRequest );
> responseByteArray = responseByteStream.read();
>
> The characters are encoded with unicode escape sequences, for example
> a copyright symbol appears in the stream as the bytes:
>
> 5C 75 30 30 61 39
>
> which translates to:
> \u00a9
>
> which is unicode for the copyright symbol.
>
> I am simply trying to display this copyright symbol on a webpage, so
> how do I encode the byte array to utf-8 given that it is 'escape
> encoded' in the above way? I tried:
>
> responseByteArray.decode('utf-8')
> and responseByteArray.decode('unicode_escape')
> and str(responseByteArray).
>
> I am using Python 3.1.

Convert the bytes to unicode first:

>>> u = b"\\u00a9".decode("unicode-escape")
>>> u
'©'

Then convert the string to bytes:

>>> u.encode("utf-8")
b'\xc2\xa9'

--
http://mail.python.org/mailman/listinfo/python-list
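For reference, the two steps above can be wrapped into one small helper. This is only a sketch: the function name `unescape_to_utf8` and the sample bytes are made up for illustration, and it assumes the payload is pure ASCII with `\uXXXX` escapes. The `unicode_escape` codec treats any non-ASCII bytes as Latin-1, so a stream that mixes raw UTF-8 with escapes would likely need different handling.

```python
def unescape_to_utf8(raw: bytes) -> bytes:
    """Hypothetical helper: turn ASCII bytes containing \\uXXXX escapes into UTF-8 bytes."""
    # Step 1: bytes -> str, interpreting the escape sequences.
    text = raw.decode("unicode_escape")
    # Step 2: str -> bytes, encoded as UTF-8.
    return text.encode("utf-8")


# Sample input: the bytes 5C 75 30 30 61 39 embedded in some ASCII text.
sample = b"Copyright \\u00a9 2009"
result = unescape_to_utf8(sample)
print(result)                  # b'Copyright \xc2\xa9 2009'
print(result.decode("utf-8"))  # Copyright © 2009
```

If the goal is just to display the text, the intermediate `str` from step 1 may be all that is needed; the UTF-8 encoding only matters when the bytes are written back out.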