
Mailing List Archive: Python: Dev

Unicode Proposal: Version 0.7

 

 



mal at lemburg

Nov 18, 1999, 11:15 AM

Post #1 of 7 (256 views)
Unicode Proposal: Version 0.7

FYI, I've uploaded a new version of the proposal which includes
new codec APIs, a new codec search mechanism and some minor
fixes here and there.

The latest version of the proposal is available at:

http://starship.skyport.net/~lemburg/unicode-proposal.txt

Older versions are available as:

http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt

Some POD (points of discussion) that are still open:

Unicode objects support for %-formatting

Design of the internal C API and the Python API for
the Unicode character properties database

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 43 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/


skip at mojam

Nov 18, 1999, 12:09 PM

Post #2 of 7 (251 views)
Re: Unicode Proposal: Version 0.7 [In reply to]

I haven't been following this discussion closely at all, and have no
previous experience with Unicode, so please pardon a couple stupid questions
from the peanut gallery:

1. What does U+0061 mean (other than 'a')? That is, what is U?

2. I saw nothing about encodings in the Codec/StreamReader/StreamWriter
description. Given a Unicode object with encoding e1, how do I write
it to a file that is to be encoded with encoding e2? Seems like I
would do something like

u1 = unicode(s, encoding=e1)
f = open("somefile", "wb")
u2 = unicode(u1, encoding=e2)
f.write(u2)

Is that how it would be done? Does this question even make sense?

3. What will the impact be on programmers such as myself currently
living with blinders on (that is, writing in plain old 7-bit ASCII)?

Thx,

Skip Montanaro | http://www.mojam.com/
skip [at] mojam | http://www.musi-cal.com/
847-971-7098 | Python: Programming the way Guido indented...


mal at lemburg

Nov 18, 1999, 4:41 PM

Post #3 of 7 (247 views)
Re: Unicode Proposal: Version 0.7 [In reply to]

Skip Montanaro wrote:
>
> I haven't been following this discussion closely at all, and have no
> previous experience with Unicode, so please pardon a couple stupid questions
> from the peanut gallery:
>
> 1. What does U+0061 mean (other than 'a')? That is, what is U?

U+XXXX means Unicode character with ordinal hex number XXXX. It is
basically just another way to say, hey I want the Unicode character
at position 0xXXXX in the Unicode spec.
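
A minimal sketch of the U+XXXX notation, written for a modern Python 3
interpreter rather than the API under discussion in this thread. The
helper name u_notation is made up for illustration.

```python
# U+0061 names the character whose ordinal is hexadecimal 0x0061.
code_point = 0x0061
char = chr(code_point)          # Python 3: ordinal -> one-character string


def u_notation(ch):
    """Hypothetical helper: format a character in U+XXXX notation."""
    return "U+%04X" % ord(ch)


# The \uXXXX string escape refers to the same position.
same_char = "\u0061"
```

Here char and same_char are both the letter 'a', and u_notation('a')
gives back the string U+0061.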

> 2. I saw nothing about encodings in the Codec/StreamReader/StreamWriter
> description. Given a Unicode object with encoding e1, how do I write
> it to a file that is to be encoded with encoding e2? Seems like I
> would do something like
>
> u1 = unicode(s, encoding=e1)
> f = open("somefile", "wb")
> u2 = unicode(u1, encoding=e2)
> f.write(u2)
>
> Is that how it would be done? Does this question even make sense?

The unicode() constructor converts all input to Unicode as a
basis for other conversions. In the above example, s would be
converted to Unicode using the assumption that the bytes in
s represent characters encoded using the encoding given in e1.
The line with u2 would raise a TypeError, because u1 is not
a string. To convert a Unicode object u1 to another encoding,
you would have to call the .encode() method with the intended
new encoding. The Unicode object will then take care of the
conversion of its internal Unicode data into a string using
the given encoding, e.g. you'd write:

f.write(u1.encode(e2))
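
The same decode-then-encode flow can be sketched in present-day
Python 3, where bytes.decode() plays the role of unicode(s, e1) and
str is the Unicode type. The concrete encodings, the sample bytes and
the temporary file are assumptions chosen for illustration; they are
not taken from the proposal.

```python
import os
import tempfile

e1 = "latin-1"        # assumed encoding of the incoming bytes
e2 = "utf-8"          # assumed encoding of the output file
s = b"caf\xe9"        # the bytes of 'café' in Latin-1

# Step 1: interpret the bytes in s as characters encoded with e1.
u1 = s.decode(e1)

# Step 2: passing an already-decoded object back to the constructor
# together with an encoding fails with TypeError.
try:
    str(u1, encoding=e2)
    constructor_failed = False
except TypeError:
    constructor_failed = True

# Step 3: encode explicitly when writing to a binary file.
path = os.path.join(tempfile.mkdtemp(), "somefile")
with open(path, "wb") as f:
    f.write(u1.encode(e2))

with open(path, "rb") as f:
    written = f.read()
```

The file ends up holding the UTF-8 bytes of the text, while u1 itself
carries no encoding at all.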

> 3. What will the impact be on programmers such as myself currently
> living with blinders on (that is, writing in plain old 7-bit ASCII)?

If you don't want your scripts to know about Unicode, nothing
will really change. In case you do use e.g. Latin-1 characters
in your scripts for strings, you are asked to include a pragma
in the comment lines at the beginning of the script (so that
programmers viewing your code with a different encoding have a chance
to figure out what you've written).

Here's the text from the proposal:
"""
Note that you should provide some hint to the encoding you used to
write your programs as a pragma line in one of the first few comment lines
of the source file (e.g. '# source file encoding: latin-1'). If you
only use 7-bit ASCII then everything is fine and no such notice is
needed, but if you include Latin-1 characters not defined in ASCII, it
may well be worthwhile including a hint since people in other
countries will want to be able to read your source strings too.
"""

Other than that you can continue to use normal strings like
you always have.
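
As a rough illustration of how a tool might pick up such a hint, here
is a sketch in modern Python 3. The comment format is the example
quoted from the proposal; the function find_encoding_hint, its
parameters and the ASCII default are hypothetical and not part of any
proposed API.

```python
import re

# Matches the example hint from the proposal text:
#   # source file encoding: latin-1
HINT = re.compile(r"^#\s*source file encoding:\s*([-\w.]+)", re.IGNORECASE)


def find_encoding_hint(source_bytes, max_lines=5, default="ascii"):
    """Hypothetical helper: scan the first few lines for an encoding hint."""
    for raw in source_bytes.splitlines()[:max_lines]:
        # The hint itself is plain ASCII, so a lossy decode is enough here.
        line = raw.decode("ascii", "replace").strip()
        match = HINT.match(line)
        if match:
            return match.group(1)
    return default


script = (b"#!/usr/bin/env python\n"
          b"# source file encoding: latin-1\n"
          b"s = 'caf\xe9'\n")
hint = find_encoding_hint(script)
text = script.decode(hint)      # readable once the encoding is known
```

A script without the hint falls back to the assumed default, which
matches the "7-bit ASCII needs no notice" rule in the quoted text.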

Hope that clarifies things at least a bit,
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 43 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/


mal at lemburg

Nov 23, 1999, 6:32 AM

Post #4 of 7 (251 views)
Re: Unicode Proposal: Version 0.8 [In reply to]

FYI, I've uploaded a new version of the proposal which includes
the encodings package, definition of the 'raw unicode escape'
encoding (available via e.g. ur""), Unicode format strings and
a new method .breaklines().

The latest version of the proposal is available at:

http://starship.skyport.net/~lemburg/unicode-proposal.txt

Older versions are available as:

http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt

Some POD (points of discussion) that are still open:

Stream readers:

What about .readline(), .readlines() ? These could be implemented
using .read() as generic functions instead of requiring their
implementation by all codecs. Also see Line Breaks.

Python interface for the Unicode property database

What other special Unicode formatting characters should be
enhanced to work with Unicode input ? Currently only the
following special semantics are defined:

u"%s %s" % (u"abc", "abc") should return u"abc abc".
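
On the stream reader point above, the idea of deriving .readline()
and .readlines() from .read() alone could look roughly like the
following modern Python 3 sketch. The class names are invented for
illustration, the toy reader decodes its whole input up front, and
only "\n" is treated as a line break, which sidesteps the open Line
Breaks question.

```python
import io


class LineReaderMixin:
    """Hypothetical mixin: builds readline()/readlines() on top of read()."""

    def readline(self):
        chars = []
        while True:
            ch = self.read(1)          # one decoded character at a time
            if not ch:                 # end of stream
                break
            chars.append(ch)
            if ch == "\n":             # assumption: only "\n" ends a line
                break
        return "".join(chars)

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                break
            lines.append(line)
        return lines


class ToyReader(LineReaderMixin):
    """Toy codec reader that implements nothing but read()."""

    def __init__(self, stream, encoding):
        self._text = io.StringIO(stream.read().decode(encoding))

    def read(self, size=-1):
        return self._text.read(size)


reader = ToyReader(io.BytesIO("\u00e9ins\nzwei\n".encode("utf-8")), "utf-8")
lines = reader.readlines()
```

A codec written this way only has to get .read() right; the price is
one call per character, which a real implementation would likely want
to buffer.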


Pretty quiet around here lately...
--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 38 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/


mhammond at skippinet

Nov 23, 1999, 3:45 PM

Post #5 of 7 (247 views)
RE: Unicode Proposal: Version 0.8 [In reply to]

> Pretty quiet around here lately...

My guess is that most positions and opinions have been covered. It is
now probably time for less talk, and more code!

Is it time to start an implementation plan? Do we start with /F's
Unicode implementation (which /G *smirk* seemed to approve of)? Who
does what? When can we start to play with it?

And a key point that seems to have been thrust in our faces at the
start and hardly mentioned recently - does the proposal as it stands
meet our sponsor's (HP) requirements?

Mark.


mal at lemburg

Nov 24, 1999, 1:34 AM

Post #6 of 7 (251 views)
Re: Unicode Proposal: Version 0.8 [In reply to]

Mark Hammond wrote:
>
> > Pretty quiet around here lately...
>
> My guess is that most positions and opinions have been covered. It is
> now probably time for less talk, and more code!

Or that everybody is on holidays... like Guido.

> Is it time to start an implementation plan? Do we start with /F's
> Unicode implementation (which /G *smirk* seemed to approve of)? Who
> does what? When can we start to play with it?

This depends on whether HP agrees on the current specs. If they
do, there should be code by mid December, I guess.

> And a key point that seems to have been thrust in our faces at the
> start and hardly mentioned recently - does the proposal as it stands
> meet our sponsor's (HP) requirements?

Haven't heard anything from them yet (this is probably mainly
due to Guido being offline).

--
Marc-Andre Lemburg
______________________________________________________________________
Y2000: 37 days left
Business: http://www.lemburg.com/
Python Pages: http://www.lemburg.com/python/


captainrobbo at yahoo

Nov 24, 1999, 4:40 AM

Post #7 of 7 (249 views)
RE: Unicode Proposal: Version 0.8 [In reply to]

--- Mark Hammond <mhammond [at] skippinet> wrote:
> > Pretty quiet around here lately...
>
> My guess is that most positions and opinions have
> been covered. It is
> now probably time for less talk, and more code!
>
> Is it time to start an implementation plan? Do we
> start with /F's
> Unicode implementation (which /G *smirk* seemed to
> approve of)? Who
> does what? When can we start to play with it?
>
> And a key point that seems to have been thrust in
> our faces at the
> start and hardly mentioned recently - does the
> proposal as it stands
> meet our sponsor's (HP) requirements?
>
> Mark.

I had a long chat with them on Friday :-) They want
it done, but nobody is actively working on it now as
far as I can tell, and they are very busy.

The per-thread thing was a red herring - they just
want to be able to do (for example) web servers
handling different encodings from a central unicode
database, so per-output-stream works just fine.
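
To make the per-output-stream idea concrete, here is a small sketch in
modern Python 3, using io.TextIOWrapper as a stand-in for the stream
writers of the proposal. The sample text, the in-memory streams and
the helper open_output are assumptions for illustration only.

```python
import io

# Data is held centrally as Unicode text, with no encoding attached.
record = "Gr\u00fc\u00dfe"


def open_output(raw, encoding):
    """Hypothetical helper: wrap a byte stream with its own encoding."""
    return io.TextIOWrapper(raw, encoding=encoding, newline="")


latin_raw = io.BytesIO()    # e.g. a client that expects Latin-1
utf8_raw = io.BytesIO()     # e.g. a client that expects UTF-8

for raw, encoding in ((latin_raw, "latin-1"), (utf8_raw, "utf-8")):
    out = open_output(raw, encoding)
    out.write(record)       # same text, encoded per stream
    out.flush()
    out.detach()            # leave the underlying byte stream open

latin_bytes = latin_raw.getvalue()
utf8_bytes = utf8_raw.getvalue()
```

The encoding is a property of each output stream, so no per-thread
state is needed to serve both clients from the same data.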

They will be at IPC8; I'd suggest that we do a round of
prototyping, insist they read the proposal, discuss it at
IPC8, and be prepared to rework things thereafter.
Hopefully then we'll have a
plan on how to tackle the much larger (but less
interesting to python-dev) job of writing and
verifying all the codecs and utilities.


Andy Robinson



=====
Andy Robinson
Robinson Analytics Ltd.
------------------
My opinions are the official policy of Robinson Analytics Ltd.
They just vary from day to day.

