
Mailing List Archive: Python: Dev

Optimize Unicode strings in Python 3.3



victor.stinner at gmail

May 3, 2012, 4:45 PM

Post #1 of 9
Optimize Unicode strings in Python 3.3

Hi,

Different people are working on improving the performance of Unicode
strings in Python 3.3. This Python version is very different from
Python 3.2 because of PEP 393, and it is still unclear to me what
the best way to create a new Unicode string is.

There are several approaches:

* Use the legacy (Py_UNICODE) API; PyUnicode_READY() then converts the
result to the canonical form. The CJK codecs are still using this API.
* Use a Py_UCS4 buffer and then convert to the canonical form (ASCII,
UCS1 or UCS2). Approach taken by io.StringIO, which is used not only
for writing but also for reading, so a Py_UCS4 buffer is a good
compromise.
* PyAccu API: an optimized version of chunks=[]; for ...: ...
chunks.append(text); return ''.join(chunks).
* Two steps: compute the length and maximum character of the output
string, allocate the output string, and then write the characters.
str%args was using this approach until today.
* Optimistic approach: start with an ASCII buffer, and enlarge and
widen (to UCS2 and then UCS4) the buffer when new characters are
written. Approach used by the UTF-8 decoder and, since today, by
str%args. A sketch of this approach follows below.

The optimistic approach uses realloc() to resize the string. It is
faster than the PyAccu approach (at least for short ASCII strings),
maybe because it avoids creating temporary short strings. realloc()
appears to be efficient on Linux and Windows (at least Windows 7).

Various notes:
* PyUnicode_READ() is slower than reading a Py_UNICODE array.
* Some decoders unroll the main loop to process 4 or 8 bytes (on 32-
and 64-bit CPUs respectively) at each step; see the sketch below.
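
For illustration, here is a hedged sketch of that word-at-a-time trick
(not the actual decoder code, which also takes care of pointer
alignment): a byte is ASCII iff its high bit is clear, so a whole word
is pure ASCII iff masking it with 0x80 in every byte gives zero.

    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>

    /* Count the leading ASCII bytes of s, one machine word (4 or 8
       bytes) per iteration. */
    static size_t ascii_prefix(const unsigned char *s, size_t n)
    {
        const size_t mask = (size_t)0x8080808080808080ULL; /* 0x80808080 on 32-bit */
        size_t i = 0;
        while (i + sizeof(size_t) <= n) {
            size_t word;
            memcpy(&word, s + i, sizeof(word)); /* real code uses aligned loads */
            if (word & mask)
                break;                  /* some byte has its high bit set */
            i += sizeof(size_t);
        }
        while (i < n && !(s[i] & 0x80)) /* finish the tail byte by byte */
            i++;
        return i;
    }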

I would be interested to hear of other tricks to optimize Unicode
strings in Python, or to hear from you if you would like to work on
this topic.

There are open issues related to optimizing Unicode:

#11313: Speed up default encode()/decode()
#12807: Optimization/refactoring for {bytearray, bytes, unicode}.strip()
#14419: Faster ascii decoding
#14624: Faster utf-16 decoder
#14625: Faster utf-32 decoder
#14654: More fast utf-8 decoding
#14716: Use unicode_writer API for str.format()

Victor


martin at v

May 3, 2012, 5:52 PM

Post #2 of 9
Re: Optimize Unicode strings in Python 3.3

> Various notes:
> * PyUnicode_READ() is slower than reading a Py_UNICODE array.
> * Some decoders unroll the main loop to process 4 or 8 bytes (on 32-
> and 64-bit CPUs respectively) at each step.
>
> I would be interested to hear of other tricks to optimize Unicode
> strings in Python, or to hear from you if you would like to work on
> this topic.

Beyond creation, the most frequent approach is to specialize loops for
all three possible widths, allowing the compiler to hard-code the
element size. This brings performance back to the speed of accessing a
Py_UNICODE array (or faster, for 1-byte strings); see the sketch below.
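
A sketch of that specialization pattern (illustrative only; CPython
expresses it through the PyUnicode_KIND machinery and the stringlib
templates), counting occurrences of a character with one loop per
width:

    #include <stddef.h>
    #include <stdint.h>

    /* One specialized loop per character width, so the compiler can
       hard-code the element size (the stride) in each instantiation. */
    #define COUNT_CHAR(TYPE, data, len, ch, result)          \
        do {                                                 \
            const TYPE *p = (const TYPE *)(data);            \
            size_t i_;                                       \
            for (i_ = 0; i_ < (len); i_++)                   \
                if (p[i_] == (TYPE)(ch))                     \
                    (result)++;                              \
        } while (0)

    static size_t count_char(const void *data, int kind, size_t len,
                             uint32_t ch)
    {
        size_t result = 0;
        if ((kind == 1 && ch > 0xFFu) || (kind == 2 && ch > 0xFFFFu))
            return 0;   /* ch cannot occur in a narrower string */
        switch (kind) {
        case 1: COUNT_CHAR(uint8_t,  data, len, ch, result); break;
        case 2: COUNT_CHAR(uint16_t, data, len, ch, result); break;
        default: COUNT_CHAR(uint32_t, data, len, ch, result); break;
        }
        return result;
    }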

A possible micro-optimization might be to use pointer arithmetic instead
of indexing. However, I would expect that compilers will already convert
a counting loop into pointer arithmetic if the index is only ever used
for array access.
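
For comparison, the two forms look like this (sketch only); modern
compilers usually generate identical code for both:

    #include <stddef.h>
    #include <stdint.h>

    /* Indexed (counting) version: */
    static void copy_indexed(uint8_t *dest, const uint8_t *src, size_t len)
    {
        size_t i;
        for (i = 0; i < len; i++)
            dest[i] = src[i];
    }

    /* Pointer-arithmetic version: */
    static void copy_pointers(uint8_t *dest, const uint8_t *src, size_t len)
    {
        const uint8_t *end = src + len;
        while (src < end)
            *dest++ = *src++;
    }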

A source of slow-down appears to be widening copy operations. I wonder
whether microprocessors are able to do this faster than what the compiler
generates out of a naive copying loop.
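
The naive loop in question is just a zero-extension per element
(sketch); an auto-vectorizing compiler can map it to packed widening
instructions such as PMOVZXBW on x86:

    #include <stddef.h>
    #include <stdint.h>

    /* Naive widening copy, UCS1 -> UCS2: each 8-bit unit is
       zero-extended to 16 bits. */
    static void widen_ucs1_to_ucs2(uint16_t *dest, const uint8_t *src,
                                   size_t n)
    {
        size_t i;
        for (i = 0; i < n; i++)
            dest[i] = src[i];
    }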

Another potential area for further optimization is better pass-through
of PyObject*. Some APIs still use char* or Py_UNICODE* when the caller
actually holds a PyObject*, and the callee ultimately recreates an
object from the pointers being passed.

Some people (hi Larry) still think that using a rope representation for
string concatenation might improve things, see #1569040.

Regards,
Martin




storchaka at gmail

May 4, 2012, 2:00 AM

Post #3 of 9
Re: Optimize Unicode strings in Python 3.3

On 04.05.12 02:45, Victor Stinner wrote:
> * Two steps: compute the length and maximum character of the output
> string, allocate the output string, and then write the characters.
> str%args was using this approach until today.
> * Optimistic approach: start with an ASCII buffer, and enlarge and
> widen (to UCS2 and then UCS4) the buffer when new characters are
> written. Approach used by the UTF-8 decoder and, since today, by
> str%args.

Actually, today the UTF-8 decoder uses the two-step approach; only
after encountering an error does it switch to the optimistic approach.

> The optimistic approach uses realloc() to resize the string. It is
> faster than the PyAccu approach (at least for short ASCII strings),
> maybe because it avoids creating temporary short strings. realloc()
> appears to be efficient on Linux and Windows (at least Windows 7).

IMHO, realloc() has nothing to do with this. The difference lies in the
cost of managing the list and creating temporary strings.

> Various notes:
> * PyUnicode_READ() is slower than reading a Py_UNICODE array.

And PyUnicode_WRITE() is slower than writing a Py_UNICODE/PyUCS* array.

> * Some decoders unroll the main loop to process 4 or 8 bytes (on 32-
> and 64-bit CPUs respectively) at each step.

Note that this is not only CPU- but also OS-dependent (LP64 vs LLP64).

> I would be interested to hear of other tricks to optimize Unicode
> strings in Python, or to hear from you if you would like to work on
> this topic.

The optimized ASCII decoder (issue #14419) not only reads 4 or 8 bytes
at a time, it also writes them all at once. This is a very specific
optimization.

A more general principle is to replace separate scanning and
translating passes with one-pass optimistic reading and writing. This
improves the efficiency of the memory cache; see the sketch below.
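
A sketch of that one-pass idea for the ASCII fast path of a UTF-8
decoder (illustrative names, not the actual implementation): read a
word, and if it is pure ASCII, write the whole word to the output
immediately instead of scanning first and copying later.

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One-pass optimistic fast path: copy whole words from the UTF-8
       input to a UCS1 output as long as no byte has its high bit set;
       fall back to the slow per-byte decoder otherwise. Returns the
       number of bytes handled by the fast path. */
    static size_t utf8_ascii_fastpath(const uint8_t *in, uint8_t *out,
                                      size_t n)
    {
        const size_t mask = (size_t)0x8080808080808080ULL;
        size_t i = 0;
        while (i + sizeof(size_t) <= n) {
            size_t word;
            memcpy(&word, in + i, sizeof(word));
            if (word & mask)
                break;                      /* non-ASCII byte: slow path */
            memcpy(out + i, &word, sizeof(word)); /* write 4/8 bytes at once */
            i += sizeof(size_t);
        }
        while (i < n && !(in[i] & 0x80)) {  /* finish the tail byte by byte */
            out[i] = in[i];
            i++;
        }
        return i;
    }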

I'm going to try this in the UTF-8 decoder; it should bring the speed
of decoding ASCII-only strings up to that of the optimized ASCII
decoder.



victor.stinner at gmail

May 29, 2012, 3:44 PM

Post #4 of 9
Re: Optimize Unicode strings in Python 3.3

Hi,

>  * Use a Py_UCS4 buffer and then convert to the canonical form (ASCII,
> UCS1 or UCS2). Approach taken by io.StringIO, which is used not only
> for writing but also for reading, so a Py_UCS4 buffer is a good
> compromise.
>  * PyAccu API: an optimized version of chunks=[]; for ...: ...
> chunks.append(text); return ''.join(chunks).
>  * Two steps: compute the length and maximum character of the output
> string, allocate the output string, and then write the characters.
> str%args was using this approach until today.
>  * Optimistic approach: start with an ASCII buffer, and enlarge and
> widen (to UCS2 and then UCS4) the buffer when new characters are
> written. Approach used by the UTF-8 decoder and, since today, by
> str%args.

I ran extensive benchmarks on these 4 methods for str%args and str.format(args).

The "two steps" method is not promising: parsing the format string
twice is slower than other methods.

The PyAccu API is faster than a Py_UCS4 buffer for concatenating many
strings, but it is slower in many other cases.

I implemented the last method as the new internal "_PyUnicodeWriter"
API: resize / widen the string buffer when writing new characters. I
implemented more optimizations:
* overallocate the buffer to limit the cost of realloc() (sketched
below)
* write characters directly into the buffer, avoiding temporary
buffers when possible (which is most cases)
* disable overallocation when formatting the last argument
* don't copy by value but by reference if the result is just a single
string (an optimization already implemented indirectly in the PyAccu
API)
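
A minimal sketch of the overallocation logic (illustrative; the real
_PyUnicodeWriter also tracks the character width and the PEP 393 kind):

    #include <stdlib.h>
    #include <stdint.h>
    #include <stddef.h>

    typedef struct {
        uint8_t *buf;
        size_t len, alloc;
        int overallocate;   /* cleared while formatting the last argument */
    } writer_t;

    /* Ensure room for `extra` more bytes; characters are then written
       directly into buf + len, avoiding temporary strings. */
    static int writer_reserve(writer_t *w, size_t extra)
    {
        if (w->len + extra > w->alloc) {
            size_t needed = w->len + extra;
            /* Overallocate by 25% so the next append usually fits
               without another realloc(); skip this for the final
               argument so the result is not bigger than necessary. */
            size_t newalloc = w->overallocate ? needed + needed / 4
                                              : needed;
            uint8_t *p = realloc(w->buf, newalloc);
            if (p == NULL)
                return -1;
            w->buf = p;
            w->alloc = newalloc;
        }
        return 0;
    }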

The _PyUnicodeWriter API is the fastest method: it gives a speedup of
30% over the Py_UCS4 and PyAccu methods in general, and of 60% to 100%
in some specific cases!

I also compared str%args and str.format() with Python 2.7 (byte
strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as
fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is
generally 10 to 30% slower than Python 2.)

--

I wrote a tool to run benchmarks and to compare results:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
https://bitbucket.org/haypo/misc/src/tip/python/bench_str.py

Run the benchmark:
./python benchmark.py --file=FILE script bench_str.py

Compare results:
./python benchmark.py compare_to FILE1 FILE2 FILE3 ...

--

Python 2.7 vs 3.2 vs 3.3:

http://bugs.python.org/file25685/REPORT_32BIT_2.7_3.2_writer
http://bugs.python.org/file25687/REPORT_64BIT_2.7_3.2_writer
http://bugs.python.org/file25757/report_windows7

Warning: for the Windows benchmark, Python 3.3 was compiled as a 32-bit
binary, whereas 2.7 and 3.2 were compiled as 64-bit binaries
(formatting integers is slower on 32-bit builds).

--

UCS4 vs PyAccu vs _PyUnicodeWriter:

http://bugs.python.org/file25686/REPORT_32BIT_3.3
http://bugs.python.org/file25688/REPORT_64BIT_3.3

Victor


ncoghlan at gmail

May 29, 2012, 3:51 PM

Post #5 of 9
Re: Optimize Unicode strings in Python 3.3

On Wed, May 30, 2012 at 8:44 AM, Victor Stinner
<victor.stinner [at] gmail> wrote:
> I also compared str%args and str.format() with Python 2.7 (byte
> strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as
> fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is
> generally 10 to 30% slower than Python 2.)

Very cool news!

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia


v+python at g

May 29, 2012, 9:45 PM

Post #6 of 9
Re: Optimize Unicode strings in Python 3.3

On 5/29/2012 3:51 PM, Nick Coghlan wrote:
> On Wed, May 30, 2012 at 8:44 AM, Victor Stinner
> <victor.stinner [at] gmail> wrote:
>> I also compared str%args and str.format() with Python 2.7 (byte
>> strings), 3.2 (UTF-16 or UCS-4) and 3.3 (PEP 393): Python 3.3 is as
>> fast as Python 2.7 and sometimes faster! (Whereas Python 3.2 is
>> generally 10 to 30% slower than Python 2.)
> Very cool news!
>
> Cheers,
> Nick.
>
Very cool indeed! Thanks, Victor. I have programs that are just full of
formatting operations, which will benefit from this work.


storchaka at gmail

May 30, 2012, 2:32 AM

Post #7 of 9
Re: Optimize Unicode strings in Python 3.3

On 30.05.12 01:44, Victor Stinner wrote:
> The "two steps" method is not promising: parsing the format string
> twice is slower than other methods.

The "1.5 steps" method is more promising -- first parse the format
string into an efficient internal representation, and then allocate
the output string and write the characters (or enlarge and widen the
buffer, but in any case with more information available). The internal
representation can be cached (as in the struct module), so that for
repeated formatting the cost of parsing drops to zero; see the sketch
below.
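
A compact sketch of the compile step in C (illustrative only: real
code would handle widths, precision and %% escapes, and would cache
the compiled form, e.g. keyed by the format string object, much as
Modules/_struct.c caches compiled struct formats):

    #include <stddef.h>
    #include <string.h>

    /* One op = a literal run followed by an optional conversion. */
    typedef struct {
        const char *literal;  /* literal text preceding the field */
        size_t litlen;
        char conv;            /* conversion char ('s', 'd', ...), 0 at end */
    } formatop;

    /* Compile fmt into ops (caller provides the array); done once per
       format string. Returns the number of ops produced. */
    static size_t compile_format(const char *fmt, formatop *ops,
                                 size_t maxops)
    {
        size_t n = 0;
        const char *start = fmt;
        while (n < maxops) {
            const char *pct = strchr(start, '%');
            ops[n].literal = start;
            if (pct == NULL || pct[1] == '\0') {
                ops[n].litlen = strlen(start);
                ops[n].conv = 0;
                return n + 1;
            }
            ops[n].litlen = (size_t)(pct - start);
            ops[n].conv = pct[1];
            start = pct + 2;
            n++;
        }
        return n;
    }

Interpreting the cached op array against the arguments is then the
only work done on each call.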




victor.stinner at gmail

May 30, 2012, 4:26 AM

Post #8 of 9
Re: Optimize Unicode strings in Python 3.3

>> The "two steps" method is not promising: parsing the format string
>> twice is slower than other methods.
>
> The "1.5 steps" method is more promising -- first parse the format
> string into an efficient internal representation, and then allocate
> the output string and write the characters (or enlarge and widen the
> buffer, but in any case with more information available). The internal
> representation can be cached (as in the struct module), so that for
> repeated formatting the cost of parsing drops to zero.

I implemented something like that, and it was inefficient and very complex.

See for example the (incomplete) patch for str%args attached to the
issue #14687:
http://bugs.python.org/file25413/pyunicode_format-2.patch

IMO this approach is less efficient than the "Unicode writer" approach because:

- you have to create many substrings or temporary strings in the
first step, or (worse) compute each argument twice: the writer
approach is more efficient here because it avoids creating substrings
and temporary strings

- you have to parse the format string twice, or you have to write two
versions of the code: first create a list of items, then concatenate
the items. The PyAccu method concatenates substrings at the end, which
is less efficient than the writer method (e.g. it has to create a
string of N fill characters to pad to WIDTH characters).

- the code is more complex than the writer method (which is very
similar to what is used in Python 2.7 and 3.2)

I wrote a much more complex patch for str%args that remembers variables
from the first step in order to avoid most of the parsing work in the
second step. The patch was very complex and hard to maintain, so I
chose not to publish it and tried another approach (the Unicode
writer).

Note: I'm talking about str%args and str.format(args); the Unicode
writer is not necessarily the most efficient method for *every*
function creating strings!

Victor


storchaka at gmail

May 30, 2012, 5:08 AM

Post #9 of 9
Re: Optimize Unicode strings in Python 3.3

On 30.05.12 14:26, Victor Stinner wrote:
> I implemented something like that, and it was inefficient and very complex.
>
> See for example the (incomplete) patch for str%args attached to the
> issue #14687:
> http://bugs.python.org/file25413/pyunicode_format-2.patch

I have seen and commented on this patch. That's not what I'm talking about.

> IMO this approach is less efficient than the "Unicode writer" approach because:

I brought up this approach not in opposition to the "Unicode writer",
but for comparison with the straight "two steps" method. Of course, it
can be combined with the "Unicode writer" to get the benefits of both
methods. For example, you can widen the output buffer in advance to
the width of the format string, or disable overallocation when
formatting the last argument that has a non-empty suffix.

> - you have to create many substrings or temporary strings in the
> first step, or (worse) compute each argument twice: the writer
> approach is more efficient here because it avoids creating substrings
> and temporary strings

Not in the first step but in the second step (and this is the only
step if you use caching), if you use the "Unicode writer".

> - you have to parse the format string twice, or you have to write two
> versions of the code: first create a list of items, then concatenate
> the items. The PyAccu method concatenates substrings at the end, which
> is less efficient than the writer method (e.g. it has to create a
> string of N fill characters to pad to WIDTH characters).

The code is divided into a compiler and an interpreter; only the
former parses the format string. See Modules/_struct.c.

> - the code is more complex than the writer method (which is very
> similar to what is used in Python 2.7 and 3.2)

The code that uses the writer method is itself rather complicated, so
the difference in total complexity between these approaches has become
smaller. ;-)

But this is really not easy work, and success is not assured, so let
it wait for its time.

