
Mailing List Archive: Python: Dev

3.3 str timings

 

 



tjreedy at udel

Aug 18, 2012, 2:17 PM

Post #1 of 18 (323 views)
Permalink
3.3 str timings

The issue came up in python-list about string operations being slower in
3.3. (The categorical claim is false as some things are actually
faster.) Some things I understand, this one I do not.

Win7-64, 3.3.0b2 versus 3.2.3
from timeit import timeit
print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
# .6 in 3.2, 1.2 in 3.3

Why is searching for a two-byte char in a two-bytes per char string so
much faster in 3.2? Is this worth a tracker issue (I searched and could
not find one) or is there a known and un-fixable cause?

print(timeit("a.encode()", "a = 'a'*1000"))
# 1.5 in 3.2, .26 in 3.3

print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
# 1.7 in 3.2, .51 in 3.3

This is one of the 3.3 improvements. But since the results are equal:
('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
and 3.3 should know that for an all-ASCII string, I do not see why
adding the parameter should double the time. Another issue, or known
and un-fixable?

--
Terry Jan Reedy


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 18, 2012, 2:27 PM

Post #2 of 18 (318 views)
Permalink
Re: 3.3 str timings [In reply to]

On Sat, 18 Aug 2012 17:17:14 -0400
Terry Reedy <tjreedy [at] udel> wrote:
> The issue came up in python-list about string operations being slower in
> 3.3. (The categorical claim is false as some things are actually
> faster.) Some things I understand, this one I do not.
>
> Win7-64, 3.3.0b2 versus 3.2.3
> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
> # .6 in 3.2, 1.2 in 3.3

I get opposite numbers:

$ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
1000000 loops, best of 3: 0.599 usec per loop
$ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
10000000 loops, best of 3: 0.119 usec per loop

However, in both cases the operation is blindingly fast (less than
1µs), which should make it pretty much a non-issue.
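
A note on units, since the two sets of figures in this thread are reported differently: timeit.timeit() returns the total time in seconds for `number` executions (1,000,000 by default), while `python -m timeit` prints the time per loop. With the default count, a total of .6 seconds is therefore the same as 0.6 usec per loop. The sketch below uses only the standard timeit module; the helper name usec_per_loop is made up for illustration.

```python
import timeit


def usec_per_loop(stmt, setup="pass", number=1000000):
    """Time `stmt` and return the cost of one execution in microseconds."""
    # timeit.timeit() returns the total wall time, in seconds, for `number` runs.
    total_seconds = timeit.timeit(stmt, setup, number=number)
    return total_seconds / number * 1e6


if __name__ == "__main__":
    # chr(8230) is U+2026 HORIZONTAL ELLIPSIS, the character used in the thread.
    setup = "c = chr(8230); a = 'a'*1000 + c"
    print("%.3f usec per loop" % usec_per_loop("c in a", setup))
```

The printed value can be compared directly with the "usec per loop" figures from the command line.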

> Why is searching for a two-byte char in a two-bytes per char string so
> much faster in 3.2? Is this worth a tracker issue (I searched and could
> not find one) or is there a known and un-fixable cause?

I don't think it's worth a tracker issue. First, because as said above
it's practically a non-issue. Second, given the nature and depth of
changes brought by the switch to the PEP 393 implementation, an
individual micro-benchmark like this is not very useful; you'd need to
make a more extensive analysis of string performance (as a hint, we
have the stringbench benchmark in the Tools directory).

> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why
> adding the parameter should double the time. Another issue or known
> and un-fixable?

When observing performance differences, you should ask yourself whether
they matter at all or not.

Regards

Antoine.



--
Software development and contracting: http://pro.pitrou.net




martin at v

Aug 18, 2012, 2:34 PM

Post #3 of 18 (319 views)
Permalink
Re: 3.3 str timings [In reply to]

Quoting Terry Reedy <tjreedy [at] udel>:

> Is this worth a tracker issue (I searched and could not find one) or
> is there a known and un-fixable cause?

There is a third option: it's not known, but it's also unimportant.
I'd say posting it to python-dev is enough: either somebody with
sufficient time and interest researches it and provides you with an
explanation (or a fix), or nobody picks it up right away, in which
case it's IMO fine to wait for somebody to report it who has a real
problem with this change in runtime.

Regards,
Martin




rdmurray at bitdance

Aug 18, 2012, 4:19 PM

Post #4 of 18 (318 views)
Permalink
Re: 3.3 str timings [In reply to]

On Sat, 18 Aug 2012 17:17:14 -0400, Terry Reedy <tjreedy [at] udel> wrote:
> print(timeit("a.encode()", "a = 'a'*1000"))
> # 1.5 in 3.2, .26 in 3.3
>
> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
> # 1.7 in 3.2, .51 in 3.3
>
> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why
> adding the parameter should double the time. Another issue or known
> and un-fixable?

At one point there was an issue with certain spellings taking a fast path
(avoiding a codec lookup?) and other spellings not. I thought we'd fixed
that, but perhaps we didn't?

--David


tjreedy at udel

Aug 18, 2012, 7:54 PM

Post #5 of 18 (314 views)
Permalink
Re: 3.3 str timings [In reply to]

On 8/18/2012 5:27 PM, Antoine Pitrou wrote:
> On Sat, 18 Aug 2012 17:17:14 -0400
> Terry Reedy <tjreedy [at] udel> wrote:
>> The issue came up in python-list about string operations being slower in
>> 3.3. (The categorical claim is false as some things are actually
>> faster.) Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> I get opposite numbers:

Just curious, what system?
>
> $ python3.2 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 1000000 loops, best of 3: 0.599 usec per loop
> $ python3.3 -m timeit -s "c = '…'; a = 'a'*1000+c" "c in a"
> 10000000 loops, best of 3: 0.119 usec per loop
>
> However, in both cases the operation is blindingly fast (less than
> 1µs), which should make it pretty much a non-issue.

The current default 'number' of 1000000 is higher than I remember. Good
to know.

>> Why is searching for a two-byte char in a two-bytes per char string so
>> much faster in 3.2? Is this worth a tracker issue (I searched and could
>> not find one) or is there a known and un-fixable cause?
>
> I don't think it's worth a tracker issue. First, because as said above
> it's practically a non-issue. Second, given the nature and depth of
> changes brought by the switch to the PEP 393 implementation, an
> individual micro-benchmark like this is not very useful; you'd need to
> make a more extensive analysis of string performance (as a hint, we
> have the stringbench benchmark in the Tools directory).

It is not in my 3.3.0b2 windows install, but I have heard of it. Another
good reminder. My main interest was in refuting '3.3 string ops are
always slower'. Both points above are also good 'ammo'. I am sure this
discussion will re-occur after the release.

--
Terry Jan Reedy




lukasz at langa

Aug 19, 2012, 2:53 AM

Post #6 of 18 (315 views)
Permalink
Re: 3.3 str timings [In reply to]

Message written by Antoine Pitrou <solipsis [at] pitrou> on 18 Aug 2012, at 23:27:

> On Sat, 18 Aug 2012 17:17:14 -0400
> Terry Reedy <tjreedy [at] udel> wrote:
>> The issue came up in python-list about string operations being slower in
>> 3.3. (The categorical claim is false as some things are actually
>> faster.) Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> I get opposite numbers:

Me too. 3.2 is slower for me in every case. Mac OS X 10.8.

--
Best regards,
Łukasz Langa
Senior Systems Architecture Engineer

IT Infrastructure Department
Grupa Allegro Sp. z o.o.

http://lukasz.langa.pl/
+48 791 080 144


victor.stinner at gmail

Aug 21, 2012, 6:04 AM

Post #7 of 18 (297 views)
Permalink
Re: 3.3 str timings [In reply to]

2012/8/18 Terry Reedy <tjreedy [at] udel>:
> The issue came up in python-list about string operations being slower in
> 3.3. (The categorical claim is false as some things are actually faster.)

Yes, some operations are slower, but others are faster :-) There was
a significant effort to limit the overhead of PEP 393 (when the
branch was merged, most operations were slower). I tried to fix all
performance regressions. If you find cases where Python 3.3 is slower,
I can investigate and try to optimize them (in Python 3.4) or at least
explain why they are slower :-)

As said by Antoine, use the stringbench tool if you would like to get
a first overview of string performances.

> Some things I understand, this one I do not.
>
> Win7-64, 3.3.0b2 versus 3.2.3
> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
> # .6 in 3.2, 1.2 in 3.3

On Linux with narrow build (UTF-16), I get:

$ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 4.25 usec per loop
$ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
100000 loops, best of 3: 3.21 usec per loop

Linux-2.6.30.10-105.2.23.fc11.i586-i686-with-fedora-11-Leonidas
Python 3.2.2+ (3.2:1453d2fe05bf, Aug 21 2012, 14:21:05)
Python 3.3.0b2+ (default:b36ce0a3a844, Aug 21 2012, 14:05:23)

I'm not sure that I read your benchmark correctly: you write c='...'
and then ord(c)=8230. The algorithms used to find a substring differ
depending on whether the substring is a single character or longer. For
a single character, Antoine Pitrou modified the code to use memchr() and
memrchr(), even if the string is not UCS1 (in this benchmark, the
string uses UCS2 storage): the byte-level search may find false
positives.
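
To make the "false positives" remark concrete, here is a minimal sketch in pure Python. It is not CPython's implementation: it only mimics the idea of scanning a 2-byte-per-character buffer for one byte (as memchr() would) and then checking each candidate. The function name find_ucs2_char is hypothetical, UTF-16-LE bytes stand in for the UCS2 buffer, and only characters in the Basic Multilingual Plane are assumed.

```python
def find_ucs2_char(text, ch):
    """Return the index of `ch` in `text`, or -1 if it is absent.

    Illustration only: assumes every character is in the BMP, so each one
    occupies exactly two bytes in the UTF-16-LE stand-in buffer.
    """
    buf = text.encode("utf-16-le")
    target = ch.encode("utf-16-le")   # two bytes: low byte first
    low = target[0:1]
    pos = 0
    while True:
        hit = buf.find(low, pos)      # byte scan, analogous to memchr()
        if hit < 0:
            return -1
        # A byte hit is only a candidate. It is a false positive if it sits
        # in the high half of a character (odd offset) or if the other half
        # of the 2-byte unit does not match.
        if hit % 2 == 0 and buf[hit:hit + 2] == target:
            return hit // 2
        pos = hit + 1


if __name__ == "__main__":
    ellipsis = chr(8230)                              # U+2026, bytes 26 20
    print(find_ucs2_char("a" * 1000 + ellipsis, ellipsis))
    # U+2600 (bytes 00 26) and U+0126 (bytes 26 01) both contain byte 0x26
    # without being U+2026, so each produces a rejected candidate first.
    print(find_ucs2_char("\u2600\u0126" + ellipsis, ellipsis))
```

How often such rejected candidates occur depends on the data, which is one reason a single micro-benchmark says little about search performance in general.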

> Why is searching for a two-byte char in a two-bytes per char string so much
> faster in 3.2?

Can you reproduce your benchmark on other Windows platforms? Do you
run the benchmark more than once? I always run a benchmark 3 times.

I don't like the timeit module for micro benchmarks, it is really
unstable (default settings are not written for micro benchmarks).
Example of 4 runs on the same platform:

$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.79 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.61 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 3.16 usec per loop
$ ./python -m timeit -s "a='a'*1000" "a.encode()"
100000 loops, best of 3: 2.76 usec per loop

I wrote my own benchmark tool, based on timeit, to have more stable
results on micro benchmarks:
https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py

Example of 4 runs:

3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a
3.21 us: c=chr(8230); a='a'*1000+c; c in a
3.18 us: c=chr(8230); a='a'*1000+c; c in a

My benchmark.py script automatically calibrates the number of loops so
that one run takes at least 100 ms, and then repeats the test for at
least 1.0 second.

Using time instead of a fixed number of loops is more reliable because
the test is less dependent on the system activity.
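
The two-phase scheme described above can be sketched as follows. This is not the benchmark.py script linked earlier, only a rough illustration built on timeit.Timer; the function name timed_benchmark and its parameters min_run and min_total are made up here, with defaults matching the 100 ms and 1.0 second figures.

```python
import time
import timeit


def timed_benchmark(stmt, setup="pass", min_run=0.1, min_total=1.0):
    """Return (loops, best seconds per loop) for `stmt`.

    Phase 1 grows the loop count until one run lasts at least `min_run`
    seconds. Phase 2 repeats that run until `min_total` seconds have passed.
    """
    timer = timeit.Timer(stmt, setup)

    # Phase 1: calibrate the loop count by powers of ten.
    loops = 1
    while True:
        elapsed = timer.timeit(loops)
        if elapsed >= min_run:
            break
        loops *= 10

    # Phase 2: keep sampling for a fixed amount of wall time.
    samples = [elapsed / loops]
    deadline = time.perf_counter() + min_total
    while time.perf_counter() < deadline:
        samples.append(timer.timeit(loops) / loops)

    # The minimum is the sample least disturbed by other system activity.
    return loops, min(samples)


if __name__ == "__main__":
    loops, best = timed_benchmark("c in a", "c = chr(8230); a = 'a'*1000 + c")
    print("%.2f us (%d loops per run)" % (best * 1e6, loops))
```

Reporting the minimum is a design choice of this sketch; the original script may aggregate its samples differently.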

> print(timeit("a.encode()", "a = 'a'*1000"))
> # 1.5 in 3.2, .26 in 3.3
>
> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
> # 1.7 in 3.2, .51 in 3.3

This test doesn't compare the performance of the UTF-8 encoder: "encoding"
an ASCII string to UTF-8 in Python 3.3 is a no-op, it just duplicates
the memory (ASCII is compatible with UTF-8)...

So your benchmark just measures the performances of
PyArg_ParseTupleAndKeywords()... Try also str.encode('utf-8').

If you want to benchmark the UTF-8 encoder, use at least a non-ASCII
character like "\x80".
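
A small sketch of the comparison suggested above: the three call spellings are timed on a pure-ASCII string and on a string containing "\x80", so that call overhead and actual encoding work can be told apart. The helper name time_encode_variants is hypothetical, and the `globals` argument of timeit.repeat() requires Python 3.5 or later, so this does not run on the 3.2/3.3 interpreters discussed in the thread.

```python
import timeit


def time_encode_variants(text, number=100000):
    """Return {call spelling: best total seconds for `number` calls}."""
    variants = [
        "a.encode()",
        "a.encode('utf-8')",
        "a.encode(encoding='utf-8')",
    ]
    results = {}
    for stmt in variants:
        # `a` is passed in through globals, so only the call itself is timed.
        runs = timeit.repeat(stmt, globals={"a": text}, repeat=3, number=number)
        results[stmt] = min(runs)
    return results


if __name__ == "__main__":
    samples = [("ASCII", "a" * 1000), ("non-ASCII", "a" * 999 + "\x80")]
    for label, text in samples:
        print(label)
        for stmt, seconds in time_encode_variants(text).items():
            print("  %-28s %.4f s" % (stmt, seconds))
```

If the spread between the three spellings stays roughly the same for both strings, it comes from argument handling rather than from the encoder.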

At least, your benchmark shows that Python 3.3 is *much* faster than
Python 3.2 to "encode" pure ASCII strings to UTF-8 :-)

Victor


agriff at tin

Aug 21, 2012, 8:20 AM

Post #8 of 18 (294 views)
Permalink
Re: 3.3 str timings [In reply to]

> My benchmark.py script calibrates automatically the number of loops to
> take at least 100 ms, and then repeat the test during at least 1.0
> second.
>
> Using time instead of a fixed number of loops is more reliable because
> the test is less dependent on the system activity.

I've also been bitten in the past by something that is probably quite
obvious but that I didn't think of: dynamic CPU frequency scaling. Many
modern CPUs can dynamically change their frequency depending on the load
and temperature, and the switch can take more than one second.

When doing benchmarks I now use a small script (based on cpufreq-set)
that just locks all the cores into fast mode.


martin at v

Aug 21, 2012, 8:28 AM

Post #9 of 18 (294 views)
Permalink
Re: 3.3 str timings [In reply to]

>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230

> I'm not sure that I read your benchmark correctly: you write c='...'

Apparently you didn't - or your MUA was not able to display it
correctly. He didn't say

'...' # U+002E U+002E U+002E, 3x FULL STOP

but

'…' # U+2026, HORIZONTAL ELLIPSIS

Regards,
Martin




solipsis at pitrou

Aug 21, 2012, 8:53 AM

Post #10 of 18 (297 views)
Permalink
Re: 3.3 str timings [In reply to]

On Tue, 21 Aug 2012 17:20:14 +0200
Andrea Griffini <agriff [at] tin> wrote:
> > My benchmark.py script calibrates automatically the number of loops to
> > take at least 100 ms, and then repeat the test during at least 1.0
> > second.
> >
> > Using time instead of a fixed number of loops is more reliable because
> > the test is less dependent on the system activity.
>
> I've also been bitten in the past by something that is probably quite
> obvious but I didn't think to, that is dynamic cpu frequency. Many
> modern CPUs can dynamically change the frequency depending on the load
> and temperature and the switch can take more than one second.
>
> When doing benchmarks now I've a small script (based on cpufreq-set)
> that just blocks all the cores into fast mode.

For the record, under Linux, the following command:

$ sudo cpufreq-set -rg performance

should do the trick.
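
To confirm that the governor change took effect before starting a benchmark, something like the sketch below may help. It assumes a Linux system that exposes the cpufreq sysfs files under /sys/devices/system/cpu; on other systems, or where cpufreq is unavailable, it simply returns an empty dict. The function name scaling_governors is made up for illustration.

```python
import glob
import os

GOVERNOR_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def scaling_governors(pattern=GOVERNOR_GLOB):
    """Return {cpu name: governor name} for every CPU exposing cpufreq."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        # .../cpu0/cpufreq/scaling_governor -> "cpu0"
        cpu = os.path.basename(os.path.dirname(os.path.dirname(path)))
        try:
            with open(path) as f:
                governors[cpu] = f.read().strip()
        except OSError:
            # Unreadable entry: skip it rather than abort the whole check.
            pass
    return governors


if __name__ == "__main__":
    found = scaling_governors()
    if not found:
        print("no cpufreq information available")
    for cpu, governor in found.items():
        note = "" if governor == "performance" else "  <- may skew timings"
        print("%s: %s%s" % (cpu, governor, note))
```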

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net




steve at pearwood

Aug 21, 2012, 10:25 AM

Post #11 of 18 (296 views)
Permalink
Re: 3.3 str timings [In reply to]

On 21/08/12 23:04, Victor Stinner wrote:

> I don't like the timeit module for micro benchmarks, it is really
> unstable (default settings are not written for micro benchmarks).
[...]
> I wrote my own benchmark tool, based on timeit, to have more stable
> results on micro benchmarks:
> https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py

I am surprised, because the whole purpose of timeit is to time micro
code snippets.

If it is as unstable as you suggest, and if you have an alternative
which is more stable and accurate, I would love to see it in the
standard library.



--
Steven


python-dev at masklinn

Aug 21, 2012, 10:56 AM

Post #12 of 18 (295 views)
Permalink
Re: 3.3 str timings [In reply to]

On 21 Aug 2012, at 19:25, Steven D'Aprano <steve [at] pearwood> wrote:
> On 21/08/12 23:04, Victor Stinner wrote:
>
>> I don't like the timeit module for micro benchmarks, it is really
>> unstable (default settings are not written for micro benchmarks).
> [...]
>> I wrote my own benchmark tool, based on timeit, to have more stable
>> results on micro benchmarks:
>> https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
>
> I am surprised, because the whole purpose of timeit is to time micro
> code snippets.

And when invoked from the command-line, it is already time-based: unless
-n is specified, python guesstimates the number of iterations to be a
power of 10 resulting in at least 0.2s per test (the repeat defaults to
3 though)

As a side-note, every time I use timeit programmatically, it annoys me that this behavior is not available and has to be implemented manually.
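
A note for readers on newer interpreters: Python 3.6 and later expose this calibration programmatically as timeit.Timer.autorange(), which grows the loop count until a run takes at least 0.2 seconds and returns the count together with the elapsed time. It did not exist in the 3.2 and 3.3 versions discussed in this thread. A minimal usage sketch:

```python
import timeit

# chr(8230) is U+2026 HORIZONTAL ELLIPSIS, as in the benchmark under discussion.
timer = timeit.Timer("c in a", setup="c = chr(8230); a = 'a'*1000 + c")

# autorange() returns (number of loops, total seconds for that many loops).
number, elapsed = timer.autorange()

print("%d loops, %.3f usec per loop" % (number, elapsed / number * 1e6))
```

The sequence of loop counts tried differs slightly between Python versions, so the reported loop count may not match what older command-line runs chose.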

> If it is as unstable as you suggest, and if you have an alternative
> which is more stable and accurate, I would love to see it in the
> standard library.


stefan_ml at behnel

Aug 21, 2012, 11:38 AM

Post #13 of 18 (296 views)
Permalink
Re: 3.3 str timings [In reply to]

Xavier Morel, 21.08.2012 19:56:
> On 21 Aug 2012, at 19:25, Steven D'Aprano wrote:
>> On 21/08/12 23:04, Victor Stinner wrote:
>>> I don't like the timeit module for micro benchmarks, it is really
>>> unstable (default settings are not written for micro benchmarks).
>> [...]
>>> I wrote my own benchmark tool, based on timeit, to have more stable
>>> results on micro benchmarks:
>>> https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
>>
>> I am surprised, because the whole purpose of timeit is to time micro
>> code snippets.
>
> And when invoked from the command-line, it is already time-based: unless
> -n is specified, python guesstimates the number of iterations to be a
> power of 10 resulting in at least 0.2s per test (the repeat defaults to
> 3 though)
>
> As a side-note, every time I use timeit programmatically, it annoys me
> that this behavior is not available and has to be implemented manually.

+100, sounds like someone should contribute a patch for this.

Stefan




alexander.belopolsky at gmail

Aug 21, 2012, 11:39 AM

Post #14 of 18 (299 views)
Permalink
Re: 3.3 str timings [In reply to]

On Tue, Aug 21, 2012 at 1:56 PM, Xavier Morel <python-dev [at] masklinn> wrote:
> As a side-note, every time I use timeit programmatically, it annoys me that this behavior is not available and has to be implemented manually.

You are not alone:

http://bugs.python.org/issue6422


solipsis at pitrou

Aug 21, 2012, 11:41 AM

Post #15 of 18 (295 views)
Permalink
Re: 3.3 str timings [In reply to]

On Wed, 22 Aug 2012 03:25:21 +1000
Steven D'Aprano <steve [at] pearwood> wrote:
> On 21/08/12 23:04, Victor Stinner wrote:
>
> > I don't like the timeit module for micro benchmarks, it is really
> > unstable (default settings are not written for micro benchmarks).
> [...]
> > I wrote my own benchmark tool, based on timeit, to have more stable
> > results on micro benchmarks:
> > https://bitbucket.org/haypo/misc/src/tip/python/benchmark.py
>
> I am surprised, because the whole purpose of timeit is to time micro
> code snippets.
>
> If it is as unstable as you suggest, and if you have an alternative
> which is more stable and accurate, I would love to see it in the
> standard library.

In my experience timeit is stable enough to know whether a change is
significant or not. No need for three-digit precision when the
question is whether there is at least a 10% performance difference
between two approaches.
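
As a worked illustration of that 10% rule of thumb, the sketch below computes the relative change between two timings and checks it against a threshold. The helper names and the threshold parameter are made up; the sample figures are the Linux timings and the repeated benchmark.py results quoted earlier in the thread.

```python
def relative_change(before, after):
    """Relative change from `before` to `after` (negative means faster)."""
    return (after - before) / before


def is_significant(before, after, threshold=0.10):
    """True if the two timings differ by at least `threshold` (10% by default)."""
    return abs(relative_change(before, after)) >= threshold


if __name__ == "__main__":
    # 4.25 usec (3.2) versus 3.21 usec (3.3): a clear difference.
    print("%+.1f%%" % (100 * relative_change(4.25, 3.21)), is_significant(4.25, 3.21))
    # 3.18 us versus 3.21 us: two runs of the same test, well within noise.
    print("%+.1f%%" % (100 * relative_change(3.18, 3.21)), is_significant(3.18, 3.21))
```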

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net




storchaka at gmail

Aug 21, 2012, 1:36 PM

Post #16 of 18 (294 views)
Permalink
Re: 3.3 str timings [In reply to]

On 19.08.12 00:17, Terry Reedy wrote:
> This is one of the 3.3 improvements. But since the results are equal:
> ('a'*1000).encode() == ('a'*1000).encode(encoding='utf-8')
> and 3.3 should know that for an all-ascii string, I do not see why
> adding the parameter should double the time. Another issue or known
> and un-fixable?

This is a cost of argument packing/unpacking.



tjreedy at udel

Aug 21, 2012, 3:08 PM

Post #17 of 18 (296 views)
Permalink
Re: 3.3 str timings [In reply to]

On 8/21/2012 9:04 AM, Victor Stinner wrote:
> 2012/8/18 Terry Reedy <tjreedy [at] udel>:
>> The issue came up in python-list about string operations being slower in
>> 3.3. (The categorical claim is false as some things are actually faster.)
>
> Yes, some operations are slower, but others are faster :-)

Yes, that is what I wrote, showed, and posted to python-list :-)

I was and am posting here in response to a certain French writer who
dislikes the fact that 3.3 unicode favors text written with the first
256 code points, which do not include all the characters needed for
French, and do not include the euro symbol invented years after that set
was established. His opinion aside, his search for 'evidence' did turn
up a version of the example below.

> an important effort to limit the overhead of the PEP 393 (when the
> branch was merged, most operations were slower). I tried to fix all
> performance regressions.

Yes, I read and appreciated the speed-up patches by you and others.

> If you find cases where Python 3.3 is slower,
> I can investigate and try to optimize it (in Python 3.4) or at least
> explain why it is slower :-)

Replacement appears to be as much as 6.5 times slower on some Win 7
machines. (I factored out the setup part, which increased the ratio
since it takes the same time on both machines.)

import timeit
ttr = timeit.repeat
# 3.2.3
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[0.385043233078477, 0.35294282203631155, 0.3468394370770511]

# 3.3.0b2
>>> ttr("euroreplace('€', 'œ')", "euroreplace = ('€'*100).replace")
[2.2624885911213823, 2.245330314124203, 2.2531118686461014]

How does this compare on *nix?

> As said by Antoine, use the stringbench tool if you would like to get
> a first overview of string performances.

I found it, ran it on 3.2 and 3.3, and posted to python-list that 3.3
unicode looks quite good. It is overall comparable to both byte
operations and 3.2 unicode operations. Replace operations were
relatively the slowest, though I do not remember any as bad as the
example above.

>> Some things I understand, this one I do not.
>>
>> Win7-64, 3.3.0b2 versus 3.2.3
>> print(timeit("c in a", "c = '…'; a = 'a'*1000+c")) # ord(c) = 8230
>> # .6 in 3.2, 1.2 in 3.3
>
> On Linux with narrow build (UTF-16), I get:
>
> $ python3.2 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 4.25 usec per loop
> $ python3.3 -m timeit -s "c=chr(8230); a='a'*1000+c" "c in a"
> 100000 loops, best of 3: 3.21 usec per loop

The slowdown seems to be specific to (some?) Windows systems. Perhaps we
are hitting a difference in the VC2008 and VC2010 compilers or runtimes.
Someone on python-list wondered whether the 3.3.0 betas have the same
compile optimization settings as 3.2.3 final. Martin?

> Can you reproduce your benchmark on other Windows platforms? Do you
> run the benchmark more than once? I always run a benchmark 3 times.

Always, and now I see the repeat does this for me.

> I don't like the timeit module for micro benchmarks, it is really
> unstable (default settings are not written for micro benchmarks).

I am reporting rounded lowest times. As others said, make timeit better
if you can.

>> print(timeit("a.encode()", "a = 'a'*1000"))
>> # 1.5 in 3.2, .26 in 3.3
>>
>> print(timeit("a.encode(encoding='utf-8')", "a = 'a'*1000"))
>> # 1.7 in 3.2, .51 in 3.3
>
> This test doesn't compare performances of the UTF-8 encoder: "encode"
> an ASCII string to UTF-8 in Python 3.3 is a no-op, it just duplicates
> the memory (ASCII is compatible with UTF-8)...

That is what I thought, and why I was puzzled, ...

> So your benchmark just measures the performances of
> PyArg_ParseTupleAndKeywords()...,

having forgotten about arg processing. I should have factored out the
.encode lookup (as I did with .replace). The following suggests that you
are correct. The difference, about .3, is independent of the length of
string being copied.

>>> ttr("aenc()", "aenc = ('a'*10000).encode")
[0.588499543029684, 0.5760222493490801, 0.5757037691037112]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*10000).encode")
[0.8973955632254729, 0.887000380270365, 0.884113153942053]

>>> ttr("aenc()", "aenc = ('a'*50000).encode")
[3.6618914099180984, 3.650091040467487, 3.6542183723140624]
>>> ttr("aenc(encoding='utf-8')", "aenc = ('a'*50000).encode")
[3.964849740958016, 3.9363826484832316, 3.937290440151628]
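
The arithmetic behind "about .3, independent of the length" can be checked directly from the figures above. The sketch takes the lowest of each set of three timings, as in the rest of this message; the dictionary layout and the helper name keyword_overhead are only for illustration.

```python
# Timings copied from the runs above: seconds per 1,000,000 calls
# (the timeit default), keyed by string length.
reported = {
    10000: {
        "plain": [0.588499543029684, 0.5760222493490801, 0.5757037691037112],
        "keyword": [0.8973955632254729, 0.887000380270365, 0.884113153942053],
    },
    50000: {
        "plain": [3.6618914099180984, 3.650091040467487, 3.6542183723140624],
        "keyword": [3.964849740958016, 3.9363826484832316, 3.937290440151628],
    },
}


def keyword_overhead(timings):
    """Extra seconds per 1,000,000 calls when encoding='utf-8' is passed."""
    return min(timings["keyword"]) - min(timings["plain"])


for length, timings in sorted(reported.items()):
    print("len=%d: +%.2f s per 1,000,000 calls" % (length, keyword_overhead(timings)))
```

Both differences come out close to 0.3 seconds per million calls (about 0.31 and 0.29), that is, roughly 0.3 usec per call, even though the base cost grows more than sixfold with the longer string.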

--
Terry Jan Reedy




martin at v

Aug 21, 2012, 3:46 PM

Post #18 of 18 (295 views)
Permalink
Re: 3.3 str timings [In reply to]

Quoting Terry Reedy <tjreedy [at] udel>:

> I was and am posting here in response to a certain French writer who
> dislikes the fact that 3.3 unicode favors text written with the
> first 256 code points, which do not include all the characters
> needed for French, and do not include the euro symbol invented years
> after that set was established. His opinion aside, his search for
> 'evidence' did turn up a version of the example below.

I personally don't see a need to "defend" this or any other deliberate
change. There is a need to defend changes before they are made, to convince
co-contributors and other Python users, this is what the PEP process is
good for. One point of the PEP process is that once the PEP is accepted,
discussion ought to stop - or anybody continuing in discussion doesn't
deserve an answer by anybody not interested.

Anybody who doesn't like the change is free not to use Python 3.3, or
stay at 2.7, use PyPy, or switch to Ruby altogether. None of these
bothers me in the slightest. If people find proper bugs, they are
encouraged to report them; if they contribute patches along with them,
all the better. If they merely want to complain - let them complain.
If they want to see an agreed-upon patch reverted, they can try to
lobby for a BDFL pronouncement.

I certainly think the performance of str in 3.3 is fine, and thought
so even before Serhiy or Victor submitted their patches. I actually
dislike some of the code complication that these improvements brought,
but I can accept that a certain loss of maintainability that gives
better performance makes a lot of people happy. But I will continue
to object to further complications that support irrelevant special
cases.

Regards,
Martin


