Mailing List Archive: Python: Dev

Benchmarking Python 3.3 against Python 2.7 (wide build)

 

 



brett at python

Sep 30, 2012, 4:12 PM

Post #1 of 16
Benchmarking Python 3.3 against Python 2.7 (wide build)

I am presenting the talk "Python 3.3: Trust Me, It's Better Than 2.7" at
PyCon Argentina and Brasil (and PyCon US if they accept the talk). As part of
that talk I need to be able to benchmark Python 3.3 against 2.7 (both from
tip) using the unladen benchmarks (which now include benchmarks from PyPy
that can be relatively easily ported to Python 3).

To make sure the unladen benchmarks run fine against Python 3.3, I did a
fast run of the benchmarks. I figured people might be interested in the
quick-and-dirty results on my 2 GHz Intel Core i7 MacBook Pro w/ 8 GB RAM
and no attempt to control for performance beyond not actively browsing the
web. As I said, quick-and-dirty and not authoritative; all done just to
make sure all the benchmarks could run to completion (including the django,
html5lib, and genshi benchmarks which are only on my laptop ATM until those
projects cut a release with official Python 3 support).

One thing to keep in mind is that many benchmarks use a raw str for things,
so the benchmarks often compare Python 2.7 str vs. Python 3.3 str (i.e. str
vs. unicode). While this might seem unfair, it is what real-world performance
comparisons by users will look like, so it's a (somewhat unfair) comparison
that we just have to live with. I might take the time to make some tests run
under both raw strings and unicode so both comparisons are available.

If you care about helping out with the benchmarks (e.g. helping spot where
the iteration counts should be higher, etc.) then head over to the
speed@python.org mailing list.



> python3 perf.py -T --basedir ../benchmarks -f -b py3k
../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3

... output about the command line for the benchmarks ...

### 2to3 ###
0.785234 -> 0.722169: 1.09x faster

### call_method ###
Min: 0.491433 -> 0.414841: 1.18x faster
Avg: 0.493640 -> 0.416564: 1.19x faster
Significant (t=127.21)
Stddev: 0.00170 -> 0.00162: 1.0513x smaller

### call_method_slots ###
Min: 0.492749 -> 0.416280: 1.18x faster
Avg: 0.497888 -> 0.419275: 1.19x faster
Significant (t=61.72)
Stddev: 0.00433 -> 0.00237: 1.8304x smaller

### call_method_unknown ###
Min: 0.575536 -> 0.427234: 1.35x faster
Avg: 0.577286 -> 0.433428: 1.33x faster
Significant (t=66.09)
Stddev: 0.00117 -> 0.00835: 7.1621x larger

### call_simple ###
Min: 0.413011 -> 0.338923: 1.22x faster
Avg: 0.415862 -> 0.340699: 1.22x faster
Significant (t=111.94)
Stddev: 0.00223 -> 0.00134: 1.6616x smaller

### chaos ###
Min: 0.375286 -> 0.435456: 1.16x slower
Avg: 0.382798 -> 0.459515: 1.20x slower
Significant (t=-5.01)
Stddev: 0.01116 -> 0.03234: 2.8980x larger

### fastpickle ###
Min: 0.853560 -> 0.770580: 1.11x faster
Avg: 0.879498 -> 0.776249: 1.13x faster
Significant (t=8.24)
Stddev: 0.02771 -> 0.00407: 6.7995x smaller

### float ###
Min: 0.476596 -> 0.391101: 1.22x faster
Avg: 0.486164 -> 0.411553: 1.18x faster
Significant (t=9.07)
Stddev: 0.01049 -> 0.01511: 1.4411x larger

### formatted_logging ###
Min: 0.346703 -> 0.451643: 1.30x slower
Avg: 0.351218 -> 0.454626: 1.29x slower
Significant (t=-51.50)
Stddev: 0.00376 -> 0.00246: 1.5265x smaller

### genshi ###
Min: 0.275107 -> 0.294309: 1.07x slower
Avg: 0.287433 -> 0.299026: 1.04x slower
Significant (t=-3.82)
Stddev: 0.01077 -> 0.00467: 2.3044x smaller

### go ###
Min: 0.719160 -> 0.781042: 1.09x slower
Avg: 0.729322 -> 0.798135: 1.09x slower
Significant (t=-8.54)
Stddev: 0.01300 -> 0.01248: 1.0415x smaller

### hexiom2 ###
203.842661 -> 187.107363: 1.09x faster

### iterative_count ###
Min: 0.145088 -> 0.153285: 1.06x slower
Avg: 0.146369 -> 0.154425: 1.06x slower
Significant (t=-9.21)
Stddev: 0.00134 -> 0.00142: 1.0569x larger

### json_dump_v2 ###
Min: 3.512367 -> 4.040813: 1.15x slower
Avg: 3.521879 -> 4.057966: 1.15x slower
Significant (t=-64.29)
Stddev: 0.01071 -> 0.01526: 1.4247x larger

### json_load ###
Min: 1.024560 -> 0.642353: 1.60x faster
Avg: 1.025255 -> 0.644000: 1.59x faster
Significant (t=426.59)
Stddev: 0.00049 -> 0.00194: 3.9240x larger

### mako_v2 ###
Min: 0.137584 -> 0.287701: 2.09x slower
Avg: 0.140620 -> 0.293204: 2.09x slower
Significant (t=-296.14)
Stddev: 0.00243 -> 0.00272: 1.1195x larger

### meteor_contest ###
Min: 0.284739 -> 0.254285: 1.12x faster
Avg: 0.286174 -> 0.255323: 1.12x faster
Significant (t=38.02)
Stddev: 0.00124 -> 0.00133: 1.0725x larger

### nbody ###
Min: 0.491416 -> 0.336127: 1.46x faster
Avg: 0.493339 -> 0.337467: 1.46x faster
Significant (t=185.50)
Stddev: 0.00164 -> 0.00092: 1.7927x smaller

### normal_startup ###
Min: 0.639285 -> 0.898157: 1.40x slower
Avg: 0.645513 -> 0.901586: 1.40x slower
Significant (t=-90.10)
Stddev: 0.00575 -> 0.00270: 2.1309x smaller

### nqueens ###
Min: 0.399351 -> 0.429575: 1.08x slower
Avg: 0.403643 -> 0.430284: 1.07x slower
Significant (t=-9.83)
Stddev: 0.00603 -> 0.00053: 11.3092x smaller

### pathlib ###
Min: 0.137462 -> 0.170506: 1.24x slower
Avg: 0.145370 -> 0.172849: 1.19x slower
Significant (t=-11.09)
Stddev: 0.01232 -> 0.00128: 9.6403x smaller

### pidigits ###
Min: 0.400265 -> 0.379307: 1.06x faster
Avg: 0.401755 -> 0.381171: 1.05x faster
Significant (t=14.65)
Stddev: 0.00259 -> 0.00178: 1.4496x smaller

### raytrace ###
Min: 1.770596 -> 1.958350: 1.11x slower
Avg: 1.773719 -> 1.968401: 1.11x slower
Significant (t=-44.19)
Stddev: 0.00439 -> 0.00882: 2.0099x larger

### regex_effbot ###
Min: 0.076566 -> 0.098124: 1.28x slower
Avg: 0.077491 -> 0.098696: 1.27x slower
Significant (t=-54.47)
Stddev: 0.00052 -> 0.00069: 1.3227x larger

### regex_v8 ###
Min: 0.091530 -> 0.109116: 1.19x slower
Avg: 0.092308 -> 0.113627: 1.23x slower
Significant (t=-5.72)
Stddev: 0.00088 -> 0.00829: 9.4271x larger

### richards ###
Min: 0.257974 -> 0.232134: 1.11x faster
Avg: 0.259248 -> 0.234325: 1.11x faster
Significant (t=23.80)
Stddev: 0.00144 -> 0.00185: 1.2823x larger

### simple_logging ###
Min: 0.326569 -> 0.416797: 1.28x slower
Avg: 0.331694 -> 0.418844: 1.26x slower
Significant (t=-36.32)
Stddev: 0.00523 -> 0.00122: 4.3004x smaller

### spectral_norm ###
Min: 0.483011 -> 0.741558: 1.54x slower
Avg: 0.487128 -> 0.749741: 1.54x slower
Significant (t=-57.40)
Stddev: 0.00512 -> 0.00886: 1.7299x larger

### startup_nosite ###
Min: 0.220444 -> 0.374521: 1.70x slower
Avg: 0.222773 -> 0.376785: 1.69x slower
Significant (t=-176.17)
Stddev: 0.00166 -> 0.00221: 1.3331x larger

### threaded_count ###
Min: 0.171352 -> 0.151892: 1.13x faster
Avg: 0.183180 -> 0.153634: 1.19x faster
Significant (t=8.12)
Stddev: 0.00801 -> 0.00140: 5.7241x smaller

### unpack_sequence ###
Min: 0.000075 -> 0.000061: 1.23x faster
Avg: 0.000101 -> 0.000065: 1.54x faster
Significant (t=206.90)
Stddev: 0.00001 -> 0.00000: 3.2374x smaller

The following not significant results are hidden, use -v to show them:
chameleon, fannkuch, fastunpickle, regex_compile, silent_logging

### django ###
Min: 0.868956 -> 0.894571: 1.03x slower
Avg: 0.873620 -> 0.905274: 1.04x slower
Significant (t=-6.97)
Stddev: 0.00313 -> 0.00966: 3.0912x larger

### genshi ###
Min: 0.269615 -> 0.286348: 1.06x slower
Avg: 0.272206 -> 0.290708: 1.07x slower
Significant (t=-12.29)
Stddev: 0.00253 -> 0.00526: 2.0793x larger

### html5lib ###
12.279808 -> 11.862586: 1.04x faster


brett at python

Sep 30, 2012, 4:50 PM

Post #2 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

I accidentally left out the telco benchmark, which is bad since cdecimal
makes it just scream on Python 3.3 (and I verified with Python 3.2 that
this is an actual speedup and not some silly screw-up like I initially had
with spectral_norm):

### telco ###
Min: 0.897108 -> 0.016880: 53.15x faster
Avg: 0.899742 -> 0.017443: 51.58x faster
Significant (t=692.55)
Stddev: 0.00283 -> 0.00032: 8.8470x smaller

On Sun, Sep 30, 2012 at 7:12 PM, Brett Cannon <brett [at] python> wrote:

> [original post and benchmark results quoted in full; snipped]


alexandre at peadrop

Sep 30, 2012, 5:07 PM

Post #3 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 4:50 PM, Brett Cannon <brett [at] python> wrote:

> I accidentally left out the telco benchmark, which is bad since cdecimal
> makes it just scream on Python 3.3 (and I verified with Python 3.2 that
> this is an actual speedup and not some silly screw-up like I initially had
> with spectral_norm):


You could also make the pickle benchmark use the C accelerator module by
passing the --use_cpickle flag. The Python 3 version should be a lot faster.


greg at krypto

Sep 30, 2012, 5:14 PM

Post #4 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

Interesting results!

Another data point for the benchmarks that would be interesting is memory
consumption of the python process during the runs.

In 3.3 a reasonable place to gather this would be to add a callback to the
new gc.callbacks and save a snapshot of the process's memory usage before
every collection to gather peak, average and median usage over the life of
the process. 2.7 doesn't have this feature but there is a backport of this
to 2.7 in the bugtracker.

I guess I should join speed@ :)

-gps
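
For context, a minimal sketch of the approach Greg describes, assuming
Python 3.3 (for gc.callbacks) and Linux (for /proc/self/status); the helper
names are made up for illustration and are not part of perf.py:

import gc

_samples = []

def _rss_kb():
    # Current resident set size in kB, read from /proc/self/status (Linux only).
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def _snapshot(phase, info):
    # gc.callbacks calls this at the start and stop of every collection;
    # sample memory once per collection, at the start.
    if phase == "start":
        _samples.append(_rss_kb())

gc.callbacks.append(_snapshot)

# ... run the workload being measured ...

def report():
    # Peak, average and median resident memory seen at collection time.
    ordered = sorted(_samples)
    if ordered:
        print("peak=%d kB  avg=%.0f kB  median=%d kB"
              % (ordered[-1], sum(ordered) / len(ordered),
                 ordered[len(ordered) // 2]))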


solipsis at pitrou

Sep 30, 2012, 5:28 PM

Post #5 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, 30 Sep 2012 19:12:47 -0400
Brett Cannon <brett [at] python> wrote:
>
> ### mako_v2 ###
> Min: 0.137584 -> 0.287701: 2.09x slower
> Avg: 0.140620 -> 0.293204: 2.09x slower
> Significant (t=-296.14)
> Stddev: 0.00243 -> 0.00272: 1.1195x larger

Note that Mako can use the Markupsafe library for faster operation.
This will skew the result if one of your Pythons has Markupsafe
installed and the other does not.

Perhaps the benchmark runner should launch its subtests in a controlled
environment to avoid such issues?
(shipping a copy of Markupsafe would not be very practical, because it
has a C extension)

Regards

Antoine.


--
Software development and contracting: http://pro.pitrou.net




steve at pearwood

Sep 30, 2012, 6:35 PM

Post #6 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:

> > python3 perf.py -T --basedir ../benchmarks -f -b py3k
> ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3

> ### call_method ###
> Min: 0.491433 -> 0.414841: 1.18x faster
> Avg: 0.493640 -> 0.416564: 1.19x faster
> Significant (t=127.21)
> Stddev: 0.00170 -> 0.00162: 1.0513x smaller

I'm not sure if this is the right place to discuss this, but what is the
justification for recording the average and std deviation of the
benchmarks?

If the benchmarks are based on timeit, the timeit docs warn against
taking any statistic other than the minimum.



--
Steven


brett at python

Sep 30, 2012, 6:40 PM

Post #7 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 8:07 PM, Alexandre Vassalotti <alexandre [at] peadrop> wrote:

>
>
> On Sun, Sep 30, 2012 at 4:50 PM, Brett Cannon <brett [at] python> wrote:
>
>> I accidentally left out the telco benchmark, which is bad since cdecimal
>> makes it just scream on Python 3.3 (and I verified with Python 3.2 that
>> this is an actual speedup and not some silly screw-up like I initially had
>> with spectral_norm):
>
>
> You could also make the pickle benchmark use the C accelerator module by
> passing the --use_cpickle flag. The Python 3 version should be a lot faster.
>

perf.py already uses --use_cpickle:

Running fastpickle...
INFO:root:Running ../cpython/builds/3.3/bin/python3.3
performance/bm_pickle.py -n 5 --use_cpickle pickle
INFO:root:Running ../cpython/builds/2.7-wide/bin/python
performance/bm_pickle.py -n 5 --use_cpickle pickle

One thing that might make a difference is using -1 for the protocol instead of
2, but that means losing the perk of perf.py doing all of the calculations,
etc.
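
For anyone curious what that flag change would mean, a small illustrative
snippet (not from the thread; the dict is a made-up payload). Protocol 2 pins
both interpreters to the same wire format, while -1 picks the highest protocol
each interpreter supports, so the two runs would no longer serialize
identically:

import pickle

data = {"spam": list(range(100))}

fixed = pickle.dumps(data, 2)     # same format on 2.7 and 3.3
highest = pickle.dumps(data, -1)  # protocol 2 on Python 2.7, but 3 on Python 3.3
print(len(fixed), len(highest), pickle.HIGHEST_PROTOCOL)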


brett at python

Sep 30, 2012, 6:41 PM

Post #8 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 8:14 PM, Gregory P. Smith <greg [at] krypto> wrote:

> Interesting results!
>
> Another data point for the benchmarks that would be interesting is memory
> consumption of the python process during the runs.
>
> In 3.3 a reasonable place to gather this would be to add a callback to the
> new gc.callbacks and save a snapshot of the process's memory usage before
> every collection to gather peak, average and median usage over the life of
> the process. 2.7 doesn't have this feature but there is a backport of this
> to 2.7 in the bugtracker.
>
> I guess I should join speed@ :)
>
>
There is already support in perf.py to track memory:

-m, --track_memory Track memory usage. This only works on Linux.


brett at python

Sep 30, 2012, 6:49 PM

Post #9 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 8:28 PM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> On Sun, 30 Sep 2012 19:12:47 -0400
> Brett Cannon <brett [at] python> wrote:
> >
> > ### mako_v2 ###
> > Min: 0.137584 -> 0.287701: 2.09x slower
> > Avg: 0.140620 -> 0.293204: 2.09x slower
> > Significant (t=-296.14)
> > Stddev: 0.00243 -> 0.00272: 1.1195x larger
>
> Note that Mako can use the Markupsafe library for faster operation.
> This will skew the result if one of your Pythons has Markupsafe
> installed and the other does not.
>

Should probably have the benchmark print out a warning when markupsafe is
used. Turns out I have it installed in my user directory for Python 2.7 so
that probably came into play.


>
> Perhaps the benchmark runner should launch its subtests in a controlled
> environment to avoid such issues?
>

If we had venv in Python 2.7 that might be easy to do, but otherwise is
there an easy way without having to pull in virtualenv or do something
crazy like a chroot?

-Brett


> (shipping a copy of Markupsafe would not be very practical, because it
> has a C extension)
>
> Regards
>
> Antoine.
>
>
> --
> Software development and contracting: http://pro.pitrou.net


brett at python

Sep 30, 2012, 6:51 PM

Post #10 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, Sep 30, 2012 at 9:35 PM, Steven D'Aprano <steve [at] pearwood> wrote:

> On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:
>
> > > python3 perf.py -T --basedir ../benchmarks -f -b py3k
> > ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3
>
> > ### call_method ###
> > Min: 0.491433 -> 0.414841: 1.18x faster
> > Avg: 0.493640 -> 0.416564: 1.19x faster
> > Significant (t=127.21)
> > Stddev: 0.00170 -> 0.00162: 1.0513x smaller
>
> I'm not sure if this is the right place to discuss this,


The speed mailing list would be best.


> but what is the
> justification for recording the average and std deviation of the
> benchmarks?
>

Because the tests, when run in a more rigorous fashion, run many more
iterations (e.g. 50), so the average is used to even out bumps across
executions. And the stddev is there to show how variable the results
were in the end.


>
> If the benchmarks are based on timeit, the timeit docs warn against
> taking any statistic other than the minimum.
>

They don't use timeit.


robertc at robertcollins

Sep 30, 2012, 10:07 PM

Post #11 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Mon, Oct 1, 2012 at 2:35 PM, Steven D'Aprano <steve [at] pearwood> wrote:
> On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:
>
>> > python3 perf.py -T --basedir ../benchmarks -f -b py3k
>> ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3
>
>> ### call_method ###
>> Min: 0.491433 -> 0.414841: 1.18x faster
>> Avg: 0.493640 -> 0.416564: 1.19x faster
>> Significant (t=127.21)
>> Stddev: 0.00170 -> 0.00162: 1.0513x smaller
>
> I'm not sure if this is the right place to discuss this, but what is the
> justification for recording the average and std deviation of the
> benchmarks?
>
> If the benchmarks are based on timeit, the timeit docs warn against
> taking any statistic other than the minimum.

Also because timeit is wrong to give that recommendation.

There are factors - such as garbage collection - that affect
operations on average, even though they may not kick in in every run.
If you want to know how something will perform as part of a larger
system, taking the best possible and extrapolating from it is a
mistake. As a concrete example, consider an algorithm that generates
cycles with several hundred MB of memory in them. Best case the RAM is
available, nothing swaps, and gc doesn't kick in during the
algorithm's execution. However, the larger program has to deal with
those several hundred MB of memory sitting around until gc *does* kick
in, has to pay the price of a gc run over a large heap, and deal with
the impact on disk read cache. When you do enough runs to see those
effects *that will affect the whole program* kick in, then you can
extrapolate from that basis. That is, the question timeit optimises itself
to answer isn't the question most folk need most of the time.

-Rob
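
A small, made-up illustration of Rob's point (not from the thread): with the
collector left enabled, occasional collections inflate some runs, so the
minimum and the mean can tell different stories.

import timeit

# timeit disables the GC by default; re-enable it in the setup so the
# timed runs occasionally pay for collections, as a real program would.
setup = "import gc; gc.enable()"
stmt = """
cycles = []
for _ in range(1000):
    a, b = [], []
    a.append(b); b.append(a)   # create reference cycles for the collector
    cycles.append(a)
"""

runs = timeit.repeat(stmt, setup=setup, repeat=20, number=100)
print("min  = %.4f s" % min(runs))
print("mean = %.4f s" % (sum(runs) / len(runs)))
# The mean includes runs where the collector fired; the minimum does not.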


fijall at gmail

Oct 1, 2012, 12:43 AM

Post #12 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Mon, Oct 1, 2012 at 7:07 AM, Robert Collins
<robertc [at] robertcollins> wrote:
> On Mon, Oct 1, 2012 at 2:35 PM, Steven D'Aprano <steve [at] pearwood> wrote:
>> On Sun, Sep 30, 2012 at 07:12:47PM -0400, Brett Cannon wrote:
>>
>>> > python3 perf.py -T --basedir ../benchmarks -f -b py3k
>>> ../cpython/builds/2.7-wide/bin/python ../cpython/builds/3.3/bin/python3.3
>>
>>> ### call_method ###
>>> Min: 0.491433 -> 0.414841: 1.18x faster
>>> Avg: 0.493640 -> 0.416564: 1.19x faster
>>> Significant (t=127.21)
>>> Stddev: 0.00170 -> 0.00162: 1.0513x smaller
>>
>> I'm not sure if this is the right place to discuss this, but what is the
>> justification for recording the average and std deviation of the
>> benchmarks?
>>
>> If the benchmarks are based on timeit, the timeit docs warn against
>> taking any statistic other than the minimum.
>
> Also because timeit is wrong to give that recommendation.
>
> There are factors - such as garbage collection - that affect
> operations on average, even though they may not kick in in every run.
> If you want to know how something will perform as part of a larger
> system, taking the best possible and extrapolating from it is a
> mistake. As a concrete example, consider an algorithm that generates
> cycles with several hundred MB of memory in them. Best case the RAM is
> available, nothing swaps, and gc doesn't kick in during the
> algorithm's execution. However, the larger program has to deal with
> those several hundred MB of memory sitting around until gc *does* kick
> in, has to pay the price of a gc run over a large heap, and deal with
> the impact on disk read cache. When you do enough runs to see those
> effects *that will affect the whole program* kick in, then you can
> extrapolate from that basis. e.g. the question timeit optimises itself
> to answer isn't the question most folk need most of the time.
>
> -Rob

Timeit disables the GC for good measure (which is very bad IMO, but it
was already discussed)


fijall at gmail

Oct 1, 2012, 12:44 AM

Post #13 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Mon, Oct 1, 2012 at 1:12 AM, Brett Cannon <brett [at] python> wrote:
> I am presenting the talk "Python 3.3: Trust Me, It's Better Than 2.7" as
> PyCon Argentina and Brasil (and US if they accept the talk). As part of that
> talk I need to be able to benchmark Python 3.3 against 2.7 (both from tip)
> using the unladen benchmarks (which now include benchmarks from PyPy that
> can be relatively easily ported to Python 3).
>

Hi Brett.

*If* you're talking about benchmarks, it would be cool if you mention
that PyPy is actually much faster on most of them. Also a very sad
fact is that a lot of actually interesting benchmarks don't work on
py3k yet (although a growing number do). Twisted and sympy are very
informative, for example.


solipsis at pitrou

Oct 1, 2012, 4:54 AM

Post #14 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Sun, 30 Sep 2012 21:49:20 -0400
Brett Cannon <brett [at] python> wrote:
> > Note that Mako can use the Markupsafe library for faster operation.
> > This will skew the result if one of your Pythons has Markupsafe
> > installed and the other does not.
> >
>
> Should probably have the benchmark print out a warning when markupsafe is
> used. Turns out I have it installed in my user directory for Python 2.7 so
> that probably came into play.
>
>
> >
> > Perhaps the benchmark runner should launch its subtests in a controlled
> > environment to avoid such issues?
> >
>
> If we had venv in Python 2.7 that might be easy to do, but otherwise is
> there an easy way without having to try to pull in virtualenv or something
> crazy like a chroot or something?

The mako benchmark could manually exclude markupsafe from sys.modules.
That only addresses that specific benchmark, though.

Regards

Antoine.
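
A minimal sketch of that suggestion (my wording, not Antoine's code): setting
the sys.modules entry for markupsafe to None before mako is imported makes any
later import of it raise ImportError, so mako falls back to its pure-Python
escaping on both interpreters.

import sys

# Block the accelerator before mako gets a chance to import it.
sys.modules["markupsafe"] = None

try:
    import markupsafe
except ImportError:
    print("markupsafe blocked; mako will use its pure-Python fallback")

# import mako.template  # then proceed with the benchmark as usual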


brett at python

Oct 1, 2012, 5:23 AM

Post #15 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Mon, Oct 1, 2012 at 3:44 AM, Maciej Fijalkowski <fijall [at] gmail> wrote:

> On Mon, Oct 1, 2012 at 1:12 AM, Brett Cannon <brett [at] python> wrote:
> > I am presenting the talk "Python 3.3: Trust Me, It's Better Than 2.7" as
> > PyCon Argentina and Brasil (and US if they accept the talk). As part of
> that
> > talk I need to be able to benchmark Python 3.3 against 2.7 (both from
> tip)
> > using the unladen benchmarks (which now include benchmarks from PyPy that
> > can be relatively easily ported to Python 3).
> >
>
> Hi Brett.
>
> *If* you're talking about benchmarks, would be cool if you mention
> that pypy is actually much faster on most of them.


I will definitely mention that PyPy is actively working on Python 3 support
and people should help out where they can (whether it be technical or
financial), since PyPy will be faster than CPython in this regard, and if you
needed a good chance to switch interpreters this would be it.

BTW, now that 3.3 is out, is Antonio going to aim for 3.3 compatibility for
the initial release or stay back on 3.2?


> Also a very sad
> fact is that a lot of actually interesting benchmarks don't work on
> py3k (although a growing number). Twisted and sympy are very
> informative for example
>

As soon as those projects are ported we can obviously add those benchmarks.


brett at python

Oct 1, 2012, 5:23 AM

Post #16 of 16
Re: Benchmarking Python 3.3 against Python 2.7 (wide build) [In reply to]

On Mon, Oct 1, 2012 at 7:54 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> On Sun, 30 Sep 2012 21:49:20 -0400
> Brett Cannon <brett [at] python> wrote:
> > > Note that Mako can use the Markupsafe library for faster operation.
> > > This will skew the result if one of your Pythons has Markupsafe
> > > installed and the other does not.
> > >
> >
> > Should probably have the benchmark print out a warning when markupsafe is
> > used. Turns out I have it installed in my user directory for Python 2.7
> so
> > that probably came into play.
> >
> >
> > >
> > > Perhaps the benchmark runner should launch its subtests in a controlled
> > > environment to avoid such issues?
> > >
> >
> > If we had venv in Python 2.7 that might be easy to do, but otherwise is
> > there an easy way without having to try to pull in virtualenv or
> something
> > crazy like a chroot or something?
>
> The mako benchmark could manually exclude markupsafe from sys.modules.
> That only addresses that specific benchmark, though.
>

Good point. Might be a good short term fix but it would be nice to have a
solution to prevent similar issues in the future.
