[issue15596] pickle: Faster serialization of Unicode strings

 

 

report at bugs

Aug 8, 2012, 3:38 PM

Post #1 of 8

New submission from STINNER Victor:

Serialization of Unicode strings in the pickle module is suboptimal, especially for long strings.

The attached patch optimizes serialization by exploiting the new properties of Unicode strings (PEP 393):

* text (protocol 0): avoid any temporary buffer if the string is an ASCII or latin1 string without a "\\" or "\n" character; otherwise use a single small buffer of 64 KB (instead of two buffers)
* binary (protocol 1, 2): avoid any temporary buffer if the string is an ASCII string or if its UTF-8 encoding is already available
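
For readers unfamiliar with the binary format, the second bullet can be pictured as follows: protocols 1 and 2 store a string as the BINUNICODE opcode (`X`), a 4-byte little-endian length, and the UTF-8 bytes. The sketch below rebuilds that frame in Python. `binunicode_frame` is a hypothetical helper written for illustration, not code from the patch. For an ASCII string the UTF-8 bytes are the characters themselves, which is presumably why that case needs no intermediate buffer.

```python
import pickle
import struct


def binunicode_frame(s: str) -> bytes:
    """Build the BINUNICODE frame written by pickle protocols 1 and 2.

    Hypothetical helper for illustration only.
    """
    # The payload is the UTF-8 encoding; lone surrogates are passed through.
    data = s.encode("utf-8", "surrogatepass")
    # Opcode, then the payload length as an unsigned 32-bit little-endian int.
    return b"X" + struct.pack("<I", len(data)) + data


frame = binunicode_frame("abc")
print(frame)
# The same bytes appear inside a real protocol 2 pickle of the string.
print(frame in pickle.dumps("abc", protocol=2))
```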

The current code for protocol 0 uses raw_unicode_escape(), which is really suboptimal: it writes the escaped string into a first buffer, and then copies it into a new temporary buffer of the right size (instead of just calling _PyBytes_Resize).
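
To illustrate why a latin1 string without "\\" or "\n" needs no escaping at all, here is a minimal Python sketch of the protocol 0 escaping. It mirrors, as far as I can tell, what the pure-Python pickle.py of that period did (later versions escape a few more characters); `protocol0_payload` and `is_fast_path` are hypothetical names, and this is not the C code touched by the patch.

```python
import pickle


def protocol0_payload(s: str) -> bytes:
    """Escape a string roughly the way the protocol 0 writer does."""
    # Backslash and newline are escaped by hand first: raw-unicode-escape
    # leaves them untouched, and the payload is terminated by a newline.
    s = s.replace("\\", "\\u005c").replace("\n", "\\u000a")
    # Code points below 256 become single bytes; others become \uXXXX
    # or \UXXXXXXXX escapes.
    return s.encode("raw-unicode-escape")


def is_fast_path(s: str) -> bool:
    """Hypothetical check: True when the payload is just the latin1 bytes."""
    return ("\\" not in s and "\n" not in s
            and all(ord(ch) < 0x100 for ch in s))


for text in ("caf\xe9", "a\nb", "\u20ac"):
    print(repr(text), is_fast_path(text), protocol0_payload(text))
```

When `is_fast_path` is true, the payload equals `s.encode("latin-1")`, so the characters can be copied out directly.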

----------
components: Library (Lib)
files: pickle_unicode.patch
keywords: patch
messages: 167730
nosy: alexandre.vassalotti, haypo, pitrou
priority: normal
severity: normal
status: open
title: pickle: Faster serialization of Unicode strings
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file26730/pickle_unicode.patch

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue15596>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Aug 8, 2012, 3:41 PM

Post #2 of 8

STINNER Victor added the comment:

Oh, I forgot to explain that I initially wrote the patch to fix the following failure on our "bigmem" buildbot.

http://buildbot.python.org/all/builders/AMD64%20Ubuntu%20LTS%20bigmem%203.x/builds/165/steps/test/logs/stdio

======================================================================
ERROR: test_huge_str_32b (test.test_pickle.InMemoryPickleTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/support.py", line 1281, in wrapper
    return f(self, maxsize)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/pickletester.py", line 1267, in test_huge_str_32b
    pickled = self.dumps(data, protocol=proto)
  File "/opt/python-bigmem/3.x.langa-bigmem/build/Lib/test/test_pickle.py", line 49, in dumps
    return pickle.dumps(arg, protocol)
MemoryError
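
The memory overhead behind a failure like this can be explored on a much smaller scale by measuring peak allocation during a single pickle.dumps() call. The sketch below uses tracemalloc, which only exists from Python 3.4 onwards, so it could not have been run on the interpreters discussed here. `peak_pickle_memory` is a hypothetical helper and the 1 MiB string size is arbitrary.

```python
import pickle
import tracemalloc


def peak_pickle_memory(data, protocol):
    """Return (pickled size, peak traced bytes) for one pickle.dumps call."""
    tracemalloc.start()
    try:
        pickled = pickle.dumps(data, protocol=protocol)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return len(pickled), peak


# The input is built before tracing starts, so it is not counted.
text = "a" * (1 << 20)
for proto in (0, 1, 2):
    size, peak = peak_pickle_memory(text, proto)
    print(f"protocol {proto}: output {size} bytes, peak {peak} bytes")
```

A peak well above the output size suggests temporary buffers were allocated along the way.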

----------



report at bugs

Aug 9, 2012, 10:10 AM

Post #3 of 8

Antoine Pitrou added the comment:

Looks interesting. Can you post benchmark numbers?
(you can use the pickle tests from http://hg.python.org/benchmarks )

----------



report at bugs

Aug 9, 2012, 2:49 PM

Post #4 of 8

STINNER Victor added the comment:

Here is a benchmark comparing Python 3.3 without and with my patch:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../default/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.530622 -> 0.332841: 1.59x faster
Avg: 0.539450 -> 0.336833: 1.60x faster
Significant (t=232.04)
Stddev: 0.00552 -> 0.00276: 2.0032x smaller
Timeline: b'http://tinyurl.com/dyu3vap'

The following not significant results are hidden, use -v to show them:
pickle_dict, pickle_list, slowpickle.
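
As a reading aid for the report above: the "x faster" figures are consistent with dividing the larger time by the smaller one. The sketch below reproduces the two fastpickle ratios under that assumption. `describe_change` is a hypothetical helper, not a function taken from perf.py.

```python
def describe_change(old: float, new: float) -> str:
    """Summarise a timing change as a ratio (assumed convention, see above)."""
    if new < old:
        return f"{old / new:.2f}x faster"
    if new > old:
        return f"{new / old:.2f}x slower"
    return "no change"


# Min and Avg timings for fastpickle from the report above.
print(describe_change(0.530622, 0.332841))
print(describe_change(0.539450, 0.336833))
```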

----------



report at bugs

Aug 9, 2012, 3:03 PM

Post #5 of 8

STINNER Victor added the comment:

For your information, here are the results of a benchmark comparing Python 3.2 to 3.3:

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../default/python
Running fastpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../default/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.455842 -> 0.542103: 1.19x slower
Avg: 0.462334 -> 0.547271: 1.18x slower
Significant (t=-101.15)
Stddev: 0.00362 -> 0.00471: 1.3028x larger
Timeline: b'http://tinyurl.com/btr644x'

### pickle_dict ###
Min: 0.360125 -> 0.345850: 1.04x faster
Avg: 0.364019 -> 0.348431: 1.04x faster
Significant (t=30.84)
Stddev: 0.00308 -> 0.00181: 1.6973x smaller
Timeline: b'http://tinyurl.com/cd3ashu'

### pickle_list ###
Min: 0.803941 -> 0.584800: 1.37x faster
Avg: 0.811115 -> 0.589200: 1.38x faster
Significant (t=455.00)
Stddev: 0.00261 -> 0.00225: 1.1612x smaller
Timeline: b'http://tinyurl.com/8u4m2wf'

### slowpickle ###
Min: 0.409008 -> 0.461257: 1.13x slower
Avg: 0.413668 -> 0.466201: 1.13x slower
Significant (t=-115.31)
Stddev: 0.00236 -> 0.00219: 1.0772x smaller
Timeline: b'http://tinyurl.com/czrg5kf'
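
The "Significant (t=...)" lines can be sanity-checked from the published averages and standard deviations. The sketch below uses a two-sample t statistic for equal sample sizes and assumes 50 samples per run (inferred from the `-n 50` flag in the log, which is a guess). Because the report rounds the standard deviations, the result only approximates the printed value. `t_score` is a hypothetical helper and may not match the formula perf.py actually uses.

```python
import math


def t_score(avg_a, std_a, avg_b, std_b, n):
    """Two-sample t statistic for two runs of n samples each."""
    return (avg_a - avg_b) / math.sqrt((std_a ** 2 + std_b ** 2) / n)


# fastpickle, Python 3.2 vs 3.3, figures from the report above.
t = t_score(0.462334, 0.00362, 0.547271, 0.00471, 50)
print(f"t is roughly {t:.1f} (the report prints -101.15)")
```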

----------



report at bugs

Aug 9, 2012, 3:08 PM

Post #6 of 8

Alexandre Vassalotti added the comment:

Amazing! Though it would probably be a good idea to benchmark non-ASCII strings as well.
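
One possible shape for such a benchmark is sketched below with timeit. The four sample strings are my own choice: they cover ASCII plus the three PEP 393 storage widths (1, 2 and 4 bytes per character). `SAMPLES` and `bench` are hypothetical names, and the sizes and repeat counts are arbitrary, so treat the output as indicative only.

```python
import functools
import pickle
import timeit

# One sample per kind of string; 100,000 characters each.
SAMPLES = {
    "ascii": "a" * 100_000,
    "latin1": "\xe9" * 100_000,
    "bmp": "\u20ac" * 100_000,
    "astral": "\U0001f600" * 100_000,
}


def bench(protocol, repeat=5, number=20):
    """Return the best per-call time in seconds for each sample string."""
    results = {}
    for name, text in SAMPLES.items():
        call = functools.partial(pickle.dumps, text, protocol=protocol)
        timings = timeit.repeat(call, repeat=repeat, number=number)
        results[name] = min(timings) / number
    return results


for proto in (0, 2):
    for name, seconds in bench(proto).items():
        print(f"protocol {proto} {name:>7}: {seconds * 1e6:.1f} us")
```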

----------



report at bugs

Aug 9, 2012, 3:08 PM

Post #7 of 8

STINNER Victor added the comment:

Last one: Python 3.2 vs patched Python 3.3.

ned$ python3 perf.py -b fastpickle,pickle_dict,pickle_list,slowpickle ../3.2/python ../fasterpickle/python
Running fastpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle
Running pickle_dict...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_dict
Running pickle_list...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 --use_cpickle pickle_list
Running slowpickle...
INFO:root:Running ../fasterpickle/python performance/bm_pickle.py -n 50 pickle
INFO:root:Running ../3.2/python performance/bm_pickle.py -n 50 pickle

Report on Linux ned 3.4.4-4.fc16.x86_64 #1 SMP Thu Jul 5 20:01:38 UTC 2012 x86_64 x86_64
Total CPU cores: 8

### fastpickle ###
Min: 0.470211 -> 0.322453: 1.46x faster
Avg: 0.475718 -> 0.328496: 1.45x faster
Significant (t=205.65)
Stddev: 0.00317 -> 0.00395: 1.2456x larger
Timeline: b'http://tinyurl.com/9qpphzp'

### pickle_dict ###
Min: 0.353965 -> 0.347959: 1.02x faster
Avg: 0.358980 -> 0.350596: 1.02x faster
Significant (t=10.44)
Stddev: 0.00545 -> 0.00160: 3.3956x smaller
Timeline: b'http://tinyurl.com/9pfeqf9'

### pickle_list ###
Min: 0.838222 -> 0.593497: 1.41x faster
Avg: 0.844636 -> 0.599491: 1.41x faster
Significant (t=296.53)
Stddev: 0.00520 -> 0.00267: 1.9521x smaller
Timeline: b'http://tinyurl.com/9rynvnv'

### slowpickle ###
Min: 0.408205 -> 0.458309: 1.12x slower
Avg: 0.413738 -> 0.463916: 1.12x slower
Significant (t=-53.85)
Stddev: 0.00263 -> 0.00604: 2.3019x larger
Timeline: b'http://tinyurl.com/coffkbg'
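
In these runs slowpickle is the only benchmark launched without `--use_cpickle`. Assuming that flag selects the C accelerator (the thread does not say so explicitly), the two code paths can be compared directly in Python 3: `pickle.Pickler` is the C implementation when available, while `pickle._Pickler` is the private pure-Python class. `dumps_pure_python` is a hypothetical helper and the test data is arbitrary.

```python
import functools
import io
import pickle
import timeit


def dumps_pure_python(obj, protocol):
    """Pickle with the pure-Python pickler (private name, may change)."""
    buf = io.BytesIO()
    pickle._Pickler(buf, protocol).dump(obj)
    return buf.getvalue()


data = ["some text %d" % i for i in range(10_000)]

# Both implementations must produce pickles that load back to the input.
assert pickle.loads(dumps_pure_python(data, 2)) == data
assert pickle.loads(pickle.dumps(data, 2)) == data

fast = timeit.timeit(functools.partial(pickle.dumps, data, 2), number=20)
slow = timeit.timeit(functools.partial(dumps_pure_python, data, 2), number=20)
print(f"C pickler: {fast:.4f} s, pure Python: {slow:.4f} s")
```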

----------



report at bugs

Aug 10, 2012, 7:16 PM

Post #8 of 8

Changes by Jesús Cea Avión <jcea [at] jcea>:


----------
nosy: +jcea

