
Mailing List Archive: Python: Bugs

[issue6594] json C serializer performance tied to structure depth on some systems

 

 



report at bugs

Nov 19, 2009, 6:23 PM

Post #1 of 20 (477 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

Hi,
I just found this bug and would like to add my experience with the
performance of large JSON docs. I have a few JSON docs of about 180MB
each, which I read from data-services. I use python2.6 on a 64-bit Linux
node w/ 16GB of RAM and an 8-core CPU (Intel Xeon, 2.33GHz each). I used
both the json and cjson modules to parse my documents. My observation is
that the amount of RAM used to parse such docs is about 2GB, which is way
too much. The total time spent is about 30 seconds (using cjson). The
content of my docs is very mixed: lists, strings, other dicts. I can
provide them if required, but it's 200MB :)

For comparison, I got the same data in XML and using
cElementTree.iterparse I stay w/ 300MB RAM usage per doc, which is
really reasonable to me.
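
For readers unfamiliar with the streaming approach mentioned above, here is a minimal sketch of iterparse usage. It uses xml.etree.ElementTree (the modern home of cElementTree); the tag name "record" and the sample document are made up for illustration and are not taken from the actual data.

```python
import io
import xml.etree.ElementTree as ET

def count_records(stream, tag="record"):
    """Stream-parse XML and count <tag> elements without keeping their content."""
    count = 0
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            count += 1
            # Drop the element's children, text and attributes once handled,
            # so memory use does not grow with the size of the document.
            elem.clear()
    return count

# Hypothetical sample document, standing in for the real data.
sample = io.BytesIO(b"<data><record id='1'/><record id='2'/><record id='3'/></data>")
print(count_records(sample))
```

Because each element is cleared as soon as it has been processed, only a small part of the document is held in memory at a time, which is presumably why the XML numbers are so much lower than for a full json.load.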

I can provide some benchmarks and run such tests if required.

----------
nosy: +vkuznet

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue6594>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Nov 19, 2009, 6:35 PM

Post #2 of 20 (443 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Bob Ippolito <bob [at] redivi> added the comment:

Did you try the trunk of simplejson? It doesn't work quite the same way as
the current json module in Python 2.6+.

Without the data or a tool to produce data that causes the problem, there
isn't much I can do to help.

----------



report at bugs

Dec 2, 2009, 10:47 AM

Post #3 of 20 (388 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

Hi,
I'm sorry for the delay, I was busy. Here is a test data file:
http://www.lns.cornell.edu/~vk/files/mangled.json

Its size is 150MB, 50MB less than the original, because I was forced to
scramble the values.

With the stock json module in Python 2.6.2, the test uses 2GB of RAM:
import json
source = open('mangled.json', 'r')
data = json.load(source)

Using simplejson 2.0.9 from PyPI I saw the same performance; please note
that the _speedups.so C module was compiled.

Using cjson module, I observed 180MB of RAM utilization
source = open('mangled.json', 'r')
data = cjson.encode(source.read())

cjson is about 10 times faster!

I re-factored the code which deals with the XML version of the same data
and was able to process it with cElementTree using only 20MB (!) of RAM.

----------



report at bugs

Dec 2, 2009, 2:19 PM

Post #4 of 20 (387 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

> Using cjson module, I observed 180MB of RAM utilization
> source = open('mangled.json', 'r')
> data = cjson.encode(source.read())
>
> cjson is about 10 times faster!

This is simply wrong. You should be using cjson.decode(), not
cjson.encode().
If you do so, you will see that cjson takes as much memory as
simplejson and is actually a bit slower.
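
To make the difference concrete, here is a small sketch using the standard json module as a stand-in (cjson is not used here, so this is only an analogy; the sample text is made up). Encoding the file contents just wraps the raw text in one big JSON string, which would explain the low memory figure, whereas decoding builds the full tree of dicts and lists.

```python
import json

# Hypothetical small document standing in for the contents of mangled.json.
text = '{"name": "run1", "events": [1, 2, 3]}'

decoded = json.loads(text)   # parses the document into dicts and lists
encoded = json.dumps(text)   # only wraps the raw text in one quoted JSON string

print(type(decoded).__name__)
print(type(encoded).__name__)
```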

Looking at your json file, I would have a couple of suggestions:
- don't quote integers and floats, so that they are decoded as Python
ints and floats rather than strings
- if the same structure is used a large number of times, don't use
objects, use lists instead
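
A rough sketch of what these two suggestions mean, using the standard json module. The field names "run" and "lumi" and the "columns"/"rows" layout are invented for illustration; they are not taken from the actual file.

```python
import json

# Quoted numbers come back as strings, unquoted ones as int/float.
as_strings = json.loads('{"run": "1234", "lumi": "0.5"}')
as_numbers = json.loads('{"run": 1234, "lumi": 0.5}')
print(type(as_strings["run"]).__name__, type(as_numbers["run"]).__name__)

# The same records as a list of objects, and as rows under one shared header.
objects = [{"run": i, "lumi": i * 0.5} for i in range(1000)]
rows = {"columns": ["run", "lumi"],
        "rows": [[i, i * 0.5] for i in range(1000)]}

# The row layout repeats no key names, so the serialized text is shorter
# and the decoder has fewer dicts and key strings to create.
print(len(json.dumps(objects)), len(json.dumps(rows)))
```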

----------



report at bugs

Dec 2, 2009, 2:32 PM

Post #5 of 20 (386 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

That said, it is possible to further improve json by reducing the number
of memory allocations and temporary copies. Here is an experimental
(meaning: not polished) patch which gains 40% in decoding speed in your
example (9 seconds versus 15).

We could also add an option to intern object keys when decoding (which
wins 400MB in your example); or, alternatively, have an internal "memo"
remembering already seen keys and avoiding duplicates.
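
The "memo" idea can be sketched in pure Python with an object_pairs_hook. This is only an illustration of the principle, not the attached patch (which works inside the C decoder); the helper name make_key_memo_hook is made up.

```python
import json

def make_key_memo_hook():
    """Return an object_pairs_hook that reuses one string object per distinct key."""
    memo = {}

    def hook(pairs):
        # setdefault returns the first-seen string object for each equal key,
        # so every decoded dict shares the same key objects.
        return {memo.setdefault(key, key): value for key, value in pairs}

    return hook

hook = make_key_memo_hook()
first = json.loads('{"name": "a", "size": 1}', object_pairs_hook=hook)
second = json.loads('{"name": "b", "size": 2}', object_pairs_hook=hook)

# Both dicts now hold the very same "name" string object as their key.
print(next(iter(first)) is next(iter(second)))
```

When the same handful of keys appears in hundreds of thousands of objects, keeping a single copy of each key string instead of one copy per object is where the memory saving would come from.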

----------
keywords: +patch
Added file: http://bugs.python.org/file15444/json-opts.patch



report at bugs

Dec 2, 2009, 2:53 PM

Post #6 of 20 (385 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

Oops, that explains why I saw such small memory usage with cjson. I
constructed the tests on the fly.

Regarding the data structure: unfortunately it's out of my hands. The
data comes from a data-service, so I can't do much and can only report to
the developers.

I'll try your patch tomorrow. Obviously it's a huge gain, both in memory
footprint and CPU usage.

Thanks.
Valentin.

----------



report at bugs

Dec 4, 2009, 5:00 PM

Post #7 of 20 (368 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

Here is a new patch with an internal memo dict to reuse equal keys, and
some tests.

----------
stage: -> patch review
versions: +Python 3.2
Added file: http://bugs.python.org/file15450/json-opts2.patch



report at bugs

Dec 4, 2009, 5:03 PM

Post #8 of 20 (369 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Changes by Antoine Pitrou <pitrou [at] free>:


Removed file: http://bugs.python.org/file15444/json-opts.patch



report at bugs

Dec 7, 2009, 7:02 AM

Post #9 of 20 (361 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

Antoine,
indeed, both patches improved time and memory footprint. The latest
patch shows only 1.1GB RAM usage and is very fast. What worries me,
though, is that memory is not released back to the system. Is this the
case? I just added time.sleep after json.load and saw that once decoding
is done, the resident size still remains the same.

----------



report at bugs

Dec 7, 2009, 7:25 AM

Post #10 of 20 (360 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

> Antoine,
> indeed, both patches improved time and memory footprint. The latest
> patch shows only 1.1GB RAM usage and is very fast. What worries me,
> though, is that memory is not released back to the system. Is this the
> case? I just added time.sleep after json.load and saw that once decoding
> is done, the resident size still remains the same.

Interesting. Does it release memory without the patch?

----------



report at bugs

Dec 7, 2009, 7:38 AM

Post #11 of 20 (361 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

Nope, none of the three json implementations releases the memory. I used
your patched one, the one shipped with 2.6, and cjson. The one which comes
with 2.6 reaches 2GB, then releases 200MB and stays at 1.8GB during
sleep. cjson reaches the 1.5GB mark and stays there. But all three
release another 100-200MB just before exit (one top cycle before the
process disappears). I used a sleep of 20 seconds, so I'm pretty sure
memory was not released during that time, since I watched the process
with idle CPU.

----------



report at bugs

Dec 7, 2009, 7:43 AM

Post #12 of 20 (360 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

> Nope, none of the three json implementations releases the memory. I used
> your patched one, the one shipped with 2.6, and cjson. The one which comes
> with 2.6 reaches 2GB, then releases 200MB and stays at 1.8GB during
> sleep. cjson reaches the 1.5GB mark and stays there. But all three
> release another 100-200MB just before exit (one top cycle before the
> process disappears). I used a sleep of 20 seconds, so I'm pretty sure
> memory was not released during that time, since I watched the process
> with idle CPU.

Do you destroy the decoded data, though? If you keep it in memory
there's no chance that a lot of memory will be released.

----------



report at bugs

Dec 7, 2009, 7:51 AM

Post #13 of 20 (361 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Valentin Kuznetsov <vkuznet [at] gmail> added the comment:

I made data local, but adding del shows the same behavior.
This is the test:

import json
import time

def test():
    source = open('mangled.json', 'r')
    data = json.load(source)
    source.close()
    del data

test()
time.sleep(20)  # keep the process alive to watch its resident size in top
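
A complementary way to look at this, sketched under the assumption of a much newer Python than the 2.6 used in this thread (tracemalloc exists only in Python 3.4+). It measures memory as seen by Python's allocator rather than the resident size shown by top, so it can tell "objects were not freed" apart from "objects were freed but the pages were not returned to the OS". The sample data and the function name decode_and_discard are made up.

```python
import json
import tracemalloc

def decode_and_discard(text):
    """Decode a JSON document, then drop the result before returning."""
    data = json.loads(text)
    count = len(data)
    del data
    return count

# Hypothetical document standing in for mangled.json.
text = json.dumps([{"id": i, "name": "item%d" % i} for i in range(10000)])

tracemalloc.start()
decode_and_discard(text)
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

# 'peak' includes the decoded structure; 'current' is what is still
# allocated after it has been deleted.
print(current, peak)
```

If current drops far below peak while the resident size stays high, the objects have been freed inside the interpreter; whether the process then shrinks depends on the allocator and the platform.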

----------



report at bugs

Dec 7, 2009, 12:40 PM

Post #14 of 20 (359 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Shawn <swalker [at] opensolaris> added the comment:

The attached patch doubles write times for my particular case when
applied to simplejson trunk using python 2.6.2. Not good.

----------



report at bugs

Dec 7, 2009, 12:43 PM

Post #15 of 20 (360 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

> The attached patch doubles write times for my particular case when
> applied to simplejson trunk using python 2.6.2. Not good.

What do you mean by "write times"? The patch only affects decoding.

----------



report at bugs

Dec 7, 2009, 12:55 PM

Post #16 of 20 (359 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Shawn <swalker [at] opensolaris> added the comment:

You are right, an environment anomaly led me to falsely believe that
this had somehow affected encoding performance.

I had repeated the test many times with and without the patch using
simplejson trunk and wrongly concluded that the patch was to blame.

After correcting the environment, write performance returned to normal.

This patch seems to perform roughly the same for my decode cases, but
uses about 10-20MB less memory. My needs are far smaller than those of
the other poster.

However, this bug is about the serializer (encoder). So perhaps the
decode performance patch should be a separate bug?

----------



report at bugs

Dec 7, 2009, 1:01 PM

Post #17 of 20 (360 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Shawn <swalker [at] opensolaris> added the comment:

I've attached a sample JSON file that is much slower to write out on
some systems as described in the initial comment.

If you were to restructure the contents of this file into more of a tree
structure instead of the flat array structure it uses now, you will
notice that as the depth increases, serializer performance decreases
significantly.
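
A possible way to check the depth effect on a given system, as a sketch only: the helper names nest and time_dumps, the "child" key, and the chosen depths are all invented here and do not reflect the structure of the attached catalog file.

```python
import json
import time

def nest(depth, leaf="x"):
    """Wrap a leaf value in `depth` levels of single-key dicts."""
    value = leaf
    for _ in range(depth):
        value = {"child": value}
    return value

def time_dumps(obj, repeat=200):
    """Return the wall-clock seconds taken by `repeat` calls to json.dumps."""
    start = time.perf_counter()
    for _ in range(repeat):
        json.dumps(obj)
    return time.perf_counter() - start

for depth in (1, 10, 100):
    # Keep the total number of dicts roughly constant across depths,
    # so that only the nesting depth changes between runs.
    doc = [nest(depth) for _ in range(1000 // depth)]
    print(depth, round(time_dumps(doc), 4))
```

If serialization time were tied to depth as described, the timings would grow with depth even though the amount of data stays about the same; on an unaffected system they should stay roughly flat.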

----------
Added file: http://bugs.python.org/file15475/catalog.dependency.C.gz



report at bugs

Dec 7, 2009, 3:09 PM

Post #18 of 20 (359 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

> However, this bug is about the serializer (encoder). So perhaps the
> decode performance patch should be a separate bug?

You're right, I've filed a separate bug for it: issue7451.

----------
stage: patch review -> needs patch



report at bugs

Dec 7, 2009, 3:09 PM

Post #19 of 20 (359 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Changes by Antoine Pitrou <pitrou [at] free>:


Removed file: http://bugs.python.org/file15450/json-opts2.patch



report at bugs

Dec 7, 2009, 3:41 PM

Post #20 of 20 (360 views)
Permalink
[issue6594] json C serializer performance tied to structure depth on some systems [In reply to]

Antoine Pitrou <pitrou [at] free> added the comment:

Your example takes 0.5s to dump here.

----------

