
Mailing List Archive: Python: Dev

Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903)

 

 



solipsis at pitrou

Apr 23, 2012, 1:22 PM

Post #1 of 12 (662 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903)

On Mon, 23 Apr 2012 17:24:57 +0200
benjamin.peterson <python-checkins [at] python> wrote:
> http://hg.python.org/cpython/rev/6e5855854a2e
> changeset: 76485:6e5855854a2e
> user: Benjamin Peterson <benjamin [at] python>
> date: Mon Apr 23 11:24:50 2012 -0400
> summary:
> Implement PEP 412: Key-sharing dictionaries (closes #13903)

I hope someone can measure the results of this change on real-world
code. Benchmark results with http://hg.python.org/benchmarks/ are not
overly promising.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


rdmurray at bitdance

Apr 23, 2012, 2:55 PM

Post #2 of 12 (662 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

On Mon, 23 Apr 2012 22:22:18 +0200, Antoine Pitrou <solipsis [at] pitrou> wrote:
> On Mon, 23 Apr 2012 17:24:57 +0200
> benjamin.peterson <python-checkins [at] python> wrote:
> > http://hg.python.org/cpython/rev/6e5855854a2e
> > changeset: 76485:6e5855854a2e
> > user: Benjamin Peterson <benjamin [at] python>
> > date: Mon Apr 23 11:24:50 2012 -0400
> > summary:
> > Implement PEP 412: Key-sharing dictionaries (closes #13903)
>
> I hope someone can measure the results of this change on real-world
> code. Benchmark results with http://hg.python.org/benchmarks/ are not
> overly promising.

I'm pretty sure that anything heavily using sqlalchemy will benefit,
so that would be a good place to look for a real-world benchmark.

--David


kristjan at ccpgames

Apr 24, 2012, 3:24 AM

Post #3 of 12 (648 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

Probably any benchmark involving a large amount of object instances with non-trivial dictionaries.
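A minimal sketch of such a micro-benchmark, assuming a made-up `Record` class with three attributes: it compares the reported size of many instance dictionaries against equivalent plain dictionaries. The numbers depend on the CPython version and pointer width, so no expected output is given.

```python
import sys


class Record(object):
    """Hypothetical class; every instance gets the same three attribute names."""

    def __init__(self, a, b, c):
        self.a = a
        self.b = b
        self.c = c


COUNT = 1000
instances = [Record(i, i, i) for i in range(COUNT)]

# Size reported for each instance's attribute dictionary.
instance_dict_bytes = sum(sys.getsizeof(vars(r)) for r in instances)

# Size reported for stand-alone dictionaries holding the same keys.
plain_dict_bytes = sum(
    sys.getsizeof({"a": i, "b": i, "c": i}) for i in range(COUNT)
)

print("instance dicts: %d bytes, plain dicts: %d bytes"
      % (instance_dict_bytes, plain_dict_bytes))
```

Note that sys.getsizeof reports only what each dictionary accounts for itself, so this is a rough indicator rather than a measurement of process memory.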

Benchmarks should measure memory usage too, of course. Sadly that is not possible in standard
cPython. Our 2.7 branch has extensive patching to allow custom memory allocators to be used
(it even eliminates the explicit "malloc" calls used here and there in the code) and exposes some
functions, such as sys.getpymalloced(), useful for memory benchmarking.

Perhaps I should write about this on my blog. Updating the memory allocation macro layer in
cPython for embedding is something I'd be inclined to contribute, but it will involve a large amount
of bikeshedding, I'm sure :)

Btw, this is of great interest to me at the moment: our Shanghai engineers are screaming at the
memory waste incurred by dictionaries. A 10-item dictionary consumes half a kilobyte on 32 bits; did you
know this?

K

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com [at] python
> [mailto:python-dev-bounces+kristjan=ccpgames.com [at] python] On
> Behalf Of R. David Murray
> Sent: 23. apríl 2012 21:56
> To: Antoine Pitrou
> Cc: python-dev [at] python
> Subject: Re: [Python-Dev] cpython: Implement PEP 412: Key-sharing
> dictionaries (closes #13903)
>
> On Mon, 23 Apr 2012 22:22:18 +0200, Antoine Pitrou <solipsis [at] pitrou>
> wrote:
> > On Mon, 23 Apr 2012 17:24:57 +0200
> > benjamin.peterson <python-checkins [at] python> wrote:
> > > http://hg.python.org/cpython/rev/6e5855854a2e
> > > changeset: 76485:6e5855854a2e
> > > user: Benjamin Peterson <benjamin [at] python>
> > > date: Mon Apr 23 11:24:50 2012 -0400
> > > summary:
> > > Implement PEP 412: Key-sharing dictionaries (closes #13903)
> >
> > I hope someone can measure the results of this change on real-world
> > code. Benchmark results with http://hg.python.org/benchmarks/ are not
> > overly promising.
>
> I'm pretty sure that anything heavily using sqlalchemy will benefit, so that
> would be a good place to look for a real-world benchmark.
>
> --David




solipsis at pitrou

Apr 24, 2012, 3:37 AM

Post #4 of 12 (644 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

On Tue, 24 Apr 2012 10:24:16 +0000
Kristján Valur Jónsson <kristjan [at] ccpgames> wrote:
>
> Btw, this is of great interest to me at the moment, our Shanghai engineers are screaming at the
> memory waste incurred by dictionaries. A 10 item dictionary consumes 1/2k on 32 bits, did you
> know this?

The sparseness of hash tables is a well-known time/space tradeoff.
See e.g. http://bugs.python.org/issue10408

Regards

Antoine.


ncoghlan at gmail

Apr 24, 2012, 4:41 AM

Post #5 of 12 (645 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

On Tue, Apr 24, 2012 at 8:24 PM, Kristján Valur Jónsson
<kristjan [at] ccpgames> wrote:
> Perhaps I should write about this on my blog.  Updating the memory allocation macro layer in
> cPython for embedding is something I'd be inclined to contribute, but it will involve a large amount
> of bikeshedding, I'm sure :)

Trawl the tracker before you do - I'm pretty sure there's a patch
(from the Nokia S60 port, IIRC) that adds a couple of macro
definitions so that platform ports and embedding applications can
intercept malloc() and free() calls.

It would be way out of date by now, but I seem to recall thinking it
looked reasonable at a quick glance.

Cheers,
Nick.

--
Nick Coghlan   |   ncoghlan [at] gmail   |   Brisbane, Australia


martin at v

Apr 24, 2012, 10:43 AM

Post #6 of 12 (633 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

> Benchmarks should measure memory usage too, of course. Sadly that
> is not possible in standard cPython.

It's actually very easy in standard CPython, using sys.getsizeof.

> Btw, this is of great interest to me at the moment, our Shanghai
> engineers are screaming at the
> memory waste incurred by dictionaries. A 10 item dictionary
> consumes 1/2k on 32 bits, did you know this?

I did.

In Python 3.3, this now goes down to 248 bytes (32 bits).
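As a rough illustration of the sys.getsizeof approach, the following sketch prints the reported size of an empty and a 10-item dictionary on whatever interpreter runs it. The exact figures vary with the CPython version and pointer width, so none are claimed here.

```python
import struct
import sys

empty = {}
ten_items = dict((i, None) for i in range(10))

# sys.getsizeof counts the dictionary's own table, not the keys and values.
empty_size = sys.getsizeof(empty)
ten_size = sys.getsizeof(ten_items)

print("pointer width: %d bits" % (8 * struct.calcsize("P")))
print("empty dict: %d bytes, 10-item dict: %d bytes" % (empty_size, ten_size))
```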

Regards,
Martin




kristjan at ccpgames

Apr 25, 2012, 2:11 AM

Post #7 of 12 (631 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com [at] python
> [mailto:python-dev-bounces+kristjan=ccpgames.com [at] python] On
> Behalf Of martin [at] v
> Sent: 24. apríl 2012 17:44
> To: python-dev [at] python
> Subject: Re: [Python-Dev] cpython: Implement PEP 412: Key-sharing
> dictionaries (closes #13903)
>
> > Benchmarks should measure memory usage too, of course. Sadly that is
> > not possible in standard cPython.
>
> It's actually very easy in standard CPython, using sys.getsizeof.
>
Yes, you can query each python object about how big it thinks it is.
What I'm speaking of is more like:
start_allocs, start_mem = allocator.get_current()
allocator.reset_limits()
run_complicated_tests()

end_allocs, end_mem = allocator.get_current()

print("delta blocks: %d, delta mem: %d" % (end_allocs - start_allocs, end_mem - start_mem))
print("peak blocks: %d, peak mem: %d" % allocator.peak())
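For comparison, here is a sketch of the same start/run/end pattern using the tracemalloc module, which was added to the standard library in CPython 3.4, after this thread. It is a swapped-in technique, not the allocator API described above: it reports bytes (current and peak) rather than block counts, and `run_complicated_tests` is a made-up stand-in workload.

```python
import tracemalloc


def run_complicated_tests():
    """Hypothetical workload: many small instances with attribute dicts."""

    class Point(object):
        def __init__(self, x, y):
            self.x = x
            self.y = y

    return [Point(i, i + 1) for i in range(10000)]


def measure(workload):
    """Run workload under tracemalloc; return (result, delta bytes, peak bytes)."""
    tracemalloc.start()
    start_mem, _ = tracemalloc.get_traced_memory()
    result = workload()  # keep the result alive so it counts as "current"
    end_mem, peak_mem = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, end_mem - start_mem, peak_mem


result, delta_mem, peak_mem = measure(run_complicated_tests)
print("delta mem: %d, peak mem: %d" % (delta_mem, peak_mem))
```

Unlike counters kept inside the allocator, tracing adds overhead while it is enabled, so this suits test runs better than a production process.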


> > Btw, this is of great interest to me at the moment, our Shanghai
> > engineers are screaming at the memory waste incurred by dictionaries.
> > A 10 item dictionary consumes 1/2k on 32 bits, did you know this?
>
> I did.
>
> In Python 3.3, this now goes down to 248 bytes (32 bits).
>
I'm going to experiment with tunable parameters in 2.7 to trade performance for memory. In some applications, memory trumps performance.

K



mark at hotpy

Apr 25, 2012, 2:45 AM

Post #8 of 12 (635 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

Kristján Valur Jónsson wrote:
>
>> -----Original Message-----
>> From: python-dev-bounces+kristjan=ccpgames.com [at] python
>> [mailto:python-dev-bounces+kristjan=ccpgames.com [at] python] On
>> Behalf Of martin [at] v
>> Sent: 24. apríl 2012 17:44
>> To: python-dev [at] python
>> Subject: Re: [Python-Dev] cpython: Implement PEP 412: Key-sharing
>> dictionaries (closes #13903)
>>
>>> Benchmarks should measure memory usage too, of course. Sadly that is
>>> not possible in standard cPython.
>> It's actually very easy in standard CPython, using sys.getsizeof.
>>
> Yes, you can query each python object about how big it thinks it is.
> What I'm speaking of is more like:
> start_allocs, start_mem = allocator.get_current()
> allocator.reset_limits()
> run_complicated_tests()
>
> end_allocs, end_mem = allocator.get_current()
>
> print("delta blocks: %d, delta mem: %d" % (end_allocs - start_allocs, end_mem - start_mem))
> print("peak blocks: %d, peak mem: %d" % allocator.peak())

Take a look at the benchmark suite at
http://hg.python.org/benchmarks/
The test runner has an -m option that profiles memory usage;
you could take a look at how that is implemented.

Cheers,
Mark.


kristjan at ccpgames

Apr 25, 2012, 3:32 AM

Post #9 of 12 (632 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

> -----Original Message-----
> Take a look at the benchmark suite at
> http://hg.python.org/benchmarks/
> The test runner has an -m option that profiles memory usage, you could take
> a look at how that is implemented
>

Yes, out of process monitoring of memory as reported by the OS. We do gather those counters as well on clients and servers.
But they don't give you the granularity you want when checking for memory leaks and memory usage by certain algorithms.
In the same way that the unittests have reference leak reports, they could just have memory usage reports, if the underlying allocator supported that.

FYI the current state of affairs of the cPython 2.7 branch we use is as follows:
1) We allow the API user to specify the base allocator python uses, both for regular allocs and allocating blocks for the obmalloc one, using:

/* Support for custom allocators */
typedef void *(*PyCCP_Malloc_t)(size_t size, void *arg, const char *file, int line, const char *msg);
typedef void *(*PyCCP_Realloc_t)(void *ptr, size_t size, void *arg, const char *file, int line, const char *msg);
typedef void (*PyCCP_Free_t)(void *ptr, void *arg, const char *file, int line, const char *msg);
typedef size_t (*PyCCP_Msize_t)(void *ptr, void *arg);
typedef struct PyCCP_CustomAllocator_t
{
    PyCCP_Malloc_t pMalloc;
    PyCCP_Realloc_t pRealloc;
    PyCCP_Free_t pFree;
    PyCCP_Msize_t pMsize; /* can be NULL, or return -1 if no size info is avail. */
    void *arg; /* opaque argument for the functions */
} PyCCP_CustomAllocator_t;

/* To set an allocator, use 0 for the regular allocator, 1 for the block allocator.
 * Pass a null pointer to reset to the internal default.
 */
PyAPI_FUNC(void) PyCCP_SetAllocator(int which, const PyCCP_CustomAllocator_t *); /* for BLUE to set the current context */

/* internal data member */
extern PyCCP_CustomAllocator_t _PyCCP_CustomAllocator[];

2) Using #ifdefs, the macros delegate all final allocations through these allocators. This includes all the "naked" malloc calls scattered about; they are patched up using #defines.

3) Additionally, there is an internal layer of management before delegating to the external allocators. This internal manager provides statistics, exposed through the "sys" module.

The layering is something like this, all more or less definable by preprocessor macros. (Raw malloc() is turned into something else via preprocessor magic and a special "patch_malloc.h" file added to the modules that use raw malloc().)

     PyMem_Malloc()             PyObject_Malloc()
            |                          |
            v                          v
     Mem bookkeeping            obj bookkeeping
            |                          |
            |                          v
malloc()    |                    obmallocator
    |       |                          |
    v       v                          v
  PyMem_MALLOC_RAW()           PyObject_MALLOC_RAW
            |                          |
            v                          v
malloc() or vectored allocator specified through API function
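The bookkeeping layer in the diagram can be modelled in a few lines of Python. This is a toy sketch only, not the code from the 2.7 branch: the class name `BookkeepingLayer` is made up, the method names are borrowed from the `allocator` pseudocode earlier in the thread, and `bytearray` stands in for the raw allocator.

```python
class BookkeepingLayer(object):
    """Toy statistics layer that sits in front of a raw allocator."""

    def __init__(self, raw_malloc, raw_free):
        self._raw_malloc = raw_malloc
        self._raw_free = raw_free
        self._live = {}  # id(block) -> (block, size) for blocks not yet freed
        self._blocks = self._bytes = 0
        self._peak_blocks = self._peak_bytes = 0

    def malloc(self, size):
        block = self._raw_malloc(size)
        self._live[id(block)] = (block, size)
        self._blocks += 1
        self._bytes += size
        # Peaks are updated on every allocation, so queries stay cheap.
        self._peak_blocks = max(self._peak_blocks, self._blocks)
        self._peak_bytes = max(self._peak_bytes, self._bytes)
        return block

    def free(self, block):
        _, size = self._live.pop(id(block))
        self._blocks -= 1
        self._bytes -= size
        self._raw_free(block)

    def get_current(self):
        return self._blocks, self._bytes

    def peak(self):
        return self._peak_blocks, self._peak_bytes

    def reset_limits(self):
        # Restart peak tracking from the current usage.
        self._peak_blocks, self._peak_bytes = self._blocks, self._bytes


# bytearray(size) plays the raw allocator; freeing is a no-op in this model.
layer = BookkeepingLayer(bytearray, lambda block: None)
first = layer.malloc(100)
second = layer.malloc(50)
layer.free(first)
print("current blocks: %d, current mem: %d" % layer.get_current())
print("peak blocks: %d, peak mem: %d" % layer.peak())
```

Because the counters are maintained on each malloc and free, reading them is a constant-time operation, which is what makes frequent polling affordable.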


Cheers,

K


martin at v

Apr 25, 2012, 11:57 AM

Post #10 of 12 (643 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

>>> Benchmarks should measure memory usage too, of course. Sadly that is
>>> not possible in standard cPython.
>>
>> It's actually very easy in standard CPython, using sys.getsizeof.
>>
> Yes, you can query each python object about how big it thinks it is.
> What I'm speaking of is more like:
> start_allocs, start_mem = allocator.get_current()
> allocator.reset_limits()
> run_complicated_tests()
>
> end_allocs, end_mem = allocator.get_current()

This is easy in a debug build, using sys.getobjects(). In a release
build, you can use pympler:

import pympler.muppy

start = pympler.muppy.get_size(pympler.muppy.get_objects())
run_complicated_tests()
end = pympler.muppy.get_size(pympler.muppy.get_objects())
print("delta mem: %d" % (end - start))

Regards,
Martin


kristjan at ccpgames

Apr 26, 2012, 4:41 AM

Post #11 of 12 (631 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

Thanks.
Meanwhile, I blogged about tuning the dict implementation.
Preliminary testing seems to indicate that tuning it to conserve memory saves us 2 MB of wasted slots on the login screen. No small thing on a PS3 system.
http://blog.ccpgames.com/kristjan/2012/04/25/optimizing-the-dict/
I wonder if we shouldn't make those factors into #defines as I did in my 2.7 modifications, and even provide a "memory saving" predefine for embedders.
(Believe it or not, sometimes python performance is not an issue at all, but memory usage is.)

K

> -----Original Message-----
> From: Nick Coghlan [mailto:ncoghlan [at] gmail]
> Sent: 24. apríl 2012 11:42
> To: Kristján Valur Jónsson
> Cc: R. David Murray; Antoine Pitrou; python-dev [at] python
> Subject: Re: [Python-Dev] cpython: Implement PEP 412: Key-sharing
> dictionaries (closes #13903)
>
> On Tue, Apr 24, 2012 at 8:24 PM, Kristján Valur Jónsson
> <kristjan [at] ccpgames> wrote:
> > Perhaps I should write about this on my blog.  Updating the memory
> > allocation macro layer in cPython for embedding is something I'd be
> > inclined to contribute, but it will involve a large amount of
> > bikeshedding, I'm sure :)
>
> Trawl the tracker before you do - I'm pretty sure there's a patch (from the
> Nokia S60 port, IIRC) that adds a couple of macro definitions so that platform
> ports and embedding applications can intercept malloc() and free() calls.
>
> It would be way out of date by now, but I seem to recall thinking it looked
> reasonable at a quick glance.
>
> Cheers,
> Nick.
>
> --
> Nick Coghlan   |   ncoghlan [at] gmail   |   Brisbane, Australia




kristjan at ccpgames

Apr 26, 2012, 6:26 AM

Post #12 of 12 (626 views)
Re: cpython: Implement PEP 412: Key-sharing dictionaries (closes #13903) [In reply to]

> -----Original Message-----
> From: "Martin v. Löwis" [mailto:martin [at] v]
>
> This is easy in a debug build, using sys.getobjects(). In a release build, you can
> use pympler:
>
> start = pympler.muppy.get_size(pympler.muppy.get_objects())
> run_complicated_tests()
> end = pympler.muppy.get_size(pympler.muppy.get_objects())
> print "delta mem: %d" % (end-start)

Thanks for pointing out pympler to me. Sounds like fun, I'll try it out.
I should point out that gc.get_objects() also works, if you don't care about stuff like ints and floats.
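A standard-library-only sketch of that gc.get_objects() variant, assuming a throwaway workload of small dictionaries: it sums sys.getsizeof over the objects the garbage collector tracks, so untracked objects such as ints and floats are left out, and the snapshot itself costs time proportional to the number of live objects.

```python
import gc
import sys


def tracked_size():
    """Sum the reported sizes of all objects tracked by the garbage collector."""
    return sum(sys.getsizeof(obj) for obj in gc.get_objects())


before = tracked_size()
# Hypothetical workload: a list of small dictionaries that stays alive.
data = [{"key": i} for i in range(10000)]
after = tracked_size()

print("delta mem: %d" % (after - before))
```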

Another reason why I like the runtime stats we have built in, however, is that they provide no query overhead.
You can query the current resource usage as often as you like and this is important in a running app. We log python memory usage every second or so.

Cheers,

K
