
Mailing List Archive: Python: Dev

Proposal: Run GC less often

 

 



martin at v

Jun 21, 2008, 1:23 PM

Post #1 of 11
Proposal: Run GC less often

Here is my proposal for making the GC run less often.
The objective is to avoid the quadratic-time behavior
if many objects are created and none of them is garbage.

The youngest generation is still collected after 700
allocations, and the middle generation after 10 young
collections; therefore, a full GC is still considered
every 7000 allocations. The two youngest generations
cannot cause quadratic behavior, as the number of objects
in each generation is bounded.

Currently, full GC is invoked every 10 middle collections.
Under my proposal, 10 middle collections must have passed,
PLUS the number of survivor objects from the middle generation
must exceed 10% of the number of objects in the oldest
generation.

As a consequence, garbage collection becomes less frequent
as the number of objects on the heap grows, resulting in
an amortized O(N) overhead for garbage collection (under
the access pattern stated above).

To determine the number of survivor objects, counts are
taken during the collection. Objects later deallocated
through reference counting are still counted as survivors
(and likewise still counted when determining the size of
the oldest generation).
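
In rough Python (illustrative names only; the actual check would live in
C in Modules/gcmodule.c), the proposed trigger amounts to:

```python
def should_collect_oldest(middle_collections, middle_survivors, oldest_size):
    # Sketch of the proposed rule, not the actual gcmodule code:
    # a full collection requires BOTH that at least 10 middle
    # collections have passed since the last full collection, AND
    # that the survivors of the middle generation since then exceed
    # 10% of the oldest generation's size.
    return middle_collections >= 10 and middle_survivors > 0.10 * oldest_size
```

With this rule, the interval between full collections stretches as the
oldest generation grows, which is what yields the amortized-linear
behavior claimed above.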

I made a simulation assuming an initially-empty heap,
and invocations of the collector every M=7000 objects.
The table below is in units of M and shows the number
of objects allocated, and the number of objects inspected
since the start of the simulation, for the old and the
new scheme (the asterisk indicates that a collection
took place; lines with no collection are dropped):

total old_inspected new_inspected
10 10* 10*
20 30* 30*
30 60* 60*
40 100* 100*
50 150* 150*
60 210* 210*
70 280* 280*
80 360* 360*
90 450* 450*
100 550* 450
102 550 552*
110 660* 552
115 660 667*
120 780* 667
129 780 796*
130 910* 796
140 1050* 796
...
940 44650* 7724
942 44650 8666*
950 45600* 8666
960 46560* 8666
970 47530* 8666
980 48510* 8666
990 49500* 8666
1000 50500* 8666
...
9990 4995000* 95887

As you can see, the old and the new scheme start off
equally until 100 M objects have been allocated. The
table then shows how the old scheme grows quadratically,
while the new scheme eventually approaches roughly a
factor of ten between the number of objects allocated
and the number of times that GC inspects an object.

Applications with a small number of objects will see no
change in behavior. Likewise, applications that create
lots of small cycles will continue to see them collected
in the youngest or middle generations.

Please let me know what you think.

Regards,
Martin

P.S. I don't plan to implement this myself, so if you like
the idea, go ahead.
Attachments: comp.py (0.70 KB)


brett at python

Jun 21, 2008, 1:54 PM

Post #2 of 11
Re: Proposal: Run GC less often [In reply to]

On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <martin [at] v> wrote:
> Here is my proposal for making the GC run less often.
> The objective is to avoid the quadratic-time behavior
> if many objects are created and none of them is garbage.
>
[SNIP]
> Applications with a small number of objects will see no
> change in behavior. Likewise, applications that create
> lots of small cycles will continue to see them collected
> in the youngest or middle generations.
>
> Please let me know what you think.
>

Interesting. Seems reasonable to me if the impact really is as minimal
as you say, Martin.

-Brett
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


adlaiff6 at gmail

Jun 21, 2008, 2:33 PM

Post #3 of 11
Re: Proposal: Run GC less often [In reply to]

If you can get me a version of the interpreter with this change made
(I wouldn't know what to change), I can run a very
allocation/deallocation-heavy application I have lying around, and get
you some good benchmarks.

On Sat, Jun 21, 2008 at 1:23 PM, "Martin v. Löwis" <martin [at] v> wrote:
> Here is my proposal for making the GC run less often.
> The objective is to avoid the quadratic-time behavior
> if many objects are created and none of them is garbage.
>
[SNIP]



--
Cheers,
Leif


greg.ewing at canterbury

Jun 21, 2008, 6:16 PM

Post #4 of 11
Re: Proposal: Run GC less often [In reply to]

Martin v. Löwis wrote:

> Under my proposal, 10 middle collections must have passed,
> PLUS the number of survivor objects from the middle generation
> must exceed 10% of the number of objects in the oldest
> generation.

What happens if the program enters a phase where it's not
producing any new cyclic garbage, but is breaking references
among the old objects in such a way that cycles of them
are being left behind? Under this rule, the oldest
generation would never be scanned, so those cycles would
never be collected.

> As a consequence, garbage collection becomes less frequent
> as the number of objects on the heap grows

Wouldn't it be simpler just to base the collection frequency
directly on the total number of objects in the heap?
From what another poster said, this seems to be what
emacs does.

--
Greg


martin at v

Jun 21, 2008, 10:13 PM

Post #5 of 11
Re: Proposal: Run GC less often [In reply to]

> What happens if the program enters a phase where it's not
> producing any new cyclic garbage, but is breaking references
> among the old objects in such a way that cycles of them
> are being left behind? Under this rule, the oldest
> generation would never be scanned, so those cycles would
> never be collected.

Correct. However, this is the case already today.

> Wouldn't it be simpler just to base the collection frequency
> directly on the total number of objects in the heap?

Using what precise formula?

Regards,
Martin


solipsis at pitrou

Jun 22, 2008, 7:32 AM

Post #6 of 11
Re: Proposal: Run GC less often [In reply to]

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:
>
> What happens if the program enters a phase where it's not
> producing any new cyclic garbage, but is breaking references
> among the old objects in such a way that cycles of them
> are being left behind? Under this rule, the oldest
> generation would never be scanned, so those cycles would
> never be collected.

We could introduce a kind of "timing rule" such that there is at least
one full collection, say, every minute. While timing is not relevant
to memory management, it is relevant to the user behind the keyboard.
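
A minimal sketch of such a timing rule (the 60-second interval is just
the example value above; the function name is illustrative):

```python
def full_collection_due(now, last_full_collection, interval=60.0):
    # Force a full collection once `interval` seconds have elapsed
    # since the last one, regardless of allocation counters.
    return now - last_full_collection >= interval
```

The caller would feed it a monotonic clock reading alongside the
existing counter-based checks, so a long-idle heap still gets scanned
eventually.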

In any case, I think MvL's suggestion is worth trying.

Regards

Antoine.




nas at arctrix

Jun 22, 2008, 12:27 PM

Post #7 of 11
Re: Proposal: Run GC less often [In reply to]

Greg Ewing <greg.ewing [at] canterbury> wrote:
> Martin v. Löwis wrote:
>
>> Under my proposal, 10 middle collections must have passed,
>> PLUS the number of survivor objects from the middle generation
>> must exceed 10% of the number of objects in the oldest
>> generation.
>
> What happens if the program enters a phase where it's not
> producing any new cyclic garbage, but is breaking references
> among the old objects in such a way that cycles of them
> are being left behind? Under this rule, the oldest
> generation would never be scanned, so those cycles would
> never be collected.

Another problem is that the program could be slowly leaking and a
full collection will never happen.

>> As a consequence, garbage collection becomes less frequent
>> as the number of objects on the heap grows
>
> Wouldn't it be simpler just to base the collection frequency
> directly on the total number of objects in the heap?
> From what another poster said, this seems to be what
> emacs does.

I like simple. The whole generational collection scheme was dreamed
up by me early in the GC's life. There was not a lot of thought or
benchmarking put into it, since at that time I was more focused on
getting the basic GC working. Some tuning of the collection
frequencies was done later, but the 10-middle-collections scheme
was never deeply investigated, AFAIK.

BTW, I suspect that documentation needs updating since I understand
that the GC is no longer optional (the stdlib and/or the Python
internals create reference cycles themselves).

Neil



martin at v

Jun 22, 2008, 1:36 PM

Post #8 of 11
Re: Proposal: Run GC less often [In reply to]

> Another problem is that the program could be slowly leaking and a
> full collection will never happen.

I don't think that will be possible. If the program slowly leaks,
survivor objects leave the middle generation and count towards
the 10%. As the number of objects in the oldest generation doesn't
change, collection will eventually occur.

However, it may occur much later than it currently does if you have
many objects on the heap and each middle collection has only a few
survivors. One may argue that if the machine had the space to keep
N objects in memory, it can probably also keep 1.1 N objects in memory.

Regards,
Martin


greg.ewing at canterbury

Jun 22, 2008, 6:10 PM

Post #9 of 11
Re: Proposal: Run GC less often [In reply to]

Martin v. Löwis wrote:

>>Wouldn't it be simpler just to base the collection frequency
>>directly on the total number of objects in the heap?
>
> Using what precise formula?

The simplest thing to try would be

middle_collections >= num_objects_in_heap * some_constant
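
As a sketch (the default value of some_constant here is a placeholder,
not a tuned number, and the name is illustrative):

```python
def heap_full_collection_due(middle_collections, num_objects_in_heap,
                             some_constant=0.001):
    # Hypothetical heuristic: the larger the heap, the more middle
    # collections must pass before a full collection is triggered,
    # keeping full-collection work roughly proportional to heap size.
    return middle_collections >= num_objects_in_heap * some_constant
```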

--
Greg


martin at v

Jun 22, 2008, 8:32 PM

Post #10 of 11
Re: Proposal: Run GC less often [In reply to]

>>> Wouldn't it be simpler just to base the collection frequency
>>> directly on the total number of objects in the heap?
>>
>> Using what precise formula?
>
> The simplest thing to try would be
>
> middle_collections >= num_objects_in_heap * some_constant

So what value is some_constant?

Regards,
Martin



tjreedy at udel

Jun 22, 2008, 11:13 PM

Post #11 of 11
Re: Proposal: Run GC less often [In reply to]

Neil Schemenauer wrote:
>
> BTW, I suspect that documentation needs updating since I understand
> that the GC is no longer optional (the stdlib and/or the Python
> internals create reference cycles themselves).

Is it possible, and might it be useful, for those internal
cycle-creating operations to increment a counter that is part of the
GC trigger? Doing millions of 'safe' operations would then leave the
counter alone and have less effect in triggering GC.

tjr

