Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Re: Retrieve an arbitrary element from a setwithoutremoving it

 

 

First page Previous page 1 2 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded


steve at pearwood

Nov 9, 2009, 12:16 PM

Post #26 of 33 (589 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Tue, 10 Nov 2009 03:40:11 am Björn Lindqvist wrote:
> 2009/11/6 Raymond Hettinger <python [at] rcn>:
> > [me]
> >
> >> Why not write a short, fast get_first() function for your utils
> >> directory and be done with it?
> >> That could work with sets, mappings, generators, and other
> >> containers and iterators.
> >> No need to fatten the set/frozenset API for something so trivial
> >> and so rarely needed.
> >
> > Forgot to post the code.  It is short, fast, and easy.  It is
> > explicit about handing the case with an empty input.  And it is
> > specific about which value it returns (always the first iterated
> > value; not an arbitrary one).  There's no guessing about what it
> > does.  It gets the job done.
> >
> > def first(iterable):
> >   'Return the first value from a container or iterable.  If empty,
> > raises LookupError'
> >   for value in iterable:
> >       return value
> >   raise LookupError('no first value; iterable is empty')
> >
> > If desired, it is not hard to change to the last time to return a
> > default value or some other exception.
>
> That function is very nice and genericly lisp-like. Couldn't that one
> be in the builtins? It would both solve the problem and avoid
> fattening the set API.


I'm not sure, but isn't that thread-unsafe?

Because sets aren't directly iterable, `for value in iterable` builds a
set_iterator operator. If another thread modifies the set after the
set_iterator is built, but before the value is looked up, you will get
a mysterious RuntimeError almost impossible to debug.


>>> def first(): # simplified
... for value in iterator:
... return value
...
>>> dis.dis(first)
2 0 SETUP_LOOP 15 (to 18)
3 LOAD_GLOBAL 0 (iterator)
6 GET_ITER
7 FOR_ITER 7 (to 17)
10 STORE_FAST 0 (value)

3 13 LOAD_FAST 0 (value)
16 RETURN_VALUE
>> 17 POP_BLOCK
>> 18 LOAD_CONST 0 (None)
21 RETURN_VALUE


Is it possible for another thread to be called between the GET_ITER and
STORE_FAST?




--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


martin at v

Nov 9, 2009, 1:50 PM

Post #27 of 33 (597 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

> So the rationale is to ensure that only add operations perform a resize
> and so that sequential pop() operations don't incur excessive resizing
> costs.

I agree that the use case of repeated .pop() operations is reasonable,
and (IIUC) that case is also special-cased using a finger/index.

I think for regular removal, the same logic should not apply: if a
series of removals is performed, then further (non-pop) removals
see increasing costs, as do regular lookups. So I think that a removal
should trigger shrinking (with appropriate thresholds) unless it's a
.pop.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


martin at v

Nov 9, 2009, 2:19 PM

Post #28 of 33 (596 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

> I'm not sure, but isn't that thread-unsafe?

You are right; it's thread-unsafe.

I would fix it by catching the RuntimeError, and retrying. Given the
current GIL strategy (including proposed changes to it), it won't happen
two times in a row, so the number of retries would be bounded.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


martin at v

Nov 9, 2009, 2:39 PM

Post #29 of 33 (586 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

> We repeatedly search through the slots sequentially and remove the first
> element we find. The first removal requires checking 1 slot, the second
> removal requires checking 2 slots, the third removal requires checking 3
> slots, and so forth. The hash table will shrink after the n/2-th
> removal, when we have checked 1 + 2 + 3 + ... + n/2 = O(n**2) slots for
> n/2 removals (or amortized O(n) per removal). It's too late for
> shrinking to save us; we've already performed too much work.

Ah, right.

Perhaps people writing the loop the way you proposed deserve to get bad
performance, as they should use .pop in the first place.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


daniel at stutzbachenterprises

Nov 9, 2009, 2:44 PM

Post #30 of 33 (587 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Mon, Nov 9, 2009 at 3:50 PM, "Martin v. Löwis" <martin [at] v>wrote:

> I think for regular removal, the same logic should not apply: if a
> series of removals is performed, then further (non-pop) removals
> see increasing costs, as do regular lookups. So I think that a removal
> should trigger shrinking (with appropriate thresholds) unless it's a
> .pop.
>

Minor technicality, but it's the next() operation of the iterator that has
increasing cost as the set/dict becomes sparse. Removals are always O(1).
The removal uses the hash to find the item quickly. The iterator has to
scan through the table for non-empty entries.

(the above assumes a good hash function with few collisions, of course)

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>


ncoghlan at gmail

Nov 10, 2009, 2:51 AM

Post #31 of 33 (574 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

Martin v. Löwis wrote:
>> I'm not sure, but isn't that thread-unsafe?
>
> You are right; it's thread-unsafe.
>
> I would fix it by catching the RuntimeError, and retrying. Given the
> current GIL strategy (including proposed changes to it), it won't happen
> two times in a row, so the number of retries would be bounded.

It's also one of the major reasons for not sharing mutable containers
between threads if you can avoid it (and serialising access to them if
you can't)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
---------------------------------------------------------------
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Nov 10, 2009, 5:55 AM

Post #32 of 33 (570 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

Nick Coghlan <ncoghlan <at> gmail.com> writes:
>
> It's also one of the major reasons for not sharing mutable containers
> between threads if you can avoid it (and serialising access to them if
> you can't)

Not necessarily, for example it is common to rely on the fact that list.append()
is atomic.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Nov 10, 2009, 6:27 AM

Post #33 of 33 (562 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

Antoine Pitrou wrote:
> Nick Coghlan <ncoghlan <at> gmail.com> writes:
>> It's also one of the major reasons for not sharing mutable containers
>> between threads if you can avoid it (and serialising access to them if
>> you can't)
>
> Not necessarily, for example it is common to rely on the fact that list.append()
> is atomic.

s/"mutable
containers"/"mutable-containers-that-object-loudly-to-their-size-changing-during-iteration
-like-sets-and-dicts" :)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
---------------------------------------------------------------
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

First page Previous page 1 2 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.