Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Retrieve an arbitrary element from a set without removing it

 

 

First page Previous page 1 2 3 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded


w.richert at gmx

Oct 23, 2009, 2:32 AM

Post #1 of 61 (1392 views)
Permalink
Retrieve an arbitrary element from a set without removing it

Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary
element from a set without removing it.

Three possibilities came to mind:

1.
x = some_set.pop()
some_set.add(x)

2.
for x in some_set:
break

3.
x = iter(some_set).next()


Of course, the third should be the fastest. It nevertheless goes through all
the iterator creation stuff, which costs some time. I wondered, why the builtin
set does not provide a more direct and efficient way for retrieving some element
without removing it. Is there any reason for this?

I imagine something like

x = some_set.get()

or

x = some_set.pop(False)

and am thinking about providing a patch against setobject.c (preferring the
.get() solution being a stripped down pop()).

Before, I would like to know whether I have overlooked something or whether
this can be done in an already existing way.

Thanks,
wr

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 23, 2009, 3:54 AM

Post #2 of 61 (1348 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Fri, 23 Oct 2009 08:32:45 pm Willi Richert wrote:
> Hi,
>
> recently I wrote an algorithm, in which very often I had to get an
> arbitrary element from a set without removing it.
>
> Three possibilities came to mind:
...
> Of course, the third should be the fastest.

If you need one or more items chosen randomly without replacement, use a
list:

L = list(some_set)
x = random.choice(L)

If you don't need a randomly chosen item, merely an arbitrary item, you
can still use a list but avoid the call to random.choice:

x = list(some_set)[0]

> It nevertheless goes
> through all the iterator creation stuff, which costs some time. I
> wondered, why the builtin set does not provide a more direct and
> efficient way for retrieving some element without removing it. Is
> there any reason for this?

I can see it being useful to iterate over the entire set, without
removing anything. I can see it being useful to pop an arbitrary item,
and remove it. But I can't see the use for retrieving an arbitrary
item, leaving it in the set. What do you use this for?


> I imagine something like
>
> x = some_set.get()
>
> or
>
> x = some_set.pop(False)
>
> and am thinking about providing a patch against setobject.c
> (preferring the .get() solution being a stripped down pop()).

-1 on .pop(flag)

+0 on .get()



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


algorias at gmail

Oct 23, 2009, 10:04 AM

Post #3 of 61 (1344 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

2009/10/23 Willi Richert <w.richert [at] gmx>:
> Hi,
>
> recently I wrote an algorithm, in which very often I had to get an arbitrary
> element from a set without removing it.
>
> Three possibilities came to mind:
>
> 1.
> x = some_set.pop()
> some_set.add(x)
>
> 2.
> for x in some_set:
>        break
>
> 3.
> x = iter(some_set).next()
>
>
> Of course, the third should be the fastest. It nevertheless goes through all
> the iterator creation stuff, which costs some time. I wondered, why the builtin
> set does not provide a more direct and efficient way for retrieving some element
> without removing it. Is there any reason for this?
>
> I imagine something like
>
> x = some_set.get()
>


I see this as being useful for frozensets as well, where you can't get
an arbitrary element easily due to the obvious lack of .pop(). I ran
into this recently, when I had a frozenset that I knew had 1 element
(it was the difference between 2 other sets), but couldn't get to that
element easily (get the pun?)
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


john.arbash.meinel at gmail

Oct 23, 2009, 10:25 AM

Post #4 of 61 (1341 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Vitor Bosshard wrote:
> 2009/10/23 Willi Richert <w.richert [at] gmx>:
>> Hi,
>>
>> recently I wrote an algorithm, in which very often I had to get an arbitrary
>> element from a set without removing it.
>>
>> Three possibilities came to mind:
>>
>> 1.
>> x = some_set.pop()
>> some_set.add(x)
>>
>> 2.
>> for x in some_set:
>> break
>>
>> 3.
>> x = iter(some_set).next()
>>
>>
>> Of course, the third should be the fastest. It nevertheless goes through all
>> the iterator creation stuff, which costs some time. I wondered, why the builtin
>> set does not provide a more direct and efficient way for retrieving some element
>> without removing it. Is there any reason for this?
>>
>> I imagine something like
>>
>> x = some_set.get()
>>
>
>
> I see this as being useful for frozensets as well, where you can't get
> an arbitrary element easily due to the obvious lack of .pop(). I ran
> into this recently, when I had a frozenset that I knew had 1 element
> (it was the difference between 2 other sets), but couldn't get to that
> element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next()
was a function call overhead, while:
for x in some_set:
break

Was evaluated inline. It probably still has to call PyObject_GetIter,
however it doesn't have to create a stack frame for it.

This is what "timeit" tells me. All runs are of the form:
python -m timeit -s "s = set([10])" ...

0.101us "for x in s: break; x"
0.130us "for x in s: pass; x"
0.234us -s "n = next; i = iter" "x = n(i(s)); x"
0.248us "x = next(iter(s)); x"
0.341us "x = iter(s).next(); x"

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
faster than (iter(s).next()).
I was pretty surprised that it was 30% faster than "for x in s: pass". I
assume it has something to do with a potential "else:" statement?

Note that all of these are significantly < 1us. So this only matters if
it is something you are doing often.

I don't know your specific timings, but I would guess that:
for x in s: break

Is actually going to be faster than your
s.get()

Primarily because s.get() requires an attribute lookup. I would think
your version might be faster for:
stat2 = "g = s.get; for i in xrange(100): g()"

However, that is still a function call, which may be treated differently
by the interpreter than the for:break loop. I certainly suggest you try
it and compare.

John
=:->
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


p.f.moore at gmail

Oct 23, 2009, 11:13 AM

Post #5 of 61 (1341 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

2009/10/23 John Arbash Meinel <john.arbash.meinel [at] gmail>:
> I was pretty surprised that it was 30% faster than "for x in s: pass". I
> assume it has something to do with a potential "else:" statement?

I'd imagine it's actually because it has to call next() a second time
and deal with the StopIteration exception - the loop has to end
normally, whereas the break form exits prematurely.

Paul.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Oct 23, 2009, 12:01 PM

Post #6 of 61 (1341 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Willi Richert wrote:
> recently I wrote an algorithm, in which very often I had to get an arbitrary
> element from a set without removing it.

See this discussion:

http://comments.gmane.org/gmane.comp.python.ideas/5606

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


tjreedy at udel

Oct 23, 2009, 12:04 PM

Post #7 of 61 (1341 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

John Arbash Meinel wrote:
> So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> faster than (iter(s).next()).
> I was pretty surprised that it was 30% faster than "for x in s: pass". I
> assume it has something to do with a potential "else:" statement?

for x in s: pass

iterates through *all* the elements in s and leaves x bound to the
arbritrary *last* one instead of the arbitrary *first* one. For a large
set, this would be a lot slower, not just a little.

fwiw, I think the use case for this is sufficiently rare that it does
not need a separate method just for this purpose.

tjr

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


john.arbash.meinel at gmail

Oct 23, 2009, 1:11 PM

Post #8 of 61 (1342 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Terry Reedy wrote:
> John Arbash Meinel wrote:
>> So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
>> faster than (iter(s).next()).
>> I was pretty surprised that it was 30% faster than "for x in s: pass". I
>> assume it has something to do with a potential "else:" statement?
>
> for x in s: pass
>
> iterates through *all* the elements in s and leaves x bound to the
> arbritrary *last* one instead of the arbitrary *first* one. For a large
> set, this would be a lot slower, not just a little.
>
> fwiw, I think the use case for this is sufficiently rare that it does
> not need a separate method just for this purpose.
>
> tjr

The point of my test was that it was a set with a *single* item, and
'break' was 30% faster than 'pass'. Which was surprising. Certainly the
difference is huge if there are 10k items in the set.

John
=:->

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


w.richert at gmx

Oct 23, 2009, 1:53 PM

Post #9 of 61 (1341 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Hi,

surprised about the performance of for/break provided by Vitor, I did some
more testing. It revealed that indeed we can forget the get() (which was
implemented as a stripped down pop()):

from timeit import *
stats = [."for i in xrange(1000): iter(s).next() ",
"for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ",
"for i in xrange(1000): s.add(s.pop()) ",
"for i in xrange(1000): s.get() "]

for stat in stats:
t = Timer(stat, setup="s=set(range(100))")
try:
print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
except:
t.print_exc()



$ ./test_get.py
Time for for i in xrange(1000): iter(s).next() : 0.433080
Time for for i in xrange(1000):
for x in s:
break : 0.148695
Time for for i in xrange(1000): s.add(s.pop()) : 0.317418
Time for for i in xrange(1000): s.get() : 0.146673


In some tests, for/break was even slightly faster then get().

As always, intuition regarding performance bottlenecks is flawed ;-)

Anyway, thanks for all the helpful comments, especially to Stefan for the
http://comments.gmane.org/gmane.comp.python.ideas/5606 link.

Regards,
wr

Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:
> Vitor Bosshard wrote:
> > 2009/10/23 Willi Richert <w.richert [at] gmx>:
> >> Hi,
> >>
> >> recently I wrote an algorithm, in which very often I had to get an
> >> arbitrary element from a set without removing it.
> >>
> >> Three possibilities came to mind:
> >>
> >> 1.
> >> x = some_set.pop()
> >> some_set.add(x)
> >>
> >> 2.
> >> for x in some_set:
> >> break
> >>
> >> 3.
> >> x = iter(some_set).next()
> >>
> >>
> >> Of course, the third should be the fastest. It nevertheless goes through
> >> all the iterator creation stuff, which costs some time. I wondered, why
> >> the builtin set does not provide a more direct and efficient way for
> >> retrieving some element without removing it. Is there any reason for
> >> this?
> >>
> >> I imagine something like
> >>
> >> x = some_set.get()
> >
> > I see this as being useful for frozensets as well, where you can't get
> > an arbitrary element easily due to the obvious lack of .pop(). I ran
> > into this recently, when I had a frozenset that I knew had 1 element
> > (it was the difference between 2 other sets), but couldn't get to that
> > element easily (get the pun?)
>
> So in my testing (2) was actually the fastest. I assumed because .next()
> was a function call overhead, while:
> for x in some_set:
> break
>
> Was evaluated inline. It probably still has to call PyObject_GetIter,
> however it doesn't have to create a stack frame for it.
>
> This is what "timeit" tells me. All runs are of the form:
> python -m timeit -s "s = set([10])" ...
>
> 0.101us "for x in s: break; x"
> 0.130us "for x in s: pass; x"
> 0.234us -s "n = next; i = iter" "x = n(i(s)); x"
> 0.248us "x = next(iter(s)); x"
> 0.341us "x = iter(s).next(); x"
>
> So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> faster than (iter(s).next()).
> I was pretty surprised that it was 30% faster than "for x in s: pass". I
> assume it has something to do with a potential "else:" statement?
>
> Note that all of these are significantly < 1us. So this only matters if
> it is something you are doing often.
>
> I don't know your specific timings, but I would guess that:
> for x in s: break
>
> Is actually going to be faster than your
> s.get()
>
> Primarily because s.get() requires an attribute lookup. I would think
> your version might be faster for:
> stat2 = "g = s.get; for i in xrange(100): g()"
>
> However, that is still a function call, which may be treated differently
> by the interpreter than the for:break loop. I certainly suggest you try
> it and compare.
>
> John
> =:->
>


steve at pearwood

Oct 23, 2009, 3:44 PM

Post #10 of 61 (1340 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 07:11:57 am John Arbash Meinel wrote:

> The point of my test was that it was a set with a *single* item, and
> 'break' was 30% faster than 'pass'. Which was surprising.

Not really. See below.

> Certainly
> the difference is huge if there are 10k items in the set.


Earlier you suggested that the difference may have been because of a
potential "else" statement. That won't be it: if there's no "else" in
the code, it's not compiled in:

>>> import dis
>>> dis.dis(compile("for x in s: break", "", "exec"))
1 0 SETUP_LOOP 15 (to 18)
3 LOAD_NAME 0 (s)
6 GET_ITER
>> 7 FOR_ITER 7 (to 17)
10 STORE_NAME 1 (x)
13 BREAK_LOOP
14 JUMP_ABSOLUTE 7
>> 17 POP_BLOCK
>> 18 LOAD_CONST 0 (None)
21 RETURN_VALUE
>>>
>>> dis.dis(compile("for x in s: pass", "", "exec"))
1 0 SETUP_LOOP 14 (to 17)
3 LOAD_NAME 0 (s)
6 GET_ITER
>> 7 FOR_ITER 6 (to 16)
10 STORE_NAME 1 (x)
13 JUMP_ABSOLUTE 7
>> 16 POP_BLOCK
>> 17 LOAD_CONST 0 (None)
20 RETURN_VALUE



The difference is likely to be this:

for x in s: break

retrieves the first (only) element of the set, then immediately breaks
out of the loop. This is very different from:

for x in s: pass

which retrieves the first element of the set, then tries to retrieve a
second element, which fails and raises StopIteration, which is then
caught, ending the loop. That's much more expensive.



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 23, 2009, 3:46 PM

Post #11 of 61 (1340 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:
> John Arbash Meinel wrote:
> > So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> > faster than (iter(s).next()).
> > I was pretty surprised that it was 30% faster than "for x in s:
> > pass". I assume it has something to do with a potential "else:"
> > statement?
>
> for x in s: pass
>
> iterates through *all* the elements in s and leaves x bound to the
> arbritrary *last* one instead of the arbitrary *first* one. For a
> large set, this would be a lot slower, not just a little.
>
> fwiw, I think the use case for this is sufficiently rare that it does
> not need a separate method just for this purpose.


And yet it keeps coming up, again and again... obviously people using
sets in code think it has a use-case.

I did ask earlier for a use-case, and the OP hasn't replied, but Vitor
Bosshard did point out one of his use-cases: he had the difference
between two frozensets, which he knew had only one element, but due to
the lack of pop() he had no straightforward way of finding out what
that element was.

The lack of get() in sets and frozensets is sounding more and more to me
like the victory of purity over practicality.



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 23, 2009, 3:49 PM

Post #12 of 61 (1340 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 07:53:24 am Willi Richert wrote:
> Hi,
>
> surprised about the performance of for/break provided by Vitor, I did
> some more testing. It revealed that indeed we can forget the get()
> (which was implemented as a stripped down pop()):


I don't understand that conclusion. According to your tests, your
implementation of get() is as fast as "for x in set: break", and it's
certainly much, much more readable and straightforward.



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ben+python at benfinney

Oct 23, 2009, 4:26 PM

Post #13 of 61 (1329 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Steven D'Aprano <steve [at] pearwood> writes:

> On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:
> > fwiw, I think the use case for this is sufficiently rare that it
> > does not need a separate method just for this purpose.
>
> And yet it keeps coming up, again and again... obviously people using
> sets in code think it has a use-case.

The desire for this may be obvious, but what seems to be lacking is One
Obvious Way to Do It.

I agree that ‘for x in foo_set: break’ is not an Obvious Way.

> The lack of get() in sets and frozensets is sounding more and more to
> me like the victory of purity over practicality.

What would be the input to ‘set.get’?

If it's the value, that makes it rather non-obvious; if I already know
about ‘dict.get’ which takes the key, I'm not going to be thinking about
the ‘get’ method if I don't have a key to feed it. Once I learn it, I'm
going to forget it easily, because it's inconsistent with ‘dict.get’. So
I don't think that meets the “obvious way” criterion.

By analogy with ‘list.pop’, the method that takes the “top” item off the
“stack”, I would expect to see ‘list.peek’ and ‘set.peek’, to see the
item without altering the container.

--
\ “Odious ideas are not entitled to hide from criticism behind |
`\ the human shield of their believers' feelings.” —Richard |
_o__) Stallman |
Ben Finney

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 23, 2009, 7:07 PM

Post #14 of 61 (1330 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:
> Steven D'Aprano <steve [at] pearwood> writes:
> > On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:
> > > fwiw, I think the use case for this is sufficiently rare that it
> > > does not need a separate method just for this purpose.
> >
> > And yet it keeps coming up, again and again... obviously people
> > using sets in code think it has a use-case.
>
> The desire for this may be obvious, but what seems to be lacking is
> One Obvious Way to Do It.
>
> I agree that ‘for x in foo_set: break’ is not an Obvious Way.
>
> > The lack of get() in sets and frozensets is sounding more and more
> > to me like the victory of purity over practicality.
>
> What would be the input to ‘set.get’?

It wouldn't take any input.


> If it's the value, that makes it rather non-obvious;

It makes it pointless if it takes the value. If you already have the
value, why would you need to retrieve it from the set?


> if I already
> know about ‘dict.get’ which takes the key, I'm not going to be
> thinking about the ‘get’ method if I don't have a key to feed it.
> Once I learn it, I'm going to forget it easily, because it's
> inconsistent with ‘dict.get’. So I don't think that meets the
> “obvious way” criterion.

"get" is such a generic term that I don't believe that is a problem.
There are already quite a number of get() methods in the 2.6 std lib:

$ grep "def get[(]" *.py
_abcoll.py: def get(self, key, default=None):
ConfigParser.py: def get(self, section, option):
ConfigParser.py: def get(self, section, option, raw=False,
vars=None):
doctest.py: def get(self):
mailbox.py: def get(self, key, default=None):
os.py: def get(self, key, failobj=None):
pickle.py: def get(self, i, pack=struct.pack):
Queue.py: def get(self, block=True, timeout=None):
shelve.py: def get(self, key, default=None):
sre_parse.py: def get(self):
threading.py: def get(self):
UserDict.py: def get(self, key, failobj=None):
UserDict.py: def get(self, key, default=None):
weakref.py: def get(self, key, default=None):
weakref.py: def get(self, key, default=None):
webbrowser.py:def get(using=None):


I think you over-estimate the degree of confusion people suffer from
similar sounding methods. I don't think people are confused that
dict[key] and list[index] have different semantics, and I don't see why
dict.get(key[, default]) and set.get() would be any different.

But if you don't like get(), spell it differently. There's plenty of
opportunity for bike-shedding:

getitem()
getelement()
get_arbitrary_element()
peek()

etc.



> By analogy with ‘list.pop’, the method that takes the “top” item off
> the “stack”, I would expect to see ‘list.peek’ and ‘set.peek’, to see
> the item without altering the container.

You don't need list.peek() because there already is an obvious way to
retrieve an item from a list without removing it: list[index].





--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ben+python at benfinney

Oct 23, 2009, 8:02 PM

Post #15 of 61 (1329 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Steven D'Aprano <steve [at] pearwood> writes:

> On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:
> > Steven D'Aprano <steve [at] pearwood> writes:
> > > The lack of get() in sets and frozensets is sounding more and more
> > > to me like the victory of purity over practicality.
> >
> > What would be the input to ‘set.get’?
>
> It wouldn't take any input.

That is even less obvious. I would expect a parameter-less ‘set.get’ to
get the set. Not terribly useful, but the name and function signature
doesn't suggest anything else.

> "get" is such a generic term that I don't believe that is a problem.

The problem above is made less problematic by the fact that the function
signature (e.g. ‘foo_dict.get(key)’) clarifies the answer to the
question “get what?”. Whereas ‘foo_set.get()’ doesn't communicate much
at all to the reader.

If we want a method that gets one item from a set, perhaps the name can
make it clearer: name it ‘set.getitem’. But which item should it get?
The ‘__getitem__’ special method of lists and dictionaries requires an
index or key as parameter.

--
\ “Roll dice!” “Why?” “Shut up! I don't need your fucking |
`\ *input*, I need you to roll dice!” —Luke Crane, demonstrating |
_o__) his refined approach to play testing, 2009 |
Ben Finney

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 23, 2009, 8:38 PM

Post #16 of 61 (1329 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 02:02:48 pm Ben Finney wrote:
> Steven D'Aprano <steve [at] pearwood> writes:
> > On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:
> > > Steven D'Aprano <steve [at] pearwood> writes:
> > > > The lack of get() in sets and frozensets is sounding more and
> > > > more to me like the victory of purity over practicality.
> > >
> > > What would be the input to ‘set.get’?
> >
> > It wouldn't take any input.
>
> That is even less obvious. I would expect a parameter-less ‘set.get’
> to get the set. Not terribly useful, but the name and function
> signature doesn't suggest anything else.

You already have the set. Why would you need a method that you call that
returns the set you already have?

A method called "get" obviously gets something, and if it takes no
arguments the only thing is has access to is the set. The obvious
things it could get are the set as a whole or some part of the set.
Since getting the set as a whole would be pointless, and we're allowed
to assume that the language designers aren't lunatics who would waste
time creating pointless methods, the obvious answer is that it gets
some part of the set.

Since there's no obvious reason to choose one subset over another
subset, the only useful thing it could return is a single arbitrary
item. But if you're not willing to guess what it gets, you are
permitted to read the docstring.


> > "get" is such a generic term that I don't believe that is a
> > problem.
>
> The problem above is made less problematic by the fact that the
> function signature (e.g. ‘foo_dict.get(key)’) clarifies the answer to
> the question “get what?”. Whereas ‘foo_set.get()’ doesn't communicate
> much at all to the reader.

You are demanding a level of intuitiveness that few, if any, functions
in the standard library would be able to meet. Consider list.index(x):
would you expect it to return the value at index x, or the index of
value x? Either description is compatible with the name, and in fact
people sometimes mix them up.

Or sorted() -- from the name alone, I'd expect this to work, but it
doesn't:

>>> sorted(1, 2, 3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable


That's not a problem with the built-in function. I took a guess about
the API, and guessed wrong. I'll learn from this, and get it right next
time.


> If we want a method that gets one item from a set, perhaps the name
> can make it clearer: name it ‘set.getitem’. But which item should it
> get? The ‘__getitem__’ special method of lists and dictionaries
> requires an index or key as parameter.

Neither of which is appropriate for sets -- sets are unordered, so
elements don't have indexes, and they don't map keys to values. They
just have elements.

Sets are not lists, and they're not dicts. Analogies only go so far, and
you can't reason about set methods by considering how lists or dicts
will behave.



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ben+python at benfinney

Oct 23, 2009, 9:06 PM

Post #17 of 61 (1336 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Steven D'Aprano <steve [at] pearwood> writes:

> On Sat, 24 Oct 2009 02:02:48 pm Ben Finney wrote:
> > I would expect a parameter-less ‘set.get’ to get the set. Not
> > terribly useful, but the name and function signature doesn't suggest
> > anything else.
>
> You already have the set. Why would you need a method that you call
> that returns the set you already have?

Exactly, hence the confusion. I think the method name ‘set.get’ is poor
for that reason.

> A method called "get" obviously gets something, and if it takes no
> arguments the only thing is has access to is the set. The obvious
> things it could get are the set as a whole or some part of the set.

Which then raises the question “what part of the set does it get?”,
which the function signature does nothing to answer. I'm proposing that
a no-parameters ‘set.get’ is needlessly confusing to think about.

> Since there's no obvious reason to choose one subset over another
> subset, the only useful thing it could return is a single arbitrary
> item.

That's not at all obvious, IMO. The name doesn't give much help at all
in getting to that conclusion, and isn't easily associated with that
connotation.

> You are demanding a level of intuitiveness that few, if any, functions
> in the standard library would be able to meet.

I'm not demanding intuitiveness; all programming interfaces are learned.
I'm asking for ease of learning and later recall.

> That's not a problem with the built-in function. I took a guess about
> the API, and guessed wrong. I'll learn from this, and get it right
> next time.

Which is partly my point. Such a narrow use case (“get an arbitrary item
from the set, without specifying anything about what that item might
be”) is a mismatch for such a generic name. It makes it unnecessarily
difficult to make the association.

Since the use case is so specific, I would expect the name to be
specific too, to better match the use case.

(Since I know how your mind works, I can anticipate facetious
fifty-character-long names bubbling up already in a response from you.
Let's just assume that joke is already done :-)

--
\ “Holy tintinnabulation, Batman!” —Robin |
`\ |
_o__) |
Ben Finney

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stephen at xemacs

Oct 23, 2009, 11:22 PM

Post #18 of 61 (1316 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Ben Finney writes:
> Steven D'Aprano <steve [at] pearwood> writes:

> > "get" is such a generic term that I don't believe that is a problem.
>
> The problem above is made less problematic by the fact that the function
> signature (e.g. 'foo_dict.get(key)') clarifies the answer to the
> question "get what?". Whereas 'foo_set.get()' doesn't communicate much
> at all to the reader.

I agree.

This is precisely why a couple of months ago people were proposing
names like ".getany()" for this API.

The problem brought up then was that pretty clearly people would then
do things like

x = foo.getany()
y = foo.getany()

expecting x and y to be different (ie, interpreting "any" as "random").
Pretty soon there was a whole family of proposals for such methods:
.getfirst(), .getlast(), .getrandom(), .getonly(), ....

I think it would be better to document the various ways of doing this,
and let each program define its own oneliner for the MySet.get() that
makes idiomatic sense in its use case.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Oct 24, 2009, 12:19 AM

Post #19 of 61 (1313 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Ben Finney wrote:
> Which then raises the question “what part of the set does it get?”,
> which the function signature does nothing to answer. I'm proposing that
> a no-parameters ‘set.get’ is needlessly confusing to think about.

The fact that set.get() is just set.pop() without removing the result
from the set seems perfectly straightforward.

> Since the use case is so specific, I would expect the name to be
> specific too, to better match the use case.

The use case is no more specific than set.pop().

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
---------------------------------------------------------------
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


gzlist at googlemail

Oct 24, 2009, 2:12 AM

Post #20 of 61 (1310 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On 24/10/2009, Nick Coghlan <ncoghlan [at] gmail> wrote:
> Ben Finney wrote:
>> Which then raises the question “what part of the set does it get?”,
>> which the function signature does nothing to answer. I'm proposing that
>> a no-parameters ‘set.get’ is needlessly confusing to think about.
>
> The fact that set.get() is just set.pop() without removing the result
> from the set seems perfectly straightforward.

There's a different proposed meaning for `set.get` that's been
discussed on python-dev before:

<http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

That one I've had cause for before and no clean and fast way of
writing, this one I've always just done the for/break thing.

Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


w.richert at gmx

Oct 24, 2009, 7:29 AM

Post #21 of 61 (1305 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Hi,

I agree. But, here are the pros/cons collected from the recent list repsonses:

Pro:
- more readable
- newbies will encounter one of the fastest solution (.get()) before trying
slower "first solutions" like (iter(set).next())

Cons:
- no name consensus. get() getany() arbitrary() ?
- BDFL moratorium, which I find very wise (get() is, however, no language
extension, but std lib extension, which Guido did not moratorize, didn't he?)
- other classes should then also be extended, like frozenset

Regards,
wr

PS: what is the correct verb form of moratorium?

Am Samstag, 24. Oktober 2009 00:49:38 schrieb Steven D'Aprano:
> On Sat, 24 Oct 2009 07:53:24 am Willi Richert wrote:
> > Hi,
> >
> > surprised about the performance of for/break provided by Vitor, I did
> > some more testing. It revealed that indeed we can forget the get()
> > (which was implemented as a stripped down pop()):
>
> I don't understand that conclusion. According to your tests, your
> implementation of get() is as fast as "for x in set: break", and it's
> certainly much, much more readable and straightforward.
>
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Oct 24, 2009, 8:18 AM

Post #22 of 61 (1303 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, 24 Oct 2009 08:12:26 pm Martin (gzlist) wrote:

> On 24/10/2009, Nick Coghlan <ncoghlan [at] gmail> wrote:
> > Ben Finney wrote:
> >> Which then raises the question “what part of the set does it
> >> get?”, which the function signature does nothing to answer. I'm
> >> proposing that a no-parameters ‘set.get’ is needlessly confusing
> >> to think about.
> >
> > The fact that set.get() is just set.pop() without removing the
> > result from the set seems perfectly straightforward.
>
> There's a different proposed meaning for `set.get` that's been
> discussed on python-dev before:
>
> <http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

That thread has an example of a use-case for extracting an item from a
set, given the item itself: interning.

The current Python idiom for interning is to use a dict:

# store the canonical version
_cache[item] = item

# retrieve the interned version
item = _cache[item]

It has been argued that such interning is better done by sets rather
than dicts, since this will save a pointer per item (plus any
additional overhead dicts may have over that of sets). If this argument
is accepted, then we want a fast, efficient operation to perform this:

def get(self, item):
"""Return an element of the set equal to item.
Useful for retrieving the canonical version of an element
and for interning.
"""
for x in self:
if x == item:
return x
raise ValueError('item not in set')


The above is O(N); ideally it should be O(1).

Normally we don't care about identity, only equality, so if x and item
above are equal we are indifferent about which one we use. But
interning is a real example of a use-case where we do care about
identity. Arguably it is inefficient and wasteful to use a dict for
interning when sets could be just as fast while using less memory.

The other proposed semantics for set.get() are to retrieve an arbitrary
element. It must be arbitrary, because elements in a set are unordered
and unkeyed. This gives us the semantics of pop() without removing the
element:

def get(self):
"""Return an arbitrary element of the set without removing it."""
for x in self:
return x
raise ValueError("empty set")

This is a frequent request, but I still don't know what the use-case is.

If you'll excuse me stating the obvious, both of these can easily be
written as helper-functions. The main arguments for making them methods
are:

(1) The first one is needlessly O(N) when it could be O(1).

(2) The second one is apparently a common need (although I personally
have never needed it), forcing people to re-invent the wheel, sometimes
incorrectly or inefficiently, e.g.:

def get(aset):
for element in aset:
pass
return element



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


w.richert at gmx

Oct 24, 2009, 8:41 AM

Post #23 of 61 (1303 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

Hi,

someone on this list mentioned that much of the s.get() time is spend on the
name lookup for get(). That is indeed the case:

===================
from timeit import *

stats = [."for i in xrange(1000): iter(s).next() ",
"for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
"for i in xrange(1000): s.add(s.pop()) ",
"for i in xrange(1000): s.get() ",
"g=s.get;\nfor i in xrange(1000): g() "]

for stat in stats:
t = Timer(stat, setup="s=set(range(1000))")
print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
==================

Time for for i in xrange(1000): iter(s).next() : 0.448227
Time for for i in xrange(1000):
for x in s:
break: 0.141669
Time for for i in xrange(1000): s.add(s.pop()) : 0.348055
Time for for i in xrange(1000): s.get() : 0.148580
Time for g=s.get;
for i in xrange(1000): g() : 0.080563

So, now set.get() is indeed the fastest and preferable solution if you need
massive amounts of retrieving elements from a set without removing them.

wr

Am Freitag, 23. Oktober 2009 22:53:24 schrieb Willi Richert:
> Hi,
>
> surprised about the performance of for/break provided by Vitor, I did some
> more testing. It revealed that indeed we can forget the get() (which was
> implemented as a stripped down pop()):
>
> from timeit import *
> stats = [."for i in xrange(1000): iter(s).next() ",
> "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ",
> "for i in xrange(1000): s.add(s.pop()) ",
> "for i in xrange(1000): s.get() "]
>
> for stat in stats:
> t = Timer(stat, setup="s=set(range(100))")
> try:
> print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
> except:
> t.print_exc()
>
>
>
> $ ./test_get.py
> Time for for i in xrange(1000): iter(s).next() : 0.433080
> Time for for i in xrange(1000):
> for x in s:
> break : 0.148695
> Time for for i in xrange(1000): s.add(s.pop()) : 0.317418
> Time for for i in xrange(1000): s.get() : 0.146673
>
>
> In some tests, for/break was even slightly faster then get().
>
> As always, intuition regarding performance bottlenecks is flawed ;-)
>
> Anyway, thanks for all the helpful comments, especially to Stefan for the
> http://comments.gmane.org/gmane.comp.python.ideas/5606 link.
>
> Regards,
> wr
>
> Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:
> > Vitor Bosshard wrote:
> > > 2009/10/23 Willi Richert <w.richert [at] gmx>:
> > >> Hi,
> > >>
> > >> recently I wrote an algorithm, in which very often I had to get an
> > >> arbitrary element from a set without removing it.
> > >>
> > >> Three possibilities came to mind:
> > >>
> > >> 1.
> > >> x = some_set.pop()
> > >> some_set.add(x)
> > >>
> > >> 2.
> > >> for x in some_set:
> > >> break
> > >>
> > >> 3.
> > >> x = iter(some_set).next()
> > >>
> > >>
> > >> Of course, the third should be the fastest. It nevertheless goes
> > >> through all the iterator creation stuff, which costs some time. I
> > >> wondered, why the builtin set does not provide a more direct and
> > >> efficient way for retrieving some element without removing it. Is
> > >> there any reason for this?
> > >>
> > >> I imagine something like
> > >>
> > >> x = some_set.get()
> > >
> > > I see this as being useful for frozensets as well, where you can't get
> > > an arbitrary element easily due to the obvious lack of .pop(). I ran
> > > into this recently, when I had a frozenset that I knew had 1 element
> > > (it was the difference between 2 other sets), but couldn't get to that
> > > element easily (get the pun?)
> >
> > So in my testing (2) was actually the fastest. I assumed because .next()
> > was a function call overhead, while:
> > for x in some_set:
> > break
> >
> > Was evaluated inline. It probably still has to call PyObject_GetIter,
> > however it doesn't have to create a stack frame for it.
> >
> > This is what "timeit" tells me. All runs are of the form:
> > python -m timeit -s "s = set([10])" ...
> >
> > 0.101us "for x in s: break; x"
> > 0.130us "for x in s: pass; x"
> > 0.234us -s "n = next; i = iter" "x = n(i(s)); x"
> > 0.248us "x = next(iter(s)); x"
> > 0.341us "x = iter(s).next(); x"
> >
> > So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> > faster than (iter(s).next()).
> > I was pretty surprised that it was 30% faster than "for x in s: pass". I
> > assume it has something to do with a potential "else:" statement?
> >
> > Note that all of these are significantly < 1us. So this only matters if
> > it is something you are doing often.
> >
> > I don't know your specific timings, but I would guess that:
> > for x in s: break
> >
> > Is actually going to be faster than your
> > s.get()
> >
> > Primarily because s.get() requires an attribute lookup. I would think
> > your version might be faster for:
> > stat2 = "g = s.get; for i in xrange(100): g()"
> >
> > However, that is still a function call, which may be treated differently
> > by the interpreter than the for:break loop. I certainly suggest you try
> > it and compare.
> >
> > John
> > =:->
>
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alexander.belopolsky at gmail

Oct 24, 2009, 10:34 AM

Post #24 of 61 (1297 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Fri, Oct 23, 2009 at 6:46 PM, Steven D'Aprano <steve [at] pearwood> wrote:
> On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:
..
>> fwiw, I think the use case for this is sufficiently rare that it does
>> not need a separate method just for this purpose.
>
>
> And yet it keeps coming up, again and again... obviously people using
> sets in code think it has a use-case.
>
This reminds me a debate I had with Martin several years ago:

http://bugs.python.org/issue1507011

Here is the essence:

AB> I disagree with Martin. I think interning is a set
AB> operation and it is unfortunate that set API does not
AB> support it directly.

ML> I disagree with Alexander's last remark in several respects:
ML> set is indeed a container, and there is a way to get the
ML> elements back (namely, by enumerating them, or picking an
ML> arbitrary element for removal). There is no operation to check,
ML> on insertion of E, what the the prior element equal to E was
ML> (if any); there is only a check to determine whether E is in the
ML> set. The operation "give me the member equal but not identical
ML> to E" is conceptually a lookup operation; the mathematical set
ML> construct has no such operation, and the Python set models
ML> it closely. IOW, set is *not* a dict with key==value.

To me, however, a set seems to be a container that is a specialization
of a dict with values and keys being the same. In this model, a
get() method, or even a __getitem__ with s[k] is k, is only natural.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


guido at python

Oct 24, 2009, 10:55 AM

Post #25 of 61 (1297 views)
Permalink
Re: Retrieve an arbitrary element from a set without removing it [In reply to]

On Sat, Oct 24, 2009 at 10:34 AM, Alexander Belopolsky
<alexander.belopolsky [at] gmail> wrote:
> To me, however, a set seems to be a container that is a specialization
> of a dict with values and keys being the same.

That's absurd; the mapping provides nothing useful.

--
--Guido van Rossum

PS. My elbow needs a couple more weeks of rest. Limiting myself to
ultra-short emails.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

First page Previous page 1 2 3 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.