Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Re: Retrieve an arbitrary element from a setwithoutremoving it

 

 

First page Previous page 1 2 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded


python at rcn

Oct 27, 2009, 12:49 PM

Post #1 of 33 (1754 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it

[geremy condra]
> Was it ever decided whether this would fall under the moratorium?

Decided isn't the right word:
http://mail.python.org/pipermail/python-dev/2009-October/093373.html

FWIW, I'm a strong -1 on both proposals.

Just add a short get_one() function and a get_equivalent() recipe
to your utils directory. That will get the job done (thought I don't
expect that you will *ever* make much use of either one).
No need to complexify a type that is currently very simple.


Raymond



P.S. get_equivalent: http://code.activestate.com/recipes/499299/
get_one = lambda s, default=None: next(iter(s), default)

The first works with all type that defines __contains__.
The second works for any iterable.
Neither of these concepts are specific to set objects.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


debatem1 at gmail

Oct 27, 2009, 8:34 PM

Post #2 of 33 (1704 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Tue, Oct 27, 2009 at 3:49 PM, Raymond Hettinger <python [at] rcn> wrote:
>
> [geremy condra]
>>
>> Was it ever decided whether this would fall under the moratorium?
>
> Decided isn't the right word:
> http://mail.python.org/pipermail/python-dev/2009-October/093373.html
>

<snip>

I'm unclear- does that imply that this is this going to be decided on a
case-by-case basis in the future or have the details for the big M just
not been hammered out yet?

Geremy Condra
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


python at rcn

Nov 4, 2009, 5:07 PM

Post #3 of 33 (1651 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

[Steven D'Aprano]
>> Anyway, given the level of opposition to the suggestion, I'm no longer
>> willing to carry the flag for it. If anyone else -- perhaps the OP --
>> feels they want to take it any further, be my guest.

[geremy condra]
> I've said before that I'd like there to be one, standard way of
> doing this. A function call- set.pick() seems reasonably named
> to me- is probably the cleanest way to do that. Absent that,
> an example in the docs that illustrates the preferred idiom
> would be great.

Summarizing my opposition to a new set method:
1) there already are at least two succinct ways to get the same effect
2) those ways work with any container, not just sets
3) set implementations in other languages show that this isn't needed.
4) there is value to keeping the API compact
5) isn't needed for optimization (selecting the same value in a loop makes no sense)
6) absence of real-world code examples that would be meaningfully improved

I would be happy to add an example to the docs so that this thread
can finally end.


Raymond
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


eric at trueblade

Nov 4, 2009, 5:12 PM

Post #4 of 33 (1645 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

Raymond Hettinger wrote:
> Summarizing my opposition to a new set method:
> 1) there already are at least two succinct ways to get the same effect
> 2) those ways work with any container, not just sets
> 3) set implementations in other languages show that this isn't needed.
> 4) there is value to keeping the API compact
> 5) isn't needed for optimization (selecting the same value in a loop
> makes no sense)
> 6) absence of real-world code examples that would be meaningfully improved
>
> I would be happy to add an example to the docs so that this thread
> can finally end.

Please do!

Eric.

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


tjreedy at udel

Nov 4, 2009, 5:48 PM

Post #5 of 33 (1643 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

Eric Smith wrote:
> Raymond Hettinger wrote:
>> Summarizing my opposition to a new set method:
>> 1) there already are at least two succinct ways to get the same effect
>> 2) those ways work with any container, not just sets
>> 3) set implementations in other languages show that this isn't needed.
>> 4) there is value to keeping the API compact
>> 5) isn't needed for optimization (selecting the same value in a loop
>> makes no sense)
>> 6) absence of real-world code examples that would be meaningfully
>> improved

Agreed

>> I would be happy to add an example to the docs so that this thread
>> can finally end.
>
> Please do!

Yes!

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


chris at subtlety

Nov 5, 2009, 12:43 PM

Post #6 of 33 (1642 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Wed, Nov 4, 2009 at 7:07 PM, Raymond Hettinger <python [at] rcn> wrote:
> [Steven D'Aprano]
>>> Anyway, given the level of opposition to the suggestion, I'm no longer
>>> willing to carry the flag for it. If anyone else -- perhaps the OP --
>>> feels they want to take it any further, be my guest.

I feel pretty strongly that it's a wart in the language, and a
sufficiently strong one that it should be remedied. I'm happy to
champion it, but haven't the faintest idea what that entails.

> Summarizing my opposition to a new set method:
> 1) there already are at least two succinct ways to get the same effect
> 2) those ways work with any container, not just sets
> 3) set implementations in other languages show that this isn't needed.
> 4) there is value to keeping the API compact
> 5) isn't needed for optimization (selecting the same value in a loop makes
> no sense)
> 6) absence of real-world code examples that would be meaningfully improved
>
> I would be happy to add an example to the docs so that this thread
> can finally end.

Adding an example to the docs does not solve the problem, which is
if you come across the following code:

for x in s:
break

... it really looks like it does nothing. It's only because of the
slightly idiosyncratic way Python handles variable scoping that it has
an effect at all, and that effect isn't overtly related to what the
code says, which is "Iterate over all the elements in this set, then
immediately stop after the first one". s.get() or s.pick() are both
more succinct and more clear, saying "Get me an arbitrary element from
this set". To make matters worse, "for x in s: break" fails silently
when s is empty, and "x = iter(s).next()" raises a StopIteration
exception. Neither is clear.
The obvious way, for newcomers, of achieving the effect is:

x = s.pop()
s.add(x)

... and that's simply horrible in terms of efficiency. So the
"obvious" way of doing it in Python is wrong(TM), and the "correct"
way of doing it is obscure and raises misleading exceptions.

I suppose, mulling things over, the method should be called
.pick(), which avoids any confusion with .get(). And, as I've stated,
I think it should return a member of the set, with no guarantees what
member of the set is returned. It could be the same one every time,
or a random one, or the last one placed in the set.
For cases where people want to cycle through the members of the set
in a predictable order, they can either copy the contents into a list
(sorted or unsorted) *or* subclass set and override the .pick() method
to place stronger guarantees on the API.

So, summarizing my responses:

1) the two succinct ways are unclear and not immediately obvious
2) the existing methods aren't needed for other objects
3) set implementations in other languages are irrelevant
4) this is a small, targeted change which not make the API disordered or unruly
5) could very well be needed for optimization, in cases where
constructing an iterator is expensive
6) there have been several real-world examples posted which would be
improved by this change

-- Chris
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


alexander.belopolsky at gmail

Nov 5, 2009, 1:09 PM

Post #7 of 33 (1639 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris [at] subtlety> wrote:
> .. and "x = iter(s).next()" raises a StopIteration
> exception.

And that's why the documented recipe should probably recommend
next(iter(s), default) instead. Especially because iter(s).next() is
not even valid code in 3.0.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


python at rcn

Nov 5, 2009, 2:30 PM

Post #8 of 33 (1647 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

[Chris Bergstresser]
> Clearly, I'll need to write up the PEP.

Why not write a short, fast get_first() function for your utils directory and be done with it?
That could work with sets, mappings, generators, and other containers and iterators.
No need to fatten the set/frozenset API for something so trivial and so rarely needed.


Raymond
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


python at rcn

Nov 5, 2009, 3:02 PM

Post #9 of 33 (1638 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

[me]
> Why not write a short, fast get_first() function for your utils directory and be done with it?
> That could work with sets, mappings, generators, and other containers and iterators.
> No need to fatten the set/frozenset API for something so trivial and so rarely needed.

Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is
specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it
does. It gets the job done.

def first(iterable):
'Return the first value from a container or iterable. If empty, raises LookupError'
for value in iterable:
return value
raise LookupError('no first value; iterable is empty')

If desired, it is not hard to change to the last time to return a default value or some other exception.


Raymond



_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


debatem1 at gmail

Nov 5, 2009, 3:04 PM

Post #10 of 33 (1637 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky
<alexander.belopolsky [at] gmail> wrote:
> On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris [at] subtlety> wrote:
>> .. and "x = iter(s).next()" raises a StopIteration
>> exception.
>
> And that's why the documented recipe should probably recommend
> next(iter(s), default) instead.  Especially because iter(s).next() is
> not even valid code in 3.0.

This seems reasonably legible to you? Strikes me as coding by
incantation. Also, while I've heard people say that the naive
approach is slower, I'm not getting that result. Here's my test:


>>> smrt = timeit.Timer("next(iter(s))", "s=set(range(100))")
>>> smrt.repeat(10)
[1.2845709323883057, 0.60247397422790527, 0.59621405601501465,
0.59133195877075195, 0.58387589454650879, 0.56839084625244141,
0.56839680671691895, 0.56877803802490234, 0.56905913352966309,
0.56846404075622559]

>>> naive = timeit.Timer("x=s.pop();s.add(x)", "s=set(range(100))")
>>> naive.repeat(10)
[0.93139314651489258, 0.53566789627075195, 0.53674602508544922,
0.53608798980712891, 0.53634309768676758, 0.53557991981506348,
0.53578495979309082, 0.53666114807128906, 0.53576493263244629,
0.53491711616516113]


Perhaps my test is flawed in some way?

Geremy Condra
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


foom at fuhm

Nov 5, 2009, 3:17 PM

Post #11 of 33 (1637 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Nov 5, 2009, at 6:04 PM, geremy condra wrote:
> Perhaps my test is flawed in some way?

Yes: you're testing the speed of something that makes absolutely no
sense to do in a tight loop, so *who the heck cares how fast any way
of doing it is*!

Is this thread over yet?

James
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ubershmekel at gmail

Nov 5, 2009, 3:52 PM

Post #12 of 33 (1636 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight <foom [at] fuhm> wrote:

>
> Is this thread over yet?
>
>
Sorry, I just had to point out that pop/add has a side effect that would be
apparent on a set that multiple threads access - it loses an item and then
gets it back. Sounds like a sleeper race condition that's going to be rare
but extremely hard to find if it does occur. Crooked as a gil.

--yuv


debatem1 at gmail

Nov 5, 2009, 4:30 PM

Post #13 of 33 (1633 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 6:17 PM, James Y Knight <foom [at] fuhm> wrote:
> On Nov 5, 2009, at 6:04 PM, geremy condra wrote:
>>
>> Perhaps my test is flawed in some way?
>
> Yes: you're testing the speed of something that makes absolutely no sense to
> do in a tight loop, so *who the heck cares how fast any way of doing it is*!
>
> Is this thread over yet?
>
> James

I'm testing the speed because the claim was made that the pop/add
approach was inefficient. Here's the full quote:

> The obvious way, for newcomers, of achieving the effect is:
>
> x = s.pop()
> s.add(x)
>
> ... and that's simply horrible in terms of efficiency. So the
> "obvious" way of doing it in Python is wrong(TM), and the "correct"
> way of doing it is obscure and raises misleading exceptions.

Since I use this in a graph theory library that I am currently trying
to scale to handle 300 million node simulations, and this is used in
several relevant algorithms, I was concerned.

Geremy Condra
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


chris at subtlety

Nov 5, 2009, 6:59 PM

Post #14 of 33 (1633 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python [at] rcn> wrote:
> Forgot to post the code.  It is short, fast, and easy.  It is explicit about
> handing the case with an empty input.  And it is specific about which value
> it returns (always the first iterated value; not an arbitrary one).  There's
> no guessing about what it does.  It gets the job done.

I'm trying to take this suggestion in the best possible light,
which is that you honestly think I didn't read past Chapter 3 of the
Python Tutorial, and I am therefore in fact unfamiliar with function
definitions.

-- Chris
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


chris at subtlety

Nov 5, 2009, 7:12 PM

Post #15 of 33 (1632 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 6:30 PM, geremy condra <debatem1 [at] gmail> wrote:
> I'm testing the speed because the claim was made that the pop/add
> approach was inefficient. Here's the full quote:
>
>>    The obvious way, for newcomers, of achieving the effect is:
>>
>>  x = s.pop()
>>  s.add(x)
>>
>> ... and that's simply horrible in terms of efficiency.  So the
>> "obvious" way of doing it in Python is wrong(TM), and the "correct"
>> way of doing it is obscure and raises misleading exceptions.

I was talking mainly from a theoretical standpoint, and because the
library I'm working on is designed to work seamlessly over the
network. In those cases, where the set the user is working with is
actually a proxy object across the wire, the time to acquire the
locks, remove the object, release the locks, reacquire the locks, add
the object, then rerelease the locks is *significantly* more expensive
than just noting the set hasn't changed and returning a cached object
from it.

-- Chris
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


martin at v

Nov 5, 2009, 9:43 PM

Post #16 of 33 (1630 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

> On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python [at] rcn> wrote:
>> Forgot to post the code. It is short, fast, and easy. It is explicit about
>> handing the case with an empty input. And it is specific about which value
>> it returns (always the first iterated value; not an arbitrary one). There's
>> no guessing about what it does. It gets the job done.
>
> I'm trying to take this suggestion in the best possible light,
> which is that you honestly think I didn't read past Chapter 3 of the
> Python Tutorial, and I am therefore in fact unfamiliar with function
> definitions.

I read Raymond's suggestion rather as a question: why bother with a
tedious, multi-year process, when a three-line function will achieve
exactly the same?

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


steve at pearwood

Nov 5, 2009, 10:01 PM

Post #17 of 33 (1644 views)
Permalink
Re: Retrieve an arbitrary element from a setwithoutremoving it [In reply to]

On Fri, 6 Nov 2009 10:52:54 am Yuvgoog Greenle wrote:
> On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight <foom [at] fuhm> wrote:
> > Is this thread over yet?
>
> Sorry, I just had to point out that pop/add has a side effect that
> would be apparent on a set that multiple threads access - it loses an
> item and then gets it back. Sounds like a sleeper race condition
> that's going to be rare but extremely hard to find if it does occur.
> Crooked as a gil.


Surely Raymond's suggestion also suffers from a similar race condition?

for x in set:
return x

creates a set_iterator. If another thread modifies the original set
after the set_iterator is created but before the return, you would get
a mysterious and almost impossible to debug RuntimeError.



--
Steven D'Aprano
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


chris at subtlety

Nov 5, 2009, 10:19 PM

Post #18 of 33 (1629 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 11:43 PM, "Martin v. Löwis" <martin [at] v> wrote:
> I read Raymond's suggestion rather as a question: why bother with a
> tedious, multi-year process, when a three-line function will achieve
> exactly the same?

Because it doesn't achieve exactly the same. What I want is a
sane, rational way to describe what I'm doing in the core API, so
other programmers learning the language don't spend the amount of time
I did perplexed that there was a .pop() and a .remove() and a
.discard(), but there wasn't a .pick(). I don't want to have to write
the same little helper function in every project to fill a deficiency
in the library. I don't want to have to argue about the flaws in
solutions with race conditions, or the fact that cheap functions
become expensive functions when performed over the network, or that
there's a real value in having an atomic operation which throws a sane
exception when it fails, or how it's disturbing to my OCD core to have
an API which believes:

if x in s:
s.remove(x)

... is too confusing, so there should be a .discard() method, but ...

for x in s:
break

... is self-evident and obvious, so there's no need for a .pick().

-- Chris
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


daniel at stutzbachenterprises

Nov 6, 2009, 6:17 AM

Post #19 of 33 (1633 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python [at] rcn> wrote:

> def first(iterable):
> 'Return the first value from a container or iterable. If empty, raises
> LookupError'
> for value in iterable:
> return value
> raise LookupError('no first value; iterable is empty')
>

(This email should not be construed to mean that I support adding a .pick
method; I do not.)

I realized this morning that the "for i in s: break" idiom can be O(n), even
assuming no hash collisions. Observe this subtly O(n**2) algorithm:

Cashew:~$ cat > test.py
while s:
for i in s:
break
# Imagine a complex algorithm here that depends on i still being in s
s.remove(i)

Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(20000))'
"`cat test.py`"
10 loops, best of 3: 31.2 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(40000))'
"`cat test.py`"
10 loops, best of 3: 124 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(80000))'
"`cat test.py`"
10 loops, best of 3: 491 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(160000))'
"`cat test.py`"
10 loops, best of 3: 1.96 sec per loop

Doubling n quadruples the run-time, i.e., O(n**2) behavior.

I wonder if those clamoring for a pick() method and complaining about
efficiency are inadvertently running into this?

The problem arises because we're systematically unbalancing the hash table.
The iterator returns the first valid entry in the hash table, which we
remove. Repeat several times and pretty soon the iterator has to spend O(n)
time scanning through empty entries to find the first remaining valid entry.

On the other hand, the pop/add idiom is O(1), since the .pop implementation
contains some voodoo to remember where it left off. Consequently, the
following algorithm is O(n) instead of O(n**2):

Cashew:~$ cat > test.py
while s:
i = s.pop()
s.add(i)
# Imagine a complex algorithm here that depends on i still being in s
s.remove(i)

Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(400000))'
"`cat test.py`"
10 loops, best of 3: 28 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(800000))'
"`cat test.py`"
10 loops, best of 3: 56.2 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(1600000))'
"`cat test.py`"
10 loops, best of 3: 113 msec per loop
Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(3200000))'
"`cat test.py`"
10 loops, best of 3: 227 msec per loop

Doubling n doubles the run-time, i.e., O(n) behavior.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>


g.brandl at gmx

Nov 6, 2009, 3:59 PM

Post #20 of 33 (1607 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

Chris Bergstresser schrieb:
> On Thu, Nov 5, 2009 at 11:43 PM, "Martin v. Löwis" <martin [at] v> wrote:
>> I read Raymond's suggestion rather as a question: why bother with a
>> tedious, multi-year process, when a three-line function will achieve
>> exactly the same?
>
> Because it doesn't achieve exactly the same. What I want is a
> sane, rational way to describe what I'm doing in the core API, so
> other programmers learning the language don't spend the amount of time
> I did perplexed that there was a .pop() and a .remove() and a
> .discard(), but there wasn't a .pick(). I don't want to have to write
> the same little helper function in every project to fill a deficiency
> in the library.

Why don't you write that little helper function only in those projects that
actually need it? I suspect that will be a small fraction...

> I don't want to have to argue about the flaws in
> solutions with race conditions, or the fact that cheap functions
> become expensive functions when performed over the network, or that
> there's a real value in having an atomic operation which throws a sane
> exception when it fails, or how it's disturbing to my OCD core to have
> an API which believes:
>
> if x in s:
> s.remove(x)
>
> ... is too confusing, so there should be a .discard() method, but ...
>
> for x in s:
> break
>
> ... is self-evident and obvious, so there's no need for a .pick().

It's not just the self-evidence of the required "workaround" that determines
whether an API function is added to a builtin type. It's whether you need it.
Imagine someone asking for a function on lists that removes duplicates, but
keeps the order of all first occurrences. This is not a completely unimaginable
method, and writing it by hand is more than a few lines (see e.g.
http://code.activestate.com/recipes/52560/#c1). Still, it's not needed
frequently enough to require being a list method.

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


martin at v

Nov 9, 2009, 12:42 AM

Post #21 of 33 (1542 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

> The problem arises because we're systematically unbalancing the hash
> table. The iterator returns the first valid entry in the hash table,
> which we remove. Repeat several times and pretty soon the iterator has
> to spend O(n) time scanning through empty entries to find the first
> remaining valid entry.

Interesting. Something goes wrong, it seems: if items get removed over
and over again, I think the set should shrink (not sure whether it
actually does). Then, I think you should end up with an amortized O(1)
for selecting an element (assuming that the underlying hashes don't
collide).

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


ncoghlan at gmail

Nov 9, 2009, 5:32 AM

Post #22 of 33 (1540 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

Martin v. Löwis wrote:
>> The problem arises because we're systematically unbalancing the hash
>> table. The iterator returns the first valid entry in the hash table,
>> which we remove. Repeat several times and pretty soon the iterator has
>> to spend O(n) time scanning through empty entries to find the first
>> remaining valid entry.
>
> Interesting. Something goes wrong, it seems: if items get removed over
> and over again, I think the set should shrink (not sure whether it
> actually does). Then, I think you should end up with an amortized O(1)
> for selecting an element (assuming that the underlying hashes don't
> collide).

The behaviour is inherited (in spirit) from dicts. Courtesy of
dictnotes.txt:

"""
* Shrinkage rate upon exceeding maximum sparseness. The current
CPython code never even checks sparseness when deleting a
key. When a new key is added, it resizes based on the number
of active keys, so that the addition may trigger shrinkage
rather than growth.
"""

"""
Dictionary operations involving only a single key can be O(1) unless
resizing is possible. By checking for a resize only when the
dictionary can grow (and may *require* resizing), other operations
remain O(1), and the odds of resize thrashing or memory fragmentation
are reduced. In particular, an algorithm that empties a dictionary
by repeatedly invoking .pop will see no resizing, which might
not be necessary at all because the dictionary is eventually
discarded entirely.
"""

So the rationale is to ensure that only add operations perform a resize
and so that sequential pop() operations don't incur excessive resizing
costs.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan [at] gmail | Brisbane, Australia
---------------------------------------------------------------
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


daniel at stutzbachenterprises

Nov 9, 2009, 7:09 AM

Post #23 of 33 (1545 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Mon, Nov 9, 2009 at 2:42 AM, "Martin v. Löwis" <martin [at] v>wrote:

> Interesting. Something goes wrong, it seems: if items get removed over
> and over again, I think the set should shrink (not sure whether it
> actually does). Then, I think you should end up with an amortized O(1)
> for selecting an element (assuming that the underlying hashes don't
> collide).
>

I'm not sure if Python's implementation shrinks or not, but even if it did
shrink, it would still be amortized O(n).

Imagine a completely full hash table that currently contains n elements in n
slots (I know this cannot happen in Python's implementation but it's useful
for illustrative purposes). Assume it will shrink when it gets down to n/2
elements.

Here is my pathological algorithm again, for reference purposes:

while s:
for i in s:
break
# Imagine a complex algorithm here that depends on i still being in s
s.remove(i)

We repeatedly search through the slots sequentially and remove the first
element we find. The first removal requires checking 1 slot, the second
removal requires checking 2 slots, the third removal requires checking 3
slots, and so forth. The hash table will shrink after the n/2-th removal,
when we have checked 1 + 2 + 3 + ... + n/2 = O(n**2) slots for n/2 removals
(or amortized O(n) per removal). It's too late for shrinking to save us;
we've already performed too much work.

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>


alexander.belopolsky at gmail

Nov 9, 2009, 7:36 AM

Post #24 of 33 (1542 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

On Mon, Nov 9, 2009 at 10:09 AM, Daniel Stutzbach
<daniel [at] stutzbachenterprises> wrote:
> On Mon, Nov 9, 2009 at 2:42 AM, "Martin v. Löwis" <martin [at] v>
> wrote:
>>
>> Interesting. Something goes wrong, it seems: if items get removed over
>> and over again, I think the set should shrink (not sure whether it
>> actually does). Then, I think you should end up with an amortized O(1)
>> for selecting an element (assuming that the underlying hashes don't
>> collide).
>
> I'm not sure if Python's implementation shrinks or not,

It does not:

>>> s = set(range(100000))
>>> from sys import getsizeof
>>> getsizeof(s)
4194536
>>> while s: x = s.pop()
...
>>> getsizeof(s)
4194536
>>> s.clear()
>>> getsizeof(s)
232
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


bjourne at gmail

Nov 9, 2009, 8:40 AM

Post #25 of 33 (1537 views)
Permalink
Re: Retrieve an arbitrary element from asetwithoutremoving it [In reply to]

2009/11/6 Raymond Hettinger <python [at] rcn>:
> [me]
>>
>> Why not write a short, fast get_first() function for your utils directory
>> and be done with it?
>> That could work with sets, mappings, generators, and other containers and
>> iterators.
>> No need to fatten the set/frozenset API for something so trivial and so
>> rarely needed.
>
> Forgot to post the code.  It is short, fast, and easy.  It is explicit about
> handing the case with an empty input.  And it is specific about which value
> it returns (always the first iterated value; not an arbitrary one).  There's
> no guessing about what it does.  It gets the job done.
>
> def first(iterable):
>   'Return the first value from a container or iterable.  If empty, raises
> LookupError'
>   for value in iterable:
>       return value
>   raise LookupError('no first value; iterable is empty')
>
> If desired, it is not hard to change to the last time to return a default
> value or some other exception.

That function is very nice and genericly lisp-like. Couldn't that one
be in the builtins? It would both solve the problem and avoid
fattening the set API.


--
mvh Björn
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

First page Previous page 1 2 Next page Last page  View All Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.