Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Python

dictionary.update() enhancement?

 

 

Python python RSS feed   Index | Next | Previous | View Threaded


maj64 at hotmail

Oct 11, 2001, 6:20 PM

Post #1 of 13 (966 views)
Permalink
dictionary.update() enhancement?

A long while ago there was a big debate here regarding the desired
behavior of the dictionary update method when it is asked to update a
key which already exists -- whether it should keep the existing
dictionary value, overwrite the value, or raise an exception, etc.

Looking through the archives, it appears that the following simple,
powerful, and flexible alternative was never considered: have update
accept an optional function parameter. This "collision function"
would be called with the two respective values when a key collision
occurs.

Consider the following uses (s=self's value, o=other's value):

d.update(other, lambda s, o: o) <--current behavior: overwrite
d.update(other, lambda s, o: s) <--keep existing value
d.update(other, lambda s, o: 1/0) <--raise exception (e.g.)
d.update(other, max) <--take the maximum value
d.update(other, merge) <--call special user-defined merging funtion
d.update(other, lambda s, o: deepcopy(o)) <--make a real copy of
other's value
d.update(other, lambda s, o: s+o) <--add the two values (very useful
in some applications)
d.update(other, lambda s, o: None) <--reset/flag the value after
collision

Perhaps you could even populate a dictionary with a list, providing
default values:

d.update(range(100), lambda s, o=1: o) <-- {0: 1, 1: 1, 2: 1 ... 99:
1}

Or provide a list of keys to be updated:

d.update(list, lambda, s, o=None: s+1) <-- update values on selected
keys

The above enhancement would be fully-backwards compatible and just as
fast in the default case. Moreover, it would be much faster than
anything that can currently be implemented within Python (by
subclassing dictionary and overriding the update method with hand
iteration).

Now if I could only write the patch....


max at alcyone

Oct 11, 2001, 6:52 PM

Post #2 of 13 (950 views)
Permalink
dictionary.update() enhancement? [In reply to]

Mark J wrote:

> A long while ago there was a big debate here regarding the desired
> behavior of the dictionary update method when it is asked to update a
> key which already exists -- whether it should keep the existing
> dictionary value, overwrite the value, or raise an exception, etc.
>
> Looking through the archives, it appears that the following simple,
> powerful, and flexible alternative was never considered: have update
> accept an optional function parameter. This "collision function"
> would be called with the two respective values when a key collision
> occurs.

At the very least, the collision function would have to have be given
the key as well, since that could affect the decision.

But after you're done you end up with a complete system which boils down
to manually selecting for each and every key, which makes one wonder why
you just don't it manually in the first place:

d = {...}
other = {...}
for key, value in other:
if d.has_key(key):
... handle collision ...
d.update(other)

--
Erik Max Francis / max [at] alcyone / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ When the solution is simple, God is answering.
\__/ Albert Einstein
7 sisters productions / http://www.7sisters.com/
Web design for the future.


maj64 at hotmail

Oct 12, 2001, 8:50 AM

Post #3 of 13 (948 views)
Permalink
dictionary.update() enhancement? [In reply to]

Erik Max Francis <max [at] alcyone> wrote in message news:<3BC64CC4.444A77E7 [at] alcyone>...
> Mark J wrote:
>
> > A long while ago there was a big debate here regarding the desired
> > behavior of the dictionary update method when it is asked to update a
> > key which already exists -- whether it should keep the existing
> > dictionary value, overwrite the value, or raise an exception, etc.
> >
> > Looking through the archives, it appears that the following simple,
> > powerful, and flexible alternative was never considered: have update
> > accept an optional function parameter. This "collision function"
> > would be called with the two respective values when a key collision
> > occurs.
>
> At the very least, the collision function would have to have be given
> the key as well, since that could affect the decision.
>
> But after you're done you end up with a complete system which boils down
> to manually selecting for each and every key, which makes one wonder why
> you just don't it manually in the first place:
>
> d = {...}
> other = {...}
> for key, value in other:
> if d.has_key(key):
> ... handle collision ...
> d.update(other)

Because it's not as straight-forward as saying d.update(other,
collision_handler) and it's about 10 times slower.

Mark J


max at alcyone

Oct 12, 2001, 12:04 PM

Post #4 of 13 (950 views)
Permalink
dictionary.update() enhancement? [In reply to]

Mark J wrote:

> Because it's not as straight-forward as saying d.update(other,
> collision_handler) and it's about 10 times slower.

Ten times slower? What in the world gives you that idea?

--
Erik Max Francis / max [at] alcyone / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ You're a fighter, so fight / Wake up and live
\__/ Sweetbox
Alcyone Systems / http://www.alcyone.com/
Alcyone Systems, San Jose, California.


logiplexsoftware at earthlink

Oct 12, 2001, 12:59 PM

Post #5 of 13 (952 views)
Permalink
dictionary.update() enhancement? [In reply to]

On Friday 12 October 2001 12:04, Erik Max Francis wrote:
> Mark J wrote:
> > Because it's not as straight-forward as saying d.update(other,
> > collision_handler) and it's about 10 times slower.
>
> Ten times slower? What in the world gives you that idea?

Especially since d.update(other, collision_handler) doesn't exist and hence
can't be benchmarked.

--
Cliff Wells
Software Engineer
Logiplex Corporation (www.logiplex.net)
(503) 978-6726 x308
(800) 735-0555 x308


max at alcyone

Oct 12, 2001, 2:45 PM

Post #6 of 13 (948 views)
Permalink
dictionary.update() enhancement? [In reply to]

Cliff Wells wrote:

> On Friday 12 October 2001 12:04, Erik Max Francis wrote:
>
> > Mark J wrote:
> >
> > > Because it's not as straight-forward as saying d.update(other,
> > > collision_handler) and it's about 10 times slower.
> >
> > Ten times slower? What in the world gives you that idea?
>
> Especially since d.update(other, collision_handler) doesn't exist and
> hence
> can't be benchmarked.

Indeed. The only way to speed up such a thing is to make the logic
(iterating through the items in the second list and testing if there is
a collision) written in C, but the thing that's going to take up the
most time here is invocations of the user-supplied function, which will
be Python. One can optimize the former, but will get almost no benefit
since the latter will be the bottleneck. Further, any such benefit will
only be noticeable if there are a huge amount of items or this updating
is being done very frequently. I wouldn't say that's a significant
issue; "lightning fast" is plenty good enough for most applications.

In other words, no, it will not be 1000% slower. I would be surprised
if it were even 10% slower.

--
Erik Max Francis / max [at] alcyone / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ Too much agreement kills a chat.
\__/ Eldridge Cleaver
Computer science / http://www.alcyone.com/max/reference/compsci/
A computer science reference.


maj64 at hotmail

Oct 12, 2001, 4:06 PM

Post #7 of 13 (949 views)
Permalink
dictionary.update() enhancement? [In reply to]

Mark J wrote:
>> Because it's not as straight-forward as saying d.update(other,
>> collision_handler) and it's about 10 times slower.

Erik Max Francis <max [at] alcyone> wrote:
> Ten times slower? What in the world gives you that idea?

Try it:

import time

dictsize=1000
numloops=10000

d={}
other={}
for i in range(dictsize): #populate the dicts
d[i]=None
other[i]=None

def withupdate(other):
for i in range(numloops):
d.update(other)

def withloop(other):
for i in range(numloops):
for k, v in other.items():
if d.has_key(k):
#this "collision function" will merely duplicate
#update's current behavior;
#i.e. overwrite the existing value.
d[k]=v
else: #new key
d[k]=v

print "Dictionary size: %s Number of loops: %s" % (dictsize,
numloops)
start = time.clock()
withupdate(other)
finish = time.clock()
print "with update: %5.2fs"%(finish-start)
start = time.clock()
withloop(other)
finish = time.clock()
print "with loop: %5.2fs"%(finish-start)

****** Results of run on RedHat 7.1, Dell OptiPlex GX150, 512MB RAM
Dictionary size: 1000 Number of loops: 10000
with update: 0.81s
with loop: 19.36s
>>> 19.36/0.81
23.901234567901231

Ignoring the other benefits to an unhanced update function, that's a
big speedup over hand-iteration. Different test runs using different
parameters were roughly the same.

The above code merely preserves the semantics of the existing update
function since that's the fairest test until dictionary.update() is
changed. (Note: removing the conditional in withloop() still keeps
it about 10x slower.)

Of course, the performance difference will decrease as the complexity
of the collision function increases, but many useful collision
functions are about as simple as or even simpler than the one
currently used by dictionary.update().

By claiming only 10x faster in my last message, I was trying to
(reasonably and fairly?) account for the extra overhead of running the
collision function within update().

Please correct me if I'm missing something. I'd also be interesting
in hearing any big discrepancies on the performance factor on
different platforms.

Mark


max at alcyone

Oct 12, 2001, 4:52 PM

Post #8 of 13 (949 views)
Permalink
dictionary.update() enhancement? [In reply to]

Mark J wrote:

> Mark J wrote:
>
> >> Because it's not as straight-forward as saying d.update(other,
> >> collision_handler) and it's about 10 times slower.
>
> Erik Max Francis <max [at] alcyone> wrote:
>
> > Ten times slower? What in the world gives you that idea?
>
> Try it:

That is among the silliest tests I've ever seen. The current behavior
does _not_ do collision testing, and so you're not comparing manual
collision testing with your proposed collision testing, you're comparing
manual collision testing with _nothing_. Of _course_ it's slower, but
that doesn't mean that your proposed method would be any faster.

Your proposal would have to call out to the provided Python collision
handler function, which is going to be the bottleneck, and so it is
highly unlikely to be significantly faster.

--
Erik Max Francis / max [at] alcyone / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ Don't overestimate the decency of the human race.
\__/ H.L. Mencken
Erik Max Francis' bookmarks / http://www.alcyone.com/max/links/
A highly categorized list of Web links.


db3l at fitlinxx

Oct 12, 2001, 5:00 PM

Post #9 of 13 (949 views)
Permalink
dictionary.update() enhancement? [In reply to]

maj64 [at] hotmail (Mark J) writes:

> The above code merely preserves the semantics of the existing update
> function since that's the fairest test until dictionary.update() is
> changed. (Note: removing the conditional in withloop() still keeps
> it about 10x slower.)

But it's not the fairest test for prototyping the overhead - having
just the call to update never uses any callback to Python functions as
the proposed change would. Thus, you stay entirely within the
internal C code for the update operation. That's a major difference
from the proposed change.

> Of course, the performance difference will decrease as the complexity
> of the collision function increases, but many useful collision
> functions are about as simple as or even simpler than the one
> currently used by dictionary.update().

Except that they won't be written in C. The overhead of getting out
to a Python callback function within the update processing will probably
be the major factor, as much as whatever that collision function does.

> By claiming only 10x faster in my last message, I was trying to
> (reasonably and fairly?) account for the extra overhead of running the
> collision function within update().
>
> Please correct me if I'm missing something. I'd also be interesting
> in hearing any big discrepancies on the performance factor on
> different platforms.

I think you're severely underestimating the overhead of calling back
out to Python processing code for the comparison function, and your
test case doesn't even try to simulate any overhead for that. Sure, a
Python loop against complete C code is a big difference, but adding a
Python callback changes that a lot since now you have significant
Python code internal to the process in both cases, but in the callback
you're invoking a function each time.

While not precisely apples to apples, I think this is a reasonable
point of comparison. In one case I sort a list just using the normal
internal sort comparison. In the second and third cases I use a
callback function. The first callback just calls the same internal
function that would be used anyway (but thus calls into the
interpreter C code again within the callback) - the second keeps it in
pure Python code, about as close to a minimum implementation as it
can. Because the list is already sorted, there are 9999 callbacks to
the comparison routine, or just about the same as one callback per
entry as with a dictionary update.

import time

list = range(10000)

def mycmp(x,y):
return cmp(x,y)

def mycmp2(x,y):
return (x-y)

start = time.clock()
for i in range(100):
list.sort()
finish = time.clock()

print (finish-start)/100.0

start = time.clock()
for i in range(100):
list.sort(mycmp)
finish = time.clock()

print (finish-start)/100.0

start = time.clock()
for i in range(100):
list.sort(mycmp2)
finish = time.clock()

print (finish-start)/100.0


On my machine, here's what I got:

No cmp 0.00144961120768
mycmp 0.0589527414929
mycmp2 0.0315124934838

So you can see, that even in the simplest callback case, there's a
factor of about 22x between not having a Python code callback, and
having one.

Stretching a bit and extrapolating to your test (showing 23.9x faster
for a straight update versus a loop), that would seem to imply good
odds that having a callback in the update might be darn close in
performance to the explicitly written loop.

In any event, I wouldn't think it high odds that you'll actually see
it 10x faster with the callback.

--
-- David
--
/-----------------------------------------------------------------------\
\ David Bolen \ E-mail: db3l [at] fitlinxx /
| FitLinxx, Inc. \ Phone: (203) 708-5192 |
/ 860 Canal Street, Stamford, CT 06902 \ Fax: (203) 316-5150 \
\-----------------------------------------------------------------------/


tim.one at home

Oct 12, 2001, 7:54 PM

Post #10 of 13 (950 views)
Permalink
dictionary.update() enhancement? [In reply to]

[Mark J]
> A long while ago there was a big debate here regarding the desired
> behavior of the dictionary update method when it is asked to update a
> key which already exists -- whether it should keep the existing
> dictionary value, overwrite the value, or raise an exception, etc.
>
> Looking through the archives, it appears that the following simple,
> powerful, and flexible alternative was never considered: have update
> accept an optional function parameter. This "collision function"
> would be called with the two respective values when a key collision
> occurs.
>
> ...
>
> The above enhancement ...
> would be much faster than anything that can currently be implemented
> within Python (by subclassing dictionary and overriding the update
> method with hand iteration).

Unfortunately, it would be slower than hand iteration. David Bolen went
into the most detail about that, and he's right: the overhead of calling
back into a Python function each time would kill performance.

Compare, e.g.,

x = 3
for i in xrange(1000000):
y = x+x

to

def add(i, j):
return i+j
x = 3
for i in xrange(1000000):
y = add(x, x)

for a direct comparison of doing a thing by hand inline versus calling a
Python function to do it.

I should also note that the actual implementation of dict.update does no
"collision checking" at all; like its docstring says, it's more akin to

>>> print {}.update.__doc__
D.update(E) -> None. Update D from E: for k in E.keys(): D[k] = E[k]
>>>

except that if E is a true dict (or in 2.2 also a subclass of dict) it
iterates over E's key+value pairs directly without materializing a list of
the keys (something you can do yourself in 2.2 via the new
dict.iteritems()).

if-speed-is-the-goal-the-status-quo-wins-ly y'rs - tim


maj64 at hotmail

Oct 13, 2001, 11:31 AM

Post #11 of 13 (951 views)
Permalink
dictionary.update() enhancement? [In reply to]

"Tim Peters" <tim.one [at] home> wrote in message news:<mailman.1002941714.8052.python-list [at] python>...
> [Mark J]
> > The above enhancement ...
> > would be much faster than anything that can currently be implemented
> > within Python (by subclassing dictionary and overriding the update
> > method with hand iteration).
>
> Unfortunately, it would be slower than hand iteration. David Bolen went
> into the most detail about that, and he's right: the overhead of calling
> back into a Python function each time would kill performance.
>
> Compare, e.g.,
>
> x = 3
> for i in xrange(1000000):
> y = x+x
>
> to
>
> def add(i, j):
> return i+j
> x = 3
> for i in xrange(1000000):
> y = add(x, x)
>
> for a direct comparison of doing a thing by hand inline versus calling a
> Python function to do it.

When I ran the above comparison, the extra call to the Python function
performed a bit less than half as fast as the inline function (43%).
This is about the cost that I was figuring on.

> I should also note that the actual implementation of dict.update does no
> "collision checking" at all;

Ya, I should have kept the collision conditional out of the loop in
withloop() for a fairer test. Without the collision test, the hand
iteration is still about 10x slower. When I used other.iteritems()
instead of other.items() with Python2.2a4, I didn't notice a change in
performance until dictsize was over about 3000. After about
dictsize=5000, there was roughly a 2x speedup using iteritems() over
item().

I'll drop the case. Nonetheless, I appreciate the discussion...

Mark


bokr at accessone

Oct 13, 2001, 6:15 PM

Post #12 of 13 (946 views)
Permalink
dictionary.update() enhancement? [In reply to]

On Fri, 12 Oct 2001 22:54:27 -0400, "Tim Peters" <tim.one [at] home> wrote:

>[Mark J]
[...]
>> would be called with the two respective values when a key collision
^^^^^^^^^^^^^^^^^^^^
>> occurs.
^^^^^^
[...]
>
>Unfortunately, it would be slower than hand iteration. David Bolen went
>into the most detail about that, and he's right: the overhead of calling
>back into a Python function each time would kill performance.
^^^^^^^^^

ISTM you're talking about different things ;-)

The way I read it, the collision *detection* would be done
without calling the policy-defining function (presumably
in C modified to make the test in C and conditionally call back).

Then results would depend on the collision percentage, but
in general there should be a gain over a totally-by-hand loop, IWT.


max at alcyone

Oct 13, 2001, 11:39 PM

Post #13 of 13 (949 views)
Permalink
dictionary.update() enhancement? [In reply to]

Bengt Richter wrote:

> ISTM you're talking about different things ;-)

I'm sure the "each time" here meant during each collision.

--
Erik Max Francis / max [at] alcyone / http://www.alcyone.com/max/
__ San Jose, CA, US / 37 20 N 121 53 W / ICQ16063900 / &tSftDotIotE
/ \ Every path has its puddle.
\__/ (an English proverb)
Alcyone Systems' Daily Planet / http://www.alcyone.com/planet.html
A new, virtual planet, every day.

Python python RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.