Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Python

Feature suggestion: sum() ought to use a compensated summation algorithm

 

 

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded


szhorvat at gmail

May 3, 2008, 9:50 AM

Post #1 of 34 (966 views)
Permalink
Feature suggestion: sum() ought to use a compensated summation algorithm

I did the following calculation: Generated a list of a million random
numbers between 0 and 1, constructed a new list by subtracting the mean
value from each number, and then calculated the mean again.

The result should be 0, but of course it will differ from 0 slightly
because of rounding errors.

However, I noticed that the simple Python program below gives a result
of ~ 10^-14, while an equivalent Mathematica program (also using double
precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
more precise.

Here's the program (pardon my style, I'm a newbie/occasional user):

from random import random

data = [random() for x in xrange(1000000)]

mean = sum(data)/len(data)
print sum(x - mean for x in data)/len(data)

A little research shows that Mathematica uses a "compensated summation"
algorithm. Indeed, using the algorithm described at
http://en.wikipedia.org/wiki/Kahan_summation_algorithm
gives us a result around ~ 10^-17:

def compSum(arr):
s = 0.0
c = 0.0
for x in arr:
y = x-c
t = s+y
c = (t-s) - y
s = t
return s

mean = compSum(data)/len(data)
print compSum(x - mean for x in data)/len(data)


I thought that it would be very nice if the built-in sum() function used
this algorithm by default. Has this been brought up before? Would this
have any disadvantages (apart from a slight performance impact, but
Python is a high-level language anyway ...)?

Szabolcs Horvát
--
http://mail.python.org/mailman/listinfo/python-list


arnodel at googlemail

May 3, 2008, 10:40 AM

Post #2 of 34 (949 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Szabolcs Horvát <szhorvat [at] gmail> writes:

[...]
> A little research shows that Mathematica uses a "compensated
> summation" algorithm. Indeed, using the algorithm described at
> http://en.wikipedia.org/wiki/Kahan_summation_algorithm
> gives us a result around ~ 10^-17:
>
> def compSum(arr):
> s = 0.0
> c = 0.0
> for x in arr:
> y = x-c
> t = s+y
> c = (t-s) - y
> s = t
> return s
>
> mean = compSum(data)/len(data)
> print compSum(x - mean for x in data)/len(data)
>
>
> I thought that it would be very nice if the built-in sum() function
> used this algorithm by default. Has this been brought up before?
> Would this have any disadvantages (apart from a slight performance
> impact, but Python is a high-level language anyway ...)?
>
> Szabolcs Horvát

sum() works for any sequence of objects with an __add__ method, not
just floats! Your algorithm is specific to floats.

--
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list


ivan.illarionov at gmail

May 3, 2008, 11:04 AM

Post #3 of 34 (950 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On Sat, 03 May 2008 18:50:34 +0200, Szabolcs Horvát wrote:

> I did the following calculation: Generated a list of a million random
> numbers between 0 and 1, constructed a new list by subtracting the mean
> value from each number, and then calculated the mean again.
>
> The result should be 0, but of course it will differ from 0 slightly
> because of rounding errors.
>
> However, I noticed that the simple Python program below gives a result
> of ~ 10^-14, while an equivalent Mathematica program (also using double
> precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
> more precise.
>
> Here's the program (pardon my style, I'm a newbie/occasional user):
>
> from random import random
>
> data = [random() for x in xrange(1000000)]
>
> mean = sum(data)/len(data)
> print sum(x - mean for x in data)/len(data)
>
> A little research shows that Mathematica uses a "compensated summation"
> algorithm. Indeed, using the algorithm described at
> http://en.wikipedia.org/wiki/Kahan_summation_algorithm gives us a result
> around ~ 10^-17:
>
> def compSum(arr):
> s = 0.0
> c = 0.0
> for x in arr:
> y = x-c
> t = s+y
> c = (t-s) - y
> s = t
> return s
>
> mean = compSum(data)/len(data)
> print compSum(x - mean for x in data)/len(data)
>
>
> I thought that it would be very nice if the built-in sum() function used
> this algorithm by default. Has this been brought up before? Would this
> have any disadvantages (apart from a slight performance impact, but
> Python is a high-level language anyway ...)?
>
> Szabolcs Horvát

Built-in sum should work with everything, not just floats. I think it
would be useful addition to standard math module.

If you know C you could write a patch to mathmodule.c and submit it to
Python devs.

--
Ivan
--
http://mail.python.org/mailman/listinfo/python-list


nick at craig-wood

May 3, 2008, 11:30 AM

Post #4 of 34 (950 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Szabolcs Horvát <szhorvat [at] gmail> wrote:
> I did the following calculation: Generated a list of a million random
> numbers between 0 and 1, constructed a new list by subtracting the mean
> value from each number, and then calculated the mean again.
>
> The result should be 0, but of course it will differ from 0 slightly
> because of rounding errors.
>
> However, I noticed that the simple Python program below gives a result
> of ~ 10^-14, while an equivalent Mathematica program (also using double
> precision) gives a result of ~ 10^-17, i.e. three orders of magnitude
> more precise.
>
> Here's the program (pardon my style, I'm a newbie/occasional user):
>
> from random import random
>
> data = [random() for x in xrange(1000000)]
>
> mean = sum(data)/len(data)
> print sum(x - mean for x in data)/len(data)
>
> A little research shows that Mathematica uses a "compensated summation"
> algorithm. Indeed, using the algorithm described at
> http://en.wikipedia.org/wiki/Kahan_summation_algorithm
> gives us a result around ~ 10^-17:

I was taught in my numerical methods course (about 20 years ago now!)
that the way to do this sum with most accuracy is to sum from the
smallest magnitude to the largest magnitude.

Eg

>>> from random import random
>>> data = [random() for x in xrange(1000000)]
>>> mean = sum(data)/len(data)
>>> print sum(x - mean for x in data)/len(data)
1.64152134108e-14
>>> mean = sum(sorted(data))/len(data)
>>> print sum(x - mean for x in data)/len(data)
5.86725534824e-15
>>>

It isn't as good as the compensated sum but it is easy!

> I thought that it would be very nice if the built-in sum() function used
> this algorithm by default. Has this been brought up before? Would this
> have any disadvantages (apart from a slight performance impact, but
> Python is a high-level language anyway ...)?

sum() gets used for any numerical types not just floats...

--
Nick Craig-Wood <nick [at] craig-wood> -- http://www.craig-wood.com/nick
--
http://mail.python.org/mailman/listinfo/python-list


szhorvat at gmail

May 3, 2008, 11:44 AM

Post #5 of 34 (951 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Arnaud Delobelle wrote:
>
> sum() works for any sequence of objects with an __add__ method, not
> just floats! Your algorithm is specific to floats.

This occurred to me also, but then I tried

sum(['abc', 'efg'], '')

and it did not work. Or is this just a special exception to prevent the
misuse of sum to join strings? (As I said, I'm only an occasional user.)

Generally, sum() seems to be most useful for numeric types (i.e. those
that form a group with respect to __add__ and __neg__/__sub__), which
may be either exact (e.g. integers) or inexact (e.g. floating point
types). For exact types it does not make sense to use compensated
summation (though it wouldn't give an incorrect answer, either), and
sum() cannot decide whether a user-defined type is exact or inexact.

But I guess that it would still be possible to make sum() use
compensated summation for built-in floating point types (float/complex).

Or, to go further, provide a switch to choose between the two methods,
and make use compensated summation for float/complex by default. (But
perhaps people would consider this last alternative a bit too messy.)

(Just some thoughts ...)
--
http://mail.python.org/mailman/listinfo/python-list


hdante at gmail

May 3, 2008, 1:13 PM

Post #6 of 34 (949 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 3, 3:44 pm, Szabolcs Horvát <szhor...@gmail.com> wrote:
> Arnaud Delobelle wrote:
>
> > sum() works for any sequence of objects with an __add__ method, not
> > just floats!  Your algorithm is specific to floats.
>
> This occurred to me also, but then I tried
>
> sum(['abc', 'efg'], '')
>
> and it did not work.  Or is this just a special exception to prevent the
>   misuse of sum to join strings?  (As I said, I'm only an occasional user.)
>
> Generally, sum() seems to be most useful for numeric types (i.e. those
> that form a group with respect to __add__ and __neg__/__sub__), which
> may be either exact (e.g. integers) or inexact (e.g. floating point
> types).  For exact types it does not make sense to use compensated
> summation (though it wouldn't give an incorrect answer, either), and
> sum() cannot decide whether a user-defined type is exact or inexact.
>
> But I guess that it would still be possible to make sum() use
> compensated summation for built-in floating point types (float/complex).
>
> Or, to go further, provide a switch to choose between the two methods,
> and make use compensated summation for float/complex by default.  (But
> perhaps people would consider this last alternative a bit too messy.)
>
> (Just some thoughts ...)

The benefits should be weighted considering the common case. For
example, do you find an error of 10^-14 unbearable ? How many times
someone will add 1.000.000 numbers in a typical scientific application
written in python ?

I believe that moving this to third party could be better. What about
numpy ? Doesn't it already have something similar ?
--
http://mail.python.org/mailman/listinfo/python-list


arnodel at googlemail

May 3, 2008, 1:38 PM

Post #7 of 34 (947 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Szabolcs Horvát <szhorvat [at] gmail> writes:

> Arnaud Delobelle wrote:
>>
>> sum() works for any sequence of objects with an __add__ method, not
>> just floats! Your algorithm is specific to floats.
>
> This occurred to me also, but then I tried
>
> sum(['abc', 'efg'], '')
>
> and it did not work. Or is this just a special exception to prevent
> the misuse of sum to join strings? (As I said, I'm only an occasional
> user.)

I think that's right: anything with an __add__ method, apart from
string, can be sum()ed.

--
Arnaud
--
http://mail.python.org/mailman/listinfo/python-list


max at alcyone

May 3, 2008, 2:20 PM

Post #8 of 34 (948 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Szabolcs Horvát wrote:

> Arnaud Delobelle wrote:
>>
>> sum() works for any sequence of objects with an __add__ method, not
>> just floats! Your algorithm is specific to floats.
>
> This occurred to me also, but then I tried
>
> sum(['abc', 'efg'], '')
>
> and it did not work. Or is this just a special exception to prevent the
> misuse of sum to join strings? (As I said, I'm only an occasional user.)

What you wrote is nonsensical there, no different from 'a' + 1 -- which
is why it quite rightly raises a TypeError.

You're trying to add a list to a string, which is nonsensical. You add
strings to strings, or lists to lists, but mixing them up doesn't make
sense. Python can't guess what you mean when you write something like
['abc', 'def'] + '' -- which is the functional equivalent of your call
to sum -- and so doesn't try. It indicates this by raising a TypeError.

--
Erik Max Francis && max [at] alcyone && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
--
http://mail.python.org/mailman/listinfo/python-list


bronger at physik

May 3, 2008, 2:27 PM

Post #9 of 34 (947 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Hallöchen!

Erik Max Francis writes:

> Szabolcs Horvát wrote:
>
>> Arnaud Delobelle wrote:
>>>
>>> sum() works for any sequence of objects with an __add__ method, not
>>> just floats! Your algorithm is specific to floats.
>>
>> This occurred to me also, but then I tried
>>
>> sum(['abc', 'efg'], '')
>>
>> and it did not work. Or is this just a special exception to
>> prevent the misuse of sum to join strings? (As I said, I'm only
>> an occasional user.)
>
> What you wrote is nonsensical there, no different from 'a' + 1 --
> which is why it quite rightly raises a TypeError.

No, the above expression should yield ''+'abc'+'efg', look for the
signature of sum in the docs.

Tschö,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus
Jabber ID: bronger [at] jabber
(See http://ime.webhop.org for further contact info.)
--
http://mail.python.org/mailman/listinfo/python-list


max at alcyone

May 3, 2008, 2:31 PM

Post #10 of 34 (948 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Torsten Bronger wrote:

> No, the above expression should yield ''+'abc'+'efg', look for the
> signature of sum in the docs.

You're absolutely right, I misread it. Sorry about that.

--
Erik Max Francis && max [at] alcyone && http://www.alcyone.com/max/
San Jose, CA, USA && 37 18 N 121 57 W && AIM, Y!M erikmaxfrancis
--
http://mail.python.org/mailman/listinfo/python-list


ivan.illarionov at gmail

May 3, 2008, 2:37 PM

Post #11 of 34 (949 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:

> Arnaud Delobelle wrote:
>>
>> sum() works for any sequence of objects with an __add__ method, not
>> just floats! Your algorithm is specific to floats.
>
> This occurred to me also, but then I tried
>
> sum(['abc', 'efg'], '')

Interesting, I always thought that sum is like shortcut of
reduce(operator.add, ...), but I was mistaken.

reduce() is more forgiving:
reduce(operator.add, ['abc', 'efg'], '' ) # it works
'abcefg'

--
Ivan
--
http://mail.python.org/mailman/listinfo/python-list


sturlamolden at yahoo

May 3, 2008, 3:05 PM

Post #12 of 34 (950 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 3, 10:13 pm, hdante <hda...@gmail.com> wrote:

> I believe that moving this to third party could be better. What about
> numpy ? Doesn't it already have something similar ?

Yes, Kahan summation makes sence for numpy arrays. But the problem
with this algorithm is optimizing compilers. The programmer will be
forced to use tricks like inline assembly to get around the optimizer.
If not, the optimizer would remove the desired features of the
algorithm. But then we have a serious portability problem.
--
http://mail.python.org/mailman/listinfo/python-list


lobais at gmail

May 3, 2008, 3:31 PM

Post #13 of 34 (947 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
> On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
>
> > Arnaud Delobelle wrote:
> >>
> >> sum() works for any sequence of objects with an __add__ method, not
> >> just floats! Your algorithm is specific to floats.
> >
> > This occurred to me also, but then I tried
> >
> > sum(['abc', 'efg'], '')
>
> Interesting, I always thought that sum is like shortcut of
> reduce(operator.add, ...), but I was mistaken.
>
> reduce() is more forgiving:
> reduce(operator.add, ['abc', 'efg'], '' ) # it works
> 'abcefg'

Hm, it works for lists:
sum(([1], [2]), [])
[1, 2]

However I find the seccond argument hack ugly.
Does the sum way have any performance advantages over the reduce way?

--
Best Regards,
Med Venlig Hilsen,
Thomas

--
http://mail.python.org/mailman/listinfo/python-list


ivan.illarionov at gmail

May 3, 2008, 4:12 PM

Post #14 of 34 (941 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:

> On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
>> On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
>>
>> > Arnaud Delobelle wrote:
>> >>
>> >> sum() works for any sequence of objects with an __add__ method, not
>> >> just floats! Your algorithm is specific to floats.
>> >
>> > This occurred to me also, but then I tried
>> >
>> > sum(['abc', 'efg'], '')
>>
>> Interesting, I always thought that sum is like shortcut of
>> reduce(operator.add, ...), but I was mistaken.
>>
>> reduce() is more forgiving:
>> reduce(operator.add, ['abc', 'efg'], '' ) # it works 'abcefg'
>
> Hm, it works for lists:
> sum(([1], [2]), [])
> [1, 2]
>
> However I find the seccond argument hack ugly. Does the sum way have any
> performance advantages over the reduce way?

Yes, sum() is faster:

$ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
100000 loops, best of 3: 6.16 usec per loop

$ python -m timeit "import operator" \
"reduce(operator.add, [[1], [2], [3, 4]])"
100000 loops, best of 3: 11.9 usec per loop

--
Ivan
--
http://mail.python.org/mailman/listinfo/python-list


george.sakkis at gmail

May 3, 2008, 5:43 PM

Post #15 of 34 (942 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 3, 7:12 pm, Ivan Illarionov <ivan.illario...@gmail.com> wrote:
> On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:
> > On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
> >> On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
>
> >> > Arnaud Delobelle wrote:
>
> >> >> sum() works for any sequence of objects with an __add__ method, not
> >> >> just floats!  Your algorithm is specific to floats.
>
> >> > This occurred to me also, but then I tried
>
> >> > sum(['abc', 'efg'], '')
>
> >> Interesting, I always thought that sum is like shortcut of
> >> reduce(operator.add, ...), but I was mistaken.
>
> >> reduce() is more forgiving:
> >> reduce(operator.add, ['abc', 'efg'], '' ) # it works 'abcefg'
> > Hm, it works for lists:
> > sum(([1], [2]), [])
> > [1, 2]

So it's not reduce() that is more forgiving, it's sum() that makes an
exception for strings only.


> > However I find the seccond argument hack ugly. Does the sum way have any
> > performance advantages over the reduce way?
>
> Yes, sum() is faster:
>
> $ python -m timeit "" "sum([[1], [2], [3, 4]], [])"
> 100000 loops, best of 3: 6.16 usec per loop
>
> $ python -m timeit "import operator" \
> "reduce(operator.add, [[1], [2], [3, 4]])"
> 100000 loops, best of 3: 11.9 usec per loop

Not really; you measure the import and the attribute access time in
the second case. sum() is 2x faster for adding numbers only:

# Adding integers
python -mtimeit --setup="x=[1]*100" "sum(x)"
100000 loops, best of 3: 4.87 usec per loop
python -mtimeit --setup="from operator import add; x=[1]*100"
"reduce(add,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding floats
python -mtimeit --setup="x=[1.0]*100" "sum(x)"
100000 loops, best of 3: 5.14 usec per loop
python -mtimeit --setup="from operator import add; x=[1.0]*100"
"reduce(add,x)"
100000 loops, best of 3: 10.1 usec per loop

# Adding tuples
python -mtimeit --setup="x=[(1,)]*100" "sum(x,())"
10000 loops, best of 3: 61.6 usec per loop
python -mtimeit --setup="from operator import add; x=[(1,)]*100"
"reduce(add,x,())"
10000 loops, best of 3: 66.3 usec per loop

# Adding lists
python -mtimeit --setup="x=[[1]]*100" "sum(x,[])"
10000 loops, best of 3: 79.9 usec per loop
python -mtimeit --setup="from operator import add; x=[[1]]*100"
"reduce(add,x,[])"
10000 loops, best of 3: 84.3 usec per loop

George
--
http://mail.python.org/mailman/listinfo/python-list


ivan.illarionov at gmail

May 3, 2008, 6:20 PM

Post #16 of 34 (944 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On Sat, 03 May 2008 17:43:57 -0700, George Sakkis wrote:

> On May 3, 7:12 pm, Ivan Illarionov <ivan.illario...@gmail.com> wrote:
>> On Sun, 04 May 2008 00:31:01 +0200, Thomas Dybdahl Ahle wrote:
>> > On Sat, 2008-05-03 at 21:37 +0000, Ivan Illarionov wrote:
>> >> On Sat, 03 May 2008 20:44:19 +0200, Szabolcs Horvát wrote:
>>
>> >> > Arnaud Delobelle wrote:
>>
>> >> >> sum() works for any sequence of objects with an __add__ method,
>> >> >> not just floats!  Your algorithm is specific to floats.
>>
>> >> > This occurred to me also, but then I tried
>>
>> >> > sum(['abc', 'efg'], '')
>>
>> >> Interesting, I always thought that sum is like shortcut of
>> >> reduce(operator.add, ...), but I was mistaken.
>>
>> >> reduce() is more forgiving:
>> >> reduce(operator.add, ['abc', 'efg'], '' ) # it works 'abcefg'
>> > Hm, it works for lists:
>> > sum(([1], [2]), [])
>> > [1, 2]
>
> So it's not reduce() that is more forgiving, it's sum() that makes an
> exception for strings only.
>
>
>> > However I find the seccond argument hack ugly. Does the sum way have
>> > any performance advantages over the reduce way?
>>
>> Yes, sum() is faster:
>>
>> $ python -m timeit "" "sum([[1], [2], [3, 4]], [])" 100000 loops, best
>> of 3: 6.16 usec per loop
>>
>> $ python -m timeit "import operator" \ "reduce(operator.add, [[1], [2],
>> [3, 4]])" 100000 loops, best of 3: 11.9 usec per loop
>
> Not really; you measure the import and the attribute access time in the
> second case. sum() is 2x faster for adding numbers only:
>
> # Adding integers
> python -mtimeit --setup="x=[1]*100" "sum(x)" 100000 loops, best of 3:
> 4.87 usec per loop python -mtimeit --setup="from operator import add;
> x=[1]*100" "reduce(add,x)"
> 100000 loops, best of 3: 10.1 usec per loop
>
> # Adding floats
> python -mtimeit --setup="x=[1.0]*100" "sum(x)" 100000 loops, best of 3:
> 5.14 usec per loop python -mtimeit --setup="from operator import add;
> x=[1.0]*100" "reduce(add,x)"
> 100000 loops, best of 3: 10.1 usec per loop
>
> # Adding tuples
> python -mtimeit --setup="x=[(1,)]*100" "sum(x,())" 10000 loops, best of
> 3: 61.6 usec per loop python -mtimeit --setup="from operator import add;
> x=[(1,)]*100" "reduce(add,x,())"
> 10000 loops, best of 3: 66.3 usec per loop
>
> # Adding lists
> python -mtimeit --setup="x=[[1]]*100" "sum(x,[])" 10000 loops, best of
> 3: 79.9 usec per loop python -mtimeit --setup="from operator import add;
> x=[[1]]*100" "reduce(add,x,[])"
> 10000 loops, best of 3: 84.3 usec per loop
>
> George

Thanks for correction. Forget about --setup.

--
Ivan
--
http://mail.python.org/mailman/listinfo/python-list


hdante at gmail

May 4, 2008, 8:19 AM

Post #17 of 34 (937 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 3, 7:05 pm, sturlamolden <sturlamol...@yahoo.no> wrote:
> On May 3, 10:13 pm, hdante <hda...@gmail.com> wrote:
>
> >  I believe that moving this to third party could be better. What about
> > numpy ? Doesn't it already have something similar ?
>
> Yes, Kahan summation makes sence for numpy arrays. But the problem
> with this algorithm is optimizing compilers. The programmer will be

No, optimizing compilers shouldn't discard floating point operations
by default, since it changes program behavior. If they do, at least
there should be a flag to turn it off.

> forced to use tricks like inline assembly to get around the optimizer.
> If not, the optimizer would remove the desired features of the
> algorithm. But then we have a serious portability problem.

--
http://mail.python.org/mailman/listinfo/python-list


duncan.booth at invalid

May 4, 2008, 8:58 AM

Post #18 of 34 (935 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Szabolcs Horvát <szhorvat [at] gmail> wrote:

> I thought that it would be very nice if the built-in sum() function used
> this algorithm by default. Has this been brought up before? Would this
> have any disadvantages (apart from a slight performance impact, but
> Python is a high-level language anyway ...)?

There's a thread you might find interesting:

http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29faf92f532e/027cef7d4429aa3a

In that thread I suggested that Python ought to implement sum by adding
together each pair of values, then each pair of results and so on. This
means that for input values of a similar magnitude the intermediate results
stay balanced. The implementation doesn't actually do all the first level
sums first, it builds up a stack containing the sum of 2^k, 2^(k-1)...2
values. Also it works for other types as well, and for strings or lists has
the advantage of minimising the number of large string or list concatenations.

If you follow that thread far enough you'll find a post by Jussi Salmela
which shows that summing adjacent pairs performs as well as Kahan summation
(he says 'almost as good' but in fact the only time they get different
results the 'correct' answer is 500000.000005 and Kahan and sumpairs get
the two nearest representable results either side: 500000.00000499998 and
500000.00000500004 respectively).

I never did get round to re-coding my sumpairs() function in C, I would be
interested to see just how much overhead it introduces (and I suspect it
would be faster than the existing sum in some cases such as for lists).
--
http://mail.python.org/mailman/listinfo/python-list


gagsl-py2 at yahoo

May 4, 2008, 5:14 PM

Post #19 of 34 (930 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

En Sun, 04 May 2008 12:58:25 -0300, Duncan Booth <duncan.booth [at] invalid> escribió:

> Szabolcs Horvát <szhorvat [at] gmail> wrote:
>
>> I thought that it would be very nice if the built-in sum() function used
>> this algorithm by default. Has this been brought up before? Would this
>> have any disadvantages (apart from a slight performance impact, but
>> Python is a high-level language anyway ...)?
>
> There's a thread you might find interesting:
>
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29faf92f532e/027cef7d4429aa3a
>
> In that thread I suggested that Python ought to implement sum by adding
> together each pair of values, then each pair of results and so on. This
> means that for input values of a similar magnitude the intermediate results
> stay balanced. The implementation doesn't actually do all the first level

Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement. But if you know that you're adding floating point numbers you can use whatever algorithm best fits you. Or use numpy arrays; I think they implement Kahan summation or a similar algorithm.

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


szhorvat at gmail

May 5, 2008, 12:31 AM

Post #20 of 34 (930 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Duncan Booth wrote:
> Szabolcs Horvát <szhorvat [at] gmail> wrote:
>
>> I thought that it would be very nice if the built-in sum() function used
>> this algorithm by default. Has this been brought up before? Would this
>> have any disadvantages (apart from a slight performance impact, but
>> Python is a high-level language anyway ...)?
>
> There's a thread you might find interesting:
>
> http://groups.google.com/group/comp.lang.python/browse_thread/thread/9eda29faf92f532e/027cef7d4429aa3a
>
> In that thread I suggested that Python ought to implement sum by adding
> together each pair of values, then each pair of results and so on. This
> means that for input values of a similar magnitude the intermediate results
> stay balanced. The implementation doesn't actually do all the first level
> sums first, it builds up a stack containing the sum of 2^k, 2^(k-1)...2
> values. Also it works for other types as well, and for strings or lists has
> the advantage of minimising the number of large string or list concatenations.
>
> If you follow that thread far enough you'll find a post by Jussi Salmela
> which shows that summing adjacent pairs performs as well as Kahan summation
> (he says 'almost as good' but in fact the only time they get different
> results the 'correct' answer is 500000.000005 and Kahan and sumpairs get
> the two nearest representable results either side: 500000.00000499998 and
> 500000.00000500004 respectively).
>
> I never did get round to re-coding my sumpairs() function in C, I would be
> interested to see just how much overhead it introduces (and I suspect it
> would be faster than the existing sum in some cases such as for lists).

Thanks for this link! Though some of the thread is missing, it was an
interesting read.

This sounds like a very good idea: it is more generally applicable than
the Kahan summation because it does not require subtraction/negation, so
it would work with user-defined types, too.

While now I am convinced that using Kahan summation by default is not
such a good idea, I cannot see any serious disadvantages/problems with
pairwise summation!
--
http://mail.python.org/mailman/listinfo/python-list


szhorvat at gmail

May 5, 2008, 12:37 AM

Post #21 of 34 (928 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

Gabriel Genellina wrote:
>
> Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement.

It does not _require_ this, but using an __add__ that is not commutative
and associative, or has side effects, would qualify as a serious misuse,
anyway. So I think that this isn't a real disadvantage (it can always
be documented that sum() expects __add__ to have these properties).

But you are right that summing floats with a requirement of high
precision is a quite specific use case, so there are no serious
arguments to do this, either.
--
http://mail.python.org/mailman/listinfo/python-list


python at rcn

May 5, 2008, 1:08 AM

Post #22 of 34 (925 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 3, 9:50 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
> I did the following calculation:  Generated a list of a million random
> numbers between 0 and 1, constructed a new list by subtracting the mean
> value from each number, and then calculated the mean again.
>
> The result should be 0, but of course it will differ from 0 slightly
> because of rounding errors.

See:
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/393090



> Here's the program (pardon my style, I'm a newbie/occasional user):
. . .
> mean = sum(data)/len(data)
> print sum(x - mean for x in data)/len(data)

See:
http://svn.python.org/view/sandbox/trunk/statistics/statistics.py?rev=40981&view=markup


Raymond

--
http://mail.python.org/mailman/listinfo/python-list


python at rcn

May 5, 2008, 1:16 AM

Post #23 of 34 (925 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

> > > However I find the seccond argument hack ugly. Does the sum way have any
> > > performance advantages over the reduce way?
>
> > Yes, sum() is faster:
...

> Not really; you measure the import and the attribute access time in
> the second case. sum() is 2x faster for adding numbers only:

Try it with Py2.6 or Py3.0. The sum() function has been optimized and
should be at least 6x faster for summing ints and floats. It should
even beat psyco optimized sums!


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


szhorvat at gmail

May 5, 2008, 1:26 AM

Post #24 of 34 (924 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

On May 5, 9:37 am, Szabolcs Horvát <szhor...@gmail.com> wrote:
> Gabriel Genellina wrote:
>
> > Python doesn't require __add__ to be associative, so this should not be used as a general sum replacement.
>
> It does not _require_ this, but using an __add__ that is not commutative
> and associative, or has side effects, would qualify as a serious misuse,
> anyway.

Sorry, I meant associative only, not commutative.
--
http://mail.python.org/mailman/listinfo/python-list


duncan.booth at invalid

May 5, 2008, 1:37 AM

Post #25 of 34 (927 views)
Permalink
Re: Feature suggestion: sum() ought to use a compensated summation algorithm [In reply to]

"Gabriel Genellina" <gagsl-py2 [at] yahoo> wrote:

> En Sun, 04 May 2008 12:58:25 -0300, Duncan Booth
> <duncan.booth [at] invalid> escribió:
>
>> Szabolcs Horvát <szhorvat [at] gmail> wrote:
>>
>>> I thought that it would be very nice if the built-in sum() function
>>> used this algorithm by default. Has this been brought up before?
>>> Would this have any disadvantages (apart from a slight performance
>>> impact, but Python is a high-level language anyway ...)?
>>
>> There's a thread you might find interesting:
>>
>> http://groups.google.com/group/comp.lang.python/browse_thread/thread/9
>> eda29faf92f532e/027cef7d4429aa3a
>>
>> In that thread I suggested that Python ought to implement sum by
>> adding together each pair of values, then each pair of results and so
>> on. This means that for input values of a similar magnitude the
>> intermediate results stay balanced. The implementation doesn't
>> actually do all the first level
>
> Python doesn't require __add__ to be associative, so this should not
> be used as a general sum replacement. But if you know that you're
> adding floating point numbers you can use whatever algorithm best fits
> you. Or use numpy arrays; I think they implement Kahan summation or a
> similar algorithm.
>
Yes, my argument is more along the line that it should have been
implemented that way in the first place, but changing things now would
require time machine intervention. :)


--
http://mail.python.org/mailman/listinfo/python-list

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.