
Mailing List Archive: Python: Python

# Sort comparison

drsalists at gmail

Apr 30, 2012, 10:25 PM

Post #1 of 7 (593 views)
 Sort comparison
A while back I did a sort algorithm runtime comparison for a variety of
sorting algorithms, and then mostly sat on it.

Recently, I got into a discussion with someone on stackoverflow about the
running time of radix sort.

I realize it's commonly said that radixsort is n*k rather than n*log(n).
I've been making the case that in real life, frequently k == log(n).

Anyway, here's the comparison, with code and graph:
http://stromberg.dnsalias.org/~strombrg/sort-comparison/

(It's done in Pure python and Cython, with a little m4).

Interesting, BTW, that an unstable quicksort in Cython was approaching
timsort as a C extension module in performance, even though timsort in
Cython wasn't that performant. It makes one wonder if a highly optimized
quicksort as a C extension module wouldn't outperform timsort in the
standard library.

Yes, I know, we've committed to keeping list_.sort() stable now, and
quicksort normally isn't stable.

Also, before timsort, did CPython use quicksort? I'd be a little surprised
if it hadn't, since people used to say quicksort was the best thing around
in practice.

ian.g.kelly at gmail

May 1, 2012, 12:21 AM

Post #2 of 7 (572 views)
 Re: Sort comparison [In reply to]
On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg <drsalists [at] gmail> wrote:
>
> A while back I did a sort algorithm runtime comparison for a variety of
> sorting algorithms, and then mostly sat on it.
>
> Recently, I got into a discussion with someone on stackoverflow about the
> running time of radix sort.
>
> I realize it's commonly said that radixsort is n*k rather than n*log(n).
> I've been making that case that in real life, frequently k==log(n).
>
> Anyway, here's the comparison, with code and graph:
> http://stromberg.dnsalias.org/~strombrg/sort-comparison/

It can be difficult to distinguish O(n) from O(n log n) on a log-log
plot, so I don't think that graph makes your point very well.

> Interesting, BTW, that an unstable quicksort in Cython was approaching
> timsort as a C extension module in performance, even though timsort in
> Cython wasn't that performant.  It makes one wonder if a highly optimized
> quicksort as a C extension module wouldn't outperform timsort in the
> standard library.
>
> Yes, I know, we've committed to keeping list_.sort() stable now, and
> quicksort normally isn't stable.
>
> Also, before timsort, did CPython use quicksort?  I'd be a little surprised
> if it hadn't, since people used to say quicksort was the best thing around
> in practice.

Based on a little googling, it was quicksort prior to 1.5.2, then it
was changed to a highly optimized samplesort (a quicksort variant) /
insertion sort hybrid until the current timsort was implemented in
2.3.

Cheers,
Ian
--
http://mail.python.org/mailman/listinfo/python-list

tjreedy at udel

May 1, 2012, 10:51 AM

Post #3 of 7 (567 views)
 Re: Sort comparison [In reply to]
On 5/1/2012 1:25 AM, Dan Stromberg wrote:

> Anyway, here's the comparison, with code and graph:
> http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>
> (It's done in Pure python and Cython, with a little m4).
>
> Interesting, BTW, that an unstable quicksort in Cython was approaching
> timsort as a C extension module in performance, even though timsort in
> Cython wasn't that performant. It makes one wonder if a highly
> optimized quicksort as a C extension module wouldn't outperform timsort
> in the standard library.

Timsort is not only stable, but, by design, it is faster than quicksort
for various structured data patterns, many of which are realistic in
practice. For instance, you append k unordered items to n already sorted
items, where k << n, so that k*log(k) is on the order of n. Timsort
discovers the initial run of n sorted items, sorts the k items, and then
merges. The practical time is O(n). Ditto for sorting an already sorted
file in the reverse direction (timsort does an O(n) list reverse).
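One way to watch this happen (a quick sketch of mine, not Terry's) is to count comparisons with a wrapped comparator: the mostly-sorted input costs roughly n + k*log(k) comparisons instead of n*log(n).

```python
import functools
import random

def count_comparisons(data):
    """Sort a copy of `data` with sorted() and return how many comparisons it made."""
    counter = [0]

    def cmp(a, b):
        counter[0] += 1
        return (a > b) - (a < b)

    sorted(data, key=functools.cmp_to_key(cmp))
    return counter[0]

n, k = 10_000, 100
# n already-sorted items with k unordered items appended, versus a full shuffle.
mostly_sorted = list(range(n)) + random.sample(range(n), k)
shuffled = random.sample(range(n), n)
# Timsort detects the initial run of n sorted items, sorts the k stragglers,
# and merges, so the mostly-sorted input needs far fewer comparisons.
```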

--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list

drsalists at gmail

May 1, 2012, 11:00 AM

Post #4 of 7 (569 views)
 Re: Sort comparison [In reply to]
On Tue, May 1, 2012 at 12:21 AM, Ian Kelly <ian.g.kelly [at] gmail> wrote:

> On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg <drsalists [at] gmail>
> wrote:
> >
> > A while back I did a sort algorithm runtime comparison for a variety of
> > sorting algorithms, and then mostly sat on it.
> >
> > Recently, I got into a discussion with someone on stackoverflow about the
> > running time of radix sort.
> >
> > I realize it's commonly said that radixsort is n*k rather than n*log(n).
> > I've been making that case that in real life, frequently k==log(n).
> >
> > Anyway, here's the comparison, with code and graph:
> > http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>
> It can be difficult to distinguish O(n) from O(n log n) on a log-log
> plot, so I don't think that graph makes your point very well.
>

Do you have a suggestion for a better type of graph - or other
linearity/nonlinearity test?

I thought that on a log-log graph, a straight line was still a straight
line, but it's compressed at one end.

drsalists at gmail

May 1, 2012, 11:24 AM

Post #5 of 7 (564 views)
 Re: Sort comparison [In reply to]
On Tue, May 1, 2012 at 10:51 AM, Terry Reedy <tjreedy [at] udel> wrote:

> On 5/1/2012 1:25 AM, Dan Stromberg wrote:
>
>> Anyway, here's the comparison, with code and graph:
>> http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>>
>> (It's done in Pure python and Cython, with a little m4).
>>
>> Interesting, BTW, that an unstable quicksort in Cython was approaching
>> timsort as a C extension module in performance, even though timsort in
>> Cython wasn't that performant. It makes one wonder if a highly
>> optimized quicksort as a C extension module wouldn't outperform timsort
>> in the standard library.
>>
>
> Timsort is not only stable, but, by design, it is faster than quicksort
> for various structured data patterns, many of which are realistic in
> practice. For instance, you append k unordered items to n already sorted
> items, where k << n, so that k*log(k) is on the order of n. Timsort
> discovers the initial run of n sorted items, sorts the k items, and then
> merges. The practical time is O(n). Ditto for sorting an already sorted
> file in the reverse direction (timsort does an O(n) list reverse).
>

Yeah, I know, but thanks for making sure I did.

Timsort is a quite impressive algorithm.

I suspect that the "sorting a mostly-sorted list in near-linear time"
attribute of Timsort tends to encourage sorting in a loop though. Usually
sorting inside a loop gives a slow algorithm, even with something as nice
as Timsort.
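As a sketch of that point (mine, with hypothetical helper names): if you need the list ordered after every insertion, `bisect.insort` sidesteps the sort-in-a-loop pattern. Timsort makes each re-sort of a nearly-sorted list O(n) rather than O(n log n), but the loop as a whole is still quadratic either way.

```python
import bisect

def sort_in_loop(items):
    """Re-sorts the whole list after every append.

    Each .sort() is roughly O(n) on nearly-sorted input thanks to timsort,
    but doing it n times still makes the loop O(n^2) overall.
    """
    out = []
    for item in items:
        out.append(item)
        out.sort()
    return out

def insort_in_loop(items):
    """Keeps the list sorted with a binary search plus a single insertion."""
    out = []
    for item in items:
        bisect.insort(out, item)  # O(log n) search, O(n) shift
    return out
```

Both are still O(n^2) in the worst case (insort pays for the element shifts), so a heap or a sort-once-at-the-end approach is usually the real fix.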

ian.g.kelly at gmail

May 1, 2012, 11:52 AM

Post #6 of 7 (568 views)
 Re: Sort comparison [In reply to]
On Tue, May 1, 2012 at 12:00 PM, Dan Stromberg <drsalists [at] gmail> wrote:
>
> On Tue, May 1, 2012 at 12:21 AM, Ian Kelly <ian.g.kelly [at] gmail> wrote:
>>
>> On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg <drsalists [at] gmail>
>> wrote:
>> >
>> > A while back I did a sort algorithm runtime comparison for a variety of
>> > sorting algorithms, and then mostly sat on it.
>> >
>> > Recently, I got into a discussion with someone on stackoverflow about
>> > the
>> > running time of radix sort.
>> >
>> > I realize it's commonly said that radixsort is n*k rather than n*log(n).
>> > I've been making that case that in real life, frequently k==log(n).
>> >
>> > Anyway, here's the comparison, with code and graph:
>> > http://stromberg.dnsalias.org/~strombrg/sort-comparison/
>>
>> It can be difficult to distinguish O(n) from O(n log n) on a log-log
>> plot, so I don't think that graph makes your point very well.
>
> Thank you for your comment.
>
> Do you have a suggestion for a better type of graph - or other
> linearity/nonlinearity test?

An ordinary linear-scale graph. It will only show the highest order
of magnitude in any detail, but for determining asymptotic behavior
that's the most interesting part anyway. O(n) should look like a
straight line, and O(n log n) should curve up with a gradually
increasing slope. Alternatively, calculate regressions and compare
the goodness of fit.
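The regression idea can be sketched like this (my illustration, using synthetic n*log(n) "timings" rather than real measurements): fit both candidate models through the origin and compare residuals.

```python
import math

def fit_through_origin(xs, ys):
    """Least-squares coefficient a for the model y ~ a*x,
    plus the sum of squared residuals."""
    a = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    rss = sum((y - a * x) ** 2 for x, y in zip(xs, ys))
    return a, rss

ns = [2 ** e for e in range(10, 21)]
# Synthetic timings that are exactly O(n log n), for illustration only.
times = [1e-8 * n * math.log(n) for n in ns]

_, rss_linear = fit_through_origin(ns, times)                            # model: t = a*n
_, rss_nlogn = fit_through_origin([n * math.log(n) for n in ns], times)  # model: t = a*n*log(n)
# The n*log(n) model fits this data essentially perfectly; the linear one doesn't.
```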

> I thought that on a log-log graph, a straight line was still a straight
> line, but it's compressed at one end.

The key characteristic of a log-log plot is that any curve y = ax ** b
will appear as a straight line with a slope of b. So a linear curve
will be a straight line with a slope of 1. An O(n log n) curve will
converge to a straight line with a slope of 1 as n approaches
infinity.
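Concretely (my numbers, not from the benchmark): the slope between two points on a log-log plot is (log y2 - log y1) / (log x2 - log x1), and you can watch the n*log(n) slope creep down toward 1 as n grows.

```python
import math

def loglog_slope(x1, y1, x2, y2):
    """Slope of the segment between two points as drawn on a log-log plot."""
    return (math.log(y2) - math.log(y1)) / (math.log(x2) - math.log(x1))

# y = n: the slope is exactly 1 everywhere.
slope_linear = loglog_slope(10**3, 10**3, 10**6, 10**6)

# y = n*log(n): the slope stays above 1 but converges toward 1.
f = lambda n: n * math.log(n)
slope_small = loglog_slope(10**2, f(10**2), 10**3, f(10**3))
slope_large = loglog_slope(10**8, f(10**8), 10**9, f(10**9))
```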
--
http://mail.python.org/mailman/listinfo/python-list

drsalists at gmail

May 1, 2012, 3:19 PM

Post #7 of 7 (565 views)
 Re: Sort comparison [In reply to]
On Tue, May 1, 2012 at 11:52 AM, Ian Kelly <ian.g.kelly [at] gmail> wrote:

> On Tue, May 1, 2012 at 12:00 PM, Dan Stromberg <drsalists [at] gmail>
> wrote:
> >
> > On Tue, May 1, 2012 at 12:21 AM, Ian Kelly <ian.g.kelly [at] gmail>
> wrote:
> >>
> >> On Mon, Apr 30, 2012 at 11:25 PM, Dan Stromberg <drsalists [at] gmail>
> >> wrote:
> >> >
> >> > A while back I did a sort algorithm runtime comparison for a variety
> of
> >> > sorting algorithms, and then mostly sat on it.
> >> >
> >> > Recently, I got into a discussion with someone on stackoverflow about
> >> > the
> >> > running time of radix sort.
> >> >
> >> > I realize it's commonly said that radixsort is n*k rather than
> n*log(n).
> >> > I've been making that case that in real life, frequently k==log(n).
> >> >
> >> > Anyway, here's the comparison, with code and graph:
> >> > http://stromberg.dnsalias.org/~strombrg/sort-comparison/
> >>
> >> It can be difficult to distinguish O(n) from O(n log n) on a log-log
> >> plot, so I don't think that graph makes your point very well.
> >
> > Thank you for your comment.
> >
> > Do you have a suggestion for a better type of graph - or other
> > linearity/nonlinearity test?
>
> An ordinary linear-scale graph. It will only show the highest order
> of magnitude in any detail, but for determining asymptotic behavior
> that's the most interesting part anyway. O(n) should look like a
> straight line, and O(n log n) should curve up with a gradually
> increasing slope. Alternatively, calculate regressions and compare
> the goodness of fit.
>
> > I thought that on a log-log graph, a straight line was still a straight
> > line, but it's compressed at one end.
>
> The key characteristic of a log-log plot is that any curve y = ax ** b
> will appear as a straight line with a slope of b. So a linear curve
> will be a straight line with a slope of 1. A O(n log n) curve will
> converge to a straight line with a slope of 1 as n approaches
> infinity.
>

Thanks for the info.

I actually feel like the log-log graph and the linear-linear graph each
tell an interesting but different part of the story, so I've kept the
log-log graph and added a linear-linear graph.

I also added a straight reference line to the linear-linear graph, for the
sake of comparison.

You can see that radix sort falls between mergesort and quicksort
(both n log n on average), and that radix sort comes up almost parallel to
mergesort.