Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Bugs

[issue1771] Remove cmp parameter to list.sort() and builtin.sorted()

 

 

Python bugs RSS feed   Index | Next | Previous | View Threaded


report at bugs

Dec 4, 2009, 2:26 PM

Post #1 of 10 (217 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted()

Tom Switzer <thomas.switzer [at] gmail> added the comment:

I am not sure I understand the reasoning behind removing the cmp
parameter (and agree with Lea Wiemann). Trying to wedge a proper
comparison into the key parameter is clumsy and unreadable (as can be
seen in the 2to3 example above). The intrinsic ordering on objects does
not necessarily match up with the way you want to sort them. For
example, a natural intrinsic order on 2 points in 2d is lexicographical,
however you often want to sort by angular order relative to some other
point instead. Clearly this can never be put in __cmp__ or __lt__,
because the sorted order is relative to some other unknown point. Trying
to do this with the key function doesn't make sense; it would not be
clear you are sorting by angular order and you'd have to instantiate a
bunch of wrapper objects just to do basic sorting. Another quick example
would be sorting hyperplanes by intersection on a ray. Sorting points
along a direction given by a vector.

I understand removing redundant features from a language, but I just
can't see how key replaces this functionality in a readable or efficient
way. This highlights an important class of cases (since it was mentioned
that none could be thought of) in which we wish to make comparisons
between values where a comparison (<, > or ==) is more numerically
sound, more efficient, or the only option (perhaps the ordering is
defined explicitly) then computing the exact values (eg. angle). As far
as it seems, the only way to do this with key is by following the
example given and creating a class solely to wrap each object that
overrides __cmp__, which is certainly non-obvious (ie. there is no one,
obvious way to do it).

----------
nosy: +tixxit

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 4, 2009, 2:49 PM

Post #2 of 10 (213 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Guido van Rossum <guido [at] python> added the comment:

Can someone provide a code sample to make this argument more
understandable for those of us who don't compare points by angular order
for a living... :-)

I'm not sure what the 2to3 example (I presume you mean msg59937) shows
except that conversion from a cmp function to a key function may require
you to actually think...

Also, for all of you asking for cmp back, I hope you realize that
sorting N values using a custom cmp function makes about N log N calls
calls to cmp, whereas using a custom key calls the key function only N
times. This means that even if your cmp function is faster than the
best key function you can write, the advantage is lost as N increases
(which is just where you'd like it to matter most :-).

----------

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 4, 2009, 11:04 PM

Post #3 of 10 (206 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Raymond Hettinger <rhettinger [at] users> added the comment:

FWIW, we had a long discussion on comp.lang.python and the net result
was that no use cases were found that required a cmp function. One
complex case (sorting recursive tree structures) at first appeared to
need a cmp-function but was found to be simpler and faster using a
key-function. The net result of the conversation was the feeling that
people who have grown-up using cmp-functions in either Python, C or some
other language feel like they've lost something but really haven't. In
contrast, people who use SQL or spreadsheet database tools find that key
functions come naturally since neither supports cmp-functions, instead
preferring the user to specify primary and secondary key functions.

Also, it was pointed-out the removal of cmp-functions in sorted() and
list.sort() was part of a larger effort to remove all forms of cmp from
the whole language (i.e. the builtin cmp function is gone and so it the
__cmp__ magic method). Rich comparisons have completely supplanted all
uses of cmp-functions in the language as a whole -- having multiple ways
to do it was confusing.

In converting code from 2-to-3, we have found two sticky cases.

The first occurs when an API had exposed cmp functions to the end-user
(for example, unittest.getTestCaseNames() and unittest.makeSuite() have
an optional sortUsing parameter that allows the user to specify a
cmp-function). To support that use case (so that end-user API's would
not have to be changed), we added a CmpToKey() tool which automatically
converts cmp-functions to key functions. This tool is referenced in the
docs and it could be added to the 2-to-3 converter.

The second case occurs when a primary key is sorted ascending and a
secondary key is sorted descending. The technique for that is to take
advantage of sort stability and do two sorts:

s.sort(key=secondary, reverse=True)
s.sort(key=primary)
# now sorted by primary ascending, secondary descending

That technique is going to be documented in an update of the sorting
how-to. It doesn't seem to arise much in practice and the cmp function
equivalent seems to be harder for beginners to write (though at least it
can be done with a single cmp-function and a single sort).

----------

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 6, 2009, 3:57 AM

Post #4 of 10 (200 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Mark Dickinson <dickinsm [at] gmail> added the comment:

Tom, I think I'm missing your point: all three of the examples you give
seem like perfect candidates for a key-based sort rather than a
comparison-based one. For the first example, couldn't you do something
like:

def direction(pt1, pt2):
"""angle of line segment from point 1 to point 2"""
return atan2(pt2.y - pt1.y, pt2.x - pt1.x)

my_points.sort(key=lambda pt: direction(reference_pt, pt))

? How would having a cmp keyword argument make this any easier or
simpler?


Here's the best example I can think of for which key-based sorting is
problematic: imagine that the Decimal type doesn't exist, and that you
have triples (sign, coefficient_string, exponent) representing
arbitrary-precision base 10 floating-point numbers. It's fairly tricky
to come up with a key function that maps these triples to some existing
ordered type, so that they can be sorted in natural numerical order.
The problem lies in the way that the sort order for the coefficient
string and exponent depends on the value of the sign (one way for
positive numbers, reversed for negative numbers). But it's not a big
deal to define a wrapper for cases like this.

----------
nosy: +mark.dickinson

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 6, 2009, 8:24 AM

Post #5 of 10 (200 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Tom Switzer <thomas.switzer [at] gmail> added the comment:

Mark: I think your example actually helps illustrate my point. My point was
that computing the angle directly is less efficient or not as nice
numerically. For instance, if you are sorting points by angle relative to an
extreme point you could do something like this (a first step in the Graham
Scan) - be prepared for non-Python 3.0 code ;)

from functools import partial
from random import random

def turn(p, q, r):
"""Return -1, 0, or 1 if p,q,r forms a right, straight, or left turn
respectively."""
return cmp((q[0] - p[0])*(r[1] - p[1]) - (r[0] - p[0])*(q[1] - p[1]), 0)

pts = [(random(), random()) for i in xrange(10)]
i = min(xrange(len(pts)), key=lambda i: pts[i])
p = pts.pop(i)
pts.sort(cmp=partial(turn, p))

Here our comparison function requires only 2 multiplications and 5
additions/subtractions. This function is nice especially if you are using
arbitrary precision or rational numbers as it is exact.

On Sun, Dec 6, 2009 at 6:57 AM, Mark Dickinson <report [at] bugs>wrote:

>
> Mark Dickinson <dickinsm [at] gmail> added the comment:
>
> Tom, I think I'm missing your point: all three of the examples you give
> seem like perfect candidates for a key-based sort rather than a
> comparison-based one. For the first example, couldn't you do something
> like:
>
> def direction(pt1, pt2):
> """angle of line segment from point 1 to point 2"""
> return atan2(pt2.y - pt1.y, pt2.x - pt1.x)
>
> my_points.sort(key=lambda pt: direction(reference_pt, pt))
>
> ? How would having a cmp keyword argument make this any easier or
> simpler?
>
>
> Here's the best example I can think of for which key-based sorting is
> problematic: imagine that the Decimal type doesn't exist, and that you
> have triples (sign, coefficient_string, exponent) representing
> arbitrary-precision base 10 floating-point numbers. It's fairly tricky
> to come up with a key function that maps these triples to some existing
> ordered type, so that they can be sorted in natural numerical order.
> The problem lies in the way that the sort order for the coefficient
> string and exponent depends on the value of the sign (one way for
> positive numbers, reversed for negative numbers). But it's not a big
> deal to define a wrapper for cases like this.
>
> ----------
> nosy: +mark.dickinson
>
> _______________________________________
> Python tracker <report [at] bugs>
> <http://bugs.python.org/issue1771>
> _______________________________________
>

----------
Added file: http://bugs.python.org/file15463/unnamed

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
Attachments: unnamed (3.09 KB)


report at bugs

Dec 6, 2009, 9:27 AM

Post #6 of 10 (201 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Mark Dickinson <dickinsm [at] gmail> added the comment:

Ah. Thanks for the explanation; I see your point. I guess you do just
have to use a CmpToKey-type wrapper for this sort of comparison.

Though for this *particular* example, it seems to me that you could still use a key function

lambda q: (q[0] - p[0])/(q[1]-p[1]),

which would be even more efficient. This is assuming that your extreme
point p has minimal second coordinate amongst points being sorted, which
as I understand it is how the Graham Scan typically starts. There's the
minor difficulty of dealing with points with the *same* second coordinate
as p, where I guess the key value should be some form of +Infinity. I can
also see that this might be problematic if you're working with a type
that's exact for addition and multiplication but not for division (e.g.,
Decimal with unbounded precision).

It would be interesting to see some relative timings for the sort stage of
the Graham scan (on a million random points, say): key function versus cmp
function.

----------

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 7, 2009, 7:46 AM

Post #7 of 10 (196 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Tom Switzer <thomas.switzer [at] gmail> added the comment:

If the equal min y-coords are handled, I think it'd be quicker too. As Guido
noted, O(n) function calls is better then O(n log n) =] Though the general
case is still unhandled. And, though it doesn't help my case, the Graham
Scan can also be performed on points sorted lexicographically too, by
constructing the upper & lower hull separately, hehe.

Now, I understand cmp on the whole was removed from the language. Using
__lt__, __eq__, etc. really is more natural. However, having an explicit cmp
function for sorting makes sense to me. At the very least, it is more
obvious and natural for some problems - though I respect that using a key
func. is often faster. In some rare (though "rare" is very subjective) cases
it is required; packing a cmp function into __cmp__ in a wrapper object is
really just a hard-to-read cmp function and highlights the need for cmp. I
would actually love to see it added for min/max too actually, since I find I
often use a simple reduce function in place of a min(lst, cmp=...).
Enforcing proper comparisons (<, >, ==, etc) makes sense, but would having
the cmp function live, so to speak, in sorting really be that bad? Just
inform the user in the docs that key is preferred and often faster.

----------
Added file: http://bugs.python.org/file15473/unnamed

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
Attachments: unnamed (1.24 KB)


report at bugs

Dec 7, 2009, 10:06 AM

Post #8 of 10 (195 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Changes by Guido van Rossum <guido [at] python>:


Removed file: http://bugs.python.org/file15463/unnamed

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 7, 2009, 10:06 AM

Post #9 of 10 (194 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Changes by Guido van Rossum <guido [at] python>:


Removed file: http://bugs.python.org/file15473/unnamed

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com


report at bugs

Dec 7, 2009, 10:10 AM

Post #10 of 10 (195 views)
Permalink
[issue1771] Remove cmp parameter to list.sort() and builtin.sorted() [In reply to]

Guido van Rossum <guido [at] python> added the comment:

Python's sort implementation is carefully written to only use the "<"
comparison, ever. So a cmp really isn't the most natural way to specify
a comparison. (This should really be documented somewhere -- I know know
it because Tim Peters & I shared an office while he did most of the work
on Python's sort.)

----------

_______________________________________
Python tracker <report [at] bugs>
<http://bugs.python.org/issue1771>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/list-python-bugs%40lists.gossamer-threads.com

Python bugs RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.