Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Python

preferring [] or () in list of error codes?

 

 

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded


mh at pixar

Jun 8, 2009, 2:36 PM

Post #1 of 35 (6060 views)
Permalink
preferring [] or () in list of error codes?

Is there any reason to prefer one or the other of these statements?

if e.message.code in [25401,25402,25408]:
if e.message.code in (25401,25402,25408):

I'm currently using [], but only coz I think it's prettier
than ().

context: these are database errors and e is database exception,
so there's probably been zillions of instructions and io's
handling that already.

Many TIA!
Mark

--
Mark Harrison
Pixar Animation Studios
--
http://mail.python.org/mailman/listinfo/python-list


apt.shansen at gmail

Jun 8, 2009, 2:56 PM

Post #2 of 35 (5963 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Mon, Jun 8, 2009 at 2:36 PM, <mh [at] pixar> wrote:

> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>
> I'm currently using [], but only coz I think it's prettier
> than ().


I like to use tuples / () if the sequence literal is ultimately static.
Purely because in my mind that just makes it a little more clear-- a list is
mutable, so I use it when it should be or may be mutated; if it never would,
I use a tuple. It just seems clearer to me that way.

But a tuple also takes up a little space in memory, so it's a bit more
efficient that way. I have absolutely no idea if reading / checking for
contents in a list vs tuple has any performance difference, but would
suspect it'd be tiny (and probably irrelevant in a small case like that),
but still.

--S


Scott.Daniels at Acm

Jun 8, 2009, 3:15 PM

Post #3 of 35 (5972 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

mh [at] pixar wrote:
> Is there any reason to prefer one or the other of these statements?
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
> I'm currently using [], but only coz I think it's prettier
> than ().
> context: these are database errors and e is database exception,
> so there's probably been zillions of instructions and io's
> handling that already.

I lightly prefer the (a, b, c) -- you do put spaces after the comma,
don't you? A tuple can be kept as a constant, but it requires (not
very heavy) program analysis to determine that the list need not be
constructed each time the statement is executed. In addition, a
tuple is allocated as a single block, while a list is a pair of
allocations.

The cost is tiny, however, and your sense of aesthetics is part of
your code. So unless you only very slightly prefer brackets, if I
were you I'd go with the list form.

--Scott David Daniels
Scott.Daniels [at] Acm
--
http://mail.python.org/mailman/listinfo/python-list


sjmachin at lexicon

Jun 8, 2009, 4:28 PM

Post #4 of 35 (5967 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

<mh <at> pixar.com> writes:

>
> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>

>From the viewpoint of relative execution speed, in the above case
if it matters at all it matters only on Python 2.4 AFAICT:

| >>> L=lambda x:x in[25401,25402,25408];
T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (25401)
6 LOAD_CONST 2 (25402)
9 LOAD_CONST 3 (25408)
12 BUILD_LIST 3
15 COMPARE_OP 6 (in)
18 RETURN_VALUE
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 4 ((25401, 25402, 25408))
6 COMPARE_OP 6 (in)
9 RETURN_VALUE

Earlier versions build the list or tuple at run time
(as for the list above); later versions detect that
the list can't be mutated and generate the same code
for both the list and tuple.

However there are limits to the analysis that can be
performed e.g. if the list is passed to a function,
pursuit halts at the county line:

[Python 2.6.2]
| >>> F=lambda y,z:y in z;L=lambda x:F(x,[25401,25402,25408]);
T=lambda x:F(x,(25401,25402,25408));import dis;dis.dis(L);dis.dis(T)
1 0 LOAD_GLOBAL 0 (F)
3 LOAD_FAST 0 (x)
6 LOAD_CONST 0 (25401)
9 LOAD_CONST 1 (25402)
12 LOAD_CONST 2 (25408)
15 BUILD_LIST 3
18 CALL_FUNCTION 2
21 RETURN_VALUE
1 0 LOAD_GLOBAL 0 (F)
3 LOAD_FAST 0 (x)
6 LOAD_CONST 3 ((25401, 25402, 25408))
9 CALL_FUNCTION 2
12 RETURN_VALUE

So in general anywhere I had a "list constant" I'd make
it a tuple -- I'm not aware of any way that performance
gets worse by doing that, and it can get better.

Background: I'm supporting packages that run on 2.1 to 2.6
in one case and 2.4 to 2.6 in the other; every little
unobtrusive tweak helps :-)

HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list


ben+python at benfinney

Jun 8, 2009, 4:43 PM

Post #5 of 35 (5972 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

mh [at] pixar writes:

> Is there any reason to prefer one or the other of these statements?
>
> if e.message.code in [25401,25402,25408]:
> if e.message.code in (25401,25402,25408):
>
> I'm currently using [], but only coz I think it's prettier
> than ().

Use a list when the semantic meaning of an item doesn't depend on all
the other items: it's “only” a collection of values.

Your list of message codes is a good example: if a value appears at
index 3, that doesn't make it mean something different from the same
value appearing at index 2.


Use a tuple when the semantic meaning of the items are bound together,
and it makes more sense to speak of all the items as a single structured
value.

The classic examples are point coordinates and timestamps: rather than a
collection of values, it makes more sense to think of each coordinate
set or timestamp as a single complex value. The value 7 appearing at
index 2 would have a completely different meaning from the value 7
appearing at index 3.


James Tauber explains this at
<URL:http://jtauber.com/blog/2006/04/15/python_tuples_are_not_just_constant_lists/>.

--
\ “Pinky, are you pondering what I'm pondering?” “Well, I think |
`\ so, Brain, but pantyhose are so uncomfortable in the |
_o__) summertime.” —_Pinky and The Brain_ |
Ben Finney
--
http://mail.python.org/mailman/listinfo/python-list


pavlovevidence at gmail

Jun 8, 2009, 5:37 PM

Post #6 of 35 (5969 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 4:43pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> m...@pixar.com writes:
> > Is there any reason to prefer one or the other of these statements?
>
> > if e.message.code in [25401,25402,25408]:
> > if e.message.code in (25401,25402,25408):
>
> > I'm currently using [], but only coz I think it's prettier
> > than ().
>
> Use a list when the semantic meaning of an item doesn't depend on all
> the other items: it's only a collection of values.
>
> Your list of message codes is a good example: if a value appears at
> index 3, that doesn't make it mean something different from the same
> value appearing at index 2.
>
> Use a tuple when the semantic meaning of the items are bound together,
> and it makes more sense to speak of all the items as a single structured
> value.

If you want to go strictly by the book, I would say he ought to be
using a set since his collection of numbers has no meaningful order
nor does it make sense to list any item twice.

I don't think it's very important, however, to stick to rules like
that for objects that don't live for more than a single line of code.


Carl Banks
--
http://mail.python.org/mailman/listinfo/python-list


ben+python at benfinney

Jun 8, 2009, 6:02 PM

Post #7 of 35 (5959 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

Carl Banks <pavlovevidence [at] gmail> writes:

> If you want to go strictly by the book, I would say he ought to be
> using a set since his collection of numbers has no meaningful order
> nor does it make sense to list any item twice.

Yes, a set would be best for this specific situation.

> I don't think it's very important, however, to stick to rules like
> that for objects that don't live for more than a single line of code.

It's important to the extent that it's important to express one's
*meaning*. Program code should be written primarily as a means of
communicating with other programmers, and only incidentally for the
computer to execute.

--
\ “Laurie got offended that I used the word ‘puke’. But to me, |
`\ that's what her dinner tasted like.” —Jack Handey |
_o__) |
Ben Finney
--
http://mail.python.org/mailman/listinfo/python-list


pavlovevidence at gmail

Jun 8, 2009, 6:28 PM

Post #8 of 35 (5950 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 6:02pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> Carl Banks <pavlovevide...@gmail.com> writes:
> > If you want to go strictly by the book, I would say he ought to be
> > using a set since his collection of numbers has no meaningful order
> > nor does it make sense to list any item twice.
>
> Yes, a set would be best for this specific situation.
>
> > I don't think it's very important, however, to stick to rules like
> > that for objects that don't live for more than a single line of code.
>
> It's important to the extent that it's important to express one's
> *meaning*. Program code should be written primarily as a means of
> communicating with other programmers, and only incidentally for the
> computer to execute.

Which is precisely why isn't not very important for an object that
exists for one line. No programmer is ever going to be confused about
the meaning of this:

if a in (1,2,3):


Carl Banks
--
http://mail.python.org/mailman/listinfo/python-list


charles at declareSub

Jun 8, 2009, 6:56 PM

Post #9 of 35 (5945 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 2009, at 9:28 PM, Carl Banks wrote:

> On Jun 8, 6:02 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>> Carl Banks <pavlovevide...@gmail.com> writes:
>>> If you want to go strictly by the book, I would say he ought to be
>>> using a set since his collection of numbers has no meaningful order
>>> nor does it make sense to list any item twice.
>>
>> Yes, a set would be best for this specific situation.
>>
>>> I don't think it's very important, however, to stick to rules like
>>> that for objects that don't live for more than a single line of
>>> code.
>>
>> It's important to the extent that it's important to express one's
>> *meaning*. Program code should be written primarily as a means of
>> communicating with other programmers, and only incidentally for the
>> computer to execute.
>
> Which is precisely why isn't not very important for an object that
> exists for one line. No programmer is ever going to be confused about
> the meaning of this:
>
> if a in (1,2,3):
>

Actually, I might be -- I think of a tuple first as a single thing, as
opposed to a list or map, which I see first as a collection of other
things.

Charles Yeomans
--
http://mail.python.org/mailman/listinfo/python-list


samwyse at gmail

Jun 8, 2009, 6:57 PM

Post #10 of 35 (5953 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 7:37pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> On Jun 8, 4:43pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> > m...@pixar.com writes:
> > > Is there any reason to prefer one or the other of these statements?
>
> > > if e.message.code in [25401,25402,25408]:
> > > if e.message.code in (25401,25402,25408):
>
> If you want to go strictly by the book, I would say he ought to be
> using a set since his collection of numbers has no meaningful order
> nor does it make sense to list any item twice.

As the length of the list increases, the increased speeds of looking
something up makes using a set makes more sense. But what's the best
way to express this? Here are a few more comparisons (using Python
3.0)...

>>> S=lambda x:x in set((25401,25402,25408))
>>> dis(S)
1 0 LOAD_FAST 0 (x)
3 LOAD_GLOBAL 0 (set)
6 LOAD_CONST 3 ((25401, 25402, 25408))
9 CALL_FUNCTION 1
12 COMPARE_OP 6 (in)
15 RETURN_VALUE
>>> S=lambda x:x in{25401,25402,25408}
>>> dis(S)
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 0 (25401)
6 LOAD_CONST 1 (25402)
9 LOAD_CONST 2 (25408)
12 BUILD_SET 3
15 COMPARE_OP 6 (in)
18 RETURN_VALUE
>>> S=lambda x:x in{(25401,25402,25408)}
>>> dis(S)
1 0 LOAD_FAST 0 (x)
3 LOAD_CONST 3 ((25401, 25402, 25408))
6 BUILD_SET 1
9 COMPARE_OP 6 (in)
12 RETURN_VALUE

I conclude that using constructors is generally a bad idea, since the
compiler doesn't know if you're calling the builtin or something with
an overloaded name. I presume that the compiler will eventually
optimize the second example to match the last, but both of them use
the BUILD_SET opcode. I expect that this can be expensive for long
lists, so I don't think that it's a good idea to use set constants
inside loops. Instead it should be assigned to a global or class
variable.
--
http://mail.python.org/mailman/listinfo/python-list


clp2 at rebertia

Jun 8, 2009, 8:06 PM

Post #11 of 35 (5996 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Mon, Jun 8, 2009 at 6:57 PM, samwyse<samwyse [at] gmail> wrote:
> On Jun 8, 7:37 pm, Carl Banks <pavlovevide...@gmail.com> wrote:
>> On Jun 8, 4:43 pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
>> > m...@pixar.com writes:
>> > > Is there any reason to prefer one or the other of these statements?
>>
>> > >         if e.message.code in [25401,25402,25408]:
>> > >         if e.message.code in (25401,25402,25408):
>>
>> If you want to go strictly by the book, I would say he ought to be
>> using a set since his collection of numbers has no meaningful order
>> nor does it make sense to list any item twice.
>
> As the length of the list increases, the increased speeds of looking
> something up makes using a set makes more sense.  But what's the best
> way to express this?  Here are a few more comparisons (using Python
> 3.0)...
>
>>>> S=lambda x:x in set((25401,25402,25408))
>>>> dis(S)
>  1           0 LOAD_FAST                0 (x)
>              3 LOAD_GLOBAL              0 (set)
>              6 LOAD_CONST               3 ((25401, 25402, 25408))
>              9 CALL_FUNCTION            1
>             12 COMPARE_OP               6 (in)
>             15 RETURN_VALUE
>>>> S=lambda x:x in{25401,25402,25408}
>>>> dis(S)
>  1           0 LOAD_FAST                0 (x)
>              3 LOAD_CONST               0 (25401)
>              6 LOAD_CONST               1 (25402)
>              9 LOAD_CONST               2 (25408)
>             12 BUILD_SET                3
>             15 COMPARE_OP               6 (in)
>             18 RETURN_VALUE
>>>> S=lambda x:x in{(25401,25402,25408)}
>>>> dis(S)
>  1           0 LOAD_FAST                0 (x)
>              3 LOAD_CONST               3 ((25401, 25402, 25408))
>              6 BUILD_SET                1
>              9 COMPARE_OP               6 (in)
>             12 RETURN_VALUE
>
> I conclude that using constructors is generally a bad idea, since the
> compiler doesn't know if you're calling the builtin or something with
> an overloaded name.  I presume that the compiler will eventually
> optimize the second example to match the last, but both of them use
> the BUILD_SET opcode.  I expect that this can be expensive for long

Erm, unless I misunderstand you somehow, the second example will and
should *never* match the last.
The set {25401,25402,25408}, containing 3 integer elements, is quite
distinct from the set {(25401,25402,25408)}, containing one element
and that element is a tuple.
set(X) != {X}; set([X]) = {X}

Cheers,
Chris
--
http://blog.rebertia.com
--
http://mail.python.org/mailman/listinfo/python-list


steven at REMOVE

Jun 8, 2009, 8:18 PM

Post #12 of 35 (5960 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Tue, 09 Jun 2009 11:02:54 +1000, Ben Finney wrote:

> Carl Banks <pavlovevidence [at] gmail> writes:
>
>> If you want to go strictly by the book, I would say he ought to be
>> using a set since his collection of numbers has no meaningful order nor
>> does it make sense to list any item twice.
>
> Yes, a set would be best for this specific situation.
>
>> I don't think it's very important, however, to stick to rules like that
>> for objects that don't live for more than a single line of code.
>
> It's important to the extent that it's important to express one's
> *meaning*. Program code should be written primarily as a means of
> communicating with other programmers, and only incidentally for the
> computer to execute.


But practicality beats purity -- there are many scenarios where we make
compromises in our meaning in order to get correct, efficient code. E.g.
we use floats, despite them being a poor substitute for the abstract Real
numbers we mean.

In addition, using a tuple or a list in this context:

if e.message.code in (25401,25402,25408):

is so idiomatic, that using a set in it's place would be distracting.
Rather that efficiently communicating the programmer's intention, it
would raise in my mind the question "that's strange, why are they using a
set there instead of a tuple?".


--
Steven
--
http://mail.python.org/mailman/listinfo/python-list


ben+python at benfinney

Jun 8, 2009, 8:43 PM

Post #13 of 35 (5952 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

Steven D'Aprano <steven [at] REMOVE> writes:

> In addition, using a tuple or a list in this context:
>
> if e.message.code in (25401,25402,25408):
>
> is so idiomatic, that using a set in it's place would be distracting.

I think a list in that context is fine, and that's the idiom I see far
more often than a tuple.

> Rather that efficiently communicating the programmer's intention, it
> would raise in my mind the question "that's strange, why are they
> using a set there instead of a tuple?".

The fact that literal set syntax is a relative newcomer is the primary
reason for that, I'd wager.

--
\ “If you are unable to leave your room, expose yourself in the |
`\ window.” —instructions in case of fire, hotel, Finland |
_o__) |
Ben Finney
--
http://mail.python.org/mailman/listinfo/python-list


emile at fenx

Jun 8, 2009, 10:30 PM

Post #14 of 35 (5947 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On 6/8/2009 8:43 PM Ben Finney said...
> Steven D'Aprano <steven [at] REMOVE> writes:
>
>> In addition, using a tuple or a list in this context:
>>
>> if e.message.code in (25401,25402,25408):
>>
>> is so idiomatic, that using a set in it's place would be distracting.
>
> I think a list in that context is fine, and that's the idiom I see far
> more often than a tuple.
>
>> Rather that efficiently communicating the programmer's intention, it
>> would raise in my mind the question "that's strange, why are they
>> using a set there instead of a tuple?".
>
> The fact that literal set syntax is a relative newcomer is the primary
> reason for that, I'd wager.
>

Well, no. It really is more, "that's odd... why use set?"

Emile

--
http://mail.python.org/mailman/listinfo/python-list


steven at REMOVE

Jun 9, 2009, 1:02 AM

Post #15 of 35 (5954 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Tue, 09 Jun 2009 09:43:45 +1000, Ben Finney wrote:

> Use a list when the semantic meaning of an item doesn't depend on all
> the other items: it's “only” a collection of values.
>
> Your list of message codes is a good example: if a value appears at
> index 3, that doesn't make it mean something different from the same
> value appearing at index 2.

That advice would seem to imply that lists shouldn't be ordered. If a
list of values has an order, it implies that "first place" (index 0) is
different from "second place", by virtue of the positions they appear in
the list. The lists:

presidential_candidates_sorted_by_votes = ['Obama', 'McCain']
presidential_candidates_sorted_by_votes = ['McCain', 'Obama']

have very different meanings. Prohibiting the use of lists in the context
of ordered data is surely is an unfortunate consequence of your advice.


> James Tauber explains this at
> <URL:http://jtauber.com/blog/2006/04/15/
> python_tuples_are_not_just_constant_lists/>.


He doesn't really explain anything though, he merely states it as
revealed wisdom. The closest he comes to an explanation is to declare
that in tuples "the index in a tuple has an implied semantic. The point
of a tuple is that the i-th slot means something specific. In other
words, it's a index-based (rather than name based) datastructure." But he
gives no reason for why we should accept that as true for tuples but not
lists.

It may be that that's precisely the motivation Guido had when he
introduced tuples into Python, but why should we not overload tuples with
more meanings than Guido (hypothetically) imagined? In other words, why
*shouldn't* we treat tuples as immutable lists, if that helps us solve a
problem effectively?

To put it another way, I think the question of whether or not tuples are
immutable lists has the answer Mu. Sometimes they are, sometimes they're
not. I have no problem with the title of the quoted blog post -- that
tuples are not *just* constant lists -- but I do dispute that there is
any reason for declaring that tuples must not be used as constant lists.
As tuples are defined in Python, they quack like immutable lists, they
walk like immutable lists, and they swim like immutable lists. Why
shouldn't we treat them as immutable lists?

Phillip Eby states that "Lists are intended to be homogeneous sequences,
while tuples are heterogeneous data structures." (Notice the subtle shift
there: lists are "intended", while tuples "are". But in fact, there's
nothing to stop you from putting homogeneous data into a tuple, so Eby is
wrong to say that tuples *are* heterogeneous.)

Perhaps Eby intends lists to be homogeneous, perhaps Guido does too, but
this is Python, where we vigorously defend the right to shoot ourselves
in the foot. We strongly discourage class creators from trying to enforce
their intentions by using private attributes, and even when we allow such
a thing, the nature of Python is that nothing is truly private. Why
should homogeneity and heterogeneity of lists and tuples be sacrosanct?
Nothing stops me from putting hetereogeneous data into a list, or
homogeneous data into a tuple, and there doesn't appear to be any ill-
effects from doing so. Why give lose sleep over the alleged lack of
purity?



--
Steven
--
http://mail.python.org/mailman/listinfo/python-list


samwyse at gmail

Jun 9, 2009, 4:47 AM

Post #16 of 35 (5946 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 10:06pm, Chris Rebert <c...@rebertia.com> wrote:
> On Mon, Jun 8, 2009 at 6:57 PM, samwyse<samw...@gmail.com> wrote:
> > On Jun 8, 7:37pm, Carl Banks <pavlovevide...@gmail.com> wrote:
> >> On Jun 8, 4:43pm, Ben Finney <ben+pyt...@benfinney.id.au> wrote:
> >> > m...@pixar.com writes:
> >> > > Is there any reason to prefer one or the other of these statements?
>
> >> > > if e.message.code in [25401,25402,25408]:
> >> > > if e.message.code in (25401,25402,25408):
>
> >> If you want to go strictly by the book, I would say he ought to be
> >> using a set since his collection of numbers has no meaningful order
> >> nor does it make sense to list any item twice.
>
> > As the length of the list increases, the increased speeds of looking
> > something up makes using a set makes more sense. But what's the best
> > way to express this? Here are a few more comparisons (using Python
> > 3.0)...
>
> >>>> S=lambda x:x in set((25401,25402,25408))
> >>>> dis(S)
> > 1 0 LOAD_FAST 0 (x)
> > 3 LOAD_GLOBAL 0 (set)
> > 6 LOAD_CONST 3 ((25401, 25402, 25408))
> > 9 CALL_FUNCTION 1
> > 12 COMPARE_OP 6 (in)
> > 15 RETURN_VALUE
> >>>> S=lambda x:x in{25401,25402,25408}
> >>>> dis(S)
> > 1 0 LOAD_FAST 0 (x)
> > 3 LOAD_CONST 0 (25401)
> > 6 LOAD_CONST 1 (25402)
> > 9 LOAD_CONST 2 (25408)
> > 12 BUILD_SET 3
> > 15 COMPARE_OP 6 (in)
> > 18 RETURN_VALUE
> >>>> S=lambda x:x in{(25401,25402,25408)}
> >>>> dis(S)
> > 1 0 LOAD_FAST 0 (x)
> > 3 LOAD_CONST 3 ((25401, 25402, 25408))
> > 6 BUILD_SET 1
> > 9 COMPARE_OP 6 (in)
> > 12 RETURN_VALUE
>
> > I conclude that using constructors is generally a bad idea, since the
> > compiler doesn't know if you're calling the builtin or something with
> > an overloaded name. I presume that the compiler will eventually
> > optimize the second example to match the last, but both of them use
> > the BUILD_SET opcode. I expect that this can be expensive for long
>
> Erm, unless I misunderstand you somehow, the second example will and
> should *never* match the last.
> The set {25401,25402,25408}, containing 3 integer elements, is quite
> distinct from the set {(25401,25402,25408)}, containing one element
> and that element is a tuple.
> set(X) != {X}; set([X]) = {X}

D'oh! I was thinking about how you can initialize a set from an
iterator and for some reason thought that you could do the same with a
set constant.
--
http://mail.python.org/mailman/listinfo/python-list


samwyse at gmail

Jun 9, 2009, 4:57 AM

Post #17 of 35 (5951 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 8, 8:57pm, samwyse <samw...@gmail.com> wrote:
> I conclude that using constructors is generally a bad idea, since the
> compiler doesn't know if you're calling the builtin or something with
> an overloaded name. I presume that the compiler will eventually
> optimize the second example to match the last, but both of them use
> the BUILD_SET opcode. I expect that this can be expensive for long
> lists, so I don't think that it's a good idea to use set constants
> inside loops. Instead it should be assigned to a global or class
> variable.

Time to test things! I'm going to compare three things using Python
3.0:
X={...}\nS=lambda x: x in X
S=lambda x: x in {...}
S=lambda x: x in (...)
where the ... is replaced by lists of integers of various lengths.
Here's the test bed:

from random import seed, sample
from timeit import Timer
maxint = 2**31-1
values = list(map(lambda n: 2**n-1, range(1,16)))
def append_numbers(k, setup):
seed(1968740928)
for i in sample(range(maxint), k):
setup.append(str(i))
setup.append(',')
print('==', 'separate set constant')
for n in values[::2]:
print('===', n, 'values')
setup = ['X={']
append_numbers(n, setup)
setup.append('}\nS=lambda x: x in X')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())
print('==', 'in-line set constant')
for n in values[:4]:
print('===', n, 'values')
setup = ['S=lambda x: x in {']
append_numbers(n, setup)
setup.append('}')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())
print('==', 'in-line list constant')
for n in values:
print('===', n, 'values')
setup = ['S=lambda x: x in (']
append_numbers(n, setup)
setup.append(')')
t = Timer('S(88632719)', ''.join(setup))
print(t.repeat())

And here are the results. There's something interesting at the very
end.

== separate set constant
=== 1 values
[0.26937306277753176, 0.26113626173158877, 0.26000092190487889]
=== 7 values
[0.26583266867716426, 0.27223543774418268, 0.27681646689732919]
=== 31 values
[0.25089725090758752, 0.25562690230182894, 0.25844625504079444]
=== 127 values
[0.32404313956103392, 0.33048948958596691, 0.34487930728626104]
=== 511 values
[0.27574566041214732, 0.26991838348169983, 0.28309016928129083]
=== 2047 values
[0.27826162263639631, 0.27337357122204065, 0.26888752620793976]
=== 8191 values
[0.27479134917985437, 0.27955955295994261, 0.27740676538498654]
=== 32767 values
[0.26189725230441319, 0.25949247739587022, 0.2537356004743625]
== in-line set constant
=== 1 values
[0.43579086168772818, 0.4231755711968983, 0.42178740594125852]
=== 3 values
[0.54712875519095228, 0.55325048295244272, 0.54346991028189251]
=== 7 values
[1.1897654590178366, 1.1763383335032813, 1.2009900699669931]
=== 15 values
[1.7661906750718313, 1.7585005915556291, 1.7405896559478933]
== in-line list constant
=== 1 values
[0.23651385860493335, 0.24746972031361381, 0.23778469051234197]
=== 3 values
[0.23710750947396875, 0.23205630883254713, 0.23345592805789295]
=== 7 values
[0.24607764394636789, 0.23551903943099006, 0.24241377046524093]
=== 15 values
[0.2279376289444599, 0.22491908887861456, 0.24076747184349045]
=== 31 values
[0.22860084172708994, 0.233022074034551, 0.23138639128715965]
=== 63 values
[0.23671639831319169, 0.23404259479906031, 0.22269394573891077]
=== 127 values
[0.22754176857673158, 0.22818151468971593, 0.22711154629987718]
=== 255 values
[0.23503126794047802, 0.24493699618247788, 0.26690207833677349]
=== 511 values
[0.24518255811842238, 0.23878118587697728, 0.22844830837438934]
=== 1023 values
[0.23285585179122137, 0.24067220833932623, 0.23807439213642922]
=== 2047 values
[0.24206484343680756, 0.24352201187581102, 0.24366253252857462]
=== 4095 values
[0.24624526301527183, 0.23692145230748807, 0.23829956041899081]
=== 8191 values
[0.22246514570986164, 0.22435309515595137, 0.22220114567633331]
=== 16383 values
[194.29462683106374, 193.21789529116128, 193.25843228678508]
=== 32767 values

You will note that testing against a list constant is just as fast as
testing against a set. This was surprising for me; apparently the
__contains__ operator turns a tuple into a set. You will also note
that performance to fall off drastically for the last set of values.
I'm not sure what happens there; I guess I'll file a bug report.
--
http://mail.python.org/mailman/listinfo/python-list


ben+python at benfinney

Jun 9, 2009, 5:56 AM

Post #18 of 35 (5942 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

Steven D'Aprano <steven [at] REMOVE> writes:

> On Tue, 09 Jun 2009 09:43:45 +1000, Ben Finney wrote:
>
> > Use a list when the semantic meaning of an item doesn't depend on
> > all the other items: it's “only” a collection of values.
> >
> > Your list of message codes is a good example: if a value appears at
> > index 3, that doesn't make it mean something different from the same
> > value appearing at index 2.
>
> That advice would seem to imply that lists shouldn't be ordered.

No such implication. Order is important in a list, it just doesn't
change the semantic meaning of the value.

> If a list of values has an order, it implies that "first place" (index
> 0) is different from "second place", by virtue of the positions they
> appear in the list. The lists:
>
> presidential_candidates_sorted_by_votes = ['Obama', 'McCain']
> presidential_candidates_sorted_by_votes = ['McCain', 'Obama']
>
> have very different meanings.

But the semantic meaning if each value is unchanged: each is still a
presidential candidate's surname. The additional semantic meaning of
putting it in a list is no more than the position in the sequence. A
list is the right choice, for that reason.


Whereas, for example, in this tuple:

dodge_city = (1781, 1870, 1823)
(population, feet_above_sea_level, establishment_year) = dodge_city

each index in the sequence implies something very different about each
value. The semantic meaning of each index is *more* than just the
position in the sequence; it matters *for interpreting that component*,
and that component would not mean the same thing in a different index
position. A tuple is the right choice, for that reason.

--
\ “Are you pondering what I'm pondering?” “Umm, I think so, Don |
`\ Cerebro, but, umm, why would Sophia Loren do a musical?” |
_o__) —_Pinky and The Brain_ |
Ben Finney
--
http://mail.python.org/mailman/listinfo/python-list


samwyse at gmail

Jun 9, 2009, 8:20 AM

Post #19 of 35 (5918 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 9, 12:30am, Emile van Sebille <em...@fenx.com> wrote:
> On 6/8/2009 8:43 PM Ben Finney said...

> > The fact that literal set syntax is a relative newcomer is the primary
> > reason for that, I'd wager.
>
> Well, no. It really is more, "that's odd... why use set?"

Until I ran some timing tests this morning, I'd have said that sets
could determine membership faster than a list, but that's apparently
not true, assuming that the list has less than 8K members. Above 16K
members, sets are much faster than lists. I'm not sure where the
break is, or even why there's a break.
--
http://mail.python.org/mailman/listinfo/python-list


pavlovevidence at gmail

Jun 9, 2009, 9:16 AM

Post #20 of 35 (5927 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 9, 4:57am, samwyse <samw...@gmail.com> wrote:
> On Jun 8, 8:57pm, samwyse <samw...@gmail.com> wrote:
>
> > I conclude that using constructors is generally a bad idea, since the
> > compiler doesn't know if you're calling the builtin or something with
> > an overloaded name. I presume that the compiler will eventually
> > optimize the second example to match the last, but both of them use
> > the BUILD_SET opcode. I expect that this can be expensive for long
> > lists, so I don't think that it's a good idea to use set constants
> > inside loops. Instead it should be assigned to a global or class
> > variable.
>
> Time to test things! I'm going to compare three things using Python
> 3.0:
> X={...}\nS=lambda x: x in X
> S=lambda x: x in {...}
> S=lambda x: x in (...)
> where the ... is replaced by lists of integers of various lengths.
> Here's the test bed:
>
> from random import seed, sample
> from timeit import Timer
> maxint = 2**31-1
> values = list(map(lambda n: 2**n-1, range(1,16)))
> def append_numbers(k, setup):
> seed(1968740928)
> for i in sample(range(maxint), k):
> setup.append(str(i))
> setup.append(',')
> print('==', 'separate set constant')
> for n in values[::2]:
> print('===', n, 'values')
> setup = ['X={']
> append_numbers(n, setup)
> setup.append('}\nS=lambda x: x in X')
> t = Timer('S(88632719)', ''.join(setup))
> print(t.repeat())
> print('==', 'in-line set constant')
> for n in values[:4]:
> print('===', n, 'values')
> setup = ['S=lambda x: x in {']
> append_numbers(n, setup)
> setup.append('}')
> t = Timer('S(88632719)', ''.join(setup))
> print(t.repeat())
> print('==', 'in-line list constant')
> for n in values:
> print('===', n, 'values')
> setup = ['S=lambda x: x in (']
> append_numbers(n, setup)
> setup.append(')')
> t = Timer('S(88632719)', ''.join(setup))
> print(t.repeat())

It looks like you are evaluating the list/set/tuple every pass, and
then, for lists and tuples, always indexing the first item.


> And here are the results. There's something interesting at the very
> end.
[.snip results showing virtually identical performance for list, set,
and tuple]
> You will note that testing against a list constant is just as fast as
> testing against a set. This was surprising for me; apparently the
> __contains__ operator turns a tuple into a set.

Given the way you wrote the test it this is hardly surprising.

I would expect "item in list" to have comparable execution time to
"item in set" if item is always the first element in list.

Furthermore, the Python compiler appears to be optimizing this
specific case to always use a precompiled set. Well, almost
always....


>You will also note
> that performance to fall off drastically for the last set of values.
> I'm not sure what happens there; I guess I'll file a bug report.

Please don't; it's not a bug. The slowdown is because at sizes above
a certain threshold the Python compiler doesn't try to precompile in-
line lists, sets, and tuples. The last case was above that limit.


Carl Banks
--
http://mail.python.org/mailman/listinfo/python-list


pavlovevidence at gmail

Jun 9, 2009, 9:21 AM

Post #21 of 35 (5923 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Jun 9, 8:20am, samwyse <samw...@gmail.com> wrote:
> On Jun 9, 12:30am, Emile van Sebille <em...@fenx.com> wrote:
>
> > On 6/8/2009 8:43 PM Ben Finney said...
> > > The fact that literal set syntax is a relative newcomer is the primary
> > > reason for that, I'd wager.
>
> > Well, no. It really is more, "that's odd... why use set?"
>
> Until I ran some timing tests this morning, I'd have said that sets
> could determine membership faster than a list, but that's apparently
> not true,

See my reply to that post. I believe your tests were flawed.

> assuming that the list has less than 8K members. Above 16K
> members, sets are much faster than lists. I'm not sure where the
> break is, or even why there's a break.

The break comes from the compiler, not the objects themselves.


Carl Banks
--
http://mail.python.org/mailman/listinfo/python-list


mh at pixar

Jun 9, 2009, 9:40 AM

Post #22 of 35 (5919 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

John Machin <sjmachin [at] lexicon> wrote:
> T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)

I've learned a lot from this thread, but this is the
niftiest bit I've picked up... thanks!

--
Mark Harrison
Pixar Animation Studios
--
http://mail.python.org/mailman/listinfo/python-list


tjreedy at udel

Jun 9, 2009, 11:29 AM

Post #23 of 35 (5915 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

Steven D'Aprano wrote:

>> James Tauber explains this at
>> <URL:http://jtauber.com/blog/2006/04/15/
>> python_tuples_are_not_just_constant_lists/>.
>
>
> He doesn't really explain anything though, he merely states it as
> revealed wisdom. The closest he comes to an explanation is to declare
> that in tuples "the index in a tuple has an implied semantic. The point
> of a tuple is that the i-th slot means something specific. In other
> words, it's a index-based (rather than name based) datastructure." But he
> gives no reason for why we should accept that as true for tuples but not
> lists.
>
> It may be that that's precisely the motivation Guido had when he
> introduced tuples into Python, but why should we not overload tuples with
> more meanings than Guido (hypothetically) imagined? In other words, why
> *shouldn't* we treat tuples as immutable lists, if that helps us solve a
> problem effectively?

I believe that we should overload tuples with *less* specific meaning
than originally. In 3.0, tuples have *all* the general sequence
operations and methods, including .index() and .count(). This was not
true in 2.5 (don't know about 2.6), which is why tuples are yet not
documented as having those two methods (reported in
http://bugs.python.org/issue4966
). Operationally, they are now general immutable sequences. Period.

Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list


tjreedy at udel

Jun 9, 2009, 11:32 AM

Post #24 of 35 (5920 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

mh [at] pixar wrote:
> John Machin <sjmachin [at] lexicon> wrote:
>> T=lambda x:x in(25401,25402,25408);import dis;dis.dis(L);dis.dis(T)
>
> I've learned a lot from this thread, but this is the
> niftiest bit I've picked up... thanks!

If you are doing a lot of dissing, starting with
from dis import dis
saves subsequent typing.

tjr


--
http://mail.python.org/mailman/listinfo/python-list


steve at REMOVE-THIS-cybersource

Jun 9, 2009, 4:40 PM

Post #25 of 35 (5914 views)
Permalink
Re: preferring [] or () in list of error codes? [In reply to]

On Tue, 09 Jun 2009 04:57:48 -0700, samwyse wrote:

> Time to test things! I'm going to compare three things using Python
> 3.0:
> X={...}\nS=lambda x: x in X
> S=lambda x: x in {...}
> S=lambda x: x in (...)
> where the ... is replaced by lists of integers of various lengths.
> Here's the test bed:

[snip]

Hmmm... I think your test-bed is unnecessarily complicated, making it
difficult to see what is going on. Here's my version, with lists included
for completeness. Each test prints the best of five trials of one million
repetitions of ten successful searches, then does the same thing again
for unsuccessful searches.


from timeit import Timer

def test(size):
global s, l, t, targets
print("Testing search with size %d" % size)
rng = range(size)
s, l, t = set(rng), list(rng), tuple(rng)
# Calculate a (more or less) evenly distributed set of ten
# targets to search for, including both end points.
targets = [i*size//9 for i in range(9)] + [size-1]
assert len(targets) == 10
setup = "from __main__ import targets, %s"
body = "for i in targets: i in %s"
# Run a series of successful searches.
for name in "s l t".split():
obj = globals()[name]
secs = min(Timer(body % name, setup % name).repeat(repeat=5))
print("Successful search in %s: %f s" % (type(obj), secs))
# Also run unsuccessful tests.
targets = [size+x for x in targets]
for name in "s l t".split():
obj = globals()[name]
secs = min(Timer(body % name, setup % name).repeat(repeat=5))
print("Unsuccessful search in %s: %f s" % (type(obj), secs))

Results are:

>>> test(1)
Testing search with size 1
Successful search in <class 'set'>: 1.949509 s
Successful search in <class 'list'>: 1.838387 s
Successful search in <class 'tuple'>: 1.876309 s
Unsuccessful search in <class 'set'>: 1.998207 s
Unsuccessful search in <class 'list'>: 2.148660 s
Unsuccessful search in <class 'tuple'>: 2.137041 s
>>>
>>>
>>> test(10)
Testing search with size 10
Successful search in <class 'set'>: 1.943664 s
Successful search in <class 'list'>: 3.659786 s
Successful search in <class 'tuple'>: 3.569164 s
Unsuccessful search in <class 'set'>: 1.935553 s
Unsuccessful search in <class 'list'>: 5.833665 s
Unsuccessful search in <class 'tuple'>: 5.573177 s
>>>
>>>
>>> test(100)
Testing search with size 100
Successful search in <class 'set'>: 1.907839 s
Successful search in <class 'list'>: 21.704032 s
Successful search in <class 'tuple'>: 21.391875 s
Unsuccessful search in <class 'set'>: 1.916241 s
Unsuccessful search in <class 'list'>: 41.178029 s
Unsuccessful search in <class 'tuple'>: 41.856226 s
>>>
>>>
>>> test(1000)
Testing search with size 1000
Successful search in <class 'set'>: 2.256150 s
Successful search in <class 'list'>: 189.991579 s
Successful search in <class 'tuple'>: 187.349630 s
Unsuccessful search in <class 'set'>: 1.869202 s
Unsuccessful search in <class 'list'>: 398.451284 s
Unsuccessful search in <class 'tuple'>: 388.544178 s




As expected, lists and tuples are equally as fast (or slow if you
prefer). Successful searches are about twice as fast as unsuccessful
ones, and performance suffers as the size of the list/tuple increases.
However, sets are nearly just as fast no matter the size of the set, or
whether the search is successfully or unsuccessful.



> You will note that testing against a list constant is just as fast as
> testing against a set. This was surprising for me; apparently the
> __contains__ operator turns a tuple into a set.

I doubt that very much.


--
Steven
--
http://mail.python.org/mailman/listinfo/python-list

First page Previous page 1 2 Next page Last page  View All Python python RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.