Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Python

Which is faster?

 

 

Python python RSS feed   Index | Next | Previous | View Threaded


aorfanakos at gmail

Jan 26, 2005, 6:40 PM

Post #1 of 18 (1569 views)
Permalink
Which is faster?

Any idea which of the following is faster?

'a/b/c/'[:-1]

or

'a/b/c/'.rstrip('/')

Thanks in advance.

P.S. I could time it but I thought of trying my luck here first, in
case someone knows already, and of course the reason.

--
http://mail.python.org/mailman/listinfo/python-list


tdelaney at avaya

Jan 26, 2005, 6:52 PM

Post #2 of 18 (1505 views)
Permalink
RE: Which is faster? [In reply to]

Aggelos I. Orfanakos wrote:

> Any idea which of the following is faster?
>
> 'a/b/c/'[:-1]
>
> or
>
> 'a/b/c/'.rstrip('/')
>
> Thanks in advance.
>
> P.S. I could time it but I thought of trying my luck here first, in
> case someone knows already, and of course the reason.

First, it almost certainly doesn't matter. Use the one that is
self-documenting.

Secondly, time it yourself. It's incredibly easy. There's a module
written especially for doing this - and surprise surprise, it's called
"timeit.py" and it's in the <python install location>/Lib directory.

Tim Delaney
--
http://mail.python.org/mailman/listinfo/python-list


t-meyer at ihug

Jan 26, 2005, 6:52 PM

Post #3 of 18 (1504 views)
Permalink
RE: Which is faster? [In reply to]

> Any idea which of the following is faster?
>
> 'a/b/c/'[:-1]
>
> or
>
> 'a/b/c/'.rstrip('/')
>
> Thanks in advance.
>
> P.S. I could time it but I thought of trying my luck here
> first, in case someone knows already, and of course the reason.

Timing it is almost no work, though:

>>> import timeit
>>> timeit.Timer("'a/b/c/'[:-1]").timeit()
0.23856551600831949
>>> timeit.Timer("'a/b/c/'.rstrip('/')").timeit()
0.74258655778876914

The obvious reason for the former to be faster (although I haven't checked
to see whether it's true) is because the latter checks for a specific
character, whereas the former just snips it off whatever it is.

=Tony.Meyer

--
http://mail.python.org/mailman/listinfo/python-list


aldo at nullcube

Jan 26, 2005, 6:58 PM

Post #4 of 18 (1507 views)
Permalink
Re: Which is faster? [In reply to]

Thus spake Aggelos I. Orfanakos (aorfanakos [at] gmail):

> Any idea which of the following is faster?
>
> 'a/b/c/'[:-1]
>
> or
>
> 'a/b/c/'.rstrip('/')
>
> Thanks in advance.
>
> P.S. I could time it but I thought of trying my luck here
> first, in case someone knows already, and of course the
> reason.

Expecting other people to do something simply because you
couldn't be bothered to do it yourself is not polite... That
said, here are the timings on my system:


> python ./timeit.py "'a/b/c/'[:-1]"
1000000 loops, best of 3: 0.511 usec per loop


> python ./timeit.py "'a/b/c/'.rstrip('/')"
1000000 loops, best of 3: 1.3 usec per loop


As you can see, this suggests that the list access method is
quicker. This is to be expected, since the two methods don't
do the same thing - rstrip will return a copy of your string
with any number of trailing '/'es removed. If there aren't
any, it will return the string as-is. The string access
method will always chop exactly one character off the end.
Even though the results for your specific input are the
same, rstrip is a more complex, and therefore slower, beast.



Cheers,



Aldo




--
Aldo Cortesi
aldo [at] nullcube
http://www.nullcube.com
Off: (02) 9283 1131
Mob: 0419 492 863
--
http://mail.python.org/mailman/listinfo/python-list


aorfanakos at gmail

Jan 26, 2005, 7:07 PM

Post #5 of 18 (1512 views)
Permalink
Re: Which is faster? [In reply to]

Yes, I could do the timing myself. Sorry if this was impolite -- it was
not in my intentions. The main reason I asked was about the reason.
Thanks.

--
http://mail.python.org/mailman/listinfo/python-list


tjreedy at udel

Jan 26, 2005, 9:44 PM

Post #6 of 18 (1523 views)
Permalink
Re: Which is faster? [In reply to]

"Aggelos I. Orfanakos" <aorfanakos [at] gmail> wrote in message
news:1106793602.782397.61260 [at] z14g2000cwz
> Any idea which of the following is faster?
>
> 'a/b/c/'[:-1]
> 'a/b/c/'.rstrip('/')

I find the first easier to read and mentally process. Others may have a
different answer. But perhaps you meant with the CPython 2.x
implementation ;-)

> P.S. I could time it but I thought of trying my luck here first, in
> case someone knows already, and of course the reason.

For more on the CPython (2.2) reason, consider

>>> def f1(s): return s[:-1]
...
>>> def f2(s): return s.rstrip('/')
...
>>> import dis
>>> dis.dis(f1)
0 SET_LINENO 1

3 SET_LINENO 1
6 LOAD_FAST 0 (s)
9 LOAD_CONST 1 (-1)
12 SLICE+2
13 RETURN_VALUE
14 LOAD_CONST 0 (None)
17 RETURN_VALUE
>>> dis.dis(f2)
0 SET_LINENO 1

3 SET_LINENO 1
6 LOAD_FAST 0 (s)
9 LOAD_ATTR 1 (rstrip)
12 LOAD_CONST 1 ('/')
15 CALL_FUNCTION 1
18 RETURN_VALUE
19 LOAD_CONST 0 (None)

The second has a load attribute (via dict lookup) that the first does not.
More important, the second has a generic function call versus a specific
byte-coded slice call. The rstrip will also do a slice after it determines
the endpoint of the slice.

Terry J. Reedy




--
http://mail.python.org/mailman/listinfo/python-list


__peter__ at web

Jan 27, 2005, 12:17 AM

Post #7 of 18 (1511 views)
Permalink
Re: Which is faster? [In reply to]

Aggelos I. Orfanakos wrote:

> Any idea which of the following is faster?
>
> 'a/b/c/'[:-1]
>
> or
>
> 'a/b/c/'.rstrip('/')

Don't ask for the speed, decide whether you want to transform

"a/b/c" --> "a/b/c"
"a/b/c//" --> "a/b/c"

or

"a/b/c" --> "a/b/"
"a/b/c//" --> "a/b/c/"

That is much more important.

Peter

--
http://mail.python.org/mailman/listinfo/python-list


fredrik at pythonware

Aug 29, 2008, 10:57 PM

Post #8 of 18 (1247 views)
Permalink
Re: Which is faster? [In reply to]

cnb wrote:

> For a big nbr of it might matter?
> Is av_grade O(n*2) and the first O(n) when it comes to adding or is
> "sum x for x in y" just traversing the list ones, accumulating the
> values, it doesnt first build the list and then travese it for sum?

sum() doesn't build a list, but even if it would do that, it'd still be
O(n) -- looping over an array twice doesn't change the time complexity.

to find out which one's actually faster, benchmark them.

</F>

--
http://mail.python.org/mailman/listinfo/python-list


gagsl-py2 at yahoo

Aug 29, 2008, 11:01 PM

Post #9 of 18 (1241 views)
Permalink
Re: Which is faster? [In reply to]

En Sat, 30 Aug 2008 01:26:35 -0300, cnb <circularfunc [at] yahoo> escribi�:

> For a big nbr of it might matter?
> Is av_grade O(n*2) and the first O(n) when it comes to adding or is
> "sum x for x in y" just traversing the list ones, accumulating the
> values, it doesnt first build the list and then travese it for sum?

AFAIK both versions are O(n).
sum(x for x in y) contains a "generator expression" [1], it does not
create an intermediate list.
sum([x for x in y]) contains a "list comprehension" [2] and it does create
an intermediate list.

I'd say the version using sum() should be faster, because the iteration is
implemented inside the interpreter, not in Python code as the other one.
But you should actually measure their relative performance; use the timeit
module for that.

> def averageGrade(self):
> tot = 0
> for review in self.reviews:
> tot += review.grade
> return tot / len(self.reviews)
>
> def av_grade(self):
> return sum(review.grade for review in self.reviews) / \
> len(self.reviews)

[1] http://docs.python.org/tut/node11.html#SECTION00111100000000000000000
[2] http://docs.python.org/tut/node7.html
--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


steve at REMOVE-THIS-cybersource

Aug 29, 2008, 11:15 PM

Post #10 of 18 (1244 views)
Permalink
Re: Which is faster? [In reply to]

On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:

> def averageGrade(self):
> tot = 0
> for review in self.reviews:
> tot += review.grade
> return tot / len(self.reviews)
>
> def av_grade(self):
> return sum(review.grade for review in self.reviews) / \
> len(self.reviews)

Re-writing the functions so they can be tested alone:

def averageGrade(alist):
tot = 0.0
for x in alist:
tot += x
return tot/len(alist)


def av_grade(alist):
return sum(alist)/len(alist)


>>> from timeit import Timer
>>> # small amount of items
... alist = range(100)
>>> Timer('averageGrade(alist)',
... 'from __main__ import alist, averageGrade').repeat(number=100000)
[3.9559240341186523, 3.4910569190979004, 3.4856188297271729]
>>>
>>> Timer('av_grade(alist)',
... 'from __main__ import alist, av_grade').repeat(number=100000)
[2.0255107879638672, 1.0968310832977295, 1.0733180046081543]


The version with sum() is much faster. How about with lots of data?

>>> alist = xrange(1000000)
>>> Timer('averageGrade(alist)',
... 'from __main__ import alist, averageGrade').repeat(number=50)
[17.699107885360718, 18.182793140411377, 18.651514053344727]
>>>
>>> Timer('av_grade(alist)',
... 'from __main__ import alist, av_grade').repeat(number=50)
[17.125216007232666, 15.72636890411377, 16.309713840484619]

sum() is still a little faster.



--
Steven
--
http://mail.python.org/mailman/listinfo/python-list


gagsl-py2 at yahoo

Aug 30, 2008, 12:11 AM

Post #11 of 18 (1244 views)
Permalink
Re: Which is faster? [In reply to]

En Sat, 30 Aug 2008 03:15:30 -0300, Steven D'Aprano
<steve [at] remove-this-cybersource> escribi�:

> On Fri, 29 Aug 2008 21:26:35 -0700, cnb wrote:
>
>> def averageGrade(self):
>> tot = 0
>> for review in self.reviews:
>> tot += review.grade
>> return tot / len(self.reviews)
>>
>> def av_grade(self):
>> return sum(review.grade for review in self.reviews) / \
>> len(self.reviews)
>
> Re-writing the functions so they can be tested alone:
>
> def averageGrade(alist):
> tot = 0.0
> for x in alist:
> tot += x
> return tot/len(alist)
>
>
> def av_grade(alist):
> return sum(alist)/len(alist)
>
>
>>>> from timeit import Timer
>>>> # small amount of items
> ... alist = range(100)
>>>> Timer('averageGrade(alist)',
> ... 'from __main__ import alist, averageGrade').repeat(number=100000)
> [3.9559240341186523, 3.4910569190979004, 3.4856188297271729]
>>>>
>>>> Timer('av_grade(alist)',
> ... 'from __main__ import alist, av_grade').repeat(number=100000)
> [2.0255107879638672, 1.0968310832977295, 1.0733180046081543]
>
>
> The version with sum() is much faster. How about with lots of data?
>
>>>> alist = xrange(1000000)
>>>> Timer('averageGrade(alist)',
> ... 'from __main__ import alist, averageGrade').repeat(number=50)
> [17.699107885360718, 18.182793140411377, 18.651514053344727]
>>>>
>>>> Timer('av_grade(alist)',
> ... 'from __main__ import alist, av_grade').repeat(number=50)
> [17.125216007232666, 15.72636890411377, 16.309713840484619]
>
> sum() is still a little faster.

Mmm, in this last test you're measuring the long integer operations
performance (because the sum exceeds largely what can be represented in a
plain integer). Long integers are so slow that the difference between both
loops becomes negligible.

I've tried again using float values:
alist = [float(x) for x in xrange(1000000)]
and got consistent results for any input size (the version using sum() is
about twice as fast as the for loop)

--
Gabriel Genellina

--
http://mail.python.org/mailman/listinfo/python-list


circularfunc at yahoo

Aug 30, 2008, 3:30 AM

Post #12 of 18 (1228 views)
Permalink
Re: Which is faster? [In reply to]

how does doing something twice not change complexity? yes it maybe
belongs to the same complexity-class but is still twice as slow no?
--
http://mail.python.org/mailman/listinfo/python-list


fredrik at pythonware

Aug 30, 2008, 4:15 AM

Post #13 of 18 (1226 views)
Permalink
Re: Which is faster? [In reply to]

cnb wrote:

> how does doing something twice not change complexity? yes it maybe
> belongs to the same complexity-class but is still twice as slow no?

doing two things quickly can still be faster than doing one thing slowly.

</F>

--
http://mail.python.org/mailman/listinfo/python-list


deets at nospam

Aug 30, 2008, 4:53 AM

Post #14 of 18 (1233 views)
Permalink
Re: Which is faster? [In reply to]

cnb schrieb:
> how does doing something twice not change complexity? yes it maybe
> belongs to the same complexity-class but is still twice as slow no?

Because big O notation is not about constant factors. Or even subterms
with lower powers.

http://en.wikipedia.org/wiki/Big_O_notation

Diez
--
http://mail.python.org/mailman/listinfo/python-list


Lie.1296 at gmail

Aug 30, 2008, 10:01 AM

Post #15 of 18 (1226 views)
Permalink
Re: Which is faster? [In reply to]

On Aug 30, 5:30 pm, cnb <circularf...@yahoo.se> wrote:
> how does doing something twice not change complexity? yes it maybe
> belongs to the same complexity-class but is still twice as slow no?

Who is doing something twice? Definitely not sum().

sum() does not create intermediate list, and if you pass generator
expressions in it you wouldn't make any intermediate list at all, thus
simple looping but in interpreter code. sum() that is passed a list
comprehension should be faster for extremely small numbers of values
to sum, but we don't care about small things, do we? But using
intermediate list should be fast enough for even large numbers of
values, the time when it can't cope anymore would be when the
intermediate list takes half of your memory (how often is that for
regular applications?).
--
http://mail.python.org/mailman/listinfo/python-list


fredrik at pythonware

Aug 30, 2008, 10:21 AM

Post #16 of 18 (1227 views)
Permalink
Re: Which is faster? [In reply to]

Lie wrote:

>> how does doing something twice not change complexity? yes it maybe
>> belongs to the same complexity-class but is still twice as slow no?
>
> Who is doing something twice? Definitely not sum().

nobody's claiming that -- but as mentioned above, even if sum() had done
two passes over the source sequence, it'd still be O(n).

</F>

--
http://mail.python.org/mailman/listinfo/python-list


python at rcn

Aug 30, 2008, 3:33 PM

Post #17 of 18 (1232 views)
Permalink
Re: Which is faster? [In reply to]

On Aug 29, 9:26 pm, cnb <circularf...@yahoo.se> wrote:
> def av_grade(self):
>      return sum(review.grade for review in self.reviews) / \
>               len(self.reviews)

Minor point. Consider making the divisor: float(len(self.reviews)).
It would be a bummer to throw-off the average because of floor
division.

Also consider an itertools approach:
sum(itertools.imap(operator.itemgetter('review'), self.reviews))

BTW, the sum() in Py2.6 is *much* faster than before. It should run
circles around any other approach.

As Effbot says, if you really care about speed, then just time both
approaches. However, it's a good run of thumb that builtins are hard
to beat by simulating them in pure python (unless you're using Psyco
as an optimizer).


Raymond
--
http://mail.python.org/mailman/listinfo/python-list


steve at REMOVE-THIS-cybersource

Aug 30, 2008, 5:41 PM

Post #18 of 18 (1217 views)
Permalink
Re: Which is faster? [In reply to]

On Sat, 30 Aug 2008 04:11:33 -0300, Gabriel Genellina wrote:

> Mmm, in this last test you're measuring the long integer operations
> performance (because the sum exceeds largely what can be represented in
> a plain integer). Long integers are so slow that the difference between
> both loops becomes negligible.

Nicely caught, and thank you for pointing that out.


--
Steven
--
http://mail.python.org/mailman/listinfo/python-list

Python python RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.