Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Perl: porters

[perl #32331] grep {/PATTERN/} is slow

 

 

Perl porters RSS feed   Index | Next | Previous | View Threaded


perlbug-followup at perl

Nov 4, 2004, 3:55 PM

Post #1 of 6 (223 views)
Permalink
[perl #32331] grep {/PATTERN/} is slow

# New Ticket Created by abigail [at] abigail
# Please include the string: [perl #32331]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32331 >



This is a bug report for perl from abigail [at] abigail,
generated with the help of perlbug 1.35 running under perl v5.8.5.


-----------------------------------------------------------------
[Please enter your report here]

I would expect

grep {EXPR} LIST

to be somewhat slower than

grep EXPR, LIST

For instance, the following benchmark shows the block version to
be 5% slower:

use Benchmark qw /cmpthese/;

our @array = 101 .. 200;

cmpthese -2 => {
grep_eq_expr => '$::a1 = grep $_ == 50, @::array',
grep_eq_blck => '$::a2 = grep {$_ == 50} @::array',
};

__END__
Rate grep_eq_blck grep_eq_expr
grep_eq_blck 28902/s -- -5%
grep_eq_expr 30282/s 5% --


If, however, the expression is a regexp, things change drastically:

use Benchmark qw /cmpthese/;

our @array = 101 .. 200;

cmpthese -2 => {
grep_re_expr => '$::b1 = grep /^50$/, @::array',
grep_re_blck => '$::b2 = grep {/^50$/} @::array',
};

__END__
Rate grep_re_blck grep_re_expr
grep_re_blck 6455/s -- -69%
grep_re_expr 20848/s 223% --

This difference in speed surprises me.

It's not the fact that it's a regexp that's run inside a block,
as a similar for loop doesn't show the huge difference:

use Benchmark qw /cmpthese/;

our @array = 101 .. 200;

cmpthese -2 => {
for_eq_expr => '$::c1 = 0; $_ == 50 and $::c1 ++ for @::array',
for_eq_blck => '$::c2 = 0; for (@::array) {$_ == 50 and $::c2 ++}',
};

__END__
Rate for_eq_blck for_eq_expr
for_eq_blck 23640/s -- -5%
for_eq_expr 24992/s 6% --



use Benchmark qw /cmpthese/;

our @array = 101 .. 200;

cmpthese -2 => {
for_re_expr => '$::d1 = 0; /^50$/ and $::d1 ++ for @::array',
for_re_blck => '$::d2 = 0; for (@::array) {/^50$/ and $::d2 ++}',
};

__END__
Rate for_re_blck for_re_expr
for_re_blck 19091/s -- -2%
for_re_expr 19545/s 2% --


In both cases, the block version is slower, but by a small margin.


So, I suspect a bug somewhere.



I'll be leaving for my honeymoon and YAPC::AU tomorrow morning, so it might
be a while before I'll be able to reply to any questions.


Abigail


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
category=core
severity=low
---
Site configuration information for perl v5.8.5:

Configured by abigail at Tue Aug 3 21:00:04 CEST 2004.

Summary of my perl5 (revision 5 version 8 subversion 5) configuration:
Platform:
osname=linux, osvers=2.4.18-bf2.4, archname=i686-linux-64int-ld
uname='linux alexandra 2.4.18-bf2.4 #1 son apr 14 09:53:28 cest 2002 i686 unknown '
config_args='-des -Dusemorebits -Uversiononly -Dmydomain=.abigail.nl -Dcf_email=abigail [at] abigail -Dperladmin=abigail [at] abigail -Doptimize=-g -Dcc=gcc -Dprefix=/opt/perl'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=define use64bitall=undef uselongdouble=define
usemymalloc=n, bincompat5005=undef
Compiler:
cc='gcc', ccflags ='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
optimize='-g',
cppflags='-DDEBUGGING -fno-strict-aliasing -pipe -I/usr/local/include'
ccversion='', gccversion='3.0.4', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=12345678
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
ivtype='long long', ivsize=8, nvtype='long double', nvsize=12, Off_t='off_t', lseeksize=8
alignbytes=4, prototype=define
Linker and Libraries:
ld='gcc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /lib /usr/lib
libs=-lnsl -ldl -lm -lcrypt -lutil -lc
perllibs=-lnsl -ldl -lm -lcrypt -lutil -lc
libc=/lib/libc-2.2.5.so, so=so, useshrplib=false, libperl=libperl.a
gnulibc_version='2.2.5'
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E'
cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:
defined-or
no-syntax-warnings

---
@INC for perl v5.8.5:
/home/abigail/Perl
/opt/perl/lib/5.8.5/i686-linux-64int-ld
/opt/perl/lib/5.8.5
/opt/perl/lib/site_perl/5.8.5/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.5
/opt/perl/lib/site_perl/5.8.4/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.4
/opt/perl/lib/site_perl/5.8.3/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.3
/opt/perl/lib/site_perl/5.8.2/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.2
/opt/perl/lib/site_perl/5.8.1/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.1
/opt/perl/lib/site_perl/5.8.0/i686-linux-64int-ld
/opt/perl/lib/site_perl/5.8.0
/opt/perl/lib/site_perl
.

---
Environment for perl v5.8.5:
HOME=/home/abigail
LANG=C
LANGUAGE (unset)
LD_LIBRARY_PATH=/home/abigail/Lib:/usr/local/lib:/usr/lib:/lib:/usr/X11R6/lib
LOGDIR (unset)
PATH=/home/abigail/Bin:/opt/perl/bin:/usr/local/bin:/usr/local/X11/bin:/usr/bin:/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr/X11R6/bin:/usr/games:/usr/share/texmf/bin:/opt/Acrobat/bin:/opt/java/blackdown/j2sdk1.3.1/bin:/usr/local/games/bin
PERL5LIB=/home/abigail/Perl
PERLDIR=/opt/perl
PERL_BADLANG (unset)
SHELL=/bin/bash


tassilo.von.parseval at rwth-aachen

Nov 10, 2004, 10:51 PM

Post #2 of 6 (216 views)
Permalink
Re: [perl #32331] grep {/PATTERN/} is slow [In reply to]

On Thu, Nov 04, 2004 at 11:55:31PM +0000 abigail [at] abigail (via RT) wrote:
> # New Ticket Created by abigail [at] abigail
> # Please include the string: [perl #32331]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=32331 >
>
>
>
> This is a bug report for perl from abigail [at] abigail,
> generated with the help of perlbug 1.35 running under perl v5.8.5.
>
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> I would expect
>
> grep {EXPR} LIST
>
> to be somewhat slower than
>
> grep EXPR, LIST
>
> For instance, the following benchmark shows the block version to
> be 5% slower:
>
> use Benchmark qw /cmpthese/;
>
> our @array = 101 .. 200;
>
> cmpthese -2 => {
> grep_eq_expr => '$::a1 = grep $_ == 50, @::array',
> grep_eq_blck => '$::a2 = grep {$_ == 50} @::array',
> };
>
> __END__
> Rate grep_eq_blck grep_eq_expr
> grep_eq_blck 28902/s -- -5%
> grep_eq_expr 30282/s 5% --
>
>
> If, however, the expression is a regexp, things change drastically:
>
> use Benchmark qw /cmpthese/;
>
> our @array = 101 .. 200;
>
> cmpthese -2 => {
> grep_re_expr => '$::b1 = grep /^50$/, @::array',
> grep_re_blck => '$::b2 = grep {/^50$/} @::array',
> };
>
> __END__
> Rate grep_re_blck grep_re_expr
> grep_re_blck 6455/s -- -69%
> grep_re_expr 20848/s 223% --
>
> This difference in speed surprises me.

[...]

Just for the record: I can confirm this. The problem is still in blead
(tested on #23466).

Tassilo
--
$_=q#",}])!JAPH!qq(tsuJ[{@"tnirp}3..0}_$;//::niam/s~=)]3[))_$-3(rellac(=_$({
pam{rekcahbus})(rekcah{lrePbus})(lreP{rehtonabus})!JAPH!qq(rehtona{tsuJbus#;
$_=reverse,s+(?<=sub).+q#q!'"qq.\t$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval


eda at waniasset

May 1, 2012, 3:48 AM

Post #3 of 6 (135 views)
Permalink
Re: [perl #32331] grep {/PATTERN/} is slow [In reply to]

Some say that

grep { /PATTERN } @x

is better style than

grep /PATTERN/, @x

(Perl Best Practices and perlcritic BuiltinFunctions::RequireBlockGrep)

To keep these people happy it would be good to put in an optimization for the
former code to keep it just as fast as the latter.

If speeding it up is a WONTFIX, then perhaps perlcritic's rules need to change.

--
Ed Avis <eda [at] waniasset>


nick at ccl4

May 1, 2012, 4:09 AM

Post #4 of 6 (142 views)
Permalink
Re: [perl #32331] grep {/PATTERN/} is slow [In reply to]

On Tue, May 01, 2012 at 10:48:53AM +0000, Ed Avis wrote:
> Some say that
>
> grep { /PATTERN } @x
>
> is better style than
>
> grep /PATTERN/, @x
>
> (Perl Best Practices and perlcritic BuiltinFunctions::RequireBlockGrep)
>
> To keep these people happy it would be good to put in an optimization for the
> former code to keep it just as fast as the latter.
>
> If speeding it up is a WONTFIX, then perhaps perlcritic's rules need to change.

It is a safe general-case transformation? Is there any behaviour one can
put into a pattern that would notice the lack of scope exit after each
run? ie the difference at steps 8, 9 and b in the second case:

$ perl -MO=Concise,-exec -e 'grep /PATTERN/, @x'
1 <0> enter
2 <;> nextstate(main 1 -e:1) v:{
3 <0> pushmark s
4 <$> gv(*x) s
5 <1> rv2av[t1] lKM/1
6 <@> grepstart K
7 <|> grepwhile(other->8)[t2] vK
8 </> match(/"PATTERN"/) s/RTIME
goto 7
9 <@> leave[1 ref] vKP/REFC
-e syntax OK

$ perl -MO=Concise,-exec -e 'grep {/PATTERN/} @x'
1 <0> enter
2 <;> nextstate(main 2 -e:1) v:{
3 <0> pushmark s
4 <$> gv(*x) s
5 <1> rv2av[t1] lKM/1
6 <@> grepstart K*
7 <|> grepwhile(other->8)[t2] vK
8 <0> enter s
9 <;> nextstate(main 1 -e:1) v:{
a </> match(/"PATTERN"/) s/RTIME
b <@> leave sKP
goto 7
c <@> leave[1 ref] vKP/REFC
-e syntax OK


The idea of an optimisation seems nice. But I don't know if it's trivially
safe. Or whether it would need to check the pattern for complexity, and
only work on "simple" things.

Nicholas Clark


davem at iabyn

May 1, 2012, 4:13 AM

Post #5 of 6 (131 views)
Permalink
Re: [perl #32331] grep {/PATTERN/} is slow [In reply to]

On Tue, May 01, 2012 at 10:48:53AM +0000, Ed Avis wrote:
> Some say that
>
> grep { /PATTERN } @x
>
> is better style than
>
> grep /PATTERN/, @x
>
> (Perl Best Practices and perlcritic BuiltinFunctions::RequireBlockGrep)
>
> To keep these people happy it would be good to put in an optimization for the
> former code to keep it just as fast as the latter.
>
> If speeding it up is a WONTFIX, then perhaps perlcritic's rules need to change.

It definitely isn't yet a WONTFIX: there's something specifically about
using a regex (as opposed to just == say) that makes it *much* more slower
in a block than would be expected.

Until the reason is diagnosed, we can't really decide what to do about
it (but I'm not intending to look at it just yet).

--
Modern art:
"That's easy, I could have done that!"
"Ah, but you didn't!"


abigail at abigail

May 1, 2012, 4:47 AM

Post #6 of 6 (129 views)
Permalink
Re: [perl #32331] grep {/PATTERN/} is slow [In reply to]

On Tue, May 01, 2012 at 04:14:12AM -0700, Dave Mitchell via RT wrote:
> On Tue, May 01, 2012 at 10:48:53AM +0000, Ed Avis wrote:
> > Some say that
> >
> > grep { /PATTERN } @x
> >
> > is better style than
> >
> > grep /PATTERN/, @x
> >
> > (Perl Best Practices and perlcritic BuiltinFunctions::RequireBlockGrep)
> >
> > To keep these people happy it would be good to put in an optimization for the
> > former code to keep it just as fast as the latter.
> >
> > If speeding it up is a WONTFIX, then perhaps perlcritic's rules need to change.
>
> It definitely isn't yet a WONTFIX: there's something specifically about
> using a regex (as opposed to just == say) that makes it *much* more slower
> in a block than would be expected.

Note that the slowdown isn't a "regexp in a block". It's a
"regexp in a grep block". As the original bug report shows, the
difference between for-expr and for-block are much less (with 5.15.9,
the difference between for-expr and for-block is less than 10% on
my machine).

With map, I get an inbetween value (using 5.15.9):

use Benchmark qw /cmpthese/;

our @array = 1 .. 200;

cmpthese -2 => {
map_re_blck => '$::d1 = map {/^50$/ ? $_ : ()} @::array',
map_re_expr => '$::d2 = map /^50$/ ? $_ : (), @::array',
};

__END__
Rate map_re_blck map_re_expr
map_re_blck 8756/s -- -39%
map_re_expr 14422/s 65% --


Also, if the matches are failing, the difference between grep-block
and grep-expr is bigger (using 5.15.9):

use Benchmark qw /cmpthese/;

our @array1 = 5000 .. 5099;
our @array2 = 4000 .. 4099;

cmpthese -2 => {
grep_re_blck1 => '$::d1 = grep {/^50/} @::array1',
grep_re_expr1 => '$::d2 = grep /^50/, @::array1',
grep_re_blck2 => '$::d1 = grep {/^50/} @::array2',
grep_re_expr2 => '$::d2 = grep /^50/, @::array2',
};

__END__
Rate grep_re_blck1 grep_re_blck2 grep_re_expr1 grep_re_expr2
grep_re_blck1 5879/s -- -32% -51% -81%
grep_re_blck2 8635/s 47% -- -28% -72%
grep_re_expr1 12041/s 105% 39% -- -61%
grep_re_expr2 30629/s 421% 255% 154% --

>
> Until the reason is diagnosed, we can't really decide what to do about
> it (but I'm not intending to look at it just yet).
>


The bug is quite old (I've known about it for longer than the bug report
is old -- it is as least as old as 5.005), so there's no urgency. OTOH,
"grep {/PATTERN/}" is a often used idiom.



Abigail

Perl porters RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.