
Mailing List Archive: Perl: porters

More results from llvm-gcc

 

 



nothingmuch at woobling

Jun 3, 2008, 7:17 PM

Post #1 of 6
More results from llvm-gcc

Hi,

A while ago Claes compiled Perl 5.6 with llvm-gcc and got some
performance improvements. This is a continuation of that effort
using the 5.10 source tree.

Executive summary: linking LLVM bytecode with LLVM's optimizing linker
gives even better results than plain llvm-gcc, and theoretically
opens the way for much more. PerlBench shows a 15% improvement over
a standard gcc build.


And now for the details:


llvm-gcc is basically gcc 4.2 with the backend switched to use
llvm's native code generation, instead of gcc's.

Normally, running

llvm-gcc -o foo.o foo.c

generates native code, which is then linked by the normal linker.

However, if you run it as

llvm-gcc -emit-llvm -o foo.o foo.c

then foo.o is not a real object file, but an llvm bytecode file.

This file is then linkable with llvm-ld, allowing interprocedural
optimizations.

The result of linking perl like this

llvm-ld -native -O5 -o perl blah blah blah

is an executable that is on average 15-20% faster, as measured by
PerlBench on my machine, than the perl I normally use, which I
compiled with gcc 4.0.

I attribute this 10% improvement over plain llvm-gcc (linking native
.o files instead of bytecode) to LLVM's extensive link-time
optimizations.

When linking without -native, the perl executable is actually a
shell script that runs lli on perl.bc. Startup is very slow (about
3.5 seconds), but after that it's just as fast as, and sometimes
faster than, the -native executable. Unfortunately it cannot be used
with the -e command line option (filter_del emits an error about
removing filters). I haven't debugged this yet.

For dynamic loading of modules to work, llvm-ld has to be passed
-disable-internalize (it needs to keep all external symbols available
for dynamic linking). With that, the Perl test suite passes except
for one error relating to sdbm (output below). Without this flag the
results are faster, but of course the test suite fails when loading
XS code. I suppose whatever it takes to build a static perl could fix
this, but I haven't actually tried.

Apple's iPhone SDK ships with an llvm-gcc that does the linking part
automatically, but it exhibits some breakage. I filed a bug report;
once it is fixed, you could in theory get the same speed improvements
just by using llvm-gcc -O4 and changing nothing else.

To reproduce this, replace ld and cc with the attached script and
make sure that ar is llvm-ar. I couldn't get this to work
consistently without editing config.sh by hand (Configure didn't
respect changes to ar or ld; I don't know what the right solution
is).

And now for the bad news:

ext/SDBM_File/t/sdbm..........................................perl(83924) malloc: *** error for object 0x200f07: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug
Use of uninitialized value $Dfile in stat at ../ext/SDBM_File/t/sdbm.t line 47.
Use of uninitialized value $mode in bitwise and (&) at ../ext/SDBM_File/t/sdbm.t line 49.
perl(83924) malloc: *** error for object 0x200f67: Non-aligned pointer being freed
*** set a breakpoint in malloc_error_break to debug

I haven't looked into this yet. This repeats with several other
DBM-related tests, but other than that the whole test suite passes.

The iPhone SDK llvm-gcc -O4 build has a few other test failures.

And lastly, the future directions:

I hope to embed LLVM's bytecode loading and JIT support in the perl
executable, and patch XSLoader and DynaLoader to load LLVM bytecode.
That would allow LLVM-based XS modules (which could be interesting
for PAR-like efforts, not just for optimization). I also want to
retain the bytecode output of compiling Perl itself, so that it is
available to the JIT as well.

Once the opcode definitions (pp_*) are available as LLVM bytecode
functions, I want to try emitting very naive threaded bytecode from
the optree on a per-subroutine basis, and turning these subroutines
into XSUBs using the function pointer returned by the LLVM JIT.

For example, the body of sub { $x + 3 } would become similar to the
definition of:

/* PL_op == CvSTART(cv), the nextstate op */
PL_op = pp_nextstate(aTHX);
PL_op = pp_padsv(aTHX);
PL_op = pp_const(aTHX);
PL_op = pp_add(aTHX);
return pp_leave(aTHX);

assuming that op->op_ppaddr == PL_ppaddr[op->op_type] for every op.
Hopefully LLVM will be able to perform interprocedural optimizations
across the definitions of the various pp_* functions.

Once that is in place, the bytecode emitter can be extended by
refactoring pp_* into smaller, non-stack-based functions that rely
less on global state, so that the above code could actually become
more like:

SV *tmp1 = opcode_padsv(aTHX_ pad_op);
stack_push(opcode_add(tmp1, const_op_sv));
free_tmp(tmp1); /* free if it has a PV */
PL_op = next_op;

This would let simple ops avoid the overhead of pushing and popping
data on the stack, mortalizing, etc.

Lastly, I hope to base this emitter on Runops::Trace's recently
added features, to get trace-caching-style compilation of just the
hot path and avoid spending JIT optimization on seldom-used optrees.

Cheers,
Yuval

P.S. my new favourite command is make clean -j50 (yes, fifty).

--
Yuval Kogman <nothingmuch[at]woobling.org>
http://nothingmuch.woobling.org 0xEBD27418
Attachments: llvm-gcc-bc (2.77 KB)


nick at ccl4

Jun 4, 2008, 9:46 AM

Post #2 of 6
Re: More results from llvm-gcc

On Wed, Jun 04, 2008 at 05:17:39AM +0300, Yuval Kogman wrote:

> llvm-gcc is basically gcc 4.2 with the backend switched to use
> llvm's native code generation, instead of gcc's.

> is an executable that is on average 15-20% faster as measured by
> PerlBench on my machine, than the perl I compiled with gcc 4.0 and
> use normally.

Interesting. Useful.

> I blame this 10% improvement over plain llvm-gcc (without linking
> bytecode, but native .o files) to LLVM's extensive link time
> optimizations.

But you're also comparing 4.2 with 4.0. What's llvm-gcc like against 4.2?

Nicholas Clark


nothingmuch at woobling

Jun 4, 2008, 10:13 AM

Post #3 of 6
Re: More results from llvm-gcc

On Wed, Jun 04, 2008 at 17:46:15 +0100, Nicholas Clark wrote:

> But you're also comparing 4.2 with 4.0. What's llvm-gcc like against 4.2?

Given that llvm-gcc performs none of GCC's optimizations, it doesn't
matter that much, but here are the results:

                        system  gcc-4.2   llvmbc
                        ------  -------   ------
arith/mixed                100       82      119
arith/trig                 100       88      103
array/copy                 100       96      130
array/foreach              100       83      134
array/index                100       96      134
array/pop                  100       98      125
array/shift                100       95      130
array/sort-num             100      100      103
array/sort                 100       96      109
call/0arg                  100       81      148
call/1arg                  100       79      133
call/2arg                  100       86      133
call/9arg                  100       89      132
call/empty                 100       67      139
call/fib                   100       87      130
call/method                100       86      126
call/wantarray             100       93      125
hash/copy                  100       87      112
hash/each                  100       97      127
hash/foreach-sort          100       94      106
hash/foreach               100       85      126
hash/get                   100       76       83
hash/set                   100       82      108
loop/for-c                 100       81       95
loop/for-range-const       100      113      118
loop/for-range             100      111      120
loop/getline               100       94      106
loop/while-my              100       82       96
loop/while                 100       85       85
re/const                   100       98       99
re/w                       100      106      117
startup/fewmod             100      102      105
startup/lotsofsub          100      100      103
startup/noprog             100      108      104
string/base64              100      105      115
string/htmlparser          100       90      104
string/index-const         100       99      109
string/index-var           100       84      114
string/ipol                100       86       83
string/tr                  100       97       96

AVERAGE                    100       92      115

--
Yuval Kogman <nothingmuch[at]woobling.org>
http://nothingmuch.woobling.org 0xEBD27418


nothingmuch at woobling

Jun 4, 2008, 10:20 AM

Post #4 of 6
Re: More results from llvm-gcc

                       gcc-4.0  gcc-4.2   llvmbc llvmnodl
                       -------  -------   ------ --------
arith/mixed                100       82      119      119
arith/trig                 100       88      103      108
array/copy                 100       96      130      130
array/foreach              100       83      134      127
array/index                100       96      134      140
array/pop                  100       98      125      125
array/shift                100       95      130      125
array/sort-num             100      100      103      100
array/sort                 100       96      109      112
call/0arg                  100       81      140      140
call/1arg                  100       79      133      127
call/2arg                  100       86      133      133
call/9arg                  100       89      132      132
call/empty                 100       69      139      123
call/fib                   100       87      130      137
call/method                100       86      126      126
call/wantarray             100       93      125      132
hash/copy                  100       96      125      118
hash/each                  100      100      127      133
hash/foreach-sort          100       94      106      118
hash/foreach               100       85      126      126
hash/get                   100       76       83      104
hash/set                   100       82      108      112
loop/for-c                 100       81       95      140
loop/for-range-const       100      113      124      153
loop/for-range             100      111      120      158
loop/getline               100       94      110      118
loop/while-my              100       79       96      121
loop/while                 100       85       85      115
re/const                   100       98       99       99
re/w                       100      106      117      122
startup/fewmod             100      100      105       86
startup/lotsofsub          100      100      103      107
startup/noprog             100      102      103      105
string/base64              100      105      115      115
string/htmlparser          100       90      104      104
string/index-const         100       99      109      109
string/index-var           100       88      118      124
string/ipol                100       86       83       98
string/tr                  100       97       96      100

AVERAGE                    100       92      115      121

This is all 5.10, ./Configure -de, no threads or multiplicity.

llvmnodl = linked without -disable-internalize, so it can't DynaLoad
anything. Still interesting, though.

--
Yuval Kogman <nothingmuch[at]woobling.org>
http://nothingmuch.woobling.org 0xEBD27418


steve at fisharerojo

Jun 4, 2008, 10:27 AM

Post #5 of 6
Re: More results from llvm-gcc

On Wed, Jun 4, 2008 at 12:13 PM, Yuval Kogman <nothingmuch[at]woobling.org> wrote:
> On Wed, Jun 04, 2008 at 17:46:15 +0100, Nicholas Clark wrote:
>
>> But you're also comparing 4.2 with 4.0. What's llvm-gcc like against 4.2?
>
> Given that no GCC optimizations are performed by llvm-gcc it doesn't
> matter that much, but the results are:
> [...]

I would be curious to see how Intel C++ and Sun Studio on Linux are
doing in comparison. Based on my previous smokes, they usually beat
the pants off of gcc.

Steve Peters
steve[at]fisharerojo.org


nothingmuch at woobling

Jun 4, 2008, 12:17 PM

Post #6 of 6
Re: More results from llvm-gcc

On Wed, Jun 04, 2008 at 12:27:43 -0500, Steve Peters wrote:

> I would be curious to see how Intel C++ and Sun Studio on Linux are
> doing in comparison. Based on my previous smokes, they usually beat
> the pants off of gcc.

I would be interested to see the results, but my actual motive is
getting LLVM embedded in Perl. I don't have either platform, so I'm
going to bow out at this stage ;-)

I'd be happy to help get llvm-gcc working on Linux, though.

--
Yuval Kogman <nothingmuch[at]woobling.org>
http://nothingmuch.woobling.org 0xEBD27418
