stefan_ml at behnel
Aug 4, 2012, 12:06 PM
Post #23 of 43
Re: On-topic: alternate Python implementations
Paul Rubin, 04.08.2012 20:18:
> Stefan Behnel writes:
>>> C is pretty poor as a compiler target: how would you translate Python
>>> generators into C, for example?
>> Depends. If you have CPython available, that'd be a straightforward
>> extension type.
> Calling CPython hardly counts as compiling Python into C.
CPython is written in C, though. So anything that CPython does can be done
in C. It's not like the CPython project used a completely unusual way of
writing C code.
Besides, I find your above statement questionable. You will always need
some kind of runtime infrastructure when you "compile Python into C", so
you can just as well use CPython for that instead of reimplementing it
completely from scratch. Both Cython and Nuitka do exactly that, and one of
the major advantages of that approach is that they can freely interact with
arbitrary code (Python or not) that was written for CPython, regardless of
its native dependencies. What good would it be to throw all of that away,
just for the sake of having "pure C code generation"?
>> For the yielding, you can use labels and goto. Given that you generate
>> the code, that's pretty straightforward as well.
> You're going to compile the whole Python program into a single C
> function so that you can do gotos inside of it? What happens if the
> program imports a generator?
No, you are going to compile only the generator function into a function
that uses gotos, maybe with an additional in-out struct parameter that
holds its state. Then, on entry, you read the label (or its ID) from the
previous state, restore the local variables and jump to the label. On exit,
you store the state back and return. Cython does it that way. Totally
straightforward, as I said.
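
To make the scheme above concrete, here is a minimal hand-written sketch for a
Python generator like "def countdown(n): while n > 0: yield n; n -= 1". It is
not what Cython actually emits; the names countdown_state, countdown_init and
countdown_next are made up for illustration, and a real implementation would
also have to deal with Python objects, exceptions and send().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-out state: the resume point plus the saved locals. */
typedef struct {
    int resume_label;  /* 0 = not started, 1 = after the yield, -1 = finished */
    int n;             /* saved copy of the generator's local variable n */
} countdown_state;

void countdown_init(countdown_state *st, int n) {
    assert(st != NULL);
    st->resume_label = 0;
    st->n = n;
}

/* Returns true and stores the yielded value in *out,
 * or returns false once the generator is exhausted. */
bool countdown_next(countdown_state *st, int *out) {
    assert(st != NULL && out != NULL);

    int n = st->n;                /* on entry: restore the local variables */

    switch (st->resume_label) {   /* ... and jump to the saved label */
        case 0:  goto L_start;
        case 1:  goto L_resume_1;
        default: return false;    /* already finished */
    }

L_start:
    while (n > 0) {
        /* "yield n": store the state back and return to the caller */
        *out = n;
        st->n = n;
        st->resume_label = 1;
        return true;
L_resume_1:
        /* execution continues here on the next call */
        n -= 1;
    }

    st->resume_label = -1;        /* generator body ran to completion */
    return false;
}
```

A caller initialises the state once with countdown_init(&st, 3) and then calls
countdown_next(&st, &value) in a loop until it returns false. Only the
generator function itself contains gotos; the rest of the program is
unaffected.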
>>> How would you handle garbage collection?
>> CPython does it automatically for us at least.
> You mean you're going to have all the same INCREF/DECREF stuff on every
> operation in compiled data? Ugh.
If you don't like that, you can experiment with anything from a dedicated
GC to transactional memory.
>> Lacking that, you'd use one of the available garbage collection
> What implementations would those be? There's the Boehm GC which is
> useful for some purposes but not really suitable at large scale, from
> what I can tell. Is there something else?
No idea - I'll look it up when I need one. Last I heard, PyPy had a couple
of GCs to choose from, but I don't know how closely they are tied into its
>> or provide none at all.
> You're going to let the program just leak memory until it crashes??
Well, it's not like CPython leaks memory until it crashes, now does it? And
it's written in C. So there must be ways to handle this also in C.
Remember that CPython didn't even have a cycle-collecting GC before something around 2.0,
IIRC. That worked quite ok in most cases and simply left the tricky cases
to the programmers. It really depends on what your requirements are. Small
embedded systems, time-critical code and real-time systems are often much
better off without garbage collection. It's pure convenience, after all.
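
As an illustration of what "leaving the tricky cases to the programmers" can
look like, here is a minimal sketch of plain reference counting in C without
any cycle detection. The node type and the node_* functions are hypothetical
and much simpler than CPython's object model; the point is only that acyclic
data is freed promptly, while a reference cycle stays alive until someone
breaks it by hand.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical minimal object: a reference count plus one owned reference. */
typedef struct node {
    int refcount;
    struct node *other;   /* owned reference, may be NULL */
} node;

static int live_nodes = 0;  /* allocated-but-not-freed nodes, for demonstration */

node *node_new(void) {
    node *n = malloc(sizeof *n);
    assert(n != NULL);
    n->refcount = 1;        /* the caller owns the initial reference */
    n->other = NULL;
    live_nodes++;
    return n;
}

void node_incref(node *n) {
    if (n != NULL) n->refcount++;
}

void node_decref(node *n) {
    if (n == NULL) return;
    if (--n->refcount == 0) {
        node_decref(n->other);  /* release the reference this node owned */
        free(n);
        live_nodes--;
    }
}

/* Replace n->other with a new owned reference to target.
 * The field is updated before the old reference is dropped, so that a
 * cascade of frees never sees a stale pointer. */
void node_set_other(node *n, node *target) {
    node *old = n->other;
    node_incref(target);
    n->other = target;
    node_decref(old);
}

int live_node_count(void) {
    return live_nodes;
}
```

With a chain a -> b, dropping the last external reference to a frees both
nodes immediately. With a cycle a -> b -> a, both counts stay at one after all
external references are gone, so the programmer has to clear one link (for
example node_set_other(a, NULL)) before releasing the nodes.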
>> you shouldn't expect too much of a performance gain from what the
>> platform gives you for the underlying implementation. It can optimise
>> the emulator, but it won't see enough of the Python code to make
>> anything efficient out of it. Jython is an example of that.
> Compare that to the performance gain of LuaJIT and it starts to look
> like something is wrong with that approach, or maybe some issue inherent
> in Python itself.
Huh? LuaJIT is a reimplementation of Lua that uses an optimising JIT
compiler specifically for Lua code. How is that similar to the Jython
runtime that runs *on top of* the JVM with its generic byte code based JIT
Basically, LuaJIT's JIT compiler works at the same level as the one in
PyPy, which is why both can theoretically provide the same level of
>> You can get pretty far with static code analysis, optimistic
>> optimisations and code specialisation.
> It seems very hard to do reasonable optimizations in the presence of
> standard Python techniques like dynamically poking class instance
> attributes. I guess some optimizations are still possible, like storing
> attributes named as literals in the program in fixed slots, saving some
> dictionary lookups even though the slot contents would have to still be
Sure. Even when targeting the CPython runtime with the generated C code
(like Cython or Nuitka), you can still do a lot. And sure, static code
analysis will never be able to infer everything that a JIT compiler can see.
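
A rough sketch of what an optimistic optimisation can look like in generated
C: the compiler guesses that both operands of an addition are small integers,
emits a guarded fast path for that case and falls back to a generic routine
when the guess turns out wrong. The value type and all function names below
are invented for this example. In code that targets the CPython runtime, the
fallback would be a call into the runtime (something like PyNumber_Add)
instead of the simplified float stand-in used here.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical tagged value, standing in for a dynamically typed object. */
typedef enum { TAG_INT, TAG_FLOAT } value_tag;

typedef struct {
    value_tag tag;
    union { long i; double f; } as;
} value;

value make_int(long i)     { value v; v.tag = TAG_INT;   v.as.i = i; return v; }
value make_float(double f) { value v; v.tag = TAG_FLOAT; v.as.f = f; return v; }

static double as_double(value v) {
    assert(v.tag == TAG_INT || v.tag == TAG_FLOAT);
    return (v.tag == TAG_INT) ? (double)v.as.i : v.as.f;
}

/* Generic slow path: a simplified stand-in for a call into the runtime.
 * It handles every operand combination by computing in floating point. */
value generic_add(value a, value b) {
    return make_float(as_double(a) + as_double(b));
}

/* True if a + b would overflow a long (signed overflow is undefined in C). */
static bool add_would_overflow(long a, long b) {
    return (b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b);
}

/* Optimistic addition: cheap guard first, generic fallback otherwise. */
value optimistic_add(value a, value b) {
    if (a.tag == TAG_INT && b.tag == TAG_INT &&
        !add_would_overflow(a.as.i, b.as.i)) {
        return make_int(a.as.i + b.as.i);   /* specialised fast path */
    }
    return generic_add(a, b);               /* the guess was wrong */
}
```

The guard costs a couple of comparisons, so the fast path stays cheap when the
guess holds, and the fallback keeps the result defined when it does not.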