stefan_ml at behnel
Apr 11, 2012, 6:31 AM
Armin Rigo, 11.04.2012 14:51:
> On Wed, Apr 11, 2012 at 14:29, Stefan Behnel wrote:
>>> Moreover the performance hit is well below 2x, more like 20%.
>> Hmm, those 20% refer to STM, right? Without hardware support? Then hardware
>> support could be expected to drop that even further?
> Yes, that's using STM on my regular laptop. How HTM would help
> remains unclear at this point, because in this approach transactions
> are typically rather large --- likely much larger than what the
> first-generation HTM-capable processors will support next year.
Ok. I guess once the code is there, the hardware will eventually catch up.
However, I'm not sure what you consider "large". Many operations on the
builtin types are not all that involved, at least in the "normal" cases
(read: fast paths) that need no memory reallocation etc. Any C-level
operation that is called by the interpreter and doesn't call back into it
would form a complete, independent transaction all by itself, since the GIL
may be released between any two ticks.
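To make the granularity argument concrete, here is a toy software transactional memory in Python. It assumes one transaction per short operation (one "tick"): each transaction records the version of every variable it reads, buffers its writes, and validates its read set at commit time, retrying on conflict. `TVar`, `Transaction` and `atomically` are hypothetical names for this sketch. It is not how pypy-stm or any CPython patch actually works.

```python
import threading

class TVar:
    """A transactional variable: a value plus a commit version counter."""
    def __init__(self, value):
        self.value = value
        self.version = 0

_commit_lock = threading.Lock()  # serializes commits only, not execution

class Transaction:
    def __init__(self):
        self.reads = {}   # TVar -> version observed at first read
        self.writes = {}  # TVar -> buffered new value

    def read(self, tvar):
        if tvar in self.writes:
            return self.writes[tvar]
        if tvar not in self.reads:
            # Record the version before reading the value, so that any
            # concurrent commit is caught by validation at commit time.
            self.reads[tvar] = tvar.version
        return tvar.value

    def write(self, tvar, value):
        self.writes[tvar] = value  # invisible to others until commit

    def commit(self):
        with _commit_lock:
            for tvar, seen in self.reads.items():
                if tvar.version != seen:
                    return False  # conflict: someone committed in between
            for tvar, value in self.writes.items():
                tvar.value = value
                tvar.version += 1
            return True

def atomically(fn):
    """Run fn(tx) as one transaction, retrying until it commits."""
    while True:
        tx = Transaction()
        result = fn(tx)
        if tx.commit():
            return result

# Demo: each increment is one small, independent transaction.
counter = TVar(0)

def increment(tx):
    tx.write(counter, tx.read(counter) + 1)

def worker():
    for _ in range(1000):
        atomically(increment)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)
```

Transactions that touch disjoint `TVar`s never invalidate each other, which is the "non-conflicting" case in which this kind of scheme can scale.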
Do you know if hybrid TM is possible at this level? I.e. short transactions
run in hardware, long ones in software? (Assuming we know what's "long" and
"short", I guess...)
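A minimal sketch of the hybrid policy asked about above, assuming a hypothetical hardware capacity of four tracked locations. Each transaction is first attempted on a simulated best-effort "hardware" path and is rerun on an unbounded "software" path if its footprint overflows. It is single-threaded and models only the fallback decision. A real hybrid TM must also keep the two paths consistent with each other, for example by having hardware transactions read the software fallback's lock so that they abort when it is taken. `HTM_CAPACITY`, `Tx` and `run_hybrid` are illustrative names only.

```python
HTM_CAPACITY = 4  # hypothetical: max distinct locations a hardware tx can track

class CapacityAbort(Exception):
    """Raised when a simulated hardware transaction outgrows its buffer."""

class Tx:
    def __init__(self, store, capacity=None):
        self.store = store
        self.capacity = capacity  # None means unbounded (software path)
        self.footprint = set()
        self.writes = {}

    def _touch(self, key):
        self.footprint.add(key)
        if self.capacity is not None and len(self.footprint) > self.capacity:
            raise CapacityAbort(key)

    def read(self, key):
        self._touch(key)
        if key in self.writes:
            return self.writes[key]
        return self.store[key]

    def write(self, key, value):
        self._touch(key)
        self.writes[key] = value  # buffered until commit

    def commit(self):
        self.store.update(self.writes)

def run_hybrid(store, fn):
    """Try fn in simulated hardware first; on capacity abort, redo in software."""
    try:
        tx = Tx(store, capacity=HTM_CAPACITY)
        result = fn(tx)
        tx.commit()
        return result, "hardware"
    except CapacityAbort:
        # The aborted attempt never touched the store, so just rerun.
        tx = Tx(store)
        result = fn(tx)
        tx.commit()
        return result, "software"

store = {f"x{i}": i for i in range(10)}

def short_op(tx):  # touches 2 locations: fits the hardware path
    tx.write("x0", tx.read("x1") + 1)
    return tx.read("x0")

def long_op(tx):  # touches all 10 locations: overflows, falls back
    total = sum(tx.read(f"x{i}") for i in range(10))
    tx.write("x9", total)
    return total

short_result = run_hybrid(store, short_op)
long_result = run_hybrid(store, long_op)
print(short_result, long_result)
```

Because the simulated hardware path only buffers writes, an aborted attempt leaves `store` untouched, which is what makes the plain rerun in software safe in this sketch.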
> But 20% looks good anyway :-)
>> Did you do any experiments with running parallel code so far, to see if
>> that scales as expected?
> Yes, it scales very nicely on small non-conflicting examples. I
> believe that it scales just as nicely on large examples on CPython
> too, based on the approach --- as long as we, as CPython developers,
> make enough effort to adapt a sufficiently large portion of the
> CPython C code base (which would mean: most mutable built-in objects'
Right, that would involve some work. But the advantage, as I understand it,
is that this can be done incrementally. I.e. make it work, then make it
fast and make it scale.
Python-Dev mailing list
Python-Dev [at] python