Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Python: Dev

Experimenting with STM on CPython

 

 

Python dev RSS feed   Index | Next | Previous | View Threaded


arigo at tunes

Apr 11, 2012, 4:47 AM

Post #1 of 8 (391 views)
Permalink
Experimenting with STM on CPython

Hi all,

This is an update on the (so far PyPy-only) project of adding "Automatic
Mutual Exclusion" to Python, via STM (Software Transactional Memory).
For the motivation, see here:

http://morepypy.blogspot.com/2012/03/call-for-donations-for-software.html
"""The point is that [with STM/AME] your program is always correct,
and can be tweaked to improve performance. This is the opposite from
what explicit threads and locks give you, which is a performant
program which you need to tweak to remove bugs. Arguably, this
approach is the reason for why you use Python in the first place
:-)"""

The update is: I now believe that it might be (reasonably) possible to
apply the same techniques to CPython, and not only to PyPy. For now I
am experimenting with applying them in a simple CPython-like
interpreter. If it works, it might end up as a patch to the core parts
of CPython. The interesting property is that it would still be able to
run unmodified C extension modules --- the Python code gets the benefits
of multi-core STM/AME only if it involves only the patched parts of the
C code, but in all cases it still works correctly, falling back to
single-core usage.

I did not try to hack CPython so far, but only this custom interpreter
for a Lisp language, whose implementation should be immediately familiar
to anyone who knows CPython C code: https://bitbucket.org/arigo/duhton .
The non-standard built-in function is "transaction", which schedules a
transaction to run later (see test/test_transaction.py).

The code contains the necessary tweaks to reference counting, and seems
to work on all examples, but leaks some of the objects so far. Fixing
this directly might be possible, but I'm not sure yet (it might require
interaction with the cycle-detecting GC of CPython). Moreover the
performance hit is well below 2x, more like 20%.

If anyone is interested, I could create a dedicated mailing list in
order to discuss this in more details. From experience I would think
that this has the potential to become a Psyco-like experiment, but
unlike 10 years ago, today I'm not ready any more to dive completely
alone into a project of that scale :-)


A bientôt,

Armin.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Apr 11, 2012, 5:29 AM

Post #2 of 8 (375 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

Armin Rigo, 11.04.2012 13:47:
> This is an update on the (so far PyPy-only) project of adding "Automatic
> Mutual Exclusion" to Python, via STM (Software Transactional Memory).
> [...]
> Moreover the performance hit is well below 2x, more like 20%.

Hmm, those 20% refer to STM, right? Without hardware support? Then hardware
support could be expected to drop that even further?

Did you do any experiments with running parallel code so far, to see if
that scales as expected?

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


arigo at tunes

Apr 11, 2012, 5:51 AM

Post #3 of 8 (372 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

Hi Stefan,

On Wed, Apr 11, 2012 at 14:29, Stefan Behnel <stefan_ml [at] behnel> wrote:
>> Moreover the performance hit is well below 2x, more like 20%.
>
> Hmm, those 20% refer to STM, right? Without hardware support? Then hardware
> support could be expected to drop that even further?

Yes, that's using STM on my regular laptop. How HTM would help
remains unclear at this point, because in this approach transactions
are typically rather large --- likely much larger than what the
first-generation HTM-capable processors will support next year. But
20% looks good anyway :-)

> Did you do any experiments with running parallel code so far, to see if
> that scales as expected?

Yes, it scales very nicely on small non-conflicting examples. I
believe that it scales just as nicely on large examples on CPython
too, based on the approach --- as long as we, as CPython developers,
make enough efforts to adapt a sufficiently large portion of the
CPython C code base (which would mean: most mutable built-in objects'
implementation).


A bientôt,

Armin.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Apr 11, 2012, 6:31 AM

Post #4 of 8 (370 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

Armin Rigo, 11.04.2012 14:51:
> On Wed, Apr 11, 2012 at 14:29, Stefan Behnel wrote:
>>> Moreover the performance hit is well below 2x, more like 20%.
>>
>> Hmm, those 20% refer to STM, right? Without hardware support? Then hardware
>> support could be expected to drop that even further?
>
> Yes, that's using STM on my regular laptop. How HTM would help
> remains unclear at this point, because in this approach transactions
> are typically rather large --- likely much larger than what the
> first-generation HTM-capable processors will support next year.

Ok. I guess once the code is there, the hardware will eventually catch up.

However, I'm not sure what you consider "large". A lot of manipulation
operations for the builtin types are not all that involved, at least in the
"normal" cases (read: fast paths) that involve no memory reallocation etc.,
and anything that can be called by and doesn't call into the interpreter
would be a complete and independent transaction all by itself, as the GIL
is allowed to be released between any two ticks.

Do you know if hybrid TM is possible at this level? I.e. short transactions
run in hardware, long ones in software? (Assuming we know what's "long" and
"short", I guess...)


> But 20% looks good anyway :-)

Oh, definitely.


>> Did you do any experiments with running parallel code so far, to see if
>> that scales as expected?
>
> Yes, it scales very nicely on small non-conflicting examples. I
> believe that it scales just as nicely on large examples on CPython
> too, based on the approach --- as long as we, as CPython developers,
> make enough efforts to adapt a sufficiently large portion of the
> CPython C code base (which would mean: most mutable built-in objects'
> implementation).

Right, that would involve some work. But the advantage, as I understand it,
is that this can be done incrementally. I.e. make it work, then make it
fast and make it scale.

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


stefan_ml at behnel

Apr 11, 2012, 6:57 AM

Post #5 of 8 (373 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

Stefan Behnel, 11.04.2012 15:31:
> Armin Rigo, 11.04.2012 14:51:
>> On Wed, Apr 11, 2012 at 14:29, Stefan Behnel wrote:
>>> Did you do any experiments with running parallel code so far, to see if
>>> that scales as expected?
>>
>> Yes, it scales very nicely on small non-conflicting examples. I
>> believe that it scales just as nicely on large examples on CPython
>> too, based on the approach --- as long as we, as CPython developers,
>> make enough efforts to adapt a sufficiently large portion of the
>> CPython C code base (which would mean: most mutable built-in objects'
>> implementation).
>
> Right, that would involve some work. But the advantage, as I understand it,
> is that this can be done incrementally.

Hmm, and according to the papers that are referenced on the PyPy proposal
page, at least some of this work has already been done, it seems.

http://pypy.org/tmdonate.html#why-hasn-t-the-idea-been-implemented-for-cpython-already

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


neologix at free

Apr 11, 2012, 7:27 AM

Post #6 of 8 (369 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

>> Yes, that's using STM on my regular laptop.  How HTM would help
>> remains unclear at this point, because in this approach transactions
>> are typically rather large --- likely much larger than what the
>> first-generation HTM-capable processors will support next year.
>
> Ok. I guess once the code is there, the hardware will eventually catch up.
>
> However, I'm not sure what you consider "large". A lot of manipulation
> operations for the builtin types are not all that involved, at least in the
> "normal" cases (read: fast paths) that involve no memory reallocation etc.,
> and anything that can be called by and doesn't call into the interpreter
> would be a complete and independent transaction all by itself, as the GIL
> is allowed to be released between any two ticks.

Large as in L2-cache large, and as in "you won't get a page fault or
an interrupt, you won't make any syscall, any I/O..." ;-)
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Apr 11, 2012, 7:33 AM

Post #7 of 8 (370 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

On Wed, 11 Apr 2012 15:31:09 +0200
Stefan Behnel <stefan_ml [at] behnel> wrote:
>
> Ok. I guess once the code is there, the hardware will eventually catch up.
>
> However, I'm not sure what you consider "large". A lot of manipulation
> operations for the builtin types are not all that involved, at least in the
> "normal" cases (read: fast paths) that involve no memory reallocation etc.,
> and anything that can be called by and doesn't call into the interpreter
> would be a complete and independent transaction all by itself, as the GIL
> is allowed to be released between any two ticks.

I think Armin's plan is not to work at the bytecode level, but make
transactions explicit (at least in framework code - e.g. Twisted or
Stackless -, perhaps not in user code). Perhaps he can elaborate on
that.

> Do you know if hybrid TM is possible at this level? I.e. short transactions
> run in hardware, long ones in software? (Assuming we know what's "long" and
> "short", I guess...)

There are other issues than the size of transactions. For example, one
issue is that not all operations may be allowed in a transaction:

“In addition, there are a number of instructions that may cause an
abort on specific implementations. These instructions include x87 and
MMX, mixed access to XMM and YMM registers, updates to non-status parts
of EFLAGs, updating segment, debug or control registers, ring
transitions, cache and TLB control instructions, any non-writeback
memory type accesses, processor state save, interrupts, I/O,
virtualization (VMX), trusted execution (SMX) and several miscellaneous
types.”

http://realworldtech.com/page.cfm?ArticleID=RWT021512050738

So, realistically, a (S)TM implementation in CPython (and probably also
in PyPy) would have to stand on its own merits, rather than betting on
pie-in-the-sky improvements of HTM implementations.

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


arigo at tunes

Apr 11, 2012, 8:08 AM

Post #8 of 8 (367 views)
Permalink
Re: Experimenting with STM on CPython [In reply to]

Hi Antoine, hi Stefan,

On Wed, Apr 11, 2012 at 16:33, Antoine Pitrou <solipsis [at] pitrou> wrote:
> I think Armin's plan is not to work at the bytecode level, but make
> transactions explicit (at least in framework code - e.g. Twisted or
> Stackless -, perhaps not in user code). Perhaps he can elaborate on
> that.

Yes, precisely. It should be explained in the proposal. The
references in "http://pypy.org/tmdonate.html#why-hasn-t-the-idea-been-implemented-for-cpython-already"
don't change CPython (or only minimally). They use Hardware TM, but
(the most important thing imho) they target bytecode-level
transactions --- i.e. the programmer is still stuck with the
"threading" module.

About using it explicitly in user code: I found out that there are use
cases to do so directly. If you have a CPU-intensive program that
does:

for x in some_random_order_iterator:
do_stuff_for(x)

Then if the things you do are "not too dependent on each other", you
can win by replacing it with:

for x in some_random_order_iterator:
transaction.add(do_stuff_for, x)
transaction.run()

and no other change. It has exactly the same semantics, and in this
case you don't really need a framework in which to hide the
transaction.add(). Compare it with the situation of spawning threads:
you need to carefully add locks *everywhere* or your program is buggy
--- both in today's CPython or in a GIL-less,
bytecode-level-transaction CPython.

By the way, that's why I said that transactions are arbitrarily long:
one transaction will be, in this case, everything that do_stuff_for(x)
does.


A bientôt,

Armin.
_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com

Python dev RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.