Mailing List Archive: Python: Dev

Reworking the GIL


solipsis at pitrou

Oct 25, 2009, 1:22 PM

Post #1 of 62
Reworking the GIL

Hello there,

The last couple of days I've been working on an experimental rewrite of
the GIL. Since the work has been turning out rather successful (or, at
least, not totally useless and crashing!) I thought I'd announce it
here.

First I want to stress this is not about removing the GIL. There still
is a Global Interpreter Lock which serializes access to most parts of
the interpreter. These protected parts haven't changed either, so Python
doesn't become really better at extracting computational parallelism out
of several cores.

Goals
-----

The new GIL (which is also the name of the sandbox area I've committed
it in, "newgil") addresses the following issues:

1) Switching by opcode counting. Counting opcodes is a very crude way of
estimating times, since the time spent executing a single opcode can
vary wildly. Literally, an opcode can be as short as a handful of
nanoseconds (think something like "... is not None") or as long as a
fraction of a second, or even longer (think calling a heavy C function
that doesn't release the GIL, such as re.search()). Therefore, releasing
the GIL every 100 opcodes, regardless of their length, is a very poor policy.

The new GIL does away with this by ditching _Py_Ticker entirely and
instead using a fixed interval (by default 5 milliseconds, but settable)
after which we ask the thread holding the GIL to release it and let
another thread be scheduled.
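This time-based interval is what later shipped in CPython 3.2 as the "switch interval", readable and settable through sys.getswitchinterval() and sys.setswitchinterval() (in seconds, 5 ms by default). A minimal sketch of inspecting it and temporarily lowering it; the 1 ms value is an arbitrary example, not a recommendation:

```python
import sys

# Interval (in seconds) after which a thread waiting for the GIL asks the
# holder to drop it.  The CPython default is 0.005, i.e. 5 ms.
default_interval = sys.getswitchinterval()
print(f"default switch interval: {default_interval * 1000:.1f} ms")

# Temporarily request more frequent switches (1 ms is an arbitrary example).
sys.setswitchinterval(0.001)
try:
    current = sys.getswitchinterval()
    print(f"current switch interval: {current * 1000:.1f} ms")
finally:
    # Restore the previous value so other code is not affected.
    sys.setswitchinterval(default_interval)
```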

2) GIL overhead and efficiency in contended situations. Apparently, some
OSes (OS X mainly) have problems with lock performance when the lock is
already taken: the system calls are heavy. This is the "Dave Beazley
effect", where he took a very trivial loop, made of very short opcodes
and therefore releasing the GIL very often (probably 100000 times a
second), and ran it in one or two threads on an OS with poor lock
performance (OS X). He saw a 50% increase in runtime when using two
threads rather than one, in what is admittedly a pathological case.

Even on better platforms such as Linux, eliminating the overhead of many
GIL acquires and releases (since the new GIL is released on a fixed time
basis rather than on an opcode counting basis) yields slightly better
performance (read: a smaller performance degradation :-)) when there are
several pure Python computation threads running.

3) Thread switching latency. The traditional scheme merely releases the
GIL for a couple of CPU cycles, and reacquires it immediately.
Unfortunately, this doesn't mean the OS will automatically switch to
another, GIL-awaiting thread. In many situations, the same thread will
continue running. This, combined with the opcode counting scheme, is why
some people have been complaining about latency problems when an I/O
thread competes with a computational thread (the I/O thread wouldn't be
scheduled right away when e.g. a packet arrives; or rather, it would be
scheduled by the OS, but unscheduled immediately when trying to acquire
the GIL, and it would be scheduled again only much later).

The new GIL improves on this by combining two mechanisms:
- forced thread switching, which means that when the switching interval
has elapsed (see 1) and the GIL is released, we will force
any of the threads waiting on the GIL to be scheduled instead of the
formerly GIL-holding thread. Which thread exactly is an OS decision,
however: the goal here is not to have our own scheduler (this could be
discussed but I wanted the design to remain simple :-) After all,
man-years of work have been invested in scheduling algorithms by kernel
programming teams).
- priority requests, which is an option for a thread requesting the GIL
to be scheduled as soon as possible, and forcibly (rather than any other
threads). This is meant to be used by GIL-releasing methods such as
read() on files and sockets. The scheme, again, is very simple: when a
priority request is made by a thread, the GIL is released as soon as
possible by the thread holding it (including in the eval loop), and then
the thread making the priority request is forcibly scheduled (by making
all other GIL-awaiting threads wait in the meantime).

Implementation
--------------

The new GIL is implemented using a couple of mutexes and condition
variables. A {mutex, condition} pair is used to protect the GIL itself,
which is a mere variable named `gil_locked` (there are a couple of other
variables for bookkeeping). Another {mutex, condition} pair is used for
forced thread switching (described above). Finally, a separate mutex is
used for priority requests (described above).
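To make the interplay of these pieces concrete, here is a toy Python model of the scheme (hypothetical names throughout; it is not the C code in ceval_gil.h). `locked`, `drop_request` and `switch_number` play the roles of `gil_locked` and the bookkeeping variables; a waiter that sees no switch during a whole interval sets `drop_request`, and a holder that drops on request waits until another thread has actually taken the GIL (forced switching). Priority requests are left out, and for brevity both condition variables share one mutex:

```python
import threading


class ToyGIL:
    """Toy model of a time-based GIL with forced switching (not CPython's code)."""

    def __init__(self, interval=0.005):
        self.interval = interval        # seconds a waiter waits before requesting a drop
        self._mutex = threading.Lock()
        self._gil_cond = threading.Condition(self._mutex)     # signalled on release
        self._switch_cond = threading.Condition(self._mutex)  # signalled on take
        self.locked = False             # the GIL itself (cf. gil_locked)
        self.drop_request = False       # polled by the holder, as the eval loop would
        self.switch_number = 0          # how many times the GIL has been taken
        self._last_holder = None

    def take(self, me):
        with self._mutex:
            while self.locked:
                saved = self.switch_number
                timed_out = not self._gil_cond.wait(self.interval)
                if timed_out and self.locked and self.switch_number == saved:
                    # Nobody switched during a whole interval: ask the holder to drop.
                    self.drop_request = True
            self.locked = True
            self._last_holder = me
            self.switch_number += 1
            self.drop_request = False
            # Forced switching: wake the previous holder if it is waiting in drop().
            self._switch_cond.notify_all()

    def drop(self, me):
        with self._mutex:
            self.locked = False
            self._gil_cond.notify()
            if self.drop_request:
                # Forced switching: don't return (and possibly re-take the GIL)
                # until another thread has actually taken it.
                while self._last_holder == me:
                    self._switch_cond.wait()


def run_toy_thread(gil, me, steps, counts):
    """Hypothetical 'eval loop': do some work, poll drop_request between steps."""
    gil.take(me)
    for _ in range(steps):
        counts[me] += 1
        sum(range(200))                 # a little pure-Python work per "opcode"
        if gil.drop_request:            # cheap flag check, no ticker to decrement
            gil.drop(me)
            gil.take(me)
    gil.drop(me)


if __name__ == "__main__":
    gil = ToyGIL(interval=0.001)
    counts = {"A": 0, "B": 0}
    threads = [threading.Thread(target=run_toy_thread, args=(gil, name, 2000, counts))
               for name in counts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(counts, "GIL taken", gil.switch_number, "times")
```

Note that a thread only gives up the GIL when a waiter has asked for it, which is also why the mechanism costs nothing when a single thread is running.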

The code is in the sandbox:
http://svn.python.org/view/sandbox/trunk/newgil/

The file of interest is Python/ceval_gil.h. Changes in other files are
very minimal, except for priority requests which have been added at
strategic places (some methods of I/O modules). Also, the code remains
rather short, while of course being less trivial than the old one.

NB: this is a branch of py3k. There should be no real difficulty
porting it back to trunk, provided someone wants to do the job.

Platforms
---------

I've implemented the new GIL for POSIX and Windows (tested under Linux
and Windows XP (running in a VM)). Judging by what I can read in the
online MSDN docs, the Windows support should include everything from
Windows 2000, and probably recent versions of Windows CE.

Other platforms aren't implemented, because I don't have access to the
necessary hardware. Besides, I must admit I'm not very motivated in
working on niche/obsolete systems. I've e-mailed Andrew MacIntyre in
private to ask him if he'd like to do the OS/2 support.

Supporting a new platform is not very difficult: it's a matter of
writing the 50-or-so lines of necessary platform-specific macros at the
beginning of Python/ceval_gil.h.

The reason I couldn't use the existing thread support
(Python/thread_*.h) is that these abstractions are too poor. Mainly,
they don't provide:
- events, conditions or an equivalent thereof
- the ability to acquire a resource with a timeout
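For comparison, these are the kinds of primitives Python's own threading module exposes at the Python level. A small sketch of a lock acquired with a timeout, a condition wait that times out, and an event set from another thread; the timeouts are arbitrary:

```python
import threading
import time

lock = threading.Lock()
lock.acquire()                          # pretend someone else holds the resource

# 1) Acquire a resource with a timeout: give up instead of blocking forever.
start = time.monotonic()
got_lock = lock.acquire(timeout=0.05)
waited = time.monotonic() - start
print("lock acquired:", got_lock, f"after {waited * 1000:.0f} ms")

# 2) A condition variable with a timed wait: wait() returns False on timeout.
cond = threading.Condition()
with cond:
    notified = cond.wait(timeout=0.05)
print("condition notified:", notified)

# 3) An event set by another thread after 10 ms.
event = threading.Event()
threading.Timer(0.01, event.set).start()
print("event set:", event.wait(timeout=1.0))

lock.release()
```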

Measurements
------------

Before starting this work, I wrote ccbench (*), a little benchmark
script ("ccbench" being a shorthand for "concurrency benchmark") which
measures two things:
- computation throughput with one or several concurrent threads
- latency to external events (I use a UDP socket) when there is zero,
one, or several background computation threads running

(*) http://svn.python.org/view/sandbox/trunk/ccbench/
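A rough, in-process sketch of such a latency probe, assuming a design where each UDP datagram carries its send time and a receiver thread records how late it sees it while background threads spin in pure Python. The function names and parameters are hypothetical, and ccbench's actual harness may work differently:

```python
import socket
import statistics
import threading
import time


def busy_loop(stop):
    # Background CPU task: pure Python, so it competes for the GIL.
    while not stop.is_set():
        sum(i * i for i in range(1000))


def measure_latency(n_background=0, n_probes=20, gap=0.01):
    """Return (mean, std dev) in ms of the delay before a thread sees a datagram."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))         # any free loopback port
    receiver.settimeout(5.0)                # don't hang forever if a datagram is lost
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    latencies = []

    def receive():
        for _ in range(n_probes):
            try:
                data, _addr = receiver.recvfrom(64)
            except socket.timeout:
                break
            latencies.append(time.perf_counter() - float(data.decode()))

    stop = threading.Event()
    workers = [threading.Thread(target=busy_loop, args=(stop,))
               for _ in range(n_background)]
    rx = threading.Thread(target=receive)
    for t in workers + [rx]:
        t.start()
    for _ in range(n_probes):
        # Each datagram carries its own send time.
        sender.sendto(repr(time.perf_counter()).encode(), receiver.getsockname())
        time.sleep(gap)
    rx.join()
    stop.set()
    for t in workers:
        t.join()
    receiver.close()
    sender.close()
    return statistics.mean(latencies) * 1000, statistics.pstdev(latencies) * 1000


if __name__ == "__main__":
    for n in range(3):
        mean_ms, std_ms = measure_latency(n_background=n)
        print(f"CPU threads={n}: {mean_ms:.0f} ms. (std dev: {std_ms:.0f} ms.)")
```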

The benchmark involves several computation workloads with different GIL
characteristics. By default there are 3 of them:
A- one pure Python workload (computation of a number of digits of pi):
that is, something which spends its time in the eval loop
B- one mostly C workload where the C implementation doesn't release the
GIL (regular expression matching)
C- one mostly C workload where the implementation does release the GIL
(bz2 compression)
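A minimal sketch of a throughput measurement over three workloads of those kinds. The pi algorithm (Machin's formula), the regular expression, the sample text and the 0.2 s duration are assumptions chosen for illustration, not ccbench's own code:

```python
import bz2
import re
import threading
import time


def pi_digits(n):
    """First n+1 digits of pi via Machin's formula in integer arithmetic."""
    scale = 10 ** (n + 10)                  # 10 guard digits

    def arctan_inv(x):
        # arctan(1/x) * scale = sum of (-1)^k * scale / ((2k+1) * x^(2k+1))
        term = scale // x
        total = term
        k, sign = 1, -1
        while term:
            term //= x * x
            total += sign * (term // (2 * k + 1))
            k, sign = k + 1, -sign
        return total

    pi_scaled = 16 * arctan_inv(5) - 4 * arctan_inv(239)
    return str(pi_scaled // 10 ** 10)


TEXT = "the quick brown fox keeps singing while jumping over the lazy dog " * 2000
DATA = TEXT.encode()

WORKLOADS = {
    "Pi calculation (Python)": lambda: pi_digits(300),            # eval loop
    "regular expression (C)": lambda: re.findall(r"\b\w+ing\b", TEXT),  # keeps the GIL
    "bz2 compression (C)": lambda: bz2.compress(DATA),            # releases the GIL
}


def throughput(workload, n_threads, duration=0.2):
    """Total iterations per second achieved by n_threads threads running workload."""
    counts = [0] * n_threads
    deadline = time.monotonic() + duration

    def run(i):
        while time.monotonic() < deadline:
            workload()
            counts[i] += 1

    threads = [threading.Thread(target=run, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts) / duration


if __name__ == "__main__":
    for name, workload in WORKLOADS.items():
        print(name)
        base = throughput(workload, 1)
        print(f"threads=1: {base:.0f} iterations/s.")
        for n in (2, 4):
            rate = throughput(workload, n)
            print(f"threads={n}: {rate:.0f} ({100 * rate / base:.0f} %)")
```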

In the ccbench directory you will find benchmark results, under Linux,
for two different systems I have here. The new GIL shows roughly similar
but slightly better throughput results than the old one. And it is much
better in the latency tests, especially in workload B (going down from
almost a second of average latency with the old GIL, to a couple of
milliseconds with the new GIL). This is the combined result of using a
time-based scheme (rather than opcode-based) and of forced thread
switching (rather than relying on the OS to actually switch threads when
we speculatively release the GIL).

As a side note, single-threaded performance is not degraded at all. It
is, in theory, actually a bit better, because the check that replaces the
old ticker in the eval loop is simpler; however, this goes mostly
unnoticed.


Now what remains to be done?

Having other people test it would be fine. Even better if you have an
actual multi-threaded py3k application. But ccbench results for other
OSes would be nice too :-)
(I get good results under the Windows XP VM but I feel that a VM is not
an ideal setup for a concurrency benchmark)

Of course, studying and reviewing the code is welcome. As for
integrating it into the mainline py3k branch, I guess we have to answer
these questions:
- is the approach interesting? (we could decide that it's just not worth
it, and that a good GIL can only be a dead (removed) GIL)
- is the patch good, mature and debugged enough?
- how do we deal with the unsupported platforms (POSIX and Windows
support should cover most bases, but the fate of OS/2 support depends on
Andrew)?

Regards

Antoine.


_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


brett at python

Oct 25, 2009, 2:57 PM

Post #2 of 62
Re: Reworking the GIL

>
> [.SNIP - a lot of detail on what sounds like a good design]
>
> Now what remains to be done?
>
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)
> (I get good results under the Windows XP VM but I feel that a VM is not
> an ideal setup for a concurrency benchmark)
>
> Of course, studying and reviewing the code is welcome. As for
> integrating it into the mainline py3k branch, I guess we have to answer
> these questions:
> - is the approach interesting? (we could decide that it's just not worth
> it, and that a good GIL can only be a dead (removed) GIL)
>

I think it's worth it. Removal of the GIL is a totally open-ended problem
with no solution in sight. This, on the other hand, is a performance benefit
now. I say move forward with this. If it happens to be short-lived because
someone actually figures out how to remove the GIL then great, but is that
really going to happen between now and Python 3.2? I doubt it.


> - is the patch good, mature and debugged enough?
> - how do we deal with the unsupported platforms (POSIX and Windows
> support should cover most bases, but the fate of OS/2 support depends on
> Andrew)?
>
>
It's up to Andrew to get the support in. While I have faith he will, this is
why we have been scaling back the support for alternative OSs for a while
and will continue to do so. I suspect the day Andrew stops keeping up will
be the day we push to have OS/2 be externally maintained.

-Brett


tjreedy at udel

Oct 25, 2009, 10:07 PM

Post #3 of 62
Re: Reworking the GIL

Antoine Pitrou wrote:
> Hello there,
>
> The last couple of days I've been working on an experimental rewrite of
> the GIL. Since the work has been turning out rather successful (or, at
> least, not totally useless and crashing!) I thought I'd announce it
> here.

I am curious as to whether the entire mechanism is or can be turned off
when not needed -- when there are no threads (other than the main,
starting thread)?

tjr



solipsis at pitrou

Oct 26, 2009, 3:19 AM

Post #4 of 62
Re: Reworking the GIL

Terry Reedy <tjreedy <at> udel.edu> writes:
>
> I am curious as to whether the entire mechanism is or can be turned off
> when not needed -- when there are no threads (other than the main,
> starting thread)?

It is an implicit feature: when no thread is waiting on the GIL, the GIL-holding
thread isn't notified and doesn't try to release it at all (in the eval loop,
that is; GIL-releasing C extensions still release it).

Note that "no thread is waiting on the GIL" can mean one of two things:
- either there is only one Python thread
- or the other Python threads are doing things with the GIL released (zlib/bz2
compression, waiting on I/O, sleep()ing, etc.)

So, yes, it automatically "turns itself off".

Regards

Antoine.




andymac at bullseye

Oct 26, 2009, 5:37 AM

Post #5 of 62
Re: Reworking the GIL

Brett Cannon wrote:
> It's up to Andrew to get the support in. While I have faith he will,
> this is why we have been scaling back the support for alternative OSs
> for a while and will continue to do so. I suspect the day Andrew stops
> keeping up will be the day we push to have OS/2 be externally maintained.

Notwithstanding my desire to keep OS/2 supported in the Python tree,
keeping up has been more difficult of late:
- OS/2 is unquestionably a "legacy" environment, with system APIs
different in flavour and semantics from the current mainstream (though
surprisingly capable in many ways despite its age).
- The EMX runtime my OS/2 port currently relies on to abstract the
system API to a Posix-ish API is itself a legacy package, essentially
unmaintained for some years :-( This has been a source of increasing
pain as Python has moved with the mainstream... with regard to Unicode
support and threads in conjunction with multi-processing, in particular.

Real Life hasn't been favourably disposed either...

I have refrained from applying the extensive patches required to make
the port feature complete for 2.6 and later while I investigate an
alternate Posix emulating runtime (derived from FreeBSD's C library,
and which is used by Mozilla on OS/2), which would allow me to dispense
with most of these patches. But it has an issue or two of its own...

The cost in effort has been compounded by effectively having to try and
maintain two ports - 2.x and 3.x. And the 3.x port has suffered more
as its demands are higher.

So while I asked to keep the OS/2 thread support alive, if a decision
were to be taken to remove OS/2 support from the Python 3.x sources I
could live with that. A completed migration to Mercurial might well
make future port maintenance easier for me.

Regards,
Andrew.

--
-------------------------------------------------------------------------
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: andymac [at] bullseye (pref) | Snail: PO Box 370
andymac [at] pcug (alt) | Belconnen ACT 2616
Web: http://www.andymac.org/ | Australia


sturla at molden

Oct 26, 2009, 6:43 AM

Post #6 of 62
Re: Reworking the GIL

Antoine Pitrou wrote:
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads).
So Python threads become preemptive rather than cooperative? That would
be great. :-)

time.sleep should generate a priority request to re-acquire the GIL; and
so should all other blocking standard library functions with a time-out.


S.M.



kristjan at ccpgames

Oct 26, 2009, 7:09 AM

Post #7 of 62
Re: Reworking the GIL

> -----Original Message-----
> From: python-dev-bounces+kristjan=ccpgames.com [at] python
> [mailto:python-dev-bounces+kristjan=ccpgames.com [at] python] On Behalf
> Of Sturla Molden
> time.sleep should generate a priority request to re-acquire the GIL;
> and
> so should all other blocking standard library functions with a time-
> out.

I don't agree. You have to be very careful with priority. time.sleep() does not promise to wake up in any timely manner, and neither do the timeout functions. Rather, the timeout is a way to prevent infinite wait.

In my experience (from Stackless Python) using priority wakeup for IO can result in very erratic scheduling when there is much IO going on, every IO trumping another. You should stick to round robin except for very special and carefully analysed cases.
K


ssteinerx at gmail

Oct 26, 2009, 7:25 AM

Post #8 of 62
Re: Reworking the GIL

On Oct 26, 2009, at 10:09 AM, Kristján Valur Jónsson wrote:

>
>
>> -----Original Message-----
>> From: python-dev-bounces+kristjan=ccpgames.com [at] python
>> [mailto:python-dev-bounces+kristjan=ccpgames.com [at] python] On
>> Behalf
>> Of Sturla Molden
>> time.sleep should generate a priority request to re-acquire the GIL;
>> and
>> so should all other blocking standard library functions with a time-
>> out.
>
> I don't agree. You have to be very careful with priority.
> time.sleep() does not promise to wake up in any timely manner, and
> neither do the timeout functions. Rather, the timeout is a way to
> prevent infinite wait.
>
> In my experience (from Stackless Python) using priority wakeup for
> IO can result in very erratic scheduling when there is much IO going
> on, every IO trumping another. You should stick to round robin
> except for very special and carefully analysed cases.

All the IO tasks can also go in their own round robin so that CPU time
is correctly shared among all waiting IO tasks.

IOW, to make sure that all IO tasks get a fair share *in relation to
all other IO tasks*.

Tasks can be put into the IO round robin when they "pull the IO alarm"
so to speak, so there's no need to decide before-hand which task goes
in which round robin pool.
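One possible reading of that proposal, as a toy single-threaded simulation (hypothetical class and task names; strict alternation between the two pools when both are non-empty is an assumption):

```python
from collections import deque


class TwoPoolScheduler:
    """Toy round-robin scheduler with a separate pool for I/O tasks."""

    def __init__(self, tasks):
        self.io_pool = deque()
        self.cpu_pool = deque(tasks)
        self._io_turn = True

    def pull_io_alarm(self, task):
        # A task joins the I/O pool when it signals I/O activity, so nothing
        # has to be decided beforehand.
        if task in self.cpu_pool:
            self.cpu_pool.remove(task)
            self.io_pool.append(task)

    def next_task(self):
        # Alternate between the pools so neither can starve the other;
        # within a pool, tasks take turns (round robin).
        first, second = ((self.io_pool, self.cpu_pool) if self._io_turn
                         else (self.cpu_pool, self.io_pool))
        self._io_turn = not self._io_turn
        for pool in (first, second):
            if pool:
                task = pool.popleft()
                pool.append(task)
                return task
        return None


sched = TwoPoolScheduler(["cpu1", "cpu2", "net1", "net2"])
sched.pull_io_alarm("net1")
sched.pull_io_alarm("net2")
order = [sched.next_task() for _ in range(8)]
print(order)
# ['net1', 'cpu1', 'net2', 'cpu2', 'net1', 'cpu1', 'net2', 'cpu2']
```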

I'm not familiar with this particular code in Python, but I've used
this in other systems for years to make sure that IO tasks don't
starve the rest of the system and that the most "insistent" IO task
doesn't starve all the others.

S



sturla at molden

Oct 26, 2009, 8:46 AM

Post #9 of 62
Re: Reworking the GIL

Antoine Pitrou wrote:
> - priority requests, which is an option for a thread requesting the GIL
> to be scheduled as soon as possible, and forcibly (rather than any other
> threads).
Should a priority request for the GIL take a priority number?

- If two threads make priority requests for the GIL, the one with the
higher priority should get the GIL first.

- If a thread with a low priority makes a priority request for the GIL,
it should not be allowed to "preempt" (take the GIL away from) a
higher-priority thread, in which case the priority request would be
ignored.
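A sketch of how numbered requests could be ordered, using a heap (a hypothetical model of the proposal; nothing like it was part of the newgil branch). Equal priorities are served first come, first served, and a request only asks the holder to drop the GIL if its priority is strictly higher, which is an assumption about how "preempt" would be decided:

```python
import heapq
import itertools


class PriorityGILQueue:
    """Hypothetical model of numbered GIL priority requests."""

    def __init__(self):
        self._heap = []                      # entries: (-priority, arrival, thread name)
        self._arrival = itertools.count()    # tie-breaker: first come, first served
        self.holder = None
        self.holder_priority = None

    def request(self, name, priority):
        """Queue a request; return True if it should make the holder drop the GIL."""
        heapq.heappush(self._heap, (-priority, next(self._arrival), name))
        # A lower- (or equal-) priority request never preempts the holder.
        return self.holder is not None and priority > self.holder_priority

    def release(self):
        """The holder drops the GIL; the best waiting request gets it."""
        if not self._heap:
            self.holder = self.holder_priority = None
            return None
        neg_priority, _, name = heapq.heappop(self._heap)
        self.holder, self.holder_priority = name, -neg_priority
        return name


q = PriorityGILQueue()
q.request("main", 1)
q.release()                              # "main" now holds the GIL
print(q.request("logger", 0))            # False: lower priority, no preemption
print(q.request("net", 5))               # True: asks "main" to drop the GIL
print(q.request("disk", 5))              # True
print([q.release() for _ in range(3)])   # ['net', 'disk', 'logger']
```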

Related issue: Should Python threads have priorities? They are after all
real OS threads.

S.M.



solipsis at pitrou

Oct 26, 2009, 8:58 AM

Post #10 of 62
Re: Reworking the GIL

Sturla Molden <sturla <at> molden.no> writes:
>
> Antoine Pitrou wrote:
> > - priority requests, which is an option for a thread requesting the GIL
> > to be scheduled as soon as possible, and forcibly (rather than any other
> > threads).
> Should a priority request for the GIL take a priority number?

Er, I prefer to keep things simple. If you have lots of I/O you should probably
use an event loop rather than separate threads.
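As an illustration of that advice, a minimal single-threaded event loop built on the standard selectors module (Python 3.4+), serving several connections without a thread per connection. The function name and the upper-casing "protocol" are hypothetical, and socket pairs stand in for real network clients:

```python
import selectors
import socket


def echo_uppercase(n_clients=3):
    """Serve several connections from one thread using a selector."""
    sel = selectors.DefaultSelector()
    pairs = [socket.socketpair() for _ in range(n_clients)]
    for server_side, _ in pairs:
        server_side.setblocking(False)
        sel.register(server_side, selectors.EVENT_READ)

    # Each "client" sends a request before the loop runs.
    for i, (_, client_side) in enumerate(pairs):
        client_side.sendall(f"request {i}".encode())

    handled = 0
    while handled < n_clients:
        # Block until at least one connection is readable.
        for key, _events in sel.select(timeout=1.0):
            sock = key.fileobj
            data = sock.recv(1024)
            sock.sendall(data.upper())      # reply; no thread per connection needed
            sel.unregister(sock)
            handled += 1

    replies = [client_side.recv(1024).decode() for _, client_side in pairs]
    for a, b in pairs:
        a.close()
        b.close()
    sel.close()
    return replies


print(echo_uppercase(3))   # ['REQUEST 0', 'REQUEST 1', 'REQUEST 2']
```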

> Related issue: Should Python threads have priorities? They are after all
> real OS threads.

Well, precisely: they are OS threads, and the OS already assigns them (static or
dynamic) priorities. No need to replicate this.

(to answer another notion expressed in another message, there's no "round-robin"
scheduling either)

Regards

Antoine.




daniel at stutzbachenterprises

Oct 26, 2009, 9:18 AM

Post #11 of 62
Re: Reworking the GIL

On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:

> Er, I prefer to keep things simple. If you have lots of I/O you should
> probably use an event loop rather than separate threads.
>

On Windows, using a single-threaded event loop is sometimes
impossible. WaitForMultipleObjects(), which is the Windows equivalent to
select() or poll(), can handle a maximum of only 64 objects.

Do we really need priority requests at all? They seem counter to your
desire for simplicity and allowing the operating system's scheduler to do
its work.

That said, if a thread's time budget is merely paused during I/O rather than
reset, then a thread making frequent (but short) I/O requests cannot starve
the system.
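A toy arithmetic model of that argument. The 5 ms budget, the 2 ms bursts between I/O calls and the 1 s cap are arbitrary assumptions; the function reports how much CPU time the I/O-heavy thread gets before its budget forces it to yield:

```python
def cpu_before_forced_yield(bursts_ms, budget_ms=5.0, reset_on_io=False, limit_ms=1000.0):
    """CPU time (ms) a thread gets before its time budget forces it to yield.

    The thread runs the bursts in bursts_ms cyclically, with a short I/O call
    after each burst.  With reset_on_io=True the I/O call refills the budget;
    otherwise the budget is merely paused during the I/O call.
    """
    remaining = budget_ms
    used = 0.0
    i = 0
    while used < limit_ms:
        burst = bursts_ms[i % len(bursts_ms)]
        if burst > remaining:
            return used + remaining       # budget runs out in the middle of this burst
        used += burst
        remaining -= burst
        if reset_on_io:
            remaining = budget_ms         # the I/O call hands out a fresh budget
        i += 1
    return used                           # never forced to yield within limit_ms


print(cpu_before_forced_yield([2.0]))                    # 5.0
print(cpu_before_forced_yield([2.0], reset_on_io=True))  # 1000.0
```

With a paused budget the thread must yield after 5 ms of CPU time; with a budget refilled on every I/O call it is never forced to yield within the 1 s window.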

--
Daniel Stutzbach, Ph.D.
President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>


solipsis at pitrou

Oct 26, 2009, 11:10 AM

Post #12 of 62
Re: Reworking the GIL

Daniel Stutzbach <daniel <at> stutzbachenterprises.com> writes:
>
> Do we really need priority requests at all? They seem counter to your
> desire for simplicity and allowing the operating system's scheduler to do
> its work.

No, they can be disabled (removed) if we prefer. With priority requests
disabled, the latency results become less excellent but are still quite good.

Running ccbench on a dual core machine gives the following latency results,
first with and then without priority requests.

--- Latency --- (with prio requests)

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 2 ms.)
CPU threads=3: 0 ms. (std dev: 2 ms.)
CPU threads=4: 0 ms. (std dev: 2 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 3 ms. (std dev: 2 ms.)
CPU threads=2: 3 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 2 ms.)
CPU threads=4: 4 ms. (std dev: 3 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 2 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 2 ms.)
CPU threads=4: 0 ms. (std dev: 1 ms.)

--- Latency --- (without prio requests)

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 2 ms.)
CPU threads=1: 5 ms. (std dev: 0 ms.)
CPU threads=2: 3 ms. (std dev: 3 ms.)
CPU threads=3: 9 ms. (std dev: 7 ms.)
CPU threads=4: 22 ms. (std dev: 23 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 8 ms. (std dev: 2 ms.)
CPU threads=2: 5 ms. (std dev: 4 ms.)
CPU threads=3: 21 ms. (std dev: 32 ms.)
CPU threads=4: 19 ms. (std dev: 26 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 1 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 0 ms. (std dev: 0 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)





collinw at gmail

Oct 26, 2009, 1:01 PM

Post #13 of 62
Re: Reworking the GIL

On Sun, Oct 25, 2009 at 1:22 PM, Antoine Pitrou <solipsis [at] pitrou> wrote:
> Having other people test it would be fine. Even better if you have an
> actual multi-threaded py3k application. But ccbench results for other
> OSes would be nice too :-)

My results for a 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):

Control (py3k @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 633 iterations/s.
threads=2: 468 ( 74 %)
threads=3: 443 ( 70 %)
threads=4: 442 ( 69 %)

regular expression (C)

threads=1: 281 iterations/s.
threads=2: 282 ( 100 %)
threads=3: 282 ( 100 %)
threads=4: 282 ( 100 %)

bz2 compression (C)

threads=1: 379 iterations/s.
threads=2: 735 ( 193 %)
threads=3: 733 ( 193 %)
threads=4: 724 ( 190 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 1 ms.)
CPU threads=2: 1 ms. (std dev: 2 ms.)
CPU threads=3: 3 ms. (std dev: 6 ms.)
CPU threads=4: 2 ms. (std dev: 3 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 975 ms. (std dev: 577 ms.)
CPU threads=2: 1035 ms. (std dev: 571 ms.)
CPU threads=3: 1098 ms. (std dev: 556 ms.)
CPU threads=4: 1195 ms. (std dev: 557 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 2 ms.)
CPU threads=2: 4 ms. (std dev: 5 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 4 ms.)



Experiment (newgil branch @ r75723)

--- Throughput ---

Pi calculation (Python)

threads=1: 651 iterations/s.
threads=2: 643 ( 98 %)
threads=3: 637 ( 97 %)
threads=4: 625 ( 95 %)

regular expression (C)

threads=1: 298 iterations/s.
threads=2: 296 ( 99 %)
threads=3: 288 ( 96 %)
threads=4: 287 ( 96 %)

bz2 compression (C)

threads=1: 378 iterations/s.
threads=2: 720 ( 190 %)
threads=3: 724 ( 191 %)
threads=4: 718 ( 189 %)

--- Latency ---

Background CPU task: Pi calculation (Python)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 1 ms.)
CPU threads=2: 0 ms. (std dev: 1 ms.)
CPU threads=3: 0 ms. (std dev: 0 ms.)
CPU threads=4: 1 ms. (std dev: 5 ms.)

Background CPU task: regular expression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 1 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 1 ms.)
CPU threads=3: 2 ms. (std dev: 2 ms.)
CPU threads=4: 2 ms. (std dev: 1 ms.)

Background CPU task: bz2 compression (C)

CPU threads=0: 0 ms. (std dev: 0 ms.)
CPU threads=1: 0 ms. (std dev: 0 ms.)
CPU threads=2: 2 ms. (std dev: 3 ms.)
CPU threads=3: 0 ms. (std dev: 1 ms.)
CPU threads=4: 0 ms. (std dev: 0 ms.)


I also ran this through Unladen Swallow's threading microbenchmark,
which is a straight copy of what David Beazley was experimenting with
(simply iterating over 1000000 ints in pure Python) [1].
"iterative_count" is doing the loops one after the other,
"threaded_count" is doing the loops in parallel using threads.
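A simplified sketch of those two measurements (not the bm_threading.py code itself; the function names only mirror the benchmark names):

```python
import threading
import time


def count(n):
    # The trivial pure-Python loop: many very short opcodes.
    while n > 0:
        n -= 1


def iterative_count(n, n_threads):
    # Run the loop n_threads times, one after the other, in the current thread.
    start = time.perf_counter()
    for _ in range(n_threads):
        count(n)
    return time.perf_counter() - start


def threaded_count(n, n_threads):
    # Run the same loops concurrently, one per thread.
    threads = [threading.Thread(target=count, args=(n,)) for _ in range(n_threads)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    for n_threads in (2, 4, 8):
        it = iterative_count(1_000_000, n_threads)
        th = threaded_count(1_000_000, n_threads)
        print(f"{n_threads} threads: iterative {it:.3f}s, threaded {th:.3f}s")
```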

The results below are benchmarking py3k as the control, newgil as the
experiment. When it says "x% faster", that is a measure of newgil's
performance over py3k's.

With two threads:

iterative_count:
Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this configuration multiple times and gotten the same slowdown.
Avg: 0.338473 -> 0.418559: 19.13% slower
Significant (t=-38.434785, a=0.95)

threaded_count:
Min: 0.529859 -> 0.397134: 33.42% faster
Avg: 0.581786 -> 0.429933: 35.32% faster
Significant (t=70.100445, a=0.95)


With four threads:

iterative_count:
Min: 0.766617 -> 0.734354: 4.39% faster
Avg: 0.771954 -> 0.751374: 2.74% faster
Significant (t=22.164103, a=0.95)
Stddev: 0.00262 -> 0.00891: 70.53% larger

threaded_count:
Min: 1.175750 -> 0.829181: 41.80% faster
Avg: 1.224157 -> 0.867506: 41.11% faster
Significant (t=161.715477, a=0.95)
Stddev: 0.01900 -> 0.01120: 69.65% smaller


With eight threads:

iterative_count:
Min: 1.527794 -> 1.447421: 5.55% faster
Avg: 1.536911 -> 1.479940: 3.85% faster
Significant (t=35.559595, a=0.95)
Stddev: 0.00394 -> 0.01553: 74.61% larger

threaded_count:
Min: 2.424553 -> 1.677180: 44.56% faster
Avg: 2.484922 -> 1.723093: 44.21% faster
Significant (t=184.766131, a=0.95)
Stddev: 0.02874 -> 0.02956: 2.78% larger


I'd be interested in multithreaded benchmarks with less homogeneous workloads.

Collin Winter

[1] - http://code.google.com/p/unladen-swallow/source/browse/tests/performance/bm_threading.py


solipsis at pitrou

Oct 26, 2009, 2:43 PM

Post #14 of 62
Re: Reworking the GIL

Collin Winter <collinw <at> gmail.com> writes:
>
> My results for a 2.4 GHz Intel Core 2 Duo MacBook Pro (OS X 10.5.8):

Thanks!


[the Dave Beazley benchmark]
> The results below are benchmarking py3k as the control, newgil as the
> experiment. When it says "x% faster", that is a measure of newgil's
> performance over py3k's.
>
> With two threads:
>
> iterative_count:
> Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this configuration multiple times and gotten the same slowdown.
> Avg: 0.338473 -> 0.418559: 19.13% slower

Those numbers are not very in line with the other "iterative_count" results.
Since iterative_count just runs the loop N times in a row, results should be
proportional to the number N ("number of threads").
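A quick way to check that proportionality with the "Min" figures quoted above is to divide each time by N; the time per loop should then be roughly constant:

```python
# "Min" timings (seconds) for iterative_count, from the results quoted above.
control = {2: 0.336573, 4: 0.766617, 8: 1.527794}   # py3k
newgil = {2: 0.387782, 4: 0.734354, 8: 1.447421}    # newgil branch

# If the run time is proportional to N, time / N should be roughly constant.
for label, timings in (("py3k", control), ("newgil", newgil)):
    per_loop = {n: round(t / n, 3) for n, t in timings.items()}
    print(label, per_loop)
# py3k {2: 0.168, 4: 0.192, 8: 0.191}
# newgil {2: 0.194, 4: 0.184, 8: 0.181}
```

By this measure, the two-thread control run works out to about 0.168 s per loop, while every other run falls between roughly 0.18 and 0.19 s.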

Besides, there's no reason for single-threaded performance to be degraded since
the fast path of the eval loop actually got a bit streamlined (there is no
volatile ticker to decrement).

> I'd be interested in multithreaded benchmarks with less homogeneous workloads.

So would I.

Regards

Antoine.




collinw at gmail

Oct 26, 2009, 2:50 PM

Post #15 of 62
Re: Reworking the GIL

On Mon, Oct 26, 2009 at 2:43 PM, Antoine Pitrou <solipsis [at] pitrou> wrote:
> Collin Winter <collinw <at> gmail.com> writes:
> [the Dave Beazley benchmark]
>> The results below are benchmarking py3k as the control, newgil as the
>> experiment. When it says "x% faster", that is a measure of newgil's
>> performance over py3k's.
>>
>> With two threads:
>>
>> iterative_count:
>> Min: 0.336573 -> 0.387782: 13.21% slower  # I've run this configuration multiple times and gotten the same slowdown.
>> Avg: 0.338473 -> 0.418559: 19.13% slower
>
> Those numbers are not very in line with the other "iterative_count" results.
> Since iterative_count just runs the loop N times in a row, results should be
> proportional to the number N ("number of threads").
>
> Besides, there's no reason for single-threaded performance to be degraded since
> the fast path of the eval loop actually got a bit streamlined (there is no
> volatile ticker to decrement).

I agree those numbers are out of line with the others and make no
sense. I've run it with two threads several times and the results are
consistent on this machine. I'm digging into it a bit more.

Collin


exarkun at twistedmatrix

Oct 26, 2009, 3:45 PM

Post #16 of 62
Re: Reworking the GIL [In reply to]

On 04:18 pm, daniel [at] stutzbachenterprises wrote:
>On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:
>>Er, I prefer to keep things simple. If you have lots of I/O you should
>>probably use an event loop rather than separate threads.
>
>On Windows, using a single-threaded event loop is sometimes
>impossible. WaitForMultipleObjects(), which is the Windows equivalent
>to select() or poll(), can handle a maximum of only 64 objects.

This is only partially accurate. For one thing, WaitForMultipleObjects
calls are nestable. For another thing, Windows also has I/O completion
ports which are not limited to 64 event sources. The situation is
actually better than on a lot of POSIXes.

>Do we really need priority requests at all? They seem counter to your
>desire for simplicity and allowing the operating system's scheduler to
>do its work.

Despite what I said above, however, I would also take a default position
against adding any kind of more advanced scheduling system here. It
would, perhaps, make sense to expose the APIs for controlling the
platform scheduler, though.
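For context, the settable switching interval from the original newgil proposal (5 ms by default) is what later shipped in Python 3.2 as sys.setswitchinterval() and sys.getswitchinterval(). It only controls how often a running thread is asked to drop the GIL; it is not an interface to the platform scheduler mentioned above. A minimal sketch, where run_with_switch_interval is a hypothetical helper:

```python
import sys


def run_with_switch_interval(interval, func, *args):
    """Run func(*args) with a temporary GIL switch interval, in seconds."""
    previous = sys.getswitchinterval()
    sys.setswitchinterval(interval)
    try:
        return func(*args)
    finally:
        # The setting is process-wide, so always put the old value back.
        sys.setswitchinterval(previous)


if __name__ == "__main__":
    print("current interval:", sys.getswitchinterval())  # CPython's default is 0.005 s
    seen = run_with_switch_interval(0.02, sys.getswitchinterval)
    print("interval while func ran:", seen)
    print("restored interval:", sys.getswitchinterval())
```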

Jean-Paul


ssteinerx at gmail

Oct 26, 2009, 5:02 PM

Post #17 of 62
Re: Reworking the GIL [In reply to]

On Oct 26, 2009, at 6:45 PM, exarkun [at] twistedmatrix wrote:
> Despite what I said above, however, I would also take a default
> position against adding any kind of more advanced scheduling system
> here. It would, perhaps, make sense to expose the APIs for
> controlling the platform scheduler, though.

I would also like to have an exposed API and optional profiling
(optional as in conditional compilation, not as in some sort of
profiling 'flag' that slows down non-profiling runs) so that you can
see what's going on well enough to use the API to tune a particular
platform for a particular workload.

S




cs at zip

Oct 26, 2009, 6:09 PM

Post #18 of 62
Re: Reworking the GIL [In reply to]

On 26Oct2009 22:45, exarkun [at] twistedmatrix <exarkun [at] twistedmatrix> wrote:
| On 04:18 pm, daniel [at] stutzbachenterprises wrote:
| >On Mon, Oct 26, 2009 at 10:58 AM, Antoine Pitrou wrote:
| >Do we really need priority requests at all? They seem counter to your
| >desire for simplicity and allowing the operating system's scheduler to
| >do its work.
|
| Despite what I said above, however, I would also take a default
| position against adding any kind of more advanced scheduling system
| here. It would, perhaps, make sense to expose the APIs for
| controlling the platform scheduler, though.

+1 to both sentences from me.
--
Cameron Simpson <cs [at] zip> DoD#743
http://www.cskk.ezoshosting.com/cs/

Plague, Famine, Pestilence, and C++ stalk the land. We're doomed! Doomed!
- Simon E Spero


solipsis at pitrou

Oct 28, 2009, 4:36 AM

Post #19 of 62
Re: Reworking the GIL [In reply to]

Kristján Valur Jónsson <kristjan <at> ccpgames.com> writes:
>
> In my experience (from Stackless Python) using priority wakeup for IO can
> result in very erratic scheduling when there is much IO going on, every IO
> trumping another.

I whipped up a trivial multithreaded HTTP server using
socketserver.ThreadingMixIn and wsgiref, and used apachebench against it with a
reasonable concurrency level (10 requests at once). Enabling/disabling priority
requests doesn't seem to make a difference.
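Antoine's test server is not included in the thread. A minimal standard-library sketch of something similar, assuming a trivial plain-text WSGI app, might look like this; ThreadingWSGIServer, QuietHandler, app and start_server are hypothetical names.

```python
import socketserver
import threading
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(socketserver.ThreadingMixIn, WSGIServer):
    # Handle each request in its own thread; daemon threads let the process exit.
    daemon_threads = True


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):
        pass  # silence per-request logging so it does not skew the benchmark


def app(environ, start_response):
    # Trivial WSGI application returning a fixed plain-text body.
    body = b"Hello, world!\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def start_server(host="127.0.0.1", port=0):
    """Start the threaded server in a background thread; port 0 picks a free port."""
    server = make_server(host, port, app,
                         server_class=ThreadingWSGIServer,
                         handler_class=QuietHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

From an interactive session, start_server(port=8000) followed by something like `ab -n 10000 -c 10 http://127.0.0.1:8000/` in another shell gives a load comparable to the 10 concurrent requests described; call shutdown() and server_close() on the returned server when done.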

Regards

Antoine.




solipsis at pitrou

Nov 1, 2009, 3:33 AM

Post #20 of 62
Re: Reworking the GIL [In reply to]

Hello again,

Brett Cannon <brett <at> python.org> writes:
>
> I think it's worth it. Removal of the GIL is a totally open-ended problem
> with no solution in sight. This, on the other hand, is a performance benefit
> now. I say move forward with this. If it happens to be short-lived because
> someone actually figures out how to remove the GIL then great, but is that
> really going to happen between now and Python 3.2? I doubt it.

Based on this whole discussion, I think I am going to merge the new GIL work
into the py3k branch, with priority requests disabled.

If you think this is premature or uncalled for, or if you just want to review
the changes before making a judgement, please speak up :)

Regards

Antoine.




brett at python

Nov 1, 2009, 1:17 PM

Post #21 of 62
Re: Reworking the GIL [In reply to]

On Sun, Nov 1, 2009 at 03:33, Antoine Pitrou <solipsis [at] pitrou> wrote:
>
> Hello again,
>
> Brett Cannon <brett <at> python.org> writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> someone actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.

This will be a nice Py3K carrot!

>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please speak up :)

I know I personally trust you to not mess it up, Antoine, but that
might also come from mental exhaustion and laziness. =)

-Brett


lists at cheimes

Nov 1, 2009, 2:12 PM

Post #22 of 62
Re: Reworking the GIL [In reply to]

Antoine Pitrou wrote:
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please speak up :)

+1 from me. I trust you like Brett does.

How much work would it cost to make your patch optional at compile time?
For what it's worth we could compare your work on different machines and
on different platforms before it gets enabled by default. Can you
imagine scenarios where your implementation might be slower than the
current GIL implementation?



solipsis at pitrou

Nov 1, 2009, 2:27 PM

Post #23 of 62
Re: Reworking the GIL [In reply to]

Christian Heimes <lists <at> cheimes.de> writes:
>
> +1 from me. I trust you like Brett does.
>
> How much work would it cost to make your patch optional at compile time?

Quite a bit, because it changes the logic for processing asynchronous pending
calls (signals) and asynchronous exceptions in the eval loop. The #defines would
get quite convoluted, I think; I'd prefer not to do that.

> For what it's worth we could compare your work on different machines and
> on different platforms before it gets enabled by default. Can you
> imagine scenarios where your implementation might be slower than the
> current GIL implementation?

I don't really think so. The GIL is taken and released much more predictably
than it was before. The thing that might be worth checking is a workload with
many threads (say 50 or 100). Does anyone have that?
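No real application surfaces in the replies, so a synthetic stand-in may help: many threads that alternate short CPU bursts with blocking sleeps (time.sleep releases the GIL). This is a hypothetical sketch; the function names, burst size and sleep length are assumptions. Timing the same script under interpreters built with the old and the new GIL would give a rough comparison, though it is no substitute for a real workload.

```python
import threading
import time


def mixed_worker(iterations, sleep_s, results, index):
    # Alternate a short pure-Python CPU burst (holds the GIL) with a
    # blocking sleep (releases the GIL), like a chatty I/O-bound thread.
    total = 0
    for _ in range(iterations):
        for j in range(1000):
            total += j
        time.sleep(sleep_s)
    results[index] = total


def run_many_threads(num_threads=100, iterations=5, sleep_s=0.001):
    """Start num_threads workers at once and return (elapsed_seconds, results)."""
    results = [None] * num_threads
    threads = [
        threading.Thread(target=mixed_worker, args=(iterations, sleep_s, results, i))
        for i in range(num_threads)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start, results


if __name__ == "__main__":
    for n in (50, 100):
        elapsed, _ = run_many_threads(num_threads=n)
        print(f"{n} threads: {elapsed:.3f} s")
```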

Regards

Antoine.




lists at cheimes

Nov 1, 2009, 2:43 PM

Post #24 of 62
Re: Reworking the GIL [In reply to]

Antoine Pitrou wrote:
> Christian Heimes <lists <at> cheimes.de> writes:
>> +1 from me. I trust you like Brett does.
>>
>> How much work would it cost to make your patch optional at compile time?
>
> Quite a bit, because it changes the logic for processing asynchronous pending
> calls (signals) and asynchronous exceptions in the eval loop. The #defines would
> get quite convoluted, I think; I'd prefer not to do that.

Based on the new piece of information I totally agree.

> I don't really think so. The GIL is taken and released much more predictably
> than it was before. The thing that might be worth checking is a workload with
> many threads (say 50 or 100). Does anyone have that?

I don't have an application that works on Python 3 and uses that many
threads, sorry.

Christian


greg at krypto

Nov 1, 2009, 10:39 PM

Post #25 of 62
Re: Reworking the GIL [In reply to]

On Sun, Nov 1, 2009 at 3:33 AM, Antoine Pitrou <solipsis [at] pitrou> wrote:
>
> Hello again,
>
> Brett Cannon <brett <at> python.org> writes:
>>
>> I think it's worth it. Removal of the GIL is a totally open-ended problem
>> with no solution in sight. This, on the other hand, is a performance benefit
>> now. I say move forward with this. If it happens to be short-lived because
>> someone actually figures out how to remove the GIL then great, but is that
>> really going to happen between now and Python 3.2? I doubt it.
>
> Based on this whole discussion, I think I am going to merge the new GIL work
> into the py3k branch, with priority requests disabled.
>
> If you think this is premature or uncalled for, or if you just want to review
> the changes before making a judgement, please speak up :)

+1 Good idea. That's the best way to make sure this work gets
anywhere. It can be iterated on from there if anyone has objections.
