askutt at gmail
Apr 27, 2012, 6:25 PM
Post #5 of 19
On Apr 27, 2:54 pm, John Nagle <na...@animats.com> wrote:
>      I have a multi-threaded CPython program, which has up to four
> threads.  One thread is simply a wait loop monitoring the other
> three and waiting for them to finish, so it can give them more
> work to do.  When the work threads, which read web pages and
> then parse them, are compute-bound, I've had the monitoring thread
> starved of CPU time for as long as 120 seconds.
How exactly are you determining that this is the case?
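One rough way to check, offered as a sketch rather than a definitive test: have one thread sleep for a short fixed interval and record how late each wake-up is while compute-bound threads spin. The names measure_wakeup_delays and busy_work are made up for illustration, and busy_work is only a stand-in for the real parsing threads.

```python
import threading
import time


def measure_wakeup_delays(interval=0.02, samples=10):
    """Sleep `interval` seconds repeatedly; return how late each wake-up was."""
    delays = []
    for _ in range(samples):
        start = time.monotonic()
        time.sleep(interval)
        # Anything beyond `interval` is time spent waiting to run again.
        delays.append(max(0.0, time.monotonic() - start - interval))
    return delays


def busy_work(stop_event):
    # Hypothetical stand-in for a compute-bound parser; it holds the GIL
    # for most of the time it is running.
    n = 0
    while not stop_event.is_set():
        n += 1


stop = threading.Event()
workers = [threading.Thread(target=busy_work, args=(stop,)) for _ in range(3)]
for w in workers:
    w.start()

# The main thread plays the role of the monitoring thread.
delays = measure_wakeup_delays()

stop.set()
for w in workers:
    w.join()

print("worst wake-up delay: %.3f s" % max(delays))
```

If the worst delay stays in the millisecond range, the monitor is probably not being starved by the scheduler and the 120 seconds is coming from somewhere else.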
>     I know that the CPython thread dispatcher sucks, but I didn't
> realize it sucked that bad.  Is there a preference for running
> threads at the head of the list (like UNIX, circa 1979) or
> something like that?
Not in CPython, which is at the mercy of what the operating system
does. Under the covers, CPython uses a semaphore on Windows, and Windows
semaphores do not guarantee FIFO ordering, as per http://msdn.microsoft.com/en-us/library/windows/desktop/ms685129(v=vs.85).aspx.
As a result, I think your thread is succumbing to the same issues that
impact signal delivery as described on 22-24 and 35-41 of
I'm not sure there's any easy or reliable way to "fix" that from your
code. I am not a WinAPI programmer though, and I'd suggest finding
one to help you out. It doesn't appear possible to change the
scheduling policy for semaphores programmatically, and I don't know
how closely they pay attention to thread priority.
That's just a guess though, and finding out for sure would take some
low-level debugging. However, it seems to be the most probable
situation assuming your code is correct.
>     (And yes, I know about "multiprocessing".  These threads are already
> in one of several service processes.  I don't want to launch even more
> copies of the Python interpreter.
Why? There's little harm in launching more instances. Processes have
some additional startup and memory overhead compared to threads, but I
can't imagine it would be an issue. Given what you're trying to do,
I'd expect to run out of other resources long before I ran out of
memory because I created too many processes or threads.
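As a minimal sketch of what that could look like, assuming the parsing step can be isolated into a plain function: hand the fetched pages to a small multiprocessing.Pool so the compute-bound part runs outside the threaded process. LinkCounter, parse_page and parse_all are hypothetical names, and counting links is only a placeholder for the real parsing work.

```python
from html.parser import HTMLParser
from multiprocessing import Pool


class LinkCounter(HTMLParser):
    """Toy parser: counts <a> start tags as a placeholder for real parsing."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def parse_page(html):
    # Compute-bound step; runs in a worker process, so it does not hold
    # the GIL of the process that owns the I/O and monitoring threads.
    counter = LinkCounter()
    counter.feed(html)
    counter.close()
    return counter.links


def parse_all(pages, processes=2):
    # A small fixed pool keeps the number of extra interpreters bounded.
    with Pool(processes=processes) as pool:
        return pool.map(parse_page, pages)


if __name__ == "__main__":
    # The guard is required on Windows, where workers re-import this module.
    pages = ['<a href="x">x</a>', '<p>none</p><a href="y">y</a><a href="z">z</a>']
    print(parse_all(pages))
```

The pool size is fixed up front, so the cost is a couple of extra interpreters per service process rather than one per page.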
> The threads are usually I/O bound,
> but when they hit unusually long web pages, they go compute-bound
> during parsing.)
If your concern is being CPU oversubscribed by using lots of
processes, I suspect it's probably misplaced. A whole mess of CPU-
bound tasks is pretty much the easiest case for a scheduler to