jabba.laci at gmail
May 10, 2012, 5:46 AM
Post #4 of 7
Thanks for the answer. I use Linux with CPython 2.7. I plan to work
on both CPU-bound and I/O-bound problems. Which packages should I use
in these cases? Could you point me to some guides? When should I use
multiprocessing and when gevent?
On Thu, May 10, 2012 at 2:34 PM, Dave Angel <d [at] davea> wrote:
> On 05/10/2012 08:14 AM, Jabba Laci wrote:
>> I would like to do some parallel programming with Python but I don't
>> know how to start. There are several ways to go but I don't know what
>> the differences are between them: threads, multiprocessing, gevent,
>> I want to use a single machine with several cores. I want to solve
>> problems like this: iterate over a loop (with millions of steps) and
>> do some work at each step. The steps are independent, so here I would
>> like to process several steps in parallel. I want to store the results
>> in a global list (which should be "synchronised"). Typical use case:
>> crawl webpages, extract images and collect the images in a list.
>> What's the best way?
> There's no single best-way. First question is your programming
> environment. That includes the OS you're running, and the version # and
> implementation of Python.
> I'll assume you're using CPython 2.7 on Linux, which is what I have the
> most experience on. But after you answer, others will probably make
> suggestions appropriate to whatever you're actually using.
> Next question is whether the problem you're solving at any given moment
> is CPU-bound or I/O-bound. I'll try to answer for both cases here.
> In CPython 2.7, there's a GIL, which is a global lock preventing more
> than one CPU-bound thread from running at the same time. It's more
> complex than that, but the bottom line is that multiple threads won't help
> (and might hurt) a CPU-bound program, even in a multi-core situation.
> So use multiple processes, and cooperate between them with queues or
> shared memory, or even files. In fact, you can use multiple computers,
> and communicate using sockets, in many cases.
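As a rough illustration of the multiple-process approach for CPU-bound
work, here is a minimal sketch using multiprocessing.Pool from the
standard library. The function names (count_primes_below,
run_in_processes) and the prime-counting workload are made up for the
example; they only stand in for "some independent CPU-heavy step". The
pool returns results in input order, so no shared, locked list is needed.

```python
from __future__ import print_function

import multiprocessing


def count_primes_below(n):
    """Hypothetical CPU-bound step: count the primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        is_prime = True
        divisor = 2
        while divisor * divisor <= candidate:
            if candidate % divisor == 0:
                is_prime = False
                break
            divisor += 1
        if is_prime:
            count += 1
    return count


def run_in_processes(limits, processes=None):
    """Run count_primes_below on every item, spread over worker processes.

    processes=None lets the pool default to the number of CPU cores.
    """
    pool = multiprocessing.Pool(processes)
    try:
        # map() splits the work across the pool and keeps the input order.
        return pool.map(count_primes_below, limits)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this part on import.
    print(run_in_processes([10, 100, 1000, 10000]))
```

Each step runs in a separate process with its own interpreter and its
own GIL, so the steps can use several cores at once. Arguments and
results are pickled between processes, so this pays off when each step
does a fair amount of computation relative to the data it moves.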
> I/O-bound problems are what CPython is good at solving with threads. Once you make a
> blocking I/O call, usually the C code involved releases the GIL, and
> other threads can run. For this situation, the fact that you can share
> data structures makes threads a performance win.
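For the I/O-bound case, a minimal sketch of threads sharing one result
list might look like the following. Everything here is an assumption
made for illustration: fetch() is a hypothetical stand-in that just
sleeps (time.sleep releases the GIL, as a real blocking download
typically would), and crawl(), num_threads and the example.com URLs are
invented names, not part of any library.

```python
from __future__ import print_function

import collections
import threading
import time


def fetch(url):
    """Hypothetical stand-in for a blocking download of one page."""
    time.sleep(0.05)  # sleeping releases the GIL, so other threads can run
    return "contents of " + url


def crawl(urls, num_threads=4):
    """Fetch every URL with a fixed number of threads; return (url, page) pairs."""
    pending = collections.deque(urls)  # work shared by all threads
    results = []                       # the shared "synchronised" list
    lock = threading.Lock()

    def worker():
        while True:
            try:
                url = pending.popleft()  # a single deque operation is thread-safe
            except IndexError:
                return                   # nothing left to do
            page = fetch(url)            # blocking call, done outside the lock
            with lock:
                results.append((url, page))

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


if __name__ == "__main__":
    pages = crawl(["http://example.com/%d" % i for i in range(8)])
    print(len(pages), "pages fetched")
```

Results arrive in completion order, not input order, which is why each
entry carries its URL. The lock around the append is kept deliberately
explicit; it is only held for the short update, never during the
blocking call.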
> Web crawling is likely to be I/O-bound, but I wanted to be as complete as
> I could.