
Mailing List Archive: Python: Python

parallel programming in Python

 

 



jabba.laci at gmail

May 10, 2012, 5:14 AM

Post #1 of 7 (636 views)
parallel programming in Python

Hi,

I would like to do some parallel programming with Python but I don't
know how to start. There are several ways to go but I don't know what
the differences are between them: threads, multiprocessing, gevent,
etc.

I want to use a single machine with several cores. I want to solve
problems like this: iterate over a loop (with millions of steps) and
do some work at each step. The steps are independent, so here I would
like to process several steps in parallel. I want to store the results
in a global list (which should be "synchronised"). Typical use case:
crawl webpages, extract images and collect the images in a list.

What's the best way?

Thanks,

Laszlo
--
http://mail.python.org/mailman/listinfo/python-list


d at davea

May 10, 2012, 5:34 AM

Post #2 of 7 (611 views)
Re: parallel programming in Python [In reply to]

There's no single best way. The first question is your programming
environment: the OS you're running, and the version and
implementation of Python.

I'll assume you're using CPython 2.7 on Linux, which is what I have the
most experience with. But after you answer, others will probably make
suggestions appropriate to whatever you're actually using.

The next question is whether the problem you're solving at any given moment
is CPU-bound or I/O-bound. I'll try to answer for both cases here.

CPU-bound:
In CPython 2.7, there's a GIL (global interpreter lock), which prevents more
than one thread from executing Python bytecode at the same time. It's more
complex than that, but the bottom line is that multiple threads won't help
(and might hurt) a CPU-bound program, even on a multi-core machine.
So use multiple processes, and coordinate them with queues,
shared memory, or even files. In many cases you can even use multiple
computers and communicate over sockets.
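
As a rough illustration of the multiple-process approach, here is a minimal sketch using the standard-library multiprocessing module. The names step and run_parallel, the prime-counting workload, and the chunksize value are made up for this example; it is written to run under Python 3, and the same Pool calls should also be available in 2.7.

```python
import multiprocessing


def step(n):
    """CPU-bound work for one loop step: count the primes below n (deliberately naive)."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def run_parallel(inputs, processes=None):
    """Farm the independent steps out to worker processes; results come back in input order."""
    pool = multiprocessing.Pool(processes)  # None means one worker per available core
    try:
        # chunksize batches several steps per message so IPC overhead doesn't dominate
        return pool.map(step, inputs, chunksize=16)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when they import the module.
    results = run_parallel(range(1000, 1200))
    print(len(results), results[:3])
```

Because pool.map returns an ordinary list in the parent process, no explicitly "synchronised" global list is needed here: each worker returns its result and the pool collects them.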

I/O-bound:
This is what CPython is good at solving with threads. Once you make a
blocking I/O call, the C code involved usually releases the GIL, and
other threads can run. In this situation, the fact that threads can share
data structures directly makes them a performance win.
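
A minimal sketch of the threaded approach, with a shared list guarded by a lock. The fetch function is a hypothetical stand-in that sleeps instead of downloading anything, and the names worker, crawl, results and results_lock are invented for illustration. In CPython a bare list.append happens to be atomic, so the lock is there mainly to make the intent explicit and to stay safe if the update grows beyond a single append.

```python
import threading
import time

results = []                     # shared by all threads
results_lock = threading.Lock()  # guards updates to results


def fetch(url):
    """Stand-in for a blocking download; like real blocking I/O, time.sleep releases the GIL."""
    time.sleep(0.05)
    return "contents of " + url


def worker(urls):
    """Fetch each URL in this thread's share and record the result in the shared list."""
    for url in urls:
        page = fetch(url)
        with results_lock:
            results.append((url, page))


def crawl(urls, num_threads=4):
    """Split urls across num_threads threads and wait for all of them to finish."""
    threads = [
        threading.Thread(target=worker, args=(urls[i::num_threads],))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Calling crawl with a list of eight URLs and four threads gives each thread two URLs; the order of entries in results depends on thread scheduling, so sort it if order matters.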

Web crawling is likely to be I/O-bound, but I wanted to be as complete as
I could.

--

DaveA



jeanpierreda at gmail

May 10, 2012, 5:46 AM

Post #3 of 7 (621 views)
Re: parallel programming in Python [In reply to]

On Thu, May 10, 2012 at 8:14 AM, Jabba Laci <jabba.laci [at] gmail> wrote:
> What's the best way?

From what I've heard, http://scrapy.org/ . It is a single-thread
single-process web crawler that nonetheless can download things
concurrently.

Doing what you want in Scrapy would probably involve learning about
Twisted, the library Scrapy works on top of. This is somewhat more
involved than just throwing threads and urllib and lxml.html together,
although most of the Twisted developers are really helpful. It might
not be worth it to you, depending on the size of the task.



Dave's answer is pretty general and good though.

-- Devin


jabba.laci at gmail

May 10, 2012, 5:46 AM

Post #4 of 7 (614 views)
Re: parallel programming in Python [In reply to]

Hi,

Thanks for the answer. I use Linux with CPython 2.7. I plan to work
on both CPU-bound and I/O-bound problems. Which packages should I use in
these cases? Could you point me to some guides? When should I use
multiprocessing and when gevent?

Thanks,

Laszlo




torriem at gmail

May 10, 2012, 9:01 AM

Post #5 of 7 (604 views)
Re: parallel programming in Python [In reply to]

On 05/10/2012 06:46 AM, Devin Jeanpierre wrote:
> On Thu, May 10, 2012 at 8:14 AM, Jabba Laci <jabba.laci [at] gmail> wrote:
>> What's the best way?
>
> From what I've heard, http://scrapy.org/ . It is a single-thread
> single-process web crawler that nonetheless can download things
> concurrently.

Yes, for I/O-bound work, an asynchronous design (event-driven callbacks, where
events are triggered by the data) will usually beat a multi-threaded one.
Sometimes a combination of multi-threading and asynchronous I/O is necessary
(thread pools).
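
To make the single-threaded, event-driven idea concrete, here is a small sketch using the standard-library asyncio module. Note the assumptions: asyncio postdates this thread and is not the API that Twisted or Scrapy expose, the example needs Python 3.7 or later, and fetch is a hypothetical stand-in that sleeps instead of touching the network.

```python
import asyncio


async def fetch(url, delay=0.05):
    """Stand-in for a non-blocking download; awaiting hands control back to the event loop."""
    await asyncio.sleep(delay)
    return "contents of " + url


async def crawl(urls):
    """Start every download at once in a single thread; results come back in input order."""
    return await asyncio.gather(*(fetch(url) for url in urls))


def run(urls):
    """Synchronous entry point: run the event loop until all downloads finish."""
    return asyncio.run(crawl(urls))
```

All the waits overlap even though only one thread is running, which is the same effect described above for a single-thread, single-process crawler.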

Twisted is another asynchronous framework core that is very popular for
lots of things, both clients and servers.


jabba.laci at gmail

May 29, 2012, 7:43 AM

Post #6 of 7 (554 views)
Re: parallel programming in Python [In reply to]

Hehe, I just asked this question a few days ago but I didn't become
much cleverer:

http://www.gossamer-threads.com/lists/python/python/985701

Best,

Laszlo



werner at thieprojects

May 29, 2012, 9:06 AM

Post #7 of 7 (553 views)
Re: parallel programming in Python [In reply to]

For such tasks my choice would be Twisted combined with ampoule.
It lets you spread work across however many processes you want,
maxing out whatever iron you're sitting on.

HTH, Werner

http://twistedmatrix.com/trac/
https://launchpad.net/ampoule

