
Mailing List Archive: Python: Python

Pass data to a subprocess

 

 



andrea.crotti.0 at gmail

Jul 31, 2012, 6:56 AM

Post #1 of 21
Pass data to a subprocess

I'm having fun in the world of multiprocessing, and I would like some
suggestions.

For example suppose I want to create many processes and pass them some
data to process (also why they are running).

I found many nice things (Pipe, Manager and so on), but actually even
this seems to work:


class MyProcess(Process):
    def __init__(self):
        Process.__init__(self)
        self.ls = []

    def __str__(self):
        return str(self.ls)

    def add(self, ls):
        self.ls += ls

    def run(self):
        print("running the process in another subprocess")


def procs():
    mp = MyProcess()
    mp.start()
    # with the join we are actually waiting for the end of the running time
    mp.join()
    mp.add([1,2,3])
    mp.add([2,3,4])
    print(mp)


Which is a bit surprising, because it means that I can pass data to an
object that is running on another process.
Is it because of some magic in the background and can I rely on that or
simply I didn't understand how it works?
--
http://mail.python.org/mailman/listinfo/python-list


andrea.crotti.0 at gmail

Jul 31, 2012, 7:12 AM

Post #2 of 21
Re: Pass data to a subprocess [In reply to]

>
>
> def procs():
>     mp = MyProcess()
>     # with the join we are actually waiting for the end of the running time
>     mp.add([1,2,3])
>     mp.start()
>     mp.add([2,3,4])
>     mp.join()
>     print(mp)
>

I think I got it now: if I put the start before another add, then inside
Process.run it won't see the new data that has been added after the
start.

So this way is perfectly safe only until the process is launched; if
it's running I need to use some multiprocess-aware data structure. Is
that correct?


gandalf at shopzeus

Jul 31, 2012, 7:46 AM

Post #3 of 21
Re: Pass data to a subprocess [In reply to]

> I think I got it now, if I already just mix the start before another
add, inside the Process.run it won't see the new data that has been
added after the start. So this way is perfectly safe only until the
process is launched, if it's running I need to use some
multiprocess-aware data structure, is that correct?

Yes. Read this:

http://docs.python.org/library/multiprocessing.html#exchanging-objects-between-processes

You can use Queues and Pipes. Actually, these are basic elements of the
multiprocessing module and they are well documented. I wonder if you
read the documentation at all, before posting questions here.




andrea.crotti.0 at gmail

Jul 31, 2012, 8:26 AM

Post #4 of 21
Re: Pass data to a subprocess [In reply to]

2012/7/31 Laszlo Nagy <gandalf [at] shopzeus>:
>> I think I got it now, if I already just mix the start before another add,
>> inside the Process.run it won't see the new data that has been added after
>> the start. So this way is perfectly safe only until the process is launched,
>> if it's running I need to use some multiprocess-aware data structure, is
>> that correct?
>
> Yes. Read this:
>
> http://docs.python.org/library/multiprocessing.html#exchanging-objects-between-processes
>
> You can use Queues and Pipes. Actually, these are basic elements of the
> multiprocessing module and they are well documented. I wonder if you read
> the documentation at all, before posting questions here.
>
>


As I wrote, "I found many nice things (Pipe, Manager and so on), but
actually even this seems to work": yes, I did read the documentation.

I was just surprised that it worked better than I expected even
without Pipes and Queues, but now I understand why.

Anyway, now I would like to be able to detach subprocesses to avoid the
nasty code reloading that I was talking about in another thread, but
things get more tricky, because I can't use queues and pipes to
communicate with a running process that is not my child, correct?


gandalf at shopzeus

Aug 1, 2012, 1:19 AM

Post #5 of 21
Re: Pass data to a subprocess [In reply to]

>
> As I wrote "I found many nice things (Pipe, Manager and so on), but
> actually even
> this seems to work:" yes I did read the documentation.
Sorry, I did not want to be offensive.
>
> I was just surprised that it worked better than I expected even
> without Pipes and Queues, but now I understand why..
>
> Anyway now I would like to be able to detach subprocesses to avoid the
> nasty code reloading that I was talking about in another thread, but
> things get more tricky, because I can't use queues and pipes to
> communicate with a running process that it's noit my child, correct?
>
Yes, I think that is correct. Instead of detaching a child process, you
can create independent processes and use other frameworks for IPC. For
example, Pyro. It is not as efficient as multiprocessing.Queue, but in
return, you will have the option to run your service across multiple
servers.

The most effective IPC is usually through shared memory. But there is no
OS independent standard Python module that can communicate over shared
memory. Except multiprocessing of course, but AFAIK it can only be used
to communicate between fork()-ed processes.


andrea.crotti.0 at gmail

Aug 1, 2012, 2:50 AM

Post #6 of 21
Re: Pass data to a subprocess [In reply to]

2012/8/1 Laszlo Nagy <gandalf [at] shopzeus>:
>> I was just surprised that it worked better than I expected even
>> without Pipes and Queues, but now I understand why..
>>
>> Anyway now I would like to be able to detach subprocesses to avoid the
>> nasty code reloading that I was talking about in another thread, but
>> things get more tricky, because I can't use queues and pipes to
>> communicate with a running process that it's noit my child, correct?
>>
> Yes, I think that is correct. Instead of detaching a child process, you can
> create independent processes and use other frameworks for IPC. For example,
> Pyro. It is not as effective as multiprocessing.Queue, but in return, you
> will have the option to run your service across multiple servers.
>
> The most effective IPC is usually through shared memory. But there is no OS
> independent standard Python module that can communicate over shared memory.
> Except multiprocessing of course, but AFAIK it can only be used to
> communicate between fork()-ed processes.


Thanks. There is another thing which is able to interact with running
processes, in theory:
https://github.com/lmacken/pyrasite

I don't know, though, if it's a good idea to use a similar approach for
production code; as far as I understood it uses gdb. In theory,
though, I could set up every subprocess with all the data it needs,
so I might not even need to share data between them.

Anyway, now I had another idea: to be able to stop the main process
without killing the subprocesses, using multiple forks. Does the
following make sense? I don't really need these subprocesses to be
daemons, since they should quit when done, but is there anything that
can go wrong with this approach?

from os import fork
from time import sleep
from itertools import count
from sys import exit

from multiprocessing import Process, Queue


class LongProcess(Process):
    def __init__(self, idx, queue):
        Process.__init__(self)
        # self.daemon = True
        self.queue = queue
        self.idx = idx

    def run(self):
        for i in count():
            self.queue.put("%d: %d" % (self.idx, i))
            print("adding %d: %d" % (self.idx, i))
            sleep(2)


if __name__ == '__main__':
    qu = Queue()

    # how do I do a multiple fork?
    for i in range(5):
        pid = fork()
        # if I create here all the data structures I should still be
        # able to do things
        if pid == 0:
            lp = LongProcess(1, qu)
            lp.start()
            lp.join()
            exit(0)
        else:
            print("started subprocess with pid ", pid)


gandalf at shopzeus

Aug 1, 2012, 3:16 AM

Post #7 of 21
Re: Pass data to a subprocess [In reply to]

>
> Thanks, there is another thing which is able to interact with running
> processes in theory:
> https://github.com/lmacken/pyrasite
>
> I don't know though if it's a good idea to use a similar approach for
> production code, as far as I understood it uses gdb.. In theory
> though I could be able to set up every subprocess with all the data
> they need, so I might not even need to share data between them.
>
> Anyway now I had another idea to avoid to be able to stop the main
> process without killing the subprocesses, using multiple forks. Does
> the following makes sense? I don't really need these subprocesses to
> be daemons since they should quit when done, but is there anything
> that can go wrong with this approach?
One thing is sure: os.fork() doesn't work under Microsoft Windows. Under
Unix, I'm not sure if os.fork() can be mixed with
multiprocessing.Process.start(). I could not find official documentation
on that. This must be tested on your actual platform. And don't forget
to use Queue.get() in your test. :-)



andrea.crotti.0 at gmail

Aug 1, 2012, 3:32 AM

Post #8 of 21
Re: Pass data to a subprocess [In reply to]

2012/8/1 Laszlo Nagy <gandalf [at] shopzeus>:
> On thing is sure: os.fork() doesn't work under Microsoft Windows. Under
> Unix, I'm not sure if os.fork() can be mixed with
> multiprocessing.Process.start(). I could not find official documentation on
> that. This must be tested on your actual platform. And don't forget to use
> Queue.get() in your test. :-)
>

Yes, I know; we don't care about Windows for this particular project.
I think mixing multiprocessing and fork should not harm, but it's
probably unnecessary, since I'm already in another process after the
fork, so I can just make it run what I want.

Otherwise, is there a way to do the same thing using only multiprocessing?
(running a process that is detachable from the process that created it)
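Outside multiprocessing, the classic Unix answer to detaching is the double fork; a rough Unix-only sketch (detach is a hypothetical helper, and this bypasses multiprocessing entirely):

```python
import os

def detach():
    """Double fork: return True in the detached grandchild, False in the
    original process. Unix-only sketch."""
    pid = os.fork()
    if pid > 0:
        os.waitpid(pid, 0)   # reap the short-lived first child
        return False
    os.setsid()              # new session, no controlling terminal
    if os.fork() > 0:
        os._exit(0)          # first child exits immediately
    return True              # grandchild, re-parented to init

in_child = detach()
if in_child:
    # The detached grandchild would do its long-running work here.
    os._exit(0)
print("original process keeps running")
```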


gandalf at shopzeus

Aug 1, 2012, 3:40 AM

Post #9 of 21
Re: Pass data to a subprocess [In reply to]

> Yes I know we don't care about Windows for this particular project..
> I think mixing multiprocessing and fork should not harm, but probably
> is unnecessary since I'm already in another process after the fork so
> I can just make it run what I want.
>
> Otherwise is there a way to do same thing only using multiprocessing?
> (running a process that is detachable from the process that created it)
>
I'm afraid there is no way to do that. I'm not even sure if
multiprocessing.Queue will work if you detach a forked process.


roy at panix

Aug 1, 2012, 3:59 AM

Post #10 of 21
Re: Pass data to a subprocess [In reply to]

In article <mailman.2809.1343809166.4697.python-list [at] python>,
Laszlo Nagy <gandalf [at] shopzeus> wrote:

> Yes, I think that is correct. Instead of detaching a child process, you
> can create independent processes and use other frameworks for IPC. For
> example, Pyro. It is not as effective as multiprocessing.Queue, but in
> return, you will have the option to run your service across multiple
> servers.

You might want to look at beanstalk (http://kr.github.com/beanstalkd/).
We've been using it in production for the better part of two years. At
a 30,000 foot level, it's an implementation of queues over named pipes
over TCP, but it takes care of a zillion little details for you.

Setup is trivial, and there's clients for all sorts of languages. For a
Python client, go with beanstalkc (pybeanstalk appears to be
abandonware).
>
> The most effective IPC is usually through shared memory. But there is no
> OS independent standard Python module that can communicate over shared
> memory.

It's true that shared memory is faster than serializing objects over a
TCP connection. On the other hand, it's hard to imagine anything
written in Python where you would notice the difference.


gandalf at shopzeus

Aug 1, 2012, 4:07 AM

Post #11 of 21
Re: Pass data to a subprocess [In reply to]

>> The most effective IPC is usually through shared memory. But there is no
>> OS independent standard Python module that can communicate over shared
>> memory.
> It's true that shared memory is faster than serializing objects over a
> TCP connection. On the other hand, it's hard to imagine anything
> written in Python where you would notice the difference.
Well, except in response times. ;-)

The TCP stack likes to wait after you call send() on a socket. Yes, you
can use setsockopt/TCP_NODELAY, but my experience is that response times
with TCP can still be long, especially when you have to do many
request-response pairs.

It also depends on the protocol design - if you can reduce the number of
request-response pairs then it helps a lot.
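The socket option that disables this batching (Nagle's algorithm) is TCP_NODELAY; setting it is a one-liner:

```python
import socket

# Disable Nagle's algorithm so small writes go out immediately,
# trading bandwidth efficiency for lower per-message latency.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
nodelay = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
s.close()
print(nodelay)   # non-zero once the option is set
```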


gandalf at shopzeus

Aug 1, 2012, 4:26 AM

Post #12 of 21
Re: Pass data to a subprocess [In reply to]

On 2012-08-01 12:59, Roy Smith wrote:
> In article <mailman.2809.1343809166.4697.python-list [at] python>,
> Laszlo Nagy <gandalf [at] shopzeus> wrote:
>
>> Yes, I think that is correct. Instead of detaching a child process, you
>> can create independent processes and use other frameworks for IPC. For
>> example, Pyro. It is not as effective as multiprocessing.Queue, but in
>> return, you will have the option to run your service across multiple
>> servers.
> You might want to look at beanstalk (http://kr.github.com/beanstalkd/).
> We've been using it in production for the better part of two years. At
> a 30,000 foot level, it's an implementation of queues over named pipes
> over TCP, but it takes care of a zillion little details for you.
Looks very simple to use. Too bad that it doesn't work on Windows systems.


andrea.crotti.0 at gmail

Aug 1, 2012, 6:25 AM

Post #13 of 21
Re: Pass data to a subprocess [In reply to]

2012/8/1 Roy Smith <roy [at] panix>:
> In article <mailman.2809.1343809166.4697.python-list [at] python>,
> Laszlo Nagy <gandalf [at] shopzeus> wrote:
>
>> Yes, I think that is correct. Instead of detaching a child process, you
>> can create independent processes and use other frameworks for IPC. For
>> example, Pyro. It is not as effective as multiprocessing.Queue, but in
>> return, you will have the option to run your service across multiple
>> servers.
>
> You might want to look at beanstalk (http://kr.github.com/beanstalkd/).
> We've been using it in production for the better part of two years. At
> a 30,000 foot level, it's an implementation of queues over named pipes
> over TCP, but it takes care of a zillion little details for you.
>
> Setup is trivial, and there's clients for all sorts of languages. For a
> Python client, go with beanstalkc (pybeanstalk appears to be
> abandonware).
>>
>> The most effective IPC is usually through shared memory. But there is no
>> OS independent standard Python module that can communicate over shared
>> memory.
>
> It's true that shared memory is faster than serializing objects over a
> TCP connection. On the other hand, it's hard to imagine anything
> written in Python where you would notice the difference.


That does look nice, and I would like to have something like that.
But since I have to convince my boss of another external dependency,
I think it might be worth trying out zeromq instead, which can also
do similar things and looks more powerful. What do you think?


invalid at invalid

Aug 1, 2012, 7:16 AM

Post #14 of 21
Re: Pass data to a subprocess [In reply to]

On 2012-08-01, Laszlo Nagy <gandalf [at] shopzeus> wrote:
>>
>> As I wrote "I found many nice things (Pipe, Manager and so on), but
>> actually even
>> this seems to work:" yes I did read the documentation.
> Sorry, I did not want be offensive.
>>
>> I was just surprised that it worked better than I expected even
>> without Pipes and Queues, but now I understand why..
>>
>> Anyway now I would like to be able to detach subprocesses to avoid the
>> nasty code reloading that I was talking about in another thread, but
>> things get more tricky, because I can't use queues and pipes to
>> communicate with a running process that it's noit my child, correct?
>>
> Yes, I think that is correct.

I don't understand why detaching a child process on Linux/Unix would
make IPC stop working. Can somebody explain?

--
Grant Edwards grant.b.edwards Yow! My vaseline is
at RUNNING...
gmail.com


gandalf at shopzeus

Aug 1, 2012, 7:32 AM

Post #15 of 21
Re: Pass data to a subprocess [In reply to]

>>> things get more tricky, because I can't use queues and pipes to
>>> communicate with a running process that it's noit my child, correct?
>>>
>> Yes, I think that is correct.
> I don't understand why detaching a child process on Linux/Unix would
> make IPC stop working. Can somebody explain?
>
It is implemented with shared memory. I think (although I'm not 100%
sure) that shared memory is created *and freed up* (shm_unlink() system
call) by the parent process. It makes sense, because the child processes
will surely die with the parent. If you detach a child process, then it
won't be killed with its original parent. But the shared memory will be
freed by the original parent process anyway. I suspect that the child
that has mapped that shared memory segment will try to access a freed up
resource, do a segfault or something similar.


gandalf at shopzeus

Aug 1, 2012, 7:42 AM

Post #16 of 21
Re: Pass data to a subprocess [In reply to]

>>> Yes, I think that is correct.
>> I don't understand why detaching a child process on Linux/Unix would
>> make IPC stop working. Can somebody explain?
>>
> It is implemented with shared memory. I think (although I'm not 100%
> sure) that shared memory is created *and freed up* (shm_unlink()
> system call) by the parent process. It makes sense, because the child
> processes will surely die with the parent. If you detach a child
> process, then it won't be killed with its original parent. But the
> shared memory will be freed by the original parent process anyway. I
> suspect that the child that has mapped that shared memory segment will
> try to access a freed up resource, do a segfault or something similar.
So detaching the child process will not make IPC stop working. But
exiting from the original parent process will. (And why else would you
detach the child?)



andrea.crotti.0 at gmail

Aug 1, 2012, 8:24 AM

Post #17 of 21
Re: Pass data to a subprocess [In reply to]

2012/8/1 Laszlo Nagy <gandalf [at] shopzeus>:
>
> So detaching the child process will not make IPC stop working. But exiting
> from the original parent process will. (And why else would you detach the
> child?)
>


Well, it makes perfect sense to me that it stops working, so either:
- I use zeromq or something similar to communicate, or
- I make every process independent, without the need to further
communicate with the parent.


roy at panix

Aug 1, 2012, 12:07 PM

Post #18 of 21
Re: Pass data to a subprocess [In reply to]

On Aug 1, 2012, at 9:25 AM, andrea crotti wrote:

> [beanstalk] does look nice and I would like to have something like that..
> But since I have to convince my boss of another external dependency I
> think it might be worth
> to try out zeromq instead, which can also do similar things and looks
> more powerful, what do you think?

I'm afraid I have no experience with zeromq, so I can't offer an opinion.

--
Roy Smith
roy [at] panix





invalid at invalid

Aug 1, 2012, 12:48 PM

Post #19 of 21
Re: Pass data to a subprocess [In reply to]

On 2012-08-01, Laszlo Nagy <gandalf [at] shopzeus> wrote:
>
>>>> things get more tricky, because I can't use queues and pipes to
>>>> communicate with a running process that it's noit my child, correct?
>>>>
>>> Yes, I think that is correct.
>> I don't understand why detaching a child process on Linux/Unix would
>> make IPC stop working. Can somebody explain?
>
> It is implemented with shared memory. I think (although I'm not 100%
> sure) that shared memory is created *and freed up* (shm_unlink() system
> call) by the parent process. It makes sense, because the child processes
> will surely die with the parent. If you detach a child process, then it
> won't be killed with its original parent. But the shared memory will be
> freed by the original parent process anyway. I suspect that the child
> that has mapped that shared memory segment will try to access a freed up
> resource, do a segfault or something similar.

I still don't get it. shm_unlink() works the same way unlink() does.
The resource itself doesn't cease to exist until all open file handles
are closed. From the shm_unlink() man page on Linux:

The operation of shm_unlink() is analogous to unlink(2): it
removes a shared memory object name, and, once all processes
have unmapped the object, de-allocates and destroys the
contents of the associated memory region. After a successful
shm_unlink(), attempts to shm_open() an object with the same
name will fail (unless O_CREAT was specified, in which case a
new, distinct object is created).

Even if the parent calls shm_unlink(), the shared-memory resource will
continue to exist (and be usable) until all processes that are holding
open file handles unmap/close them. So not only will detached
children not crash, they'll still be able to use the shared memory
objects to talk to each other.
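The same name-versus-handle semantics can be demonstrated with a regular file and unlink(): the name disappears, but the data stays alive as long as a descriptor is open.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"still here")
os.unlink(path)                  # remove the name, like shm_unlink()
assert not os.path.exists(path)  # the name is gone...
os.lseek(fd, 0, os.SEEK_SET)
data = os.read(fd, 10)           # ...but the open handle still works
os.close(fd)
print(data)                      # b'still here'
```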

--
Grant Edwards grant.b.edwards Yow! Why is it that when
at you DIE, you can't take
gmail.com your HOME ENTERTAINMENT
CENTER with you??


gandalf at shopzeus

Aug 1, 2012, 11:10 PM

Post #20 of 21
Re: Pass data to a subprocess [In reply to]

> I still don't get it. shm_unlink() works the same way unlink() does.
> The resource itself doesn't cease to exist until all open file handles
> are closed. From the shm_unlink() man page on Linux:
>
> The operation of shm_unlink() is analogous to unlink(2): it
> removes a shared memory object name, and, once all processes
> have unmapped the object, de-allocates and destroys the
> contents of the associated memory region. After a successful
> shm_unlink(), attempts to shm_open() an object with the same
> name will fail (unless O_CREAT was specified, in which case a
> new, distinct object is created).
>
> Even if the parent calls shm_unlink(), the shared-memory resource will
> continue to exist (and be usable) until all processes that are holding
> open file handles unmap/close them. So not only will detached
> children not crash, they'll still be able to use the shared memory
> objects to talk to each other.
>
I stand corrected. It should still be examined what kind of shared
memory is used under non-Linux systems. System V on AIX? And what about
Windows? So maybe the general answer is still no. But I guess that the
OP wanted this to work on a specific system.

Dear Andrea Crotti! Please try to detach two child processes, exit from
the main process, and communicate over a multiprocessing queue. It will
possibly work. Sorry for my bad advice.


invalid at invalid

Aug 2, 2012, 7:29 AM

Post #21 of 21
Re: Pass data to a subprocess [In reply to]

On 2012-08-02, Laszlo Nagy <gandalf [at] shopzeus> wrote:
>
>> I still don't get it. shm_unlink() works the same way unlink() does.
>> The resource itself doesn't cease to exist until all open file
>> handles are closed. From the shm_unlink() man page on Linux:
>>
>> The operation of shm_unlink() is analogous to unlink(2): it
>> removes a shared memory object name, and, once all processes
>> have unmapped the object, de-allocates and destroys the
>> contents of the associated memory region. After a successful
>> shm_unlink(), attempts to shm_open() an object with the same
>> name will fail (unless O_CREAT was specified, in which case a
>> new, distinct object is created).
>>
>> Even if the parent calls shm_unlink(), the shared-memory resource
>> will continue to exist (and be usable) until all processes that are
>> holding open file handles unmap/close them. So not only will
>> detached children not crash, they'll still be able to use the shared
>> memory objects to talk to each other.

Note that when I say the detached children will still be able to talk
to each other using shared memory after the parent calls shm_unlink()
and exit(), I'm talking about the general case -- not specifically
about the multiprocessing module. There may be something else going on
with the multiprocessing module.

> I stand corrected. It should still be examined, what kind shared
> memory is used under non-linux systems. System V on AIX? And what
> about Windows? So maybe the general answer is still no. But I guess
> that the OP wanted this to work on a specific system.
>
> Dear Andrea Crotti! Please try to detach two child processes, exit
> from the main process, and communicate over a multiprocessing queue.
> It will possibly work. Sorry for my bad advice.

I'm not claiming it will work, since I don't know how the IPC in the
multiprocessing module works. It may indeed break when a child
process is detached (which I'm assuming means being removed from the
process group and/or detached from the controlling tty).

But, I'm not aware of any underlying Unix IPC mechanism that breaks
when a child is detached, so I was curious about what would cause
multiprocessing's IPC to break.

--
Grant Edwards grant.b.edwards Yow! I didn't order any
at WOO-WOO ... Maybe a YUBBA
gmail.com ... But no WOO-WOO!
