Mailing List Archive: Python: Python

Bidirectional communication through pipes: read/write popen()

 

 


hniksic at srce

Oct 16, 1999, 5:47 PM

Post #1 of 12 (806 views)
Permalink
Bidirectional communication through pipes: read/write popen()

In all kinds of circumstances it would be very useful to call an
external filter to process some data, and read the results back in.
What I needed was something like popen(), only working for both
reading and writing. However, such a thing is hard to write in a
simple-minded fashion because of deadlocks that occur when handling
more than several bytes of data. Deadlocks can either be caused by
both programs waiting for not-yet-generated input, or (in my case) by
both their writes being blocked waiting for the other to read.
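
The second kind of deadlock comes from the kernel's pipe buffer being finite: once it is full, a blocking write() stalls until the other side reads. A minimal sketch for measuring that limit on a given system, assuming Python 3 on Unix (pipe_capacity is a hypothetical helper name, not part of the original post):

```python
import os

def pipe_capacity():
    """Return how many bytes fit in a fresh pipe before a write would block."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail with an error instead of stalling
    total = 0
    chunk = b"x" * 4096
    try:
        while True:
            total += os.write(write_fd, chunk)
    except BlockingIOError:
        pass  # the pipe is full; a blocking write would hang at this point
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return total

print("pipe buffer holds", pipe_capacity(), "bytes")
```

The figure varies between systems. Anything larger than it cannot be written in one go unless somebody is draining the other end.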

The usual choices are to:

a) Write a deadlock-free communication protocol and use it on both
ends. This is rarely a good solution, because the program that
needs to be invoked is in most cases an external filter that knows
nothing about our deadlock problems.

b) Use PTY's instead of pipes. Many programmers prefer to avoid this
path because of the added system resources that the PTY's require,
and because of the increased complexity.

Given these choices, most people opt to use a temporary file and get
it over with.

However, when I discussed this problem with a colleague, he thought of a
third solution: break the circularity by using separate processes for
reading and writing. This can be done whenever reading and writing
are independent, i.e. when the data read from the subprocess does not
influence future writes to it.

The function below implements that idea. Usage is something like:

rwpopen("""Some long string here...""", "sed", ["s/long/short/"])
-> 'Some short string here...'

I've put the function to good use in a program I'm writing. In
addition to getting rid of temporary files, the whole operation timed
faster than using a tmpfile (that was on a single-CPU machine). The
function will, of course, work only under Unix and its lookalikes.
Additional information is embedded in the docstring and the comments.

I'd like to hear feedback. Do other people find such a thing useful?
Is there a fundamental flaw or a possibility of a deadlock that I'm
missing?


import os
import sys

def rwpopen(input, command, args=[]):
    """Execute command with args, pipe input into it, and read it back.
    Return the result read from the command.

    Normally, when a process tries to write to a child process and
    read back its output, a deadlock condition can occur easily,
    either by both processes waiting for not-yet-generated input, or
    by both their write() calls being blocked waiting for the other to
    read.

    This function prevents deadlocks by using separate processes for
    reading and writing, at the expense of an additional fork(). That
    way the process that writes to an exec'ed command and the process
    that reads from the command are fully independent, and no deadlock
    can occur. The child process exits immediately after writing.

    More precisely: the current process (A) forks off a process B,
    which in turn forks off a process C. While C does the usual
    dup,close,exec thing, B merely writes the data to the pipe and
    exits. Independently of B, A reads C's response. A deadlock
    cannot occur because A and B are independent of each other -- even
    if B's write() is stopped because it filled up the pipe buffer, A
    will happily keep reading C's output, and B's write() will be
    resumed shortly.
    """
    # XXX Redo this as a class, with overridable methods for reading
    # and writing.
    #
    # XXX Provide error-checking and propagating exceptions from child
    # to parent. This would require either wait()ing on the child
    # (which is a bag of worms), or opening another pipe for
    # transmitting error messages or serialized exception objects.
    #
    # XXX This function expects the system to wait for the child upon
    # receiving SIGCHLD. This should be the case on most systems as
    # long as SIGCHLD is handled by SIG_DFL. If this is not the case,
    # zombies will remain.

    def safe_traceback():
        # Child processes catch exceptions so that they can exit using
        # os._exit() without fanfare. They use this function to print
        # the traceback to stderr before dying.
        import traceback
        sys.stderr.write("Error in child process, pid %d.\n" %
                         os.getpid())
        sys.stderr.flush()
        traceback.print_exc()
        sys.stderr.flush()

    # It would be nice if Python provided a way to see if pipes are
    # bidirectional. In that case, we could open only one pipe
    # instead of two, with p_readfd == p_writefd and c_readfd ==
    # c_writefd.
    p_readfd, c_writefd = os.pipe()
    c_readfd, p_writefd = os.pipe()
    if os.fork():
        # Parent
        for fd in (c_readfd, c_writefd, p_writefd):
            os.close(fd)
        # Convert the pipe fd to a file object, so we can use its
        # read() method to read all data.
        fp = os.fdopen(p_readfd, 'r')
        result = fp.read()
        fp.close()                      # Will close p_readfd.
        return result
    else:
        # Child
        try:
            if os.fork():
                # Still the same child
                os.write(p_writefd, input)
            else:
                # Grandchild
                try:
                    # Redirect the pipe to stdin.
                    os.close(0)
                    os.dup(c_readfd)
                    # Redirect stdout to the pipe.
                    os.close(1)
                    os.dup(c_writefd)
                    # Now close unneeded descriptors.
                    for fd in (c_readfd, c_writefd, p_readfd, p_writefd):
                        os.close(fd)
                    # Finally, execute the external command.
                    os.execvp(command, [command] + args)
                except:
                    safe_traceback()
                    os._exit(127)
        except:
            safe_traceback()
            os._exit(127)
        else:
            os._exit(0)
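
For readers on a current interpreter, here is a rough Python 3 sketch of the same writer-process idea. It is not the author's code: rwpopen_sketch is a hypothetical name, the data is bytes rather than str, both helper processes are forked directly from the caller (instead of the command being a grandchild), and the caller waits for both children so that no zombies remain.

```python
import os

def rwpopen_sketch(data, command, args=()):
    """Pipe bytes through a command, using a forked helper as the writer."""
    to_cmd_r, to_cmd_w = os.pipe()      # caller's data  -> command stdin
    from_cmd_r, from_cmd_w = os.pipe()  # command stdout -> caller

    cmd_pid = os.fork()
    if cmd_pid == 0:
        # Command process: wire the pipes to stdin/stdout, then exec.
        try:
            os.dup2(to_cmd_r, 0)
            os.dup2(from_cmd_w, 1)
            for fd in (to_cmd_r, to_cmd_w, from_cmd_r, from_cmd_w):
                os.close(fd)
            os.execvp(command, [command] + list(args))
        finally:
            os._exit(127)  # reached only if the setup or the exec failed

    writer_pid = os.fork()
    if writer_pid == 0:
        # Writer process: its only job is to push the data, then exit.
        status = 1
        try:
            for fd in (to_cmd_r, from_cmd_r, from_cmd_w):
                os.close(fd)
            view = memoryview(data)
            while view:
                view = view[os.write(to_cmd_w, view):]
            status = 0
        finally:
            os._exit(status)

    # Caller: keep only the read end, so EOF arrives once the command exits.
    for fd in (to_cmd_r, to_cmd_w, from_cmd_w):
        os.close(fd)
    with os.fdopen(from_cmd_r, "rb") as fp:
        result = fp.read()
    os.waitpid(writer_pid, 0)
    os.waitpid(cmd_pid, 0)
    return result
```

Something like rwpopen_sketch(b"Some long string here...", "sed", ["s/long/short/"]) mirrors the usage shown in the post. The writer may block on a full pipe for as long as it likes, because the caller is reading the command's output the whole time.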


tre17 at student

Oct 16, 1999, 7:08 PM

Post #2 of 12 (794 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Looks good. Could this also be done using threads?

Possibility:

create two threads, one for reading and the other for writing. Then
you can pass strings to the writing thread and get back the buffered
result from the reading thread. This allows interactive communication
without the danger of deadlocks.
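
A minimal sketch of the thread-based variant for the simple filter case, assuming Python 3 and the subprocess and threading modules (neither the helper name filter_with_thread nor the subprocess module comes from this thread). One thread feeds the command while the calling thread reads:

```python
import subprocess
import threading

def filter_with_thread(data, argv):
    """Send bytes to a command from a helper thread and return its output."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE)

    def feed():
        # Runs in the writer thread; it may block on a full pipe
        # without stopping the reader below.
        try:
            proc.stdin.write(data)
        except BrokenPipeError:
            pass  # the command exited without reading everything
        finally:
            try:
                proc.stdin.close()
            except BrokenPipeError:
                pass

    writer = threading.Thread(target=feed)
    writer.start()
    output = proc.stdout.read()  # the calling thread drains the output
    writer.join()
    proc.stdout.close()
    proc.wait()
    return output
```

This covers only the one-shot case where all input is known up front; it does not attempt the interactive exchange described above.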

Has this been done before or should I give it a go?

--
Tim Evans


phd at sun

Oct 17, 1999, 5:14 AM

Post #3 of 12 (796 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Hi!

On 17 Oct 1999, Hrvoje Niksic wrote:
> def rwpopen(input, command, args=[]):
                              ^^^^^^^
There was a well-known problem with passing a mutable object as a default.
Not sure if it was fixed in recent versions of Python...

Oleg.
----
Oleg Broytmann National Research Surgery Centre http://sun.med.ru/~phd/
Programmers don't die, they just GOSUB without RETURN.


fredrik at pythonware

Oct 17, 1999, 5:33 AM

Post #4 of 12 (800 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Oleg Broytmann <phd [at] sun> wrote:
> On 17 Oct 1999, Hrvoje Niksic wrote:
> > def rwpopen(input, command, args=[]):
>                               ^^^^^^^
> There was well-know problem with passing mutable object as a default.
> Not sure if it was fixed in recent versions of Python...

well, that's only a problem if you modify the object
inside the function...

(and no, it hasn't been fixed. I doubt it can be
fixed without breaking stuff).
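
A short illustration of the behaviour under discussion, as a sketch with hypothetical function names. The default list is created once, when the def statement runs, so a function that modifies it sees the changes on later calls:

```python
def append_shared(item, bucket=[]):
    # The same list object is reused on every call that omits bucket.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # Common workaround: build the list inside the function.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_shared(1)
second = append_shared(2)
print(first is second, second)             # True [1, 2]
print(append_fresh(1), append_fresh(2))    # [1] [2]
```

A function that only reads its default, as rwpopen does with args, is not affected.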

...

but to make the code a bit more flexible, I'd change
the execvp call to:

os.execvp(command, (command,) + tuple(args))

or, if you prefer:

os.execvp(command, [command] + list(args))

(this allows the caller to use *any* kind of sequence,
not just a list).
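
To see the difference, here is a small sketch (build_argv is a hypothetical helper that only builds the argument vector and does not exec anything):

```python
def build_argv(command, args=()):
    # list(args) accepts any sequence: list, tuple, or other iterable.
    return [command] + list(args)

print(build_argv("sed", ["s/long/short/"]))    # list argument
print(build_argv("sed", ("s/long/short/",)))   # tuple argument

try:
    ["sed"] + ("s/long/short/",)               # the original expression
except TypeError as exc:
    print("list + tuple fails:", exc)
```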

</F>

<!-- coming monday:
http://www.pythonware.com/people/fredrik/librarybook.htm
(the eff-bot guide to) the standard python library. -->


phd at sun

Oct 17, 1999, 5:36 AM

Post #5 of 12 (797 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

On Sun, 17 Oct 1999, Fredrik Lundh wrote:
> Oleg Broytmann <phd [at] sun> wrote:
> > On 17 Oct 1999, Hrvoje Niksic wrote:
> > > def rwpopen(input, command, args=[]):
> >                               ^^^^^^^
> > There was well-know problem with passing mutable object as a default.
> > Not sure if it was fixed in recent versions of Python...
>
> well, that's only a problem if you modify the object
> inside the function...

Sooner or later you forget about it and modify args, and then what? :) No,
I'd better avoid this completely until Python fixes it.

> (and no, it hasn't been fixed. I doubt it can be
> fixed without breaking stuff).

Mmm??? Is there a line of code that *relies* on that misfeature?

Oleg.
----
Oleg Broytmann National Research Surgery Centre http://sun.med.ru/~phd/
Programmers don't die, they just GOSUB without RETURN.


fredrik at pythonware

Oct 17, 1999, 5:53 AM

Post #6 of 12 (795 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

> > (and no, it hasn't been fixed. I doubt it can be
> > fixed without breaking stuff).
>
> Mmm??? Are there a line of code that *relies* on that misfeature?

yes, there are tons of code that relies on the
fact that the default values are evaluated once,
and more importantly, that they are evaluated
in the namespace where the function/lambda
is defined.

in fact, it's currently the only reasonable way
to pass local variables into a nested namespace
(like when using lambdas). it's also often used
to speed things up, by binding commonly used
globals to local names.
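
Both uses can be sketched as follows, run on a current Python 3 (the names are illustrative only). Today's Python has nested scopes, so the first lambda does find i, but it looks it up late, when called; the default-argument form still captures the value at definition time:

```python
import math

# Without a default, every lambda looks up i when it is called,
# which is after the loop has finished.
late = [lambda x: x + i for i in range(3)]

# A default is evaluated once, at definition time, in the defining
# namespace, so each lambda keeps its own value of i.
bound = [lambda x, i=i: x + i for i in range(3)]

def hypot_fast(a, b, sqrt=math.sqrt):
    # sqrt is a local name here, bound once when the def statement ran.
    return sqrt(a * a + b * b)

late_results = [f(10) for f in late]
bound_results = [f(10) for f in bound]
print(late_results)    # [12, 12, 12]
print(bound_results)   # [10, 11, 12]
```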

...

but sure, I'm sure Guido is open for proposals. I
don't think you can get away with "always evaluate
them on each call," though...

</F>

<!-- coming monday:
http://www.pythonware.com/people/fredrik/librarybook.htm
(the eff-bot guide to) the standard python library. -->


phd at sun

Oct 17, 1999, 5:59 AM

Post #7 of 12 (790 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Hi!

I marked it with the word "misfeature", but of course I meant only the
problem with mutable types. Sure, I use
lambda x, y=z: ...
often.
(But that's another problem. After 10 years with Pascal, I was used to
local functions that have access to the outer function's variables. Learning
to use lambdas was a bit painful for me. And I still hope Python will have
local functions sometime... maybe in 2.0+)

On Sun, 17 Oct 1999, Fredrik Lundh wrote:
> > Mmm??? Are there a line of code that *relies* on that misfeature?
>
> yes, there are tons of code that relies on the
> fact that the default values are evaluated once,
> and more importantly, that they are evaluated
> in the namespace where the function/lambda
> is defined.
>
> in fact, it's currently the only reasonable way
> to pass local variables into a nested namespace
> (like when using lambdas). it's also often used
> to speed things up, by binding commonly used
> globals to local names.

> but sure, I'm sure Guido is open for proposals. I
> don't think you can get away with "always evaluate
> them on each call," though...

Oleg.
----
Oleg Broytmann National Research Surgery Centre http://sun.med.ru/~phd/
Programmers don't die, they just GOSUB without RETURN.


hniksic at srce

Oct 17, 1999, 10:14 AM

Post #8 of 12 (794 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

"Fredrik Lundh" <fredrik [at] pythonware> writes:

> but to make the code a bit more flexible, I'd change
> the execvp call to:
>
> os.execvp(command, (command,) + tuple(args))
[...]
> (this allows the caller to use *any* kind of sequence, not just a
> list).

Thanks for the suggestion; I've now made that change.


hniksic at srce

Oct 17, 1999, 10:17 AM

Post #9 of 12 (805 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Oleg Broytmann <phd [at] sun> writes:

> On Sun, 17 Oct 1999, Fredrik Lundh wrote:
> > Oleg Broytmann <phd [at] sun> wrote:
> > > On 17 Oct 1999, Hrvoje Niksic wrote:
> > > > def rwpopen(input, command, args=[]):
> > >                               ^^^^^^^
> > > There was well-know problem with passing mutable object as a default.
> > > Not sure if it was fixed in recent versions of Python...
> >
> > well, that's only a problem if you modify the object
> > inside the function...
>
> Sooner or later you forget about it and modify args and what?

No, I don't. It's not generally nice to make destructive
modifications on sequences a function passes as an argument, so I
don't do that with ARGS, regardless of the default value.

> :) No, I better will avoid this completely until Python fixes it.

Your choice, not mine.

> > (and no, it hasn't been fixed. I doubt it can be fixed without
> > breaking stuff).
>
> Mmm??? Are there a line of code that *relies* on that misfeature?

But of course. A misfeature to you is a feature to someone else.


hniksic at srce

Oct 17, 1999, 10:20 AM

Post #10 of 12 (795 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

"Tim Evans" <tre17 [at] student> writes:

> Looks good. Could this also be done using threads?

Probably. But this wouldn't work on machines without threading
support, and the task is just too basic to *require* threads.

I don't buy threads as a buzzword -- I believe my particular problem
is solved much more naturally using a helper process. The additional
process touches very little data and modifies none, so COW should make
it inexpensive. My timings show that this is indeed the case.

> create two threads, one for reading and the other for writing. Then
> you can pass strings to the writing thread and get back the buffered
> result from the reading thread. This allows interactive
> communication without the danger of deadlocks.

I think deadlocks can occur as long as the writing and reading
threads/processes depend on each other.


donn at u

Oct 18, 1999, 12:00 PM

Post #11 of 12 (800 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Quoth Hrvoje Niksic <hniksic [at] srce>:
| In all kinds of circumstances it would be very useful to call an
| external filter to process some data, and read the results back in.
| What I needed was something like popen(), only working for both
| reading and writing. However, such a thing is hard to write in a
| simple-minded fashion because of deadlocks that occur when handling
| more than several bytes of data. Deadlocks can either be caused by
| both programs waiting for not-yet-generated input, or (in my case) by
| both their writes being blocked waiting for the other to read.
|
| The usual choices are to:
|
| a) Write a deadlock-free communication protocol and use it on both
| ends. This is rarely a good solution, because the program that
| needs to be invoked is in most cases an external filter that knows
| nothing about our deadlock problems.
|
| b) Use PTY's instead of pipes. Many programmers prefer to avoid this
| path because of the added system resources that the PTY's require,
| and because of the increased complexity.
|
| Given these choices, most people opt to use a temporary file and get
| it over with.
|
| However, discussing this problem with a colleague, he thought of a
| third solution: break the circularity by using a process only for
| reading and writing. This can be done whenever reading and writing
| are independent, i.e. when the data read from the subprocess does not
| influence future writes to it.
|
| The function below implements that idea. Usage is something like:
|
| rwpopen("""Some long string here...""", "sed", ["s/long/short/"])
| -> 'Some short string here...'
|
| I've put the function to good use in a program I'm writing. In
| addition to getting rid of temporary files, the whole operation timed
| faster than using a tmpfile (that was on a single-CPU machine). The
| function will, of course, work only under Unix and its lookalikes.
| Additional information is embedded in the docstring and the comments.
|
| I'd like to hear feedback. Do other people find such a thing useful?
| Is there a fundamental flaw or a possibility of a deadlock that I'm
| missing?

Interesting idea. I was inspired to try a slightly different
approach, which I will append here.

It's definitely a solution, possibly the only general one, for
deadlocks caused by the pipe buffer size. That's an interesting
problem, but I think a relatively unusual one. In order to get
here, your processes need to be ignoring their input so it stalls
in the pipe ... for example, the parent might wait() for the child
and then read its output, while the child is stuck trying to
finish writing its large output. But I am having a hard time
thinking of an example where it isn't easily avoided. I'm also
surprised that the intermediate process would be more economical
than a temporary file, so I wonder if the resources were all
accounted for. Temporary files do have the liability that their
filesystem may run out of space, but then it seems like a much
safer way to buffer large transfers.

By far the most common intractable deadlock problem is internal
buffering in a command that uses C I/O and hasn't flushed its
own buffer. This is where the pty device comes in, and to my
knowledge it's the only general cure. It works because C I/O
switches to line buffering with a tty device. But again this
problem can be easily avoided in a situation where all the input
for the command can be written before you wait for its output -
just close the pipe after you're finished writing to it! The
problem really arises when you're trying to conduct an exchange
that really needs to alternate reads and writes, like, try to
write a line to "awk", read awk's output, and then write another
line to the same awk process. To do this, you need a pty device.
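
The tty-versus-pipe distinction that drives this buffering decision can be observed from the child's side. A minimal sketch, assuming Python 3 on a Unix system with pty support (the helper names are hypothetical, and a Python child stands in for the C program):

```python
import os
import pty
import subprocess
import sys

# The child reports whether its stdout looks like a terminal.
CHECK = "import sys; print(sys.stdout.isatty())"

def stdout_is_tty_via_pipe():
    done = subprocess.run([sys.executable, "-c", CHECK],
                          stdout=subprocess.PIPE)
    return done.stdout.strip() == b"True"

def stdout_is_tty_via_pty():
    master_fd, slave_fd = pty.openpty()
    try:
        # The child writes to the slave side; we read from the master.
        subprocess.run([sys.executable, "-c", CHECK], stdout=slave_fd)
        return os.read(master_fd, 1024).strip() == b"True"
    finally:
        os.close(slave_fd)
        os.close(master_fd)

print("pipe:", stdout_is_tty_via_pipe())
print("pty: ", stdout_is_tty_via_pty())
```

A program using C stdio makes the same check on its output stream when choosing between full buffering and line buffering.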

Anyway, here's my attempt at the 3rd process solution. I made
both processes children of the calling process, the 3rd process
copies I/O both ways, and the caller can issue reads and writes
to the command at its convenience. It's a subclass of a normal
1-stage read/write command. The 3rd process avoids blocking on
reads or writes with the select system call, which is specific
to UNIX.

# Donn Cave, University Computing Services, University of Washington
# donn [at] u
#----------------------------
import os
import select
import sys
import traceback

# External command, with plain read and write dual pipe.
#
# ex.  cmd = RWPipe('/bin/sh', ('sh', '-c', 'nslookup'))
#      os.write(cmd.input, 'set q=any\n')
#      os.write(cmd.input, 'nosuchhost\n')
#      os.close(cmd.input)
#      while 1:
#          x = os.read(cmd.output, 8192)
#          if not x:
#              break
#          print 'output:', x
#      status = cmd.wait()
#
# I/O is unbuffered UNIX read/write, caller may make file objects.
#
class RWPipe:
    def __init__(self, command, argv, environ = None):
        self.command = command
        self.argv = argv
        if environ is None:
            self.environ = os.environ
        else:
            self.environ = environ
        self.start()
    def pipexec(self, pipes):
        for unit, use in pipes:
            os.dup2(use, unit)
        os.execve(self.command, self.argv, self.environ)
    def setpipes(self, rp, wp, xp):
        # Close unused pipe ends.
        for p in rp:
            # Using read end here.
            os.close(p[1])
        for p in wp:
            # Using write end here.
            os.close(p[0])
        for p in xp:
            # Not using this pipe here.
            os.close(p[0])
            os.close(p[1])
    def start(self):
        tocmd = os.pipe()
        frcmd = os.pipe()
        pid = os.fork()
        if not pid:
            try:
                self.setpipes([tocmd], [frcmd], [])
                self.pipexec([(0, tocmd[0]), (1, frcmd[1])])
            finally:
                traceback.print_exc()
                os._exit(127)
        self.pid = pid
        self.setpipes([frcmd], [tocmd], [])
        self.input = tocmd[1]
        self.output = frcmd[0]
    def wait(self):
        p, s = os.waitpid(self.pid, 0)
        return (s >> 8) & 0x7f

# Industrial strength external command, with an intermediate process
# that copies I/O, buffering as necessary to avoid deadlock due to
# system pipe buffer size limit.
#
class BigRWPipe(RWPipe):
    def buffer(self, xferunits):
        # Transfer I/O between pipes: self.buffer([(from, to), ...])
        #
        xfers = []
        for r, w in xferunits:
            xfers.append((r, w, ''))
        while xfers:
            wsel = []
            rsel = []
            esel = []
            nxf = []
            for r, w, buf in xfers:
                # Compile select masks for active units.
                if w >= 0:
                    if buf:
                        # Only check for write if any
                        # data buffered to write.
                        wsel.append(w)
                elif r >= 0:
                    # If dest invalid, close source.
                    # Will cause SIGPIPE in source proc.
                    os.close(r)
                    r = -1
                if r >= 0:
                    rsel.append(r)
                    esel.append(r)
                elif w >= 0 and not buf:
                    # If source invalid and no data,
                    # close dest. Will usually cause
                    # dest to finish normally.
                    os.close(w)
                    w = -1
                if w >= 0:
                    esel.append(w)
                if w >= 0 or r >= 0:
                    nxf.append((r, w, buf))
            xfers = nxf
            if not xfers:
                break

            rdset, wdset, edset = select.select(rsel, wsel, esel)

            nxf = []
            for r, w, buf in xfers:
                if r in rdset:
                    b = os.read(r, 8192)
                    if b:
                        buf = buf + b
                    else:
                        os.close(r)
                        r = -1
                if r in edset:
                    r = -1
                if w in wdset:
                    n = os.write(w, buf)
                    buf = buf[n:]
                if w in edset:
                    w = -1
                if r >= 0 or w >= 0:
                    nxf.append((r, w, buf))
            xfers = nxf
    def start(self):
        frcmd = os.pipe()
        tocmd = os.pipe()
        frmed = os.pipe()
        tomed = os.pipe()

        pid = os.fork()
        if not pid:
            # Set up the buffer process.
            try:
                self.setpipes([frcmd, tomed], [tocmd, frmed], [])
                self.buffer([(frcmd[0], frmed[1]),
                             (tomed[0], tocmd[1])])
            except:
                traceback.print_exc()
                sys.exit(1)
            sys.exit(0)
        self.med = pid

        pid = os.fork()
        if not pid:
            try:
                self.setpipes([tocmd], [frcmd], [tomed, frmed])
                self.pipexec([(0, tocmd[0]), (1, frcmd[1])])
            finally:
                traceback.print_exc()
                os._exit(127)
        self.pid = pid
        self.setpipes([frmed], [tomed], [frcmd, tocmd])
        self.output = frmed[0]
        self.input = tomed[1]
    def wait(self):
        p, s = os.waitpid(self.med, 0)
        p, s = os.waitpid(self.pid, 0)
        return (s >> 8) & 0x7f
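
When all of the input is known up front, the same select-style multiplexing can also run in the calling process itself, with no intermediate process. The following is a sketch of that different arrangement, not a port of BigRWPipe; it assumes Python 3 on Unix and the subprocess and selectors modules, and filter_with_select is a hypothetical name:

```python
import os
import selectors
import subprocess

def filter_with_select(data, argv):
    """Write bytes to a command and read its output in one loop."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE, bufsize=0)
    in_fd, out_fd = proc.stdin.fileno(), proc.stdout.fileno()
    os.set_blocking(in_fd, False)   # a full pipe must not stall the loop
    sel = selectors.DefaultSelector()
    sel.register(out_fd, selectors.EVENT_READ)
    pending = memoryview(data)
    if pending:
        sel.register(in_fd, selectors.EVENT_WRITE)
    else:
        proc.stdin.close()
    chunks = []
    while sel.get_map():            # loop until both ends are finished
        for key, _events in sel.select():
            if key.fd == out_fd:
                chunk = os.read(out_fd, 65536)
                if chunk:
                    chunks.append(chunk)
                else:               # EOF: the command closed its stdout
                    sel.unregister(out_fd)
                    proc.stdout.close()
            else:
                try:
                    sent = os.write(in_fd, pending)
                except BlockingIOError:
                    sent = 0        # pipe filled up again; retry later
                except BrokenPipeError:
                    sent = len(pending)  # command stopped reading; give up
                pending = pending[sent:]
                if not pending:
                    sel.unregister(in_fd)
                    proc.stdin.close()
    sel.close()
    proc.wait()
    return b"".join(chunks)
```

Unlike the class above, this does not let the caller issue reads and writes at its convenience; it only handles the write-everything, read-everything case.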


dworkin at ccs

Oct 27, 1999, 12:30 PM

Post #12 of 12 (804 views)
Permalink
Bidirectional communication through pipes: read/write popen() [In reply to]

Hrvoje Niksic <hniksic [at] srce> writes:

> In all kinds of circumstances it would be very useful to call an
> external filter to process some data, and read the results back in.
> What I needed was something like popen(), only working for both
> reading and writing.

In what way was the popen2 standard module insufficient?

I use popen2.popen2() and popen2.popen3() fairly frequently, and am
trying to see what your code would buy you that you can't do with
those functions.

-Justin
