abo at minkirri
Jan 18, 2005, 9:16 PM
Strange segfault in Python threads and linux kernel 2.6
I've Cc'ed this to zope-coders as it might affect other Zope developers
and it had me stumped for ages. I couldn't find anything on it anywhere,
so I figured it would be good to get something into google :-).
We are developing a Zope2.7 application on Debian GNU/Linux that is
using fop to generate pdf's from xml-fo data. fop is a java thing, and
we are using popen2.Popen3(), non-blocking mode, and select loop to
write/read stdin/stdout/stderr. This was all working fine.
Then over the Christmas chaos, various things on my development system
were apt-get updated, and I noticed that java/fop had started
segfaulting. I tried running fop with the exact same input data from the
command line; it worked. I wrote a python script that invoked fop in
exactly the same way as we were invoking it inside zope; it worked. It
only segfaulted when invoked inside Zope.
I googled and tried everything... switched from j2re1.4 to kaffe, rolled
back to a previous version of python, re-built Zope, upgraded Zope from
2.7.2 to 2.7.4, nothing helped. Then I went back from a linux 2.6.8
kernel to a 2.4.27 kernel; it worked!
After googling around, I found references to recent attempts to resolve
some signal handling problems in Python threads. There was one post that
mentioned subtle differences between how Linux 2.4 and Linux 2.6 did
signals to threads.
So it seems this is a problem with Python threads and Linux kernel 2.6.
The attached program demonstrates that it has nothing to do with Zope.
Using it to run "fop-test /usr/bin/fop </dev/null" on a Debian box with
fop installed will show the segfault. Running the same thing on a
machine with 2.4 kernel will instead get the fop "usage" message. It is
not a generic fop/java problem with 2.6 because the commented
un-threaded line works fine. It doesn't seem to segfault for any
command... "cat -" works OK, so it must be something about java
After searching the Python bugs, the closest I could find was #971213
<http://sourceforge.net/tracker/?group_id=5470&atid=105470&func=detail&aid=971213>. Is this the same bug? Should I submit a new bug report? Is there any other way I can help resolve this?
BTW, built in file objects really could use better non-blocking
support... I've got a half-drafted PEP for it... anyone interested in
Donovan Baarda <abo [at] minkirri>