
Mailing List Archive: Python: Dev

Green buildbot failure.

 

 



tjreedy at udel

Aug 10, 2013, 6:40 PM

Post #1 of 7 (25 views)
Green buildbot failure.

This run recorded here shows a green test (it appears to have timed out)
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017
but the corresponding log for this Windows bot
http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017/steps/test/logs/stdio
has the expected os.chown failure.

Are such green failures intended?

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
Python-Dev [at] python
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/list-python-dev%40lists.gossamer-threads.com


solipsis at pitrou

Aug 11, 2013, 3:00 AM

Post #2 of 7 (22 views)
Re: Green buildbot failure. [In reply to]

On Sat, 10 Aug 2013 21:40:46 -0400
Terry Reedy <tjreedy [at] udel> wrote:
>
> This run recorded here shows a green test (it appears to have timed out)
> http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017
> but the corresponding log for this Windows bot
> http://buildbot.python.org/all/builders/x86%20Windows7%203.x/builds/7017/steps/test/logs/stdio
> has the expected os.chown failure.

You've got the answer at the bottom:

"program finished with exit code 0"

So for some reason, the test suite crashed, but with a successful exit
code. Buildbot thinks it ran fine.
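A minimal sketch of one way a harness could avoid trusting the exit code alone: require both a zero exit code and a completion marker in the output. The marker string and the function name are hypothetical, chosen for illustration; nothing here is what buildbot or regrtest actually does.

```python
import subprocess
import sys

# Hypothetical sentinel that the suite would print as its very last line.
DONE_MARKER = "ALL-TESTS-DONE"


def run_checked(cmd, marker=DONE_MARKER):
    """Return True only if cmd exits with 0 AND printed the marker."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    finished = marker in result.stdout
    return result.returncode == 0 and finished


if __name__ == "__main__":
    complete = [sys.executable, "-c", "print('ALL-TESTS-DONE')"]
    truncated = [sys.executable, "-c", "print('partial output')"]
    print(run_checked(complete))   # exit code 0 and marker present
    print(run_checked(truncated))  # exit code 0 but marker missing
```

With a check like this, a run that is killed partway through but still reports exit code 0 would not be counted as a pass.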

> Are such green failures intended?

Not really, no.

Regards

Antoine.




shibturn at gmail

Aug 11, 2013, 5:27 AM

Post #3 of 7 (20 views)
Re: Green buildbot failure. [In reply to]

On 11/08/2013 11:00am, Antoine Pitrou wrote:
> You've got the answer at the bottom:
>
> "program finished with exit code 0"
>
> So for some reason, the test suite crashed, but with a successful exit
> code. Buildbot thinks it ran fine.

Was the test terminated because it took too long?

TerminateProcess(handle, exitcode) sometimes makes the program exit with
return code 0 instead of exitcode. At any rate, test_multiprocessing
contains this disabled test:

# XXX sometimes get p.exitcode == 0 on Windows ...
#self.assertEqual(p.exitcode, -signal.SIGTERM)
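For anyone who wants to see what a terminated child reports, here is a small sketch using subprocess rather than multiprocessing. On POSIX the return code is the negated signal number; on Windows, Popen.terminate() calls TerminateProcess with exit code 1, so 1 is the value one would normally expect there, and the point above is that 0 has occasionally been observed instead. The function name is made up for the example.

```python
import signal
import subprocess
import sys


def terminate_and_report():
    """Start a sleeping child, terminate it, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(30)"])
    # SIGTERM on POSIX; TerminateProcess(handle, 1) on Windows.
    child.terminate()
    return child.wait(timeout=10)


if __name__ == "__main__":
    code = terminate_and_report()
    if sys.platform == "win32":
        print("exit code after TerminateProcess:", code)
    else:
        print("killed by SIGTERM:", code == -signal.SIGTERM)
```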

--
Richard



shibturn at gmail

Aug 11, 2013, 5:41 AM

Post #4 of 7 (20 views)
Re: Green buildbot failure. [In reply to]

http://stackoverflow.com/questions/2061735/42-passed-to-terminateprocess-sometimes-getexitcodeprocess-returns-0

--
Richard



db3l.net at gmail

Aug 11, 2013, 2:10 PM

Post #5 of 7 (19 views)
Re: Green buildbot failure. [In reply to]

Richard Oudkerk <shibturn [at] gmail> writes:

> On 11/08/2013 11:00am, Antoine Pitrou wrote:
>> You've got the answer at the bottom:
>>
>> "program finished with exit code 0"
>>
>> So for some reason, the test suite crashed, but with a successful exit
>> code. Buildbot thinks it ran fine.
>
> Was the test terminated because it took too long?

Yes, it looks like it.

This test (and one on the XP-4 buildbot in the same time frame) was
terminated by an external watchdog script that kills python_d
processes that have been running for more than 2 hours. I put the
script in place (quite a while back) as a workaround for failures that
would strand a python process, blocking future tests due to files
remaining in use. It's a last-ditch, crude sledgehammer.

Historically, if this code ran, the buildbot had already itself timed
out, so the exit code (which I can't control) wasn't very important.
Two hours had been conservative (and a trade-off, as longer values also
risk failing more future tests), but it may need to be increased.

In this particular case it was a false alarm - the host was heavily
loaded during this time frame, which I think prolonged the test time
by an unusually large amount.

-- David



victor.stinner at gmail

Aug 11, 2013, 2:49 PM

Post #6 of 7 (18 views)
Re: Green buildbot failure. [In reply to]

2013/8/11 David Bolen <db3l.net [at] gmail>:
>> Was the test terminated because it took too long?
>
> Yes, it looks like it.
>
> This test (and one on the XP-4 buildbot in the same time frame) was
> terminated by an external watchdog script that kills python_d
> processes that have been running for more than 2 hours. I put the
> script in place (quite a while back) as a workaround for failures that
> would strand a python process, blocking future tests due to files
> remaining in use. It's a last ditch, crude, sledge-hammer.

test.regrtest uses faulthandler.dump_traceback_later() to stop the
test after a timeout if the --timeout command-line option is used.

http://docs.python.org/dev/library/faulthandler.html#faulthandler.dump_traceback_later

Do you pass this option?

The timeout is not global but applies to a single function of a test
file, so you can use a shorter timeout. It also has the advantage of
dumping the traceback of all Python threads before exiting. I haven't
tried this feature recently on Windows, but it is supposed to work :-)
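A rough sketch of the mechanism, outside regrtest: the child arms a one-second watchdog with faulthandler.dump_traceback_later(..., exit=True) and then hangs. The watchdog dumps the tracebacks to stderr and ends the process with status 1. The CHILD script and the function name are invented for the example; the sleep stands in for a stuck test.

```python
import subprocess
import sys

# Child program: arm a one-second watchdog, then hang.
CHILD = """
import faulthandler, time
faulthandler.dump_traceback_later(1, exit=True)
time.sleep(30)   # stands in for a stuck test
"""


def run_hanging_child():
    """Run the child and return (exit status, captured stderr)."""
    result = subprocess.run([sys.executable, "-c", CHILD],
                            capture_output=True, text=True, timeout=20)
    return result.returncode, result.stderr


if __name__ == "__main__":
    status, dump = run_hanging_child()
    print("exit status:", status)
    print(dump)
```

Note that the exit status is nonzero here, so a harness that looks only at the exit code would still see this kind of timeout as a failure.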

Victor


db3l.net at gmail

Aug 11, 2013, 3:49 PM

Post #7 of 7 (18 views)
Re: Green buildbot failure. [In reply to]

Victor Stinner <victor.stinner [at] gmail> writes:

> test.regrtest uses faulthandler.dump_traceback_later() to stop the
> test after a timeout if --timeout command line option is used.

The slave doesn't actually control the test parameters, which come
from build/Tools/buildbot/test.bat (which runs build/PCBuild/rt.bat)
plus anything sent from the master. But no, it doesn't look like that
flow is currently using --timeout, so the main timeout in place is
that from the buildbot slave processing (currently 3900s and based on
output activity by the process under test).
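To make "based on output activity" concrete, here is a minimal sketch of an idle timeout: the child is killed only when it has printed nothing for a given number of seconds, however long it has been running overall. This is an illustration of the idea, not buildbot's implementation; the function name and return shape are assumptions.

```python
import queue
import subprocess
import sys
import threading


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd; kill it if it prints nothing for idle_timeout seconds.

    Returns (timed_out, lines_seen).
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines_q = queue.Queue()

    def pump():
        # Forward each output line to the main thread; None marks EOF.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    lines, timed_out = [], False
    while True:
        try:
            line = lines_q.get(timeout=idle_timeout)
        except queue.Empty:
            timed_out = True   # no output within the idle window
            proc.kill()
            break
        if line is None:       # child closed its output normally
            break
        lines.append(line)
    proc.wait()
    reader.join(timeout=5)
    return timed_out, lines


if __name__ == "__main__":
    hang = [sys.executable, "-c",
            "import time; print('start', flush=True); time.sleep(30)"]
    print(run_with_idle_timeout(hang, idle_timeout=2)[0])
```

A test that keeps printing progress never trips this kind of timeout, which is why a separate limit on total run time can still be useful.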

Windows buildbots also have an additional "kill" path where the build
scripts build and execute a separate kill_python_d executable (in
PCBuild) to kill off any python_d process. It does have some
sequencing issues (it runs during the build stage rather than clean)
but no matter where it is used, being part of the build sequence risks
it being skipped if the master/slave connection breaks mid-test.

For some additional background, see email threads:

http://mail.python.org/pipermail/python-dev/2010-November/105585.html
http://mail.python.org/pipermail/python-dev/2010-December/106510.html
http://mail.python.org/pipermail/python-dev/2011-January/107776.html


Anyway, the termination in this particular case is completely separate
from buildbot processing. It's a small script that runs constantly in
the background, combining pslist/pskill from Sysinternals (as pskill
proved always able to kill the processes) and looking for old python_d
processes.
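The selection step of such a watchdog can be sketched as below. This is not the actual script: the ProcInfo record, the "python_d.exe" image name, and the function name are assumptions for illustration, and the process listing and the kill itself (pslist and pskill in the real setup) are deliberately left out.

```python
from dataclasses import dataclass

MAX_AGE_SECONDS = 2 * 60 * 60  # the two-hour limit mentioned in this thread


@dataclass(frozen=True)
class ProcInfo:
    """Hypothetical record for one running process."""
    pid: int
    name: str
    started: float  # start time, in seconds since the epoch


def stale_pids(procs, now, target="python_d.exe", max_age=MAX_AGE_SECONDS):
    """Return the pids of target processes older than max_age seconds."""
    return [p.pid for p in procs
            if p.name.lower() == target and now - p.started > max_age]


if __name__ == "__main__":
    now = 100_000.0
    procs = [
        ProcInfo(101, "python_d.exe", now - 3 * 3600),  # old: selected
        ProcInfo(102, "python_d.exe", now - 600),       # recent: kept
        ProcInfo(103, "explorer.exe", now - 9 * 3600),  # other program: kept
    ]
    print(stale_pids(procs, now))
```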

My Windows buildbots have three additional layers of termination
handling (beyond the standard buildbot timeout and kill_python in the
test itself):

1. Modification to buildbot slave code to prevent Windows process and
   file dialogs.
2. AutoIt script in the background to acknowledge C RTL dialogs that
   the prior step doesn't block. (There have been past discussions
   about having Python itself disable RTL dialogs in test builds.)
3. The external watchdog script as a fail-safe.

The first two cases will definitely be recognized as test failures, since
while the dialogs are suppressed/acknowledged, the triggering code will
receive a failure result.
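For reference, one common way to suppress the Windows process and file error dialogs is the Win32 SetErrorMode call, whose setting is inherited by child processes. The sketch below is an assumption about how such a modification could look, not the actual change made to the slave code; the function name is made up, and it does nothing on other platforms.

```python
import ctypes
import sys

# Win32 error-mode flags (values from the Windows SDK headers).
SEM_FAILCRITICALERRORS = 0x0001
SEM_NOGPFAULTERRORBOX = 0x0002
SEM_NOOPENFILEERRORBOX = 0x8000


def suppress_error_dialogs():
    """Disable error dialogs for this process and its future children.

    Returns the previous error mode on Windows, or None elsewhere.
    """
    if sys.platform != "win32":
        return None
    flags = (SEM_FAILCRITICALERRORS
             | SEM_NOGPFAULTERRORBOX
             | SEM_NOOPENFILEERRORBOX)
    return ctypes.windll.kernel32.SetErrorMode(flags)


if __name__ == "__main__":
    print("previous error mode:", suppress_error_dialogs())
```

With the dialogs suppressed, the failing call returns an error to the code that triggered it instead of blocking on a dialog box.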

The purpose of the watchdog script was to handle cases encountered for
which the normal termination processing (buildbot or python itself)
simply didn't seem to work. The buildbot slave/master thought the
test ended or aborted, so started new tests, but a process remained
stuck in memory from the prior test. The frequency of occurrence
varied over time, but during some periods was a major pain in the neck
adversely affecting buildbot stability.


Not sure if faulthandler's approach to process termination would have
more luck, or if it would even run if, for example, the process was
stuck in the RTL or at the Win32 layer.

I'd certainly be willing to retire the watchdog scripts (as long as I
don't just end up firefighting stuck processes again), but I suspect
the first challenge would be to figure out how to simulate an
appropriately stuck process that would have required the watchdog
script previously, given that it was never really obvious why they
were hung.

-- David

