
Mailing List Archive: Xen: Devel

[xen-unstable test] 11946: regressions - FAIL

 

 



ian.jackson at eu

Feb 13, 2012, 12:16 PM

Post #1 of 24
[xen-unstable test] 11946: regressions - FAIL

flight 11946 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944

Regressions which are regarded as allowable (not blocking):
test-i386-i386-win 14 guest-start.2 fail like 11944

Tests which did not succeed, but are not blocking:
test-amd64-amd64-xl-qemuu-winxpsp3 7 windows-install fail never pass
test-amd64-amd64-xl-qemuu-win7-amd64 7 windows-install fail never pass
test-amd64-i386-qemuu-rhel6hvm-intel 7 redhat-install fail never pass
test-i386-i386-xl-qemuu-winxpsp3 7 windows-install fail never pass
test-amd64-amd64-xl-pcipt-intel 9 guest-start fail never pass
test-amd64-i386-rhel6hvm-amd 11 leak-check/check fail never pass
test-amd64-i386-rhel6hvm-intel 11 leak-check/check fail never pass
test-amd64-i386-qemuu-rhel6hvm-amd 7 redhat-install fail never pass
test-amd64-amd64-win 16 leak-check/check fail never pass
test-amd64-i386-win-vcpus1 16 leak-check/check fail never pass
test-amd64-amd64-xl-winxpsp3 13 guest-stop fail never pass
test-i386-i386-xl-winxpsp3 13 guest-stop fail never pass
test-amd64-i386-win 16 leak-check/check fail never pass
test-amd64-amd64-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-winxpsp3-vcpus1 13 guest-stop fail never pass
test-amd64-amd64-xl-win 13 guest-stop fail never pass
test-amd64-i386-xend-winxpsp3 16 leak-check/check fail never pass
test-i386-i386-xl-win 13 guest-stop fail never pass
test-amd64-i386-xl-win7-amd64 13 guest-stop fail never pass
test-amd64-i386-xl-win-vcpus1 13 guest-stop fail never pass

version targeted for testing:
xen 9207cc3a0862
baseline version:
xen 9ad1e42c341b

------------------------------------------------------------
People who touched revisions under test:
David Vrabel <david.vrabel [at] citrix>
Ian Campbell <ian.campbell [at] citrix>
Ian Jackson <ian.jackson [at] eu>
Jan Beulich <jbeulich [at] suse>
Julian Pidancet <julian.pidancet [at] gmail>
Keir Fraser <keir [at] xen>
Stefano Stabellini <stefano.stabellini [at] eu>
Tim Deegan <tim [at] xen>
Yongjie Ren <yongjie.ren [at] intel>
------------------------------------------------------------

jobs:
build-amd64 pass
build-i386 pass
build-amd64-oldkern pass
build-i386-oldkern pass
build-amd64-pvops pass
build-i386-pvops pass
test-amd64-amd64-xl pass
test-amd64-i386-xl pass
test-i386-i386-xl pass
test-amd64-i386-rhel6hvm-amd fail
test-amd64-i386-qemuu-rhel6hvm-amd fail
test-amd64-amd64-xl-qemuu-win7-amd64 fail
test-amd64-amd64-xl-win7-amd64 fail
test-amd64-i386-xl-win7-amd64 fail
test-amd64-i386-xl-credit2 fail
test-amd64-amd64-xl-pcipt-intel fail
test-amd64-i386-rhel6hvm-intel fail
test-amd64-i386-qemuu-rhel6hvm-intel fail
test-amd64-i386-xl-multivcpu pass
test-amd64-amd64-pair pass
test-amd64-i386-pair pass
test-i386-i386-pair pass
test-amd64-amd64-xl-sedf-pin pass
test-amd64-amd64-pv pass
test-amd64-i386-pv pass
test-i386-i386-pv pass
test-amd64-amd64-xl-sedf pass
test-amd64-i386-win-vcpus1 fail
test-amd64-i386-xl-win-vcpus1 fail
test-amd64-i386-xl-winxpsp3-vcpus1 fail
test-amd64-amd64-win fail
test-amd64-i386-win fail
test-i386-i386-win fail
test-amd64-amd64-xl-win fail
test-i386-i386-xl-win fail
test-amd64-amd64-xl-qemuu-winxpsp3 fail
test-i386-i386-xl-qemuu-winxpsp3 fail
test-amd64-i386-xend-winxpsp3 fail
test-amd64-amd64-xl-winxpsp3 fail
test-i386-i386-xl-winxpsp3 fail


------------------------------------------------------------
sg-report-flight on woking.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

------------------------------------------------------------
changeset: 24790:9207cc3a0862
tag: tip
user: David Vrabel <david.vrabel [at] citrix>
date: Mon Feb 13 13:34:47 2012 +0000

libfdt: add to build

Signed-off-by: David Vrabel <david.vrabel [at] citrix>
Acked-by: Tim Deegan <tim [at] xen>
Committed-by: Keir Fraser <keir [at] xen>


changeset: 24789:e060d1bd7b60
user: David Vrabel <david.vrabel [at] citrix>
date: Mon Feb 13 13:34:08 2012 +0000

libfdt: fixup libfdt_env.h for xen

Signed-off-by: David Vrabel <david.vrabel [at] citrix>
Acked-by: Tim Deegan <tim [at] xen>
Committed-by: Keir Fraser <keir [at] xen>


changeset: 24788:fcc188f21e47
user: David Vrabel <david.vrabel [at] citrix>
date: Mon Feb 13 13:33:26 2012 +0000

libfdt: add version 1.3.0

Add libfdt 1.3.0 from http://git.jdl.com/gitweb/?p=dtc.git

This will be used by Xen to parse the DTBs provided by bootloaders on
ARM platforms.

Signed-off-by: David Vrabel <david.vrabel [at] citrix>
Acked-by: Tim Deegan <tim [at] xen>
Committed-by: Keir Fraser <keir [at] xen>


changeset: 24787:bd0a11ed1a67
user: Ian Campbell <ian.campbell [at] citrix>
date: Mon Feb 13 12:53:28 2012 +0000

MAINTAINERS: Add entry for ARM w/ virt extensions port

Signed-off-by: Ian Campbell <ian.campbell [at] citrix>
Committed-by: Keir Fraser <keir [at] xen>


changeset: 24786:79fe73117c12
user: Julian Pidancet <julian.pidancet [at] gmail>
date: Mon Feb 13 12:50:46 2012 +0000

firmware: Introduce CONFIG_ROMBIOS and CONFIG_SEABIOS options

This patch introduces configuration options that allow building either a
rombios-only or a seabios-only hvmloader.

Building option ROMs like vgabios or etherboot is only enabled for a
rombios hvmloader, since SeaBIOS takes care of extracting option ROMs
itself from the PCI devices (these option ROMs are provided by the
device model and do not need to be built in hvmloader).

The Makefile in tools/firmware/ now only checks for bcc if rombios is
enabled.

These two configuration options are left on by default to remain
compatible.

Signed-off-by: Julian Pidancet <julian.pidancet [at] gmail>
Acked-by: Ian Campbell <ian.campbell [at] citrix>


changeset: 24785:e4d8d2524407
user: Julian Pidancet <julian.pidancet [at] gmail>
date: Mon Feb 13 12:50:04 2012 +0000

hvmloader: Move option ROM loading into a separate optionnal file

Make the load_rom field in struct bios_config an optional callback rather
than a boolean value. This allows BIOS-specific code to implement its
own option ROM loading methods.

Facilities to scan PCI devices and to extract and deploy ROMs are moved
into a separate file that can be compiled optionally.

Signed-off-by: Julian Pidancet <julian.pidancet [at] gmail>
Acked-by: Ian Campbell <ian.campbell [at] citrix>


changeset: 24784:ab47cfef2b0a
user: Julian Pidancet <julian.pidancet [at] gmail>
date: Mon Feb 13 12:49:06 2012 +0000

firmware: Use mkhex from hvmloader directory for etherboot ROMs

To remain consistent with how other ROMs are built into hvmloader,
call mkhex on etherboot ROMs from the hvmloader directory, instead of
the etherboot directory. In other words, eb-roms.h is not used any
more.

Introduce ETHERBOOT_NICS config option to choose which ROMs should be
built (kept rtl8139 and 8086100e per default as before).

Signed-off-by: Julian Pidancet <julian.pidancet [at] gmail>
Acked-by: Ian Campbell <ian.campbell [at] citrix>


changeset: 24783:0fe9e2556e20
user: Julian Pidancet <julian.pidancet [at] gmail>
date: Mon Feb 13 12:48:20 2012 +0000

hvmloader: Allow the mkhex command to take several file arguments

Signed-off-by: Julian Pidancet <julian.pidancet [at] gmail>
Acked-by: Ian Campbell <ian.campbell [at] citrix>


changeset: 24782:e1f10d12b9fe
user: Julian Pidancet <julian.pidancet [at] gmail>
date: Mon Feb 13 12:47:46 2012 +0000

hvmloader: Only compile 32bitbios_support.c when rombios is enabled

32bitbios_support.c only contains code specific to rombios, and should
not be built-in when building hvmloader for SeaBIOS only (as for
rombios.c).

Signed-off-by: Julian Pidancet <julian.pidancet [at] gmail>
Acked-by: Ian Campbell <ian.campbell [at] citrix>


changeset: 24781:6ae5506e49ab
user: Jan Beulich <jbeulich [at] suse>
date: Mon Feb 13 13:12:30 2012 +0100

x86/vMCE: MC{G,i}_CTL handling adjustments

- g_mcg_cap was read to determine whether MCG_CTL exists before it got
initialized
- h_mci_ctrl[] and dom_vmce()->mci_ctl[] both got initialized via
memset() with an inappropriate size (hence causing a [minor?]
information leak)
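
The commit message does not show the code involved. As a minimal sketch of how a memset() with an inappropriate size can leave stale data behind, assume a heap buffer reached through a pointer, with the size taken from the pointer instead of the array. `NR_BANKS`, `alloc_stale`, `init_buggy`, `init_fixed` and `stale_entries` are hypothetical names for illustration and are not taken from the vMCE code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NR_BANKS 8  /* hypothetical bank count, for illustration only */

/* Simulate a buffer that still holds stale data from a previous user. */
static uint64_t *alloc_stale(void)
{
    uint64_t *ctl = malloc(NR_BANKS * sizeof(*ctl));

    assert(ctl != NULL);
    memset(ctl, 0xAA, NR_BANKS * sizeof(*ctl));
    return ctl;
}

/* Buggy: the size is that of the pointer, not of the array it points to. */
static void init_buggy(uint64_t *ctl)
{
    size_t wrong = sizeof(ctl);   /* pointer size, typically 4 or 8 bytes */

    memset(ctl, 0, wrong);
}

/* Fixed: size the memset by element count times element size. */
static void init_fixed(uint64_t *ctl)
{
    memset(ctl, 0, NR_BANKS * sizeof(*ctl));
}

/* Count entries that still carry stale (non-zero) contents. */
static unsigned int stale_entries(const uint64_t *ctl)
{
    unsigned int i, n = 0;

    for ( i = 0; i < NR_BANKS; i++ )
        if ( ctl[i] != 0 )
            n++;
    return n;
}
```

After `init_buggy()` most entries still hold the old pattern, which is the kind of leftover data that could be exposed to a reader of the array; `init_fixed()` clears all of them.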

Signed-off-by: Jan Beulich <jbeulich [at] suse>
Acked-by: Keir Fraser <keir [at] xen>


changeset: 24780:e953d536d3c6
user: Jan Beulich <jbeulich [at] suse>
date: Mon Feb 13 13:09:02 2012 +0100

x86/paging: use clear_guest() for zero-filling guest buffers

While static arrays of all zeros may be tolerable (but are simply
inefficient now that we have the necessary infrastructure), using on-
stack arrays for this purpose (particularly when their size doesn't
have an upper limit enforced) is calling for eventual problems (even
if the code can be reached via administrative interfaces only).

Signed-off-by: Jan Beulich <jbeulich [at] suse>
Acked-by: Tim Deegan <tim [at] xen>


changeset: 24779:9ad1e42c341b
user: Ian Campbell <ian.campbell [at] citrix>
date: Fri Feb 10 17:24:50 2012 +0000

xend: populate HVM guest grant table on boot

Signed-off-by: Ian Campbell <ian.campbell [at] citrix>
Committed-by: Ian Jackson <ian.jackson [at] eu>


========================================
commit 8cc8a3651c9c5bc2d0086d12f4b870fc525b9387
Author: Jan Beulich <JBeulich [at] suse>
Date: Tue Feb 7 18:42:56 2012 +0000

qemu-dm: fix unregister_iomem()

This function (introduced quite a long time ago in
e7911109f4321e9ba0cc56a253b653600aa46bea - "disable qemu PCI
devices in HVM domains") appears to be completely broken, causing
the regression reported in
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1805 (due to
the newly added caller of it in
56d7747a3cf811910c4cf865e1ebcb8b82502005 - "qemu: clean up
MSI-X table handling"). It's unclear how the function can ever have
fulfilled its purpose: the value returned by iomem_index() is *not* an
index into mmio[].

Additionally, fix two problems:
- unregister_iomem() must not clear mmio[].start, otherwise
cpu_register_physical_memory() won't be able to re-use the previous
slot, thus causing a leak
- cpu_unregister_io_memory() must not check mmio[].size, otherwise it
won't properly clean up entries (temporarily) squashed through
unregister_iomem()

Signed-off-by: Jan Beulich <jbeulich [at] suse>
Tested-by: Stefano Stabellini <stefano.stabellini [at] eu>
Tested-by: Yongjie Ren <yongjie.ren [at] intel>

_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xensource.com/xen-devel


Ian.Campbell at citrix

Feb 14, 2012, 2:44 AM

Post #2 of 24
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> flight 11946 xen-unstable real [real]
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>
> Regressions :-(
>
> Tests which did not succeed and are blocking,
> including tests which could not be run:
> test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944

Host crash:
http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log

This is the debug code Andrew Cooper added recently to track down the IRQ
assertion we've been seeing. Sadly, it looks like the debug code tries to
call xfree from interrupt context and therefore doesn't produce full
output :-(

Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
at fault for adding the xfree in what may be an IRQ context? (are
keyhandlers run in IRQ context?)

A skanky quick "fix" follows.

Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
Feb 13 17:19:32.617536 (XEN) IRQ: 0 affinity:001 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]----
Feb 13 17:19:32.626512 (XEN) CPU: 0
Feb 13 17:19:32.626525 (XEN) RIP: e008:[<ffff82c48012c842>] xfree+0x33/0x121
Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800 rbx: ffff8301a7e00080 rcx: 0000000000000000
Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000 rsi: 0000000000000083 rdi: 0000000000000000
Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18 rsp: ffff82c4802afcf8 r8: 0000000000000004
Feb 13 17:19:32.665550 (XEN) r9: 0000000000000000 r10: 0000000000000006 r11: ffff82c480224aa0
Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580 r13: 0000000000000005 r14: ffff82c4802aff18
Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000 cr2: 00000000c4b4ee84
Feb 13 17:19:32.697505 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
Feb 13 17:19:32.706513 (XEN) ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
Feb 13 17:19:32.721495 (XEN) ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
Feb 13 17:19:32.721531 (XEN) 00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
Feb 13 17:19:32.729495 (XEN) 00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
Feb 13 17:19:32.738490 (XEN) ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
Feb 13 17:19:32.738515 (XEN) ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
Feb 13 17:19:32.750497 (XEN) ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
Feb 13 17:19:32.762568 (XEN) 000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
Feb 13 17:19:32.762596 (XEN) ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
Feb 13 17:19:32.773513 (XEN) 00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
Feb 13 17:19:32.785503 (XEN) 0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
Feb 13 17:19:32.785539 (XEN) ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
Feb 13 17:19:32.794514 (XEN) 000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
Feb 13 17:19:32.809503 (XEN) 0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
Feb 13 17:19:32.809529 (XEN) ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
Feb 13 17:19:32.814513 (XEN) 0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
Feb 13 17:19:32.829506 (XEN) ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
Feb 13 17:19:32.829549 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
Feb 13 17:19:32.841510 (XEN) 00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
Feb 13 17:19:32.853508 (XEN) 0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
Feb 13 17:19:32.858496 (XEN) Xen call trace:
Feb 13 17:19:32.858518 (XEN) [<ffff82c48012c842>] xfree+0x33/0x121
Feb 13 17:19:32.858547 (XEN) [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
Feb 13 17:19:32.870500 (XEN) [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
Feb 13 17:19:32.870554 (XEN) [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
Feb 13 17:19:32.885510 (XEN) [<ffff82c4801581a9>] default_idle+0x99/0x9e
Feb 13 17:19:32.885541 (XEN) [<ffff82c48015a647>] idle_loop+0x6c/0x7c
Feb 13 17:19:32.897496 (XEN)
Feb 13 17:19:32.897510 (XEN)
Feb 13 17:19:32.897520 (XEN) ****************************************
Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
Feb 13 17:19:32.905522 (XEN) ****************************************
Feb 13 17:19:32.913488 (XEN)
Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...

# HG changeset patch
# User Ian Campbell <ian.campbell [at] citrix>
# Date 1329216241 0
# Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
# Parent 0ba87b95e80bae059fe70b4b117dcc409f2471ef
xen: don't try to print IRQ SSID in IRQ debug from irq context.

It is not possible to call xfree() in that context.

Signed-off-by: Ian Campbell <ian.campbell [at] citrix>

diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
--- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
+++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
@@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
         if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
             continue;

-        ssid = xsm_show_irq_sid(irq);
+        ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);

         spin_lock_irqsave(&desc->lock, flags);

@@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)

         spin_unlock_irqrestore(&desc->lock, flags);

-        xfree(ssid);
+        if ( ssid )
+            xfree(ssid);
     }

     dump_ioapic_irq_info();
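
The guard used in the patch can be modelled outside the hypervisor. The following is a rough, self-contained sketch and not Xen code: `irq_context`, `in_irq_model`, `label_alloc`, `label_free` and `dump_one_irq` are hypothetical stand-ins. It also assumes the allocator refuses to run in interrupt context on both the allocating and the freeing side, whereas the crash log above only shows the assertion firing in xfree().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

static bool irq_context;          /* stand-in for the state behind in_irq() */
static unsigned int live_allocs;  /* outstanding label allocations */

static bool in_irq_model(void)
{
    return irq_context;
}

/* Stand-in allocator that asserts it is not called in interrupt context. */
static char *label_alloc(const char *text)
{
    char *p;

    assert(!in_irq_model());
    p = malloc(strlen(text) + 1);
    assert(p != NULL);
    strcpy(p, text);
    live_allocs++;
    return p;
}

/* Stand-in free with the same context restriction. */
static void label_free(char *p)
{
    assert(!in_irq_model());
    free(p);
    live_allocs--;
}

/* Returns true if a label was available for printing. */
static bool dump_one_irq(void)
{
    /* Skip the allocation entirely when running in interrupt context. */
    char *ssid = in_irq_model() ? NULL : label_alloc("example-label");
    bool had_label = (ssid != NULL);

    /* ... the real code would print the IRQ state here ... */

    /* Only call the free routine when there is something to free. */
    if ( ssid )
        label_free(ssid);
    return had_label;
}
```

With `irq_context` set, `dump_one_irq()` completes without touching the allocator and simply reports that no label was printed.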






dgdegra at tycho

Feb 14, 2012, 11:17 AM

Post #3 of 24
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 02/14/2012 05:44 AM, Ian Campbell wrote:
> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>> flight 11946 xen-unstable real [real]
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>>
>> Regressions :-(
>>
>> Tests which did not succeed and are blocking,
>> including tests which could not be run:
>> test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944
>
> Host crash:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>
> This is the debug Andrew Cooper added recently to track down the IRQ
> assertion we've been seeing, sadly it looks like the debug code tries to
> call xfree from interrupt context and therefore doesn't produce full
> output :-(
>
> Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
> at fault for adding the xfree in what may be an IRQ context? (are
> keyhandlers run in IRQ context?)

Keyhandlers are not run in IRQ context (or at least, the primary methods of
invoking them don't run there - serial keypress, xl debug-key). The placement
of the xsm call and xfree was to avoid a similar backtrace from attempting
allocation while holding the irq's spinlock.
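
The ordering described above (allocate before taking the lock, free after dropping it) might be sketched as follows. This is a hedged model, not the Xen implementation: `desc_locked`, `model_lock`, `model_unlock`, `model_alloc`, `model_free` and `dump_entry` are hypothetical names, and the assumption that the allocator must not be called under the lock is encoded as an assertion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

static bool desc_locked;   /* stand-in for holding desc->lock with IRQs off */

static void model_lock(void)
{
    assert(!desc_locked);
    desc_locked = true;
}

static void model_unlock(void)
{
    assert(desc_locked);
    desc_locked = false;
}

/* Stand-in allocator that must not be called while the lock is held. */
static void *model_alloc(size_t n)
{
    assert(!desc_locked);
    return malloc(n);
}

static void model_free(void *p)
{
    assert(!desc_locked);
    free(p);
}

/* Returns true if the label buffer could be allocated. */
static bool dump_entry(void)
{
    char *label = model_alloc(16);   /* before taking the lock */
    bool ok = (label != NULL);

    model_lock();
    /* ... read and print the descriptor state under the lock ... */
    model_unlock();

    model_free(label);               /* after dropping the lock */
    return ok;
}
```

Moving either allocator call between `model_lock()` and `model_unlock()` would trip the assertion, which mirrors the backtrace the original placement was meant to avoid.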

> A skanky quick "fix" follows.
>
> Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
> Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
> Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
> Feb 13 17:19:32.617536 (XEN) IRQ: 0 affinity:001 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
> Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]----
> Feb 13 17:19:32.626512 (XEN) CPU: 0
> Feb 13 17:19:32.626525 (XEN) RIP: e008:[<ffff82c48012c842>] xfree+0x33/0x121
> Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800 rbx: ffff8301a7e00080 rcx: 0000000000000000
> Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000 rsi: 0000000000000083 rdi: 0000000000000000
> Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18 rsp: ffff82c4802afcf8 r8: 0000000000000004
> Feb 13 17:19:32.665550 (XEN) r9: 0000000000000000 r10: 0000000000000006 r11: ffff82c480224aa0
> Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580 r13: 0000000000000005 r14: ffff82c4802aff18
> Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
> Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000 cr2: 00000000c4b4ee84
> Feb 13 17:19:32.697505 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
> Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
> Feb 13 17:19:32.706513 (XEN) ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
> Feb 13 17:19:32.721495 (XEN) ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
> Feb 13 17:19:32.721531 (XEN) 00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
> Feb 13 17:19:32.729495 (XEN) 00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
> Feb 13 17:19:32.738490 (XEN) ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
> Feb 13 17:19:32.738515 (XEN) ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
> Feb 13 17:19:32.750497 (XEN) ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
> Feb 13 17:19:32.762568 (XEN) 000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
> Feb 13 17:19:32.762596 (XEN) ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
> Feb 13 17:19:32.773513 (XEN) 00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
> Feb 13 17:19:32.785503 (XEN) 0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
> Feb 13 17:19:32.785539 (XEN) ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
> Feb 13 17:19:32.794514 (XEN) 000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
> Feb 13 17:19:32.809503 (XEN) 0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
> Feb 13 17:19:32.809529 (XEN) ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
> Feb 13 17:19:32.814513 (XEN) 0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
> Feb 13 17:19:32.829506 (XEN) ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
> Feb 13 17:19:32.829549 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Feb 13 17:19:32.841510 (XEN) 00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
> Feb 13 17:19:32.853508 (XEN) 0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
> Feb 13 17:19:32.858496 (XEN) Xen call trace:
> Feb 13 17:19:32.858518 (XEN) [<ffff82c48012c842>] xfree+0x33/0x121
> Feb 13 17:19:32.858547 (XEN) [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
> Feb 13 17:19:32.870500 (XEN) [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
> Feb 13 17:19:32.870554 (XEN) [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
> Feb 13 17:19:32.885510 (XEN) [<ffff82c4801581a9>] default_idle+0x99/0x9e
> Feb 13 17:19:32.885541 (XEN) [<ffff82c48015a647>] idle_loop+0x6c/0x7c
> Feb 13 17:19:32.897496 (XEN)
> Feb 13 17:19:32.897510 (XEN)
> Feb 13 17:19:32.897520 (XEN) ****************************************
> Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
> Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> Feb 13 17:19:32.905522 (XEN) ****************************************
> Feb 13 17:19:32.913488 (XEN)
> Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...
>
> # HG changeset patch
> # User Ian Campbell <ian.campbell [at] citrix>
> # Date 1329216241 0
> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
> # Parent 0ba87b95e80bae059fe70b4b117dcc409f2471ef
> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>
> It is not possible to call xfree() in that context.
>
> Signed-off-by: Ian Campbell <ian.campbell [at] citrix>
>
> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
> if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
> continue;
>
> - ssid = xsm_show_irq_sid(irq);
> + ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>
> spin_lock_irqsave(&desc->lock, flags);
>
> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>
> spin_unlock_irqrestore(&desc->lock, flags);
>
> - xfree(ssid);
> + if ( ssid )
> + xfree(ssid);
> }
>
> dump_ioapic_irq_info();
>
>
>




Ian.Campbell at citrix

Mar 27, 2012, 3:36 AM

Post #4 of 24
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> > flight 11946 xen-unstable real [real]
> > http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >
> > Regressions :-(
> >
> > Tests which did not succeed and are blocking,
> > including tests which could not be run:
> > test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944
>
> Host crash:
> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>
> This is the debug Andrew Cooper added recently to track down the IRQ
> assertion we've been seeing, sadly it looks like the debug code tries to
> call xfree from interrupt context and therefore doesn't produce full
> output :-(

Are we still seeing the issue this debugging was intended to address? We
don't seem to be seeing the host crashes any more. Should the debug code
be patched up as in the following patch? Otherwise, when we do see the
issue, the debug code won't end up printing any useful info.

Someone recently reported bugs.debian.org/665433 to Debian. Is this the
same underlying issue? That report is with Xen 4.0, FWIW.

> Or is 24675:d82a1e3d3c65 ("xsm: Add security label to IRQ debug output")
> at fault for adding the xfree in what may be an IRQ context? (are
> keyhandlers run in IRQ context?)
>
> A skanky quick "fix" follows.
>
> Feb 13 17:17:29.777522 (XEN) *** IRQ BUG found ***
> Feb 13 17:19:32.594539 (XEN) CPU0 -Testing vector 229 from bitmap 34,48,57,64,72,75,80,83,88,97,104-105,113,120-121,129,136,144,152,160,168,176,184,192,202
> Feb 13 17:19:32.617515 (XEN) Guest interrupt information:
> Feb 13 17:19:32.617536 (XEN) IRQ: 0 affinity:001 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
> Feb 13 17:19:32.617567 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> Feb 13 17:19:32.626489 (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Not tainted ]----
> Feb 13 17:19:32.626512 (XEN) CPU: 0
> Feb 13 17:19:32.626525 (XEN) RIP: e008:[<ffff82c48012c842>] xfree+0x33/0x121
> Feb 13 17:19:32.641496 (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> Feb 13 17:19:32.641519 (XEN) rax: ffff82c4802d0800 rbx: ffff8301a7e00080 rcx: 0000000000000000
> Feb 13 17:19:32.650560 (XEN) rdx: 0000000000000000 rsi: 0000000000000083 rdi: 0000000000000000
> Feb 13 17:19:32.665510 (XEN) rbp: ffff82c4802afd18 rsp: ffff82c4802afcf8 r8: 0000000000000004
> Feb 13 17:19:32.665550 (XEN) r9: 0000000000000000 r10: 0000000000000006 r11: ffff82c480224aa0
> Feb 13 17:19:32.673509 (XEN) r12: ffff8301a7e00580 r13: 0000000000000005 r14: ffff82c4802aff18
> Feb 13 17:19:32.685503 (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000006f0
> Feb 13 17:19:32.685537 (XEN) cr3: 00000001a7f54000 cr2: 00000000c4b4ee84
> Feb 13 17:19:32.697505 (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
> Feb 13 17:19:32.697540 (XEN) Xen stack trace from rsp=ffff82c4802afcf8:
> Feb 13 17:19:32.706513 (XEN) ffff8301a7e00080 ffff8301a7e00580 0000000000000005 ffff82c4802aff18
> Feb 13 17:19:32.721495 (XEN) ffff82c4802afd88 ffff82c4801658ee ffff82c4802afd38 ffff82c48010098a
> Feb 13 17:19:32.721531 (XEN) 00000400802afd68 0000000000000083 ffff8301a7e000a8 0000000000000000
> Feb 13 17:19:32.729495 (XEN) 00000000fffffffa 00000000000000e5 ffff8301a7e00580 0000000000000005
> Feb 13 17:19:32.738490 (XEN) ffff82c4802aff18 ffff8301a7e005a8 ffff82c4802afe28 ffff82c480167781
> Feb 13 17:19:32.738515 (XEN) ffff8301a7ece000 ffff82c4802afde8 0000000000000000 ffff82c4802aff18
> Feb 13 17:19:32.750497 (XEN) ffff82c4802aff18 0000000000000002 ffff82c4802aff18 ffff82c4802fa060
> Feb 13 17:19:32.762568 (XEN) 000000e500000000 ffff82c4802fa060 ffff82c4802afe08 ffff82c48017bd51
> Feb 13 17:19:32.762596 (XEN) ffff82c4802aff18 ffff82c4802aff18 ffff82c48025e380 ffff82c4802aff18
> Feb 13 17:19:32.773513 (XEN) 00000000ffffffff 0000000000000002 00007d3b7fd501a7 ffff82c4801525d0
> Feb 13 17:19:32.785503 (XEN) 0000000000000002 00000000ffffffff ffff82c4802aff18 ffff82c48025e380
> Feb 13 17:19:32.785539 (XEN) ffff82c4802afee0 ffff82c4802aff18 0000001863058413 00000000000c0000
> Feb 13 17:19:32.794514 (XEN) 000000000e1ff99c 000000000000c701 ffff82c4802f9a90 0000000000000000
> Feb 13 17:19:32.809503 (XEN) 0000000000000000 ffff8301a7f5dc80 0000000000000000 0000002000000000
> Feb 13 17:19:32.809529 (XEN) ffff82c4801581a9 000000000000e008 0000000000000246 ffff82c4802afee0
> Feb 13 17:19:32.814513 (XEN) 0000000000000000 ffff82c4802aff10 ffff82c48015a647 0000000000000000
> Feb 13 17:19:32.829506 (XEN) ffff8300d7cfb000 ffff8300d7af9000 0000000000000000 ffff82c4802afd88
> Feb 13 17:19:32.829549 (XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> Feb 13 17:19:32.841510 (XEN) 00000000dfc91f90 00000000deadbeef 0000000000000000 0000000000000000
> Feb 13 17:19:32.853508 (XEN) 0000000000000000 0000000000000000 0000000000000000 00000000deadbeef
> Feb 13 17:19:32.858496 (XEN) Xen call trace:
> Feb 13 17:19:32.858518 (XEN) [<ffff82c48012c842>] xfree+0x33/0x121
> Feb 13 17:19:32.858547 (XEN) [<ffff82c4801658ee>] dump_irqs+0x2a3/0x2ca
> Feb 13 17:19:32.870500 (XEN) [<ffff82c480167781>] smp_irq_move_cleanup_interrupt+0x303/0x37b
> Feb 13 17:19:32.870554 (XEN) [<ffff82c4801525d0>] irq_move_cleanup_interrupt+0x30/0x40
> Feb 13 17:19:32.885510 (XEN) [<ffff82c4801581a9>] default_idle+0x99/0x9e
> Feb 13 17:19:32.885541 (XEN) [<ffff82c48015a647>] idle_loop+0x6c/0x7c
> Feb 13 17:19:32.897496 (XEN)
> Feb 13 17:19:32.897510 (XEN)
> Feb 13 17:19:32.897520 (XEN) ****************************************
> Feb 13 17:19:32.897537 (XEN) Panic on CPU 0:
> Feb 13 17:19:32.905499 (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> Feb 13 17:19:32.905522 (XEN) ****************************************
> Feb 13 17:19:32.913488 (XEN)
> Feb 13 17:19:32.913506 (XEN) Reboot in five seconds...
>
> # HG changeset patch
> # User Ian Campbell <ian.campbell [at] citrix>
> # Date 1329216241 0
> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
> # Parent 0ba87b95e80bae059fe70b4b117dcc409f2471ef
> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>
> It is not possible to call xfree() in that context.
>
> Signed-off-by: Ian Campbell <ian.campbell [at] citrix>
>
> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
> if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
> continue;
>
> - ssid = xsm_show_irq_sid(irq);
> + ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>
> spin_lock_irqsave(&desc->lock, flags);
>
> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>
> spin_unlock_irqrestore(&desc->lock, flags);
>
> - xfree(ssid);
> + if ( ssid )
> + xfree(ssid);
> }
>
> dump_ioapic_irq_info();
>
>
>
>





JBeulich at suse

Mar 27, 2012, 3:52 AM

Post #5 of 24
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 27.03.12 at 12:36, Ian Campbell <Ian.Campbell [at] citrix> wrote:
>> # HG changeset patch
>> # User Ian Campbell <ian.campbell [at] citrix>
>> # Date 1329216241 0
>> # Node ID 738424a5e5a5053c75cfbe64f6675b5d756daf1b
>> # Parent 0ba87b95e80bae059fe70b4b117dcc409f2471ef
>> xen: don't try to print IRQ SSID in IRQ debug from irq context.
>>
>> It is not possible to call xfree() in that context.
>>
>> Signed-off-by: Ian Campbell <ian.campbell [at] citrix>
>>
>> diff -r 0ba87b95e80b -r 738424a5e5a5 xen/arch/x86/irq.c
>> --- a/xen/arch/x86/irq.c Mon Feb 13 17:26:08 2012 +0000
>> +++ b/xen/arch/x86/irq.c Tue Feb 14 10:44:01 2012 +0000
>> @@ -2026,7 +2026,7 @@ static void dump_irqs(unsigned char key)
>> if ( !irq_desc_initialized(desc) || desc->handler == &no_irq_type )
>> continue;
>>
>> - ssid = xsm_show_irq_sid(irq);
>> + ssid = in_irq() ? NULL : xsm_show_irq_sid(irq);
>>
>> spin_lock_irqsave(&desc->lock, flags);
>>
>> @@ -2073,7 +2073,8 @@ static void dump_irqs(unsigned char key)
>>
>> spin_unlock_irqrestore(&desc->lock, flags);
>>
>> - xfree(ssid);
>> + if ( ssid )
>> + xfree(ssid);

But perhaps xfree(NULL) should be made usable in any context (i.e.
the assertion in there moved down)? Otherwise the construct above
is likely to get collapsed again at some point with "xfree(NULL) is
perfectly valid" in mind.
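
A minimal sketch of that reordering, not the actual xmalloc_tlsf.c code: `in_irq_flag`, `in_irq_stub()` and `xfree_sketch()` are hypothetical stand-ins, and libc `free()` substitutes for the real pool release. The only point is that the NULL check comes before the context assertion, so freeing NULL is harmless from interrupt context while a real free there would still trip the assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the hypervisor's interrupt-context state. */
static bool in_irq_flag = false;

/* Hypothetical stand-in for in_irq(). */
static bool in_irq_stub(void)
{
    return in_irq_flag;
}

/* Sketch of an xfree() whose assertion sits below the NULL check. */
static void xfree_sketch(void *p)
{
    if (p == NULL)
        return;              /* nothing to free: safe in any context */

    assert(!in_irq_stub());  /* a real free is still not allowed in IRQ context */
    free(p);                 /* stand-in for handing memory back to the pool */
}
```

With this ordering a caller could pass a possibly-NULL pointer unconditionally, which is the collapse anticipated above.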

Jan

>> }
>>
>> dump_ioapic_irq_info();





apxeng at gmail

May 4, 2012, 12:48 PM

Post #6 of 24 (347 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <Ian.Campbell [at] citrix> wrote:
> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>> > flight 11946 xen-unstable real [real]
>> > http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>> >
>> > Regressions :-(
>> >
>> > Tests which did not succeed and are blocking,
>> > including tests which could not be run:
>> >  test-amd64-i386-xl-credit2    7 debian-install            fail REGR. vs. 11944
>>
>> Host crash:
>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>>
>> This is the debug Andrew Cooper added recently to track down the IRQ
>> assertion we've been seeing, sadly it looks like the debug code tries to
>> call xfree from interrupt context and therefore doesn't produce full
>> output :-(
>
> Are we still seeing the issue this debugging was intended to address? We
> don't seem to be seeing the host crashes any more. Should the debug code
> be patched up as in the following patch, otherwise when we do see it it
> doesn't end up printing any useful info.
>
> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> same underlying issue? That report is with Xen 4.0 FWIW.

I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
code added. Can the fix to the debugging code be checked in until the
original issue has been fixed?

Thanks,
AP

(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 236 from bitmap
41,47,49,57,64,72,80,88,96,100,104,120,136,152,160-161,168,171,192,200-201,208
(XEN) Guest interrupt information:
(XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
(XEN) ----[ Xen-4.2-unstable x86_64 debug=y Tainted: C ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48012cefb>] xfree+0x33/0x118
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
(XEN) rax: 0000000000000000 rbx: ffff830214ac0080 rcx: 0000000000000000
(XEN) rdx: ffff82c4802d8880 rsi: 0000000000000083 rdi: 0000000000000000
(XEN) rbp: ffff82c4802b7c78 rsp: ffff82c4802b7c58 r8: 0000000000000004
(XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000010
(XEN) r12: ffff830214ac0c80 r13: 000000000000000c r14: ffff830214ac0ca8
(XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426f0
(XEN) cr3: 0000000168971000 cr2: 0000000001095e00
(XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802b7c58:
(XEN) ffff830214ac0080 ffff830214ac0c80 000000000000000c ffff830214ac0ca8
(XEN) ffff82c4802b7ce8 ffff82c4801664d4 ffff82c4802e214a ffff82c400000020
(XEN) ffff82c4802b7cf8 0000000000000083 ffff830214ac00a8 0000000000000000
(XEN) 00000000000000ec 00000000000000ec ffff830214ac0c80 000000000000000c
(XEN) ffff830214ac0ca8 ffff82c480302760 ffff82c4802b7d58 ffff82c480168000
(XEN) ffff82c4802b7f18 ffff82c4802b7f18 000000ec00000000 ffff82c4802b7f18
(XEN) 0000000000000000 0000000000000000 ffff82c480302324 0000000000000020
(XEN) ffff82c4802b7dd8 0000000000000003 0000000000000000 0000000000000000
(XEN) ffff82c4802b7dc8 ffff82c4801683d3 ffff8300da991000 ffff8300da996000
(XEN) 0000000000000000 ffffffff802b7d90 ffff82c480159160 ffff82c4802b7e20
(XEN) ffff82c48015d7db ffff82c4802b7f18 ffff8300da991000 0000000000000003
(XEN) 0000000000000000 0000000000000000 00007d3b7fd48207 ffff82c480160426
(XEN) 0000000000000000 0000000000000000 0000000000000003 ffff8300da991000
(XEN) ffff82c4802b7ef8 ffff82c4802b7f18 0000000000000282 ffff82c4802319a0
(XEN) 00000000deadbeef 0000000000000000 ffff83021c0b8081 0000000000000000
(XEN) 0000000000000048 ffff8801d7227ec0 ffff8300da991000 0000002000000000
(XEN) ffff82c4801865c1 000000000000e008 0000000000000202 ffff82c4802b7e88
(XEN) 000000000000e010 0000000000000003 ffff82c4802b7ef8 ffff82c4802230d8
(XEN) ffff82c4802b7f18 0000000000000000 0000000000000246 ffffffff810013aa
(XEN) 0000000000000000 ffffffff810013aa 000000000000e030 0000000000000246
(XEN) Xen call trace:
(XEN) [<ffff82c48012cefb>] xfree+0x33/0x118
(XEN) [<ffff82c4801664d4>] dump_irqs+0x2a4/0x2e8
(XEN) [<ffff82c480168000>] irq_move_cleanup_interrupt+0x29f/0x2db
(XEN) [<ffff82c4801683d3>] do_IRQ+0x9e/0x5a4
(XEN) [<ffff82c480160426>] common_interrupt+0x26/0x30
(XEN) [<ffff82c4801865c1>] async_exception_cleanup+0x1/0x35a
(XEN) [<ffff82c480228438>] syscall_enter+0xc8/0x122
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...



andrew.cooper3 at citrix

May 4, 2012, 1:11 PM

Post #7 of 24 (341 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 04/05/12 20:48, AP wrote:
> On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <Ian.Campbell [at] citrix> wrote:
>> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
>>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
>>>> flight 11946 xen-unstable real [real]
>>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
>>>>
>>>> Regressions :-(
>>>>
>>>> Tests which did not succeed and are blocking,
>>>> including tests which could not be run:
>>>> test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944
>>> Host crash:
>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
>>>
>>> This is the debug Andrew Cooper added recently to track down the IRQ
>>> assertion we've been seeing, sadly it looks like the debug code tries to
>>> call xfree from interrupt context and therefore doesn't produce full
>>> output :-(
>> Are we still seeing the issue this debugging was intended to address? We
>> don't seem to be seeing the host crashes any more. Should the debug code
>> be patched up as in the following patch, otherwise when we do see it it
>> doesn't end up printing any useful info.
>>
>> Someone recently reported bugs.debian.org/665433 to Debian, is this the
>> same underlying issue? That report is with Xen 4.0 FWIW.
> I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> code added. Can the fix to the debugging code be checked in until the
> original issue has been fixed?
>
> Thanks,
> AP
>
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 236 from bitmap
> 41,47,49,57,64,72,80,88,96,100,104,120,136,152,160-161,168,171,192,200-201,208
> (XEN) Guest interrupt information:
> (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> status=00000000 mapped, unbound
> (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Tainted: C ]----
> (XEN) CPU: 0
> (XEN) RIP: e008:[<ffff82c48012cefb>] xfree+0x33/0x118
> (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> (XEN) rax: 0000000000000000 rbx: ffff830214ac0080 rcx: 0000000000000000
> (XEN) rdx: ffff82c4802d8880 rsi: 0000000000000083 rdi: 0000000000000000
> (XEN) rbp: ffff82c4802b7c78 rsp: ffff82c4802b7c58 r8: 0000000000000004
> (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000010
> (XEN) r12: ffff830214ac0c80 r13: 000000000000000c r14: ffff830214ac0ca8
> (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426f0
> (XEN) cr3: 0000000168971000 cr2: 0000000001095e00
> (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> (XEN) Xen stack trace from rsp=ffff82c4802b7c58:
> (XEN) ffff830214ac0080 ffff830214ac0c80 000000000000000c ffff830214ac0ca8
> (XEN) ffff82c4802b7ce8 ffff82c4801664d4 ffff82c4802e214a ffff82c400000020
> (XEN) ffff82c4802b7cf8 0000000000000083 ffff830214ac00a8 0000000000000000
> (XEN) 00000000000000ec 00000000000000ec ffff830214ac0c80 000000000000000c
> (XEN) ffff830214ac0ca8 ffff82c480302760 ffff82c4802b7d58 ffff82c480168000
> (XEN) ffff82c4802b7f18 ffff82c4802b7f18 000000ec00000000 ffff82c4802b7f18
> (XEN) 0000000000000000 0000000000000000 ffff82c480302324 0000000000000020
> (XEN) ffff82c4802b7dd8 0000000000000003 0000000000000000 0000000000000000
> (XEN) ffff82c4802b7dc8 ffff82c4801683d3 ffff8300da991000 ffff8300da996000
> (XEN) 0000000000000000 ffffffff802b7d90 ffff82c480159160 ffff82c4802b7e20
> (XEN) ffff82c48015d7db ffff82c4802b7f18 ffff8300da991000 0000000000000003
> (XEN) 0000000000000000 0000000000000000 00007d3b7fd48207 ffff82c480160426
> (XEN) 0000000000000000 0000000000000000 0000000000000003 ffff8300da991000
> (XEN) ffff82c4802b7ef8 ffff82c4802b7f18 0000000000000282 ffff82c4802319a0
> (XEN) 00000000deadbeef 0000000000000000 ffff83021c0b8081 0000000000000000
> (XEN) 0000000000000048 ffff8801d7227ec0 ffff8300da991000 0000002000000000
> (XEN) ffff82c4801865c1 000000000000e008 0000000000000202 ffff82c4802b7e88
> (XEN) 000000000000e010 0000000000000003 ffff82c4802b7ef8 ffff82c4802230d8
> (XEN) ffff82c4802b7f18 0000000000000000 0000000000000246 ffffffff810013aa
> (XEN) 0000000000000000 ffffffff810013aa 000000000000e030 0000000000000246
> (XEN) Xen call trace:
> (XEN) [<ffff82c48012cefb>] xfree+0x33/0x118
> (XEN) [<ffff82c4801664d4>] dump_irqs+0x2a4/0x2e8
> (XEN) [<ffff82c480168000>] irq_move_cleanup_interrupt+0x29f/0x2db
> (XEN) [<ffff82c4801683d3>] do_IRQ+0x9e/0x5a4
> (XEN) [<ffff82c480160426>] common_interrupt+0x26/0x30
> (XEN) [<ffff82c4801865c1>] async_exception_cleanup+0x1/0x35a
> (XEN) [<ffff82c480228438>] syscall_enter+0xc8/0x122
> (XEN)
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> (XEN) ****************************************
> (XEN)
> (XEN) Reboot in five seconds...
The attached patch should prevent this panic, allowing for all the debug
information to be printed to the console.

--
Andrew Cooper - Dom0 Kernel Engineer, Citrix XenServer
T: +44 (0)1223 225 900, http://www.citrix.com
Attachments: irq-fix-dump_irqs.patch (1.47 KB)


apxeng at gmail

May 4, 2012, 5:21 PM

Post #8 of 24 (343 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Fri, May 4, 2012 at 8:11 PM, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>
> On 04/05/12 20:48, AP wrote:
> > On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <Ian.Campbell [at] citrix> wrote:
> >> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
> >>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> >>>> flight 11946 xen-unstable real [real]
> >>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking,
> >>>> including tests which could not be run:
> >>>> test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944
> >>> Host crash:
> >>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
> >>>
> >>> This is the debug Andrew Cooper added recently to track down the IRQ
> >>> assertion we've been seeing, sadly it looks like the debug code tries to
> >>> call xfree from interrupt context and therefore doesn't produce full
> >>> output :-(
> >> Are we still seeing the issue this debugging was intended to address? We
> >> don't seem to be seeing the host crashes any more. Should the debug code
> >> be patched up as in the following patch, otherwise when we do see it it
> >> doesn't end up printing any useful info.
> >>
> >> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> >> same underlying issue? That report is with Xen 4.0 FWIW.
> > I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> > code added. Can the fix to the debugging code be checked in until the
> > original issue has been fixed?
> >
> > Thanks,
> > AP
> >
> > (XEN) *** IRQ BUG found ***
> > (XEN) CPU0 -Testing vector 236 from bitmap
> > 41,47,49,57,64,72,80,88,96,100,104,120,136,152,160-161,168,171,192,200-201,208
> > (XEN) Guest interrupt information:
> > (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> > status=00000000 mapped, unbound
> > (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> > (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c48012cefb>] xfree+0x33/0x118
> > (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff830214ac0080 rcx: 0000000000000000
> > (XEN) rdx: ffff82c4802d8880 rsi: 0000000000000083 rdi: 0000000000000000
> > (XEN) rbp: ffff82c4802b7c78 rsp: ffff82c4802b7c58 r8: 0000000000000004
> > (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000010
> > (XEN) r12: ffff830214ac0c80 r13: 000000000000000c r14: ffff830214ac0ca8
> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426f0
> > (XEN) cr3: 0000000168971000 cr2: 0000000001095e00
> > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802b7c58:
> > (XEN) ffff830214ac0080 ffff830214ac0c80 000000000000000c ffff830214ac0ca8
> > (XEN) ffff82c4802b7ce8 ffff82c4801664d4 ffff82c4802e214a ffff82c400000020
> > (XEN) ffff82c4802b7cf8 0000000000000083 ffff830214ac00a8 0000000000000000
> > (XEN) 00000000000000ec 00000000000000ec ffff830214ac0c80 000000000000000c
> > (XEN) ffff830214ac0ca8 ffff82c480302760 ffff82c4802b7d58 ffff82c480168000
> > (XEN) ffff82c4802b7f18 ffff82c4802b7f18 000000ec00000000 ffff82c4802b7f18
> > (XEN) 0000000000000000 0000000000000000 ffff82c480302324 0000000000000020
> > (XEN) ffff82c4802b7dd8 0000000000000003 0000000000000000 0000000000000000
> > (XEN) ffff82c4802b7dc8 ffff82c4801683d3 ffff8300da991000 ffff8300da996000
> > (XEN) 0000000000000000 ffffffff802b7d90 ffff82c480159160 ffff82c4802b7e20
> > (XEN) ffff82c48015d7db ffff82c4802b7f18 ffff8300da991000 0000000000000003
> > (XEN) 0000000000000000 0000000000000000 00007d3b7fd48207 ffff82c480160426
> > (XEN) 0000000000000000 0000000000000000 0000000000000003 ffff8300da991000
> > (XEN) ffff82c4802b7ef8 ffff82c4802b7f18 0000000000000282 ffff82c4802319a0
> > (XEN) 00000000deadbeef 0000000000000000 ffff83021c0b8081 0000000000000000
> > (XEN) 0000000000000048 ffff8801d7227ec0 ffff8300da991000 0000002000000000
> > (XEN) ffff82c4801865c1 000000000000e008 0000000000000202 ffff82c4802b7e88
> > (XEN) 000000000000e010 0000000000000003 ffff82c4802b7ef8 ffff82c4802230d8
> > (XEN) ffff82c4802b7f18 0000000000000000 0000000000000246 ffffffff810013aa
> > (XEN) 0000000000000000 ffffffff810013aa 000000000000e030 0000000000000246
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c48012cefb>] xfree+0x33/0x118
> > (XEN) [<ffff82c4801664d4>] dump_irqs+0x2a4/0x2e8
> > (XEN) [<ffff82c480168000>] irq_move_cleanup_interrupt+0x29f/0x2db
> > (XEN) [<ffff82c4801683d3>] do_IRQ+0x9e/0x5a4
> > (XEN) [<ffff82c480160426>] common_interrupt+0x26/0x30
> > (XEN) [<ffff82c4801865c1>] async_exception_cleanup+0x1/0x35a
> > (XEN) [<ffff82c480228438>] syscall_enter+0xc8/0x122
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> The attached patch should prevent this panic, allowing for all the debug
> information to be printed to the console.

Thanks, that fixed it. Here is what I see now:

(XEN) *** IRQ BUG found ***
(XEN) CPU0 -Testing vector 236 from bitmap
37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
(XEN) Guest interrupt information:
(XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN) IRQ: 1 affinity:01 vec:d3 type=IO-APIC-edge status=00000030 in-flight=0 domain-list=0: 1(-S--),
(XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC status=00000000 mapped, unbound
(XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge status=00000030 in-flight=0 domain-list=0: 8(-S--),
(XEN) IRQ: 9 affinity:02 vec:25 type=IO-APIC-level status=00000030 in-flight=0 domain-list=0: 9(-S--),
(XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge status=00000002 mapped, unbound
[ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?

Let me know if you need any more info.
Thanks,
AP


Ian.Campbell at citrix

May 5, 2012, 3:33 AM

Post #9 of 24 (342 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Fri, 2012-05-04 at 21:11 +0100, Andrew Cooper wrote:
> On 04/05/12 20:48, AP wrote:
> > On Tue, Mar 27, 2012 at 3:36 AM, Ian Campbell <Ian.Campbell [at] citrix> wrote:
> >> On Tue, 2012-02-14 at 10:44 +0000, Ian Campbell wrote:
> >>> On Mon, 2012-02-13 at 20:16 +0000, xen.org wrote:
> >>>> flight 11946 xen-unstable real [real]
> >>>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/
> >>>>
> >>>> Regressions :-(
> >>>>
> >>>> Tests which did not succeed and are blocking,
> >>>> including tests which could not be run:
> >>>> test-amd64-i386-xl-credit2 7 debian-install fail REGR. vs. 11944
> >>> Host crash:
> >>> http://www.chiark.greenend.org.uk/~xensrcts/logs/11946/test-amd64-i386-xl-credit2/serial-woodlouse.log
> >>>
> >>> This is the debug Andrew Cooper added recently to track down the IRQ
> >>> assertion we've been seeing, sadly it looks like the debug code tries to
> >>> call xfree from interrupt context and therefore doesn't produce full
> >>> output :-(
> >> Are we still seeing the issue this debugging was intended to address? We
> >> don't seem to be seeing the host crashes any more. Should the debug code
> >> be patched up as in the following patch, otherwise when we do see it it
> >> doesn't end up printing any useful info.
> >>
> >> Someone recently reported bugs.debian.org/665433 to Debian, is this the
> >> same underlying issue? That report is with Xen 4.0 FWIW.
> > I saw the issue (xen-unstable 25256:9dda0efd8ce1) that the debugging
> > code added. Can the fix to the debugging code be checked in until the
> > original issue has been fixed?
> >
> > Thanks,
> > AP
> >
> > (XEN) *** IRQ BUG found ***
> > (XEN) CPU0 -Testing vector 236 from bitmap
> > 41,47,49,57,64,72,80,88,96,100,104,120,136,152,160-161,168,171,192,200-201,208
> > (XEN) Guest interrupt information:
> > (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> > status=00000000 mapped, unbound
> > (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> > (XEN) ----[ Xen-4.2-unstable x86_64 debug=y Tainted: C ]----
> > (XEN) CPU: 0
> > (XEN) RIP: e008:[<ffff82c48012cefb>] xfree+0x33/0x118
> > (XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
> > (XEN) rax: 0000000000000000 rbx: ffff830214ac0080 rcx: 0000000000000000
> > (XEN) rdx: ffff82c4802d8880 rsi: 0000000000000083 rdi: 0000000000000000
> > (XEN) rbp: ffff82c4802b7c78 rsp: ffff82c4802b7c58 r8: 0000000000000004
> > (XEN) r9: 0000000000000000 r10: 0000000000000000 r11: 0000000000000010
> > (XEN) r12: ffff830214ac0c80 r13: 000000000000000c r14: ffff830214ac0ca8
> > (XEN) r15: 0000000000000000 cr0: 000000008005003b cr4: 00000000000426f0
> > (XEN) cr3: 0000000168971000 cr2: 0000000001095e00
> > (XEN) ds: 002b es: 002b fs: 0000 gs: 0000 ss: e010 cs: e008
> > (XEN) Xen stack trace from rsp=ffff82c4802b7c58:
> > (XEN) ffff830214ac0080 ffff830214ac0c80 000000000000000c ffff830214ac0ca8
> > (XEN) ffff82c4802b7ce8 ffff82c4801664d4 ffff82c4802e214a ffff82c400000020
> > (XEN) ffff82c4802b7cf8 0000000000000083 ffff830214ac00a8 0000000000000000
> > (XEN) 00000000000000ec 00000000000000ec ffff830214ac0c80 000000000000000c
> > (XEN) ffff830214ac0ca8 ffff82c480302760 ffff82c4802b7d58 ffff82c480168000
> > (XEN) ffff82c4802b7f18 ffff82c4802b7f18 000000ec00000000 ffff82c4802b7f18
> > (XEN) 0000000000000000 0000000000000000 ffff82c480302324 0000000000000020
> > (XEN) ffff82c4802b7dd8 0000000000000003 0000000000000000 0000000000000000
> > (XEN) ffff82c4802b7dc8 ffff82c4801683d3 ffff8300da991000 ffff8300da996000
> > (XEN) 0000000000000000 ffffffff802b7d90 ffff82c480159160 ffff82c4802b7e20
> > (XEN) ffff82c48015d7db ffff82c4802b7f18 ffff8300da991000 0000000000000003
> > (XEN) 0000000000000000 0000000000000000 00007d3b7fd48207 ffff82c480160426
> > (XEN) 0000000000000000 0000000000000000 0000000000000003 ffff8300da991000
> > (XEN) ffff82c4802b7ef8 ffff82c4802b7f18 0000000000000282 ffff82c4802319a0
> > (XEN) 00000000deadbeef 0000000000000000 ffff83021c0b8081 0000000000000000
> > (XEN) 0000000000000048 ffff8801d7227ec0 ffff8300da991000 0000002000000000
> > (XEN) ffff82c4801865c1 000000000000e008 0000000000000202 ffff82c4802b7e88
> > (XEN) 000000000000e010 0000000000000003 ffff82c4802b7ef8 ffff82c4802230d8
> > (XEN) ffff82c4802b7f18 0000000000000000 0000000000000246 ffffffff810013aa
> > (XEN) 0000000000000000 ffffffff810013aa 000000000000e030 0000000000000246
> > (XEN) Xen call trace:
> > (XEN) [<ffff82c48012cefb>] xfree+0x33/0x118
> > (XEN) [<ffff82c4801664d4>] dump_irqs+0x2a4/0x2e8
> > (XEN) [<ffff82c480168000>] irq_move_cleanup_interrupt+0x29f/0x2db
> > (XEN) [<ffff82c4801683d3>] do_IRQ+0x9e/0x5a4
> > (XEN) [<ffff82c480160426>] common_interrupt+0x26/0x30
> > (XEN) [<ffff82c4801865c1>] async_exception_cleanup+0x1/0x35a
> > (XEN) [<ffff82c480228438>] syscall_enter+0xc8/0x122
> > (XEN)
> > (XEN)
> > (XEN) ****************************************
> > (XEN) Panic on CPU 0:
> > (XEN) Assertion '!in_irq()' failed at xmalloc_tlsf.c:607
> > (XEN) ****************************************
> > (XEN)
> > (XEN) Reboot in five seconds...
> The attached patch should prevent this panic

This is effectively the same as my patch from
<1332844592.25560.9.camel [at] zakaz>. I think "if (ssid)
xfree(...)" is preferable to "if (in_irq()) xfree(...)" but not enough
to prevent me:

Acked-by: Ian Campbell <ian.campbell [at] citrix>

If the debug code is going to stay for 4.2 then IMHO we should also take
this patch to make it actually useful. Otherwise we should just revert
the original debug patch before the release.





andrew.cooper3 at citrix

May 5, 2012, 4:04 AM

Post #10 of 24 (350 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

> Thanks, that fixed it. Here is what I see now:
>
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 236 from bitmap
> 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> (XEN) Guest interrupt information:
> (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> status=00000000 mapped, unbound
> (XEN) IRQ: 1 affinity:01 vec:d3 type=IO-APIC-edge
> status=00000030 in-flight=0 domain-list=0: 1(-S--),
> (XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC
> status=00000000 mapped, unbound
> (XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge
> status=00000030 in-flight=0 domain-list=0: 8(-S--),
> (XEN) IRQ: 9 affinity:02 vec:25 type=IO-APIC-level
> status=00000030 in-flight=0 domain-list=0: 9(-S--),
> (XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge
> status=00000002 mapped, unbound
> (XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge
> status=00000002 mapped, unbound
> [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
>
> Let me know if you need any more info.
> Thanks,
> AP
>

There should be quite a lot more irq information dumped than just that.
Was there any more on the console or had it given up by that point? It
might be worth enabling the synchronous console to get all of that
debug information.

How easy is this error to reproduce for you? I never managed to
reproduce it reliably enough to be able to debug it.

If you could provide your Xen boot console log, that would be very useful.

~Andrew



andrew.cooper3 at citrix

May 5, 2012, 4:11 AM

Post #11 of 24 (344 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>> The attached patch should prevent this panic
> This is effectively the same as my patch from
> <1332844592.25560.9.camel [at] zakaz>. I think "if (ssid)
> xfree(...)" is preferable to "if (in_irq()) xfree(...)" but not enough
> to prevent me:
>
> Acked-by: Ian Campbell <ian.campbell [at] citrix>
>
> If the debug code is going to stay for 4.2 then IMHO we should also take
> this patch to make it actually useful. Otherwise we should just revert
> the original debug patch before the release.
>
>

Yes - I was thinking the same. I suggest that when xen-4.2-testing.hg
gets branched off unstable, this debugging gets put back to just being
an assert as before. However, I am quite unsure as to what would happen
with interrupts following that failed assert.

I shall re-do the patch. I think it is a fairly sensible patch to have
in even after the main debugging has been removed, especially if similar
debugging needs to be done in the future.

~Andrew



apxeng at gmail

May 5, 2012, 11:41 AM

Post #12 of 24 (346 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Sat, May 5, 2012 at 4:04 AM, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>
>
> > Thanks, that fixed it. Here is what I see now:
> >
> > (XEN) *** IRQ BUG found ***
> > (XEN) CPU0 -Testing vector 236 from bitmap
> > 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> > (XEN) Guest interrupt information:
> > (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> > status=00000000 mapped, unbound
> > (XEN) IRQ: 1 affinity:01 vec:d3 type=IO-APIC-edge
> > status=00000030 in-flight=0 domain-list=0: 1(-S--),
> > (XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC
> > status=00000000 mapped, unbound
> > (XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge
> > status=00000030 in-flight=0 domain-list=0: 8(-S--),
> > (XEN) IRQ: 9 affinity:02 vec:25 type=IO-APIC-level
> > status=00000030 in-flight=0 domain-list=0: 9(-S--),
> > (XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > (XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge
> > status=00000002 mapped, unbound
> > [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> > elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
> >
> > Let me know if you need any more info.
> > Thanks,
> > AP
> >
>
> There should be quite a lot more irq information dumped than just that.
> Was there any more on the console or had it given up by that point? It

There was nothing more on the console. The system was hung.

> might be worth trying to set synchronous console to get all of that
> debug information?

I was running with sync_console and console_to_ring options.

> How easy is this error to reproduce for you? I never managed to
> reproduce it reliably enough to be able to debug?

I cannot reproduce it easily either.

> If you could provide your Xen boot console log, that would be very useful

I will send full logs the next time I see the problem.

Thanks,
AP


apxeng at gmail

May 5, 2012, 12:06 PM

Post #13 of 24 (342 views)
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Sat, May 5, 2012 at 11:41 AM, AP <apxeng [at] gmail> wrote:
>
> On Sat, May 5, 2012 at 4:04 AM, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> >
> >
> > > Thanks, that fixed it. Here is what I see now:
> > >
> > > (XEN) *** IRQ BUG found ***
> > > (XEN) CPU0 -Testing vector 236 from bitmap
> > > 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> > > (XEN) Guest interrupt information:
> > > (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge
> > > status=00000000 mapped, unbound
> > > (XEN) IRQ: 1 affinity:01 vec:d3 type=IO-APIC-edge
> > > status=00000030 in-flight=0 domain-list=0: 1(-S--),
> > > (XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC
> > > status=00000000 mapped, unbound
> > > (XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge
> > > status=00000030 in-flight=0 domain-list=0: 8(-S--),
> > > (XEN) IRQ: 9 affinity:02 vec:25 type=IO-APIC-level
> > > status=00000030 in-flight=0 domain-list=0: 9(-S--),
> > > (XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > (XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge
> > > status=00000002 mapped, unbound
> > > [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> > > elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
> > >
> > > Let me know if you need any more info.
> > > Thanks,
> > > AP
> > >
> >
> > There should be quite a lot more irq information dumped than just that.
> > Was there any more on the console or had it given up by that point? It
>
> There was nothing more on the console. The system was hung.
>
> > might be worth trying to set synchronous console to get all of that
> > debug information?
>
> I was running with sync_console and console_to_ring options.
>
>
> > How easy is this error to reproduce for you? I never managed to
> > reproduce it reliably enough to be able to debug?
>
> I cannot reproduce it easily either.
>
>
> > If you could provide your Xen boot console log, that would be very useful
>
> I will send full logs the next time I see the problem.

I have attached the full logs. I had a CentOS 5.6 and a Windows 7 HVM
domain running.

Thanks,
AP
Attachments: irq_bug.log (70.4 KB)


JBeulich at suse

May 7, 2012, 1:10 AM

Post #14 of 24 (336 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 05.05.12 at 02:21, AP <apxeng [at] gmail> wrote:
> (XEN) *** IRQ BUG found ***
> (XEN) CPU0 -Testing vector 236 from bitmap

236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
in through the 8259A. Something fundamentally fishy must be going
on here, and I would suppose the code in question shouldn't even be
reached for legacy vectors.
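
To spell out the arithmetic above, here is a minimal sketch. It assumes FIRST_LEGACY_VECTOR is 0xe0 (which is what 0xec - 0x0c implies) and that the legacy block spans 16 vectors, one per 8259A input; the helper names are made up for illustration and are not Xen functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values: 0xe0 follows from the arithmetic in the post
 * (0xec - 0x0c); the 16-entry span is an assumption covering the
 * two cascaded 8259A controllers. */
#define FIRST_LEGACY_VECTOR 0xe0
#define LAST_LEGACY_VECTOR  0xef

/* True if the vector lies in the range reserved for 8259A interrupts. */
static bool is_legacy_vector(unsigned int vector)
{
    return vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR;
}

/* Map a legacy vector back to its 8259A IRQ number (0..15). */
static unsigned int legacy_vector_to_irq(unsigned int vector)
{
    assert(is_legacy_vector(vector));
    return vector - FIRST_LEGACY_VECTOR;
}
```

With these definitions, vector 236 maps to IRQ 12, while 0xd4 (the vector the IO-APIC path uses for IRQ 12 in AP's later dump) is outside the legacy range.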

Furthermore, calling dump_irqs() from the debugging code with
desc->lock still held makes it impossible to get full output, as that
function wants to lock all initialized IRQ descriptors.
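
As an illustration of the locking problem only, here is a toy model. The lock type and the dump routine are hypothetical stand-ins, not the Xen implementation; the sketch uses a try-lock so that it terminates, whereas a blocking spinlock would simply never return when the caller already holds one of the locks.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock; stands in for desc->lock in this sketch only. */
struct toy_lock { bool held; };

static bool toy_trylock(struct toy_lock *l)
{
    if (l->held)
        return false;   /* a blocking spinlock would spin here instead */
    l->held = true;
    return true;
}

static void toy_unlock(struct toy_lock *l)
{
    l->held = false;
}

/* Hypothetical dump routine: takes each descriptor lock in turn and
 * returns how many descriptors it managed to report. */
static unsigned int toy_dump_all(struct toy_lock *locks, unsigned int n)
{
    unsigned int i, dumped = 0;

    for (i = 0; i < n; i++) {
        if (!toy_trylock(&locks[i]))
            continue;   /* already held by the caller: entry is lost */
        dumped++;
        toy_unlock(&locks[i]);
    }
    return dumped;
}
```

Calling toy_dump_all() while one lock is still held reports one descriptor fewer than calling it after that lock has been dropped, which is the "no full output" effect in miniature.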

Jan

> 37,41,49,51,64,72,80,88,96,104,120,136,145,152,158,160,168,175,182,192,200,211
> (XEN) Guest interrupt information:
> (XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge status=00000000
> mapped, unbound
> (XEN) IRQ: 1 affinity:01 vec:d3 type=IO-APIC-edge status=00000030
> in-flight=0 domain-list=0: 1(-S--),
> (XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC status=00000000
> mapped, unbound
> (XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge status=00000030
> in-flight=0 domain-list=0: 8(-S--),
> (XEN) IRQ: 9 affinity:02 vec:25 type=IO-APIC-level status=00000030
> in-flight=0 domain-list=0: 9(-S--),
> (XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge status=00000002
> mapped, unbound
> [ 5129.737147] [drm:i915_hangcheck_ring_idle] *ERROR* Hangcheck timer
> elapsed... blt ring idle [waiting on 1800652, at 1800652], missed IRQ?
>
> Let me know if you need any more info.
> Thanks,
> AP




_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


andrew.cooper3 at citrix

May 7, 2012, 4:50 AM

Post #15 of 24 (333 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 07/05/2012 09:10, Jan Beulich wrote:
>>>> On 05.05.12 at 02:21, AP <apxeng [at] gmail> wrote:
>> (XEN) *** IRQ BUG found ***
>> (XEN) CPU0 -Testing vector 236 from bitmap
> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
> in through the 8259A. Something fundamentally fishy must be going
> on here, and I would suppose the code in question shouldn't even be
> reached for legacy vectors.
>
> Furthermore, calling dump_irqs() from the debugging code with
> desc->lock still held makes it impossible to get full output, as that
> function wants to lock all initialized IRQ descriptors.
>
> Jan

Yes - it has been vector 236 on each of the 3 reported failures from AP,
and I believe it was also vector 236 in the one case I managed to
reproduce the issue.

However, once we have set up the IO-APIC, the 8259A should not be used
any more. The boot dmesg shows that io_ack_method is indeed "old" (which
was going to be my first suggestion), and that EOI Broadcast Suppression
is enabled, which I have already identified as a source of problems for
some customers. As a 'fix', I provided the ability for
"io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
completely fixed the customer problems - just made it substantially more
rare.

AP: Can you manually invoke the 'i' debug key and provide that - it will
help to see how Xen is setting up the IO-APIC(s) on your system.

~Andrew



JBeulich at suse

May 7, 2012, 6:34 AM

Post #16 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 07.05.12 at 13:50, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>> On 05.05.12 at 02:21, AP <apxeng [at] gmail> wrote:
>>> (XEN) *** IRQ BUG found ***
>>> (XEN) CPU0 -Testing vector 236 from bitmap
>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>> in through the 8259A. Something fundamentally fishy must be going
>> on here, and I would suppose the code in question shouldn't even be
>> reached for legacy vectors.
>>
>> Furthermore, calling dump_irqs() from the debugging code with
>> desc->lock still held makes it impossible to get full output, as that
>> function wants to lock all initialized IRQ descriptors.
>
> Yes - it has been vector 236 on each of the 3 reported failures from AP,
> and I believe it was also vector 236 in the one case I managed to
> reproduce the issue.
>
> However, once we have set up the IO-APIC, the 8259A should not be used
> any more. The boot dmesg shows that io_ack_method is indeed "old" (which
> was going to be my first suggestion), and that EOI Broadcast Suppression
> is enabled, which I have already identified as a source of problems for
> some customers. As a 'fix', I provided the ability for
> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
> completely fixed the customer problems - just made it substantially more
> rare.
>
> AP: Can you manually invoke the 'i' debug key and provide that - it will
> help to see how Xen is setting up the IO-APIC(s) on your system.

Seeing the 'z' output might also be helpful, especially to see whether
any of the IO-APICs' RTEs is an ExtINT one.

Further, checking that no 8259A IRQ got (or was left) enabled for
some reason might be useful as well (cached_irq_mask plus the raw
port 0x21 and 0xA1 values).
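
For anyone decoding those values by hand, a small sketch follows. It assumes the usual PC layout: port 0x21 holds the master mask (IRQ0-7), port 0xA1 the slave mask (IRQ8-15), and a set bit means the line is masked. It also assumes cached_irq_mask uses the same 16-bit layout (master in the low byte). The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Combine the two Interrupt Mask Registers into one 16-bit mask.
 * master_imr: value read from port 0x21 (IRQ0-7)
 * slave_imr:  value read from port 0xA1 (IRQ8-15) */
static uint16_t pic_combined_mask(uint8_t master_imr, uint8_t slave_imr)
{
    return (uint16_t)(master_imr | ((uint16_t)slave_imr << 8));
}

/* A set bit means masked (disabled); a clear bit means the line can fire. */
static bool pic_irq_enabled(uint16_t mask, unsigned int irq)
{
    assert(irq < 16);
    return !(mask & (1u << irq));
}
```

With everything masked (0xff on both ports) no line is enabled. A slave value of 0xef, for instance, would mean bit 4 is clear, i.e. IRQ12 is unmasked at the 8259A.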

In any case the debugging code's locking should be fixed.

Jan




andrew.cooper3 at citrix

May 7, 2012, 7:41 AM

Post #17 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 07/05/2012 14:34, Jan Beulich wrote:
>>>> On 07.05.12 at 13:50, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>> On 07/05/2012 09:10, Jan Beulich wrote:
>>>>>> On 05.05.12 at 02:21, AP <apxeng [at] gmail> wrote:
>>>> (XEN) *** IRQ BUG found ***
>>>> (XEN) CPU0 -Testing vector 236 from bitmap
>>> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
>>> in through the 8259A. Something fundamentally fishy must be going
>>> on here, and I would suppose the code in question shouldn't even be
>>> reached for legacy vectors.
>>>
>>> Furthermore, calling dump_irqs() from the debugging code with
>>> desc->lock still held makes it impossible to get full output, as that
>>> function wants to lock all initialized IRQ descriptors.
>> Yes - it has been vector 236 on each of the 3 reported failures from AP,
>> and I believe it was also vector 236 in the one case I managed to
>> reproduce the issue.
>>
>> However, once we have set up the IO-APIC, the 8259A should not be used
>> any more. The boot dmesg shows that io_ack_method is indeed "old" (which
>> was going to be my first suggestion), and that EOI Broadcast Suppression
>> is enabled, which I have already identified as a source of problems for
>> some customers. As a 'fix', I provided the ability for
>> "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
>> This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
>> completely fixed the customer problems - just made it substantially more
>> rare.
>>
>> AP: Can you manually invoke the 'i' debug key and provide that - it will
>> help to see how Xen is setting up the IO-APIC(s) on your system.
> Seeing the 'z' output might also be helpful, especially to see whether
> any of the IO-APICs' RTEs is an ExtINT one.
>
> Further, checking that no 8259A IRQ got (or was left) enabled for
> some reason might be useful as well (cached_irq_mask plus the raw
> port 0x21 and 0xA1 values).
>
> In any case the debugging code's locking should be fixed.
>
> Jan
>

It appears we have two functions to dump the IO-APIC state:
__print_IO_APIC() which gets called on boot and from 'z', and
dump_ioapic_irq_info() which gets called from the end of 'i'. These
should probably be consolidated somehow.

As for the debugging, perhaps replace the call to dump_irqs() with a call
to dump_ioapic_irq_info() instead.

Given that the legacy vectors can't migrate, is it wise including them in
the loop in irq_move_cleanup_interrupt()? In fact, is it wise including
any vector above LAST_DYNAMIC_VECTOR?

~Andrew



JBeulich at suse

May 7, 2012, 7:50 AM

Post #18 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 07.05.12 at 16:41, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> Given that the legacy vectors can't migrate, is it wise including them in
> the loop in irq_move_cleanup_interrupt()? In fact, is it wise including
> any vector above LAST_DYNAMIC_VECTOR?

Likely not, but then again this is the final piece of moving an interrupt,
so there must have been something earlier that incorrectly initiated a
move. In other words, rather than fixing the loop here, we should
make sure execution can't even make it there for legacy vectors.

And of course this is irrespective of the fact that no legacy interrupt
should occur in the first place, unless this is a very strange system.

Jan




JBeulich at suse

May 7, 2012, 7:54 AM

Post #19 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 07.05.12 at 16:41, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> It appears we have two functions to dump the IO-APIC state:
> __print_IO_APIC() which gets called on boot and from 'z', and
> dump_ioapic_irq_info() which gets called from the end of 'i'. These
> should probably be consolidated somehow.

Rather not - 'z' provides information on the IO-APIC that isn't
directly related to specific interrupts, while 'i' (when it comes to
the IO-APIC) is exclusively interested in the RTEs. Unless
dump_ioapic_irq_info() is _fully_ redundant with 'z' (didn't check
in detail yet), in which case I'd vote for removing this function.

Jan




andrew.cooper3 at citrix

May 7, 2012, 8:40 AM

Post #20 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 07/05/2012 15:50, Jan Beulich wrote:
>>>> On 07.05.12 at 16:41, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>> Given that the legacy vectors can't migrate, is it wise including them in
>> the loop in irq_move_cleanup_interrupt()? In fact, is it wise including
>> any vector above LAST_DYNAMIC_VECTOR?
> Likely not, but then again this is the final piece of moving an interrupt,
> so there must have been something earlier that incorrectly initiated a
> move. In other words, rather than fixing the loop here, we should
> make sure execution can't even make it there for legacy vectors.
>
> And of course this is irrespective of the fact that no legacy interrupt
> should occur in the first place, unless this is a very strange system.
>
> Jan
>

The only way to get to this point is if desc->arch.move_cleanup_count is
non-zero, in which case one of these functions:

hpet_msi_ack (hpet.c)
ack_edge_ioapic_irq (io_apic.c)
mask_and_ack_level_ioapic_irq (io_apic.c)
ack_nonmaskable_msi_irq (msi.c)
iommu_msi_mask (iommu_init.c)
dma_msi_mask (iommu.c)

has called irq_complete_move, after something has called
__assign_irq_vector() to move the irq to another CPU.

I would say something very fishy is going on - no desc used by any of
those functions should have a vector from the legacy region.

As for the loop, it is probably quite sensible to reduce that down to
LAST_DYNAMIC_VECTOR. Leaving it at NR_VECTORS is just 32 wasted
iterations of the loop in interrupt context.



JBeulich at suse

May 7, 2012, 8:43 AM

Post #21 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 07.05.12 at 17:40, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> As for the loop, it is probably quite sensible to reduce that down to
> LAST_DYNAMIC_VECTOR. Leaving it at NR_VECTORS is just 32 wasted
> iterations of the loop in interrupt context.

No, you can't stop there. You'd have to skip the legacy vectors, and
continue with the ones Xen itself may have in use.
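
A rough sketch of what such a scan could look like is below. The constants are assumptions chosen to match the numbers quoted in this thread (32 vectors above LAST_DYNAMIC_VECTOR, the legacy block starting at 0xe0 and assumed to be 16 wide), and the function is a hypothetical stand-in that only counts iterations rather than touching any descriptor.

```c
#include <assert.h>

/* Assumed layout, consistent with the numbers quoted in the thread. */
#define NR_VECTORS           256
#define FIRST_DYNAMIC_VECTOR 0x20
#define FIRST_LEGACY_VECTOR  0xe0
#define LAST_LEGACY_VECTOR   0xef

/* Hypothetical helper: count the vectors a cleanup scan would visit
 * when it skips the legacy block but still covers everything above it. */
static unsigned int count_scanned_vectors(void)
{
    unsigned int vector, visited = 0;

    for (vector = FIRST_DYNAMIC_VECTOR; vector < NR_VECTORS; vector++) {
        if (vector >= FIRST_LEGACY_VECTOR && vector <= LAST_LEGACY_VECTOR)
            continue;   /* 8259A vectors never migrate */
        visited++;      /* the real loop would inspect the descriptor here */
    }
    return visited;
}
```

Under these assumptions the scan visits 208 vectors instead of 224: the 16 legacy ones are skipped, and the 16 above them are still covered.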

Jan




andrew.cooper3 at citrix

May 7, 2012, 8:51 AM

Post #22 of 24 (332 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On 07/05/2012 15:54, Jan Beulich wrote:
>>>> On 07.05.12 at 16:41, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>> It appears we have two functions to dump the IO-APIC state:
>> __print_IO_APIC() which gets called on boot and from 'z', and
>> dump_ioapic_irq_info() which gets called from the end of 'i'. These
>> should probably be consolidated somehow.
> Rather not - 'z' provides information on the IO-APIC that isn't
> directly related to specific interrupts, while 'i' (when it comes to
> the IO-APIC) is exclusively interested in the RTEs. Unless
> dump_ioapic_irq_info() is _fully_ redundant with 'z' (didn't check
> in detail yet), in which case I'd vote for removing this function.
>
> Jan
>

dump_ioapic_irq_info() loops through nr_irqs_gsi and uses irq_2_pin to
work out which io-apic RTE to read and decode.

__print_IO_APIC() loops through nr_ioapics, then through each RTE and
decodes it. At the end, it loops through nr_irqs_gsi and matches irqs
to ioapic:pin pairs.

So they are probably different enough to be worth keeping.

~Andrew



apxeng at gmail

May 7, 2012, 11:29 AM

Post #23 of 24 (339 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

On Mon, May 7, 2012 at 1:34 PM, Jan Beulich <JBeulich [at] suse> wrote:
>
> >>> On 07.05.12 at 13:50, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> > On 07/05/2012 09:10, Jan Beulich wrote:
> >>>>> On 05.05.12 at 02:21, AP <apxeng [at] gmail> wrote:
> >>> (XEN) *** IRQ BUG found ***
> >>> (XEN) CPU0 -Testing vector 236 from bitmap
> >> 236 = 0xec = FIRST_LEGACY_VECTOR + 0x0c, i.e. an IRQ12 coming
> >> in through the 8259A. Something fundamentally fishy must be going
> >> on here, and I would suppose the code in question shouldn't even be
> >> reached for legacy vectors.
> >>
> >> Furthermore, calling dump_irqs() from the debugging code with
> >> desc->lock still held makes it impossible to get full output, as that
> >> function wants to lock all initialized IRQ descriptors.
> >
> > Yes - it has been vector 236 on each of the 3 reported failures from AP,
> > and I believe it was also vector 236 in the one case I managed to
> > reproduce the issue.
> >
> > However, once we have set up the IO-APIC, the 8259A should not be used
> > any more. The boot dmesg shows that io_ack_method is indeed "old" (which
> > was going to be my first suggestion), and that EOI Broadcast Suppression
> > is enabled, which I have already identified as a source of problems for
> > some customers. As a 'fix', I provided the ability for
> > "io_ack_method=new" to prevent EOI Broadcast Suppression being enabled.
> > This was upstreamed in c/s 24870:9bf3ec036bef, but apparently has not
> > completely fixed the customer problems - just made it substantially more
> > rare.
> >
> > AP: Can you manually invoke the 'i' debug key and provide that - it will
> > help to see how Xen is setting up the IO-APIC(s) on your system.

(XEN) Guest interrupt information:
(XEN) IRQ: 0 affinity:01 vec:f0 type=IO-APIC-edge status=00000000
mapped, unbound
(XEN) IRQ: 1 affinity:02 vec:85 type=IO-APIC-edge status=00000030
in-flight=0 domain-list=0: 1(----),
(XEN) IRQ: 2 affinity:ff vec:e2 type=XT-PIC status=00000000
mapped, unbound
(XEN) IRQ: 3 affinity:01 vec:40 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 4 affinity:01 vec:48 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 5 affinity:01 vec:50 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 6 affinity:01 vec:58 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 7 affinity:01 vec:60 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 8 affinity:08 vec:29 type=IO-APIC-edge status=00000030
in-flight=0 domain-list=0: 8(----),
(XEN) IRQ: 9 affinity:02 vec:7f type=IO-APIC-level status=00000010
in-flight=0 domain-list=0: 9(----),
(XEN) IRQ: 10 affinity:01 vec:78 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 11 affinity:01 vec:88 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 12 affinity:08 vec:d4 type=IO-APIC-edge status=00000030
in-flight=0 domain-list=0: 12(----),
(XEN) IRQ: 13 affinity:0f vec:98 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 14 affinity:01 vec:a0 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 15 affinity:01 vec:a8 type=IO-APIC-edge status=00000002
mapped, unbound
(XEN) IRQ: 16 affinity:02 vec:a6 type=IO-APIC-level status=00000030
in-flight=0 domain-list=0: 16(----),
(XEN) IRQ: 17 affinity:0f vec:c0 type=IO-APIC-level status=00000002
mapped, unbound
(XEN) IRQ: 18 affinity:0f vec:c8 type=IO-APIC-level status=00000002
mapped, unbound
(XEN) IRQ: 19 affinity:0f vec:f1 type=IO-APIC-level status=00000000
mapped, unbound
(XEN) IRQ: 20 affinity:0f vec:61 type=IO-APIC-level status=00000002
mapped, unbound
(XEN) IRQ: 22 affinity:0f vec:32 type=IO-APIC-level status=00000002
mapped, unbound
(XEN) IRQ: 23 affinity:01 vec:ac type=IO-APIC-level status=00000030
in-flight=0 domain-list=0: 23(----),
(XEN) IRQ: 24 affinity:01 vec:28 type=DMA_MSI status=00000000
mapped, unbound
(XEN) IRQ: 25 affinity:01 vec:30 type=DMA_MSI status=00000000
mapped, unbound
(XEN) IRQ: 26 affinity:01 vec:31 type=PCI-MSI/-X status=00000030
in-flight=0 domain-list=0:279(----),
(XEN) IRQ: 27 affinity:01 vec:39 type=PCI-MSI/-X status=00000030
in-flight=0 domain-list=0:278(----),
(XEN) IRQ: 28 affinity:01 vec:41 type=PCI-MSI/-X status=00000030
in-flight=0 domain-list=0:277(----),
(XEN) IRQ: 29 affinity:01 vec:49 type=PCI-MSI/-X status=00000030
in-flight=0 domain-list=0:276(----),
(XEN) IRQ: 30 affinity:01 vec:51 type=PCI-MSI/-X status=00000030
in-flight=0 domain-list=0:275(----),
(XEN) IRQ: 31 affinity:04 vec:d7 type=PCI-MSI status=00000030
in-flight=0 domain-list=0:274(----),
(XEN) IRQ: 32 affinity:04 vec:df type=PCI-MSI status=00000030
in-flight=0 domain-list=0:273(----),
(XEN) IRQ: 33 affinity:02 vec:b0 type=PCI-MSI status=00000010
in-flight=0 domain-list=0:272(----),
(XEN) IRQ: 34 affinity:02 vec:a8 type=PCI-MSI status=00000010
in-flight=0 domain-list=0:271(----),
(XEN) IRQ: 35 affinity:04 vec:ad type=PCI-MSI status=00000030
in-flight=0 domain-list=0:270(----),
(XEN) IO-APIC interrupt information:
(XEN) IRQ 0 Vec240:
(XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 1 Vec133:
(XEN) Apic 0x00, Pin 1: vec=85 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 3 Vec 64:
(XEN) Apic 0x00, Pin 3: vec=40 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 4 Vec 72:
(XEN) Apic 0x00, Pin 4: vec=48 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 5 Vec 80:
(XEN) Apic 0x00, Pin 5: vec=50 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 6 Vec 88:
(XEN) Apic 0x00, Pin 6: vec=58 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 7 Vec 96:
(XEN) Apic 0x00, Pin 7: vec=60 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 8 Vec 41:
(XEN) Apic 0x00, Pin 8: vec=29 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 9 Vec127:
(XEN) Apic 0x00, Pin 9: vec=7f delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 10 Vec120:
(XEN) Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 11 Vec136:
(XEN) Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 12 Vec212:
(XEN) Apic 0x00, Pin 12: vec=d4 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 13 Vec152:
(XEN) Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=1 dest_id:0
(XEN) IRQ 14 Vec160:
(XEN) Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 15 Vec168:
(XEN) Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0
polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 16 Vec166:
(XEN) Apic 0x00, Pin 16: vec=a6 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 17 Vec192:
(XEN) Apic 0x00, Pin 17: vec=c0 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 18 Vec200:
(XEN) Apic 0x00, Pin 18: vec=c8 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 19 Vec241:
(XEN) Apic 0x00, Pin 19: vec=f1 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 20 Vec 97:
(XEN) Apic 0x00, Pin 20: vec=61 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 22 Vec 50:
(XEN) Apic 0x00, Pin 22: vec=32 delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 23 Vec172:
(XEN) Apic 0x00, Pin 23: vec=ac delivery=LoPri dest=L status=0
polarity=1 irr=0 trig=L mask=0 dest_id:0

> Seeing the 'z' output might also be helpful, especially to see whether
> any of the IO-APICs' RTEs is an ExtINT one.

(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) ....... : physical APIC id: 02
(XEN) ....... : Delivery Type: 0
(XEN) ....... : LTS : 0
(XEN) .... register #01: 00170020
(XEN) ....... : max redirection entries: 0017
(XEN) ....... : PRQ implemented: 0
(XEN) ....... : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN) 00 000 00 1 0 0 0 0 0 0 00
(XEN) 01 000 00 0 0 0 0 0 1 1 85
(XEN) 02 000 00 0 0 0 0 0 1 1 F0
(XEN) 03 000 00 0 0 0 0 0 1 1 40
(XEN) 04 000 00 0 0 0 0 0 1 1 48
(XEN) 05 000 00 0 0 0 0 0 1 1 50
(XEN) 06 000 00 0 0 0 0 0 1 1 58
(XEN) 07 000 00 0 0 0 0 0 1 1 60
(XEN) 08 000 00 0 0 0 0 0 1 1 29
(XEN) 09 000 00 0 1 0 0 0 1 1 A7
(XEN) 0a 000 00 0 0 0 0 0 1 1 78
(XEN) 0b 000 00 0 0 0 0 0 1 1 88
(XEN) 0c 000 00 0 0 0 0 0 1 1 D4
(XEN) 0d 000 00 1 0 0 0 0 1 1 98
(XEN) 0e 000 00 0 0 0 0 0 1 1 A0
(XEN) 0f 000 00 0 0 0 0 0 1 1 A8
(XEN) 10 000 00 0 1 0 1 0 1 1 AE
(XEN) 11 000 00 1 1 0 1 0 1 1 C0
(XEN) 12 000 00 1 1 0 1 0 1 1 C8
(XEN) 13 000 00 0 1 0 1 0 1 1 F1
(XEN) 14 000 00 1 1 0 1 0 1 1 61
(XEN) 15 0CA 0A 1 0 0 0 0 1 2 71
(XEN) 16 000 00 1 1 0 1 0 1 1 32
(XEN) 17 000 00 0 1 0 1 0 1 1 AC
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ133 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ41 -> 0:8
(XEN) IRQ167 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ212 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ174 -> 0:16
(XEN) IRQ192 -> 0:17
(XEN) IRQ200 -> 0:18
(XEN) IRQ241 -> 0:19
(XEN) IRQ97 -> 0:20
(XEN) IRQ50 -> 0:22
(XEN) IRQ172 -> 0:23
(XEN) .................................... done.


JBeulich at suse

May 7, 2012, 11:37 PM

Post #24 of 24 (334 views)
Permalink
Re: [xen-unstable test] 11946: regressions - FAIL [In reply to]

>>> On 07.05.12 at 20:29, AP <apxeng [at] gmail> wrote:
> On Mon, May 7, 2012 at 1:34 PM, Jan Beulich <JBeulich [at] suse> wrote:
>> Seeing the 'z' output might also be helpful, especially to see whether
>> any of the IO-APICs' RTEs is an ExtINT one.
>
> (XEN) number of MP IRQ sources: 15.
> (XEN) number of IO-APIC #2 registers: 24.
> (XEN) testing the IO APIC.......................
> (XEN) IO APIC #2......
> (XEN) .... register #00: 02000000
> (XEN) ....... : physical APIC id: 02
> (XEN) ....... : Delivery Type: 0
> (XEN) ....... : LTS : 0
> (XEN) .... register #01: 00170020
> (XEN) ....... : max redirection entries: 0017
> (XEN) ....... : PRQ implemented: 0
> (XEN) ....... : IO APIC version: 0020
> (XEN) .... IRQ redirection table:
> (XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> (XEN) 00 000 00 1 0 0 0 0 0 0 00
> (XEN) 01 000 00 0 0 0 0 0 1 1 85
> (XEN) 02 000 00 0 0 0 0 0 1 1 F0
> (XEN) 03 000 00 0 0 0 0 0 1 1 40
> (XEN) 04 000 00 0 0 0 0 0 1 1 48
> (XEN) 05 000 00 0 0 0 0 0 1 1 50
> (XEN) 06 000 00 0 0 0 0 0 1 1 58
> (XEN) 07 000 00 0 0 0 0 0 1 1 60
> (XEN) 08 000 00 0 0 0 0 0 1 1 29
> (XEN) 09 000 00 0 1 0 0 0 1 1 A7
> (XEN) 0a 000 00 0 0 0 0 0 1 1 78
> (XEN) 0b 000 00 0 0 0 0 0 1 1 88
> (XEN) 0c 000 00 0 0 0 0 0 1 1 D4
> (XEN) 0d 000 00 1 0 0 0 0 1 1 98
> (XEN) 0e 000 00 0 0 0 0 0 1 1 A0
> (XEN) 0f 000 00 0 0 0 0 0 1 1 A8
> (XEN) 10 000 00 0 1 0 1 0 1 1 AE
> (XEN) 11 000 00 1 1 0 1 0 1 1 C0
> (XEN) 12 000 00 1 1 0 1 0 1 1 C8
> (XEN) 13 000 00 0 1 0 1 0 1 1 F1
> (XEN) 14 000 00 1 1 0 1 0 1 1 61
> (XEN) 15 0CA 0A 1 0 0 0 0 1 2 71

This entry is definitely bogus (delivery mode is SMI, which is not
allowed in an IO-APIC RTE), but as it is masked it _shouldn't_
cause any harm.
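
For reference when reading the "Deli" column, here is a small decoder for the 3-bit delivery mode field of a redirection table entry, using the standard IO-APIC encoding. The function names are made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Decode the 3-bit delivery mode field of an IO-APIC redirection entry. */
static const char *rte_delivery_mode_name(unsigned int mode)
{
    assert(mode < 8);
    switch (mode) {
    case 0:  return "Fixed";
    case 1:  return "LowestPriority";
    case 2:  return "SMI";
    case 4:  return "NMI";
    case 5:  return "INIT";
    case 7:  return "ExtINT";
    default: return "Reserved";
    }
}

/* Convenience check for scanning a dumped table for a given mode. */
static bool rte_mode_is(unsigned int mode, const char *name)
{
    return strcmp(rte_delivery_mode_name(mode), name) == 0;
}
```

In the table above, every programmed entry except 0x15 has Deli=1 (lowest priority, matching "delivery=LoPri" in the 'i' output), entry 0x15 has Deli=2 (SMI), and none has Deli=7 (ExtINT).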

> (XEN) 16 000 00 1 1 0 1 0 1 1 32
> (XEN) 17 000 00 0 1 0 1 0 1 1 AC

So we'll need to see the PIC (8259A) masks too. IRQ12 definitely
appears to get touched a lot (judging by the vector it uses), so while
this shouldn't be the case I would nevertheless consider the possibility
of a window where the 8259A interrupt gets temporarily unmasked.

Jan

