
Mailing List Archive: Xen: Devel

cpuidle and un-eoid interrupts at the local apic

 

 



andrew.cooper3 at citrix

May 31, 2013, 1:32 PM

Post #1 of 27 (63 views)
cpuidle and un-eoid interrupts at the local apic

Recently our automated testing system has caught a curious assertion
while testing Xen 4.1.5 on a HaswellDT system.

(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
(XEN) ----[ Xen-4.1.5 x86_64 debug=n Not tainted ]----
(XEN) CPU: 0
(XEN) RIP: e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
(XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
(XEN) rax: 000000000000002f rbx: ffff830249841e80 rcx: ffff82c4803127c0
(XEN) rdx: 0000000000000004 rsi: 0000000000000027 rdi: 0000000000000001
(XEN) rbp: 0000000000001e00 rsp: ffff82c4802bfd48 r8: ffff82c480312abc
(XEN) r9: ffff8302498a5948 r10: 0000000000000009 r11: ffff8302498c6c80
(XEN) r12: ffff830243b07f50 r13: ffff8300a24f8000 r14: 00000af8373788e3
(XEN) r15: ffff830249841e80 cr0: 000000008005003b cr4: 00000000001026f0
(XEN) cr3: 00000002479e6000 cr2: 00000000e6d3c090
(XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
(XEN) Xen stack trace from rsp=ffff82c4802bfd48:
(XEN) ffff830249841eb4 ffff82c480312ec0 000000000000001e 0000001e00000000
(XEN) 0000000000000000 00000000498a5670 ffff830249841d80 ffff830249840080
(XEN) ffff830249841db4 0000000000000000 ffff8302498a55e0 ffff8302498a5670
(XEN) ffff8300a24f8000 00000af8373788e3 00000af83736b8ed ffff82c480162ca0
(XEN) 00000af83736b8ed 00000af8373788e3 ffff8300a24f8000 ffff8302498a5670
(XEN) ffff8302498a55e0 0000000000000000 ffff8302498c6c80 0000000000000009
(XEN) ffff8302498a5948 ffff82c480313000 0000000000007f40 0000000000000001
(XEN) 0000000000000000 0000000000000000 00000af80db652fd 0000002700000000
(XEN) ffff82c4801a50a0 000000000000e008 0000000000000246 ffff82c4802bfe78
(XEN) 0000000000000000 ffff8302498a5670 ffff82c4801a6a56 ffffffffffffffff
(XEN) ffff830249818000 0000000000000000 ffff8300a24f8000 ffff82c480122c11
(XEN) 00000af839021119 0000000000000000 0000000000000000 00000000802bff18
(XEN) 0000025c0000013b ffff82c4802e7580 ffff82c4802bff18 ffff8300a2838000
(XEN) ffff82c4802f61a0 ffff8300a24f8000 0000000000000002 00000af837304b45
(XEN) ffff82c48015b67a 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00000000ee8a3f8c 0000000000000001
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 00000000ee8a3f74 0000000000000af8
(XEN) 0000000000000001 0000010000000000 00000000c01013a7 0000000000000061
(XEN) 0000000000000246 00000000ee8a3f70 0000000000000069 0000000000000000
(XEN) Xen call trace:
(XEN) [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
(XEN) 15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
(XEN) 32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
(XEN) 38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
(XEN) 43[<ffff82c480122c11>] do_block+0x71/0xd0
(XEN) 56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 0:
(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1030
(XEN) ****************************************

And the disassembly before the assertion:

ffff82c48016b29f: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
ffff82c48016b2a6: 00
ffff82c48016b2a7: 0f b6 44 11 ff movzbl -0x1(%rcx,%rdx,1),%eax
ffff82c48016b2ac: 39 c6 cmp %eax,%esi
ffff82c48016b2ae: 0f 8f 5c ff ff ff jg ffff82c48016b210 <do_IRQ+0x470>
ffff82c48016b2b4: 0f 0b ud2


Xen has been woken up by an interrupt of vector 0x27, but has a vector
0x2f on the top of the pending EOI stack for the local APIC.
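
For illustration, the ordering invariant behind this assertion can be sketched as below. This is a simplified model and not the code in irq.c: the names peoi_stack, peoi_push and PEOI_MAX are hypothetical, and only the comparison itself is taken from the assertion text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PEOI_MAX 16 /* hypothetical stack depth, chosen for this sketch only */

struct peoi_entry {
    uint8_t vector; /* vector still awaiting an EOI */
    bool ready;     /* set once the EOI may be issued */
};

struct peoi_stack {
    struct peoi_entry e[PEOI_MAX];
    unsigned int sp; /* number of entries in use */
};

/* Same condition as the failing assertion: (sp == 0) || (top.vector < vector). */
bool peoi_can_push(const struct peoi_stack *s, uint8_t vector)
{
    return s->sp == 0 || s->e[s->sp - 1].vector < vector;
}

/* Push a vector; returns false (and leaves the stack untouched) if the
 * strictly-ascending order would be violated. */
bool peoi_push(struct peoi_stack *s, uint8_t vector)
{
    assert(s->sp < PEOI_MAX);
    if (!peoi_can_push(s, vector))
        return false;
    s->e[s->sp].vector = vector;
    s->e[s->sp].ready = false;
    s->sp++;
    return true;
}
```

In this model, pushing 0x2f and then 0x27 is rejected, which corresponds to the situation in the crash above.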

I have put in more debugging to dump the LAPIC state of the two
interesting vectors and the IOAPIC state, but I have no idea if/when the
problem might reoccur.

My understanding of LAPIC priority leads me to think that Xen really
shouldn't be woken up by a lower priority vector if a higher priority
one is still un-eoi'd. There is not yet sufficient information to tell
whether this is truly the case, or whether Xen has simply gotten confused
about which vectors it eoi'd.
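
The priority rule referred to here can be sketched as follows. This is a simplified model of the architectural behaviour (priority class = upper four bits of the vector, delivery only when the class exceeds the processor priority class), not Xen code; the function names are hypothetical, and a highest_isr value of 0 is used to mean "nothing in service".

```c
#include <stdbool.h>
#include <stdint.h>

/* Priority class of a vector: its upper four bits. */
static inline unsigned int prio_class(uint8_t vector)
{
    return vector >> 4;
}

/*
 * Simplified model: the processor priority class is the larger of the
 * task priority class and the class of the highest in-service vector.
 * A fixed interrupt is dispatched only if its class is strictly greater.
 */
bool lapic_would_deliver(uint8_t tpr, uint8_t highest_isr, uint8_t vector)
{
    unsigned int ppr_class = prio_class(tpr);

    if (prio_class(highest_isr) > ppr_class)
        ppr_class = prio_class(highest_isr);

    return prio_class(vector) > ppr_class;
}
```

Under this model, vectors 0x27 and 0x2f share priority class 2, so with 0x2f still in service the LAPIC would not be expected to dispatch 0x27.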

Having said that, we do keep line-level interrupts un-eoi'd for extended
periods while guests service the interrupt. Given that vectors are
chosen at random, we could get into a situation where a line-level
interrupt has vector 0xdf and stays pending for 150ms (which I measured
as a not-overly-uncommon mean-time-till-eoi for a line-level interrupt).
This would starve any other guest interrupts for an extended period.
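
To put a rough number on this, the sketch below counts how many vectors are held back while a given vector remains un-eoi'd. It assumes vectors 0x20-0xff are the ones available for external interrupts, ignores the task priority register, and uses a hypothetical helper name; it is an illustration and not Xen code.

```c
#include <stdint.h>

#define FIRST_EXT_VECTOR 0x20u /* assumption: 0x00-0x1f are reserved for exceptions */
#define LAST_VECTOR      0xffu

/*
 * Count the vectors whose priority class (upper four bits) is not
 * strictly greater than that of the in-service vector, i.e. the ones
 * that stay blocked until the in-service vector is EOI'd.
 */
unsigned int vectors_blocked_by(uint8_t in_service)
{
    unsigned int blocked = 0;

    for (unsigned int v = FIRST_EXT_VECTOR; v <= LAST_VECTOR; v++)
        if ((v >> 4) <= ((unsigned int)in_service >> 4))
            blocked++;

    return blocked;
}
```

With 0xdf in service, this model blocks every vector from 0x20 to 0xdf, leaving only classes 0xe and 0xf deliverable.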

Given directed-eoi support in the past few generations of processor, the
requirement for the pending EOI stack has disappeared as far as I am
aware. Would it be a sensible idea in general to make use of the pending
eoi stack conditional on not having/using directed EOI support?

~Andrew

_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


JBeulich at suse

Jun 3, 2013, 7:30 AM

Post #2 of 27 (54 views)
Re: cpuidle and un-eoid interrupts at the local apic

>>> On 31.05.13 at 22:32, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> Xen has been woken up by an interrupt of vector 0x27, but has a vector
> 0x2f on the top of the pending EOI stack for the local APIC.
>
> I have put in more debugging to dump the LAPIC state of the two
> interesting vectors and the IOAPIC state, but I have no idea if/when the
> problem might reoccur.
>
> My understanding of LAPIC priority leads me to think that Xen really
> shouldn't be woken up by a lower priority vector if a higher priority
> one is still un-eoi'd. There is not yet sufficient information to tell
> whether this is truly the case, or that Xen has simply gotten confused
> about which vectors it eoi'd.

Considering that this was on a Haswell and has so far not been reported
by anyone else, I wonder whether it is related to some effect of
(or flaw in) APIC virtualization. But of course without knowing the
state of the LAPIC, that's hard to tell for sure. All the more so since
a stray ack_APIC_irq() could lead to the same effect, and since EDX
(holding "sp") has a value of 4 - quite a few lower priority vectors
awaiting an EOI, considering that vector group 2x is the lowest possible
one (i.e. the other entries on the stack ought to have even larger
vector numbers).

> Having said that, we do keep line level interrupts un-eoi'd for extended
> periods while guests service the interrupt. Given that vectors are
> chosen at random, we could get into a situation where a line interrupt
> has a vector 0xdf and stays pending for 150ms (which I measured as a
> not-overly-uncommon mean-time-till-eoi for line level interrupt). This
> would starve any other guest interrupts for an extended period.
>
> Given directed-eoi support in the past few generations of processor, the
> requirement for the pending EOI stack has disappeared as far as I am
> aware. Would it be a sensible idea in general to make use of the pending
> eoi stack conditional on not having/using directed EOI support?

We don't use ACKTYPE_EOI in that case: setup_IO_APIC() only sets
ioapic_level_type.ack to irq_complete_move (consumed by
pirq_acktype()) when ioapic_ack_new, and directed EOI implies
!ioapic_ack_new (see verify_local_APIC()). The only other case of
using ACKTYPE_EOI is for non-maskable MSIs.

Jan




abc at digithi

Jul 31, 2013, 1:30 AM

Post #3 of 27 (50 views)
Re: cpuidle and un-eoid interrupts at the local apic

Hello all,

I also have a Haswell system. I am running XenServer 6.2 (with Xen
4.1.5) on it and I am experiencing the same issue. Do you already have a
solution for this problem?

Best regards
Thimo

(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
(XEN) ----[ Xen-4.1.5.debug x86_64 debug=y Not tainted ]----
(XEN) CPU: 1
(XEN) RIP: e008:[<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
(XEN) RFLAGS: 0000000000010002 CONTEXT: hypervisor
(XEN) rax: 0000000000000001 rbx: ffff83081f080f00 rcx: ffff83081f05b340
(XEN) rdx: 0000000000000001 rsi: 000000000000002b rdi: 0000000000000001
(XEN) rbp: ffff83081f057d88 rsp: ffff83081f057d18 r8: ffff83081f05b63c
(XEN) r9: 000070044fb97100 r10: ffff8300b858c060 r11: 000020f3f5a4dea5
(XEN) r12: 000000000000002b r13: ffff83081f004e80 r14: 000000000000001d
(XEN) r15: 0000000000000002 cr0: 000000008005003b cr4: 00000000001026f0
(XEN) cr3: 000000045915f000 cr2: 0000000000150008
(XEN) ds: 0000 es: 0000 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from rsp=ffff83081f057d18:
(XEN) 000000000000001d 000000000000001d ffff83081f080f00 0000000000000000
(XEN) 00000000ffffffea ffff83081f080f00 0000000000000000 0000000000000000
(XEN) ffffffffffffffff ffff83081f057f18 ffff83081f06bb00 ffff83081f06bb90
(XEN) ffff8300b858c000 0000000000000002 00007cf7e0fa8247 ffff82c480161a66
(XEN) 0000000000000002 ffff8300b858c000 ffff83081f06bb90 ffff83081f06bb00
(XEN) ffff83081f057ef0 ffff83081f057f18 000020f3f5a4dea5 ffff8300b858c060
(XEN) 000070044fb97100 ffff83081f05bb80 0000000000007f40 0000000000000001
(XEN) 0000000000000000 000020f3c755a972 ffff83081f06bb90 0000002b00000000
(XEN) ffff82c4801a21f0 000000000000e008 0000000000000246 ffff83081f057e48
(XEN) 000000000000e010 ffff83081f057ef0 ffff82c4801a3dc4 000020f3f595c09c
(XEN) 000020f3f596987e ffff8306383e3010 ffff83081f05b100 ffffffffffffffff
(XEN) 0000000000000001 0000000000000001 ffffffffffffffff ffff83081f057f18
(XEN) 00000000802d4680 0000000000000000 0000000000000000 ffff82c4802d4680
(XEN) 000002a80000024b ffff8300b8586000 ffff83081f057f18 ffff8300b8586000
(XEN) ffff8300b858c000 ffff8300b858c000 0000000000000002 ffff83081f057f10
(XEN) ffff82c48015a261 ffff82c480126ccd 0000000000000001 ffff83081f057d18
(XEN) 0000000000000000 0000000000000000 0000000000000000 0000000000000000
(XEN) 0000000000000000 0000000000000000 0000000000000246 ffff88001a8093a0
(XEN) 0000000100885e0f 000000000000000f 0000000000000000 ffffffff802063aa
(XEN) 0000000000000001 00000000deadbeef 00000000deadbeef 0000010000000000
(XEN) Xen call trace:
(XEN) [<ffff82c480169662>] do_IRQ+0x3ba/0x6d9
(XEN) [<ffff82c480161a66>] common_interrupt+0x26/0x30
(XEN) [<ffff82c4801a21f0>] lapic_timer_nop+0x0/0x6
(XEN) [<ffff82c48015a261>] idle_loop+0x48/0x59
(XEN)
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at irq.c:1027
(XEN) ****************************************
(XEN)
(XEN) Reboot in five seconds...





andrew.cooper3 at citrix

Jul 31, 2013, 2:47 AM

Post #4 of 27 (49 views)
Re: cpuidle and un-eoid interrupts at the local apic

On 31/07/13 09:30, Thimo E. wrote:
> Hello all,
>
> I have also a Haswell system. I am running XenServer 6.2 (with Xen
> 4.1.5) on it and I am experiencing the same issue. Do you already have
> a solution for this problem ?
>
> Best regards
> Thimo

Hi,

We are still none the wiser on this issue. I have a debugging patch to
get more information, but the problem hasn't reoccurred since. This is
now 2 crashes on Xen 4.1 and a single crash on Xen 4.2 that I have seen.

For the benefit of anyone else who runs over this issue in the meantime,
the patch (against Xen-4.3) is attached.

Thimo: I shall put a new version of the XenServer 6.2 Xen with the
debugging patch on the forum thread.

~Andrew

Attachments: ca-107844-debug.patch (2.85 KB)


abc at digithi

Aug 2, 2013, 3:50 PM

Post #5 of 27 (36 views)
Re: cpuidle and un-eoid interrupts at the local apic

Hi,

I've already posted it in the forum thread, but to keep all of you up to
date on this issue I am copying the logfile into this thread, too:

XenServer crashed again; attached you'll find the output with the verbose
messages Andrew inserted into the code.

Best regards
Thimo


On 31.07.2013 11:47, Andrew Cooper wrote:
> On 31/07/13 09:30, Thimo E. wrote:
>> Hello all,
>>
>> I have also a Haswell system. I am running XenServer 6.2 (with Xen
>> 4.1.5) on it and I am experiencing the same issue. Do you already have
>> a solution for this problem ?
>>
>> Best regards
>> Thimo
> Hi,
>
> We are still none the wiser on this issue. I have a debugging patch to
> get more information, but the problem hasn't reoccurred since. This is
> now 2 crashes on Xen 4.1 and a single crash on Xen 4.2 that I have seen.
>
> For the benefit of anyone else who runs over this issue in the meantime,
> the patch (against Xen-4.3) is attached.
>
> Thimo: I shall put a new version of the XenServer 6.2 Xen with the
> debugging patch on the forum thread.
>
> ~Andrew
>
>> On 31.05.2013 22:32, Andrew Cooper wrote:
>>> Recently our automated testing system has caught a curious assertion
>>> while testing Xen 4.1.5 on a HaswellDT system.
>>>
>>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
>>> irq.c:1030
>>> (XEN) ----[ Xen-4.1.5 x86_64 debug=n Not tainted ]----
>>> (XEN) CPU: 0
>>> (XEN) RIP: e008:[<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>>> (XEN) RFLAGS: 0000000000010093 CONTEXT: hypervisor
>>> (XEN) rax: 000000000000002f rbx: ffff830249841e80 rcx:
>>> ffff82c4803127c0
>>> (XEN) rdx: 0000000000000004 rsi: 0000000000000027 rdi:
>>> 0000000000000001
>>> (XEN) rbp: 0000000000001e00 rsp: ffff82c4802bfd48 r8:
>>> ffff82c480312abc
>>> (XEN) r9: ffff8302498a5948 r10: 0000000000000009 r11:
>>> ffff8302498c6c80
>>> (XEN) r12: ffff830243b07f50 r13: ffff8300a24f8000 r14:
>>> 00000af8373788e3
>>> (XEN) r15: ffff830249841e80 cr0: 000000008005003b cr4:
>>> 00000000001026f0
>>> (XEN) cr3: 00000002479e6000 cr2: 00000000e6d3c090
>>> (XEN) ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0000 cs: e008
>>> (XEN) Xen stack trace from rsp=ffff82c4802bfd48:
>>> (XEN) ffff830249841eb4 ffff82c480312ec0 000000000000001e
>>> 0000001e00000000
>>> (XEN) 0000000000000000 00000000498a5670 ffff830249841d80
>>> ffff830249840080
>>> (XEN) ffff830249841db4 0000000000000000 ffff8302498a55e0
>>> ffff8302498a5670
>>> (XEN) ffff8300a24f8000 00000af8373788e3 00000af83736b8ed
>>> ffff82c480162ca0
>>> (XEN) 00000af83736b8ed 00000af8373788e3 ffff8300a24f8000
>>> ffff8302498a5670
>>> (XEN) ffff8302498a55e0 0000000000000000 ffff8302498c6c80
>>> 0000000000000009
>>> (XEN) ffff8302498a5948 ffff82c480313000 0000000000007f40
>>> 0000000000000001
>>> (XEN) 0000000000000000 0000000000000000 00000af80db652fd
>>> 0000002700000000
>>> (XEN) ffff82c4801a50a0 000000000000e008 0000000000000246
>>> ffff82c4802bfe78
>>> (XEN) 0000000000000000 ffff8302498a5670 ffff82c4801a6a56
>>> ffffffffffffffff
>>> (XEN) ffff830249818000 0000000000000000 ffff8300a24f8000
>>> ffff82c480122c11
>>> (XEN) 00000af839021119 0000000000000000 0000000000000000
>>> 00000000802bff18
>>> (XEN) 0000025c0000013b ffff82c4802e7580 ffff82c4802bff18
>>> ffff8300a2838000
>>> (XEN) ffff82c4802f61a0 ffff8300a24f8000 0000000000000002
>>> 00000af837304b45
>>> (XEN) ffff82c48015b67a 0000000000000000 0000000000000000
>>> 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f8c
>>> 0000000000000001
>>> (XEN) 0000000000000000 0000000000000000 0000000000000000
>>> 0000000000000000
>>> (XEN) 0000000000000000 0000000000000000 00000000ee8a3f74
>>> 0000000000000af8
>>> (XEN) 0000000000000001 0000010000000000 00000000c01013a7
>>> 0000000000000061
>>> (XEN) 0000000000000246 00000000ee8a3f70 0000000000000069
>>> 0000000000000000
>>> (XEN) Xen call trace:
>>> (XEN) [<ffff82c48016b2b4>] do_IRQ+0x514/0x750
>>> (XEN) 15[<ffff82c480162ca0>] common_interrupt+0x20/0x30
>>> (XEN) 32[<ffff82c4801a50a0>] lapic_timer_nop+0x0/0x10
>>> (XEN) 38[<ffff82c4801a6a56>] acpi_processor_idle+0x376/0x740
>>> (XEN) 43[<ffff82c480122c11>] do_block+0x71/0xd0
>>> (XEN) 56[<ffff82c48015b67a>] idle_loop+0x1a/0x50
>>> (XEN)
>>> (XEN)
>>> (XEN) ****************************************
>>> (XEN) Panic on CPU 0:
>>> (XEN) Assertion '(sp == 0) || (peoi[sp-1].vector < vector)' failed at
>>> irq.c:1030
>>> (XEN) ****************************************
>>>
>>> And the disassembly before the assertion:
>>>
>>> ffff82c48016b29f: 48 8d 14 85 00 00 00 lea 0x0(,%rax,4),%rdx
>>> ffff82c48016b2a6: 00
>>> ffff82c48016b2a7: 0f b6 44 11 ff movzbl -0x1(%rcx,%rdx,1),%eax
>>> ffff82c48016b2ac: 39 c6 cmp %eax,%esi
>>> ffff82c48016b2ae: 0f 8f 5c ff ff ff jg ffff82c48016b210 <do_IRQ+0x470>
>>> ffff82c48016b2b4: 0f 0b ud2
>>>
>>>
>>> Xen has been woken up by an interrupt of vector 0x27, but has a vector
>>> 0x2f on the top of the pending EOI stack for the local APIC.
>>>
>>> I have put in more debugging to dump the LAPIC state of the two
>>> interesting vectors and the IOAPIC state, but I have no idea if/when the
>>> problem might reoccur.
>>>
>>> My understanding of LAPIC priority leads me to think that Xen really
>>> shouldn't be woken up by a lower priority vector if a higher priority
>>> one is still un-eoi'd. There is not yet sufficient information to tell
>>> whether this is truly the case, or that Xen has simply gotten confused
>>> about which vectors it eoi'd.
>>>
>>> Having said that, we do keep line level interrupts un-eoi'd for extended
>>> periods while guests service the interrupt. Given that vectors are
>>> chosen at random, we could get into a situation where a line interrupt
>>> has a vector 0xdf and stays pending for 150ms (which I measured as a
>>> not-overly-uncommon mean-time-till-eoi for line level interrupt). This
>>> would starve any other guest interrupts for an extended period.
>>>
>>> Given directed-eoi support in the past few generations of processor, the
>>> requirement for the pending EOI stack has disappeared as far as I am
>>> aware. Would it be a sensible idea in general to make use of the pending
>>> eoi stack conditional on not having/using directed EOI support?
>>>
>>> ~Andrew
>>>
>>> _______________________________________________
>>> Xen-devel mailing list
>>> Xen-devel [at] lists
>>> http://lists.xen.org/xen-devel
Attachments: 20130803-crash.log (9.30 KB)


andrew.cooper3 at citrix

Aug 2, 2013, 4:32 PM

Post #6 of 27 (36 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 02/08/2013 23:50, Thimo E. wrote:
> Hi,
>
> I've already posted it in the forum thread, but to keep all of you up
> to date on this issue I am copying the logfile into this thread, too:
>
> XenServer crashed again; attached you'll find the output with the
> verbose messages Andrew inserted into the code.
>
> Best regards
> Thimo

So I can see that I did screw up the debugging patch a tad, but the
information is still salvageable.

Adjusted from my "interesting" idea of printk formatting,

(XEN) **Pending EOI error
(XEN) irq 29, vector 0x2e
(XEN) s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
(XEN) All LAPIC state:
(XEN) [vector] ISR TMR IRR
(XEN) [1f:01] 00000000 00000000 00000000
(XEN) [3f:20] 00016384 4095716568 00000000
(XEN) [5f:40] 00000000 4041382474 00000000
(XEN) [7f:60] 00000000 3967325758 00000000
(XEN) [9f:80] 00000000 2123395250 00000000
(XEN) [bf:a0] 00000000 1502837374 00000000
(XEN) [df:c0] 00000000 4270415335 00000000
(XEN) [ff:e0] 00000000 00000000 00000000

So Xen has been interrupted by an interrupt which it believes it has
already seen, and which is still outstanding on the PendingEOI stack,
waiting for Dom0 to actually deal with it.

The In Service Register indicates (given the hex/dec snafu) that only
vector 0x2e is in service.
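
As a hedged illustration of the decoding above: each row of the dump covers
32 vectors, so a set bit n in the row starting at vector 0x20 corresponds to
vector 0x20 + n. The sketch below assumes the ISR column was printed in
decimal (the hex/dec mix-up) and that bit 0 of each row is the lowest vector
in its range. The helper name set_vectors is made up for this example and is
not part of Xen.

```python
def set_vectors(base_vector, reg_value):
    """Return the vectors whose bits are set in one 32-bit APIC register row."""
    return [base_vector + bit for bit in range(32) if (reg_value >> bit) & 1]

# ISR column of the dump above, read as decimal (assumption: hex/dec mix-up).
isr_rows = {
    0x00: 0, 0x20: 16384, 0x40: 0, 0x60: 0,
    0x80: 0, 0xa0: 0, 0xc0: 0, 0xe0: 0,
}

# Collect every in-service vector across all eight rows.
in_service = [vector
              for base, value in sorted(isr_rows.items())
              for vector in set_vectors(base, value)]

print([hex(vector) for vector in in_service])
```

Reading 00016384 as decimal gives bit 14 of the [3f:20] row, i.e. vector
0x2e, which agrees with the s[0] line of the dump.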

I will update my debugging patch with some extra information tomorrow.

~Andrew


JBeulich at suse

Aug 5, 2013, 5:45 AM

Post #7 of 27 (34 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

>>> On 03.08.13 at 01:32, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> Adjusted from my "interesting" idea of printk formatting,
>
> (XEN) **Pending EOI error
> (XEN) irq 29, vector 0x2e
> (XEN) s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
> (XEN) All LAPIC state:
> (XEN) [vector] ISR TMR IRR
> (XEN) [1f:01] 00000000 00000000 00000000
> (XEN) [3f:20] 00016384 4095716568 00000000
> (XEN) [5f:40] 00000000 4041382474 00000000
> (XEN) [7f:60] 00000000 3967325758 00000000
> (XEN) [9f:80] 00000000 2123395250 00000000
> (XEN) [bf:a0] 00000000 1502837374 00000000
> (XEN) [df:c0] 00000000 4270415335 00000000
> (XEN) [ff:e0] 00000000 00000000 00000000
>
> So Xen has been interrupted by an interrupt which it believes it has
> already seen, and is outstanding on the PendingEOI stack, waiting for
> Dom0 to actually deal with.

And which hence should be masked. Is this perhaps a non-maskable
MSI, and the device (erroneously?) issues a new interrupt before
the old one was really finished with?

Jan




andrew.cooper3 at citrix

Aug 5, 2013, 7:51 AM

Post #8 of 27 (34 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 05/08/13 13:45, Jan Beulich wrote:
>>>> On 03.08.13 at 01:32, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>> Adjusted from my "interesting" idea of printk formatting,
>>
>> (XEN) **Pending EOI error
>> (XEN) irq 29, vector 0x2e
>> (XEN) s[0] irq 29, vec 0x2e, ready 0, ISR 1, TMR 0, IRR 0
>> (XEN) All LAPIC state:
>> (XEN) [vector] ISR TMR IRR
>> (XEN) [1f:01] 00000000 00000000 00000000
>> (XEN) [3f:20] 00016384 4095716568 00000000
>> (XEN) [5f:40] 00000000 4041382474 00000000
>> (XEN) [7f:60] 00000000 3967325758 00000000
>> (XEN) [9f:80] 00000000 2123395250 00000000
>> (XEN) [bf:a0] 00000000 1502837374 00000000
>> (XEN) [df:c0] 00000000 4270415335 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>>
>> So Xen has been interrupted by an interrupt which it believes it has
>> already seen, and is outstanding on the PendingEOI stack, waiting for
>> Dom0 to actually deal with.
> And which hence should be masked. Is this perhaps a non-maskable
> MSI, and the device (erroneously?) issues a new interrupt before
> the old one was really finished with?
>
> Jan
>

All of these crashes are coming out of mwait_idle, so the cpu in
question has literally just been in a lower power state.

I am wondering whether there is some caching issue where an update to
the Pending EOI stack pointer got "lost", but this seems a little
too specific to be reasonably explained as a caching issue.

A new debugging patch is on its way (Sorry - it has been a very busy few
days)

~Andrew



abc at digithi

Aug 9, 2013, 2:27 PM

Post #9 of 27 (31 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Next crash occurred, debugging output included.

One remark: over the last few days (besides many Linux PV guests) one Windows
guest (with PV drivers) was running. Today I started another Windows
guest, and within 3 hours two crashes occurred. Coincidence?

Best regards
Thimo

(XEN) **Pending EOI error
(XEN) irq 29, vector 0x24
(XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
(XEN) All LAPIC state:
(XEN) [vector] ISR TMR IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000010 76efa12e 00000000
(XEN) [5f:40] 00000000 e6f0f2fc 00000000
(XEN) [7f:60] 00000000 32d096ca 00000000
(XEN) [9f:80] 00000000 78fcf87a 00000000
(XEN) [bf:a0] 00000000 f9b9fe4e 00000000
(XEN) [df:c0] 00000000 ffdfe7ab 00000000
(XEN) [ff:e0] 00000000 00000000 00000000
(XEN) Peoi stack trace records:
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Marked {sp 0, irq 29, vec 0x24} ready
(XEN) Pushed {sp 0, irq 29, vec 0x24}
(XEN) Poped entry {sp 1, irq 29, vec 0x24}
(XEN) Guest interrupt information:
(XEN) IRQ: 0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000 mapped, unbound
(XEN) IRQ: 1 affinity:1 vec:38 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0: 1(----),
(XEN) IRQ: 2 affinity:f vec:00 type=XT-PIC status=00000000 mapped, unbound
(XEN) IRQ: 3 affinity:1 vec:40 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 4 affinity:1 vec:48 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 5 affinity:1 vec:50 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0: 5(----),
(XEN) IRQ: 6 affinity:1 vec:58 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 7 affinity:1 vec:60 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 8 affinity:1 vec:68 type=IO-APIC-edge status=00000050 in-flight=0 domain-list=0: 8(----),
(XEN) IRQ: 9 affinity:1 vec:70 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 9(----),
(XEN) IRQ: 10 affinity:1 vec:78 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 11 affinity:1 vec:88 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 12 affinity:1 vec:90 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 13 affinity:1 vec:98 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002 mapped, unbound
(XEN) IRQ: 16 affinity:1 vec:db type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 16(----),
(XEN) IRQ: 18 affinity:1 vec:2c type=IO-APIC-level status=00000010 in-flight=0 domain-list=0: 18(----),
(XEN) IRQ: 19 affinity:1 vec:51 type=IO-APIC-level status=00000002 mapped, unbound
(XEN) IRQ: 20 affinity:1 vec:29 type=IO-APIC-level status=00000002 mapped, unbound
(XEN) IRQ: 22 affinity:1 vec:bb type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 22(----),
(XEN) IRQ: 23 affinity:8 vec:c2 type=IO-APIC-level status=00000050 in-flight=0 domain-list=0: 23(----),
(XEN) IRQ: 24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped, unbound
(XEN) IRQ: 25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped, unbound
(XEN) IRQ: 26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped, unbound
(XEN) IRQ: 27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped, unbound
(XEN) IRQ: 28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped, unbound
(XEN) IRQ: 29 affinity:2 vec:24 type=PCI-MSI status=00000010 in-flight=0 domain-list=0:276(----),
(XEN) IRQ: 30 affinity:4 vec:93 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:275(----),
(XEN) IRQ: 31 affinity:2 vec:4a type=PCI-MSI status=00000050 in-flight=0 domain-list=0:274(----),
(XEN) IRQ: 32 affinity:2 vec:73 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:273(----),
(XEN) IRQ: 33 affinity:1 vec:49 type=PCI-MSI status=00000050 in-flight=0 domain-list=0:272(----),
(XEN) IRQ: 34 affinity:8 vec:5f type=PCI-MSI status=00000050 in-flight=0 domain-list=0:271(----),
(XEN) IO-APIC interrupt information:
(XEN) IRQ 0 Vec240:
(XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 1 Vec 56:
(XEN) Apic 0x00, Pin 1: vec=38 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 3 Vec 64:
(XEN) Apic 0x00, Pin 3: vec=40 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 4 Vec 72:
(XEN) Apic 0x00, Pin 4: vec=48 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 5 Vec 80:
(XEN) Apic 0x00, Pin 5: vec=50 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 6 Vec 88:
(XEN) Apic 0x00, Pin 6: vec=58 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 7 Vec 96:
(XEN) Apic 0x00, Pin 7: vec=60 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 8 Vec104:
(XEN) Apic 0x00, Pin 8: vec=68 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 9 Vec112:
(XEN) Apic 0x00, Pin 9: vec=70 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 10 Vec120:
(XEN) Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 11 Vec136:
(XEN) Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 12 Vec144:
(XEN) Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 13 Vec152:
(XEN) Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 14 Vec160:
(XEN) Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 15 Vec168:
(XEN) Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0 polarity=0 irr=0 trig=E mask=0 dest_id:0
(XEN) IRQ 16 Vec219:
(XEN) Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 18 Vec 44:
(XEN) Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 19 Vec 81:
(XEN) Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 20 Vec 41:
(XEN) Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=1 dest_id:0
(XEN) IRQ 22 Vec187:
(XEN) Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) IRQ 23 Vec194:
(XEN) Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0 polarity=1 irr=0 trig=L mask=0 dest_id:0
(XEN) number of MP IRQ sources: 15.
(XEN) number of IO-APIC #2 registers: 24.
(XEN) testing the IO APIC.......................
(XEN) IO APIC #2......
(XEN) .... register #00: 02000000
(XEN) ....... : physical APIC id: 02
(XEN) ....... : Delivery Type: 0
(XEN) ....... : LTS : 0
(XEN) .... register #01: 00170020
(XEN) ....... : max redirection entries: 0017
(XEN) ....... : PRQ implemented: 0
(XEN) ....... : IO APIC version: 0020
(XEN) .... IRQ redirection table:
(XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
(XEN) 00 000 00 1 0 0 0 0 0 0 00
(XEN) 01 000 00 0 0 0 0 0 1 1 38
(XEN) 02 000 00 0 0 0 0 0 1 1 F0
(XEN) 03 000 00 0 0 0 0 0 1 1 40
(XEN) 04 000 00 0 0 0 0 0 1 1 48
(XEN) 05 000 00 0 0 0 0 0 1 1 50
(XEN) 06 000 00 0 0 0 0 0 1 1 58
(XEN) 07 000 00 0 0 0 0 0 1 1 60
(XEN) 08 000 00 0 0 0 0 0 1 1 68
(XEN) 09 000 00 0 1 0 0 0 1 1 70
(XEN) 0a 000 00 0 0 0 0 0 1 1 78
(XEN) 0b 000 00 0 0 0 0 0 1 1 88
(XEN) 0c 000 00 0 0 0 0 0 1 1 90
(XEN) 0d 000 00 0 0 0 0 0 1 1 98
(XEN) 0e 000 00 0 0 0 0 0 1 1 A0
(XEN) 0f 000 00 0 0 0 0 0 1 1 A8
(XEN) 10 000 00 0 1 0 1 0 1 1 DB
(XEN) 11 000 00 1 0 0 0 0 0 0 00
(XEN) 12 000 00 0 1 0 1 0 1 1 2C
(XEN) 13 000 00 1 1 0 1 0 1 1 51
(XEN) 14 000 00 1 1 0 1 0 1 1 29
(XEN) 15 07A 0A 1 0 0 0 0 0 2 B4
(XEN) 16 000 00 0 1 0 1 0 1 1 BB
(XEN) 17 000 00 0 1 0 1 0 1 1 C2
(XEN) Using vector-based indexing
(XEN) IRQ to pin mappings:
(XEN) IRQ240 -> 0:2
(XEN) IRQ56 -> 0:1
(XEN) IRQ64 -> 0:3
(XEN) IRQ72 -> 0:4
(XEN) IRQ80 -> 0:5
(XEN) IRQ88 -> 0:6
(XEN) IRQ96 -> 0:7
(XEN) IRQ104 -> 0:8
(XEN) IRQ112 -> 0:9
(XEN) IRQ120 -> 0:10
(XEN) IRQ136 -> 0:11
(XEN) IRQ144 -> 0:12
(XEN) IRQ152 -> 0:13
(XEN) IRQ160 -> 0:14
(XEN) IRQ168 -> 0:15
(XEN) IRQ219 -> 0:16
(XEN) IRQ44 -> 0:18
(XEN) IRQ81 -> 0:19
(XEN) IRQ41 -> 0:20
(XEN) IRQ187 -> 0:22
(XEN) IRQ194 -> 0:23
(XEN) .................................... done.
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 1:
(XEN) CA-107844****************************************
(XEN)
(XEN) Reboot in five seconds...
(XEN) Executing crash image
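
For anyone wanting to cross-check dumps like the one above, a minimal sketch
for pulling fields out of a "Guest interrupt information" entry follows. It
assumes each entry has been joined back onto a single line and that the
affinity field is a hexadecimal CPU bitmask; the function name
parse_irq_line is hypothetical and not part of any Xen tooling.

```python
import re

# Matches the leading fields of one "Guest interrupt information" entry.
IRQ_LINE = re.compile(
    r"IRQ:\s*(?P<irq>\d+)\s+affinity:(?P<affinity>[0-9a-f]+)\s+"
    r"vec:(?P<vec>[0-9a-f]{2})\s+type=(?P<type>\S+)\s+"
    r"status=(?P<status>[0-9a-f]{8})"
)

def parse_irq_line(line):
    """Return a dict of decoded fields, or None if the line is not an IRQ entry."""
    match = IRQ_LINE.search(line)
    if match is None:
        return None
    # Assumption: affinity is a hex bitmask with bit n standing for CPU n.
    affinity = int(match.group("affinity"), 16)
    return {
        "irq": int(match.group("irq")),
        "cpus": [cpu for cpu in range(affinity.bit_length())
                 if (affinity >> cpu) & 1],
        "vector": int(match.group("vec"), 16),
        "type": match.group("type"),
        "status": int(match.group("status"), 16),
    }

sample = ("(XEN) IRQ: 29 affinity:2 vec:24 type=PCI-MSI status=00000010 "
          "in-flight=0 domain-list=0:276(----),")
info = parse_irq_line(sample)
print(info)
```

On the entry for IRQ 29 this yields vector 0x24 and, under the bitmask
assumption, CPU 1, which is the CPU that panicked.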


On 05.08.2013 16:51, Andrew Cooper wrote:
> All of these crashes are coming out of mwait_idle, so the cpu in
> question has literally just been in a lower power state.
>
> I am wondering whether there is some caching issue where an update to
> the Pending EOI stack pointer got "lost", but this seems a little
> too specific to be reasonably explained as a caching issue.
>
> A new debugging patch is on its way (Sorry - it has been a very busy few
> days)
>
> ~Andrew
>


andrew.cooper3 at citrix

Aug 9, 2013, 2:40 PM

Post #10 of 27 (31 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 09/08/13 22:27, Thimo E. wrote:
> Next crash occurred, debugging output included.
>
> One remark: over the last few days (besides many Linux PV guests) one
> Windows guest (with PV drivers) was running. Today I started
> another Windows guest, and within 3 hours two crashes occurred.
> Coincidence?
>
> Best regards
> Thimo

So according to my debugging, we really have just pushed the same irq
which we have subsequently seen again unexpectedly.

This bug has only ever been seen on Haswell hardware, and appears linked
to running HVM guests.

So either there is an erroneous ACK to the LAPIC which is clearing the ISR
before the PEOI stack is expecting (which I obviously see, looking at
the code), or something more funky is going on with the hardware.
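
To make the failure mode concrete, here is a toy model of a pending-EOI
stack. It is only a sketch built around the assertion text quoted in the
crash, (sp == 0) || (peoi[sp-1].vector < vector); the class and method names
are invented for illustration and this is not Xen's irq.c.

```python
class PendingEoiStack:
    """Toy model of a per-CPU pending-EOI stack (not Xen's actual code)."""

    def __init__(self):
        self.entries = []  # bottom of the stack first

    def push(self, irq, vector):
        # Same condition as the failing assertion:
        #   (sp == 0) || (peoi[sp-1].vector < vector)
        if self.entries and not (self.entries[-1]["vector"] < vector):
            raise AssertionError(
                "vector %#x arrived while %#x is still pending EOI"
                % (vector, self.entries[-1]["vector"]))
        self.entries.append({"irq": irq, "vector": vector, "ready": False})

    def mark_ready(self, irq):
        # The guest has finished with this irq; its entry may now be EOI'd.
        for entry in self.entries:
            if entry["irq"] == irq:
                entry["ready"] = True

    def flush_ready(self):
        """Pop ready entries from the top; return the vectors that would be EOI'd."""
        done = []
        while self.entries and self.entries[-1]["ready"]:
            done.append(self.entries.pop()["vector"])
        return done


stack = PendingEoiStack()
stack.push(irq=29, vector=0x24)
stack.mark_ready(29)
print(stack.flush_ready())           # one normal push / ready / pop cycle

stack.push(irq=29, vector=0x24)      # pending again, guest has not EOI'd yet
try:
    stack.push(irq=29, vector=0x24)  # the same vector shows up a second time
except AssertionError as exc:
    print("assertion:", exc)
```

In this model the second push can only happen if the in-service bit for the
vector was cleared without the stack entry being popped, which is the
inconsistency being discussed.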

CC'ing in the Intel maintainers: Do you have any ideas? Could this be
related to APICv?

~Andrew



andrew.cooper3 at citrix

Aug 9, 2013, 2:44 PM

Post #11 of 27 (31 views)
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 09/08/13 22:40, Andrew Cooper wrote:
> On 09/08/13 22:27, Thimo E. wrote:
>> Next crash occurred, debugging output included.
>>
>> One remark: over the last few days (besides many Linux PV guests) one
>> Windows guest (with PV drivers) was running. Today I started
>> another Windows guest, and within 3 hours two crashes occurred.
>> Coincidence?
>>
>> Best regards
>> Thimo
>
> So according to my debugging, we really have just pushed the same irq
> which we have subsequently seen again unexpectedly.
>
> This bug has only ever been seen on Haswell hardware, and appears
> linked to running HVM guests.
>
> So either there is an erroneous ACK to the LAPIC which is clearing the
> ISR before the PEOI stack is expecting (which I

"can't"

Apologies for the confusion.

~Andrew

> obviously see, looking at the code), or something more funky is going
> on with the hardware.
>
> CC'ing in the Intel maintainers: Do you have any ideas? Could this
> be related to APICv?
>
> ~Andrew
>
>>
>> (XEN) **Pending EOI error
>> (XEN) irq 29, vector 0x24
>> (XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000,
>> IRR 00000000
>> (XEN) All LAPIC state:
>> (XEN) [vector] ISR TMR IRR
>> (XEN) [1f:00] 00000000 00000000 00000000
>> (XEN) [3f:20] 00000010 76efa12e 00000000
>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>> (XEN) [7f:60] 00000000 32d096ca 00000000
>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>> (XEN) Peoi stack trace records:
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Guest interrupt information:
>> (XEN) IRQ: 0 affinity:1 vec:f0 type=IO-APIC-edge
>> status=00000000 mapped, unbound
>> (XEN) IRQ: 1 affinity:1 vec:38 type=IO-APIC-edge
>> status=00000050 in-flight=0 domain-list=0: 1(----),
>> (XEN) IRQ: 2 affinity:f vec:00 type=XT-PIC
>> status=00000000 mapped, unbound
>> (XEN) IRQ: 3 affinity:1 vec:40 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 4 affinity:1 vec:48 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 5 affinity:1 vec:50 type=IO-APIC-edge
>> status=00000050 in-flight=0 domain-list=0: 5(----),
>> (XEN) IRQ: 6 affinity:1 vec:58 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 7 affinity:1 vec:60 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 8 affinity:1 vec:68 type=IO-APIC-edge
>> status=00000050 in-flight=0 domain-list=0: 8(----),
>> (XEN) IRQ: 9 affinity:1 vec:70 type=IO-APIC-level
>> status=00000050 in-flight=0 domain-list=0: 9(----),
>> (XEN) IRQ: 10 affinity:1 vec:78 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 11 affinity:1 vec:88 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 12 affinity:1 vec:90 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 13 affinity:1 vec:98 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 14 affinity:1 vec:a0 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 15 affinity:1 vec:a8 type=IO-APIC-edge
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 16 affinity:1 vec:db type=IO-APIC-level
>> status=00000010 in-flight=0 domain-list=0: 16(----),
>> (XEN) IRQ: 18 affinity:1 vec:2c type=IO-APIC-level
>> status=00000010 in-flight=0 domain-list=0: 18(----),
>> (XEN) IRQ: 19 affinity:1 vec:51 type=IO-APIC-level
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 20 affinity:1 vec:29 type=IO-APIC-level
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 22 affinity:1 vec:bb type=IO-APIC-level
>> status=00000050 in-flight=0 domain-list=0: 22(----),
>> (XEN) IRQ: 23 affinity:8 vec:c2 type=IO-APIC-level
>> status=00000050 in-flight=0 domain-list=0: 23(----),
>> (XEN) IRQ: 24 affinity:1 vec:28 type=DMA_MSI
>> status=00000000 mapped, unbound
>> (XEN) IRQ: 25 affinity:1 vec:30 type=DMA_MSI
>> status=00000000 mapped, unbound
>> (XEN) IRQ: 26 affinity:f vec:c0 type=PCI-MSI
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 27 affinity:f vec:c8 type=PCI-MSI
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 28 affinity:f vec:d0 type=PCI-MSI
>> status=00000002 mapped, unbound
>> (XEN) IRQ: 29 affinity:2 vec:24 type=PCI-MSI
>> status=00000010 in-flight=0 domain-list=0:276(----),
>> (XEN) IRQ: 30 affinity:4 vec:93 type=PCI-MSI
>> status=00000050 in-flight=0 domain-list=0:275(----),
>> (XEN) IRQ: 31 affinity:2 vec:4a type=PCI-MSI
>> status=00000050 in-flight=0 domain-list=0:274(----),
>> (XEN) IRQ: 32 affinity:2 vec:73 type=PCI-MSI
>> status=00000050 in-flight=0 domain-list=0:273(----),
>> (XEN) IRQ: 33 affinity:1 vec:49 type=PCI-MSI
>> status=00000050 in-flight=0 domain-list=0:272(----),
>> (XEN) IRQ: 34 affinity:8 vec:5f type=PCI-MSI
>> status=00000050 in-flight=0 domain-list=0:271(----),
>> (XEN) IO-APIC interrupt information:
>> (XEN) IRQ 0 Vec240:
>> (XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 1 Vec 56:
>> (XEN) Apic 0x00, Pin 1: vec=38 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 3 Vec 64:
>> (XEN) Apic 0x00, Pin 3: vec=40 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 4 Vec 72:
>> (XEN) Apic 0x00, Pin 4: vec=48 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 5 Vec 80:
>> (XEN) Apic 0x00, Pin 5: vec=50 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 6 Vec 88:
>> (XEN) Apic 0x00, Pin 6: vec=58 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 7 Vec 96:
>> (XEN) Apic 0x00, Pin 7: vec=60 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 8 Vec104:
>> (XEN) Apic 0x00, Pin 8: vec=68 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 9 Vec112:
>> (XEN) Apic 0x00, Pin 9: vec=70 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 10 Vec120:
>> (XEN) Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 11 Vec136:
>> (XEN) Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 12 Vec144:
>> (XEN) Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 13 Vec152:
>> (XEN) Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 14 Vec160:
>> (XEN) Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 15 Vec168:
>> (XEN) Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 16 Vec219:
>> (XEN) Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 18 Vec 44:
>> (XEN) Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 19 Vec 81:
>> (XEN) Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=1 dest_id:0
>> (XEN) IRQ 20 Vec 41:
>> (XEN) Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=1 dest_id:0
>> (XEN) IRQ 22 Vec187:
>> (XEN) Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 23 Vec194:
>> (XEN) Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) number of MP IRQ sources: 15.
>> (XEN) number of IO-APIC #2 registers: 24.
>> (XEN) testing the IO APIC.......................
>> (XEN) IO APIC #2......
>> (XEN) .... register #00: 02000000
>> (XEN) ....... : physical APIC id: 02
>> (XEN) ....... : Delivery Type: 0
>> (XEN) ....... : LTS : 0
>> (XEN) .... register #01: 00170020
>> (XEN) ....... : max redirection entries: 0017
>> (XEN) ....... : PRQ implemented: 0
>> (XEN) ....... : IO APIC version: 0020
>> (XEN) .... IRQ redirection table:
>> (XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
>> (XEN) 00 000 00 1 0 0 0 0 0 0 00
>> (XEN) 01 000 00 0 0 0 0 0 1 1 38
>> (XEN) 02 000 00 0 0 0 0 0 1 1 F0
>> (XEN) 03 000 00 0 0 0 0 0 1 1 40
>> (XEN) 04 000 00 0 0 0 0 0 1 1 48
>> (XEN) 05 000 00 0 0 0 0 0 1 1 50
>> (XEN) 06 000 00 0 0 0 0 0 1 1 58
>> (XEN) 07 000 00 0 0 0 0 0 1 1 60
>> (XEN) 08 000 00 0 0 0 0 0 1 1 68
>> (XEN) 09 000 00 0 1 0 0 0 1 1 70
>> (XEN) 0a 000 00 0 0 0 0 0 1 1 78
>> (XEN) 0b 000 00 0 0 0 0 0 1 1 88
>> (XEN) 0c 000 00 0 0 0 0 0 1 1 90
>> (XEN) 0d 000 00 0 0 0 0 0 1 1 98
>> (XEN) 0e 000 00 0 0 0 0 0 1 1 A0
>> (XEN) 0f 000 00 0 0 0 0 0 1 1 A8
>> (XEN) 10 000 00 0 1 0 1 0 1 1 DB
>> (XEN) 11 000 00 1 0 0 0 0 0 0 00
>> (XEN) 12 000 00 0 1 0 1 0 1 1 2C
>> (XEN) 13 000 00 1 1 0 1 0 1 1 51
>> (XEN) 14 000 00 1 1 0 1 0 1 1 29
>> (XEN) 15 07A 0A 1 0 0 0 0 0 2 B4
>> (XEN) 16 000 00 0 1 0 1 0 1 1 BB
>> (XEN) 17 000 00 0 1 0 1 0 1 1 C2
>> (XEN) Using vector-based indexing
>> (XEN) IRQ to pin mappings:
>> (XEN) IRQ240 -> 0:2
>> (XEN) IRQ56 -> 0:1
>> (XEN) IRQ64 -> 0:3
>> (XEN) IRQ72 -> 0:4
>> (XEN) IRQ80 -> 0:5
>> (XEN) IRQ88 -> 0:6
>> (XEN) IRQ96 -> 0:7
>> (XEN) IRQ104 -> 0:8
>> (XEN) IRQ112 -> 0:9
>> (XEN) IRQ120 -> 0:10
>> (XEN) IRQ136 -> 0:11
>> (XEN) IRQ144 -> 0:12
>> (XEN) IRQ152 -> 0:13
>> (XEN) IRQ160 -> 0:14
>> (XEN) IRQ168 -> 0:15
>> (XEN) IRQ219 -> 0:16
>> (XEN) IRQ44 -> 0:18
>> (XEN) IRQ81 -> 0:19
>> (XEN) IRQ41 -> 0:20
>> (XEN) IRQ187 -> 0:22
>> (XEN) IRQ194 -> 0:23
>> (XEN) .................................... done.
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) CA-107844****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN) Executing crash image
>>
>>
>> On 05.08.2013 at 16:51, Andrew Cooper wrote:
>>> All of these crashes are coming out of mwait_idle, so the cpu in
>>> question has literally just been in a lower power state.
>>>
>>> I am wondering whether there is some caching issue where an update to
>>> the Pending EOI stack pointer got "lost", but this seems like a little
>>> too specific to be reasonably explained as a caching issue.
>>>
>>> A new debugging patch is on its way (Sorry - it has been a very busy few
>>> days)
>>>
>>> ~Andrew
>>>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel [at] lists
> http://lists.xen.org/xen-devel
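
The "All LAPIC state" dump in the log above can be decoded by hand: each row covers 32 vectors, and a set bit in the ISR column marks a vector that is still in service. The following Python sketch is not part of the thread or of Xen. It assumes that bit 0 of each word corresponds to the low vector of the row's [high:low] range, and the names ROW, DUMP and set_vectors are made up for illustration.

```python
import re

# One dump row: "[high:low] ISR TMR IRR", all values in hexadecimal.
ROW = re.compile(
    r"\[([0-9a-f]{2}):([0-9a-f]{2})\]\s+([0-9a-f]{8})\s+([0-9a-f]{8})\s+([0-9a-f]{8})"
)

# Rows copied from the crash log quoted in this thread.
DUMP = """
(XEN) [vector] ISR TMR IRR
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000010 76efa12e 00000000
(XEN) [5f:40] 00000000 e6f0f2fc 00000000
(XEN) [7f:60] 00000000 32d096ca 00000000
(XEN) [9f:80] 00000000 78fcf87a 00000000
(XEN) [bf:a0] 00000000 f9b9fe4e 00000000
(XEN) [df:c0] 00000000 ffdfe7ab 00000000
(XEN) [ff:e0] 00000000 00000000 00000000
"""


def set_vectors(dump, column):
    """Return the vectors whose bit is set in one column (0=ISR, 1=TMR, 2=IRR).

    Assumption: bit n of a row's word maps to vector (low + n).
    """
    found = []
    for match in ROW.finditer(dump):
        low = int(match.group(2), 16)
        word = int(match.group(3 + column), 16)
        found.extend(low + bit for bit in range(32) if word & (1 << bit))
    return found


print([hex(v) for v in set_vectors(DUMP, 0)])  # ISR: ['0x24']
print([hex(v) for v in set_vectors(DUMP, 2)])  # IRR: []
```

Under that assumption the only vector in service is 0x24, which matches the "irq 29, vector 0x24" line at the top of the error report, and nothing is pending in the IRR.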


abc at digithi

Aug 11, 2013, 10:46 AM

Post #12 of 27 (28 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Hello again,

Attached you'll find another crash dump from today. I don't know whether
it gives you more information than the last one.

Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a
Core i5-4670 CPU.

Best regards
Thimo

On 09.08.2013 at 23:44, Andrew Cooper wrote:
> On 09/08/13 22:40, Andrew Cooper wrote:
>>
>> So according to my debugging, we really have just pushed the same irq
>> which we have subsequently seen again unexpectedly.
>>
>> This bug has only ever been seen on Haswell hardware, and appears
>> linked to running HVM guests.
>>
>> So either there is an erroneous ACK to the LAPIC which is clearing the
>> ISR before the PEOI stack is expecting (which I
>
> "can't"
>
> Apologies for the confusion.
>
> ~Andrew
>
>> obviously see, looking at the code), or something more funky is going
>> on with the hardware.
>>
>> CC'ing in the Intel maintainers: Do you have any ideas? Could this
>> be related to APICv?
>>
>> ~Andrew
Attachments: crash20130811.txt (10.4 KB)
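
The assertion quoted at the start of the thread, '(sp == 0) || (peoi[sp-1].vector < vector)', states that a vector pushed onto the pending-EOI stack must be strictly greater than the one currently on top. A toy Python model of that check (not Xen's irq.c; the class, method and function names are hypothetical) shows why seeing the same vector again before its EOI trips it:

```python
class PendingEoiStack:
    """Toy model of a per-CPU pending-EOI stack (illustrative only)."""

    def __init__(self):
        self.entries = []  # (irq, vector) pairs, most recent last

    def push(self, irq, vector):
        sp = len(self.entries)
        # Same condition as the assertion text:
        # (sp == 0) || (peoi[sp-1].vector < vector)
        if not (sp == 0 or self.entries[sp - 1][1] < vector):
            top = self.entries[sp - 1][1]
            raise AssertionError(f"vector {vector:#x} pushed on top of {top:#x}")
        self.entries.append((irq, vector))

    def pop(self):
        return self.entries.pop()


def second_push_fails(first_vector, second_vector):
    """Push two vectors with no pop in between; report whether the check fires."""
    stack = PendingEoiStack()
    stack.push(29, first_vector)
    try:
        stack.push(29, second_vector)
    except AssertionError:
        return True
    return False


print(second_push_fails(0x24, 0x24))  # same vector seen again: True
print(second_push_fails(0x24, 0x38))  # strictly higher vector nests: False
```

In this model the failure needs nothing more than the same vector arriving a second time while its first entry is still on the stack, which is the situation the debugging output describes for irq 29.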


yang.z.zhang at intel

Aug 11, 2013, 10:50 PM

Post #13 of 27 (17 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Andrew Cooper wrote on 2013-08-10:
> On 09/08/13 22:27, Thimo E. wrote:
>
>
> Next crash occurred, debugging output included.
>
>
> One remark: over the last few days (besides many Linux PV guests) one
> Windows guest (with PV drivers) was running. Today I started another
> Windows guest, and within 3 hours two crashes occurred. Coincidence?
>
> Best regards
> Thimo
>
>
>
> So according to my debugging, we really have just pushed the same irq
> which we have subsequently seen again unexpectedly.
>
> This bug has only ever been seen on Haswell hardware, and appears
> linked to running HVM guests.
>
> So either there is an erroneous ACK to the LAPIC which is clearing the
> ISR before the PEOI stack is expecting (which I obviously see, looking
> at the code), or something more funky is going on with the hardware.
>
> CC'ing in the Intel maintainers: Do you have any ideas? Could this
> be related to APICv?
Does your machine support APIC-v?

>
> ~Andrew
>
>
>
>
>
>
> On 05.08.2013 at 16:51, Andrew Cooper wrote:
>
> All of these crashes are coming out of mwait_idle, so the cpu in
> question has literally just been in a lower power state.
>
> I am wondering whether there is some caching issue where an update to
> the Pending EOI stack pointer got "lost", but this seems like a little
> too specific to be reasonably explained as a caching issue.
>
> A new debugging patch is on its way (Sorry - it has been a very busy
> few days)
>
> ~Andrew
>
>


Best regards,
Yang





yang.z.zhang at intel

Aug 11, 2013, 11:02 PM

Post #14 of 27 (17 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Hi Thimo,

Can you provide the xen boot log?

Best regards,
Yang

From: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

Hello again,

Attached you'll find another crash dump from today. I don't know whether it gives you more information than the last one.

Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a Core i5-4670 CPU.

Best regards
Thimo

On 09.08.2013 at 23:44, Andrew Cooper wrote:
On 09/08/13 22:40, Andrew Cooper wrote:

So according to my debugging, we really have just pushed the same irq which we have subsequently seen again unexpectedly.

This bug has only ever been seen on Haswell hardware, and appears linked to running HVM guests.

So either there is an erroneous ACK to the LAPIC which is clearing the ISR before the PEOI stack is expecting (which I

"can't"

Apologies for the confusion.

~Andrew


obviously see, looking at the code), or something more funky is going on with the hardware.

CC'ing in the Intel maintainers: Do you have any ideas? Could this be related to APICv?

~Andrew



JBeulich at suse

Aug 12, 2013, 1:20 AM

Post #15 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

>>> On 09.08.13 at 23:27, "Thimo E." <abc [at] digithi> wrote:
> (XEN) **Pending EOI error
> (XEN) irq 29, vector 0x24
> (XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
> (XEN) All LAPIC state:
> (XEN) [vector] ISR TMR IRR
> (XEN) [1f:00] 00000000 00000000 00000000
> (XEN) [3f:20] 00000010 76efa12e 00000000
> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
> (XEN) [7f:60] 00000000 32d096ca 00000000
> (XEN) [9f:80] 00000000 78fcf87a 00000000
> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
> (XEN) [ff:e0] 00000000 00000000 00000000
> (XEN) Peoi stack trace records:

Mind providing (a link to) the patch that was used here, so that
one can make sense of the printed information (and perhaps
also suggest adjustments to that debugging code)? Nothing I
was able to find on the list fully matches the output above...

Jan

> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
> (XEN) Pushed {sp 0, irq 29, vec 0x24}
> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
> (XEN) Guest interrupt information:
> (XEN) IRQ: 0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000
> mapped, unbound
> (XEN) IRQ: 1 affinity:1 vec:38 type=IO-APIC-edge status=00000050
> in-flight=0 domain-list=0: 1(----),
> (XEN) IRQ: 2 affinity:f vec:00 type=XT-PIC status=00000000 mapped,
> unbound
> (XEN) IRQ: 3 affinity:1 vec:40 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 4 affinity:1 vec:48 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 5 affinity:1 vec:50 type=IO-APIC-edge status=00000050
> in-flight=0 domain-list=0: 5(----),
> (XEN) IRQ: 6 affinity:1 vec:58 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 7 affinity:1 vec:60 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 8 affinity:1 vec:68 type=IO-APIC-edge status=00000050
> in-flight=0 domain-list=0: 8(----),
> (XEN) IRQ: 9 affinity:1 vec:70 type=IO-APIC-level status=00000050
> in-flight=0 domain-list=0: 9(----),
> (XEN) IRQ: 10 affinity:1 vec:78 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 11 affinity:1 vec:88 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 12 affinity:1 vec:90 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 13 affinity:1 vec:98 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002
> mapped, unbound
> (XEN) IRQ: 16 affinity:1 vec:db type=IO-APIC-level status=00000010
> in-flight=0 domain-list=0: 16(----),
> (XEN) IRQ: 18 affinity:1 vec:2c type=IO-APIC-level status=00000010
> in-flight=0 domain-list=0: 18(----),
> (XEN) IRQ: 19 affinity:1 vec:51 type=IO-APIC-level status=00000002
> mapped, unbound
> (XEN) IRQ: 20 affinity:1 vec:29 type=IO-APIC-level status=00000002
> mapped, unbound
> (XEN) IRQ: 22 affinity:1 vec:bb type=IO-APIC-level status=00000050
> in-flight=0 domain-list=0: 22(----),
> (XEN) IRQ: 23 affinity:8 vec:c2 type=IO-APIC-level status=00000050
> in-flight=0 domain-list=0: 23(----),
> (XEN) IRQ: 24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped,
> unbound
> (XEN) IRQ: 25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped,
> unbound
> (XEN) IRQ: 26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped,
> unbound
> (XEN) IRQ: 27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped,
> unbound
> (XEN) IRQ: 28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped,
> unbound
> (XEN) IRQ: 29 affinity:2 vec:24 type=PCI-MSI status=00000010
> in-flight=0 domain-list=0:276(----),
> (XEN) IRQ: 30 affinity:4 vec:93 type=PCI-MSI status=00000050
> in-flight=0 domain-list=0:275(----),
> (XEN) IRQ: 31 affinity:2 vec:4a type=PCI-MSI status=00000050
> in-flight=0 domain-list=0:274(----),
> (XEN) IRQ: 32 affinity:2 vec:73 type=PCI-MSI status=00000050
> in-flight=0 domain-list=0:273(----),
> (XEN) IRQ: 33 affinity:1 vec:49 type=PCI-MSI status=00000050
> in-flight=0 domain-list=0:272(----),
> (XEN) IRQ: 34 affinity:8 vec:5f type=PCI-MSI status=00000050
> in-flight=0 domain-list=0:271(----),
> (XEN) IO-APIC interrupt information:
> (XEN) IRQ 0 Vec240:
> (XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 1 Vec 56:
> (XEN) Apic 0x00, Pin 1: vec=38 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 3 Vec 64:
> (XEN) Apic 0x00, Pin 3: vec=40 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 4 Vec 72:
> (XEN) Apic 0x00, Pin 4: vec=48 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 5 Vec 80:
> (XEN) Apic 0x00, Pin 5: vec=50 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 6 Vec 88:
> (XEN) Apic 0x00, Pin 6: vec=58 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 7 Vec 96:
> (XEN) Apic 0x00, Pin 7: vec=60 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 8 Vec104:
> (XEN) Apic 0x00, Pin 8: vec=68 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 9 Vec112:
> (XEN) Apic 0x00, Pin 9: vec=70 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=L mask=0 dest_id:0
> (XEN) IRQ 10 Vec120:
> (XEN) Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 11 Vec136:
> (XEN) Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 12 Vec144:
> (XEN) Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 13 Vec152:
> (XEN) Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 14 Vec160:
> (XEN) Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 15 Vec168:
> (XEN) Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0
> polarity=0 irr=0 trig=E mask=0 dest_id:0
> (XEN) IRQ 16 Vec219:
> (XEN) Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN) IRQ 18 Vec 44:
> (XEN) Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN) IRQ 19 Vec 81:
> (XEN) Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN) IRQ 20 Vec 41:
> (XEN) Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=1 dest_id:0
> (XEN) IRQ 22 Vec187:
> (XEN) Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN) IRQ 23 Vec194:
> (XEN) Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0
> polarity=1 irr=0 trig=L mask=0 dest_id:0
> (XEN) number of MP IRQ sources: 15.
> (XEN) number of IO-APIC #2 registers: 24.
> (XEN) testing the IO APIC.......................
> (XEN) IO APIC #2......
> (XEN) .... register #00: 02000000
> (XEN) ....... : physical APIC id: 02
> (XEN) ....... : Delivery Type: 0
> (XEN) ....... : LTS : 0
> (XEN) .... register #01: 00170020
> (XEN) ....... : max redirection entries: 0017
> (XEN) ....... : PRQ implemented: 0
> (XEN) ....... : IO APIC version: 0020
> (XEN) .... IRQ redirection table:
> (XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
> (XEN) 00 000 00 1 0 0 0 0 0 0 00
> (XEN) 01 000 00 0 0 0 0 0 1 1 38
> (XEN) 02 000 00 0 0 0 0 0 1 1 F0
> (XEN) 03 000 00 0 0 0 0 0 1 1 40
> (XEN) 04 000 00 0 0 0 0 0 1 1 48
> (XEN) 05 000 00 0 0 0 0 0 1 1 50
> (XEN) 06 000 00 0 0 0 0 0 1 1 58
> (XEN) 07 000 00 0 0 0 0 0 1 1 60
> (XEN) 08 000 00 0 0 0 0 0 1 1 68
> (XEN) 09 000 00 0 1 0 0 0 1 1 70
> (XEN) 0a 000 00 0 0 0 0 0 1 1 78
> (XEN) 0b 000 00 0 0 0 0 0 1 1 88
> (XEN) 0c 000 00 0 0 0 0 0 1 1 90
> (XEN) 0d 000 00 0 0 0 0 0 1 1 98
> (XEN) 0e 000 00 0 0 0 0 0 1 1 A0
> (XEN) 0f 000 00 0 0 0 0 0 1 1 A8
> (XEN) 10 000 00 0 1 0 1 0 1 1 DB
> (XEN) 11 000 00 1 0 0 0 0 0 0 00
> (XEN) 12 000 00 0 1 0 1 0 1 1 2C
> (XEN) 13 000 00 1 1 0 1 0 1 1 51
> (XEN) 14 000 00 1 1 0 1 0 1 1 29
> (XEN) 15 07A 0A 1 0 0 0 0 0 2 B4
> (XEN) 16 000 00 0 1 0 1 0 1 1 BB
> (XEN) 17 000 00 0 1 0 1 0 1 1 C2
> (XEN) Using vector-based indexing
> (XEN) IRQ to pin mappings:
> (XEN) IRQ240 -> 0:2
> (XEN) IRQ56 -> 0:1
> (XEN) IRQ64 -> 0:3
> (XEN) IRQ72 -> 0:4
> (XEN) IRQ80 -> 0:5
> (XEN) IRQ88 -> 0:6
> (XEN) IRQ96 -> 0:7
> (XEN) IRQ104 -> 0:8
> (XEN) IRQ112 -> 0:9
> (XEN) IRQ120 -> 0:10
> (XEN) IRQ136 -> 0:11
> (XEN) IRQ144 -> 0:12
> (XEN) IRQ152 -> 0:13
> (XEN) IRQ160 -> 0:14
> (XEN) IRQ168 -> 0:15
> (XEN) IRQ219 -> 0:16
> (XEN) IRQ44 -> 0:18
> (XEN) IRQ81 -> 0:19
> (XEN) IRQ41 -> 0:20
> (XEN) IRQ187 -> 0:22
> (XEN) IRQ194 -> 0:23
> (XEN) .................................... done.
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 1:
> (XEN) CA-107844****************************************
> (XEN)
> (XEN) Reboot in five seconds...
> (XEN) Executing crash image
>
>
> Am 05.08.2013 16:51, schrieb Andrew Cooper:
>> All of these crashes are coming out of mwait_idle, so the cpu in
>> question has literally just been in a lower power state.
>>
>> I am wondering whether there is some caching issue where an update to
>> the Pending EOI stack pointer got "lost", but this seems like a little
>> too specific to be reasonably explained as a caching issue.
>>
>> A new debugging patch is on its way (Sorry - it has been a very busy few
>> days)
>>
>> ~Andrew
>>



_______________________________________________
Xen-devel mailing list
Xen-devel [at] lists
http://lists.xen.org/xen-devel


yang.z.zhang at intel

Aug 12, 2013, 1:49 AM

Post #16 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Hi Thimo,
From your previous experience and logs, it appears that:

1. The interrupt that triggers the issue is an MSI.

2. MSIs are normally treated as edge-triggered interrupts, except when there is no way to mask the device. In this case, your previous log indicates that the device is unmaskable. (What special device are you using? A modern PCI device should be maskable.)

3. IRQ 29 belongs to dom0, so this does not seem to be an HVM-related issue.

4. The status of IRQ 29 is 10, which means the guest has already issued the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so there should be no pending EOI in the EOI stack. If possible, can you add some debug messages in the guest EOI code path (like _irq_guest_eoi()) to track the EOI?

5. Both logs show that when the issue occurred, most of the other interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is that a coincidence? Or does it happen only under a special condition such as heavy IRQ migration? Perhaps you can disable IRQ balancing in dom0 and pin the IRQs manually.

6. I guess interrupt remapping is enabled on your machine. Can you try disabling IR to see whether the issue is still reproducible?
Also, please provide the whole Xen log.

Best regards,
Yang
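
The reading of the status words in points 4 and 5 can be sketched as a small decoder. This is only an illustration: the flag names IRQ_GUEST_EOI_PENDING and IRQ_MOVE_PENDING come from the message above, but the bit values below are assumptions chosen to be consistent with that reading (0x10 for a guest-bound IRQ, 0x50 for guest-bound plus move pending) and should be checked against xen/include/xen/irq.h in the tree being debugged.

```python
# ASSUMED bit values, chosen to match the interpretation given in the thread;
# verify against xen/include/xen/irq.h before relying on them.
IRQ_FLAGS = {
    0x01: "IRQ_INPROGRESS",
    0x02: "IRQ_DISABLED",
    0x04: "IRQ_PENDING",
    0x08: "IRQ_REPLAY",
    0x10: "IRQ_GUEST",
    0x20: "IRQ_GUEST_EOI_PENDING",
    0x40: "IRQ_MOVE_PENDING",
}


def decode_status(status_hex):
    """Return the names of all flags set in a 'status=XXXXXXXX' hex string."""
    value = int(status_hex, 16)
    return [name for bit, name in sorted(IRQ_FLAGS.items()) if value & bit]


# IRQ 29 in the dump: guest-bound, no guest EOI pending.
print(decode_status("00000010"))
# Most other dom0 IRQs in the dump: guest-bound and move pending.
print(decode_status("00000050"))
```

Under these assumed values, status=00000002 decodes to IRQ_DISABLED, which fits the "mapped, unbound" entries in the same dump.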

From: xen-devel-bounces [at] lists [mailto:xen-devel-bounces [at] lists] On Behalf Of Thimo E.
Sent: Monday, August 12, 2013 1:47 AM
To: Andrew Cooper
Cc: Keir Fraser; Jan Beulich; Dong, Eddie; Xen-develList; Nakajima, Jun; Zhang, Xiantao
Subject: Re: [Xen-devel] cpuidle and un-eoid interrupts at the local apic

Hello again,

attached you'll find another crash dump from today. Don't know if it gives you more information than the last one.

Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a Core i5-4670 CPU.

Best regards
Thimo

Am 09.08.2013 23:44, schrieb Andrew Cooper:
> On 09/08/13 22:40, Andrew Cooper wrote:
>> So according to my debugging, we really have just pushed the same irq which we have subsequently seen again unexpectedly.
>>
>> This bug has only ever been seen on Haswell hardware, and appears linked to running HVM guests.
>>
>> So either there is an erroneous ACK at the LAPIC which is clearing the ISR before the PEOI stack is expecting (which I
>
> "can't"
>
> Apologies for the confusion.
>
> ~Andrew
>
>> obviously see, looking at the code), or something more funky is going on with the hardware.
>>
>> CC'ing in the Intel maintainers: Do you have any ideas? Could this be related to APICv?
>>
>> ~Andrew



JBeulich at suse

Aug 12, 2013, 1:57 AM

Post #17 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

>>> On 12.08.13 at 10:49, "Zhang, Yang Z" <yang.z.zhang [at] intel> wrote:
> 5. Both logs show that when the issue occurred, most of the other
> interrupts owned by dom0 were in IRQ_MOVE_PENDING status. Is that a
> coincidence? Or does it happen only under a special condition such as heavy
> IRQ migration? Perhaps you can disable IRQ balancing in dom0 and pin the
> IRQs manually.

Since guest IRQs' affinities track the vCPU's placement on pCPU-s,
suppressing IRQ movement would not only require IRQ balancing
to be suppressed in the respective domain, but also that the vCPU
be bound to a single pCPU.

Jan




andrew.cooper3 at citrix

Aug 12, 2013, 2:10 AM

Post #18 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 11/08/13 18:46, Thimo E. wrote:
> Hello again,
>
> attached you'll find another crash dump from today. Don't know if it
> gives you more information than the last one.
>
> Just FYI, this is a system with an Intel Mainboard (H87 chipset) and a
> Core i5-4670 CPU.
>
> Best regards
> Thimo

It is still saying the same. irq 29 should already be in-service at the
LAPIC (because it is present on the PEOI stack), but isn't, and we
subsequently get reinterrupted with it, causing the assertion to fail.

~Andrew
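
The failure described above can be illustrated with a toy model of a per-CPU pending-EOI stack. This is a sketch and not the Xen implementation: the class and method names are hypothetical, and only the push-time check is taken from the thread, namely the assertion '(sp == 0) || (peoi[sp-1].vector < vector)' quoted in the first post. The idea is that a vector still awaiting its EOI should be in service at the LAPIC, so the same vector should not be delivered again until it has been EOI'd.

```python
class PendingEoiStack:
    """Toy model of a pending-EOI stack (hypothetical, for illustration only)."""

    def __init__(self):
        self.entries = []  # each entry: {"irq", "vector", "ready"}

    def push(self, irq, vector):
        # Same invariant as the assertion quoted in the first post:
        # (sp == 0) || (peoi[sp-1].vector < vector)
        if self.entries and not (self.entries[-1]["vector"] < vector):
            raise AssertionError(
                "vector 0x%02x arrived while 0x%02x is still pending EOI"
                % (vector, self.entries[-1]["vector"]))
        self.entries.append({"irq": irq, "vector": vector, "ready": False})

    def mark_ready(self, irq):
        # The guest has finished with this IRQ; its EOI may now be issued.
        for entry in self.entries:
            if entry["irq"] == irq:
                entry["ready"] = True

    def flush(self):
        # Pop ready entries from the top; return the vectors that get EOI'd.
        eoied = []
        while self.entries and self.entries[-1]["ready"]:
            eoied.append(self.entries.pop()["vector"])
        return eoied


stack = PendingEoiStack()

# Normal cycle: push, guest EOI, flush.
stack.push(irq=29, vector=0x24)
stack.mark_ready(29)
print(stack.flush())

# Failure case from the thread: the same vector is delivered again
# while its earlier instance is still on the stack.
stack.push(irq=29, vector=0x24)
try:
    stack.push(irq=29, vector=0x24)
except AssertionError as err:
    print("assertion:", err)
```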


andrew.cooper3 at citrix

Aug 12, 2013, 2:28 AM

Post #19 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 12/08/13 09:20, Jan Beulich wrote:
>>>> On 09.08.13 at 23:27, "Thimo E." <abc [at] digithi> wrote:
>> (XEN) **Pending EOI error
>> (XEN) irq 29, vector 0x24
>> (XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR 00000000
>> (XEN) All LAPIC state:
>> (XEN) [vector] ISR TMR IRR
>> (XEN) [1f:00] 00000000 00000000 00000000
>> (XEN) [3f:20] 00000010 76efa12e 00000000
>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>> (XEN) [7f:60] 00000000 32d096ca 00000000
>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>> (XEN) [ff:e0] 00000000 00000000 00000000
>> (XEN) Peoi stack trace records:
> Mind providing (a link to) the patch that was used here, so that
> one can make sense of the printed information (and perhaps
> also suggest adjustments to that debugging code)? Nothing I
> was able to find on the list fully matches the output above...
>
> Jan

Attached

~Andrew

>
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Marked {sp 0, irq 29, vec 0x24} ready
>> (XEN) Pushed {sp 0, irq 29, vec 0x24}
>> (XEN) Poped entry {sp 1, irq 29, vec 0x24}
>> (XEN) Guest interrupt information:
>> (XEN) IRQ: 0 affinity:1 vec:f0 type=IO-APIC-edge status=00000000
>> mapped, unbound
>> (XEN) IRQ: 1 affinity:1 vec:38 type=IO-APIC-edge status=00000050
>> in-flight=0 domain-list=0: 1(----),
>> (XEN) IRQ: 2 affinity:f vec:00 type=XT-PIC status=00000000 mapped,
>> unbound
>> (XEN) IRQ: 3 affinity:1 vec:40 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 4 affinity:1 vec:48 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 5 affinity:1 vec:50 type=IO-APIC-edge status=00000050
>> in-flight=0 domain-list=0: 5(----),
>> (XEN) IRQ: 6 affinity:1 vec:58 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 7 affinity:1 vec:60 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 8 affinity:1 vec:68 type=IO-APIC-edge status=00000050
>> in-flight=0 domain-list=0: 8(----),
>> (XEN) IRQ: 9 affinity:1 vec:70 type=IO-APIC-level status=00000050
>> in-flight=0 domain-list=0: 9(----),
>> (XEN) IRQ: 10 affinity:1 vec:78 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 11 affinity:1 vec:88 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 12 affinity:1 vec:90 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 13 affinity:1 vec:98 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 14 affinity:1 vec:a0 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 15 affinity:1 vec:a8 type=IO-APIC-edge status=00000002
>> mapped, unbound
>> (XEN) IRQ: 16 affinity:1 vec:db type=IO-APIC-level status=00000010
>> in-flight=0 domain-list=0: 16(----),
>> (XEN) IRQ: 18 affinity:1 vec:2c type=IO-APIC-level status=00000010
>> in-flight=0 domain-list=0: 18(----),
>> (XEN) IRQ: 19 affinity:1 vec:51 type=IO-APIC-level status=00000002
>> mapped, unbound
>> (XEN) IRQ: 20 affinity:1 vec:29 type=IO-APIC-level status=00000002
>> mapped, unbound
>> (XEN) IRQ: 22 affinity:1 vec:bb type=IO-APIC-level status=00000050
>> in-flight=0 domain-list=0: 22(----),
>> (XEN) IRQ: 23 affinity:8 vec:c2 type=IO-APIC-level status=00000050
>> in-flight=0 domain-list=0: 23(----),
>> (XEN) IRQ: 24 affinity:1 vec:28 type=DMA_MSI status=00000000 mapped,
>> unbound
>> (XEN) IRQ: 25 affinity:1 vec:30 type=DMA_MSI status=00000000 mapped,
>> unbound
>> (XEN) IRQ: 26 affinity:f vec:c0 type=PCI-MSI status=00000002 mapped,
>> unbound
>> (XEN) IRQ: 27 affinity:f vec:c8 type=PCI-MSI status=00000002 mapped,
>> unbound
>> (XEN) IRQ: 28 affinity:f vec:d0 type=PCI-MSI status=00000002 mapped,
>> unbound
>> (XEN) IRQ: 29 affinity:2 vec:24 type=PCI-MSI status=00000010
>> in-flight=0 domain-list=0:276(----),
>> (XEN) IRQ: 30 affinity:4 vec:93 type=PCI-MSI status=00000050
>> in-flight=0 domain-list=0:275(----),
>> (XEN) IRQ: 31 affinity:2 vec:4a type=PCI-MSI status=00000050
>> in-flight=0 domain-list=0:274(----),
>> (XEN) IRQ: 32 affinity:2 vec:73 type=PCI-MSI status=00000050
>> in-flight=0 domain-list=0:273(----),
>> (XEN) IRQ: 33 affinity:1 vec:49 type=PCI-MSI status=00000050
>> in-flight=0 domain-list=0:272(----),
>> (XEN) IRQ: 34 affinity:8 vec:5f type=PCI-MSI status=00000050
>> in-flight=0 domain-list=0:271(----),
>> (XEN) IO-APIC interrupt information:
>> (XEN) IRQ 0 Vec240:
>> (XEN) Apic 0x00, Pin 2: vec=f0 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 1 Vec 56:
>> (XEN) Apic 0x00, Pin 1: vec=38 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 3 Vec 64:
>> (XEN) Apic 0x00, Pin 3: vec=40 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 4 Vec 72:
>> (XEN) Apic 0x00, Pin 4: vec=48 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 5 Vec 80:
>> (XEN) Apic 0x00, Pin 5: vec=50 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 6 Vec 88:
>> (XEN) Apic 0x00, Pin 6: vec=58 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 7 Vec 96:
>> (XEN) Apic 0x00, Pin 7: vec=60 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 8 Vec104:
>> (XEN) Apic 0x00, Pin 8: vec=68 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 9 Vec112:
>> (XEN) Apic 0x00, Pin 9: vec=70 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 10 Vec120:
>> (XEN) Apic 0x00, Pin 10: vec=78 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 11 Vec136:
>> (XEN) Apic 0x00, Pin 11: vec=88 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 12 Vec144:
>> (XEN) Apic 0x00, Pin 12: vec=90 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 13 Vec152:
>> (XEN) Apic 0x00, Pin 13: vec=98 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 14 Vec160:
>> (XEN) Apic 0x00, Pin 14: vec=a0 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 15 Vec168:
>> (XEN) Apic 0x00, Pin 15: vec=a8 delivery=LoPri dest=L status=0
>> polarity=0 irr=0 trig=E mask=0 dest_id:0
>> (XEN) IRQ 16 Vec219:
>> (XEN) Apic 0x00, Pin 16: vec=db delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 18 Vec 44:
>> (XEN) Apic 0x00, Pin 18: vec=2c delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 19 Vec 81:
>> (XEN) Apic 0x00, Pin 19: vec=51 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=1 dest_id:0
>> (XEN) IRQ 20 Vec 41:
>> (XEN) Apic 0x00, Pin 20: vec=29 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=1 dest_id:0
>> (XEN) IRQ 22 Vec187:
>> (XEN) Apic 0x00, Pin 22: vec=bb delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) IRQ 23 Vec194:
>> (XEN) Apic 0x00, Pin 23: vec=c2 delivery=LoPri dest=L status=0
>> polarity=1 irr=0 trig=L mask=0 dest_id:0
>> (XEN) number of MP IRQ sources: 15.
>> (XEN) number of IO-APIC #2 registers: 24.
>> (XEN) testing the IO APIC.......................
>> (XEN) IO APIC #2......
>> (XEN) .... register #00: 02000000
>> (XEN) ....... : physical APIC id: 02
>> (XEN) ....... : Delivery Type: 0
>> (XEN) ....... : LTS : 0
>> (XEN) .... register #01: 00170020
>> (XEN) ....... : max redirection entries: 0017
>> (XEN) ....... : PRQ implemented: 0
>> (XEN) ....... : IO APIC version: 0020
>> (XEN) .... IRQ redirection table:
>> (XEN) NR Log Phy Mask Trig IRR Pol Stat Dest Deli Vect:
>> (XEN) 00 000 00 1 0 0 0 0 0 0 00
>> (XEN) 01 000 00 0 0 0 0 0 1 1 38
>> (XEN) 02 000 00 0 0 0 0 0 1 1 F0
>> (XEN) 03 000 00 0 0 0 0 0 1 1 40
>> (XEN) 04 000 00 0 0 0 0 0 1 1 48
>> (XEN) 05 000 00 0 0 0 0 0 1 1 50
>> (XEN) 06 000 00 0 0 0 0 0 1 1 58
>> (XEN) 07 000 00 0 0 0 0 0 1 1 60
>> (XEN) 08 000 00 0 0 0 0 0 1 1 68
>> (XEN) 09 000 00 0 1 0 0 0 1 1 70
>> (XEN) 0a 000 00 0 0 0 0 0 1 1 78
>> (XEN) 0b 000 00 0 0 0 0 0 1 1 88
>> (XEN) 0c 000 00 0 0 0 0 0 1 1 90
>> (XEN) 0d 000 00 0 0 0 0 0 1 1 98
>> (XEN) 0e 000 00 0 0 0 0 0 1 1 A0
>> (XEN) 0f 000 00 0 0 0 0 0 1 1 A8
>> (XEN) 10 000 00 0 1 0 1 0 1 1 DB
>> (XEN) 11 000 00 1 0 0 0 0 0 0 00
>> (XEN) 12 000 00 0 1 0 1 0 1 1 2C
>> (XEN) 13 000 00 1 1 0 1 0 1 1 51
>> (XEN) 14 000 00 1 1 0 1 0 1 1 29
>> (XEN) 15 07A 0A 1 0 0 0 0 0 2 B4
>> (XEN) 16 000 00 0 1 0 1 0 1 1 BB
>> (XEN) 17 000 00 0 1 0 1 0 1 1 C2
>> (XEN) Using vector-based indexing
>> (XEN) IRQ to pin mappings:
>> (XEN) IRQ240 -> 0:2
>> (XEN) IRQ56 -> 0:1
>> (XEN) IRQ64 -> 0:3
>> (XEN) IRQ72 -> 0:4
>> (XEN) IRQ80 -> 0:5
>> (XEN) IRQ88 -> 0:6
>> (XEN) IRQ96 -> 0:7
>> (XEN) IRQ104 -> 0:8
>> (XEN) IRQ112 -> 0:9
>> (XEN) IRQ120 -> 0:10
>> (XEN) IRQ136 -> 0:11
>> (XEN) IRQ144 -> 0:12
>> (XEN) IRQ152 -> 0:13
>> (XEN) IRQ160 -> 0:14
>> (XEN) IRQ168 -> 0:15
>> (XEN) IRQ219 -> 0:16
>> (XEN) IRQ44 -> 0:18
>> (XEN) IRQ81 -> 0:19
>> (XEN) IRQ41 -> 0:20
>> (XEN) IRQ187 -> 0:22
>> (XEN) IRQ194 -> 0:23
>> (XEN) .................................... done.
>> (XEN)
>> (XEN) ****************************************
>> (XEN) Panic on CPU 1:
>> (XEN) CA-107844****************************************
>> (XEN)
>> (XEN) Reboot in five seconds...
>> (XEN) Executing crash image
>>
>>
>> Am 05.08.2013 16:51, schrieb Andrew Cooper:
>>> All of these crashes are coming out of mwait_idle, so the cpu in
>>> question has literally just been in a lower power state.
>>>
>>> I am wondering whether there is some caching issue where an update to
>>> the Pending EOI stack pointer got "lost", but this seems like a little
>>> too specific to be reasonably explained as a caching issue.
>>>
>>> A new debugging patch is on its way (Sorry - it has been a very busy few
>>> days)
>>>
>>> ~Andrew
>>>
>
Attachments: ca-107844-debug.patch (5.74 KB)


JBeulich at suse

Aug 12, 2013, 3:05 AM

Post #20 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
> On 12/08/13 09:20, Jan Beulich wrote:
>>>>> On 09.08.13 at 23:27, "Thimo E." <abc [at] digithi> wrote:
>>> (XEN) **Pending EOI error
>>> (XEN) irq 29, vector 0x24
>>> (XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR
> 00000000
>>> (XEN) All LAPIC state:
>>> (XEN) [vector] ISR TMR IRR
>>> (XEN) [1f:00] 00000000 00000000 00000000
>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>> (XEN) Peoi stack trace records:
>> Mind providing (a link to) the patch that was used here, so that
>> one can make sense of the printed information (and perhaps
>> also suggest adjustments to that debugging code)? Nothing I
>> was able to find on the list fully matches the output above...
>
> Attached

Thanks. Actually, the second case he sent has an interesting
difference:

(XEN) s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001

i.e. we in fact have _three_ instances of the interrupt (two in-service,
and one request). I don't see an explanation for this other than
buggy hardware. Sadly we still don't know what device it is that is
behaving that way (including the confirmation that it's a non-
maskable MSI one).

Jan
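
For reading the "All LAPIC state" dumps quoted in this thread, a small decoder can list which vectors are set in each register. This is a sketch under two assumptions: that the three hex columns are ISR, TMR and IRR in the order given by the dump's own header line, and that bit n of the word labelled [hi:lo] corresponds to vector lo + n, which is the usual layout of the 256-bit LAPIC registers. The function name is hypothetical.

```python
import re

# Matches rows such as "[3f:20] 00000010 76efa12e 00000000",
# with or without "(XEN)" and quote markers in front.
ROW = re.compile(
    r"\[([0-9a-f]{2}):([0-9a-f]{2})\]\s+"
    r"([0-9a-f]{8})\s+([0-9a-f]{8})\s+([0-9a-f]{8})"
)


def decode_lapic_dump(text):
    """Return {'ISR': [...], 'TMR': [...], 'IRR': [...]} of set vectors."""
    regs = {"ISR": [], "TMR": [], "IRR": []}
    for match in ROW.finditer(text):
        base = int(match.group(2), 16)  # lowest vector covered by this row
        for name, word_hex in zip(("ISR", "TMR", "IRR"), match.group(3, 4, 5)):
            word = int(word_hex, 16)
            regs[name].extend(
                base + bit for bit in range(32) if (word >> bit) & 1
            )
    return regs


dump = """
(XEN) [1f:00] 00000000 00000000 00000000
(XEN) [3f:20] 00000010 76efa12e 00000000
"""
regs = decode_lapic_dump(dump)
print([hex(v) for v in regs["ISR"]])  # ['0x24']
print(regs["IRR"])                    # []
```

Applied to the first dump, the only in-service vector is 0x24, the vector reported for irq 29.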




andrew.cooper3 at citrix

Aug 12, 2013, 3:27 AM

Post #21 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 12/08/13 11:05, Jan Beulich wrote:
>>>> On 12.08.13 at 11:28, Andrew Cooper <andrew.cooper3 [at] citrix> wrote:
>> On 12/08/13 09:20, Jan Beulich wrote:
>>>>>> On 09.08.13 at 23:27, "Thimo E." <abc [at] digithi> wrote:
>>>> (XEN) **Pending EOI error
>>>> (XEN) irq 29, vector 0x24
>>>> (XEN) s[0] irq 29, vec 0x24, ready 0, ISR 00000001, TMR 00000000, IRR
>> 00000000
>>>> (XEN) All LAPIC state:
>>>> (XEN) [vector] ISR TMR IRR
>>>> (XEN) [1f:00] 00000000 00000000 00000000
>>>> (XEN) [3f:20] 00000010 76efa12e 00000000
>>>> (XEN) [5f:40] 00000000 e6f0f2fc 00000000
>>>> (XEN) [7f:60] 00000000 32d096ca 00000000
>>>> (XEN) [9f:80] 00000000 78fcf87a 00000000
>>>> (XEN) [bf:a0] 00000000 f9b9fe4e 00000000
>>>> (XEN) [df:c0] 00000000 ffdfe7ab 00000000
>>>> (XEN) [ff:e0] 00000000 00000000 00000000
>>>> (XEN) Peoi stack trace records:
>>> Mind providing (a link to) the patch that was used here, so that
>>> one can make sense of the printed information (and perhaps
>>> also suggest adjustments to that debugging code)? Nothing I
>>> was able to find on the list fully matches the output above...
>> Attached
> Thanks. Actually, the second case he sent has an interesting
> difference:
>
> (XEN) s[0] irq 29, vec 0x26, ready 0, ISR 00000001, TMR 00000000, IRR 00000001
>
> i.e. we in fact have _three_ instances of the interrupt (two in-service,
> and one request). I don't see an explanation for this other than
> buggy hardware. Sadly we still don't know what device it is that is
> behaving that way (including the confirmation that it's a non-
> maskable MSI one).
>
> Jan
>

On the XenServer hardware where we have seen this issue, the problematic
interrupt was from:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection
I217-LM (rev 02)
Subsystem: Intel Corporation Device 0000
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr-
Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort-
<MAbort- >SERR- <PERR- INTx-
Latency: 0
Interrupt: pin A routed to IRQ 1275
Region 0: Memory at c2700000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at c273e000 (32-bit, non-prefetchable) [size=4K]
Region 2: I/O ports at 7080 [size=32]
Capabilities: [c8] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee00318 Data: 0000
Capabilities: [e0] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: e1000e
Kernel modules: e1000e

I am still attempting to reproduce the issue, but we haven’t seen it
again since my email at the root of this thread.

~Andrew
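
The relevant detail in the lspci output above is the "Maskable-" flag on the MSI capability line. As a minimal sketch, the "Name+" / "Name-" tokens of such a line can be turned into booleans as follows; the helper name is hypothetical, and the parser only handles this simple flag notation, not the rest of the lspci format.

```python
import re


def parse_flags(line):
    """Map each 'Name+' / 'Name-' token in an lspci -vv line to True / False."""
    return {
        match.group(1): match.group(2) == "+"
        for match in re.finditer(r"(\w+)([+-])(?=\s|$)", line)
    }


msi_line = "Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+"
flags = parse_flags(msi_line)
print(flags)

if flags.get("Enable") and not flags.get("Maskable"):
    print("MSI is enabled but the device offers no per-vector masking")
```

For the I217-LM entry shown above this reports MSI as enabled and not maskable, which matches the non-maskable MSI case discussed earlier in the thread.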



abc at digithi

Aug 12, 2013, 4:52 AM

Post #22 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Hello Yang,

attached you'll find the kernel dmesg, xen dmesg, lspci and output of
/proc/interrupts. If you want to see further logfiles, please let me know.

The processor is a Core i5-4670. The board is an Intel DH87MC
Mainboard. I am really not sure if it supports APICv, but VT-d is
supported and enabled.


> 4. The status of IRQ 29 is 10, which means the guest has already issued
> the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so there should
> be no pending EOI in the EOI stack. If possible, can you add some
> debug messages in the guest EOI code path (like _irq_guest_eoi()) to
> track the EOI?
>
I don't see IRQ 29 in /proc/interrupts. What I see is:

cat xen-dmesg.txt | grep "29":
(XEN) allocated vector 29 for irq 20

cat dmesg.txt | grep "eth0":
[ 23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
[ 23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection

So is the Ethernet IRQ the bad one? That is an onboard Intel network
adapter.

> 6. I guess interrupt remapping is enabled on your machine. Can you
> try disabling IR to see whether the issue is still reproducible?
>
Just to be sure, your proposal is to try the parameter "no-intremap"?

Best regards
Thimo

Attachments: dmesg.txt (37.1 KB)
  lspci.txt (1.54 KB)
  lspci-vv.txt (28.6 KB)
  proc_interrupts.txt (3.47 KB)
  xen-dmesg.txt (11.5 KB)


andrew.cooper3 at citrix

Aug 12, 2013, 5:04 AM

Post #23 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 12/08/13 12:52, Thimo E wrote:
> Hello Yang,
>
> attached you'll find the kernel dmesg, xen dmesg, lspci and output of
> /proc/interrupts. If you want to see further logfiles, please let me know.
>
> The processor is a Core i5-4670. The board is an Intel DH87MC
> Mainboard. I am really not sure if it supports APICv, but VT-d is
> supported and enabled.
>
>
>> 4. The status of IRQ 29 is 10, which means the guest has already
>> issued the EOI because the IRQ_GUEST_EOI_PENDING bit is cleared, so
>> there should be no pending EOI in the EOI stack. If possible, can you
>> add some debug messages in the guest EOI code path (like
>> _irq_guest_eoi()) to track the EOI?
>>
> I don't see IRQ 29 in /proc/interrupts. What I see is:
>
> cat xen-dmesg.txt | grep "29":
> (XEN) allocated vector 29 for irq 20
>
> cat dmesg.txt | grep "eth0":
> [ 23.152355] e1000e 0000:00:19.0: PCI INT A -> GSI 20 (level, low) -> IRQ 20
> [ 23.330408] e1000e 0000:00:19.0: eth0: Intel(R) PRO/1000 Network Connection
>
> So is the Ethernet IRQ the bad one? That is an onboard Intel network
> adapter.

That would be consistent with the crash seen on our hardware in XenServer.

>
>> 6. I guess interrupt remapping is enabled on your machine.
>> Can you try disabling IR to see whether the issue is still reproducible?
>>
> Just to be sure, your proposal is to try the parameter "no-intremap"?

Specifically, iommu=no-intremap on the Xen command line.

>
> Best regards
> Thimo

~Andrew



abc at digithi

Aug 12, 2013, 6:54 AM

Post #24 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

Hello Yang,

and attached is the next crash dump, which occurred today, only a few minutes
after I created the logfiles I sent in the previous mail.
Perhaps together with the logfiles from that mail it gives you a
better understanding of what is going on.

I've disabled Interrupt remapping now.

> 4.....
> can you add some debug message in the guest EOI code path (like
> _irq_guest_eoi()) to track the EOI?
@Andrew: Is it possible for you to integrate the requested changes from
Yang into your Xen debugging version?
Best regards
Thimo

Attachments: crash20130812.txt (10.8 KB)


andrew.cooper3 at citrix

Aug 12, 2013, 7:06 AM

Post #25 of 27 (16 views)
Permalink
Re: cpuidle and un-eoid interrupts at the local apic [In reply to]

On 12/08/13 14:54, Thimo E wrote:
> Hello Yang,
>
> and attached is the next crash dump, which occurred today, only a few
> minutes after I created the logfiles I sent in the previous mail.
> Perhaps together with the logfiles from that mail it gives you a
> better understanding of what is going on.
>
> I've disabled Interrupt remapping now.
>
> > 4.....
> > can you add some debug message in the guest EOI code path (like
> > _irq_guest_eoi()) to track the EOI?
> @Andrew: Is it possible for you to integrate the requested changes
> from Yang into your Xen debugging version?

I already have. That would be "Marked {foo} ready" debugging in the
PEOI stack section.

~Andrew

>
> Best regards
> Thimo
