
Mailing List Archive: Linux: Kernel

Unhandled IRQs on AMD E-450

 

 



jeroen.vandenkeybus at gmail

Nov 29, 2011, 1:44 PM

Post #1 of 40 (6785 views)
Unhandled IRQs on AMD E-450

On an Asus E45M1-M PRO (AMD E-450) board with 64-bit Linux 3.0.0
(Ubuntu) and 3.2.0, I regularly get (more detailed logs at the end):


Nov 28 04:35:29 zacate kernel: [29581.259926] irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 28 04:35:29 zacate kernel: [29581.259945] Pid: 0, comm: swapper Tainted: P 3.0.0-13-generic #22-Ubuntu
...
Nov 28 04:35:29 zacate kernel: [29581.260171] handlers:
Nov 28 04:35:29 zacate kernel: [29581.260204] [<ffffffffa0085ee0>] irq_handler
Nov 28 04:35:29 zacate kernel: [29581.260216] [<ffffffffa048efe0>] azx_interrupt
Nov 28 04:35:29 zacate kernel: [29581.260223] Disabling IRQ #16


Nov 24 21:25:41 zacate kernel: [ 190.503838] irq 19: nobody cared (try booting with the "irqpoll" option)
Nov 24 21:25:41 zacate kernel: [ 190.503856] Pid: 0, comm: swapper Tainted: P 3.0.0-13-generic #22-Ubuntu
...
Nov 24 21:25:41 zacate kernel: [ 190.504052] handlers:
Nov 24 21:25:41 zacate kernel: [ 190.504085] [<ffffffffa0001f40>] ahci_interrupt
Nov 24 21:25:41 zacate kernel: [ 190.504101] [<ffffffffa004e6c0>] e1000_intr
Nov 24 21:25:41 zacate kernel: [ 190.504108] Disabling IRQ #19


I also tried with an untainted 3.2.0-rc2 kernel, in which I also
disabled threadirqs:


Nov 24 20:50:41 zacate kernel: [ 57.366678] irq 19: nobody cared (try booting with the "irqpoll" option)
Nov 24 20:50:41 zacate kernel: [ 57.366690] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #5


The affected IRQ lines in /proc/interrupts:

 16:        333        771   IO-APIC-fasteoi   firewire_ohci, hda_intel
...
 19:      39128      15165   IO-APIC-fasteoi   ahci, eth1
 40:      25641         59   PCI-MSI-edge      eth0
 41:          0          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          2        404   PCI-MSI-edge      hda_intel
 45:          0          3   PCI-MSI-edge      fglrx[0]@PCI:0:1:0


The dmesg lines directly pertaining to IRQ 16 and IRQ 19:

[ 0.328032] pci 0000:00:15.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.328056] pci 0000:00:15.1: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.328077] pci 0000:00:15.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 0.328127] pci 0000:00:15.3: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 2.671164] firewire_ohci 0000:05:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 5.074619] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16

[ 2.010643] ahci 0000:00:11.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 2.073026] xhci_hcd 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 2.090881] xhci_hcd 0000:06:00.0: irq 19, io mem 0xfe900000
[ 2.091040] xhci_hcd 0000:06:00.0: irq 41 for MSI/MSI-X
[ 2.091050] xhci_hcd 0000:06:00.0: irq 42 for MSI/MSI-X
[ 2.091059] xhci_hcd 0000:06:00.0: irq 43 for MSI/MSI-X
[ 2.115098] e1000 0000:05:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 5.041614] HDA Intel 0000:00:01.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[ 5.041711] HDA Intel 0000:00:01.1: irq 44 for MSI/MSI-X


What I noted:

- The problem (IRQ lines 16 and 19 getting disabled) occurs fairly
often, but losing 19 occurs much more frequently than 16.
- The problem with IRQ19 goes away (or at least does not recur within
24h) when module e1000 is unloaded.
- The problem persists with and without forced IRQ threading.
- The problem persists with pci=nocsr.
- The problem persists with irqfixup.
- When IRQ19 dies, disk I/O access becomes very slow and unreliable.


What could be going wrong here? I note that at least 3 devices
(00:15.x, 05:02.0 and 00:14.2) have their IRQ lines routed to IRQ 16,
but I see only 2 handlers in the dmesg log and /proc/interrupts. The
same applies to IRQ 19 (4 devices: 00:11.0, 06:00.0, 05:01.0 and
00:01.1). It is true that some of these ultimately seem to switch to
MSI (06:00.0 and 00:01.1), but so does the video card (00:01.0), which
does not route to any IRQ beforehand.
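
The comparison between routed devices and registered handlers can be made less error-prone by grouping the dmesg routing lines by GSI. Below is a minimal Python sketch of that idea. The function name and the regular expression are illustrative only (not part of any kernel tool), and it assumes each dmesg entry sits on a single, unwrapped line.

```python
import re
from collections import defaultdict

# Matches e.g. "[ 2.010643] ahci 0000:00:11.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19"
ROUTE_RE = re.compile(
    r"\]\s+(?P<driver>.+?)\s+"
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):"
    r"\s+PCI INT (?P<pin>[A-D]) -> GSI (?P<gsi>\d+)"
)

def devices_by_gsi(dmesg_text):
    """Return {gsi: [(driver, device, pin), ...]} for every routing line found."""
    routes = defaultdict(list)
    for line in dmesg_text.splitlines():
        m = ROUTE_RE.search(line)
        if m:
            routes[int(m.group("gsi"))].append(
                (m.group("driver"), m.group("dev"), m.group("pin"))
            )
    return dict(routes)

# Routing lines copied from the dmesg excerpt in this post (unwrapped).
SAMPLE = """\
[ 2.010643] ahci 0000:00:11.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 2.073026] xhci_hcd 0000:06:00.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 2.115098] e1000 0000:05:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 5.041614] HDA Intel 0000:00:01.1: PCI INT B -> GSI 19 (level, low) -> IRQ 19
[ 5.074619] HDA Intel 0000:00:14.2: PCI INT A -> GSI 16 (level, low) -> IRQ 16
"""

if __name__ == "__main__":
    for gsi, devs in sorted(devices_by_gsi(SAMPLE).items()):
        print(gsi, [dev for _, dev, _ in devs])
```

The device list per GSI can then be compared by hand with the handler names that /proc/interrupts shows for the same IRQ number.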

Before I try finding the problem, I would like to know what a
plausible failure mechanism is, so if anyone could give a hint on
where to start looking...

Thanks for your opinion (and please also CC to my PM address - not
(yet) subscribed to the LKML),


J.


More detailed logs:

Nov 28 04:35:29 zacate kernel: [29581.259926] irq 16: nobody cared (try booting with the "irqpoll" option)
Nov 28 04:35:29 zacate kernel: [29581.259945] Pid: 0, comm: swapper Tainted: P 3.0.0-13-generic #22-Ubuntu
Nov 28 04:35:29 zacate kernel: [29581.259952] Call Trace:
Nov 28 04:35:29 zacate kernel: [29581.259958] <IRQ> [<ffffffff810cf96d>] __report_bad_irq+0x3d/0xe0
Nov 28 04:35:29 zacate kernel: [29581.259986] [<ffffffff810cfd95>] note_interrupt+0x135/0x180
Nov 28 04:35:29 zacate kernel: [29581.259998] [<ffffffff810cdd89>] handle_irq_event_percpu+0xa9/0x220
Nov 28 04:35:29 zacate kernel: [29581.260008] [<ffffffff810cdf4e>] handle_irq_event+0x4e/0x80
Nov 28 04:35:29 zacate kernel: [29581.260019] [<ffffffff810d06c4>] handle_fasteoi_irq+0x64/0xf0
Nov 28 04:35:29 zacate kernel: [29581.260029] [<ffffffff8100c252>] handle_irq+0x22/0x40
Nov 28 04:35:29 zacate kernel: [29581.260040] [<ffffffff815f422a>] do_IRQ+0x5a/0xe0
Nov 28 04:35:29 zacate kernel: [29581.260050] [<ffffffff815ea913>] common_interrupt+0x13/0x13
Nov 28 04:35:29 zacate kernel: [29581.260056] <EOI> [<ffffffff813725fb>] ? arch_local_irq_enable+0x8/0xd
Nov 28 04:35:29 zacate kernel: [29581.260079] [<ffffffff810887a5>] ? sched_clock_idle_wakeup_event+0x15/0x20
Nov 28 04:35:29 zacate kernel: [29581.260089] [<ffffffff813730ed>] acpi_idle_enter_simple+0xcc/0x102
Nov 28 04:35:29 zacate kernel: [29581.260100] [<ffffffff814ab5c2>] cpuidle_idle_call+0xa2/0x1d0
Nov 28 04:35:29 zacate kernel: [29581.260112] [<ffffffff8100920b>] cpu_idle+0xab/0x100
Nov 28 04:35:29 zacate kernel: [29581.260124] [<ffffffff815b858e>] rest_init+0x72/0x74
Nov 28 04:35:29 zacate kernel: [29581.260134] [<ffffffff81ad0c2b>] start_kernel+0x3d4/0x3df
Nov 28 04:35:29 zacate kernel: [29581.260144] [<ffffffff81ad0388>] x86_64_start_reservations+0x132/0x136
Nov 28 04:35:29 zacate kernel: [29581.260156] [<ffffffff81ad0140>] ? early_idt_handlers+0x140/0x140
Nov 28 04:35:29 zacate kernel: [29581.260165] [<ffffffff81ad0459>] x86_64_start_kernel+0xcd/0xdc
Nov 28 04:35:29 zacate kernel: [29581.260171] handlers:
Nov 28 04:35:29 zacate kernel: [29581.260204] [<ffffffffa0085ee0>] irq_handler
Nov 28 04:35:29 zacate kernel: [29581.260216] [<ffffffffa048efe0>] azx_interrupt
Nov 28 04:35:29 zacate kernel: [29581.260223] Disabling IRQ #16


Nov 24 21:25:41 zacate kernel: [ 190.503838] irq 19: nobody cared (try booting with the "irqpoll" option)
Nov 24 21:25:41 zacate kernel: [ 190.503856] Pid: 0, comm: swapper Tainted: P 3.0.0-13-generic #22-Ubuntu
Nov 24 21:25:41 zacate kernel: [ 190.503864] Call Trace:
Nov 24 21:25:41 zacate kernel: [ 190.503870] <IRQ> [<ffffffff810cf96d>] __report_bad_irq+0x3d/0xe0
Nov 24 21:25:41 zacate kernel: [ 190.503898] [<ffffffff810cfd95>] note_interrupt+0x135/0x180
Nov 24 21:25:41 zacate kernel: [ 190.503909] [<ffffffff810cdd89>] handle_irq_event_percpu+0xa9/0x220
Nov 24 21:25:41 zacate kernel: [ 190.503920] [<ffffffff810cdf4e>] handle_irq_event+0x4e/0x80
Nov 24 21:25:41 zacate kernel: [ 190.503930] [<ffffffff810d06c4>] handle_fasteoi_irq+0x64/0xf0
Nov 24 21:25:41 zacate kernel: [ 190.503940] [<ffffffff8100c252>] handle_irq+0x22/0x40
Nov 24 21:25:41 zacate kernel: [ 190.503952] [<ffffffff815f422a>] do_IRQ+0x5a/0xe0
Nov 24 21:25:41 zacate kernel: [ 190.503961] [<ffffffff815ea913>] common_interrupt+0x13/0x13
Nov 24 21:25:41 zacate kernel: [ 190.503967] <EOI> [<ffffffff81094482>] ? tick_nohz_stop_sched_tick+0x2a2/0x3f0
Nov 24 21:25:41 zacate kernel: [ 190.503992] [<ffffffff810091d5>] cpu_idle+0x75/0x100
Nov 24 21:25:41 zacate kernel: [ 190.504004] [<ffffffff815b858e>] rest_init+0x72/0x74
Nov 24 21:25:41 zacate kernel: [ 190.504014] [<ffffffff81ad0c2b>] start_kernel+0x3d4/0x3df
Nov 24 21:25:41 zacate kernel: [ 190.504024] [<ffffffff81ad0388>] x86_64_start_reservations+0x132/0x136
Nov 24 21:25:41 zacate kernel: [ 190.504036] [<ffffffff81ad0140>] ? early_idt_handlers+0x140/0x140
Nov 24 21:25:41 zacate kernel: [ 190.504045] [<ffffffff81ad0459>] x86_64_start_kernel+0xcd/0xdc
Nov 24 21:25:41 zacate kernel: [ 190.504052] handlers:
Nov 24 21:25:41 zacate kernel: [ 190.504085] [<ffffffffa0001f40>] ahci_interrupt
Nov 24 21:25:41 zacate kernel: [ 190.504101] [<ffffffffa004e6c0>] e1000_intr
Nov 24 21:25:41 zacate kernel: [ 190.504108] Disabling IRQ #19


Nov 24 20:50:41 zacate kernel: [ 57.366678] irq 19: nobody cared (try booting with the "irqpoll" option)
Nov 24 20:50:41 zacate kernel: [ 57.366690] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #5
Nov 24 20:50:41 zacate kernel: [ 57.366694] Call Trace:
Nov 24 20:50:41 zacate kernel: [ 57.366697] <IRQ> [<ffffffff810bb9cd>] __report_bad_irq+0x3d/0xe0
Nov 24 20:50:41 zacate kernel: [ 57.366715] [<ffffffff810bbe0d>] note_interrupt+0x14d/0x210
Nov 24 20:50:41 zacate kernel: [ 57.366721] [<ffffffff810b98a4>] handle_irq_event_percpu+0xc4/0x290
Nov 24 20:50:41 zacate kernel: [ 57.366728] [<ffffffff810b9ab8>] handle_irq_event+0x48/0x70
Nov 24 20:50:41 zacate kernel: [ 57.366733] [<ffffffff810bc7fa>] handle_fasteoi_irq+0x5a/0xe0
Nov 24 20:50:41 zacate kernel: [ 57.366740] [<ffffffff81004012>] handle_irq+0x22/0x40
Nov 24 20:50:41 zacate kernel: [ 57.366747] [<ffffffff81506b6a>] do_IRQ+0x5a/0xd0
Nov 24 20:50:41 zacate kernel: [ 57.366753] [<ffffffff814fe72b>] common_interrupt+0x6b/0x6b
Nov 24 20:50:41 zacate kernel: [ 57.366756] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
Nov 24 20:50:41 zacate kernel: [ 57.366773] [<ffffffffa00cc0d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
Nov 24 20:50:41 zacate kernel: [ 57.366781] [<ffffffffa00cc0ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
Nov 24 20:50:41 zacate kernel: [ 57.366788] [<ffffffff814223a8>] cpuidle_idle_call+0xb8/0x230
Nov 24 20:50:41 zacate kernel: [ 57.366795] [<ffffffff81001215>] cpu_idle+0xc5/0x130
Nov 24 20:50:41 zacate kernel: [ 57.366802] [<ffffffff814e2360>] rest_init+0x94/0xa4
Nov 24 20:50:41 zacate kernel: [ 57.366809] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
Nov 24 20:50:41 zacate kernel: [ 57.366815] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
Nov 24 20:50:41 zacate kernel: [ 57.366821] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
Nov 24 20:50:41 zacate kernel: [ 57.366824] handlers:
Nov 24 20:50:41 zacate kernel: [ 57.366834] [<ffffffffa0043c10>] ahci_interrupt
Nov 24 20:50:41 zacate kernel: [ 57.366843] [<ffffffffa006f4f0>] e1000_intr
Nov 24 20:50:41 zacate kernel: [ 57.366847] Disabling IRQ #19
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


clemens at ladisch

Nov 30, 2011, 12:30 AM

Post #2 of 40 (6618 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Jeroen Van den Keybus wrote:
> On an Asus E45M1-M PRO (AMD E-450) board with 64-bit Linux 3.0.0
> (Ubuntu) and 3.2.0, I regularly get (more detailed logs at the end):
>
>
> Nov 28 04:35:29 zacate kernel: [29581.259926] irq 16: nobody cared (try booting with the "irqpoll" option)
> Nov 24 21:25:41 zacate kernel: [ 190.503838] irq 19: nobody cared (try booting with the "irqpoll" option)

> What could be going wrong here ?

* Some buggy driver might not realize that an interrupt came from its
device.
* Some buggy device might raise an interrupt without telling the driver
that it needs attention.
* Some buggy device might raise a wrong interrupt. (This might include
devices that generate a PCI interrupt although they are configured for
MSI.)
* Some buggy interrupt controller might be doing 'interesting' things.
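
As background for the cases above: on a shared line the kernel calls every registered handler, and each handler reports whether its own device raised the interrupt. When almost all recent interrupts go unclaimed, the kernel prints "nobody cared" and disables the line. The Python sketch below is a simplified model of that bookkeeping, not the kernel code. The thresholds (more than 99,900 unhandled out of 100,000) are believed to match kernel/irq/spurious.c but are an assumption here, the kernel's additional time-based reset of the unhandled counter is left out, and the class and handler names are made up for illustration.

```python
IRQ_NONE, IRQ_HANDLED = 0, 1

class SharedIrqLine:
    """Toy model of spurious-interrupt accounting for one shared IRQ line."""

    # Assumed thresholds; verify against kernel/irq/spurious.c for your kernel.
    WINDOW = 100_000
    MAX_UNHANDLED = 99_900

    def __init__(self, handlers):
        self.handlers = handlers   # callables returning IRQ_NONE or IRQ_HANDLED
        self.irq_count = 0
        self.irqs_unhandled = 0
        self.disabled = False

    def raise_irq(self):
        """Deliver one interrupt; return True if some handler claimed it."""
        if self.disabled:
            return False
        # Every handler on a shared line is called, with no short-circuiting.
        results = [handler() for handler in self.handlers]
        handled = IRQ_HANDLED in results
        if not handled:
            self.irqs_unhandled += 1
        self.irq_count += 1
        if self.irq_count >= self.WINDOW:
            if self.irqs_unhandled > self.MAX_UNHANDLED:
                # Corresponds to "irq N: nobody cared" / "Disabling IRQ #N".
                self.disabled = True
            self.irq_count = 0
            self.irqs_unhandled = 0
        return handled

if __name__ == "__main__":
    # Two drivers share the line, but neither recognises the interrupt as its own.
    line = SharedIrqLine([lambda: IRQ_NONE, lambda: IRQ_NONE])
    for _ in range(SharedIrqLine.WINDOW):
        line.raise_irq()
    print("line disabled:", line.disabled)
```

In this model a line survives as long as some handler claims a small fraction of the interrupts, which is why a single misbehaving device can take down an otherwise healthy shared line.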

> - The problem with IRQ19 goes away (at least sufficiently long not to
> be occurring within 24h) when module e1000 is unloaded.

This looks like a bug in the e1000 hardware or software.

> I note that at least 3 devices
> (00:15.x, 05:02.0 and 00:14.2) have their IRQ lines routed to IRQ 16,
> but I see only 2 handlers in the dmesg log and /proc/interrupts.

lspci -s 0:15 -vv
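
The output of that command (or of a full lspci -vv) can also be scanned mechanically for functions that have MSI enabled while legacy INTx is not masked. A minimal Python sketch follows, assuming the lspci text has been unwrapped so that each field sits on one line and that slots are printed without a PCI domain prefix; the function names are hypothetical. Note that this combination is not wrong in itself, since a device with MSI enabled is not supposed to assert INTx anyway. It only marks the functions where a misbehaving device could still drive the shared line.

```python
import re

HEADER_RE = re.compile(r"^(?P<slot>[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) ")
IRQ_RE = re.compile(r"Interrupt: pin (?P<pin>[A-D]) routed to IRQ (?P<irq>\d+)")

def summarise_lspci(text):
    """Return {slot: {"pin", "irq", "intx_disabled", "msi_enabled"}} from lspci -vv text."""
    devices, current = {}, None
    for line in text.splitlines():
        header = HEADER_RE.match(line)
        if header:
            # A new device block starts at column 0 with its bus:device.function.
            current = {"pin": None, "irq": None,
                       "intx_disabled": None, "msi_enabled": False}
            devices[header.group("slot")] = current
            continue
        if current is None:
            continue
        stripped = line.strip()
        if stripped.startswith("Control:") and "DisINTx" in stripped:
            current["intx_disabled"] = "DisINTx+" in stripped
        irq = IRQ_RE.search(stripped)
        if irq:
            current["pin"] = irq.group("pin")
            current["irq"] = int(irq.group("irq"))
        if "MSI: Enable+" in stripped or "MSI-X: Enable+" in stripped:
            current["msi_enabled"] = True
    return devices

def msi_with_intx_enabled(devices):
    """Slots where MSI is on but legacy INTx has not been masked (DisINTx-)."""
    return [slot for slot, dev in devices.items()
            if dev["msi_enabled"] and dev["intx_disabled"] is False]
```

Feeding it the saved output of `lspci -vv` and printing `msi_with_intx_enabled(summarise_lspci(text))` gives a short list of candidates to look at first.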


Regards,
Clemens


bp at amd64

Nov 30, 2011, 7:44 AM

Post #3 of 40 (6626 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

+ Shane.

Shane, can you guys take a look at this, sounds like some unfortunate
sharing of AHCI and network IRQ numbers.

Thanks.

On Tue, Nov 29, 2011 at 10:44:42PM +0100, Jeroen Van den Keybus wrote:
> On an Asus E45M1-M PRO (AMD E-450) board with 64-bit Linux 3.0.0
> (Ubuntu) and 3.2.0, I regularly get (more detailed logs at the end):

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551


Shane.Huang at amd

Dec 1, 2011, 12:01 AM

Post #4 of 40 (6619 views)
RE: Unhandled IRQs on AMD E-450 [In reply to]

Boris,

> Shane, can you guys take a look at this, sounds like some unfortunate
> sharing of AHCI and network IRQ numbers.

I'm adding Dong who might help on this.


Thanks,
Shane




jeroen.vandenkeybus at gmail

Dec 3, 2011, 12:36 PM

Post #5 of 40 (6619 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

I have tried the following kernel options:

- acpi=noirq acpi_irq_nobalance
- acpi=noirq
- acpi=irq_nobalance
- irqfixup
- pci=nomsi

all on the 3.2.0-rc2 kernel. I also removed the sound modules. Both
sound interfaces (Realtek and the built-in HDMI) are driven by
snd_hda_intel and use the affected interrupts (IRQ16 and IRQ19).

But to no avail: both IRQ19 and IRQ16 still get disabled after a while.

I suspect that the post to this list made by Alan Stern on 22-10
concerning an Asus E35M1-M PRO refers to the same problem.

I'm adding a full /proc/interrupts and lspci -vv output at the bottom,
all from the 3.0.0 Ubuntu kernel. Feel free to mention any bad guys
you recognize in this log. I could also add the dmesg log, but I fear
it is too big to be appropriate in the list. If wanted, let me know.

One point of interest, though: /proc/interrupts shows ERR: 1. Could
this be related?

Is there any way of obtaining more output, such as IO-APIC register
states, to verify that it is indeed a stuck IRQ input line and not an
unsuccessful EOI ack? I'm still a bit cautious about blaming a stuck
device IRQ, as there would then have to be two of them misbehaving.


Rgds,

J.


$ cat /proc/interrupts (Note ERR:1)

           CPU0       CPU1
  0:         45          1   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  5:          0          0   IO-APIC-edge      parport0
  7:          1          0   IO-APIC-edge
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          4   IO-APIC-edge      i8042
 16:          1        783   IO-APIC-fasteoi   firewire_ohci, hda_intel
 17:          3        112   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:          0          4   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:        700       7243   IO-APIC-fasteoi   ahci
 40:        269         54   PCI-MSI-edge      eth0
 41:          0          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          1        400   PCI-MSI-edge      hda_intel
 45:          0          3   PCI-MSI-edge      fglrx[0]@PCI:0:1:0
NMI:          0          0   Non-maskable interrupts
LOC:       8508       9855   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:       4350       2704   Rescheduling interrupts
CAL:        180        287   Function call interrupts
TLB:        413        296   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          1
MIS:          0
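
A listing like this can be checked mechanically for shared lines and for a non-zero ERR counter. The following is a minimal Python sketch; it assumes the text starts at the CPU header row and that wrapped rows have been rejoined, and the function names and dictionary layout are illustrative only.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: {"counts", "chip", "handlers", "desc"}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return {}
    ncpus = len(lines[0].split())            # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        label, fields = label.strip(), rest.split()
        # Leading numeric fields are the per-CPU counts (ERR/MIS have only one).
        counts = []
        while fields and len(counts) < ncpus and fields[0].isdigit():
            counts.append(int(fields.pop(0)))
        entry = {"counts": counts, "chip": None, "handlers": [], "desc": ""}
        if label.isdigit() and fields:
            # Numbered IRQ: interrupt chip/flow type, then comma-separated handlers.
            entry["chip"] = fields[0]
            names = " ".join(fields[1:])
            entry["handlers"] = [n.strip() for n in names.split(",") if n.strip()]
        else:
            # Named row such as NMI or LOC: the remainder is a description.
            entry["desc"] = " ".join(fields)
        table[label] = entry
    return table

def shared_lines(table):
    """Return {irq: handlers} for every IRQ with more than one registered handler."""
    return {irq: e["handlers"] for irq, e in table.items() if len(e["handlers"]) > 1}

if __name__ == "__main__":
    with open("/proc/interrupts") as f:
        table = parse_interrupts(f.read())
    for irq, handlers in shared_lines(table).items():
        print(f"IRQ {irq} is shared by: {', '.join(handlers)}")
    if sum(table.get("ERR", {}).get("counts", [])) > 0:
        print("ERR counter is non-zero")
```

Applied to the listing above, `shared_lines` would report IRQs 16, 17 and 18; IRQ 19 shows only ahci in this particular snapshot.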


# lspci -vv

00:00.0 Host bridge: Advanced Micro Devices [AMD] Family 14h Processor
Root Complex
Subsystem: ASUSTeK Computer Inc. Device 84e7
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32

00:01.0 VGA compatible controller: ATI Technologies Inc Device 9806
(prog-if 00 [VGA controller])
Subsystem: ASUSTeK Computer Inc. Device 84e7
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 45
Region 0: Memory at c0000000 (32-bit, prefetchable) [size=256M]
Region 1: I/O ports at f000 [size=256]
Region 2: Memory at feb00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at <unassigned> [disabled]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM
unknown, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train-
SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0300c Data: 4181
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: fglrx_pci
Kernel modules: fglrx, radeon

00:01.1 Audio device: ATI Technologies Inc Wrestler HDMI Audio [Radeon
HD 6250/6310]
Subsystem: ASUSTeK Computer Inc. Device 84e7
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 44
Region 0: Memory at feb44000 (32-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Complex Integrated Endpoint, MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<4us, L1 unlimited
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #0, Speed unknown, Width x0, ASPM
unknown, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x0, TrErr- Train-
SlotClk- DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0100c Data: 4179
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel

00:11.0 SATA controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 SATA
Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI 1.0])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32
Interrupt: pin A routed to IRQ 19
Region 0: I/O ports at f140 [size=8]
Region 1: I/O ports at f130 [size=4]
Region 2: I/O ports at f120 [size=8]
Region 3: I/O ports at f110 [size=4]
Region 4: I/O ports at f100 [size=16]
Region 5: Memory at feb4f000 (32-bit, non-prefetchable) [size=1K]
Capabilities: [70] SATA HBA v1.0 InCfgSpace
Capabilities: [a4] PCI Advanced Features
AFCap: TP+ FLR+
AFCtrl: FLR-
AFStatus: TP-
Kernel driver in use: ahci
Kernel modules: ahci

00:12.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
OHCI0 Controller (prog-if 10 [OHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at feb4e000 (32-bit, non-prefetchable) [size=4K]
Kernel driver in use: ohci_hcd

00:12.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
EHCI Controller (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 17
Region 0: Memory at feb4d000 (32-bit, non-prefetchable) [size=256]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+
Capabilities: [e4] Debug port: BAR=1 offset=00e0
Kernel driver in use: ehci_hcd

00:13.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
OHCI0 Controller (prog-if 10 [OHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at feb4c000 (32-bit, non-prefetchable) [size=4K]
Kernel driver in use: ohci_hcd

00:13.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
EHCI Controller (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 17
Region 0: Memory at feb4b000 (32-bit, non-prefetchable) [size=256]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+
Capabilities: [e4] Debug port: BAR=1 offset=00e0
Kernel driver in use: ehci_hcd

00:14.0 SMBus: ATI Technologies Inc SBx00 SMBus Controller (rev 42)
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Kernel driver in use: piix4_smbus
Kernel modules: sp5100_tco, i2c-piix4

00:14.2 Audio device: ATI Technologies Inc SBx00 Azalia (Intel HDA) (rev 40)
Subsystem: ASUSTeK Computer Inc. Device 8445
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=slow >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at feb40000 (64-bit, non-prefetchable) [size=16K]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: HDA Intel
Kernel modules: snd-hda-intel

00:14.3 ISA bridge: ATI Technologies Inc SB7x0/SB8x0/SB9x0 LPC host
controller (rev 40)
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle+ MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0

00:14.4 PCI bridge: ATI Technologies Inc SBx00 PCI to PCI Bridge (rev
40) (prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop+
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 64
Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
Secondary status: 66MHz- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-

00:14.5 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
OHCI2 Controller (prog-if 10 [OHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin C routed to IRQ 18
Region 0: Memory at feb4a000 (32-bit, non-prefetchable) [size=4K]
Kernel driver in use: ohci_hcd

00:15.0 PCI bridge: ATI Technologies Inc SB700/SB800/SB900 PCI to PCI
bridge (PCIE port 0) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=02, subordinate=02, sec-latency=0
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #247, Speed 2.5GT/s, Width x1, ASPM L0s
L1, Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed unknown, Width x16, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] Subsystem: ATI Technologies Inc Device 0000
Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:15.1 PCI bridge: ATI Technologies Inc SB700/SB800/SB900 PCI to PCI
bridge (PCIE port 1) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000e000-0000efff
Prefetchable memory behind bridge: 00000000d0000000-00000000d00fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1,
Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] Subsystem: ATI Technologies Inc Device 0000
Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:15.2 PCI bridge: ATI Technologies Inc SB900 PCI to PCI bridge (PCIE
port 2) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=04, subordinate=05, sec-latency=0
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fea00000-feafffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #2, Speed 5GT/s, Width x1, ASPM L0s L1,
Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive+ BWMgmt- ABWMgmt-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] Subsystem: ATI Technologies Inc Device 0000
Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:15.3 PCI bridge: ATI Technologies Inc SB900 PCI to PCI bridge (PCIE
port 3) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=06, subordinate=06, sec-latency=0
Memory behind bridge: fe900000-fe9fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [50] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0-,D1-,D2-,D3hot-,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [58] Express (v2) Root Port (Slot-), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <1us
ExtTag+ RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #3, Speed 5GT/s, Width x1, ASPM L0s L1,
Latency L0 <64ns, L1 <1us
ClockPM- Surprise- LLActRep+ BwNot+
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive+ BWMgmt+ ABWMgmt+
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal-
PMEIntEna- CRSVisible-
RootCap: CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range ABCD, TimeoutDis+ ARIFwd-
DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis- ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -3.5dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB
Capabilities: [a0] MSI: Enable- Count=1/1 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [b0] Subsystem: ATI Technologies Inc Device 0000
Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+
Capabilities: [100 v1] Vendor Specific Information: ID=0001
Rev=1 Len=010 <?>
Kernel driver in use: pcieport
Kernel modules: shpchp

00:16.0 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
OHCI0 Controller (prog-if 10 [OHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 18
Region 0: Memory at feb49000 (32-bit, non-prefetchable) [size=4K]
Kernel driver in use: ohci_hcd

00:16.2 USB Controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 USB
EHCI Controller (prog-if 20 [EHCI])
Subsystem: ASUSTeK Computer Inc. Device 8496
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32, Cache Line Size: 64 bytes
Interrupt: pin B routed to IRQ 17
Region 0: Memory at feb48000 (32-bit, non-prefetchable) [size=256]
Capabilities: [c0] Power Management version 2
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA
PME(D0+,D1+,D2+,D3hot+,D3cold-)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Bridge: PM- B3+
Capabilities: [e4] Debug port: BAR=1 offset=00e0
Kernel driver in use: ehci_hcd

00:18.0 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 0 (rev 43)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.1 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 1
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.2 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 2
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.3 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 3
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Capabilities: [f0] Secure device <?>
Kernel driver in use: k10temp
Kernel modules: k10temp

00:18.4 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 4
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.5 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 6
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.6 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 5
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

00:18.7 Host bridge: Advanced Micro Devices [AMD] Family 12h/14h
Processor Function 7
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap- 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-

03:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd.
RTL8111/8168B PCI Express Gigabit Ethernet controller (rev 06)
Subsystem: ASUSTeK Computer Inc. P8P67 and other motherboards
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 40
Region 0: I/O ports at e000 [size=256]
Region 2: Memory at d0004000 (64-bit, prefetchable) [size=4K]
Region 4: Memory at d0000000 (64-bit, prefetchable) [size=16K]
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=375mA
PME(D0+,D1+,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit+
Address: 00000000fee0100c Data: 4159
Capabilities: [70] Express (v2) Endpoint, MSI 01
DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s
<512ns, L1 <64us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+
AuxPwr+ TransPend-
LnkCap: Port #0, Speed 2.5GT/s, Width x1, ASPM L0s L1,
Latency L0 <512ns, L1 <64us
ClockPM+ Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train-
SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis+
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB
Capabilities: [b0] MSI-X: Enable- Count=4 Masked-
Vector table: BAR=4 offset=00000000
PBA: BAR=4 offset=00000800
Capabilities: [d0] Vital Product Data
Unknown small resource type 00, will not decode more.
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt-
UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt-
UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
CESta: RxErr+ BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
Capabilities: [140 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Capabilities: [160 v1] Device Serial Number 43-00-00-00-68-4c-e0-00
Kernel driver in use: r8169
Kernel modules: r8169

04:00.0 PCI bridge: ASMedia Technology Inc. Device 1080 (rev 01)
(prog-if 01 [Subtractive decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=04, secondary=05, subordinate=05, sec-latency=32
I/O behind bridge: 0000d000-0000dfff
Memory behind bridge: fea00000-feafffff
Secondary status: 66MHz+ FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort+ <SERR- <PERR+
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities: [c0] Subsystem: ASUSTeK Computer Inc. Device 8489

05:01.0 Ethernet controller: Intel Corporation 82541PI Gigabit
Ethernet Controller (rev 05)
Subsystem: Intel Corporation PRO/1000 GT Desktop Adapter
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fea40000 (32-bit, non-prefetchable) [size=128K]
Region 1: Memory at fea20000 (32-bit, non-prefetchable) [size=128K]
Region 2: I/O ports at d080 [size=64]
Expansion ROM at fea00000 [disabled] [size=128K]
Capabilities: [dc] Power Management version 2
Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA
PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=1 PME-
Capabilities: [e4] PCI-X non-bridge device
Command: DPERE- ERO+ RBC=512 OST=1
Status: Dev=00:00.0 64bit- 133MHz- SCD- USC- DC=simple
DMMRBC=2048 DMOST=1 DMCRS=8 RSCEM- 266MHz- 533MHz-
Kernel modules: e1000

05:02.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire
II(M)] IEEE 1394 OHCI Controller (rev c0) (prog-if 10 [OHCI])
Subsystem: ASUSTeK Computer Inc. M4A series motherboard
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 32 (8000ns max), Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 16
Region 0: Memory at fea60000 (32-bit, non-prefetchable) [size=2K]
Region 1: I/O ports at d000 [size=128]
Capabilities: [50] Power Management version 2
Flags: PMEClk- DSI- D1- D2+ AuxCurrent=0mA
PME(D0-,D1-,D2+,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: firewire_ohci
Kernel modules: firewire-ohci

06:00.0 USB Controller: ASMedia Technology Inc. ASM1042 SuperSpeed USB
Host Controller (prog-if 30 [XHCI])
Subsystem: ASUSTeK Computer Inc. Device 8488
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
<TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 19
Region 0: Memory at fe900000 (64-bit, non-prefetchable) [size=32K]
Capabilities: [50] MSI: Enable- Count=1/8 Maskable- 64bit+
Address: 0000000000000000 Data: 0000
Capabilities: [68] MSI-X: Enable+ Count=8 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00002080
Capabilities: [78] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=55mA
PME(D0-,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [80] Express (v2) Legacy Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s
<64ns, L1 <2us
ExtTag- AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal-
Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
MaxPayload 128 bytes, MaxReadReq 512 bytes
DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq-
AuxPwr- TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x1, ASPM L0s L1,
Latency L0 unlimited, L1 unlimited
ClockPM- Surprise- LLActRep- BwNot-
LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- Retrain- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 5GT/s, Width x1, TrErr- Train- SlotClk+
DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Not Supported, TimeoutDis-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance-
SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range,
EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB
Capabilities: [100 v1] Virtual Channel
Caps: LPEVC=0 RefClk=100ns PATEntryBits=1
Arb: Fixed- WRR32- WRR64- WRR128-
Ctrl: ArbSelect=Fixed
Status: InProgress-
VC0: Caps: PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
Arb: Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
Ctrl: Enable+ ID=0 ArbSelect=Fixed TC/VC=01
Status: NegoPending- InProgress-
Kernel driver in use: xhci_hcd
Kernel modules: xhci-hcd


2011/12/1 Huang, Shane <Shane.Huang [at] amd>:
> Boris,
>
>> Shane, can you guys take a look at this, sounds like some unfortunate
>> sharing of AHCI and network IRQ numbers.
>
> I'm adding Dong who might help on this.
>
>
> Thanks,
> Shane
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


clemens at ladisch

Dec 4, 2011, 4:48 AM

Post #6 of 40 (6597 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Jeroen Van den Keybus wrote:
> [...]
> But to no avail. Both IRQ19 and IRQ16 keep becoming lost after a while.

You previously said that unloading e1000 made things better. Did this
affect both IRQs 16 and 19?

Can you check if this problem (on either 16 or 19) happens when you are
not using the e1000 port (i.e., unplugged)?

> I'm adding a full /proc/interrupts and lspci -vv output at the bottom,
> all from the 3.0.0 Ubuntu kernel. Feel free to mention any bad guys
> you recognize in this log.

The /proc/interrupts doesn't show e1000, but lspci does. ...?

Does the problem occur without fglrx?

To get the AHCI interrupt away from IRQ 19, try the patch below.
(But please don't show that ugly hack to any AMD guy. :)

> Is there any way of obtaining more output such as IO-APIC register
> states to verify that it is indeed a stuck IRQ input line and not an
> unsuccessful EOI ack?

In theory, lspci's "Status: ... INTx+" shows an active interrupt line.
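
A minimal sketch of that check, assuming `lspci -vv` output in the layout
shown in this thread (device headers such as "00:11.0", without a PCI domain
prefix). The function name and regular expressions are illustrative and not
part of any existing tool. The Control flag "DisINTx" must not be mistaken
for the Status flag "INTx", hence the look-behind:

```python
import re

# Device header, e.g. "00:11.0 SATA controller: ..." (no domain prefix assumed).
DEVICE_RE = re.compile(r"^([0-9a-f]{2}:[0-9a-f]{2}\.[0-7])\s")
# Status flag "INTx+", but not the Control flag "DisINTx+".
INTX_RE = re.compile(r"(?<!Dis)INTx\+")


def devices_with_intx_asserted(lspci_text):
    """Return the bus addresses of devices whose Status shows INTx+."""
    asserted = []
    current = None
    for raw in lspci_text.splitlines():
        line = raw.strip()
        header = DEVICE_RE.match(line)
        if header:
            # A new device block starts here.
            current = header.group(1)
            continue
        if current and INTX_RE.search(line) and current not in asserted:
            asserted.append(current)
    return asserted
```

Feeding it the text of `lspci -vv` captured right after an IRQ has been
disabled would list the devices still asserting their legacy interrupt line,
if any.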


Regards,
Clemens


--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -2906,6 +2906,48 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f8, quirk_intel_mc_errata);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65f9, quirk_intel_mc_errata);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x65fa, quirk_intel_mc_errata);
 
+#if defined(CONFIG_PCI_MSI) && \
+	(defined(CONFIG_SATA_AHCI) || defined(CONFIG_SATA_AHCI_MODULE))
+static void __init sb7x0_ahci_msi_enable(struct pci_dev *dev)
+{
+	u8 rev, ptr;
+	int where;
+	u32 misc_control;
+
+	pci_bus_read_config_byte(dev->bus, PCI_DEVFN(0x14, 0),
+				 PCI_REVISION_ID, &rev);
+	if (rev < 0x3c) /* A14 */
+		return;
+
+	pci_read_config_byte(dev, 0x34, &ptr);
+	if (ptr == 0x70) {
+		where = 0x34;
+	} else {
+		pci_read_config_byte(dev, 0x61, &ptr);
+		if (ptr == 0x70)
+			where = 0x61;
+		else
+			return;
+	}
+
+	pci_read_config_byte(dev, 0x51, &ptr);
+	if (ptr != 0x70)
+		return;
+
+	pci_read_config_dword(dev, 0x40, &misc_control);
+	misc_control |= 1;
+	pci_write_config_dword(dev, 0x40, misc_control);
+
+	pci_write_config_byte(dev, where, 0x50);
+
+	misc_control &= ~1;
+	pci_write_config_dword(dev, 0x40, misc_control);
+
+	dev_dbg(&dev->dev, "AHCI: enabled MSI\n");
+}
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_ATI, 0x4391, sb7x0_ahci_msi_enable);
+#endif
+
 static void pci_do_fixups(struct pci_dev *dev, struct pci_fixup *f,
 			  struct pci_fixup *end)
 {
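
Read literally, the quirk above rewrites one next-capability pointer so that
a capability at offset 0x50 is linked in ahead of the SATA capability at
0x70. The sketch below replays only that pointer rewrite on a plain byte
array standing in for config space. The function names are hypothetical, the
register 0x40 unlock/lock step is left out, and that offset 0x50 holds an MSI
capability is an assumption taken from the quirk's name:

```python
PCI_CAPABILITY_LIST = 0x34  # standard offset of the capabilities pointer


def walk_capabilities(config):
    """Follow the capability chain; return a list of (offset, cap_id)."""
    caps = []
    seen = set()
    # The low two bits of each pointer are reserved and masked off.
    ptr = config[PCI_CAPABILITY_LIST] & 0xFC
    while ptr and ptr not in seen:
        seen.add(ptr)
        caps.append((ptr, config[ptr]))
        ptr = config[ptr + 1] & 0xFC
    return caps


def relink_hidden_capability(config, hidden=0x50, target=0x70):
    """Mimic the pointer rewrite from the patch on a config-space copy.

    Returns True if a pointer was rewritten, False if the layout did not
    match what the patch expects.
    """
    if config[PCI_CAPABILITY_LIST] == target:
        where = PCI_CAPABILITY_LIST
    elif config[0x61] == target:
        # Next pointer of a capability at 0x60, as checked in the patch.
        where = 0x61
    else:
        return False
    # The hidden capability must already chain on to the target.
    if config[hidden + 1] != target:
        return False
    config[where] = hidden
    return True
```

For the AHCI controller in the lspci dump (capabilities at [70] and [a4]),
the walk would be expected to gain an entry at 0x50 once the rewrite
succeeds, which is what would let the driver find and enable MSI.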


jeroen.vandenkeybus at gmail

Dec 4, 2011, 5:36 AM

Post #7 of 40 (6613 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

> You previously said that unloading e1000 made things better. Did this
> affect both IRQs 16 and 19?

No, this only affects IRQ 19. IRQ 16 usually dies within 15min..2hrs.

> Can you check if this problem (on either 16 or 19) happens when you are
> not using the e1000 port (i.e., unplugged)?

The problem occurs both with the e1000 idle (unplugged) and under heavy
usage (plugged). Time to failure is of the same order of magnitude in
both cases (i.e. 1..30 minutes). So far, IRQ 19 has never been disabled
with the e1000 removed. The e1000 driver shipped with Ubuntu isn't
particularly recent (7.3.21-k8-NAPI). Before I suspected a kernel
problem, I had already tried version 8.0.35, compiled from source
obtained from Intel. Exactly the same result: IRQ 19 gets banned.

> The /proc/interrupts doesn't show e1000, but lspci does. ...?

You are right. I took that lspci after removing e1000, sorry for the
confusion. Please see the new /proc/interrupts below.

> Does the problem occur without fglrx?

Good question. I'll try that immediately. Stand by.

> To get the AHCI interrupt away from IRQ 19, try the patch below.
> (But please don't show that ugly hack to any AMD guy. :)

I'll try that next too.

>> Is there any way of obtaining more output such as IO-APIC register
>> states to verify that it is indeed a stuck IRQ input line and not an
>> unsuccessful EOI ack?

> In theory, lspci's "Status: ... INTx+" shows an active interrupt line.

Ok. In that case (taking the lspci from a failed system) no (listed)
device has INTx+.


Thanks,


J.


$ cat /proc/interrupts (with e1000 (eth1) still loaded - this dump is
after IRQ 19 is killed)

           CPU0       CPU1
  0:         45         26   IO-APIC-edge      timer
  1:          1          1   IO-APIC-edge      i8042
  5:          0          0   IO-APIC-edge      parport0
  7:          1          0   IO-APIC-edge
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          1          3   IO-APIC-edge      i8042
 16:        121        559   IO-APIC-fasteoi   firewire_ohci, hda_intel
 17:          3        110   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:          0          4   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:     198169      11097   IO-APIC-fasteoi   ahci, eth1
 40:       3601         71   PCI-MSI-edge      eth0
 41:          0          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          4        298   PCI-MSI-edge      hda_intel
 45:          0          3   PCI-MSI-edge      fglrx[0]@PCI:0:1:0
NMI:          0          0   Non-maskable interrupts
LOC:     231521     231457   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:      37942      34198   Rescheduling interrupts
CAL:        256        225   Function call interrupts
TLB:        309        243   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:         26         26   Machine check polls
ERR:          1
MIS:          0


jeroen.vandenkeybus at gmail

Dec 4, 2011, 5:54 AM

Post #8 of 40 (6602 views)
Re: Unhandled IRQs on AMD E-450

>> Does the problem occur without fglrx?
>
> Good question. I'll try that immediately. Stand by.

I'm afraid it didn't matter.

dmesg log:

-- rmmod'ing e1000 in order not to get stuck while shutting down the X system
[ 42.990418] e1000 0000:05:01.0: PCI INT A disabled
-- Killed lightdm
[ 102.250141] [fglrx] IRQ 45 Disabled
[ 102.405031] HDMI hot plug event: Pin=3 Presence_Detect=1 ELD_Valid=1
[ 102.405063] HDMI status: Pin=3 Presence_Detect=1 ELD_Valid=1
-- rmmod'ed fglrx
[ 142.964281] pci 0000:00:01.0: PCI INT A disabled
[ 142.964323] [fglrx] module unloaded - fglrx 8.90.5 [Oct 12 2011]
-- modprobe'd e1000 again
[ 185.635457] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[ 185.635469] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ 185.635612] e1000 0000:05:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[ 186.213243] e1000 0000:05:01.0: eth1: (PCI:33MHz:32-bit) 00:0e:0c:d9:6f:ca
[ 186.213263] e1000 0000:05:01.0: eth1: Intel(R) PRO/1000 Network Connection
[ 186.248807] ADDRCONF(NETDEV_UP): eth1: link is not ready
-- Lost IRQ 19
[ 354.446192] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 354.446343] Pid: 0, comm: swapper Tainted: P 3.0.0-13-generic #22-Ubuntu
[ 354.446351] Call Trace:
[ 354.446357] <IRQ> [<ffffffff810cf96d>] __report_bad_irq+0x3d/0xe0
[ 354.446385] [<ffffffff810cfd95>] note_interrupt+0x135/0x180
[ 354.446396] [<ffffffff810cdd89>] handle_irq_event_percpu+0xa9/0x220
[ 354.446406] [<ffffffff810cdf4e>] handle_irq_event+0x4e/0x80
[ 354.446417] [<ffffffff810d06c4>] handle_fasteoi_irq+0x64/0xf0
[ 354.446427] [<ffffffff8100c252>] handle_irq+0x22/0x40
[ 354.446438] [<ffffffff815f422a>] do_IRQ+0x5a/0xe0
[ 354.446447] [<ffffffff815ea913>] common_interrupt+0x13/0x13
[ 354.446453] <EOI> [<ffffffff813725fb>] ? arch_local_irq_enable+0x8/0xd
[ 354.446476] [<ffffffff810887a5>] ? sched_clock_idle_wakeup_event+0x15/0x20
[ 354.446486] [<ffffffff813730ed>] acpi_idle_enter_simple+0xcc/0x102
[ 354.446497] [<ffffffff814ab5c2>] cpuidle_idle_call+0xa2/0x1d0
[ 354.446509] [<ffffffff8100920b>] cpu_idle+0xab/0x100
[ 354.446520] [<ffffffff815b858e>] rest_init+0x72/0x74
[ 354.446531] [<ffffffff81ad0c2b>] start_kernel+0x3d4/0x3df
[ 354.446540] [<ffffffff81ad0388>] x86_64_start_reservations+0x132/0x136
[ 354.446552] [<ffffffff81ad0140>] ? early_idt_handlers+0x140/0x140
[ 354.446561] [<ffffffff81ad0459>] x86_64_start_kernel+0xcd/0xdc
[ 354.446568] handlers:
[ 354.446642] [<ffffffffa0001f40>] ahci_interrupt
[ 354.446743] [<ffffffffa00496c0>] e1000_intr
[ 354.446830] Disabling IRQ #19

/proc/interrupts is consistent (IRQ45 now gone):

           CPU0       CPU1
  0:         45          3   IO-APIC-edge      timer
  1:          0          4   IO-APIC-edge      i8042
  5:          0          0   IO-APIC-edge      parport0
  7:          1          0   IO-APIC-edge
  8:          1          0   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          6   IO-APIC-edge      i8042
 16:         11        559   IO-APIC-fasteoi   firewire_ohci, hda_intel
 17:          6        104   IO-APIC-fasteoi   ehci_hcd:usb1, ehci_hcd:usb2, ehci_hcd:usb3
 18:          0          4   IO-APIC-fasteoi   ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb6, ohci_hcd:usb7
 19:     200703      10373   IO-APIC-fasteoi   ahci, eth1
 40:       1001         66   PCI-MSI-edge      eth0
 41:          0          0   PCI-MSI-edge      xhci_hcd
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          1        427   PCI-MSI-edge      hda_intel
NMI:          0          0   Non-maskable interrupts
LOC:      12670      23434   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:       4824       3363   Rescheduling interrupts
CAL:        317        240   Function call interrupts
TLB:        388        264   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          3          3   Machine check polls
ERR:          1
MIS:          0


>> To get the AHCI interrupt away from IRQ 19, try the patch below.
>> (But please don't show that ugly hack to any AMD guy. :)
> I'll try that next too.

Moving on to the patch...

Rgds,


J.


jeroen.vandenkeybus at gmail

Dec 4, 2011, 6:08 AM

Post #9 of 40 (6597 views)
Re: Unhandled IRQs on AMD E-450

Clemens,

FYI,

> In theory, lspci's "Status: ... INTx+" shows an active interrupt line.

I have succeeded in catching a lspci on the SATA controller with INTx+
while IRQ 19 is disabled.


00:11.0 SATA controller: ATI Technologies Inc SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode] (rev 40) (prog-if 01 [AHCI 1.0])
        Subsystem: ASUSTeK Computer Inc. Device 8496
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
        Latency: 32
        Interrupt: pin A routed to IRQ 19
        Region 0: I/O ports at f140 [size=8]
        Region 1: I/O ports at f130 [size=4]
        Region 2: I/O ports at f120 [size=8]
        Region 3: I/O ports at f110 [size=4]
        Region 4: I/O ports at f100 [size=16]
        Region 5: Memory at feb4f000 (32-bit, non-prefetchable) [size=1K]
        Capabilities: [70] SATA HBA v1.0 InCfgSpace
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: ahci
        Kernel modules: ahci


The fact that the next lspci's showed INTx- shows that its pin is
definitely not stuck, does it not ? When I do e.g.

$ du -h -x --max-depth=1

in a second terminal, it gets the line nicely back to INTx+ due to
outstanding SATA commands. Cancelling the above du command (would
otherwise take ages to complete) results in INTx-.
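
The INTx check above can be scripted when it has to be repeated many times. The following is only a rough sketch: it assumes the text comes from `lspci -vv` for a single device, and the helper name `intx_asserted` is made up for illustration.

```python
import re


def intx_asserted(lspci_text):
    """Return True for INTx+, False for INTx-, None if no Status line is found."""
    # Start scanning at the first "Status:" so that the "DisINTx" bit of the
    # Control register (printed earlier) is skipped. DOTALL tolerates a
    # Status line that a mailer has wrapped onto two lines.
    match = re.search(r"Status:.*?(?<![A-Za-z])INTx([+-])", lspci_text, re.DOTALL)
    if match is None:
        return None
    return match.group(1) == "+"


# Excerpt of the SATA controller dump above, wrapped as in the mail.
sample = """\
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium
>TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
"""

print(intx_asserted(sample))
```

Calling this in a loop on fresh lspci output would show the bit toggling while the disk is busy; a value that never returns to INTx- would point at the device itself.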

Continuing with your patch...


J.


jeroen.vandenkeybus at gmail

Dec 4, 2011, 7:06 AM

Post #10 of 40 (6589 views)
Re: Unhandled IRQs on AMD E-450

I finished patching and testing kernel 3.2.0-rc2.

I also added #define DEBUG and am running with the cmdline "debug apic=debug".

The AHCI interrupt was successfully moved to IRQ 40. Good patchwork !

Up for only 10 min., IRQ 19 has not been revoked. However, I already
lost IRQ 16 after 2 minutes. This kernel doesn't have support for any
audio, so there was only firewire_ohci on this line. However, lspci
for this device shows a firm INTx+. Cleared that by rmmod'ing and
modprobe'ing firewire_ohci again. After 20 min. IRQ 19 was lost again.

Now _I_ am lost. The only thing that IRQ 16 and IRQ 19 have in common
is that there are devices on them that do have an INTx line but do not
use it (MSI instead). However, I ran this kernel with pci=nomsi
(earlier post) and IRQs 16 and 19 went down as well. IRQs 17 and 18
were never revoked.

Yours truly puzzled,

J.



$ uname -r

3.2.0-rc2


$ cat /proc/interrupts

           CPU0       CPU1
  0:         44          2   IO-APIC-edge      timer
  1:          0          0   IO-APIC-edge      i8042
  7:          1          0   IO-APIC-edge
  8:          0          1   IO-APIC-edge      rtc0
  9:          0          0   IO-APIC-fasteoi   acpi
 12:          0          2   IO-APIC-edge      i8042
 16:         15      99986   IO-APIC-fasteoi   firewire_ohci
 17:          0        110   IO-APIC-fasteoi   ehci_hcd:usb6, ehci_hcd:usb8, ehci_hcd:usb9
 18:          3         37   IO-APIC-fasteoi   ohci_hcd:usb3, ohci_hcd:usb4, ohci_hcd:usb5, ohci_hcd:usb7
 19:        156         21   IO-APIC-fasteoi   eth1
 40:      26986       6951   PCI-MSI-edge      ahci
 41:       1268        125   PCI-MSI-edge      eth0
 42:          0          0   PCI-MSI-edge      xhci_hcd
 43:          0          0   PCI-MSI-edge      xhci_hcd
 44:          0          0   PCI-MSI-edge      xhci_hcd
NMI:          0          0   Non-maskable interrupts
LOC:      16629      27721   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:          0          0   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RES:       5957       3060   Rescheduling interrupts
CAL:        138        174   Function call interrupts
TLB:        345        234   TLB shootdowns
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          2          2   Machine check polls
ERR:          1
MIS:          0


$ sudo lspci -vv -s05:02.0

05:02.0 FireWire (IEEE 1394): VIA Technologies, Inc. VT6306/7/8 [Fire II(M)] IEEE 1394 OHCI Controller (rev c0) (prog-if 10 [OHCI])
        Subsystem: ASUSTeK Computer Inc. M4A series motherboard
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx+
        Latency: 32 (8000ns max), Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at fea60000 (32-bit, non-prefetchable) [size=2K]
        Region 1: I/O ports at d000 [size=128]
        Capabilities: [50] Power Management version 2
                Flags: PMEClk- DSI- D1- D2+ AuxCurrent=0mA PME(D0-,D1-,D2+,D3hot+,D3cold+)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Kernel driver in use: firewire_ohci
        Kernel modules: firewire-ohci


dmesg log from the patch:

[ 0.279002] pci 0000:00:11.0: [1002:4391] type 0 class 0x000106
[ 0.279021] pci 0000:00:11.0: calling quirk_no_ata_d3+0x0/0x1b
[ 0.279030] pci 0000:00:11.0: calling quirk_mmio_always_on+0x0/0x17
[ 0.279053] pci 0000:00:11.0: reg 10: [io 0xf140-0xf147]
[ 0.279072] pci 0000:00:11.0: reg 14: [io 0xf130-0xf133]
[ 0.279091] pci 0000:00:11.0: reg 18: [io 0xf120-0xf127]
[ 0.279110] pci 0000:00:11.0: reg 1c: [io 0xf110-0xf113]
[ 0.279129] pci 0000:00:11.0: reg 20: [io 0xf100-0xf10f]
[ 0.279148] pci 0000:00:11.0: reg 24: [mem 0xfeb4f000-0xfeb4f3ff]
[ 0.279174] pci 0000:00:11.0: calling sb7x0_ahci_msi_enable+0x0/0x112
[ 0.279195] pci 0000:00:11.0: AHCI: enabled MSI
[ 0.279204] pci 0000:00:11.0: calling quirk_resource_alignment+0x0/0x16b


dmesg log from lost IRQ 16:

[ 104.618738] irq 16: nobody cared (try booting with the "irqpoll" option)
[ 104.618750] Pid: 0, comm: kworker/0:0 Not tainted 3.2.0-rc2 #7
[ 104.618754] Call Trace:
[ 104.618757] <IRQ> [<ffffffff810bb9cd>] __report_bad_irq+0x3d/0xe0
[ 104.618774] [<ffffffff810bbe0d>] note_interrupt+0x14d/0x210
[ 104.618780] [<ffffffff810b98a4>] handle_irq_event_percpu+0xc4/0x290
[ 104.618786] [<ffffffff810b9ab8>] handle_irq_event+0x48/0x70
[ 104.618792] [<ffffffff810bc7fa>] handle_fasteoi_irq+0x5a/0xe0
[ 104.618799] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 104.618805] [<ffffffff81506baa>] do_IRQ+0x5a/0xd0
[ 104.618812] [<ffffffff814fe76b>] common_interrupt+0x6b/0x6b
[ 104.618815] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[ 104.618832] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[ 104.618840] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[ 104.618848] [<ffffffff814223b8>] cpuidle_idle_call+0xb8/0x230
[ 104.618855] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 104.618861] [<ffffffff814efa56>] start_secondary+0x1ed/0x1f4
[ 104.618866] handlers:
[ 104.618874] [<ffffffffa00ad280>] irq_handler
[ 104.618878] Disabling IRQ #16


dmesg log from firewire_ohci reloading:

[ 1041.257730] firewire_ohci 0000:05:02.0: PCI INT A disabled
[ 1041.257737] firewire_ohci: Removed fw-ohci device.
[ 1062.915595] firewire_ohci 0000:05:02.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
[ 1062.915610] firewire_ohci 0000:05:02.0: calling quirk_via_vlink+0x0/0xd0
[ 1062.980387] firewire_ohci: Added fw-ohci device 0000:05:02.0, OHCI v1.10, 4 IR + 8 IT contexts, quirks 0x11
[ 1063.481956] firewire_core: created device fw0: GUID 001e8c0000509146, S400


dmesg log from lost IRQ 19:

[ 1205.490580] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 1205.490592] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #7
[ 1205.490596] Call Trace:
[ 1205.490599] <IRQ> [<ffffffff810bb9cd>] __report_bad_irq+0x3d/0xe0
[ 1205.490616] [<ffffffff810bbe0d>] note_interrupt+0x14d/0x210
[ 1205.490623] [<ffffffff810b98a4>] handle_irq_event_percpu+0xc4/0x290
[ 1205.490629] [<ffffffff810b9ab8>] handle_irq_event+0x48/0x70
[ 1205.490635] [<ffffffff810bc7fa>] handle_fasteoi_irq+0x5a/0xe0
[ 1205.490642] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 1205.490649] [<ffffffff81506baa>] do_IRQ+0x5a/0xd0
[ 1205.490655] [<ffffffff814fe76b>] common_interrupt+0x6b/0x6b
[ 1205.490658] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[ 1205.490677] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[ 1205.490684] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[ 1205.490692] [<ffffffff814223b8>] cpuidle_idle_call+0xb8/0x230
[ 1205.490699] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 1205.490706] [<ffffffff814e2370>] rest_init+0x94/0xa4
[ 1205.490713] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 1205.490719] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 1205.490725] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 1205.490729] handlers:
[ 1205.490736] [<ffffffffa01164f0>] e1000_intr
[ 1205.490740] Disabling IRQ #19


clemens at ladisch

Dec 4, 2011, 8:59 AM

Post #11 of 40 (6603 views)
Re: Unhandled IRQs on AMD E-450

Jeroen Van den Keybus wrote:
> The problem occurs with the e1000 idle (unplugged) and under heavy
> usage (plugged). Time to failure is also in the same order of
> magnitude (i.e. 1..30 minutes). As of now, I never had IRQ 19 disabled
> with the e1000 removed. The e1000 delivered with Ubuntu isn't
> particularly recent (7.3.21-k8-NAPI).

That version number doesn't mean much; there have been many changes to
the kernel driver since it was last updated.

One interesting patch is <http://git.kernel.org/linus/4c11b8adbc48>;
please check if you have it (the file was recently moved into
drivers/net/ethernet/intel/e1000/). But it's from January, your 3.2-rc*
should already have it.

> I have succeeded in catching a lspci on the SATA controller with INTx+
> while IRQ 19 is disabled. [...]
> The fact that the next lspci's showed INTx- shows that its pin is
> definitely not stuck, does it not ?

Indeed; that SATA controller appears to work fine.

> I already lost IRQ 16 after 2 minutes. This kernel doesn't have
> support for any audio, so there was only firewire_ohci on this line.
> However, lspci for this device shows a firm INTx+.

Your VT6308 is a widely-used chip, and there are no known interrupt-
related problems with it.

This PCI status register is part of the device itself, i.e., the
FireWire controller chip; there is nothing in the rest of the system,
hardware or software, that could affect this INTx value. This means
that the controller itself thinks that there is some FireWire-related
reason for the interrupts.

To instruct the firewire-ohci driver to log all interrupts and what the
device thinks the reason for them is, please run:

echo 4 > /sys/module/firewire_ohci/parameters/debug

As long as there is nothing connected, there should be nothing but
a timing interrupt every 64 seconds, like this:
firewire_ohci: IRQ 00200000 cycle64Seconds
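
Checking a long log for this idle pattern by eye is tedious, so here is a rough sketch of how it might be automated. It assumes dmesg lines in exactly the format shown above; the function name `idle_pattern_ok`, the 0.1 s tolerance and the sample timestamps are illustrative assumptions, not part of the driver.

```python
import re
from decimal import Decimal

# Matches lines such as "[113768.766055] firewire_ohci: IRQ 00200000 cycle64Seconds".
EVENT = re.compile(r"\[\s*(\d+\.\d+)\]\s+firewire_ohci: IRQ ([0-9a-fA-F]{8})")
CYCLE_64_SECONDS = 0x00200000  # event mask shown in the sample line above


def idle_pattern_ok(dmesg_text, tolerance=Decimal("0.1")):
    """True if every logged event is cycle64Seconds and the gaps are about 64 s."""
    stamps = []
    for line in dmesg_text.splitlines():
        m = EVENT.search(line)
        if not m:
            continue  # not a firewire_ohci IRQ debug line
        if int(m.group(2), 16) != CYCLE_64_SECONDS:
            return False  # some other interrupt reason was logged
        stamps.append(Decimal(m.group(1)))
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return bool(stamps) and all(abs(gap - 64) <= tolerance for gap in gaps)


sample = """\
[113768.766055] firewire_ohci: IRQ 00200000 cycle64Seconds
[113832.768181] firewire_ohci: IRQ 00200000 cycle64Seconds
[113896.770536] firewire_ohci: IRQ 00200000 cycle64Seconds
"""

print(idle_pattern_ok(sample))
```

Any line with a different mask, or a gap far from 64 seconds, makes the check fail and is worth looking at by hand.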

> After 20 min. IRQ 19 was lost again.
>
> Now _I_ am lost. The only thing that IRQ 16 and IRQ 19 have in common
> is that there are devices on them that do have an INTx line but do not
> use it (MSI instead). However, I ran this kernel with pci=nomsi
> (earlier post) and IRQs 16 and 19 went down as well.

From the information available so far, it appears that you have two
similar but _independent_ problems with the e1000 and firewire devices.
(It might be possible that static electricity zapped both your PCI card
and the FireWire controller (which is directly near the first PCI slot),
or something like that.)


Regards,
Clemens


jeroen.vandenkeybus at gmail

Dec 5, 2011, 4:06 PM

Post #12 of 40 (6602 views)
Re: Unhandled IRQs on AMD E-450

> As long as there is nothing connected, there should be nothing but
> a timing interrupt every 64 seconds, like this:
>  firewire_ohci: IRQ 00200000 cycle64Seconds


That is correct. I see those messages indeed. Until now, however, I
have not been able / lucky to witness another IRQ 16 banning. Still
running the test.

But...

I have also been looking into the e1000 driver. I added printk's to
every invocation of e1000_intr(), using printk_ratelimit() as well as
a local occurrence counter. There are three places where I log, with
the following messages:

1. Right after determining that the Interrupt Cause Register is zero.
That means the interrupt was not meant for or caused by the e1000
(barring a hardware failure) ==> e1000: not ours
2. Right after determining that the ICR is set, but the driver is not
active. ==> e1000: ours, but down
3. At the end of e1000_intr(). ==> e1000: ours
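
For readers who want the decision logic spelled out, here is a small model of the three logging points. It is not the driver code: the name `e1000_intr_model`, the per-message counters and the return value chosen for the "ours, but down" case are assumptions made for illustration.

```python
from collections import Counter

IRQ_NONE, IRQ_HANDLED = 0, 1  # stand-ins for the kernel's irqreturn_t values

counts = Counter()  # one occurrence counter per message, as in the logs


def e1000_intr_model(icr, adapter_down=False):
    """Model of the instrumented handler: returns (return value, log line)."""
    if icr == 0:
        # Logging point 1: Interrupt Cause Register is zero.
        label, ret = "not ours", IRQ_NONE
    elif adapter_down:
        # Logging point 2: a cause is set but the driver is not active
        # (returning IRQ_HANDLED here is an assumption of this model).
        label, ret = "ours, but down", IRQ_HANDLED
    else:
        # Logging point 3: normal path, reached at the end of the handler.
        label, ret = "ours", IRQ_HANDLED
    line = f"e1000: {label} ({counts[label]})"
    counts[label] += 1
    return ret, line


print(e1000_intr_model(0x80)[1])  # any non-zero ICR value counts as "ours"
print(e1000_intr_model(0)[1])
```

In the real handler each message additionally goes through printk_ratelimit(), which is why only a handful of lines survive during an interrupt storm.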

The result:

[113757.420967] e1000: ours (240)
[113759.424936] e1000: ours (241)
[113761.428516] e1000: ours (242)
[113761.428528] e1000: not ours (0)
[113761.428536] e1000: not ours (1)
[113761.428543] e1000: not ours (2)
[113761.428551] e1000: not ours (3)
[113761.428558] e1000: not ours (4)
[113761.428566] e1000: not ours (5)
[113761.428579] e1000: not ours (6)
[113762.676114] irq 19: nobody cared (try booting with the "irqpoll" option)
[113762.676126] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #7
[113762.676130] Call Trace:
[113762.676133] <IRQ> [<ffffffff810bb9cd>] __report_bad_irq+0x3d/0xe0
[113762.676151] [<ffffffff810bbe0d>] note_interrupt+0x14d/0x210
[113762.676157] [<ffffffff810b98a4>] handle_irq_event_percpu+0xc4/0x290
[113762.676164] [<ffffffff810b9ab8>] handle_irq_event+0x48/0x70
[113762.676170] [<ffffffff810bc7fa>] handle_fasteoi_irq+0x5a/0xe0
[113762.676177] [<ffffffff81004012>] handle_irq+0x22/0x40
[113762.676183] [<ffffffff81506baa>] do_IRQ+0x5a/0xd0
[113762.676189] [<ffffffff814fe76b>] common_interrupt+0x6b/0x6b
[113762.676192] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[113762.676211] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[113762.676219] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[113762.676227] [<ffffffff814223b8>] cpuidle_idle_call+0xb8/0x230
[113762.676234] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[113762.676241] [<ffffffff814e2370>] rest_init+0x94/0xa4
[113762.676248] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[113762.676254] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[113762.676260] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[113762.676264] handlers:
[113762.676271] [<ffffffffa01164f0>] e1000_intr
[113762.676275] Disabling IRQ #19
[113768.766055] firewire_ohci: IRQ 00200000 cycle64Seconds
[113832.768181] firewire_ohci: IRQ 00200000 cycle64Seconds
[113896.770536] firewire_ohci: IRQ 00200000 cycle64Seconds
[113960.772976] firewire_ohci: IRQ 00200000 cycle64Seconds
[114024.775340] firewire_ohci: IRQ 00200000 cycle64Seconds
[114088.776662] firewire_ohci: IRQ 00200000 cycle64Seconds
[114152.778105] firewire_ohci: IRQ 00200000 cycle64Seconds
[114200.220155] e1000 0000:05:01.0: PCI INT A disabled
[114216.779703] firewire_ohci: IRQ 00200000 cycle64Seconds
[114265.335175] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[114265.335185] e1000: Copyright (c) 1999-2006 Intel Corporation.
[114265.335268] e1000 0000:05:01.0: PCI INT A -> GSI 19 (level, low) -> IRQ 19
[114265.931952] e1000 0000:05:01.0: eth1: (PCI:33MHz:32-bit) 00:0e:0c:d9:6f:ca
[114265.931977] e1000 0000:05:01.0: eth1: Intel(R) PRO/1000 Network Connection
[114265.947250] e1000_intr: 199750 callbacks suppressed
[114265.947257] e1000: ours (0)
[114265.948433] e1000: ours (1)
[114267.952645] e1000: ours (2)
[114269.956659] e1000: ours (3)
[114271.960528] e1000: ours (4)
[114273.964811] e1000: ours (5)

The e1000 chip raises the IRQ every 2 seconds. The e1000 driver sees
it ([...] e1000: ours) and, by reading the ICR, clears the IRQ line.

At ours (242) the interrupt arrives exactly at its expected time.
However, 12 microseconds later, e1000_intr() is invoked again. This
time the ICR is empty, so e1000_intr() returns IRQ_NONE. From then on,
e1000_intr() is overwhelmed by interrupts that are apparently not
caused by the e1000 (and since it reads the ICR on every invocation,
an IRQ raised by the e1000 would have been cleared anyway). I suspect
that the IRQ is simply not properly acknowledged. (Only 7 occurrences
of 'not ours' were logged because of printk_ratelimit(). After
unloading and reloading the modified e1000.ko, the rate limiter
reports that nearly 200k messages were suppressed.)

I will now be checking this again on a fresh build (to ensure I
haven't forgotten to unpatch anything). I will also install a new
e1000 card although I doubt that it is defective.


J.


jeroen.vandenkeybus at gmail

Dec 8, 2011, 3:33 AM

Post #13 of 40 (6577 views)
Re: Unhandled IRQs on AMD E-450

> I will now be checking this again on a fresh build (to ensure I
> haven't forgotten to unpatch anything). I will also install a new
> e1000 card although I doubt that it is defective.

I made a fresh 3.2.0-rc2 build and the problems are still there (the
previous kernel still had IRQ_FORCED_THREADING disabled).

I modified the IRQ handlers in both the firewire-ohci and e1000 drivers
to log, when they are called, whether they think the interrupt was
meant for them ('ours') or not ('not ours').

The results (for IRQ16 it took a rather long while to obtain) are listed below.

I have the impression that I see the same failure mechanism for both
IRQs. All goes well for a while, until an IRQ storm starts right
(e1000: 19 us, firewire-ohci: 39 us) after a valid IRQ.
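
The two delays quoted above can be recomputed from the timestamps in the logs at the end of this post. The snippet below is only a sketch: it assumes the 'ours' / 'not ours' message format of my modified handlers and a log excerpt containing a single driver, and the function name is made up.

```python
import re

# Matches e.g. "[67970.908813] e1000: not ours (0)".
LINE = re.compile(r"\[\s*(\d+)\.(\d{6})\]\s+(\w+): (not ours|ours)\b")


def storm_onset_gap_us(dmesg_text):
    """Microseconds between the last 'ours' and the first following 'not ours'."""
    last_ours = None
    for line in dmesg_text.splitlines():
        m = LINE.search(line)
        if not m:
            continue
        # Work in integer microseconds to avoid floating-point rounding.
        stamp_us = int(m.group(1)) * 1_000_000 + int(m.group(2))
        if m.group(4) == "ours":
            last_ours = stamp_us
        elif last_ours is not None:
            return stamp_us - last_ours
    return None


e1000_log = """\
[67970.908794] e1000: ours (275)
[67970.908813] e1000: not ours (0)
"""
firewire_log = """\
[158746.111369] firewire_ohci: ours (1432)
[158746.111408] firewire_ohci: not ours (0)
"""

print(storm_onset_gap_us(e1000_log), storm_onset_gap_us(firewire_log))
```

Note that these are differences between printk timestamps, so they include the time spent in the handler and in printk itself.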

Therefore there is a strong correlation between the arrival of the
spurious interrupt, allegedly caused by a mystery device, and the
earlier arrival of a valid interrupt for a device. Combined with the
fact that it happens on 2 different IRQs, this pretty much rules out the
possibility for me that there is either a mystery device at all, or
that the existing devices would both be defective, does it not ?

I also do not understand, if there would be a stuck IRQ line, why I
can unload and reload e1000 and firewire-ohci without immediately
getting the same IRQ storm.

Are there any tools suitable for tracing the handling of the last
valid interrupt ?

Thanks for any tips (and again to Clemens for providing a hack making
it possible for me to keep the disk IRQs out of the danger zone).


J.


dmesg logs for IRQ 16 and IRQ 19 getting banned.


...
[67962.892870] e1000: ours (271)
[67964.897018] e1000: ours (272)
[67966.900981] e1000: ours (273)
[67968.904908] e1000: ours (274)
[67970.908794] e1000: ours (275)
[67970.908813] e1000: not ours (0)
[67970.908825] e1000: not ours (1)
[67970.908835] e1000: not ours (2)
[67970.908845] e1000: not ours (3)
[67970.908855] e1000: not ours (4)
[67970.908865] e1000: not ours (5)
[67970.908877] e1000: not ours (6)
[67970.908887] e1000: not ours (7)
[67970.908895] e1000: not ours (8)
[67970.908907] e1000: not ours (9)
[67970.908917] e1000: not ours (10)
[67970.908927] e1000: not ours (11)
[67970.908936] e1000: not ours (12)
[67970.908945] e1000: not ours (13)
[67970.908954] e1000: not ours (14)
[67970.908964] e1000: not ours (15)
[67971.904010] e1000_intr: 152423 callbacks suppressed
[67971.904013] e1000: not ours (16)
[67971.904021] e1000: not ours (17)
[67971.904030] e1000: not ours (18)
[67971.904039] e1000: not ours (19)
[67971.904047] e1000: not ours (20)
[67971.904056] e1000: not ours (21)
[67971.904065] e1000: not ours (22)
[67971.904075] e1000: not ours (23)
[67971.904084] e1000: not ours (24)
[67971.904093] e1000: not ours (25)
[67971.904102] e1000: not ours (26)
[67971.904112] e1000: not ours (27)
[67971.904121] e1000: not ours (28)
[67971.904128] e1000: not ours (29)
[67971.904137] e1000: not ours (30)
[67971.904147] e1000: not ours (31)
[67971.904156] e1000: not ours (32)
[67971.904165] e1000: not ours (33)
[67971.904174] e1000: not ours (34)
[67971.904184] e1000: not ours (35)
[67972.210296] irq 19: nobody cared (try booting with the "irqpoll" option)
[67972.210305] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #2
[67972.210309] Call Trace:
[67972.210312] <IRQ> [<ffffffff810bbafd>] __report_bad_irq+0x3d/0xe0
[67972.210329] [<ffffffff810bbf3d>] note_interrupt+0x14d/0x210
[67972.210335] [<ffffffff810b98c4>] handle_irq_event_percpu+0xc4/0x290
[67972.210342] [<ffffffff810b9ad8>] handle_irq_event+0x48/0x70
[67972.210348] [<ffffffff810bc92a>] handle_fasteoi_irq+0x5a/0xe0
[67972.210354] [<ffffffff81004012>] handle_irq+0x22/0x40
[67972.210361] [<ffffffff81506caa>] do_IRQ+0x5a/0xd0
[67972.210367] [<ffffffff814fe86b>] common_interrupt+0x6b/0x6b
[67972.210370] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[67972.210387] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[67972.210395] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[67972.210403] [<ffffffff814224e8>] cpuidle_idle_call+0xb8/0x230
[67972.210409] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[67972.210416] [<ffffffff814e24a0>] rest_init+0x94/0xa4
[67972.210423] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[67972.210429] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[67972.210435] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[67972.210438] handlers:
[67972.210445] [<ffffffffa008e4f0>] e1000_intr
[67972.210449] Disabling IRQ #19
[67992.794771] irq_handler: 47265 callbacks suppressed
[67992.794783] firewire_ohci: ours (14)
[68056.795654] firewire_ohci: ours (15)
...

...
[158362.106314] firewire_ohci: ours (1426)
[158426.107131] firewire_ohci: ours (1427)
[158490.107972] firewire_ohci: ours (1428)
[158554.108857] firewire_ohci: ours (1429)
[158618.109671] firewire_ohci: ours (1430)
[158682.110521] firewire_ohci: ours (1431)
[158746.111369] firewire_ohci: ours (1432)
[158746.111408] firewire_ohci: not ours (0)
[158746.111421] firewire_ohci: not ours (1)
[158746.111432] firewire_ohci: not ours (2)
[158746.111444] firewire_ohci: not ours (3)
[158746.111461] firewire_ohci: not ours (4)
[158746.111473] firewire_ohci: not ours (5)
[158746.111484] firewire_ohci: not ours (6)
[158746.111495] firewire_ohci: not ours (7)
[158746.111502] firewire_ohci: not ours (8)
[158746.111510] firewire_ohci: not ours (9)
[158746.111518] firewire_ohci: not ours (10)
[158746.111526] firewire_ohci: not ours (11)
[158746.111534] firewire_ohci: not ours (12)
[158746.111542] firewire_ohci: not ours (13)
[158746.111550] firewire_ohci: not ours (14)
[158746.111558] firewire_ohci: not ours (15)
[158746.111565] firewire_ohci: not ours (16)
[158746.111573] firewire_ohci: not ours (17)
[158746.111581] firewire_ohci: not ours (18)
[158747.362748] irq 16: nobody cared (try booting with the "irqpoll" option)
[158747.362757] Pid: 0, comm: kworker/0:0 Not tainted 3.2.0-rc2 #2
[158747.362761] Call Trace:
[158747.362764] <IRQ> [<ffffffff810bbafd>] __report_bad_irq+0x3d/0xe0
[158747.362782] [<ffffffff810bbf3d>] note_interrupt+0x14d/0x210
[158747.362788] [<ffffffff810b98c4>] handle_irq_event_percpu+0xc4/0x290
[158747.362794] [<ffffffff810b9ad8>] handle_irq_event+0x48/0x70
[158747.362800] [<ffffffff810bc92a>] handle_fasteoi_irq+0x5a/0xe0
[158747.362807] [<ffffffff81004012>] handle_irq+0x22/0x40
[158747.362814] [<ffffffff81506caa>] do_IRQ+0x5a/0xd0
[158747.362820] [<ffffffff814fe86b>] common_interrupt+0x6b/0x6b
[158747.362823] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[158747.362840] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[158747.362848] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[158747.362856] [<ffffffff814224e8>] cpuidle_idle_call+0xb8/0x230
[158747.362862] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[158747.362869] [<ffffffff814efb86>] start_secondary+0x1ed/0x1f4
[158747.362873] handlers:
[158747.362879] [<ffffffffa00b2100>] irq_handler
[158747.362883] Disabling IRQ #16
...


clemens at ladisch

Dec 8, 2011, 4:45 AM

Post #14 of 40 (6578 views)
Re: Unhandled IRQs on AMD E-450

Jeroen Van den Keybus wrote:
> I have the impression that I see the same failure mechanism for both
> IRQs. All goes well for a while, until an IRQ storm starts right
> (e1000: 19 us, firewire-ohci: 39 us) after a valid IRQ.
>
> Therefore there is a strong correlation between the arrival of the
> spurious interrupt, allegedly caused by a mystery device, and the
> earlier arrival of a valid interrupt for a device. Combined with the
> fact that it happens on 2 different IRQs, this pretty much rules out the
> possibility for me that there is either a mystery device at all, or
> that the existing devices would both be defective, does it not ?

There appears to be a problem with the interrupt handling.

In PCI, interrupts are level-triggered, which means that the interrupt
line (INTx) is active when it's at level 0 and inactive when it's at
level 1. When a device wants to trigger an interrupt, it outputs zero
on its interrupt output. The level doesn't get reset to 1 until the
driver acknowledges the interrupt (in e1000, read of the ICR; in
firewire-ohci, write of IntEventClear). As long as the line stays at 0,
all interrupt handlers will continue being called. This mechanism
allows multiple devices to share one interrupt line.
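
As a toy illustration of that sharing mechanism, here is a sketch in which two devices drive one active-low line. Everything in it (the class `Device`, the `service` loop, the round limit) is a simplified model invented for this explanation, not kernel code.

```python
class Device:
    def __init__(self, name):
        self.name = name
        self.pending = False  # True while the device drives INTx to 0

    def handler(self):
        """Driver's interrupt handler: True if this device needed service."""
        if not self.pending:
            return False  # "not ours"
        self.pending = False  # acknowledging releases the line
        return True


def line_level(devices):
    # Active-low wired-OR: the line is 0 while any device asserts it.
    return 0 if any(d.pending for d in devices) else 1


def service(devices, max_rounds=10):
    """Call all handlers while the line stays at 0; return who claimed it."""
    claimed = []
    rounds = 0
    while line_level(devices) == 0 and rounds < max_rounds:
        for d in devices:
            if d.handler():
                claimed.append(d.name)
        rounds += 1
    return claimed


shared = [Device("ahci"), Device("eth1")]  # two devices on one IRQ line
shared[1].pending = True                   # eth1 raises an interrupt
print(service(shared), line_level(shared))
```

Every handler on the line is called, but only the device that actually asserted the line claims the interrupt, and the line returns to 1 once it has been acknowledged.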

In PCI Express, there are only one-to-one connections, and there are no
separate interrupt lines. A device raises an interrupt by sending
an interrupt message, which could be understood as a memory write to
a special address at the interrupt controller. Nothing needs to be done
to deactivate the interrupt; if the device has another reason for
an interrupt, it just sends another interrupt message.

When a PCI device is connected to a PCI Express system, the old INTx
interrupt line must be converted to PCI Express messages. This is done
with _two_ special messages, Assert_INTx and Deassert_INTx. The first
tells the interrupt controller that some INTx line went from 1 to 0, the
second tells it that it went from 0 back to 1; this allows the interrupt
controller to implement the level-triggered behaviour.

It appears that some Deassert_INTx messages get lost on your system.
There are no indications of any other missing PCIe packets, so this
looks like a problem with the interrupt handling in your PCI/PCIe
bridge, the ASM1083 chip.
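
As a rough sketch of why a lost Deassert_INTx matters, assume the
interrupt controller keeps one "virtual wire" state per INTx line and
updates it only from the messages it receives. The names virtual_wire,
controller_receive and irq_pending are hypothetical; this is a model
for illustration, not how any particular chipset is implemented.

```c
#include <assert.h>
#include <stdbool.h>

enum intx_msg { ASSERT_INTX, DEASSERT_INTX };

/* Hypothetical model: the controller knows the line level only from messages. */
struct virtual_wire {
	bool asserted;
};

/* Update the tracked level from one received message. */
static void controller_receive(struct virtual_wire *w, enum intx_msg msg)
{
	assert(msg == ASSERT_INTX || msg == DEASSERT_INTX);
	w->asserted = (msg == ASSERT_INTX);
}

/* Level-triggered: handlers keep being invoked while this returns true. */
static bool irq_pending(const struct virtual_wire *w)
{
	return w->asserted;
}
```

If the Deassert_INTx for an acknowledged interrupt never reaches
controller_receive(), irq_pending() stays true although no device
requests service, and the state only clears when a later
assert/deassert pair arrives intact.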

> I also do not understand, if there would be a stuck IRQ line, why I
> can unload and reload e1000 and firewire-ohci without immediately
> getting the same IRQ storm.

Linux will reenable the interrupt line when a new driver attaches to it.
At this point, it's still stuck, but the device initialization will
trigger some actual interrupts, and after the first assert/deassert
pair, the line will be unstuck.


Regards,
Clemens


jeroen.vandenkeybus at gmail

Dec 8, 2011, 1:27 PM

Post #15 of 40 (6597 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Thanks for explaining the PCI to PCIe bridge architecture. Of course,
the ASM1083 can only be the cause if the Firewire controller is also
on that bus. Which I don't know.

> It appears that some Deassert_INTx messages get lost on your system.
> There are no indications of any other missing PCIe packets, so this
> looks like a problem with the interrupt handling in your PCI/PCIe
> bridge, the ASM1083 chip.

Assuming this is the case, I modified the e1000 driver to explicitly
raise its IRQ line after returning IRQ_NONE five times (e1000_intr()
code at the end of this post). The result of this test is that the IRQ
line is indeed raised (in the next invocation, the ISR sees the forced
RXT0 interrupt, clears the IRQ line and returns IRQ_HANDLED). But alas,
the storm is not silenced at all.

If the ASM108x were the problem, I suspect that explicitly raising and
clearing the interrupt would have retriggered the INTx_Assert and
INTx_Deassert messages? Meaning the bridge wouldn't be the problem.

@ Clemens: If I understand correctly, the IO-APIC is not even used in
this case? (IRQ requests from e1000 all go through PCIe.) Or is
there also a virtual IO-APIC monitoring Assert and Deassert messages?
Is the BIOS responsible for writing a mapping for the PCI IRQs to MSIs
into the ASM108x? (And BTW, should this still be posted to
linux1394-devel?)

I'm thinking of immediately re-enabling the irqs after they've been
disabled in spurious.c.

I also think that the following posts may refer to the same problem:

http://ubuntuforums.org/showthread.php?t=1883854
https://lkml.org/lkml/2011/6/30/197
https://lkml.org/lkml/2011/10/14/146

Rgds,


J.


dmesg log:

[247181.656647] e1000: ours (60)
[247183.660996] e1000: ours (61)
[247185.664907] e1000: ours (62)
[247185.664926] e1000: not ours (0)
[247185.664937] e1000: not ours (1)
[247185.664948] e1000: not ours (2)
[247185.664958] e1000: not ours (3)
[247185.664968] e1000: not ours (4)
[247185.664982] e1000: sending RXT0 interrupt (mask=0x00000000)
[247185.664997] e1000: ours (63)
[247185.665009] e1000: not ours (0)
[247185.665024] e1000: not ours (1)
[247185.665034] e1000: not ours (2)
[247185.665041] e1000: not ours (3)
[247185.665053] e1000: not ours (4)
[247185.665065] e1000: sending RXT0 interrupt (mask=0x0000009d)
[247185.665077] e1000: ours (64)
[247185.665085] e1000: not ours (0)
[247185.665095] e1000: not ours (1)
[247185.665105] e1000: not ours (2)
[247186.319878] irq 19: nobody cared (try booting with the "irqpoll" option)
[247186.319887] Pid: 0, comm: swapper Not tainted 3.2.0-rc2 #2
[247186.319891] Call Trace:
[247186.319894] <IRQ> [<ffffffff810bbafd>] __report_bad_irq+0x3d/0xe0
[247186.319912] [<ffffffff810bbf3d>] note_interrupt+0x14d/0x210
[247186.319918] [<ffffffff810b98c4>] handle_irq_event_percpu+0xc4/0x290
[247186.319924] [<ffffffff810b9ad8>] handle_irq_event+0x48/0x70
[247186.319930] [<ffffffff810bc92a>] handle_fasteoi_irq+0x5a/0xe0
[247186.319937] [<ffffffff81004012>] handle_irq+0x22/0x40
[247186.319943] [<ffffffff81506caa>] do_IRQ+0x5a/0xd0
[247186.319950] [<ffffffff814fe86b>] common_interrupt+0x6b/0x6b
[247186.319953] <EOI> [<ffffffff81009906>] ? native_sched_clock+0x26/0x70
[247186.319970] [<ffffffffa00c50d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[247186.319978] [<ffffffffa00c50ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[247186.319986] [<ffffffff814224e8>] cpuidle_idle_call+0xb8/0x230
[247186.319992] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[247186.319999] [<ffffffff814e24a0>] rest_init+0x94/0xa4
[247186.320006] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[247186.320013] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[247186.320018] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[247186.320022] handlers:
[247186.320030] [<ffffffffa008e4f0>] e1000_intr
[247186.320034] Disabling IRQ #19


The modified e1000 interrupt handler:

static irqreturn_t e1000_intr(int irq, void *data)
{
	struct net_device *netdev = data;
	struct e1000_adapter *adapter = netdev_priv(netdev);
	struct e1000_hw *hw = &adapter->hw;
	u32 icr = er32(ICR);

	static int i_not_ours = 0;

	if (unlikely((!icr))) {
		if (i_not_ours < 5) {
			if (printk_ratelimit())
				printk("e1000: not ours (%d)\n", i_not_ours++);
		}
		else {
			if (printk_ratelimit())
				printk("e1000: sending RXT0 interrupt (mask=0x%08x)\n",
				       er32(IMS));
			ew32(ICS, E1000_ICS_RXT0);
		}
		return IRQ_NONE; /* Not our interrupt */
	}

	/*
	 * we might have caused the interrupt, but the above
	 * read cleared it, and just in case the driver is
	 * down there is nothing to do so return handled
	 */
	if (unlikely(test_bit(__E1000_DOWN, &adapter->flags))) {
		static int i = 0;
		if (printk_ratelimit())
			printk("e1000: ours, but down (%d)\n", i++);
		return IRQ_HANDLED;
	}

	if (unlikely(icr & (E1000_ICR_RXSEQ | E1000_ICR_LSC))) {
		hw->get_link_status = 1;
		/* guard against interrupt when we're going down */
		if (!test_bit(__E1000_DOWN, &adapter->flags))
			schedule_delayed_work(&adapter->watchdog_task, 1);
	}

	/* disable interrupts, without the synchronize_irq bit */
	ew32(IMC, ~0);
	E1000_WRITE_FLUSH();

	if (likely(napi_schedule_prep(&adapter->napi))) {
		adapter->total_tx_bytes = 0;
		adapter->total_tx_packets = 0;
		adapter->total_rx_bytes = 0;
		adapter->total_rx_packets = 0;
		__napi_schedule(&adapter->napi);
	} else {
		/* this really should not happen! if it does it is basically a
		 * bug, but not a hard error, so enable ints and continue */
		if (!test_bit(__E1000_DOWN, &adapter->flags))
			e1000_irq_enable(adapter);
	}

	{
		static int i = 0;
		if (printk_ratelimit())
			printk("e1000: ours (%d)\n", i++);
		i_not_ours = 0;
	}

	return IRQ_HANDLED;
}


clemens at ladisch

Dec 9, 2011, 12:22 AM

Post #16 of 40 (6597 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Jeroen Van den Keybus wrote:
> the ASM1083 can only be the cause if the Firewire controller is also
> on that bus.

The VT6308 is PCI, and you have only one bus.

>> It appears that some Deassert_INTx messages get lost on your system.
>> There are no indications of any other missing PCIe packets, so this
>> looks like a problem with the interrupt handling in your PCI/PCIe
>> bridge, the ASM1083 chip.
>
> Assuming this is the case, I modified the e1000 driver to explicitly
> set its IRQ line after 5 times having to send IRQ_NONE. (e1000_intr()
> code at end of this post). The result of this test is that the IRQ
> line indeed is set (in the next invocation, the ISR sees the forced
> RXT0 interrupt, clears the IRQ line and sends IRQ_HANDLED). But alas,
> the storm is not silenced at all.
>
> If the ASM108x was the problem, I suspect that explicitly raising and
> clearing the interrupt would have retriggered the INTx_Assert and
> INTx_Deassert messages ?

Yes.

> Meaning the bridge wouldn't be the problem.

It's possible that
1) the ASM1083 does not react to changes of the PCI interrupt line, or
2) the interrupt controller ignores INTx_Deassert messages.

I'm wondering what the difference between triggering an interrupt and
reloading the driver is that makes it work again. I'd guess that
reattaching the driver reinitializes the interrupt, which would point
to 2).

> If I understand correctly, the IO-APIC is not even used in this case ?
> (IRQ requests from e1000 all going through PCIe) Or is there also
> a virtual IO-APIC monitoring Assert and Deassert messages.

All PCI interrupts (whether 'real' lines in hardware or emulated with
PCIe messages) end up at the I/O-APIC.

> Is the BIOS responsible for writing a mapping for the PCI IRQs to MSIs
> into the ASM108x ?

MSIs are edge-triggered; their message is different from the (de)assert
messages used for PCI level-triggered interrupts.

AFAIK the interrupt handling in a PCI/PCIe bridge should work
transparently, i.e., the bridge does not need to be configured by
software.

The BIOS is responsible for telling the kernel about all interrupt
mappings, and ACPI takes part in initializing the I/O-APIC.
Check if a newer BIOS exists.

> (And BTW, should the linux1394-devel still be posted ?)

Your trigger-interrupt patch would also be possible with firewire-ohci.

> I'm thinking of immediately re-enabling the irqs after they've been
> disabled in spurious.c.

You could try free_irq/request_irq, but I guess you cannot do this
directly from inside an interrupt handler.

> I also think that the following posts may refer to the same problem:
>
> http://ubuntuforums.org/showthread.php?t=1883854
> https://lkml.org/lkml/2011/6/30/197
> https://lkml.org/lkml/2011/10/14/146

That's similar symptoms, but completely different hardware.


Regards,
Clemens


jeroen.vandenkeybus at gmail

Dec 9, 2011, 3:17 AM

Post #17 of 40 (6574 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

> The VT6308 is PCI, and you have only one bus.

There's also a bridge on the FCH: bus 0 dev. 14 fn. 4, but I see that
the memory and IO regions of the e1000 and the VT630x are within the
range of the ASM108x bridge. Thanks for pointing that out.

> It's possible that
> 1) the ASM1083 does not react to changes of the PCI interrupt line, or
> 2) the interrupt controller ignores INTx_Deassert messages.

I'm a bit puzzled. The IOAPIC operates on external IRQ lines. So that
would mean the LAPICs ?

> I'm wondering what the difference between triggering an interrupt and
> reloading the driver is that makes it work again.  I'd guess that
> reattaching the driver reinitializes the interrupt, which would point
> to 2).

I tried irq_disable(); irq_enable(); in spurious.c. That didn't change
anything; the storm continues. Also important: from my logs it appears
that when a driver is reloaded, there is indeed no storm at all. That
is clearly visible in the log I posted on Dec. 6.

> All PCI interrupts (whether 'real' lines in hardware or emulated with
> PCIe messages) end up at the I/O-APIC.

That would mean that the IO-APIC would decode MSI messages. I don't
think it can do that. Would it not be possible that the PCI bus IRQ
lines are directly connected to the FCH IO-APIC inputs (and that the
ASM1083 INTx lines are simply not connected)?

(Makes me wonder why Asus did not simply use the existing PCI bridge
on the FCH, which BTW also seems to depend on the use of the external
INTx lines.)

> Check if a newer BIOS exists.

Did that. No newer version is available.

> Your trigger-interrupt patch would also be possible with firewire-ohci.

If this would work, you'd have to patch the drivers of all PCI
devices. I'd much rather do it by modifying something in the IRQ
handling code.

>> I'm thinking of immediately re-enabling the irqs after they've been
>> disabled in spurious.c.

> You could try free_irq/request_irq, but I guess you cannot do this
> directly from inside an interrupt handler.

No, I did irq_disable/irq_enable, which should directly call the
mask/unmask methods of the chip handler. I still must check that,
though, and especially if the correct handler is used. But as I said
before, it didn't seem to do the trick.

>> I also think that the following posts may refer to the same problem:
>
> That's similar symptoms, but completely different hardware.

At first sight, yes, but they still share some of the problem areas. I
just wanted to point out possibly similar cases. Never mind.

(
>> http://ubuntuforums.org/showthread.php?t=1883854
Asus board with ASM1083 (same bridge).

>> https://lkml.org/lkml/2011/6/30/197
>> https://lkml.org/lkml/2011/10/14/146
Has device 1b21:1080 (same bridge) in its Asus system.

https://lkml.org/lkml/2011/10/22/157
Has a E35M1-M board (essentially the same board as the E45M1-M with
the AMD E350 instead of the E450)
)


J.


clemens at ladisch

Dec 9, 2011, 4:55 AM

Post #18 of 40 (6578 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Jeroen Van den Keybus wrote:
>> I'm wondering what the difference between triggering an interrupt and
>> reloading the driver is that makes it work again. I'd guess that
>> reattaching the driver reinitializes the interrupt, which would point
>> to 2).
>
> I tried irq_disable(); irq_enable(); in spurious.c. That didn't change
> anything. Storm continues. Also important: from my logs it appears
> that when a driver is reloaded, there is indeed no storm at all.

Temporarily disabling an irq is different from completely shutting it
down.

>> All PCI interrupts (whether 'real' lines in hardware or emulated with
>> PCIe messages) end up at the I/O-APIC.
>
> That would mean that the IO-APIC would decode MSI messages.

PCI interrupt messages (INTx_(de)assert) are special messages, while MSI
messages are just normal memory writes.

PCI interrupts (whether external or emulated) are always handled by the
I/O-APIC.

MSI interrupts usually go to some LAPIC (see the MSI address in the
lspci output; the I/O-APIC is at FEC00000, the LAPICs are at FEE00000).
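
To illustrate the address check mentioned above, here is a minimal
sketch that classifies an MSI target address as read from lspci. The
two base addresses are the ones given in this post; the 1 MiB window
size and the names classify_msi_address and apic_target are assumptions
made for the example.

```c
#include <assert.h>
#include <stdint.h>

enum apic_target { TARGET_IOAPIC, TARGET_LAPIC, TARGET_OTHER };

#define IOAPIC_BASE 0xFEC00000u
#define LAPIC_BASE  0xFEE00000u
#define WINDOW_SIZE 0x00100000u /* assumed 1 MiB window per range */

static_assert(IOAPIC_BASE + WINDOW_SIZE <= LAPIC_BASE,
	      "the two address windows must not overlap");

/* Decide which interrupt controller a message address points at. */
static enum apic_target classify_msi_address(uint64_t addr)
{
	if (addr >= LAPIC_BASE && addr < LAPIC_BASE + WINDOW_SIZE)
		return TARGET_LAPIC;
	if (addr >= IOAPIC_BASE && addr < IOAPIC_BASE + WINDOW_SIZE)
		return TARGET_IOAPIC;
	return TARGET_OTHER;
}
```

An MSI address such as fee0100c would be classified as TARGET_LAPIC,
i.e. the interrupt bypasses the I/O-APIC.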

> Would it not be possible that the PCI bus IRQ lines are directly
> connected to the FCH IO-APIC inputs (and that the ASM1083 INTx lines
> are simply not connected ?
>
> (Makes me wonder why Asus did not simply use the existing PCI bridge
> on the FCH, which BTW also seems to depend on the use of the external
> INTx lines.)

Device 14.4 would be the PCI bridge on AMD southbridges, but your model,
the A50M, does not have PCI support.

The interrupt lines are correctly connected to the ASM1083; otherwise,
they wouldn't work at all.

Also see "lspci -t".

>>> I also think that the following posts may refer to the same problem:
>>
>> That's similar symptoms, but completely different hardware.
>
> At first sight, yes, but they still share some of the problem areas. I
> justed wanted to point out possibly similar cases.

Indeed, I didn't realize they all have an ASM1083 bridge.

So it appears that this chip is just buggy.


Regards,
Clemens


jeroen.vandenkeybus at gmail

Dec 10, 2011, 4:10 AM

Post #19 of 40 (6564 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

> Temporarily disabling an irq is different from completely shutting it
> down.

Yes, but when unloading / reloading the driver, it seems that, at
least for the I/O-APIC, no more than that actually happens. (I've read
the original Intel I/O-APIC datasheet and the only way to clear a
pending IRQ is to do it by means of a message from the (local) APIC -
apart from some dirty trickery with modifying the IRQ type entry)

>>> All PCI interrupts (whether 'real' lines in hardware or emulated with
>>> PCIe messages) end up at the I/O-APIC.
>>
>> That would mean that the IO-APIC would decode MSI messages.

I was wrong there. Indeed the INTx Assert/Deassert messages are
entirely different and picked up by the I/O-APIC. The fact that,
without legacy PCI interrupts, IRQs 17 and 18 have never failed
indicates that the problem lies with the PCI/PCIe bridge. I'm thinking
of the following scenario:
- PCI device raises IRQ line.
- Bridge sees the transition and signals Assert.
- Assert travels through the PCIe fabric and arrives at the I/O-APIC.
- CPU services the IRQ, and does at least one (slow) PCI read to have
the device deassert its IRQ line. In practice, more PCI read/writes
are needed, requiring the bridge to do some PCIe traffic generation.
- Bridge sees the IRQ line transition and signals Deassert. This
message has only a few usecs to arrive at the I/O-APIC.
- _However_ the CPU has by and large already handled the IRQ and gets
interrupted again before the Deassert ever gets out. The resulting PCI
bus traffic further delays the Deassert message (due to e.g. PCIe
transmit credit exhaustion).

Another scenario is an electrical problem such as insufficient margin
for high INTx signal detection. But I'll have to wire the setup to
test that.

My idea is that if we would not immediately hammer the bridge with
PCIe transactions, the Deassert message may eventually arrive ? Also,
is there any control by Linux of the credits issued ?

I therefore patched the polling system to detect a stuck IRQ
after only 10 unserviced IRQs. The polling system then takes
over for 50 cycles (5 seconds), after which the IRQ is reenabled.
Ten unserviced IRQs may not seem like much, but usually there are no
unserviced IRQs at all, and I reenable the IRQ anyway after 5 seconds.
And the storms, if the IRQ is really stuck, are very small bursts of
10 IRQs, causing no significant overhead. The alternative is a system
unable to run Linux.

It is not very elegant, but it gets the job done and allows the kernel
to recover from a single upset. Also, interrupt storms lasting over 1
second are avoided.
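
The recovery scheme described above can be sketched as a small state
machine. This is a user-space model for illustration only, not the
actual change to spurious.c; the struct and function names are
hypothetical, and counting *consecutive* unhandled interrupts is an
assumption (the real patch may count differently).

```c
#include <assert.h>
#include <stdbool.h>

#define STUCK_THRESHOLD 10 /* unserviced IRQs before the line counts as stuck */
#define POLL_CYCLES     50 /* poll ticks before reenabling (50 x 100 ms = 5 s) */

struct irq_model {
	bool enabled;    /* false while the line is disabled and polled */
	int  unhandled;  /* consecutive IRQs that no handler claimed */
	int  polls_left; /* remaining poll ticks while disabled */
};

/* Called per hardware interrupt; 'handled' is true if a handler claimed it. */
static void model_note_interrupt(struct irq_model *m, bool handled)
{
	assert(m);
	if (!m->enabled)
		return;
	if (handled) {
		m->unhandled = 0;
		return;
	}
	if (++m->unhandled >= STUCK_THRESHOLD) {
		m->enabled = false;          /* "Disabling IRQ" */
		m->polls_left = POLL_CYCLES;
	}
}

/* Called on each poll timer tick; returns true when the IRQ is reenabled. */
static bool model_poll_tick(struct irq_model *m)
{
	assert(m);
	if (m->enabled)
		return false;
	/* a real implementation would run the handlers here ("Polling IRQ.") */
	if (--m->polls_left > 0)
		return false;
	m->enabled = true;                   /* "Reenabling IRQ." */
	m->unhandled = 0;
	return true;
}
```

With these constants a storm is cut off after at most 10 interrupts,
and the devices on the line are serviced by polling until the line is
tried again.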


Results so far:


dmesg log:

...
[ 25.605552] init: plymouth-upstart-bridge main process (508) killed by TERM signal
[ 26.641229] EXT4-fs (sda1): re-mounted. Opts: errors=remount-ro,commit=0
[ 1607.941232] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 1607.941244] Pid: 0, comm: swapper Not tainted 3.2.0-rc4 #5
[ 1607.941248] Call Trace:
[ 1607.941252] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 1607.941269] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 1607.941276] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 1607.941282] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 1607.941288] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 1607.941295] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 1607.941301] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 1607.941311] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 1607.941314] <EOI> [<ffffffff81423b10>] ? ladder_select_state+0x180/0x180
[ 1607.941325] [<ffffffff81422904>] ? cpuidle_idle_call+0xf4/0x230
[ 1607.941331] [<ffffffff814228c8>] ? cpuidle_idle_call+0xb8/0x230
[ 1607.941338] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 1607.941345] [<ffffffff814e2910>] rest_init+0x94/0xa4
[ 1607.941350] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 1607.941357] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 1607.941363] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 1607.941367] handlers:
[ 1607.941378] [<ffffffffa006f4f0>] e1000_intr
[ 1607.941382] Disabling IRQ 19
[ 1608.040189] Polling IRQ.
[ 1608.140227] Polling IRQ.
...
[ 1612.840243] Polling IRQ.
[ 1612.940039] Polling IRQ.
[ 1613.040185] Reenabling IRQ.
[ 1908.541558] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 1908.541570] Pid: 0, comm: swapper Not tainted 3.2.0-rc4 #5
[ 1908.541574] Call Trace:
[ 1908.541578] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 1908.541595] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 1908.541602] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 1908.541608] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 1908.541614] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 1908.541620] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 1908.541627] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 1908.541633] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 1908.541637] <EOI> [<ffffffff81423b34>] ? menu_reflect+0x24/0x50
[ 1908.541647] [<ffffffff810011da>] ? cpu_idle+0x8a/0x130
[ 1908.541652] [<ffffffff81001215>] ? cpu_idle+0xc5/0x130
[ 1908.541659] [<ffffffff814e2910>] rest_init+0x94/0xa4
[ 1908.541665] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 1908.541672] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 1908.541678] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 1908.541681] handlers:
[ 1908.541695] [<ffffffffa006f4f0>] e1000_intr
[ 1908.541699] Disabling IRQ 19
[ 1908.640189] Polling IRQ.
[ 1908.740186] Polling IRQ.
...
[ 1913.440205] Polling IRQ.
[ 1913.540205] Polling IRQ.
[ 1913.640088] Reenabling IRQ.
[ 2319.361659] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 2319.361671] Pid: 0, comm: swapper Not tainted 3.2.0-rc4 #5
[ 2319.361675] Call Trace:
[ 2319.361679] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 2319.361696] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 2319.361702] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 2319.361709] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 2319.361715] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 2319.361721] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 2319.361727] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 2319.361734] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 2319.361737] <EOI> [<ffffffff81009926>] ? native_sched_clock+0x26/0x70
[ 2319.361754] [<ffffffffa00cd0de>] ? acpi_idle_enter_simple+0xd0/0x102 [processor]
[ 2319.361762] [<ffffffffa00cd0ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[ 2319.361769] [<ffffffff814228c8>] cpuidle_idle_call+0xb8/0x230
[ 2319.361776] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 2319.361783] [<ffffffff814e2910>] rest_init+0x94/0xa4
[ 2319.361789] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 2319.361796] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 2319.361802] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 2319.361806] handlers:
[ 2319.361816] [<ffffffffa006f4f0>] e1000_intr
[ 2319.361820] Disabling IRQ 19
[ 2319.460205] Polling IRQ.
[ 2319.560207] Polling IRQ.
...
[ 2324.260030] Polling IRQ.
[ 2324.360118] Polling IRQ.
[ 2324.460064] Reenabling IRQ.
[ 2782.285470] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 2782.285482] Pid: 0, comm: swapper Not tainted 3.2.0-rc4 #5
[ 2782.285486] Call Trace:
[ 2782.285490] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 2782.285507] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 2782.285514] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 2782.285520] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 2782.285526] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 2782.285532] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 2782.285539] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 2782.285545] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 2782.285548] <EOI> [<ffffffff81009926>] ? native_sched_clock+0x26/0x70
[ 2782.285566] [<ffffffffa00cd0d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[ 2782.285574] [<ffffffffa00cd0ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[ 2782.285581] [<ffffffff814228c8>] cpuidle_idle_call+0xb8/0x230
[ 2782.285588] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 2782.285594] [<ffffffff814e2910>] rest_init+0x94/0xa4
[ 2782.285600] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 2782.285607] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 2782.285613] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 2782.285617] handlers:
[ 2782.285627] [<ffffffffa006f4f0>] e1000_intr
[ 2782.285631] Disabling IRQ 19
[ 2782.384226] Polling IRQ.
[ 2782.484041] Polling IRQ.
...
[ 2787.184224] Polling IRQ.
[ 2787.284223] Polling IRQ.
[ 2787.384222] Reenabling IRQ.
[ 3485.689347] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 3485.689360] Pid: 0, comm: swapper Not tainted 3.2.0-rc4 #5
[ 3485.689364] Call Trace:
[ 3485.689368] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 3485.689385] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 3485.689392] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 3485.689398] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 3485.689404] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 3485.689411] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 3485.689417] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 3485.689424] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 3485.689427] <EOI> [<ffffffff81009926>] ? native_sched_clock+0x26/0x70
[ 3485.689444] [<ffffffffa00cd0d3>] ? acpi_idle_enter_simple+0xc5/0x102 [processor]
[ 3485.689452] [<ffffffffa00cd0ce>] ? acpi_idle_enter_simple+0xc0/0x102 [processor]
[ 3485.689459] [<ffffffff814228c8>] cpuidle_idle_call+0xb8/0x230
[ 3485.689466] [<ffffffff81001215>] cpu_idle+0xc5/0x130
[ 3485.689472] [<ffffffff814e2910>] rest_init+0x94/0xa4
[ 3485.689478] [<ffffffff81aafba4>] start_kernel+0x3a7/0x3b4
[ 3485.689485] [<ffffffff81aaf322>] x86_64_start_reservations+0x132/0x136
[ 3485.689491] [<ffffffff81aaf416>] x86_64_start_kernel+0xf0/0xf7
[ 3485.689495] handlers:
[ 3485.689505] [<ffffffffa006f4f0>] e1000_intr
[ 3485.689509] Disabling IRQ 19
[ 3485.788062] Polling IRQ.
[ 3485.888240] Polling IRQ.
...
[ 3490.588069] Polling IRQ.
[ 3490.688209] Polling IRQ.
[ 3490.788079] Reenabling IRQ.
[ 3810.336883] irq 19: nobody cared (try booting with the "irqpoll" option)
[ 3810.336896] Pid: 1764, comm: sshd Not tainted 3.2.0-rc4 #5
[ 3810.336900] Call Trace:
[ 3810.336904] <IRQ> [<ffffffff810bbe9d>] __report_bad_irq+0x3d/0xe0
[ 3810.336921] [<ffffffff810bc147>] note_interrupt+0x157/0x200
[ 3810.336927] [<ffffffff810b9a54>] handle_irq_event_percpu+0xc4/0x290
[ 3810.336935] [<ffffffff810500e5>] ? __local_bh_enable+0x35/0x90
[ 3810.336941] [<ffffffff810b9c68>] handle_irq_event+0x48/0x70
[ 3810.336947] [<ffffffff810bcb1a>] handle_fasteoi_irq+0x5a/0xe0
[ 3810.336954] [<ffffffff81004012>] handle_irq+0x22/0x40
[ 3810.336960] [<ffffffff8150712a>] do_IRQ+0x5a/0xd0
[ 3810.336966] [<ffffffff814feceb>] common_interrupt+0x6b/0x6b
[ 3810.336969] <EOI> [<ffffffff810310a3>] ? __wake_up+0x53/0x70
[ 3810.336979] [<ffffffff81154780>] ? fget_light+0xa0/0x100
[ 3810.336985] [<ffffffff81166516>] do_select+0x336/0x6e0
[ 3810.336991] [<ffffffff81165ee0>] ? poll_freewait+0xe0/0xe0
[ 3810.336996] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337001] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337006] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337011] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337016] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337020] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337025] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337030] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337035] [<ffffffff81165fd0>] ? __pollwait+0xf0/0xf0
[ 3810.337040] [<ffffffff81166a91>] core_sys_select+0x1d1/0x320
[ 3810.337047] [<ffffffff8103ee11>] ? get_parent_ip+0x11/0x50
[ 3810.337053] [<ffffffff81501bfd>] ? sub_preempt_count+0x9d/0xd0
[ 3810.337060] [<ffffffff810728a1>] ? __srcu_read_unlock+0x41/0x70
[ 3810.337065] [<ffffffff81190f82>] ? fsnotify+0x1c2/0x2a0
[ 3810.337071] [<ffffffff81166c9b>] sys_select+0xbb/0x100
[ 3810.337078] [<ffffffff8115338a>] ? sys_write+0x4a/0x90
[ 3810.337083] [<ffffffff8150582b>] system_call_fastpath+0x16/0x1b
[ 3810.337087] handlers:
[ 3810.337102] [<ffffffffa006f4f0>] e1000_intr
[ 3810.337106] Disabling IRQ 19
[ 3810.436188] Polling IRQ.
[ 3810.536226] Polling IRQ.
...


clemens at ladisch

Dec 10, 2011, 9:58 AM

Post #20 of 40 (6557 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

Jeroen Van den Keybus wrote:
> [...]
> - CPU services the IRQ, and does at least one (slow) PCI read to have
> the device deassert its IRQ line. In practice, more PCI read/writes
> are needed, requiring the bridge to do some PCIe traffic generation.
> - Bridge sees the IRQ line transition and signals Deassert. This
> message has only a few usecs to arrive at the I/O-APIC.
> - _However_ the CPU has by and large already handled the IRQ and gets
> interrupted again before the Deassert ever gets out. The resulting PCI
> bus traffic further delays the Deassert message (due to e.g. PCIe
> transmit credit exhaustion).
>
> My idea is that if we would not immediately hammer the bridge with
> PCIe transactions, the Deassert message may eventually arrive ?

PCIe messages are somewhat ordered; posted memory writes are allowed,
but IIRC a read transaction serializes all previous and following
transactions, assuming that all involved devices work correctly.

> Also, is there any control by Linux of the credits issued ?

I don't think these can be controlled by software. The hardware is
supposed to get them correct.

> I therefore patched the polling system by detecting a stuck IRQ
> already after 10 unserviced IRQs. Then the polling system will take
> over for 50 cycles (5 seconds), after which the IRQ is reenabled.
>
> [ 1607.941232] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1613.040185] Reenabling IRQ.
> [ 1908.541558] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 1913.640088] Reenabling IRQ.
> [ 2319.361659] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2324.460064] Reenabling IRQ.
> [ 2782.285470] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 2787.384222] Reenabling IRQ.
> [ 3485.689347] irq 19: nobody cared (try booting with the "irqpoll" option)
> [ 3490.788079] Reenabling IRQ.
> [ 3810.336883] irq 19: nobody cared (try booting with the "irqpoll" option)

So the IRQ _does_ get unstuck eventually; I didn't expect that.

So either the ASM1083 delays its Deassert messages, or it is just way
too slow to react to changes in its PCI interrupt line inputs.

I'd guess that you can make the polling time shorter; a few milliseconds
should be enough.


Your patch might be useful to others afflicted with this chip. Could
you publish it?


Regards,
Clemens


jeroen.vandenkeybus at gmail

Dec 11, 2011, 7:28 AM

Post #21 of 40 (6644 views)
Re: Unhandled IRQs on AMD E-450 [In reply to]

> So the IRQ _does_ get unstuck eventually; I didn't expect that.

It would nevertheless make sense that the designer of the I/O-APIC
would have implemented a reasonable timeout for INTx-Deassert
reception. Perhaps that's what we see.

> So either the ASM1083 delays its Deassert messages, or it is just way
> too slow to react to changes in its PCI interrupt line inputs.

I'm afraid the only sensible way to find this out would be to
somehow monitor the PCIe link traffic into the FCH from this ASM1083.
Maybe someone from AMD knows whether this can be done? Let's not forget
that the board seems to run fine under Windows 7, so maybe Linux
simply doesn't do a special trick with the bridge or the chipset
that Windows does. So, without further evidence, I would not (yet)
blame the bridge.

> I'd guess that you can make the polling time shorter; a few milliseconds
> should be enough.

I have tested the patch for a while now. I did decrease the polling
interval to 10 ms (100 Hz), and the IRQ is now reenabled after 1
second (100 cycles). It works well enough that the computer
actually becomes usable. Under heavy use, the patch kicks in up to 10
times a minute. Otherwise it is only required a few times per hour. I
also turn off polling entirely when it is no longer needed.

Specifically for the Asus E45M1-M PRO I would recommend:

1. The IRQ bug manifests itself when using any device behind the
ASM1083 bridge. That includes the 2 PCI slots on the motherboard, as
well as the Firewire interface. Avoid their use. Preferably use the
PCIe x1 slot.

2. An important problem is that, when IRQ 16..19 goes down, an
integrated device, which otherwise works flawlessly, goes along with
it. This includes the SATA, USB and both audio (HDMI / Analog)
subsystems. If possible, enable the use of MSI for these devices.
Clemens's patch for AHCI MSI is a real help here.

3. Step 1 above will practically eliminate the occurrence of the IRQ
bug. If the PCI bus really is needed, the patch below must be used
(with the kernel irqpoll command line option turned on, of course).
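To see which lines fall under point 2, one can scan /proc/interrupts
for legacy (non-MSI) lines that carry more than one handler. The
following is only a sketch: struct irq_line, parse_irq_line() and
is_shared_legacy_line() are made-up names, and the parser assumes the
line layout quoted earlier in this thread (number, per-CPU counts, chip
name, comma-separated handlers). Other kernels may print the chip
column differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type for this sketch, not a kernel structure. */
struct irq_line {
    int irq;        /* IRQ number at the start of the line */
    int handlers;   /* number of comma-separated handler names */
    bool is_msi;    /* true if the chip column mentions PCI-MSI */
};

/*
 * Parse one line in the /proc/interrupts layout quoted in this thread,
 * e.g. "16: 333 771 IO-APIC-fasteoi firewire_ohci, hda_intel".
 * Returns 0 on success, -1 for the CPU header and for named rows
 * such as "NMI:" that do not start with an IRQ number.
 */
static int parse_irq_line(const char *line, struct irq_line *out)
{
    char *end;
    long irq;
    const char *p;

    assert(line != NULL && out != NULL);

    while (isspace((unsigned char)*line))
        line++;

    irq = strtol(line, &end, 10);
    if (end == line || *end != ':')
        return -1;

    out->irq = (int)irq;
    out->is_msi = strstr(end, "PCI-MSI") != NULL;

    /* Handler names are separated by ", ", so handlers = commas + 1. */
    out->handlers = 1;
    for (p = end; *p != '\0'; p++)
        if (*p == ',')
            out->handlers++;

    return 0;
}

/* Several handlers on one non-MSI line: all of them stall together
 * if the kernel disables that line. */
static bool is_shared_legacy_line(const struct irq_line *l)
{
    return !l->is_msi && l->handlers > 1;
}
```

Feeding each line of /proc/interrupts (read with fgets) through
parse_irq_line() and printing those for which is_shared_legacy_line()
is true would list IRQ 16 and 19 from the first post, but not the MSI
lines 40 to 42.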

> Your patch might be useful to others afflicted with this chip.  Could
> you publish it?

No problem, but I've never done this before. Is the diff output
below OK? Could someone with the relevant expertise also have a look at
its thread-safety?


J.


(Begin of patch for kernel/irq/spurious.c)

21c21
< #define POLL_SPURIOUS_IRQ_INTERVAL (HZ/10)
---
> #define POLL_SPURIOUS_IRQ_INTERVAL (HZ/100)
144c144
< 	int i;
---
> 	int i, poll_again;
149a150
> 	poll_again = 0; /* Will stay false as long as no polling candidate is found */
151c152
< 		unsigned int state;
---
> 		unsigned int state, irq;
161,164c162,182
<
< 		local_irq_disable();
< 		try_one_irq(i, desc, true);
< 		local_irq_enable();
---
>
> 		/* We end up here with a disabled spurious interrupt.
> 		   desc->irqs_unhandled now tracks the number of times
> 		   the interrupt has been polled */
>
> 		irq = desc->irq_data.irq;
> 		if (desc->irqs_unhandled < 100) { /* 1 second delay with poll frequency 100 Hz */
> 			if (desc->irqs_unhandled == 0)
> 				printk("Polling IRQ %d\n", irq);
> 			local_irq_disable();
> 			try_one_irq(i, desc, true);
> 			local_irq_enable();
> 			desc->irqs_unhandled++;
> 			poll_again = 1;
> 		} else {
> 			printk("Reenabling IRQ %d\n", irq);
> 			irq_enable(desc); /* Reenable the interrupt line */
> 			desc->depth--;
> 			desc->istate &= (~IRQS_SPURIOUS_DISABLED);
> 			desc->irqs_unhandled = 0;
> 		}
165a184,186
> 	if (poll_again)
> 		mod_timer(&poll_spurious_irq_timer,
> 			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
168,169d188
< 	mod_timer(&poll_spurious_irq_timer,
< 		  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
180c199
< * If 99,900 of the previous 100,000 interrupts have not been handled
---
> * If 9 of the previous 10 interrupts have not been handled
184c203,211
< * (The other 100-of-100,000 interrupts may have been a correctly
---
> * Although this may cause early deactivation of a sporadically
> * malfunctioning IRQ line, the poll system will:
> * a) Poll it for 100 cycles at a 100 Hz rate
> * b) Reenable it afterwards
> *
> * In worst case, with current settings, this will cause short bursts
> * of 10 interrupts every second.
> *
> * (The other single interrupt may have been a correctly
305c332
< 	if (likely(desc->irq_count < 100000))
---
> 	if (likely(desc->irq_count < 10))
309c336
< 	if (unlikely(desc->irqs_unhandled > 99900)) {
---
> 	if (unlikely(desc->irqs_unhandled >= 9)) {
313c340
< 		__report_bad_irq(irq, desc, action_ret);
---
> 		/* __report_bad_irq(irq, desc, action_ret); */
317c344
< 		printk(KERN_EMERG "Disabling IRQ #%d\n", irq);
---
> 		printk(KERN_EMERG "Disabling IRQ %d\n", irq);

(End of patch)
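
The behaviour of the patch above can be modelled in userspace to check
the timing without rebuilding a kernel. This is a rough sketch, not
kernel code: struct irq_model and the model_* functions are made-up
names, the thresholds are copied from the diff, and the HZ/10 staleness
reset that note_interrupt() applies to irqs_unhandled is left out. The
model is single-threaded, so it says nothing about the locking question
raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Thresholds copied from the posted diff. */
#define DETECT_WINDOW     10   /* interrupts examined per decision */
#define DETECT_UNHANDLED   9   /* unhandled count that marks the line stuck */
#define POLL_CYCLES      100   /* timer ticks spent polling before reenabling */

/* Hypothetical stand-in for the few irq_desc fields the patch touches. */
struct irq_model {
    unsigned int irq_count;       /* interrupts seen in the current window */
    unsigned int irqs_unhandled;  /* unhandled in window; poll counter while disabled */
    bool spurious_disabled;       /* models IRQS_SPURIOUS_DISABLED */
};

/*
 * Models note_interrupt(): call once per delivered interrupt.
 * Returns true if this call disabled the line.
 */
static bool model_note_interrupt(struct irq_model *m, bool handled)
{
    bool stuck;

    assert(m != NULL);

    if (m->spurious_disabled)
        return false;   /* a disabled line delivers nothing */

    if (!handled)
        m->irqs_unhandled++;

    m->irq_count++;
    if (m->irq_count < DETECT_WINDOW)
        return false;

    /* End of window: decide, then start counting from zero again. */
    stuck = m->irqs_unhandled >= DETECT_UNHANDLED;
    m->irq_count = 0;
    m->irqs_unhandled = 0;
    if (stuck)
        m->spurious_disabled = true;

    return stuck;
}

/*
 * Models one run of poll_spurious_irqs() for a single line.
 * Returns true if the poll timer has to be armed again.
 */
static bool model_poll_tick(struct irq_model *m)
{
    assert(m != NULL);

    if (!m->spurious_disabled)
        return false;

    if (m->irqs_unhandled < POLL_CYCLES) {
        /* The kernel would call try_one_irq() here. */
        m->irqs_unhandled++;
        return true;
    }

    /* Polled long enough: hand the line back to normal delivery. */
    m->spurious_disabled = false;
    m->irqs_unhandled = 0;
    return false;
}
```

In this model a line with ten unhandled interrupts in a row is disabled
on the tenth, polled on the next 100 timer ticks and reenabled on the
101st, which at HZ/100 is roughly one second.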


andymatei at gmail

Apr 25, 2012, 1:35 AM

Post #22 of 40 (5878 views)
Permalink
Re: Unhandled IRQs on AMD E-450 [In reply to]

Hello all,

I am not as good with Linux as you guys are, but here goes. I bought the Asus E45M1-M PRO to use as my home storage server. I have 2 PCI Intel NICs that I wanted to use for iSCSI traffic to 2 ESXi servers, 4 Seagate disks at 2 TB each and one WD disk for the OS. I use Openfiler as the storage appliance. The problem appears 20-30 minutes after starting the PC: I get "Disabling IRQ 16" and "Disabling IRQ 19", the SATA disks drop from 130MB/s to 2MB/s, and the web interface stops working. After a reboot everything works, but it goes down again after 30 minutes. Is there a patch or something that resolves this issue yet?

Thanks,
Andrei


andymatei at gmail

Apr 25, 2012, 1:48 AM

Post #23 of 40 (5864 views)
Permalink
Re: Unhandled IRQs on AMD E-450 [In reply to]

Hmm... this means change the motherboard... great, a lot of money for a shitty Asus board.


clemens at ladisch

Apr 25, 2012, 1:48 AM

Post #24 of 40 (5888 views)
Permalink
Re: Unhandled IRQs on AMD E-450 [In reply to]

andymatei [at] gmail wrote:
> I have bought the Asus E45M1-M PRO to use in my home storage. [...]
> I get "Disabling IRQ 16" and "Disabling IRQ 19". [...]
> Is there a patch or something that resolves this issue yet?

As far as I remember, this is a hardware bug in the ASMedia PCIe/PCI
bridge. This can neither easily nor completely be worked around in
software.


Regards,
Clemens


borislav.petkov at amd

Apr 27, 2012, 1:22 AM

Post #25 of 40 (5859 views)
Permalink
Re: Unhandled IRQs on AMD E-450 [In reply to]

On Wed, Apr 25, 2012 at 10:48:32AM +0200, Clemens Ladisch wrote:
> andymatei [at] gmail wrote:
> > I have bought the Asus E45M1-M PRO to use in my home storage. [...]
> > I get "Disabling IRQ 16" and "Disabling IRQ 19". [...]
> > Is there a patch or something that resolves this issue yet?
>
> As far as I remember, this is a hardware bug in the ASMedia PCIe/PCI
> bridge. This can neither easily nor completely be worked around in
> software.

So, I got notified of this discussion: https://lkml.org/lkml/2012/1/30/216

So why aren't you guys producing a proper patch for people to test?
Simply get every one of the bug reporters to give it a try and if all is
well, Linus said he'll take it.

Clemens, Jeroen?

--
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
GM: Alberto Bozzo
Reg: Dornach, Landkreis Muenchen
HRB Nr. 43632 WEEE Registernr: 129 19551

