
jeroen.vandenkeybus at gmail
Dec 11, 2011, 7:28 AM
> So the IRQ _does_ get unstuck eventually; I didn't expect that.

It would nevertheless make sense that the designer of the I/O-APIC would have implemented a reasonable timeout for INTx-Deassert reception. Perhaps that's what we see.

> So either the ASM1083 delays its Deassert messages, or it is just way
> too slow to react to changes in its PCI interrupt line inputs.

I'm afraid the only sensible way to find this out would be to somehow monitor the PCIe link traffic from this ASM1083 into the FCH. Maybe someone from AMD knows if this can be done?

Let's not forget that the board seems to run fine under Windows 7, and maybe Linux simply doesn't do a special trick with the bridge or the chipset that Windows does. So, without further evidence, I would not (yet) blame the bridge.

> I'd guess that you can make the polling time shorter; a few milliseconds
> should be enough.

I have tested the patch for a while now. I did decrease the polling interval to 10 ms (100 Hz), and the IRQ is now re-enabled after 1 second (100 polling cycles). It works well enough that the computer actually becomes useful. Under heavy use, the patch kicks in up to 10 times a minute. Otherwise it is only needed a few times per hour. I also turn off polling entirely when it is no longer needed.

Specifically for the Asus E45M1-M PRO I would recommend:

1. The IRQ bug manifests itself when using any device behind the ASM1083 bridge. That includes the 2 PCI slots on the motherboard, as well as the Firewire interface. Avoid their use. Preferably use the PCIe x1 slot.

2. An important problem is that, when one of IRQs 16..19 goes down, any integrated device sharing that line, which otherwise works flawlessly, goes down along with it. This includes the SATA, USB and both audio (HDMI / analog) subsystems. If possible, enable the use of MSI for these devices. Clemens's patch for AHCI MSI is a real help here.

3. Step 1 above will practically eliminate the occurrence of the IRQ bug.
If the PCI bus really is needed, the patch below must be used (with the kernel irqpoll command-line option turned on, of course).

> Your patch might be useful to others afflicted with this chip. Could
> you publish it?

No problem, but I've never done this before. Is the result of diff below OK? Could someone with the relevant expertise also have a look at the thread safety?

J.

(Begin of patch for kernel/irq/spurious.c)
21c21
< #define POLL_SPURIOUS_IRQ_INTERVAL (HZ/10)
---
> #define POLL_SPURIOUS_IRQ_INTERVAL (HZ/100)
144c144
< 	int i;
---
> 	int i, poll_again;
149a150
> 	poll_again = 0; /* Will stay false as long as no polling candidate is found */
151c152
< 		unsigned int state;
---
> 		unsigned int state, irq;
161,164c162,182
< 
< 		local_irq_disable();
< 		try_one_irq(i, desc, true);
< 		local_irq_enable();
---
> 
> 		/* We end up here with a disabled spurious interrupt.
> 		   desc->irqs_unhandled now tracks the number of times
> 		   the interrupt has been polled */
> 
> 		irq = desc->irq_data.irq;
> 		if (desc->irqs_unhandled < 100) { /* 1 second delay with poll frequency 100 Hz */
> 			if (desc->irqs_unhandled == 0)
> 				printk("Polling IRQ %d\n", irq);
> 			local_irq_disable();
> 			try_one_irq(i, desc, true);
> 			local_irq_enable();
> 			desc->irqs_unhandled++;
> 			poll_again = 1;
> 		} else {
> 			printk("Reenabling IRQ %d\n", irq);
> 			irq_enable(desc); /* Reenable the interrupt line */
> 			desc->depth--;
> 			desc->istate &= (~IRQS_SPURIOUS_DISABLED);
> 			desc->irqs_unhandled = 0;
> 		}
165a184,186
> 	if (poll_again)
> 		mod_timer(&poll_spurious_irq_timer,
> 			  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
168,169d188
< 	mod_timer(&poll_spurious_irq_timer,
< 		  jiffies + POLL_SPURIOUS_IRQ_INTERVAL);
180c199
<  * If 99,900 of the previous 100,000 interrupts have not been handled
---
>  * If 9 of the previous 10 interrupts have not been handled
184c203,211
<  * (The other 100-of-100,000 interrupts may have been a correctly
---
>  * Although this may cause early deactivation of a sporadically
>  * malfunctioning IRQ line, the poll system will:
>  * a) Poll it for 100 cycles at a 100 Hz rate
>  * b) Reenable it afterwards
>  *
>  * In worst case, with current settings, this will cause short bursts
>  * of 10 interrupts every second.
>  *
>  * (The other single interrupt may have been a correctly
305c332
< 	if (likely(desc->irq_count < 100000))
---
> 	if (likely(desc->irq_count < 10))
309c336
< 	if (unlikely(desc->irqs_unhandled > 99900)) {
---
> 	if (unlikely(desc->irqs_unhandled >= 9)) {
313c340
< 		__report_bad_irq(irq, desc, action_ret);
---
> 		/* __report_bad_irq(irq, desc, action_ret); */
317c344
< 		printk(KERN_EMERG "Disabling IRQ #%d\n", irq);
---
> 		printk(KERN_EMERG "Disabling IRQ %d\n", irq);
(End of patch)
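
For anyone who wants to follow the logic of the patch without building a kernel, here is a rough user-space sketch of the two counters it relies on. It is only a model under simplifying assumptions, not kernel code: every name in it (`struct sim_irq`, `sim_note_interrupt`, `sim_poll_tick`, `sim_timer_runs_until_reenabled`, the `SIM_*` constants) is made up for illustration, there is no locking, and the `last_unhandled` time check of the real note_interrupt() is left out. It only shows the "9 of 10 unhandled disables the line" rule and the "poll 100 times, then re-enable" rule.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical constants that mirror the numbers used in the patch. */
enum {
    SIM_WINDOW          = 10,  /* interrupts per evaluation window           */
    SIM_UNHANDLED_LIMIT = 9,   /* unhandled count that disables the line     */
    SIM_POLL_CYCLES     = 100  /* timer runs that poll before re-enabling    */
};

/* Simplified stand-in for the few irq_desc fields the patch touches. */
struct sim_irq {
    int  irq;
    bool disabled;        /* models IRQS_SPURIOUS_DISABLED                   */
    int  irq_count;       /* interrupts seen in the current window           */
    int  irqs_unhandled;  /* enabled: unhandled count; disabled: poll count  */
};

/* Models the window check: returns true if this interrupt disabled the line. */
bool sim_note_interrupt(struct sim_irq *d, bool handled)
{
    assert(d != NULL);
    if (d->disabled)
        return false;
    if (!handled)
        d->irqs_unhandled++;
    if (++d->irq_count < SIM_WINDOW)
        return false;

    /* Window complete: decide, then reset both counters. */
    bool disable = d->irqs_unhandled >= SIM_UNHANDLED_LIMIT;
    d->irq_count = 0;
    d->irqs_unhandled = 0;   /* reused as the poll counter once disabled */
    if (disable)
        d->disabled = true;
    return disable;
}

/* Models one run of the poll timer: returns true if it should be re-armed. */
bool sim_poll_tick(struct sim_irq *d)
{
    assert(d != NULL);
    if (!d->disabled)
        return false;
    if (d->irqs_unhandled < SIM_POLL_CYCLES) {
        /* The real code would call the interrupt handlers here. */
        d->irqs_unhandled++;
        return true;
    }
    /* Polled long enough: give the interrupt line another chance. */
    d->disabled = false;
    d->irqs_unhandled = 0;
    return false;
}

/* Counts timer runs until the timer is no longer re-armed. */
int sim_timer_runs_until_reenabled(struct sim_irq *d)
{
    int runs = 0;
    bool rearm;
    do {
        rearm = sim_poll_tick(d);
        runs++;
    } while (rearm);
    return runs;
}
```

In this model, ten unhandled interrupts in a row disable the line, and the timer then runs 101 times (100 polls plus the run that re-enables). At one run per 10 ms that is roughly the 1 second delay described above.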