Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[tip:x86/apic] x86: Use EOI register in io-apic on intel platforms

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


suresh.b.siddha at intel

Nov 2, 2009, 8:17 AM

Post #1 of 10 (592 views)
Permalink
[tip:x86/apic] x86: Use EOI register in io-apic on intel platforms

Commit-ID: b3ec0a37a7907813bb4fb85a2d94102c152470b7
Gitweb: http://git.kernel.org/tip/b3ec0a37a7907813bb4fb85a2d94102c152470b7
Author: Suresh Siddha <suresh.b.siddha [at] intel>
AuthorDate: Mon, 26 Oct 2009 14:24:35 -0800
Committer: Ingo Molnar <mingo [at] elte>
CommitDate: Mon, 2 Nov 2009 15:56:36 +0100

x86: Use EOI register in io-apic on intel platforms

IO-APIC's in intel chipsets support EOI register starting from
IO-APIC version 2. Use that when ever we need to clear the
IO-APIC RTE's RemoteIRR bit explicitly.

Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
Acked-by: Gary Hade <garyhade [at] us>
Cc: Eric W. Biederman <ebiederm [at] xmission>
LKML-Reference: <20091026230001.947855317 [at] sbs-t61>
[ Marked use_eio_reg as __read_mostly, fixed small details ]
Signed-off-by: Ingo Molnar <mingo [at] elte>
---
arch/x86/kernel/apic/io_apic.c | 81 ++++++++++++++++++++++++++-------------
1 files changed, 54 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 4e886ef..31e9db3 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2492,6 +2492,51 @@ static void ack_apic_edge(unsigned int irq)

atomic_t irq_mis_count;

+static int use_eoi_reg __read_mostly;
+
+static void __eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg)
+{
+ struct irq_pin_list *entry;
+
+ for_each_irq_pin(entry, cfg->irq_2_pin) {
+ if (irq_remapped(irq))
+ io_apic_eoi(entry->apic, entry->pin);
+ else
+ io_apic_eoi(entry->apic, cfg->vector);
+ }
+}
+
+static void eoi_ioapic_irq(struct irq_desc *desc)
+{
+ struct irq_cfg *cfg;
+ unsigned long flags;
+ unsigned int irq;
+
+ irq = desc->irq;
+ cfg = desc->chip_data;
+
+ spin_lock_irqsave(&ioapic_lock, flags);
+ __eoi_ioapic_irq(irq, cfg);
+ spin_unlock_irqrestore(&ioapic_lock, flags);
+}
+
+static int ioapic_supports_eoi(void)
+{
+ struct pci_dev *root;
+
+ root = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+ if (root && root->vendor == PCI_VENDOR_ID_INTEL &&
+ mp_ioapics[0].apicver >= 0x2) {
+ use_eoi_reg = 1;
+ printk(KERN_INFO "IO-APIC supports EOI register\n");
+ } else
+ printk(KERN_INFO "IO-APIC doesn't support EOI\n");
+
+ return 0;
+}
+
+fs_initcall(ioapic_supports_eoi);
+
static void ack_apic_level(unsigned int irq)
{
struct irq_desc *desc = irq_to_desc(irq);
@@ -2575,37 +2620,19 @@ static void ack_apic_level(unsigned int irq)
/* Tail end of version 0x11 I/O APIC bug workaround */
if (!(v & (1 << (i & 0x1f)))) {
atomic_inc(&irq_mis_count);
- spin_lock(&ioapic_lock);
- __mask_and_edge_IO_APIC_irq(cfg);
- __unmask_and_level_IO_APIC_irq(cfg);
- spin_unlock(&ioapic_lock);
+
+ if (use_eoi_reg)
+ eoi_ioapic_irq(desc);
+ else {
+ spin_lock(&ioapic_lock);
+ __mask_and_edge_IO_APIC_irq(cfg);
+ __unmask_and_level_IO_APIC_irq(cfg);
+ spin_unlock(&ioapic_lock);
+ }
}
}

#ifdef CONFIG_INTR_REMAP
-static void __eoi_ioapic_irq(unsigned int irq, struct irq_cfg *cfg)
-{
- struct irq_pin_list *entry;
-
- for_each_irq_pin(entry, cfg->irq_2_pin)
- io_apic_eoi(entry->apic, entry->pin);
-}
-
-static void
-eoi_ioapic_irq(struct irq_desc *desc)
-{
- struct irq_cfg *cfg;
- unsigned long flags;
- unsigned int irq;
-
- irq = desc->irq;
- cfg = desc->chip_data;
-
- spin_lock_irqsave(&ioapic_lock, flags);
- __eoi_ioapic_irq(irq, cfg);
- spin_unlock_irqrestore(&ioapic_lock, flags);
-}
-
static void ir_ack_apic_edge(unsigned int irq)
{
ack_APIC_irq();
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


macro at linux-mips

Nov 3, 2009, 4:53 PM

Post #2 of 10 (561 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Mon, 2 Nov 2009, tip-bot for Suresh Siddha wrote:

> Commit-ID: b3ec0a37a7907813bb4fb85a2d94102c152470b7
> Gitweb: http://git.kernel.org/tip/b3ec0a37a7907813bb4fb85a2d94102c152470b7
> Author: Suresh Siddha <suresh.b.siddha [at] intel>
> AuthorDate: Mon, 26 Oct 2009 14:24:35 -0800
> Committer: Ingo Molnar <mingo [at] elte>
> CommitDate: Mon, 2 Nov 2009 15:56:36 +0100
>
> x86: Use EOI register in io-apic on intel platforms
>
> IO-APIC's in intel chipsets support EOI register starting from
> IO-APIC version 2. Use that when ever we need to clear the
> IO-APIC RTE's RemoteIRR bit explicitly.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
> Acked-by: Gary Hade <garyhade [at] us>
> Cc: Eric W. Biederman <ebiederm [at] xmission>
> LKML-Reference: <20091026230001.947855317 [at] sbs-t61>
> [ Marked use_eio_reg as __read_mostly, fixed small details ]
> Signed-off-by: Ingo Molnar <mingo [at] elte>
> ---
[...]
> +static int ioapic_supports_eoi(void)
> +{
> + struct pci_dev *root;
> +
> + root = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
> + if (root && root->vendor == PCI_VENDOR_ID_INTEL &&
> + mp_ioapics[0].apicver >= 0x2) {
> + use_eoi_reg = 1;
> + printk(KERN_INFO "IO-APIC supports EOI register\n");
> + } else
> + printk(KERN_INFO "IO-APIC doesn't support EOI\n");
> +
> + return 0;
> +}

This is wrong -- the 82093AA I/O APIC has its version set to 0x11 and it
does not support the EOI register. Similarly I/O APICs integrated into
the 82379AB south bridge and the 82374EB/SB EISA component.

Overall values below 0x10 are reserved for the 82489DX -- are you sure
you didn't mean 0x12 or 0x20?

Maciej
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Nov 3, 2009, 6:24 PM

Post #3 of 10 (563 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Tue, 2009-11-03 at 16:53 -0800, Maciej W. Rozycki wrote:
> On Mon, 2 Nov 2009, tip-bot for Suresh Siddha wrote:
>
> > Commit-ID: b3ec0a37a7907813bb4fb85a2d94102c152470b7
> > Gitweb: http://git.kernel.org/tip/b3ec0a37a7907813bb4fb85a2d94102c152470b7
> > Author: Suresh Siddha <suresh.b.siddha [at] intel>
> > AuthorDate: Mon, 26 Oct 2009 14:24:35 -0800
> > Committer: Ingo Molnar <mingo [at] elte>
> > CommitDate: Mon, 2 Nov 2009 15:56:36 +0100
> >
> > x86: Use EOI register in io-apic on intel platforms
> >
> > IO-APIC's in intel chipsets support EOI register starting from
> > IO-APIC version 2. Use that when ever we need to clear the
> > IO-APIC RTE's RemoteIRR bit explicitly.
> >
> > Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
> > Acked-by: Gary Hade <garyhade [at] us>
> > Cc: Eric W. Biederman <ebiederm [at] xmission>
> > LKML-Reference: <20091026230001.947855317 [at] sbs-t61>
> > [ Marked use_eio_reg as __read_mostly, fixed small details ]
> > Signed-off-by: Ingo Molnar <mingo [at] elte>
> > ---
> [...]
> > +static int ioapic_supports_eoi(void)
> > +{
> > + struct pci_dev *root;
> > +
> > + root = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
> > + if (root && root->vendor == PCI_VENDOR_ID_INTEL &&
> > + mp_ioapics[0].apicver >= 0x2) {
> > + use_eoi_reg = 1;
> > + printk(KERN_INFO "IO-APIC supports EOI register\n");
> > + } else
> > + printk(KERN_INFO "IO-APIC doesn't support EOI\n");
> > +
> > + return 0;
> > +}
>
> This is wrong -- the 82093AA I/O APIC has its version set to 0x11 and it
> does not support the EOI register. Similarly I/O APICs integrated into
> the 82379AB south bridge and the 82374EB/SB EISA component.

Maciej, There might be some confusion (mostly on my side). When I looked
at ICH2 spec http://www.intel.com/assets/pdf/datasheet/290687.pdf it
says it has an EOI register and it is version 2.

All I heard internally was we have it from version 2 and it works (our
experiments on ICH4 etc worked).

But I do agree that I overlooked the version 11h of 82093AA (which is
older than ICH2).

> Overall values below 0x10 are reserved for the 82489DX

This is for local apic right?

> -- are you sure
> you didn't mean 0x12 or 0x20?

Let me check tomorrow and see where the confusion is.

thanks for looking at this.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Nov 4, 2009, 3:04 PM

Post #4 of 10 (557 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Tue, 2009-11-03 at 18:24 -0800, Suresh Siddha wrote:
> Maciej, There might be some confusion (mostly on my side). When I looked
> at ICH2 spec http://www.intel.com/assets/pdf/datasheet/290687.pdf it
> says it has an EOI register and it is version 2.

Ok, Issue is in the specs documentation. ICH2 to ICH5 seems to document
that they use version 0x2 ioapic's, while infact they use 0x20 version
and has a working EOI register. So we should use 0x20 as the version
check and not 0x2.

Maciej, can I have your ack for the appended patch?
---

From: Suresh Siddha <suresh.b.siddha [at] intel>
Subject: x86, ioapic: fix io-apic version check for the EOI reg presence

Maciej W. Rozycki reported:
> 82093AA I/O APIC has its version set to 0x11 and it
> does not support the EOI register. Similarly I/O APICs integrated into
> the 82379AB south bridge and the 82374EB/SB EISA component.

IO-APIC versions below 0x20 don't support EOI register.

Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
version as 0x2. This is an error with documentation and these ICH chips
use io-apic's of version 0x20 and indeed has a working EOI register
for the io-apic.

Fix the ioapic_supports_eoi() to check for version 0x20 and beyond.

Reported-by: Maciej W. Rozycki <macro [at] linux-mips>
Acked-by: Rajesh Sankaran <rajesh.sankaran [at] intel>
Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
---

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 47d95c3..68510d4 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2520,13 +2520,29 @@ static void eoi_ioapic_irq(struct irq_desc *desc)
spin_unlock_irqrestore(&ioapic_lock, flags);
}

+/*
+ * IO-APIC versions below 0x20 don't support EOI register.
+ * For the record, here is the information about various versions:
+ * 0Xh 82489DX
+ * 1Xh I/OAPIC or I/O(x)APIC which are not PCI 2.2 Compliant
+ * 2Xh I/O(x)APIC which is PCI 2.2 Compliant
+ * 30h-FFh Reserved
+ *
+ * Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
+ * version as 0x2. This is an error with documentation and these ICH chips
+ * use io-apic's of version 0x20.
+ */
static int ioapic_supports_eoi(void)
{
struct pci_dev *root;

root = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
+ /*
+ * Perhaps we can do this for all vendors. But for now,
+ * no one else seems to have version >= 0x20 ??
+ */
if (root && root->vendor == PCI_VENDOR_ID_INTEL &&
- mp_ioapics[0].apicver >= 0x2) {
+ mp_ioapics[0].apicver >= 0x20) {
use_eoi_reg = 1;
printk(KERN_INFO "IO-APIC supports EOI register\n");
} else


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


macro at linux-mips

Nov 5, 2009, 6:46 AM

Post #5 of 10 (549 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Wed, 4 Nov 2009, Suresh Siddha wrote:

> > Maciej, There might be some confusion (mostly on my side). When I looked
> > at ICH2 spec http://www.intel.com/assets/pdf/datasheet/290687.pdf it
> > says it has an EOI register and it is version 2.

For the record: the original ICH/ICH0 (82801AA/AB) seems to have the
version set to 0x11 as well. Now its datasheet mentions the EOI register,
but even if it is not a mistake, its usefulness is rather limited as the
APIC does not support FSB delivery.

Also for the record: the 82489DX is a combined local and I/O APIC chip --
quite reasonable given it was intended for 486-class processors which
needed both units and the cost of manufacturing a separate chip for each
unit plus the extra PCB space needed in that case certainly outweighed the
loss from having a part of silicon unused on some chips.

> Ok, Issue is in the specs documentation. ICH2 to ICH5 seems to document
> that they use version 0x2 ioapic's, while infact they use 0x20 version
> and has a working EOI register. So we should use 0x20 as the version
> check and not 0x2.

I've checked some docs too and you may want to check whether to qualify
the use of the EOI register further with the DT (Delivery Type) bit in the
Boot Configuration register. It affects ICH2 at least (but some I/O APICs
do this differently; e.g. the PID or 82094AA changes its version between
0x13 and 0x21 depending on whether FSB delivery is used or not).
Otherwise you may have interrupts delivered twice (though it may not be a
big problem after all).

Note that we have a piece of code to report the existence of the DT bit
already, so we've got the knowledge about APIC versions already. ;) I
guess it should be modified a bit though as reportedly the PID does not
have the Boot Configuration register, so the version checked there should
be 0x20 exactly, rather than greater or equal.

Also you may want to see whether the complication in ack_apic_level()
that is meant to deal with an APIC erratum really matters for FSB delivery
-- I guess not, because if you explicitly ACK an interrupt in the I/O
unit, then even if it was incorrectly recorded as edge-triggered in the
local unit, the IRR bit will be correctly reset and the next message
delivered properly. Given you introduce a conditional statement anyway,
you can place more code within it and there will be no performance hit for
the other path and certainly a gain for this one.

Additionally it seems to me that code to migrate an IRQ is placed
incorrectly -- the erratum workaround should be placed above it as it
complements ack_APIC_irq() -- the IRR bit will not have been reset before
the workaround has been executed if the erratum was hit. Will you care
about this problem?

> Maciej, can I have your ack for the appended patch?

Certainly, it looks good to me.

Acked-by: Maciej W. Rozycki <macro [at] linux-mips>

> ---
>
> From: Suresh Siddha <suresh.b.siddha [at] intel>
> Subject: x86, ioapic: fix io-apic version check for the EOI reg presence
>
> Maciej W. Rozycki reported:
> > 82093AA I/O APIC has its version set to 0x11 and it
> > does not support the EOI register. Similarly I/O APICs integrated into
> > the 82379AB south bridge and the 82374EB/SB EISA component.
>
> IO-APIC versions below 0x20 don't support EOI register.
>
> Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
> version as 0x2. This is an error with documentation and these ICH chips
> use io-apic's of version 0x20 and indeed has a working EOI register
> for the io-apic.
>
> Fix the ioapic_supports_eoi() to check for version 0x20 and beyond.
>
> Reported-by: Maciej W. Rozycki <macro [at] linux-mips>
> Acked-by: Rajesh Sankaran <rajesh.sankaran [at] intel>
> Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
> ---
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index 47d95c3..68510d4 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -2520,13 +2520,29 @@ static void eoi_ioapic_irq(struct irq_desc *desc)
> spin_unlock_irqrestore(&ioapic_lock, flags);
> }
>
> +/*
> + * IO-APIC versions below 0x20 don't support EOI register.
> + * For the record, here is the information about various versions:
> + * 0Xh 82489DX
> + * 1Xh I/OAPIC or I/O(x)APIC which are not PCI 2.2 Compliant
> + * 2Xh I/O(x)APIC which is PCI 2.2 Compliant
> + * 30h-FFh Reserved
> + *
> + * Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic
> + * version as 0x2. This is an error with documentation and these ICH chips
> + * use io-apic's of version 0x20.
> + */
> static int ioapic_supports_eoi(void)
> {
> struct pci_dev *root;
>
> root = pci_get_bus_and_slot(0, PCI_DEVFN(0, 0));
> + /*
> + * Perhaps we can do this for all vendors. But for now,
> + * no one else seems to have version >= 0x20 ??
> + */
> if (root && root->vendor == PCI_VENDOR_ID_INTEL &&
> - mp_ioapics[0].apicver >= 0x2) {
> + mp_ioapics[0].apicver >= 0x20) {
> use_eoi_reg = 1;
> printk(KERN_INFO "IO-APIC supports EOI register\n");
> } else
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Nov 5, 2009, 4:01 PM

Post #6 of 10 (544 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Thu, 2009-11-05 at 06:46 -0800, Maciej W. Rozycki wrote:
> For the record: the original ICH/ICH0 (82801AA/AB) seems to have the
> version set to 0x11 as well. Now its datasheet mentions the EOI register,
> but even if it is not a mistake,

hmm. what a mess.

I found a kernel log in an archive
(http://www.mail-archive.com/linux-usb-users [at] lists/msg13255.html) for ICH0 which shows that it uses an io-apic of version 0x20. So mostly the version in the datasheet is wrong here aswell :-(

And our IO/chipset architect (Rajesh) also confirmed that ICH0 indeed
has an EOI reg.

So, version 0x20 seems to be pretty safe to use EOI register. And
according to Gary Hade's experiments on IBM platforms having an io-apic
version 0x11, magic of mask+edge and unmask+level seems to clear
remoteIRR.

So the current logic looks safe to me.

> Also for the record: the 82489DX is a combined local and I/O APIC chip --

Thanks. This is what I learnt internally too yesterday and hence the
documentation update ;)

> I've checked some docs too and you may want to check whether to qualify
> the use of the EOI register further with the DT (Delivery Type) bit in the
> Boot Configuration register. It affects ICH2 at least

For relatively newer ICH's like ICH5, boot configuration register is
marked as a reserved register (perhaps with the serial bus going away,
so did this register but again with out the io-apic version change).

Anyways, our understanding is that EOI register in ICH2 should work
irrespective of DT bit in the boot config register.

> Also you may want to see whether the complication in ack_apic_level()
> that is meant to deal with an APIC erratum really matters for FSB delivery
> -- I guess not, because if you explicitly ACK an interrupt in the I/O
> unit, then even if it was incorrectly recorded as edge-triggered in the
> local unit, the IRR bit will be correctly reset and the next message
> delivered properly. Given you introduce a conditional statement anyway,
> you can place more code within it and there will be no performance hit for
> the other path and certainly a gain for this one.

I am not sure if I follow. With the recent changes (tip commit
5231a68614b94f60e8f6c56bc6e3d75955b9e75e), we use ipi on a new cpu to
handle a pending level-triggered interrupt on the cpu that is going
offline. It's just not only in the case of io-apic erratum for 0x11, we
see level triggered interrupt as edge interrupt at the cpu.


> Additionally it seems to me that code to migrate an IRQ is placed
> incorrectly -- the erratum workaround should be placed above it as it
> complements ack_APIC_irq() -- the IRR bit will not have been reset before
> the workaround has been executed if the erratum was hit. Will you care
> about this problem?

Yes. I see this issue and agree with your assesment. The result is that
we missed an irq migration attempt and delay it to the next arrival.

I will post a different fix and also update some of the code comments
around this to reflect new changes in the code.

> > Maciej, can I have your ack for the appended patch?
>
> Certainly, it looks good to me.
>
> Acked-by: Maciej W. Rozycki <macro [at] linux-mips>

Thanks. Ingo, Can you please queue this patch too? I am planning to do
couple of more cleanups on top of this. I will post them shortly.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


macro at linux-mips

Nov 5, 2009, 10:53 PM

Post #7 of 10 (542 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Thu, 5 Nov 2009, Suresh Siddha wrote:

> > I've checked some docs too and you may want to check whether to qualify
> > the use of the EOI register further with the DT (Delivery Type) bit in the
> > Boot Configuration register. It affects ICH2 at least
>
> For relatively newer ICH's like ICH5, boot configuration register is
> marked as a reserved register (perhaps with the serial bus going away,
> so did this register but again with out the io-apic version change).
>
> Anyways, our understanding is that EOI register in ICH2 should work
> irrespective of DT bit in the boot config register.

What I mean is if the serial delivery type is used, then an interrupt
will be acked twice -- once via an EOI message sent from the local APIC
over the serial bus and then again via the write to the EOI register.
There is a race condition here, so if the IRQ line is still/again
asserted, then most likely two consecutive IRQ messages will be sent by
the I/O APIC and they may be accepted by two different local units and
eventually delivered to the OS. Linux will cope, but still this is sloppy
programming, so it should be best avoided -- if possible that is.

> > Also you may want to see whether the complication in ack_apic_level()
> > that is meant to deal with an APIC erratum really matters for FSB delivery
> > -- I guess not, because if you explicitly ACK an interrupt in the I/O
> > unit, then even if it was incorrectly recorded as edge-triggered in the
> > local unit, the IRR bit will be correctly reset and the next message
> > delivered properly. Given you introduce a conditional statement anyway,
> > you can place more code within it and there will be no performance hit for
> > the other path and certainly a gain for this one.
>
> I am not sure if I follow. With the recent changes (tip commit
> 5231a68614b94f60e8f6c56bc6e3d75955b9e75e), we use ipi on a new cpu to
> handle a pending level-triggered interrupt on the cpu that is going
> offline. It's just not only in the case of io-apic erratum for 0x11, we
> see level triggered interrupt as edge interrupt at the cpu.

I don't get it, sorry -- an interrupt has its trigger mode implied by the
IRQ_LEVEL status flag being set or clear in the IRQ's descriptor. What's
set in the local APIC's TMR does not (or should not) matter IMO.

Has the trigger mode mismatch erratum been seen for FSB delivered
interrupts anyway? My proposed patch is below. Includes the code
rearrangement I proposed as well. Untested, not even built.

Maciej
---
Complete the EOI dance for the trigger mode mismatch APIC erratum before
proceeding to IRQ migration. Omit the erratum workaround for I/O APICs
using FSB delivery as they get their EOI message delivered by hand.

Signed-off-by: Maciej W. Rozycki <macro [at] linux-mips>
---
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 31e9db3..6e4edad 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2555,33 +2555,50 @@ static void ack_apic_level(unsigned int irq)
#endif

/*
- * It appears there is an erratum which affects at least version 0x11
- * of I/O APIC (that's the 82093AA and cores integrated into various
- * chipsets). Under certain conditions a level-triggered interrupt is
- * erroneously delivered as edge-triggered one but the respective IRR
- * bit gets set nevertheless. As a result the I/O unit expects an EOI
- * message but it will never arrive and further interrupts are blocked
- * from the source. The exact reason is so far unknown, but the
- * phenomenon was observed when two consecutive interrupt requests
- * from a given source get delivered to the same CPU and the source is
- * temporarily disabled in between.
- *
- * A workaround is to simulate an EOI message manually. We achieve it
- * by setting the trigger mode to edge and then to level when the edge
- * trigger mode gets detected in the TMR of a local APIC for a
- * level-triggered interrupt. We mask the source for the time of the
- * operation to prevent an edge-triggered interrupt escaping meanwhile.
- * The idea is from Manfred Spraul. --macro
+ * We must acknowledge the irq before we move it or the acknowledge
+ * will not propagate properly.
*/
- cfg = desc->chip_data;
- i = cfg->vector;
- v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));
+ if (use_eoi_reg) {
+ ack_APIC_irq();
+ eoi_ioapic_irq(desc);
+ } else {
+ /*
+ * It appears there is an erratum which affects at least
+ * version 0x11 of I/O APIC (that's the 82093AA and cores
+ * integrated into various chipsets). Under certain
+ * conditions a level-triggered interrupt is erroneously
+ * delivered as edge-triggered one but the respective IRR
+ * bit gets set nevertheless. As a result the I/O unit
+ * expects an EOI message but it will never arrive and
+ * further interrupts are blocked from the source. The
+ * exact reason is so far unknown, but the phenomenon was
+ * observed when two consecutive interrupt requests from
+ * a given source get delivered to the same CPU and the
+ * source is temporarily disabled in between.
+ *
+ * A workaround is to simulate an EOI message manually.
+ * We achieve it by setting the trigger mode to edge and
+ * then to level when the edge trigger mode gets detected
+ * in the TMR of a local APIC for a level-triggered
+ * interrupt. We mask the source for the time of the
+ * operation to prevent an edge-triggered interrupt
+ * escaping meanwhile.
+ *
+ * The idea is from Manfred Spraul. --macro
+ */
+ cfg = desc->chip_data;
+ i = cfg->vector;
+ v = apic_read(APIC_TMR + ((i & ~0x1f) >> 1));

- /*
- * We must acknowledge the irq before we move it or the acknowledge will
- * not propagate properly.
- */
- ack_APIC_irq();
+ ack_APIC_irq();
+
+ /* Tail end of version 0x11 I/O APIC bug workaround. */
+ atomic_inc(&irq_mis_count);
+ spin_lock(&ioapic_lock);
+ __mask_and_edge_IO_APIC_irq(cfg);
+ __unmask_and_level_IO_APIC_irq(cfg);
+ spin_unlock(&ioapic_lock);
+ }

/* Now we can move and renable the irq */
if (unlikely(do_unmask_irq)) {
@@ -2616,20 +2633,6 @@ static void ack_apic_level(unsigned int irq)
move_masked_irq(irq);
unmask_IO_APIC_irq_desc(desc);
}
-
- /* Tail end of version 0x11 I/O APIC bug workaround */
- if (!(v & (1 << (i & 0x1f)))) {
- atomic_inc(&irq_mis_count);
-
- if (use_eoi_reg)
- eoi_ioapic_irq(desc);
- else {
- spin_lock(&ioapic_lock);
- __mask_and_edge_IO_APIC_irq(cfg);
- __unmask_and_level_IO_APIC_irq(cfg);
- spin_unlock(&ioapic_lock);
- }
- }
}

#ifdef CONFIG_INTR_REMAP
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Nov 6, 2009, 11:27 PM

Post #8 of 10 (537 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Thu, 2009-11-05 at 22:53 -0800, Maciej W. Rozycki wrote:
> What I mean is if the serial delivery type is used, then an interrupt
> will be acked twice -- once via an EOI message sent from the local APIC
> over the serial bus and then again via the write to the EOI register.

Maciej, Case where we are doing an explicit EOI to the io-apic (using
EOI register) is when the level triggered interrupt gets registered at
the cpu as an edge interrupt (in the local apic's trigger mode
register).

It will arrive as an edge interrupt for two cases.
a) for corner conditions which hit the 82093AA (io-apic version 0x11)
erratum
b) with my recent patch in -tip, during a cpu offline, when we send an
ipi (IPI is always registered as an edge triggered at the cpu) to
service the interrupt at the new cpu destination, instead of servicing
at it's old destination cpu (as it has already disabled interrupts and
going down -- like not being on the cpu_online_map etc).

So we are not acking the io-apic twice in this case, as the eoi to the
local apic won't brodcast the eoi to the io-apic (because of the edge
mode in trigger mode register of the local apic).

> There is a race condition here, so if the IRQ line is still/again
> asserted, then most likely two consecutive IRQ messages will be sent by
> the I/O APIC and they may be accepted by two different local units and
> eventually delivered to the OS. Linux will cope, but still this is sloppy
> programming, so it should be best avoided -- if possible that is.

I hope the above clarified.

> >
> > I am not sure if I follow. With the recent changes (tip commit
> > 5231a68614b94f60e8f6c56bc6e3d75955b9e75e), we use ipi on a new cpu to
> > handle a pending level-triggered interrupt on the cpu that is going
> > offline. It's just not only in the case of io-apic erratum for 0x11, we
> > see level triggered interrupt as edge interrupt at the cpu.
>
> I don't get it, sorry -- an interrupt has its trigger mode implied by the
> IRQ_LEVEL status flag being set or clear in the IRQ's descriptor. What's
> set in the local APIC's TMR does not (or should not) matter IMO.

Same, did the above clarify?

>
> Has the trigger mode mismatch erratum been seen for FSB delivered
> interrupts anyway?

No.

> Complete the EOI dance for the trigger mode mismatch APIC erratum before
> proceeding to IRQ migration.

I am ok with what you are tying to fix, but not your patch. please see
below.

> + if (use_eoi_reg) {
> + ack_APIC_irq();
> + eoi_ioapic_irq(desc);

We shouldn't do this unconditionally. i.e., we should do this double ack
only when the level-triggered interrupt is seen as an edge triggered
interrupt at the cpu (specified by the trigger mode register in local
apic).

Otherwise as you mentioned above, we will see two EOI msg's at the
io-apic (one by the cpu's eoi broadcast and another by explicit eoi to
the io-apic).

Best way to fix the issue you noticed is by simply moving the tail end
of that erratum fix before we try to migrate the irq to a new place.

Do you agree?

---
From: "Maciej W. Rozycki" <macro [at] linux-mips>
Subject: x86, io-apic: move the effort of clearing remoteIRR explicitly
before migrating the irq

When the level-triggered interrupt is seen as an edge interrupt, we try
to clear the remoteIRR explicitly (using either an io-apic eoi register
when present or through the idea of changing trigger mode of the io-apic
RTE to edge and then back to level). But this explicit try also needs to
happen before we try to migrate the irq. Otherwise irq migration attempt
will fail anyhow, as it postpones the irq migration to a later attempt
when it sees the remoteIRR in the io-apic RTE still set.

Signed-off-by: "Maciej W. Rozycki" <macro [at] linux-mips>
Reviewed-by: Suresh Siddha <suresh.b.siddha [at] intel>
---
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 47d95c3..9a26ea1 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -2583,6 +2583,20 @@ static void ack_apic_level(unsigned int irq)
*/
ack_APIC_irq();

+ /* Tail end of version 0x11 I/O APIC bug workaround */
+ if (!(v & (1 << (i & 0x1f)))) {
+ atomic_inc(&irq_mis_count);
+
+ if (use_eoi_reg)
+ eoi_ioapic_irq(desc);
+ else {
+ spin_lock(&ioapic_lock);
+ __mask_and_edge_IO_APIC_irq(cfg);
+ __unmask_and_level_IO_APIC_irq(cfg);
+ spin_unlock(&ioapic_lock);
+ }
+ }
+
/* Now we can move and renable the irq */
if (unlikely(do_unmask_irq)) {
/* Only migrate the irq if the ack has been received.
@@ -2616,20 +2630,6 @@ static void ack_apic_level(unsigned int irq)
move_masked_irq(irq);
unmask_IO_APIC_irq_desc(desc);
}
-
- /* Tail end of version 0x11 I/O APIC bug workaround */
- if (!(v & (1 << (i & 0x1f)))) {
- atomic_inc(&irq_mis_count);
-
- if (use_eoi_reg)
- eoi_ioapic_irq(desc);
- else {
- spin_lock(&ioapic_lock);
- __mask_and_edge_IO_APIC_irq(cfg);
- __unmask_and_level_IO_APIC_irq(cfg);
- spin_unlock(&ioapic_lock);
- }
- }
}

#ifdef CONFIG_INTR_REMAP


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


macro at linux-mips

Nov 8, 2009, 11:06 AM

Post #9 of 10 (529 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

On Fri, 6 Nov 2009, Suresh Siddha wrote:

> > What I mean is if the serial delivery type is used, then an interrupt
> > will be acked twice -- once via an EOI message sent from the local APIC
> > over the serial bus and then again via the write to the EOI register.
>
> Maciej, Case where we are doing an explicit EOI to the io-apic (using
> EOI register) is when the level triggered interrupt gets registered at
> the cpu as an edge interrupt (in the local apic's trigger mode
> register).
>
> It will arrive as an edge interrupt for two cases.
> a) for corner conditions which hit the 82093AA (io-apic version 0x11)
> erratum
> b) with my recent patch in -tip, during a cpu offline, when we send an
> ipi (IPI is always registered as an edge triggered at the cpu) to
> service the interrupt at the new cpu destination, instead of servicing
> at it's old destination cpu (as it has already disabled interrupts and
> going down -- like not being on the cpu_online_map etc).
>
> So we are not acking the io-apic twice in this case, as the eoi to the
> local apic won't brodcast the eoi to the io-apic (because of the edge
> mode in trigger mode register of the local apic).

OK, I see what you mean, but that makes me wonder why you are going
through such contortions. In the case of a CPU going offline I would
expect it to be done more or less in such a way:

1. Write all-zeroes to its local APIC's LDR register and set its TPR to
0xef. This will take this APIC out from LoPri arbitration and thus
from accepting any I/O APIC interrupts. Fixed delivery mode IPIs will
still be accepted (if that's not needed then the TPR can be set to
0xff; any received IPIs will be lost).

2. Service any outstanding interrupts that have already been accepted by
the local APIC (you may have to poll on the local IRR register with
interrupts enabled for a short while).

3. Disable the local APIC via the SVR register, mask local interrupts in
the processor's EFLAGS register and start the offline procedure. This
is the point of no-return, further IPIs won't be accepted and the CPU
has to be put through an INIT-IPI+StartUp-IPI cycle to get in control
again.

If IPI reception was not needed through stage #2 above, then the local
APIC could have been disabled at #1 instead -- interrupts pending in
the local APIC as recorded in the IRR or marked as in-progress in the
ISR are guaranteed to be delivered to the CPU and EOIed (as
appropriate) normally even in the disabled state of the local APIC.

> Do you agree?

If the scenario I have outlined above cannot be made to work for some
reason, then please do me and the others a favour and since with this
change you are tying new functionality to code originally meant as a
workaround for an obscure erratum only, do write a proper explanation and
place it next to the original comment describing previous use of the code.
With your change as it is, it is all but obvious what this piece of code
is meant to do.

Your change is OK with me once accompanied with said comment, but please
investigate my scenario first -- your approach looks like a horrible hack
to me, sorry.

Maciej
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Dec 1, 2009, 4:56 PM

Post #10 of 10 (489 views)
Permalink
Re: [tip:x86/apic] x86: Use EOI register in io-apic on intel platforms [In reply to]

Maciej,

Sorry for delayed response. I was busy with other stuff and didn't get a
chance till now to get back on this. I just posted few patches which
came up as a result of our past discussions here in this thread.

Please see my responses inline for your earlier comments:

On Sun, 2009-11-08 at 11:06 -0800, Maciej W. Rozycki wrote:
> OK, I see what you mean, but that makes me wonder why you are going
> through such contortions. In the case of a CPU going offline I would
> expect it to be done more or less in such a way:
>
> 1. Write all-zeroes to its local APIC's LDR register and set its TPR to
> 0xef. This will take this APIC out from LoPri arbitration and thus
> from accepting any I/O APIC interrupts. Fixed delivery mode IPIs will
> still be accepted (if that's not needed then the TPR can be set to
> 0xff; any received IPIs will be lost).
>
> 2. Service any outstanding interrupts that have already been accepted by
> the local APIC (you may have to poll on the local IRR register with
> interrupts enabled for a short while).
>
> 3. Disable the local APIC via the SVR register, mask local interrupts in
> the processor's EFLAGS register and start the offline procedure. This
> is the point of no-return, further IPIs won't be accepted and the CPU
> has to be put through an INIT-IPI+StartUp-IPI cycle to get in control
> again.
>
> If IPI reception was not needed through stage #2 above, then the local
> APIC could have been disabled at #1 instead -- interrupts pending in
> the local APIC as recorded in the IRR or marked as in-progress in the
> ISR are guaranteed to be delivered to the CPU and EOIed (as
> appropriate) normally even in the disabled state of the local APIC.

But before these 3 steps you listed here, we need to migrate the irq to
the new destination. And that step will modify the IO-APIC RTE with the
new vector and new destination. And during this process, remoteIRR of
the IO-APIC RTE might be set and this inflight interrupt will get
registered at the original destination that is going offline.

So when we come to step 2 you listed above and service any outstanding
interrupts, EOI broadcast as part of that handling won't clear the
remoteIRR of the IO-APIC RTE, as the vector information in the io-apic
RTE got modified because of irq migration and is different from the
vector information in EOI broadcast message sent by the cpu. This will
result in stuck level interrupt.

This is one of the challenges Eric Biederman had in the past and he
tried things like polling from the process context and modifying the
IO-APIC RTE (with new destination and vector information) only when the
remoteIRR is not set etc. But Eric still saw some hangs and stuck
interrupt conditions with his experimental patches.

We took a route which needed minor changes to the existing code and fix
the local_irq_enable()/local_irq_disable() issue and stuck interrupt
issue in the cpu offline path by using the IO-APIC EOI register. Our
tests on Intel platforms having an EOI register for io-apic's and IBM's
(Gary) tests on io-apic's which don't have EOI register using AMD
platforms worked fine with our approach.

> > Do you agree?
>
> If the scenario I have outlined above cannot be made to work for some
> reason,

Perhaps we can make it work but it needs more changes and validation.
And atleast Eric's similar experiments in the past didn't yield good
results.

> then please do me and the others a favour and since with this
> change you are tying new functionality to code originally meant as a
> workaround for an obscure erratum only, do write a proper explanation and
> place it next to the original comment describing previous use of the code.
> With your change as it is, it is all but obvious what this piece of code
> is meant to do.

This patch was also in the series that I just sent.

> Your change is OK with me once accompanied with said comment, but please
> investigate my scenario first -- your approach looks like a horrible hack
> to me, sorry.

This being a fragile area and considering our experiences in the past, I
leveraged the existing code (of clearing remoteIRR manually on ioapic's
which don't have an EOI register) and luckily we had good success so
far.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.