Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


suresh.b.siddha at intel

May 21, 2012, 4:58 PM

Post #1 of 9 (99 views)
Permalink
[PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain

Until now, irq_cfg domain is mostly static. Either all cpu's (used by flat
mode) or one cpu (first cpu in the irq afffinity mask) to which irq is being
migrated (this is used by the rest of apic modes).

Upcoming x2apic cluster mode optimization patch allows the irq to be sent
to any cpu in the x2apic cluster (if supported by the HW). So irq_cfg
domain changes on the fly (depending on which cpu in the x2apic cluster
is online).

Instead of checking for any intersection between the new irq affinity
mask and the current irq_cfg domain, check if the new irq affinity mask
is a subset of the current irq_cfg domain. Otherwise proceed with
updating the irq_cfg domain aswell as assigning vector's on all the cpu's
specified in the new mask.

This also cleans up a workaround in updating irq_cfg domain for legacy irq's
that are handled by the IO-APIC.

Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
---
arch/x86/kernel/apic/io_apic.c | 15 ++++++---------
1 files changed, 6 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index ffdc152..bbf8c43 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1137,8 +1137,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
old_vector = cfg->vector;
if (old_vector) {
cpumask_and(tmp_mask, mask, cpu_online_mask);
- cpumask_and(tmp_mask, cfg->domain, tmp_mask);
- if (!cpumask_empty(tmp_mask)) {
+ if (cpumask_subset(tmp_mask, cfg->domain)) {
free_cpumask_var(tmp_mask);
return 0;
}
@@ -1152,6 +1151,11 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)

apic->vector_allocation_domain(cpu, tmp_mask);

+ if (cpumask_subset(tmp_mask, cfg->domain)) {
+ free_cpumask_var(tmp_mask);
+ return 0;
+ }
+
vector = current_vector;
offset = current_offset;
next:
@@ -1357,13 +1361,6 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,

if (!IO_APIC_IRQ(irq))
return;
- /*
- * For legacy irqs, cfg->domain starts with cpu 0 for legacy
- * controllers like 8259. Now that IO-APIC can handle this irq, update
- * the cfg->domain.
- */
- if (irq < legacy_pic->nr_legacy_irqs && cpumask_test_cpu(0, cfg->domain))
- apic->vector_allocation_domain(0, cfg->domain);

if (assign_irq_vector(irq, cfg, apic->target_cpus()))
return;
--
1.7.6.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


agordeev at redhat

Jun 6, 2012, 10:20 AM

Post #2 of 9 (90 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Mon, May 21, 2012 at 04:58:01PM -0700, Suresh Siddha wrote:
> Until now, irq_cfg domain is mostly static. Either all cpu's (used by flat
> mode) or one cpu (first cpu in the irq afffinity mask) to which irq is being
> migrated (this is used by the rest of apic modes).
>
> Upcoming x2apic cluster mode optimization patch allows the irq to be sent
> to any cpu in the x2apic cluster (if supported by the HW). So irq_cfg
> domain changes on the fly (depending on which cpu in the x2apic cluster
> is online).
>
> Instead of checking for any intersection between the new irq affinity
> mask and the current irq_cfg domain, check if the new irq affinity mask
> is a subset of the current irq_cfg domain. Otherwise proceed with
> updating the irq_cfg domain aswell as assigning vector's on all the cpu's
> specified in the new mask.
>
> This also cleans up a workaround in updating irq_cfg domain for legacy irq's
> that are handled by the IO-APIC.

Suresh,

I thought you posted these patches for reference and held off with my comments
until you are collecting the data. But since Ingo picked the patches I will
sound my concerns in this thread.

>
> Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
> ---
> arch/x86/kernel/apic/io_apic.c | 15 ++++++---------
> 1 files changed, 6 insertions(+), 9 deletions(-)
>
> diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> index ffdc152..bbf8c43 100644
> --- a/arch/x86/kernel/apic/io_apic.c
> +++ b/arch/x86/kernel/apic/io_apic.c
> @@ -1137,8 +1137,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> old_vector = cfg->vector;
> if (old_vector) {
> cpumask_and(tmp_mask, mask, cpu_online_mask);
> - cpumask_and(tmp_mask, cfg->domain, tmp_mask);
> - if (!cpumask_empty(tmp_mask)) {
> + if (cpumask_subset(tmp_mask, cfg->domain)) {

Imagine that passed mask is a subset of cfg->domain and also contains at least
one online CPU from a different cluster. Since domains are always one cluster
wide this condition ^^^ will fail and we go further.

> free_cpumask_var(tmp_mask);
> return 0;
> }
> @@ -1152,6 +1151,11 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
>
> apic->vector_allocation_domain(cpu, tmp_mask);
>
> + if (cpumask_subset(tmp_mask, cfg->domain)) {

Because the mask intersects with cfg->domain this condition ^^^ may succeed and
we could return with no change from here.

That raises few concerns to me:

- The first check is not perfect, because it failed to recognize the
intersection right away. Instead, we possibly lost multiple loops through the
mask before we realized we do not need any change at all. Therefore...

- It would be better to recognize the intersection even before entering the
loop. But that is exactly what the removed code has been doing before.

- Depending from the passed mask, we equally likely could have select another
cluster and switch to it, even though the current cfg->domain is contained
within the requested mask. Besides it is just not nice, we are also switching
from a cache-hot cluster. If you suggested that it is enough to pick a first
found cluster (rather than select a best possible) then there is even less
reason to switch from cfg->domain here.

> + free_cpumask_var(tmp_mask);
> + return 0;
> + }
> +
> vector = current_vector;
> offset = current_offset;
> next:
> @@ -1357,13 +1361,6 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
>
> if (!IO_APIC_IRQ(irq))
> return;
> - /*
> - * For legacy irqs, cfg->domain starts with cpu 0 for legacy
> - * controllers like 8259. Now that IO-APIC can handle this irq, update
> - * the cfg->domain.
> - */
> - if (irq < legacy_pic->nr_legacy_irqs && cpumask_test_cpu(0, cfg->domain))
> - apic->vector_allocation_domain(0, cfg->domain);


This hunk reverts your 69c89ef commit. Regression?

>
> if (assign_irq_vector(irq, cfg, apic->target_cpus()))
> return;
> --
> 1.7.6.5
>

--
Regards,
Alexander Gordeev
agordeev [at] redhat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Jun 6, 2012, 4:02 PM

Post #3 of 9 (97 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Wed, 2012-06-06 at 19:20 +0200, Alexander Gordeev wrote:
> On Mon, May 21, 2012 at 04:58:01PM -0700, Suresh Siddha wrote:
> > Until now, irq_cfg domain is mostly static. Either all cpu's (used by flat
> > mode) or one cpu (first cpu in the irq afffinity mask) to which irq is being
> > migrated (this is used by the rest of apic modes).
> >
> > Upcoming x2apic cluster mode optimization patch allows the irq to be sent
> > to any cpu in the x2apic cluster (if supported by the HW). So irq_cfg
> > domain changes on the fly (depending on which cpu in the x2apic cluster
> > is online).
> >
> > Instead of checking for any intersection between the new irq affinity
> > mask and the current irq_cfg domain, check if the new irq affinity mask
> > is a subset of the current irq_cfg domain. Otherwise proceed with
> > updating the irq_cfg domain aswell as assigning vector's on all the cpu's
> > specified in the new mask.
> >
> > This also cleans up a workaround in updating irq_cfg domain for legacy irq's
> > that are handled by the IO-APIC.
>
> Suresh,
>
> I thought you posted these patches for reference and held off with my comments
> until you are collecting the data. But since Ingo picked the patches I will
> sound my concerns in this thread.

These are tested patches and I am ok with Ingo picking it up for getting
further baked in -tip. About the data collection, I have to find the
right system/bios to run the tests for power-aware/round-robin interrupt
routing. Anyways logical xapic mode already has this capability and we
are adding the capability for x2apic cluster mode here. And also,
irqbalance has to ultimately take advantage of this by specifying
multiple cpu's when migrating an interrupt.

Only concern I have with this patchset is what I already mentioned in
the changelog of the second patch. i.e., it reduces the number of IRQ's
that the platform can handle, as we reduce the available number of
vectors by a factor of 16.

If this indeed becomes a problem, then there are few options. Either
reserve the vectors based on the irq destination mask (rather than
reserving on all the cluster members) or reducing the grouping from 16
to a smaller number etc. I can post another patch shortly for this.

> >
> > Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
> > ---
> > arch/x86/kernel/apic/io_apic.c | 15 ++++++---------
> > 1 files changed, 6 insertions(+), 9 deletions(-)
> >
> > diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
> > index ffdc152..bbf8c43 100644
> > --- a/arch/x86/kernel/apic/io_apic.c
> > +++ b/arch/x86/kernel/apic/io_apic.c
> > @@ -1137,8 +1137,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> > old_vector = cfg->vector;
> > if (old_vector) {
> > cpumask_and(tmp_mask, mask, cpu_online_mask);
> > - cpumask_and(tmp_mask, cfg->domain, tmp_mask);
> > - if (!cpumask_empty(tmp_mask)) {
> > + if (cpumask_subset(tmp_mask, cfg->domain)) {
>
> Imagine that passed mask is a subset of cfg->domain and also contains at least
> one online CPU from a different cluster. Since domains are always one cluster
> wide this condition ^^^ will fail and we go further.
>
> > free_cpumask_var(tmp_mask);
> > return 0;
> > }
> > @@ -1152,6 +1151,11 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> >
> > apic->vector_allocation_domain(cpu, tmp_mask);
> >
> > + if (cpumask_subset(tmp_mask, cfg->domain)) {
>
> Because the mask intersects with cfg->domain this condition ^^^ may succeed and
> we could return with no change from here.
>
> That raises few concerns to me:
> - The first check is not perfect, because it failed to recognize the
> intersection right away. Instead, we possibly lost multiple loops through the
> mask before we realized we do not need any change at all. Therefore...
>
> - It would be better to recognize the intersection even before entering the
> loop. But that is exactly what the removed code has been doing before.
>
> - Depending from the passed mask, we equally likely could have select another
> cluster and switch to it, even though the current cfg->domain is contained
> within the requested mask. Besides it is just not nice, we are also switching
> from a cache-hot cluster. If you suggested that it is enough to pick a first
> found cluster (rather than select a best possible) then there is even less
> reason to switch from cfg->domain here.

Few things to keep in perspective.

this is generic portion of the vector handling code and has to work
across different apic drivers and their cfg domains. And also most of
the intelligence lies in the irqbalance which specifies the irq
destination mask. Traditionally Kernel code selected the first possible
destination and not the best destination among the specified mask.

Anyways the above hunks are trying to address scenario like this for
example: During boot, all the IO-APIC interrupts (legacy/non-legacy) are
routed to cpu-0 with only cpu-0 in their cfg->domain (as we don't know
which other cpu's fall into the same x2apic cluster, we can't pre-set
them in the cfg->domain). Consider a single socket system. After the SMP
bringup of other siblings, those io-apic irq's affinity is modified to
all cpu's in setup_ioapic_dest(). And with the current code,
assign_irq_vector() will bail immediately with out reserving the
corresponding vector on all the cluster members that are now online. And
the interrupt ends up going to only cpu-0 and it will not get corrected
as long as cpu-0 is in the specified interrupt destination mask.

> > + free_cpumask_var(tmp_mask);
> > + return 0;
> > + }
> > +
> > vector = current_vector;
> > offset = current_offset;
> > next:
> > @@ -1357,13 +1361,6 @@ static void setup_ioapic_irq(unsigned int irq, struct irq_cfg *cfg,
> >
> > if (!IO_APIC_IRQ(irq))
> > return;
> > - /*
> > - * For legacy irqs, cfg->domain starts with cpu 0 for legacy
> > - * controllers like 8259. Now that IO-APIC can handle this irq, update
> > - * the cfg->domain.
> > - */
> > - if (irq < legacy_pic->nr_legacy_irqs && cpumask_test_cpu(0, cfg->domain))
> > - apic->vector_allocation_domain(0, cfg->domain);
>
>
> This hunk reverts your 69c89ef commit. Regression?
>

As I mentioned in the changelog, this patch removes the need for that
hacky workaround. commit 69c89ef didn't really fix the underlying
problem (and hence we re-encountered the similar issue (above mentioned)
in the context of x2apic cluster). Clean fix is to address the issue in
assign_irq_vector() which is what this patch does.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Jun 15, 2012, 5:25 PM

Post #4 of 9 (86 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Wed, 2012-06-06 at 16:02 -0700, Suresh Siddha wrote:
> On Wed, 2012-06-06 at 19:20 +0200, Alexander Gordeev wrote:
> >
> > Suresh,
> >
> > I thought you posted these patches for reference and held off with my comments
> > until you are collecting the data. But since Ingo picked the patches I will
> > sound my concerns in this thread.
>
> These are tested patches and I am ok with Ingo picking it up for getting
> further baked in -tip. About the data collection, I have to find the
> right system/bios to run the tests for power-aware/round-robin interrupt
> routing. Anyways logical xapic mode already has this capability and we
> are adding the capability for x2apic cluster mode here. And also,
> irqbalance has to ultimately take advantage of this by specifying
> multiple cpu's when migrating an interrupt.
>
> Only concern I have with this patchset is what I already mentioned in
> the changelog of the second patch. i.e., it reduces the number of IRQ's
> that the platform can handle, as we reduce the available number of
> vectors by a factor of 16.
>
> If this indeed becomes a problem, then there are few options. Either
> reserve the vectors based on the irq destination mask (rather than
> reserving on all the cluster members) or reducing the grouping from 16
> to a smaller number etc. I can post another patch shortly for this.
>

Ok, here is a quick version of the patch doing the first option I
mentioned above. Reserve the vectors based on the irq destination mask
and the corresponding vector domain, rather than reserving the vector on
all the cpu's for the theoretical domain (which is an x2apic cluster).

Will do some more testing and post the patch with detailed changelog on
Monday. For now, appended the patch for quick reference. Thanks.
---

Signed-off-by: Suresh Siddha <suresh.b.siddha [at] intel>
---
arch/x86/include/asm/apic.h | 10 +++++++---
arch/x86/kernel/apic/apic_noop.c | 3 ++-
arch/x86/kernel/apic/io_apic.c | 28 ++++++++++++++++++++++++++--
arch/x86/kernel/apic/x2apic_cluster.c | 7 ++++---
4 files changed, 39 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 8619a87..af9ec10 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -303,10 +303,12 @@ struct apic {
int disable_esr;

int dest_logical;
+ bool sub_domain;
unsigned long (*check_apicid_used)(physid_mask_t *map, int apicid);
unsigned long (*check_apicid_present)(int apicid);

- bool (*vector_allocation_domain)(int cpu, struct cpumask *retmask);
+ bool (*vector_allocation_domain)(int cpu, struct cpumask *retmask,
+ const struct cpumask *mask);
void (*init_apic_ldr)(void);

void (*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
@@ -615,7 +617,8 @@ default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
unsigned int *apicid);

static inline bool
-flat_vector_allocation_domain(int cpu, struct cpumask *retmask)
+flat_vector_allocation_domain(int cpu, struct cpumask *retmask,
+ const struct cpumask *mask)
{
/* Careful. Some cpus do not strictly honor the set of cpus
* specified in the interrupt destination when using lowest
@@ -631,7 +634,8 @@ flat_vector_allocation_domain(int cpu, struct cpumask *retmask)
}

static inline bool
-default_vector_allocation_domain(int cpu, struct cpumask *retmask)
+default_vector_allocation_domain(int cpu, struct cpumask *retmask,
+ const struct cpumask *mask)
{
cpumask_copy(retmask, cpumask_of(cpu));
return true;
diff --git a/arch/x86/kernel/apic/apic_noop.c b/arch/x86/kernel/apic/apic_noop.c
index 65c07fc..ebdd349 100644
--- a/arch/x86/kernel/apic/apic_noop.c
+++ b/arch/x86/kernel/apic/apic_noop.c
@@ -100,7 +100,8 @@ static unsigned long noop_check_apicid_present(int bit)
return physid_isset(bit, phys_cpu_present_map);
}

-static bool noop_vector_allocation_domain(int cpu, struct cpumask *retmask)
+static bool noop_vector_allocation_domain(int cpu, struct cpumask *retmask,
+ struct cpumask *mask)
{
if (cpu != 0)
pr_warning("APIC: Vector allocated for non-BSP cpu\n");
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 99a794d..3d1c37c 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1126,7 +1126,18 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
old_vector = cfg->vector;
if (old_vector) {
cpumask_and(tmp_mask, mask, cpu_online_mask);
- if (cpumask_subset(tmp_mask, cfg->domain)) {
+ /*
+ * If the APIC driver says it can't reliably route the
+ * interrupt to some subset of the domain members and the
+ * incoming affinity mask is already a subset of the domain,
+ * then we are done. We will keep the vector assigned in all
+ * the domain members.
+ *
+ * Or if the new mask is same as the existing domain, then
+ * also nothing more to do here.
+ */
+ if ((!apic->sub_domain && cpumask_subset(tmp_mask, cfg->domain))
+ || (cpumask_equal(tmp_mask, cfg->domain))) {
free_cpumask_var(tmp_mask);
return 0;
}
@@ -1139,9 +1150,22 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
int vector, offset;
bool more_domains;

- more_domains = apic->vector_allocation_domain(cpu, tmp_mask);
+ more_domains =
+ apic->vector_allocation_domain(cpu, tmp_mask, mask);

if (cpumask_subset(tmp_mask, cfg->domain)) {
+ /*
+ * If the APIC driver can route the interrupt
+ * to some subset of the domain, then we can
+ * cleanup the vector allocation for other members.
+ */
+ if (apic->sub_domain &&
+ !cpumask_equal(tmp_mask, cfg->domain)) {
+ cpumask_andnot(cfg->old_domain, cfg->domain,
+ tmp_mask);
+ cfg->move_in_progress = 1;
+ cpumask_and(cfg->domain, cfg->domain, tmp_mask);
+ }
free_cpumask_var(tmp_mask);
return 0;
}
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 943d03f..46cbbe1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -212,10 +212,10 @@ static int x2apic_cluster_probe(void)
/*
* Each x2apic cluster is an allocation domain.
*/
-static bool cluster_vector_allocation_domain(int cpu, struct cpumask *retmask)
+static bool cluster_vector_allocation_domain(int cpu, struct cpumask *retmask,
+ const struct cpumask *mask)
{
- cpumask_clear(retmask);
- cpumask_copy(retmask, per_cpu(cpus_in_cluster, cpu));
+ cpumask_and(retmask, mask, per_cpu(cpus_in_cluster, cpu));
return true;
}

@@ -233,6 +233,7 @@ static struct apic apic_x2apic_cluster = {
.target_cpus = online_target_cpus,
.disable_esr = 0,
.dest_logical = APIC_DEST_LOGICAL,
+ .sub_domain = true,
.check_apicid_used = NULL,
.check_apicid_present = NULL,



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


agordeev at redhat

Jun 18, 2012, 2:17 AM

Post #5 of 9 (82 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Fri, Jun 15, 2012 at 05:25:20PM -0700, Suresh Siddha wrote:
> On Wed, 2012-06-06 at 16:02 -0700, Suresh Siddha wrote:
> Ok, here is a quick version of the patch doing the first option I
> mentioned above. Reserve the vectors based on the irq destination mask
> and the corresponding vector domain, rather than reserving the vector on
> all the cpu's for the theoretical domain (which is an x2apic cluster).

Suresh,

I would suggest to go further and generalize your idea and introduce something
like compare_domains() instead sub_domain flag. This would help us to keep/let
__assign_irq_vector() be more generic and hide domains treating logic in
drivers.

Also, this would fix a previous subtle condition introduced with previous
commit 332afa6 ("x86/irq: Update irq_cfg domain unless the new affinity is a
subset of the current domain") when a change of affinity mask could trigger
unnecessary move of vector in case new and old masks intersect, but the new
mask is not a subset of the old one.

The patch is just to clarify the idea, neither tested nor even compiled.


diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index eec240e..fcca995 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -306,7 +306,10 @@ struct apic {
unsigned long (*check_apicid_used)(physid_mask_t *map, int apicid);
unsigned long (*check_apicid_present)(int apicid);

- bool (*vector_allocation_domain)(int cpu, struct cpumask *retmask);
+ bool (*vector_allocation_domain)(int cpu,
+ const struct cpumask *mask,
+ struct cpumask *retmask);
+ int (*compare_domains)(const struct cpumask *, const struct cpumask *);
void (*init_apic_ldr)(void);

void (*ioapic_phys_id_map)(physid_mask_t *phys_map, physid_mask_t *retmap);
@@ -615,7 +618,9 @@ default_cpu_mask_to_apicid_and(const struct cpumask *cpumask,
unsigned int *apicid);

static inline bool
-flat_vector_allocation_domain(int cpu, struct cpumask *retmask)
+flat_vector_allocation_domain(int cpu,
+ const struct cpumask *mask,
+ struct cpumask *retmask)
{
/* Careful. Some cpus do not strictly honor the set of cpus
* specified in the interrupt destination when using lowest
@@ -630,13 +635,33 @@ flat_vector_allocation_domain(int cpu, struct cpumask *retmask)
return false;
}

+static inline int
+flat_compare_domains(const struct cpumask *domain1,
+ const struct cpumask *domain2)
+{
+ if (cpumask_subset(domain1, domain2))
+ return 0;
+ return 1;
+}
+
static inline bool
-default_vector_allocation_domain(int cpu, struct cpumask *retmask)
+default_vector_allocation_domain(int cpu,
+ const struct cpumask *mask,
+ struct cpumask *retmask)
{
cpumask_copy(retmask, cpumask_of(cpu));
return true;
}

+static inline int
+default_compare_domains(const struct cpumask *domain1,
+ const struct cpumask *domain2)
+{
+ if (cpumask_intersects(domain1, domain2))
+ return 0;
+ return 1;
+}
+
static inline unsigned long default_check_apicid_used(physid_mask_t *map, int apicid)
{
return physid_isset(apicid, *map);
diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 0540f08..88cafc8 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -1126,7 +1126,7 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
old_vector = cfg->vector;
if (old_vector) {
cpumask_and(tmp_mask, mask, cpu_online_mask);
- if (cpumask_subset(tmp_mask, cfg->domain)) {
+ if (!apic->compare_domains(tmp_mask, cfg->domain)) {
free_cpumask_var(tmp_mask);
return 0;
}
@@ -1138,47 +1138,50 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
int new_cpu;
int vector, offset;
bool more_domains;
+ int cmp_domains;

- more_domains = apic->vector_allocation_domain(cpu, tmp_mask);
+ more = apic->vector_allocation_domain(cpu, mask, tmp_mask);
+ cmp = apic->compare_domains(tmp_mask, cfg->domain);

- if (cpumask_subset(tmp_mask, cfg->domain)) {
- free_cpumask_var(tmp_mask);
- return 0;
- }
-
- vector = current_vector;
- offset = current_offset;
+ if (cmp < 0) {
+ cpumask_andnot(cfg->old_domain, cfg->domain, tmp_mask);
+ cfg->move_in_progress = 1;
+ cpumask_and(cfg->domain, cfg->domain, tmp_mask);
+ } else if (cmp > 0) {
+ vector = current_vector;
+ offset = current_offset;
next:
- vector += 16;
- if (vector >= first_system_vector) {
- offset = (offset + 1) % 16;
- vector = FIRST_EXTERNAL_VECTOR + offset;
- }
-
- if (unlikely(current_vector == vector)) {
- if (more_domains)
- continue;
- else
- break;
- }
+ vector += 16;
+ if (vector >= first_system_vector) {
+ offset = (offset + 1) % 16;
+ vector = FIRST_EXTERNAL_VECTOR + offset;
+ }

- if (test_bit(vector, used_vectors))
- goto next;
+ if (unlikely(current_vector == vector)) {
+ if (more)
+ continue;
+ else
+ break;
+ }

- for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
- if (per_cpu(vector_irq, new_cpu)[vector] != -1)
+ if (test_bit(vector, used_vectors))
goto next;
- /* Found one! */
- current_vector = vector;
- current_offset = offset;
- if (old_vector) {
- cfg->move_in_progress = 1;
- cpumask_copy(cfg->old_domain, cfg->domain);
+
+ for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
+ if (per_cpu(vector_irq, new_cpu)[vector] != -1)
+ goto next;
+ /* Found one! */
+ current_vector = vector;
+ current_offset = offset;
+ if (old_vector) {
+ cfg->move_in_progress = 1;
+ cpumask_copy(cfg->old_domain, cfg->domain);
+ }
+ for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
+ per_cpu(vector_irq, new_cpu)[vector] = irq;
+ cfg->vector = vector;
+ cpumask_copy(cfg->domain, tmp_mask);
}
- for_each_cpu_and(new_cpu, tmp_mask, cpu_online_mask)
- per_cpu(vector_irq, new_cpu)[vector] = irq;
- cfg->vector = vector;
- cpumask_copy(cfg->domain, tmp_mask);
err = 0;
break;
}
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 943d03f..090c69e 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -212,13 +212,24 @@ static int x2apic_cluster_probe(void)
/*
* Each x2apic cluster is an allocation domain.
*/
-static bool cluster_vector_allocation_domain(int cpu, struct cpumask *retmask)
+static bool cluster_vector_allocation_domain(int cpu,
+ const struct cpumask *mask,
+ struct cpumask *retmask)
{
- cpumask_clear(retmask);
- cpumask_copy(retmask, per_cpu(cpus_in_cluster, cpu));
+ cpumask_and(retmask, mask, per_cpu(cpus_in_cluster, cpu));
return true;
}

+static int cluster_compare_domains(const struct cpumask *domain1,
+ const struct cpumask *domain2)
+{
+ if (cpumask_subset(domain1, domain2))
+ return -1;
+ if (cpumask_equal(domain1, domain2))
+ return 0;
+ return 1;
+}
+
static struct apic apic_x2apic_cluster = {

.name = "cluster x2apic",
@@ -237,6 +248,7 @@ static struct apic apic_x2apic_cluster = {
.check_apicid_present = NULL,

.vector_allocation_domain = cluster_vector_allocation_domain,
+ .compare_domains = cluster_compare_domains,
.init_apic_ldr = init_x2apic_ldr,

.ioapic_phys_id_map = NULL,
diff --git a/arch/x86/kernel/apic/x2apic_phys.c b/arch/x86/kernel/apic/x2apic_phys.c
index e03a1e1..24511a9 100644
--- a/arch/x86/kernel/apic/x2apic_phys.c
+++ b/arch/x86/kernel/apic/x2apic_phys.c
@@ -106,6 +106,7 @@ static struct apic apic_x2apic_phys = {
.check_apicid_present = NULL,

.vector_allocation_domain = default_vector_allocation_domain,
+ .compare_domains = default_compare_domains,
.init_apic_ldr = init_x2apic_ldr,

.ioapic_phys_id_map = NULL,

--
Regards,
Alexander Gordeev
agordeev [at] redhat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Jun 18, 2012, 5:51 PM

Post #6 of 9 (81 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Mon, 2012-06-18 at 11:17 +0200, Alexander Gordeev wrote:
> I would suggest to go further and generalize your idea and introduce something
> like compare_domains() instead sub_domain flag. This would help us to keep/let
> __assign_irq_vector() be more generic and hide domains treating logic in
> drivers.

That sounds good.

> @@ -1138,47 +1138,50 @@ __assign_irq_vector(int irq, struct irq_cfg *cfg, const struct cpumask *mask)
> int new_cpu;
> int vector, offset;
> bool more_domains;
> + int cmp_domains;
>
> - more_domains = apic->vector_allocation_domain(cpu, tmp_mask);
> + more = apic->vector_allocation_domain(cpu, mask, tmp_mask);
> + cmp = apic->compare_domains(tmp_mask, cfg->domain);

I think we should be able to consolidate both of them into one
apic_driver specific routine. Also I think we should be able to use the
tmp_mask in each round and minimize the number of unnecessary
for_each_cpu iterations. And that should clean up the more_domains bool
logic we did earlier.

Will post the complete patch in a day.

thanks,
suresh



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Jun 19, 2012, 5:18 PM

Post #7 of 9 (77 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Mon, 2012-06-18 at 17:51 -0700, Suresh Siddha wrote:
> > - more_domains = apic->vector_allocation_domain(cpu, tmp_mask);
> > + more = apic->vector_allocation_domain(cpu, mask, tmp_mask);
> > + cmp = apic->compare_domains(tmp_mask, cfg->domain);
>
> I think we should be able to consolidate both of them into one
> apic_driver specific routine. Also I think we should be able to use the
> tmp_mask in each round and minimize the number of unnecessary
> for_each_cpu iterations. And that should clean up the more_domains bool
> logic we did earlier.

Just posted couple of patches which did the above. Please review and
ack.

On Mon, 2012-06-18 at 11:17 +0200, Alexander Gordeev wrote:
> Also, this would fix a previous subtle condition introduced with previous
> commit 332afa6 ("x86/irq: Update irq_cfg domain unless the new affinity is a
> subset of the current domain") when a change of affinity mask could trigger
> unnecessary move of vector in case new and old masks intersect, but the new
> mask is not a subset of the old one.

So, I gave some thought to this. And I don't like/didn't address it for
the reasons I mentioned earlier. I don't want to add intelligence to the
apic drivers. That belongs to a higher level entity like 'irqbalance'.
Also when irqbalance asks for a specific mask, I don't want its behavior
to depend on the previous configuration. And I want the behavior to be
some what consistent across different apic drivers whose underlying HW
behavior is similar. For example logical-flat with no HW round-robin
will route the interrupt to the first cpu specified in the user-mask.

Also this is a not common case worth caring about.

BTW, there is still one open that I would like to address. How to handle
the vector pressure during boot etc (as the default vector assignment
specifies all online cpus) when there are lot interrupt sources but
fewer x2apic clusters (like one or two socket server case).

We should be able to do something like the appended. Any better
suggestions? I don't want to add boot parameters to limit the x2apic
cluster membership etc (to fewer than 16 logical cpu's) if possible.
---

arch/x86/kernel/apic/x2apic_cluster.c | 15 ++++++++++++++-
1 files changed, 14 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index 4d49512..0e3d659 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -217,7 +217,20 @@ static void cluster_vector_allocation_domain(const struct cpumask *mask,
const struct cpumask *prevmask)
{
int cpu = cpumask_first(retmask);
- cpumask_and(retmask, mask, per_cpu(cpus_in_cluster, cpu));
+
+ /*
+ * By default during boot, device bringup etc interrupt
+ * will be routed to a specific cpu.
+ *
+ * On irq migration requests coming from irqbalance etc,
+ * interrupts will be routed to the x2apic cluster (cluster-id
+ * derived from the first cpu in the mask) members specified
+ * in the mask.
+ */
+ if (mask == cpu_online_mask)
+ cpumask_copy(retmask, cpumask_of(cpu));
+ else
+ cpumask_and(retmask, mask, per_cpu(cpus_in_cluster, cpu));
}

static struct apic apic_x2apic_cluster = {


thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


agordeev at redhat

Jun 21, 2012, 4:00 AM

Post #8 of 9 (74 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Tue, Jun 19, 2012 at 05:18:42PM -0700, Suresh Siddha wrote:
> On Mon, 2012-06-18 at 17:51 -0700, Suresh Siddha wrote:
> BTW, there is still one open that I would like to address. How to handle
> the vector pressure during boot etc (as the default vector assignment
> specifies all online cpus) when there are lot interrupt sources but
> fewer x2apic clusters (like one or two socket server case).
>
> We should be able to do something like the appended. Any better
> suggestions? I don't want to add boot parameters to limit the x2apic
> cluster membership etc (to fewer than 16 logical cpu's) if possible.

This cpu_online_mask approach should work IMO. Although it looks little bit
hacky for me. May be we could start with default_vector_allocation_domain()
and explicitly switch to cluster_vector_allocation_domain() once booted?

As of boot parameters, I can think of multi-pass walk thru a cpumask to find a
free cluster => core => sibling. In worst case I can imagine a vector space
defragmentator. But nothing really small to avoid current code reshake.

Also, how heavy the vector pressure actually is?

--
Regards,
Alexander Gordeev
agordeev [at] redhat
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


suresh.b.siddha at intel

Jun 21, 2012, 2:58 PM

Post #9 of 9 (72 views)
Permalink
Re: [PATCH 1/2] x86, irq: update irq_cfg domain unless the new affinity is a subset of the current domain [In reply to]

On Thu, 2012-06-21 at 13:00 +0200, Alexander Gordeev wrote:
> On Tue, Jun 19, 2012 at 05:18:42PM -0700, Suresh Siddha wrote:
> > On Mon, 2012-06-18 at 17:51 -0700, Suresh Siddha wrote:
> > BTW, there is still one open that I would like to address. How to handle
> > the vector pressure during boot etc (as the default vector assignment
> > specifies all online cpus) when there are lot interrupt sources but
> > fewer x2apic clusters (like one or two socket server case).
> >
> > We should be able to do something like the appended. Any better
> > suggestions? I don't want to add boot parameters to limit the x2apic
> > cluster membership etc (to fewer than 16 logical cpu's) if possible.
>
> This cpu_online_mask approach should work IMO. Although it looks little bit
> hacky for me. May be we could start with default_vector_allocation_domain()
> and explicitly switch to cluster_vector_allocation_domain() once booted?

It is not just during boot. Module load/unload will also go through
these paths.

> As of boot parameters, I can think of multi-pass walk thru a cpumask to find a
> free cluster => core => sibling. In worst case I can imagine a vector space
> defragmentator. But nothing really small to avoid current code reshake.

We can probably go with something simple as I sent earlier (third patch
in the new version does this). Depending on the need/future use-cases,
we can further enhance this later.

> Also, how heavy the vector pressure actually is?

I don't know. but I suspect one socket server where they can be one
x2apic cluster (for example 8-core with HT), will see some pressure on
some platforms.

thanks,
suresh

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.