
Mailing List Archive: Linux: Kernel

[v7 0/8] Reduce cross CPU IPI interference

 

 



gilad at benyossef

Jan 26, 2012, 2:01 AM

Post #1 of 55
[v7 0/8] Reduce cross CPU IPI interference

We have lots of infrastructure in place to partition multi-core systems
such that we have a group of CPUs that are dedicated to specific tasks:
cgroups, scheduler and interrupt affinity, and the isolcpus= boot parameter.
Still, kernel code will at times interrupt all CPUs in the system via IPIs
for various needs. These IPIs are useful and cannot be avoided altogether,
but in certain cases it is possible to interrupt only specific CPUs that
have useful work to do and not the entire system.

This patch set, inspired by discussions with Peter Zijlstra and Frederic
Weisbecker when testing the nohz task patch set, is a first stab at
exploring this: it locates the places where such global IPI calls are
made and turns each global IPI into an IPI for a specific group of CPUs.
The purpose of the patch set is to get feedback on whether this is the
right way to deal with the issue and, indeed, whether the issue is even
worth dealing with at all. Based on the feedback from this patch set, I
plan to offer further patches that address similar issues in other code
paths.

The patch set creates on_each_cpu_mask and on_each_cpu_cond infrastructure
APIs (the former derived from existing arch-specific versions in Tile and
ARM) and uses them to turn several global IPI invocations into per-CPU-group
invocations.
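To make the on_each_cpu_cond idea concrete, here is a small userspace C model of its semantics. It is a sketch, not the kernel code. The simulated CPU array, NR_SIM_CPUS, sim_on_each_cpu_cond(), and the per-CPU page counts are hypothetical, and the "IPI" is a direct function call. The callback here also takes an explicit CPU number, whereas the kernel's smp_call_func_t only receives an info pointer. The model shows the two-step structure: first evaluate a per-CPU predicate and collect the matching CPUs into a mask, then interrupt only those CPUs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h> /* NULL, for callers passing no info pointer */

#define NR_SIM_CPUS 8 /* hypothetical number of simulated CPUs */

/* Hypothetical per-CPU state: pages held on each CPU's per-cpu list. */
static int pcp_count[NR_SIM_CPUS];
/* Records which simulated CPUs were actually "interrupted". */
static bool interrupted[NR_SIM_CPUS];

/* Stand-in for smp_call_func_t; the CPU number is passed explicitly here. */
typedef void (*sim_call_func_t)(int cpu, void *info);

/*
 * Model of on_each_cpu_cond(): evaluate cond_func for every CPU, build a
 * mask of the CPUs where it returned true, then "IPI" (here: call func
 * directly) only those CPUs. Returns the number of CPUs interrupted.
 */
static int sim_on_each_cpu_cond(bool (*cond_func)(int cpu, void *info),
                                sim_call_func_t func, void *info)
{
    bool mask[NR_SIM_CPUS] = { false };
    int cpu, sent = 0;

    /* Pass 1: decide which CPUs have useful work to do. */
    for (cpu = 0; cpu < NR_SIM_CPUS; cpu++)
        mask[cpu] = cond_func(cpu, info);

    /* Pass 2: interrupt only the CPUs in the mask. */
    for (cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
        if (mask[cpu]) {
            func(cpu, info);
            sent++;
        }
    }
    return sent;
}

/* Condition callback: does this CPU hold any per-cpu pages? */
static bool cpu_has_pcp_pages(int cpu, void *info)
{
    (void)info;
    assert(cpu >= 0 && cpu < NR_SIM_CPUS);
    return pcp_count[cpu] > 0;
}

/* "IPI handler": drain the local list and note the interruption. */
static void drain_local_pages_sim(int cpu, void *info)
{
    (void)info;
    pcp_count[cpu] = 0;
    interrupted[cpu] = true;
}
```

Suppose pages are queued only on CPUs 1 and 6. Then sim_on_each_cpu_cond(cpu_has_pcp_pages, drain_local_pages_sim, NULL) interrupts exactly those two CPUs and leaves the other six alone. Calling it again immediately afterwards interrupts no CPUs, because the predicate is now false everywhere.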

This 7th iteration includes the following changes, all based on feedback
from Milton Miller:

- Use a static cpumask_t to track CPUs with pcps in drain_all_pages.
- Fix a logic bug that sent IPIs based on the state of the pcps in the last zone only
(and re-run tests to make sure we still see the benefits).
- Use bool and smp_call_func_t in the on_each_cpu_cond prototype.
- Have on_each_cpu_cond accept a GFP flags parameter for the cpumask allocation.
- Disable preemption around for_each_online_cpu in on_each_cpu_cond to avoid
racing with hotplug events.
- Use bool and smp_call_func_t in the on_each_cpu_mask prototype.
- Multiple documentation and description fixes and improvements.
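The "last zone only" bug in the list above is a classic loop mistake. The following hedged sketch is a simplified model, not the mm/page_alloc.c code. The pcp[cpu][zone] table, NR_SIM_CPUS, NR_SIM_ZONES and both function names are hypothetical. It contrasts a loop that overwrites the per-CPU decision on every zone with one that marks a CPU if any of its zones holds pages. This is one plausible way such a bug can arise; the changelog does not show the original code.

```c
#include <stdbool.h>

#define NR_SIM_CPUS  4 /* hypothetical */
#define NR_SIM_ZONES 3 /* hypothetical */

/*
 * Buggy pattern: the per-zone loop overwrites the CPU's decision, so only
 * the last zone's per-cpu list is taken into account.
 */
static void mask_last_zone_only(int pcp[NR_SIM_CPUS][NR_SIM_ZONES],
                                bool mask[NR_SIM_CPUS])
{
    for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++)
        for (int zone = 0; zone < NR_SIM_ZONES; zone++)
            mask[cpu] = pcp[cpu][zone] > 0;
}

/* Fixed pattern: a CPU needs an IPI if it holds pages in any zone. */
static void mask_any_zone(int pcp[NR_SIM_CPUS][NR_SIM_ZONES],
                          bool mask[NR_SIM_CPUS])
{
    for (int cpu = 0; cpu < NR_SIM_CPUS; cpu++) {
        mask[cpu] = false;
        for (int zone = 0; zone < NR_SIM_ZONES; zone++) {
            if (pcp[cpu][zone] > 0) {
                mask[cpu] = true;
                break; /* one non-empty zone is enough */
            }
        }
    }
}
```

Consider a CPU whose only pages sit in the first zone. The overwriting loop leaves its bit clear, so that CPU would never be asked to drain. The any-zone loop marks it correctly.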

The patch set is also available from the ipi_noise_v7 branch at
git://github.com/gby/linux.git

Signed-off-by: Gilad Ben-Yossef <gilad [at] benyossef>
CC: Christoph Lameter <cl [at] linux>
CC: Chris Metcalf <cmetcalf [at] tilera>
CC: Peter Zijlstra <a.p.zijlstra [at] chello>
CC: Frederic Weisbecker <fweisbec [at] gmail>
CC: linux-mm [at] kvack
CC: Pekka Enberg <penberg [at] kernel>
CC: Matt Mackall <mpm [at] selenic>
CC: Sasha Levin <levinsasha928 [at] gmail>
CC: Rik van Riel <riel [at] redhat>
CC: Andi Kleen <andi [at] firstfloor>
CC: Mel Gorman <mel [at] csn>
CC: Andrew Morton <akpm [at] linux-foundation>
CC: Alexander Viro <viro [at] zeniv>
CC: Avi Kivity <avi [at] redhat>
CC: Michal Nazarewicz <mina86 [at] mina86>
CC: Kosaki Motohiro <kosaki.motohiro [at] gmail>
CC: Milton Miller <miltonm [at] bga>

Gilad Ben-Yossef (8):
smp: introduce a generic on_each_cpu_mask function
arm: move arm over to generic on_each_cpu_mask
tile: move tile to use generic on_each_cpu_mask
smp: add func to IPI cpus based on parameter func
slub: only IPI CPUs that have per cpu obj to flush
fs: only send IPI to invalidate LRU BH when needed
mm: only IPI CPUs to drain local pages if they exist
mm: add vmstat counters for tracking PCP drains

arch/arm/kernel/smp_tlb.c | 20 ++-------
arch/tile/include/asm/smp.h | 7 ---
arch/tile/kernel/smp.c | 19 ---------
fs/buffer.c | 15 +++++++-
include/linux/smp.h | 41 +++++++++++++++++++
include/linux/vm_event_item.h | 1 +
kernel/smp.c | 87 +++++++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 36 ++++++++++++++++-
mm/slub.c | 10 ++++-
mm/vmstat.c | 2 +
10 files changed, 194 insertions(+), 44 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


a.p.zijlstra at chello

Jan 26, 2012, 7:19 AM

Post #2 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Thu, 2012-01-26 at 12:01 +0200, Gilad Ben-Yossef wrote:
> Gilad Ben-Yossef (8):
> smp: introduce a generic on_each_cpu_mask function
> arm: move arm over to generic on_each_cpu_mask
> tile: move tile to use generic on_each_cpu_mask
> smp: add func to IPI cpus based on parameter func
> slub: only IPI CPUs that have per cpu obj to flush
> fs: only send IPI to invalidate LRU BH when needed
> mm: only IPI CPUs to drain local pages if they exist

These patches look very nice!

Acked-by: Peter Zijlstra <a.p.zijlstra [at] chello>


> mm: add vmstat counters for tracking PCP drains
>
I understood from previous postings that this patch wasn't meant for
inclusion; if it is, note that cpumask_weight() is a potentially very
expensive operation.
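Peter's point about cost follows from how cpumasks are stored. A cpumask is a bitmap sized for the maximum number of CPUs the kernel is built to support, and computing its weight means walking every word of that bitmap. The userspace sketch that follows illustrates that walk. Its names and the 4096-CPU size are hypothetical; the kernel uses its own bitmap_weight()/hweight helpers. The sketch also shows one way to avoid the walk: count bits as the mask is built.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define SIM_NR_CPUS    4096 /* hypothetical large NR_CPUS build */
#define BITS_PER_WORD  (sizeof(unsigned long) * CHAR_BIT)
#define SIM_MASK_WORDS ((SIM_NR_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct sim_cpumask {
    unsigned long bits[SIM_MASK_WORDS];
};

/*
 * Count set bits the way a generic bitmap weight must: visit every word,
 * so the cost tracks SIM_NR_CPUS even when only one or two bits are set.
 */
static size_t sim_cpumask_weight(const struct sim_cpumask *m,
                                 size_t *words_visited)
{
    size_t w = 0;

    for (size_t i = 0; i < SIM_MASK_WORDS; i++) {
        unsigned long word = m->bits[i];

        while (word) {          /* clear the lowest set bit each round */
            word &= word - 1;
            w++;
        }
        (*words_visited)++;
    }
    return w;
}

/* Cheaper alternative: keep a running count as bits are set. */
static void sim_cpumask_set(struct sim_cpumask *m, size_t cpu, size_t *count)
{
    unsigned long bit;

    assert(cpu < SIM_NR_CPUS);
    bit = 1UL << (cpu % BITS_PER_WORD);
    if (!(m->bits[cpu / BITS_PER_WORD] & bit)) {
        m->bits[cpu / BITS_PER_WORD] |= bit;
        (*count)++;
    }
}
```

With only two bits set, the weight walk still visits every word of the mask. The running count gives the same answer for free, at the cost of maintaining it wherever bits are set.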


gilad at benyossef

Jan 29, 2012, 12:25 AM

Post #3 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Thu, Jan 26, 2012 at 5:19 PM, Peter Zijlstra <a.p.zijlstra [at] chello> wrote:
>
> On Thu, 2012-01-26 at 12:01 +0200, Gilad Ben-Yossef wrote:
> > Gilad Ben-Yossef (8):
> >   smp: introduce a generic on_each_cpu_mask function
> >   arm: move arm over to generic on_each_cpu_mask
> >   tile: move tile to use generic on_each_cpu_mask
> >   smp: add func to IPI cpus based on parameter func
> >   slub: only IPI CPUs that have per cpu obj to flush
> >   fs: only send IPI to invalidate LRU BH when needed
> >   mm: only IPI CPUs to drain local pages if they exist
>
> These patches look very nice!
>
> Acked-by: Peter Zijlstra <a.p.zijlstra [at] chello>
>

Thank you :-)

If this is of interest, I keep a list tracking global IPI and global
task schedulers sources in the core kernel here:
https://github.com/gby/linux/wiki.

I plan to visit all these potential interference sources to see if
something can be done to lower their effect on
isolated CPUs over time.

>
> >   mm: add vmstat counters for tracking PCP drains
> >
> I understood from previous postings this patch wasn't meant for
> inclusion, if it is, note that cpumask_weight() is a potentially very
> expensive operation.

Right. The only purpose of the patch is to show the usefulness
of the previous patch in the series. It is not meant for mainline.

Thanks,
Gilad



--
Gilad Ben-Yossef
Chief Coffee Drinker
gilad [at] benyossef
Israel Cell: +972-52-8260388
US Cell: +1-973-8260388
http://benyossef.com

"Unfortunately, cache misses are an equal opportunity pain provider."
-- Mike Galbraith, LKML


fweisbec at gmail

Feb 1, 2012, 9:04 AM

Post #4 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Sun, Jan 29, 2012 at 10:25:46AM +0200, Gilad Ben-Yossef wrote:
> On Thu, Jan 26, 2012 at 5:19 PM, Peter Zijlstra <a.p.zijlstra [at] chello> wrote:
> >
> > On Thu, 2012-01-26 at 12:01 +0200, Gilad Ben-Yossef wrote:
> > > Gilad Ben-Yossef (8):
> > >   smp: introduce a generic on_each_cpu_mask function
> > >   arm: move arm over to generic on_each_cpu_mask
> > >   tile: move tile to use generic on_each_cpu_mask
> > >   smp: add func to IPI cpus based on parameter func
> > >   slub: only IPI CPUs that have per cpu obj to flush
> > >   fs: only send IPI to invalidate LRU BH when needed
> > >   mm: only IPI CPUs to drain local pages if they exist
> >
> > These patches look very nice!
> >
> > Acked-by: Peter Zijlstra <a.p.zijlstra [at] chello>
> >
>
> Thank you :-)
>
> If this is of interest, I keep a list tracking global IPI and global
> task schedulers sources in the core kernel here:
> https://github.com/gby/linux/wiki.
>
> I plan to visit all these potential interference sources to see if
> something can be done to lower their effect on
> isolated CPUs over time.

Very nice, especially as many people seem to be interested in
CPU isolation.

When we get the adaptive tickless feature in place, perhaps we'll
also need to think about some way to have more control over the
CPU affinity of some non-pinned timers to avoid disturbing
adaptive tickless CPUs. We still need to consider their cache affinity,
though.


a.p.zijlstra at chello

Feb 1, 2012, 9:35 AM

Post #5 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
>
> If this is of interest, I keep a list tracking global IPI and global
> task schedulers sources in the core kernel here:
> https://github.com/gby/linux/wiki.

You can add synchronize_.*_expedited() to the list, it does its best to
bash the entire machine in order to try and make RCU grace periods
happen fast.


peterz at infradead

Feb 1, 2012, 9:57 AM

Post #6 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, 2012-02-01 at 18:35 +0100, Peter Zijlstra wrote:
> On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
> >
> > If this is of interest, I keep a list tracking global IPI and global
> > task schedulers sources in the core kernel here:
> > https://github.com/gby/linux/wiki.
>
> You can add synchronize_.*_expedited() to the list, it does its best to
> bash the entire machine in order to try and make RCU grace periods
> happen fast.

Also anything using stop_machine, such as module unload, cpu hot-unplug
and text_poke().


paulmck at linux

Feb 1, 2012, 10:40 AM

Post #7 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, Feb 01, 2012 at 06:35:22PM +0100, Peter Zijlstra wrote:
> On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
> >
> > If this is of interest, I keep a list tracking global IPI and global
> > task schedulers sources in the core kernel here:
> > https://github.com/gby/linux/wiki.
>
> You can add synchronize_.*_expedited() to the list, it does its best to
> bash the entire machine in order to try and make RCU grace periods
> happen fast.

I have duly added "Make synchronize_sched_expedited() avoid IPIing idle
CPUs" to http://kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html.

This should not be hard once I have built up some trust in the new
RCU idle-detection code. It would also automatically apply to
Frederic's dyntick-idle userspace work.

Thanx, Paul



cl at linux

Feb 1, 2012, 12:06 PM

Post #8 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, 1 Feb 2012, Paul E. McKenney wrote:

> On Wed, Feb 01, 2012 at 06:35:22PM +0100, Peter Zijlstra wrote:
> > On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
> > >
> > > If this is of interest, I keep a list tracking global IPI and global
> > > task schedulers sources in the core kernel here:
> > > https://github.com/gby/linux/wiki.
> >
> > You can add synchronize_.*_expedited() to the list, it does its best to
> > bash the entire machine in order to try and make RCU grace periods
> > happen fast.
>
> I have duly added "Make synchronize_sched_expedited() avoid IPIing idle
> CPUs" to http://kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html.
>
> This should not be hard once I have built up some trust in the new
> RCU idle-detection code. It would also automatically apply to
> Frederic's dyntick-idle userspace work.

Could we also apply the same approach to processors busy doing
computational work? In that case the OS is also not needed. Interrupting
these activities impacts performance and latency.



paulmck at linux

Feb 1, 2012, 12:13 PM

Post #9 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, Feb 01, 2012 at 02:06:07PM -0600, Christoph Lameter wrote:
> On Wed, 1 Feb 2012, Paul E. McKenney wrote:
>
> > On Wed, Feb 01, 2012 at 06:35:22PM +0100, Peter Zijlstra wrote:
> > > On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
> > > >
> > > > If this is of interest, I keep a list tracking global IPI and global
> > > > task schedulers sources in the core kernel here:
> > > > https://github.com/gby/linux/wiki.
> > >
> > > You can add synchronize_.*_expedited() to the list, it does its best to
> > > bash the entire machine in order to try and make RCU grace periods
> > > happen fast.
> >
> > I have duly added "Make synchronize_sched_expedited() avoid IPIing idle
> > CPUs" to http://kernel.org/pub/linux/kernel/people/paulmck/rcutodo.html.
> >
> > This should not be hard once I have built up some trust in the new
> > RCU idle-detection code. It would also automatically apply to
> > Frederic's dyntick-idle userspace work.
>
> Could we also apply the same approach to processors busy doing
> computational work? In that case the OS is also not needed. Interrupting
> these activities is impacting on performance and latency.

Yep, that is in fact what Frederic's dyntick-idle userspace work does.

Thanx, Paul



gilad at benyossef

Feb 2, 2012, 12:46 AM

Post #10 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, Feb 1, 2012 at 7:04 PM, Frederic Weisbecker <fweisbec [at] gmail> wrote:
>
> On Sun, Jan 29, 2012 at 10:25:46AM +0200, Gilad Ben-Yossef wrote:
>
> > If this is of interest, I keep a list tracking global IPI and global
> > task schedulers sources in the core kernel here:
> > https://github.com/gby/linux/wiki.
> >
> > I plan to visit all these potential interference sources to see if
> > something can be done to lower their effect on
> > isolated CPUs over time.
>
> Very nice especially as many people seem to be interested in
> CPU isolation.


Yes, that is what drives me as well. I have a bare metal program
I'm trying to kill here. I researched CPU isolation, ran into your
nohz patch set and asked myself: "OK, if we disable the tick, what else
is in the way?"

>
>
> When we get the adaptive tickless feature in place, perhaps we'll
> also need to think about some way to have more control on the
> CPU affinity of some non pinned timers to avoid disturbing
> adaptive tickless CPUs. We still need to consider their cache affinity
> though.


Right. I'm thinking we can treat a CPU going into adaptive tick mode in a similar
fashion to a CPU going offline for the purpose of timer migration.

Some pinned timers might be able to get special treatment as well. Take, for
example, the vmstat work being scheduled every second: what should we do with
it for CPU isolation?

It makes sense to me to have that stop scheduling itself when we have the tick
disabled for both idle and a nohz task.

A similar thing can be said for the clocksource watchdog, for example: we might
consider having it not trigger stuff on idle or nohz task CPUs.

Maybe we can have some notification mechanism when a task goes into nohz
mode and back, to let stuff disable itself and re-enable itself if it makes
sense. It seems more sensible than having all these individual pieces check
whether this CPU or another is in idle or nohz task mode.

The question for nohz task then is when the notification needs to go out:
only when a task has managed to go into nohz mode, or when we add a CPU to an
adaptive tick cpuset? Because for stuff like vmstat, the very existence of the
runnable workqueue thread can keep a task from going into nohz mode. Bah,
maybe we need two notifications...
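Gilad's notification idea could be sketched as a simple callback chain that subsystems such as vmstat register with and that fires on nohz state changes. Everything here is hypothetical and illustrative. No such chain existed at the time of this thread, and the event names, register_nohz_notifier() and nohz_notify() are invented. A kernel implementation would more likely build on the existing notifier_block infrastructure and would need locking. The separate NOHZ_CPU_CONFIGURED event reflects the "maybe we need two notifications" point: vmstat can back off as soon as the CPU is placed in an adaptive-tick cpuset, so its own workqueue activity does not keep the task from entering nohz mode.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS        4 /* hypothetical */
#define MAX_NOHZ_NOTIFIERS 8 /* hypothetical fixed-size chain */

/* Hypothetical events, following the "two notifications" idea. */
enum nohz_event {
    NOHZ_CPU_CONFIGURED, /* CPU added to an adaptive-tick cpuset */
    NOHZ_TASK_ENTER,     /* a task actually stopped the tick */
    NOHZ_TASK_EXIT,      /* the tick was restarted */
};

typedef void (*nohz_notifier_fn)(enum nohz_event ev, int cpu);

/* Single-threaded model: no locking around the chain. */
static nohz_notifier_fn nohz_chain[MAX_NOHZ_NOTIFIERS];
static size_t nohz_chain_len;

/* Register a subsystem callback; returns 0 on success, -1 if the chain is full. */
static int register_nohz_notifier(nohz_notifier_fn fn)
{
    if (nohz_chain_len == MAX_NOHZ_NOTIFIERS)
        return -1;
    nohz_chain[nohz_chain_len++] = fn;
    return 0;
}

/* Called by the (hypothetical) nohz core when a CPU changes state. */
static void nohz_notify(enum nohz_event ev, int cpu)
{
    for (size_t i = 0; i < nohz_chain_len; i++)
        nohz_chain[i](ev, cpu);
}

/* Example subscriber: per-CPU vmstat work stops re-arming on nohz CPUs. */
static int vmstat_work_armed[SIM_NR_CPUS] = { 1, 1, 1, 1 };

static void vmstat_nohz_cb(enum nohz_event ev, int cpu)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    switch (ev) {
    case NOHZ_CPU_CONFIGURED:
    case NOHZ_TASK_ENTER:
        vmstat_work_armed[cpu] = 0; /* stop re-arming the periodic work */
        break;
    case NOHZ_TASK_EXIT:
        vmstat_work_armed[cpu] = 1; /* resume periodic updates */
        break;
    }
}
```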


Thanks!
Gilad


avi at redhat

Feb 2, 2012, 1:34 AM

Post #11 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On 02/01/2012 10:13 PM, Paul E. McKenney wrote:
> >
> > Could we also apply the same approach to processors busy doing
> > computational work? In that case the OS is also not needed. Interrupting
> > these activities is impacting on performance and latency.
>
> Yep, that is in fact what Frederic's dyntick-idle userspace work does.
>

Running in a guest is a special case of running in userspace, so we'd
need to extend this work to kvm as well.

--
error compiling committee.c: too many arguments to function



gilad at benyossef

Feb 2, 2012, 1:42 AM

Post #12 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Wed, Feb 1, 2012 at 7:57 PM, Peter Zijlstra <peterz [at] infradead> wrote:
> On Wed, 2012-02-01 at 18:35 +0100, Peter Zijlstra wrote:
>> On Sun, 2012-01-29 at 10:25 +0200, Gilad Ben-Yossef wrote:
>> >
>> > If this is of interest, I keep a list tracking global IPI and global
>> > task schedulers sources in the core kernel here:
>> > https://github.com/gby/linux/wiki.
>>
>> You can add synchronize_.*_expedited() to the list, it does its best to
>> bash the entire machine in order to try and make RCU grace periods
>> happen fast.
>
> Also anything using stop_machine, such as module unload, cpu hot-unplug
> and text_poke().

Thanks! I've added it to the list, together with the clocksource
watchdog, which registers a timer on each CPU in a cyclic fashion.

Gilad




paulmck at linux

Feb 2, 2012, 7:34 AM

Post #13 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Thu, Feb 02, 2012 at 11:34:25AM +0200, Avi Kivity wrote:
> On 02/01/2012 10:13 PM, Paul E. McKenney wrote:
> > >
> > > Could we also apply the same approach to processors busy doing
> > > computational work? In that case the OS is also not needed. Interrupting
> > > these activities is impacting on performance and latency.
> >
> > Yep, that is in fact what Frederic's dyntick-idle userspace work does.
>
> Running in a guest is a special case of running in userspace, so we'd
> need to extend this work to kvm as well.

As long as rcu_idle_enter() is called at the appropriate time, RCU will
happily ignore the CPU. ;-)

Thanx, Paul



cmetcalf at tilera

Feb 2, 2012, 7:41 AM

Post #14 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On 2/2/2012 3:46 AM, Gilad Ben-Yossef wrote:
> On Wed, Feb 1, 2012 at 7:04 PM, Frederic Weisbecker <fweisbec [at] gmail> wrote:
>> Very nice especially as many people seem to be interested in CPU isolation.

Indeed!

> Yes, that is what drives me as well. I have a bare metal program
> I'm trying to kill here, I researched CPU isolation and ran into your
> nohz patch set and asked myself: "OK, if we disable the tick what else
> is on the way?"

At Tilera we have been supporting a "dataplane" mode (aka Zero Overhead
Linux - the marketing name). This is configured on a per-cpu basis, and in
addition to setting isolcpus for those nodes, also suppresses various
things that might otherwise run (soft lockup detection, vmstat work,
etc.). The claim is that you need to specify these kinds of things
per-core since it's not always possible for the kernel to know that you
really don't want the scheduler or any other interrupt source to touch the
core, as opposed to the case where you just happen to have a single process
scheduled on the core and you don't mind occasional interrupts. But
there's definitely appeal in having the kernel do it adaptively too,
particularly if it can be made to work just as well as configuring it
statically.

We also have a set_dataplane() syscall that a task can make to allow it to
request some additional semantics from the kernel, such as various
debugging modes, a flag to request populating the page table fully, and a
flag to request that all pending kernel timer ticks, etc., happen while the
task spins in the kernel before actually returning to userspace from a
syscall (so you don't get unexpected interrupts once you're back in
userspace). I've appended the relevant bits of <asm/dataplane.h> for more
details.

We've been planning to start working with the community on upstreaming this,
but since fiddling with the scheduler is pretty tricky stuff and it wasn't
clear there was a lot of interest, we've been deferring it in favor of
other activities. But seeing more about Frederic Weisbecker's and Gilad
Ben-Yossef's work makes me think that it might be a good time for us to
start that process. For a start I'll see about putting up a git branch on
kernel.org that has our dataplane stuff in it, for reference.

/*
 * Quiesce the timer interrupt before returning to user space after a
 * system call. Normally if a task on a dataplane core makes a
 * syscall, the system will run one or more timer ticks after the
 * syscall has completed, causing unexpected interrupts in userspace.
 * Setting DP_QUIESCE avoids that problem by having the kernel "hold"
 * the task in kernel mode until the timer ticks are complete. This
 * will make syscalls dramatically slower.
 *
 * If multiple dataplane tasks are scheduled on a single core, this
 * in effect silently disables DP_QUIESCE, which allows the tasks to make
 * progress, but without actually disabling the timer tick.
 */
#define DP_QUIESCE 0x1

/*
 * Disallow the application from entering the kernel in any way,
 * unless it calls set_dataplane() again without this bit set.
 * Issuing any other syscall or causing a page fault would generate a
 * kernel message, and "kill -9" the process.
 *
 * Setting this flag automatically sets DP_QUIESCE as well.
 */
#define DP_STRICT 0x2

/*
 * Debug dataplane interrupts, so that if any interrupt source
 * attempts to involve a dataplane cpu, a kernel message and stack
 * backtrace will be generated on the console. As this warning is a
 * slow event, it may make sense to avoid this mode in production code
 * to avoid making any possible interrupts even more heavyweight.
 *
 * Setting this flag automatically sets DP_QUIESCE as well.
 */
#define DP_DEBUG 0x4

/*
 * Cause all memory mappings to be populated in the page table.
 * Specifying this when entering dataplane mode ensures that no future
 * page fault events will occur to cause interrupts into the Linux
 * kernel, as long as no new mappings are installed by mmap(), etc.
 * Note that since the hardware TLB is of finite size, there will
 * still be the potential for TLB misses that the hypervisor handles,
 * either via its software TLB cache (fast path) or by walking the
 * kernel page tables (slow path), so touching large amounts of memory
 * will still incur hypervisor interrupt overhead.
 */
#define DP_POPULATE 0x8
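The flag descriptions in this header encode a small amount of logic: DP_STRICT and DP_DEBUG both imply DP_QUIESCE. A minimal sketch of normalizing a requested flag word accordingly follows. dp_normalize_flags() and DP_VALID_MASK are hypothetical helpers, not part of the Tilera header. The sketch does not call set_dataplane(), whose prototype is not shown in this message.

```c
#include <assert.h>

/* Flag values as given in the <asm/dataplane.h> excerpt. */
#define DP_QUIESCE  0x1
#define DP_STRICT   0x2
#define DP_DEBUG    0x4
#define DP_POPULATE 0x8

/* Hypothetical: the set of bits this sketch knows about. */
#define DP_VALID_MASK (DP_QUIESCE | DP_STRICT | DP_DEBUG | DP_POPULATE)

/*
 * Hypothetical helper: apply the implications documented in the header
 * (DP_STRICT and DP_DEBUG both imply DP_QUIESCE). Returns -1 for
 * unknown bits, otherwise the normalized flag word.
 */
static int dp_normalize_flags(unsigned int flags)
{
    if (flags & ~(unsigned int)DP_VALID_MASK)
        return -1;
    if (flags & (DP_STRICT | DP_DEBUG))
        flags |= DP_QUIESCE;
    return (int)flags;
}
```

Rejecting unknown bits rather than silently ignoring them is a design choice of this sketch. It would let a newer application detect an older kernel that lacks a requested mode.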


--
Chris Metcalf, Tilera Corp.
http://www.tilera.com



avi at redhat

Feb 2, 2012, 8:14 AM

Post #15 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On 02/02/2012 05:34 PM, Paul E. McKenney wrote:
> On Thu, Feb 02, 2012 at 11:34:25AM +0200, Avi Kivity wrote:
> > On 02/01/2012 10:13 PM, Paul E. McKenney wrote:
> > > >
> > > > Could we also apply the same approach to processors busy doing
> > > > computational work? In that case the OS is also not needed. Interrupting
> > > > these activities is impacting on performance and latency.
> > >
> > > Yep, that is in fact what Frederic's dyntick-idle userspace work does.
> >
> > Running in a guest is a special case of running in userspace, so we'd
> > need to extend this work to kvm as well.
>
> As long as rcu_idle_enter() is called at the appropriate time, RCU will
> happily ignore the CPU. ;-)
>

It's not called (since the cpu is not idle). Instead we call
rcu_virt_note_context_switch().



fweisbec at gmail

Feb 2, 2012, 8:24 AM

Post #16 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Thu, Feb 02, 2012 at 10:46:32AM +0200, Gilad Ben-Yossef wrote:
> On Wed, Feb 1, 2012 at 7:04 PM, Frederic Weisbecker <fweisbec [at] gmail> wrote:
> >
> > On Sun, Jan 29, 2012 at 10:25:46AM +0200, Gilad Ben-Yossef wrote:
> >
> > > If this is of interest, I keep a list tracking global IPI and global
> > > task schedulers sources in the core kernel here:
> > > https://github.com/gby/linux/wiki.
> > >
> > > I plan to visit all these potential interference sources to see if
> > > something can be done to lower their effect on
> > > isolated CPUs over time.
> >
> > Very nice especially as many people seem to be interested in
> > CPU isolation.
>
>
> Yes, that is what drives me as well. I have a bare metal program
> I'm trying to kill here, I researched CPU isolation and ran into your
> nohz patch set and asked myself: "OK, if we disable the tick what else
> is on the way?"
>
> >
> >
> > When we get the adaptive tickless feature in place, perhaps we'll
> > also need to think about some way to have more control on the
> > CPU affinity of some non pinned timers to avoid disturbing
> > adaptive tickless CPUs. We still need to consider their cache affinity
> > though.
>
>
> Right. I'm thinking we can treat a CPU going in adaptive tick mode in a similar
> fashion to a CPU going offline for the purpose of timer migration.
>
> Some pinned timers might be able to get special treatment as well - take for
> example the vmstat work being scheduled every second, what should we do with
> it for CPU isolation?

Right, I remember I saw these vmstat timers in my way when I tried to get 0
interrupts on a CPU.

I think all these timers need to be carefully reviewed before doing anything.
But we certainly shouldn't adopt the behaviour of migrating timers by default.

Some timers really need to stay on the expected CPU. Note that some
timers may be shut down by CPU hotplug callbacks. Those wouldn't be migrated
in case of CPU offlining. We need to keep them.

> It makes sense to me to have that stop scheduling itself when we have the tick
> disabled for both idle and a nohz task.

We have deferrable timers; their semantics are that they don't fire while the
CPU is idle. But being idle and being adaptive tickless are not the same. In
adaptive tickless mode the CPU is busy doing things that might be relevant to
these deferrable timers.

So I don't think we can apply the same logic.


>
> A similar thing can be said for the clocksource watchdog for example - we might
> consider having it not trigger stuff on idle or nohz task CPUs

This one is a special case: it is only armed when the TSC is unstable (IIUC). I
guess we shouldn't worry about it; it's a corner case.

> Maybe we can have some notification mechanism when a task goes into nohz
> mode and back to let stuff disable itself and back if it makes sense.
> It seems more
> sensible than having all these individual pieces check for whether
> this CPU or other is
> in idle or nohz task mode.
>
> The question for nohz task then is when does the notification need to go out?
> only when a task managed to go into nohz mode or when we add a cpu to an
> adaptive tick cpuset? because for stuff like vmstat, the very existence of the
> runnable workqueue thread can keep a task from going into nohz mode. bah.
> maybe we need two notifications...

I think we really need to explore these timers and workqueues case by case,
and maybe set up a way to affine these to particular cpusets if needed.



cl at linux

Feb 2, 2012, 8:29 AM

Post #17 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference

On Thu, 2 Feb 2012, Frederic Weisbecker wrote:

> > Some pinned timers might be able to get special treatment as well - take for
> > example the vmstat work being scheduled every second, what should we do with
> > it for CPU isolation?
>
> Right, I remember I saw these vmstat timers on my way when I tried to get 0
> interrupts on a CPU.
>
> I think all these timers need to be carefully reviewed before doing anything.
> But we certainly shouldn't adopt the behaviour of migrating timers by default.
>
> Some timers really need to stay on the expected CPU. Note that some
> timers may be shut down by CPU hotplug callbacks. Those wouldn't be migrated
> in case of CPU offlining. We need to keep them.
>
> > It makes sense to me to have that stop scheduling itself when we have the tick
> > disabled for both idle and a nohz task.

The vmstat timer only makes sense when the OS is doing something on the
processor. Otherwise if no counters are incremented and the page and slab
allocator caches are empty then there is no need to run the vmstat timer.
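A minimal userspace sketch of the self-limiting timer described above, assuming a simplified model in which each CPU keeps pending counter deltas that a periodic worker folds into global totals. The names (`struct cpu_vmstat`, `stat_add()`, `vmstat_timer_fn()`) are hypothetical, and draining the page and slab allocator caches is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ITEMS 4   /* hypothetical number of tracked counters */

struct cpu_vmstat {
	long diff[NR_ITEMS];   /* per-CPU deltas not yet folded */
	bool timer_armed;      /* is the periodic worker scheduled? */
};

static long global_stat[NR_ITEMS];

/* Hot path: record a change and make sure the worker will run. */
static void stat_add(struct cpu_vmstat *c, int item, long delta)
{
	assert(item >= 0 && item < NR_ITEMS);
	c->diff[item] += delta;
	c->timer_armed = true;
}

/*
 * Periodic worker: fold pending deltas into the global counters and
 * re-arm only if this interval saw activity. A CPU running a pure
 * user-space loop stops generating deltas, so the timer dies out.
 */
static void vmstat_timer_fn(struct cpu_vmstat *c)
{
	bool changed = false;

	for (int i = 0; i < NR_ITEMS; i++) {
		if (c->diff[i] != 0) {
			global_stat[i] += c->diff[i];
			c->diff[i] = 0;
			changed = true;
		}
	}
	c->timer_armed = changed;
}
```

In this sketch the hot path, not the timer, decides when periodic work is needed again, so an idle or purely user-space CPU stops taking vmstat interrupts after one quiet interval.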


paulmck at linux

Feb 2, 2012, 9:01 AM

Post #18 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Thu, Feb 02, 2012 at 06:14:36PM +0200, Avi Kivity wrote:
> On 02/02/2012 05:34 PM, Paul E. McKenney wrote:
> > On Thu, Feb 02, 2012 at 11:34:25AM +0200, Avi Kivity wrote:
> > > On 02/01/2012 10:13 PM, Paul E. McKenney wrote:
> > > > >
> > > > > Could we also apply the same approach to processors busy doing
> > > > > computational work? In that case the OS is also not needed. Interrupting
> > > > > these activities impacts performance and latency.
> > > >
> > > > Yep, that is in fact what Frederic's dyntick-idle userspace work does.
> > >
> > > Running in a guest is a special case of running in userspace, so we'd
> > > need to extend this work to kvm as well.
> >
> > As long as rcu_idle_enter() is called at the appropriate time, RCU will
> > happily ignore the CPU. ;-)
> >
>
> It's not called (since the cpu is not idle). Instead we call
> rcu_virt_note_context_switch().

Frederic's work checks to see if there is only one runnable user task
on a given CPU. If there is only one, then the scheduling-clock interrupt
is turned off for that CPU, and RCU is told to ignore it while it is
executing in user space. Not sure whether this covers KVM guests.
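The rule summarized above can be written as a toy predicate. This is only a sketch under stated assumptions: `struct cpu_state` and `nohz_reevaluate()` are hypothetical names, and the real patch set also has to deal with pending timers, RCU callbacks and the transition paths themselves.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified per-CPU state; every field name here is illustrative. */
struct cpu_state {
	bool adaptive_nohz;    /* CPU is in an adaptive-tick cpuset */
	int nr_running;        /* runnable tasks on this CPU */
	bool tick_stopped;     /* scheduling-clock interrupt is off */
	bool rcu_ignores_cpu;  /* RCU treats the CPU as quiescent */
};

/*
 * Re-evaluated whenever the runqueue changes or the task crosses the
 * user/kernel boundary: the tick may stop while exactly one task is
 * runnable, but RCU may only ignore the CPU while that task is in user space.
 */
static void nohz_reevaluate(struct cpu_state *c, bool in_user)
{
	bool single = c->adaptive_nohz && c->nr_running == 1;

	c->tick_stopped = single;
	c->rcu_ignores_cpu = single && in_user;
}
```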

In any case, this is not yet in mainline.

Thanx, Paul



avi at redhat

Feb 2, 2012, 9:23 AM

Post #19 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On 02/02/2012 07:01 PM, Paul E. McKenney wrote:
> >
> > It's not called (since the cpu is not idle). Instead we call
> > rcu_virt_note_context_switch().
>
> Frederic's work checks to see if there is only one runnable user task
> on a given CPU. If there is only one, then the scheduling-clock interrupt
> is turned off for that CPU, and RCU is told to ignore it while it is
> executing in user space. Not sure whether this covers KVM guests.

Conceptually it's the same. Maybe it needs adjustments, since kvm
enters a guest in a different way than the kernel exits to userspace.

> In any case, this is not yet in mainline.

Let me know when it's in, and I'll have a look.

--
error compiling committee.c: too many arguments to function



cl at linux

Feb 2, 2012, 9:25 AM

Post #20 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Thu, 2 Feb 2012, Paul E. McKenney wrote:

> Frederic's work checks to see if there is only one runnable user task
> on a given CPU. If there is only one, then the scheduling-clock interrupt
> is turned off for that CPU, and RCU is told to ignore it while it is
> executing in user space. Not sure whether this covers KVM guests.
>
> In any case, this is not yet in mainline.

Sounds great. Is there any plan on when to merge it? Where are the most up
to date patches vs mainstream?



paulmck at linux

Feb 2, 2012, 9:51 AM

Post #21 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Thu, Feb 02, 2012 at 07:23:39PM +0200, Avi Kivity wrote:
> On 02/02/2012 07:01 PM, Paul E. McKenney wrote:
> > >
> > > It's not called (since the cpu is not idle). Instead we call
> > > rcu_virt_note_context_switch().
> >
> > Frederic's work checks to see if there is only one runnable user task
> > on a given CPU. If there is only one, then the scheduling-clock interrupt
> > is turned off for that CPU, and RCU is told to ignore it while it is
> > executing in user space. Not sure whether this covers KVM guests.
>
> Conceptually it's the same. Maybe it needs adjustments, since kvm
> enters a guest in a different way than the kernel exits to userspace.
>
> > In any case, this is not yet in mainline.
>
> Let me know when it's in, and I'll have a look.

Could you please touch base with Frederic Weisbecker to make sure that
what he is doing works for you?

Thanx, Paul



gilad at benyossef

Feb 5, 2012, 3:46 AM

Post #22 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Thu, Feb 2, 2012 at 5:41 PM, Chris Metcalf <cmetcalf [at] tilera> wrote:
> On 2/2/2012 3:46 AM, Gilad Ben-Yossef wrote:
>
>> Yes, that is what drives me as well. I have a bare metal program
>> I'm trying to kill here, I researched CPU isolation and ran into your
>> nohz patch set and asked myself: "OK, if we disable the tick what else
>> is on the way?"
>
> At Tilera we have been supporting a "dataplane" mode (aka Zero Overhead
> Linux - the marketing name).  This is configured on a per-cpu basis, and in
> addition to setting isolcpus for those nodes, also suppresses various
> things that might otherwise run (soft lockup detection, vmstat work,
> etc.).  The claim is that you need to specify these kinds of things
> per-core since it's not always possible for the kernel to know that you
> really don't want the scheduler or any other interrupt source to touch the
> core, as opposed to the case where you just happen to have a single process
> scheduled on the core and you don't mind occasional interrupts.  But
> there's definitely appeal in having the kernel do it adaptively too,
> particularly if it can be made to work just as well as configuring it
> statically.

Currently adaptive tick needs to be enabled as a cpuset property in order to
apply, but once enabled it is activated automatically when feasible.

The combination of per-cpuset enabling and automatic activation makes sense
to me, since cpuset is the way to go to isolate CPUs for specific tasks going
forward.
>
> We also have a set_dataplane() syscall that a task can make to allow it to
> request some additional semantics from the kernel, such as various
> debugging modes, a flag to request populating the page table fully, and a
> flag to request that all pending kernel timer ticks, etc., happen while the
> task spins in the kernel before actually returning to userspace from a
> syscall (so you don't get unexpected interrupts once you're back in
> userspace).

Oohh.. I like that :-)

> I've appended the relevant bits of <asm/dataplane.h> for more
> details.
>
> We've been planning to start working with the community on returning this,
> but since fiddling with the scheduler is pretty tricky stuff and it wasn't
> clear there was a lot of interest, we've been deferring it in favor of
> other activities.  But seeing more about Frederic Weisbecker's and Gilad
> Ben-Yossef's work makes me think that it might be a good time for us to
> start that process.  For a start I'll see about putting up a git branch on
> kernel.org that has our dataplane stuff in it, for reference.
>

This sounds very interesting. Thank you!

I for one will be delighted to see that tree as a reference. There is nothing
I hate more than reinventing the wheel... :-)

> /*
>  * Quiesce the timer interrupt before returning to user space after a
>  * system call.  Normally if a task on a dataplane core makes a
>  * syscall, the system will run one or more timer ticks after the
>  * syscall has completed, causing unexpected interrupts in userspace.
>  * Setting DP_QUIESCE avoids that problem by having the kernel "hold"
>  * the task in kernel mode until the timer ticks are complete.  This
>  * will make syscalls dramatically slower.
>  *
>  * If multiple dataplane tasks are scheduled on a single core, this
>  * in effect silently disables DP_QUIESCE, which allows the tasks to make
>  * progress, but without actually disabling the timer tick.
>  */
> #define DP_QUIESCE      0x1
>
> /*
>  * Disallow the application from entering the kernel in any way,
>  * unless it calls set_dataplane() again without this bit set.
>  * Issuing any other syscall or causing a page fault would generate a
>  * kernel message, and "kill -9" the process.
>  *
>  * Setting this flag automatically sets DP_QUIESCE as well.
>  */
> #define DP_STRICT       0x2
>
> /*
>  * Debug dataplane interrupts, so that if any interrupt source
>  * attempts to involve a dataplane cpu, a kernel message and stack
>  * backtrace will be generated on the console.  As this warning is a
>  * slow event, it may make sense to avoid this mode in production code
>  * to avoid making any possible interrupts even more heavyweight.
>  *
>  * Setting this flag automatically sets DP_QUIESCE as well.
>  */
> #define DP_DEBUG        0x4
>
> /*
>  * Cause all memory mappings to be populated in the page table.
>  * Specifying this when entering dataplane mode ensures that no future
>  * page fault events will occur to cause interrupts into the Linux
>  * kernel, as long as no new mappings are installed by mmap(), etc.
>  * Note that since the hardware TLB is of finite size, there will
>  * still be the potential for TLB misses that the hypervisor handles,
>  * either via its software TLB cache (fast path) or by walking the
>  * kernel page tables (slow path), so touching large amounts of memory
>  * will still incur hypervisor interrupt overhead.
>  */
> #define DP_POPULATE     0x8
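The header comments above imply a normalization step: DP_STRICT and DP_DEBUG both force DP_QUIESCE on. A hypothetical helper expressing that could look like this; `dp_normalize_flags()` is an illustrative name, not part of Tilera's code, and rejecting unknown bits is an assumption rather than documented behavior.

```c
#include <assert.h>

/* Flag values as quoted from Tilera's <asm/dataplane.h>. */
#define DP_QUIESCE  0x1
#define DP_STRICT   0x2
#define DP_DEBUG    0x4
#define DP_POPULATE 0x8

#define DP_ALL (DP_QUIESCE | DP_STRICT | DP_DEBUG | DP_POPULATE)

/*
 * Hypothetical helper: apply the implications documented in the header
 * comments (STRICT and DEBUG imply QUIESCE). Returns -1 for unknown bits.
 */
static int dp_normalize_flags(unsigned int flags)
{
	if (flags & ~DP_ALL)
		return -1;
	if (flags & (DP_STRICT | DP_DEBUG))
		flags |= DP_QUIESCE;
	return (int)flags;
}
```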

hmm... I've probably missed something, but doesn't this replicate
mlockall (MCL_CURRENT|MCL_FUTURE) ?
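For comparison, mlockall() is a standard POSIX interface. One difference worth noting: mlockall() also pins the pages against being paged out and is subject to RLIMIT_MEMLOCK unless the process has CAP_IPC_LOCK, whereas DP_POPULATE, as described in the header, only asks for the page tables to be populated. The wrapper name `lock_all_memory()` is illustrative.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Lock (and thereby fault in) all current mappings and all future ones.
 * Returns 0 on success, or the errno value on failure (typically EPERM
 * or ENOMEM when RLIMIT_MEMLOCK is too low and CAP_IPC_LOCK is absent).
 */
static int lock_all_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0) {
		int err = errno;

		fprintf(stderr, "mlockall: %s\n", strerror(err));
		return err;
	}
	return 0;
}
```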

Thanks!
Gilad






gilad at benyossef

Feb 5, 2012, 4:06 AM

Post #23 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Thu, Feb 2, 2012 at 7:25 PM, Christoph Lameter <cl [at] linux> wrote:
> On Thu, 2 Feb 2012, Paul E. McKenney wrote:
>
>> Frederic's work checks to see if there is only one runnable user task
>> on a given CPU.  If there is only one, then the scheduling-clock interrupt
>> is turned off for that CPU, and RCU is told to ignore it while it is
>> executing in user space.  Not sure whether this covers KVM guests.
>>
>> In any case, this is not yet in mainline.
>
> Sounds great. Is there any plan on when to merge it? Where are the most up
> to date patches vs mainstream?
>


Frederic has the latest version in a git tree here:

git://github.com/fweisbec/linux-dynticks.git
nohz/cpuset-v2-pre-20120117

It's on top of the latest rcu/core.

I've been playing with it for some time now. It works very well, considering the
early state - there are a couple of TODO items listed here:
https://tglx.de/~fweisbec/TODO-nohz-cpusets and I've seen an assert from
the RCU code once.

Also, there is some system stuff "in the way", so to speak, of getting the full
benefits:

I had to disable the clocksource watchdog (I'm testing in a KVM VM, so I guess
the TSC is not stable) and the vmstat_stats work on that CPU, and to (try to)
fix what looks like a bug in the NOHZ timer code.

But the good news is that with these hacks applied I managed to run a 100%
CPU task with zero interrupts (ticks or otherwise) on an isolated cpu.
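To reproduce this kind of test, the busy task has to be placed explicitly on the isolated CPU (a CPU listed in isolcpus= is excluded from normal load balancing, so nothing lands there by default). A minimal sketch using the glibc sched_setaffinity() interface; the helper names are illustrative, and per-CPU interrupt counts can then be watched in /proc/interrupts.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to a single CPU, e.g. one reserved with the
 * isolcpus= boot parameter. Returns 0 on success, -1 on error (errno set).
 */
static int pin_to_cpu(int cpu)
{
	cpu_set_t set;

	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest-numbered CPU we are currently allowed to run on, or -1. */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```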

Disregarding TLB overhead, you get bare metal performance with Linux user
space manageability and debug capabilities. Pretty magical really: It's like
eating your cake and having it too :-)

Gilad



avi at redhat

Feb 5, 2012, 4:16 AM

Post #24 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On 02/02/2012 07:51 PM, Paul E. McKenney wrote:
> On Thu, Feb 02, 2012 at 07:23:39PM +0200, Avi Kivity wrote:
> > On 02/02/2012 07:01 PM, Paul E. McKenney wrote:
> > > >
> > > > It's not called (since the cpu is not idle). Instead we call
> > > > rcu_virt_note_context_switch().
> > >
> > > Frederic's work checks to see if there is only one runnable user task
> > > on a given CPU. If there is only one, then the scheduling-clock interrupt
> > > is turned off for that CPU, and RCU is told to ignore it while it is
> > > executing in user space. Not sure whether this covers KVM guests.
> >
> > Conceptually it's the same. Maybe it needs adjustments, since kvm
> > enters a guest in a different way than the kernel exits to userspace.
> >
> > > In any case, this is not yet in mainline.
> >
> > Let me know when it's in, and I'll have a look.
>
> Could you please touch base with Frederic Weisbecker to make sure that
> what he is doing works for you?
>

Looks like there are new rcu_user_enter() and rcu_user_exit() APIs which
we can use. Hopefully they subsume rcu_virt_note_context_switch() so we
only need one set of APIs.




paulmck at linux

Feb 5, 2012, 8:59 AM

Post #25 of 55
Re: [v7 0/8] Reduce cross CPU IPI interference [In reply to]

On Sun, Feb 05, 2012 at 02:16:17PM +0200, Avi Kivity wrote:
> On 02/02/2012 07:51 PM, Paul E. McKenney wrote:
> > On Thu, Feb 02, 2012 at 07:23:39PM +0200, Avi Kivity wrote:
> > > On 02/02/2012 07:01 PM, Paul E. McKenney wrote:
> > > > >
> > > > > It's not called (since the cpu is not idle). Instead we call
> > > > > rcu_virt_note_context_switch().
> > > >
> > > > Frederic's work checks to see if there is only one runnable user task
> > > > on a given CPU. If there is only one, then the scheduling-clock interrupt
> > > > is turned off for that CPU, and RCU is told to ignore it while it is
> > > > executing in user space. Not sure whether this covers KVM guests.
> > >
> > > Conceptually it's the same. Maybe it needs adjustments, since kvm
> > > enters a guest in a different way than the kernel exits to userspace.
> > >
> > > > In any case, this is not yet in mainline.
> > >
> > > Let me know when it's in, and I'll have a look.
> >
> > Could you please touch base with Frederic Weisbecker to make sure that
> > what he is doing works for you?
>
> Looks like there are new rcu_user_enter() and rcu_user_exit() APIs which
> we can use. Hopefully they subsume rcu_virt_note_context_switch() so we
> only need one set of APIs.

Now that you mention it, that is a good goal. However, it requires
coordination with Frederic's code as well, so some investigation
is required. Bad things happen if you tell RCU you are idle when you
really are not and vice versa!
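Paul's warning can be illustrated with a toy model of the even/odd counter idea behind RCU's dyntick-idle tracking: the counter is odd while RCU watches the CPU and even while the CPU is in an extended quiescent state, and the grace-period side may skip a CPU whose snapshot is even or has changed since. This is a simplified sketch, not the kernel's implementation (which uses atomics, memory barriers and nesting counts); the function names are hypothetical and only loosely correspond to rcu_idle_enter()/rcu_user_enter() and their exit counterparts. Declaring a CPU quiescent while it still uses RCU-protected data would let a grace period end early, which is why the asserts insist on strictly alternating enter/exit calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-CPU counter: odd = RCU is watching, even = extended quiescent state. */
struct dynticks {
	unsigned long counter;
};

/* Loosely analogous to rcu_idle_enter()/rcu_user_enter(). */
static void eqs_enter(struct dynticks *d)
{
	assert(d->counter & 1);     /* must not already be quiescent */
	d->counter++;
}

/* Loosely analogous to rcu_idle_exit()/rcu_user_exit(). */
static void eqs_exit(struct dynticks *d)
{
	assert(!(d->counter & 1));  /* must currently be quiescent */
	d->counter++;
}

/* Grace-period side: the CPU was quiescent when the snapshot was taken... */
static bool in_eqs(unsigned long snap)
{
	return !(snap & 1);
}

/* ...or has passed through a quiescent state since the snapshot. */
static bool passed_qs(const struct dynticks *d, unsigned long snap)
{
	return in_eqs(snap) || d->counter != snap;
}
```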

Thanx, Paul

