
Mailing List Archive: Linux: Kernel

Context switch times

 

 



kravetz at us

Oct 4, 2001, 2:04 PM

Post #1 of 54 (1092 views)
Context switch times

I've been working on a rewrite of our Multi-Queue scheduler
and am using the lat_ctx program of LMbench as a benchmark.
I'm lucky enough to have access to an 8-CPU system for use
during development. One time, I 'accidentally' booted the
kernel that came with the distribution installed on this
machine. That kernel level is '2.2.16-22'. The results of
running lat_ctx on this kernel when compared to 2.4.10 really
surprised me. Here is an example:
2.4.10 on 8 CPUs: lat_ctx -s 0 -r 2 results
"size=0k ovr=2.27
2 3.86
2.2.16-22 on 8 CPUs: lat_ctx -s 0 -r 2 results
"size=0k ovr=1.99
2 1.44
As you can see, the context switch times for 2.4.10 are more
than double what they were for 2.2.16-22 in this example.
Comments?
One observation I did make is that this may be related to CPU
affinity/cache warmth. If you increase the number of 'TRIPS'
to a very large number, you can run 'top' and observe per-CPU
utilization. On 2.2.16-22, the '2 task' benchmark seemed to
stay on 3 of the 8 CPUs. On 2.4.10, these 2 tasks were run
on all 8 CPUs and utilization was about the same for each CPU.
--
Mike Kravetz kravetz [at] us
IBM Peace, Love and Linux Technology Center


arjan at fenrus

Oct 4, 2001, 2:14 PM

Post #2 of 54 (1077 views)
Re: Context switch times [In reply to]

In article <20011004140417.C1245 [at] w-mikek2> you wrote:
> 2.4.10 on 8 CPUs: lat_ctx -s 0 -r 2 results
> "size=0k ovr=2.27
> 2 3.86
> 2.2.16-22 on 8 CPUS: lat_ctx -s 0 -r 2 results
> "size=0k ovr=1.99
> 2 1.44
> As you can see, the context switch times for 2.4.10 are more
> than double what they were for 2.2.16-22 in this example.
> Comments?
2.4.x supports SSE on pentium III/athlons, so the SSE registers need to be
saved/restored on a taskswitch as well.... that's not exactly free.


davem at redhat

Oct 4, 2001, 2:25 PM

Post #3 of 54 (1075 views)
Re: Context switch times [In reply to]

From: arjan [at] fenrus
Date: Thu, 04 Oct 2001 22:14:13 +0100

> > Comments?
>
> 2.4.x supports SSE on pentium III/athlons, so the SSE registers need to be
> saved/restored on a taskswitch as well.... that's not exactly free.
lat_ctx doesn't execute any FPU ops. So at worst this happens once
on GLIBC program startup, but then never again.
This assumes I understand how lazy i387 restores work in the kernel
:-)
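Roughly, the lazy scheme amounts to something like the following user-space
sketch (an illustration only, not the actual i387 code; every name in it is
invented): the context switch itself never touches FPU state, it only arms a
trap, so a pure pipe ping-pong pays the FPU cost at most once.

/*
 * Illustration of lazy FPU save/restore (not the real kernel code).
 * The switch only arms a trap; the save/restore cost is paid the first
 * time the new task actually executes an FPU instruction.
 */
#include <stdio.h>

struct task { const char *name; };

static struct task *fpu_owner;   /* whose registers are live in the FPU */
static int ts_flag;              /* stand-in for the x86 CR0.TS bit */

static void context_switch(struct task *next)
{
    ts_flag = 1;                 /* arm the trap; no save, no restore */
    printf("switch to %s: no FPU work\n", next->name);
}

static void fpu_instruction(struct task *current)
{
    if (ts_flag || fpu_owner != current) {
        /* "device not available" path: pay the cost now, once */
        printf("trap: save %s's FPU state, load %s's\n",
               fpu_owner ? fpu_owner->name : "(none)", current->name);
        fpu_owner = current;
        ts_flag = 0;
    }
    /* ... the actual FPU op would run here ... */
}

int main(void)
{
    struct task a = { "lat_ctx-A" }, b = { "lat_ctx-B" };

    fpu_instruction(&a);     /* e.g. something in startup touches the FPU */
    context_switch(&b);      /* benchmark loop: no FPU ops ... */
    context_switch(&a);
    context_switch(&b);      /* ... so no per-switch FPU cost at all */
    fpu_instruction(&b);     /* only a later FPU op pays the price */
    return 0;
}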
Franks a lot,
David S. Miller
davem [at] redhat


rgooch at ras

Oct 4, 2001, 2:39 PM

Post #4 of 54 (1076 views)
Re: Context switch times [In reply to]

David S. Miller writes:
> From: arjan [at] fenrus
> Date: Thu, 04 Oct 2001 22:14:13 +0100
>
> > Comments?
>
> 2.4.x supports SSE on pentium III/athlons, so the SSE registers need to be
> saved/restored on a taskswitch as well.... that's not exactly free.
>
> lat_ctx doesn't execute any FPU ops. So at worst this happens once
> on GLIBC program startup, but then never again.
Has something changed? Last I looked, the whole lmbench timing harness
was based on using the FPU. That was the cause of the big argument
Larry and I had some years back: my context switch benchmark didn't
use the FPU, and thus was more sensitive to variations (such as cache
misses due to aliasing).
Regards,
Richard....
Permanent: rgooch [at] atnf
Current: rgooch [at] ras


davem at redhat

Oct 4, 2001, 2:52 PM

Post #5 of 54 (1075 views)
Re: Context switch times [In reply to]

From: Richard Gooch <rgooch [at] ras>
Date: Thu, 4 Oct 2001 15:39:05 -0600
> David S. Miller writes:
> > lat_ctx doesn't execute any FPU ops. So at worst this happens once
> > on GLIBC program startup, but then never again.
>
> Has something changed? Last I looked, the whole lmbench timing harness
> was based on using the FPU.
Oops, that's entirely possible...
But things are usually laid out like this:
capture_start_time();
context_switch_N_times();
capture_end_time();
So the FPU hit is only before/after the runs, not during each and
every iteration.
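Concretely, the shape is something like this sketch (an illustration of that
layout, not the actual lat_ctx source): the timed loop is nothing but pipe
reads and writes, and floating point only appears after the end time is
captured, when the per-iteration cost is computed.

/*
 * lmbench-style layout, sketched: FP math happens only outside the
 * timed region, so lazy FPU handling costs nothing per switch.
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/time.h>
#include <sys/wait.h>

#define TRIPS 100000

int main(void)
{
    int p1[2], p2[2];
    char c = 0;
    struct timeval start, end;

    if (pipe(p1) < 0 || pipe(p2) < 0) { perror("pipe"); exit(1); }

    if (fork() == 0) {                   /* child: echo every byte back */
        for (int i = 0; i < TRIPS; i++) {
            if (read(p1[0], &c, 1) != 1) exit(1);
            if (write(p2[1], &c, 1) != 1) exit(1);
        }
        exit(0);
    }

    gettimeofday(&start, NULL);          /* capture_start_time() */
    for (int i = 0; i < TRIPS; i++) {    /* context_switch_N_times() */
        if (write(p1[1], &c, 1) != 1) exit(1);
        if (read(p2[0], &c, 1) != 1) exit(1);
    }
    gettimeofday(&end, NULL);            /* capture_end_time() */
    wait(NULL);

    /* The only FP ops are here, after the timed section. */
    double usec = (end.tv_sec - start.tv_sec) * 1e6 +
                  (end.tv_usec - start.tv_usec);
    printf("%.2f usec per round trip (two switches each)\n", usec / TRIPS);
    return 0;
}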
Franks a lot,
David S. Miller
davem [at] redhat


bcrl at redhat

Oct 4, 2001, 2:55 PM

Post #6 of 54 (1076 views)
Re: Context switch times [In reply to]

On Thu, Oct 04, 2001 at 02:52:39PM -0700, David S. Miller wrote:
> So the FPU hit is only before/after the runs, not during each and
> every iteration.
Right. Plus, the original mail mentioned that it was hitting all 8
CPUs, which is a pretty good example of braindead scheduler behaviour.
-ben


davidel at xmailserver

Oct 4, 2001, 3:35 PM

Post #7 of 54 (1075 views)
Re: Context switch times [In reply to]

On Thu, 4 Oct 2001, Benjamin LaHaise wrote:
> On Thu, Oct 04, 2001 at 02:52:39PM -0700, David S. Miller wrote:
> > So the FPU hit is only before/after the runs, not during each and
> > every iteration.
>
> Right. Plus, the original mail mentioned that it was hitting all 8
> CPUs, which is a pretty good example of braindead scheduler behaviour.
There was a discussion about process spinning among idle CPUs a couple of
months ago.
Mike, did you code the patch that sticks the task to an idle CPU between
the send-IPI and the idle wakeup?
At that time we simply left the issue unaddressed.
- Davide


torvalds at transmeta

Oct 4, 2001, 3:42 PM

Post #8 of 54 (1076 views)
Re: Context switch times [In reply to]

In article <20011004175526.C18528 [at] redhat>,
Benjamin LaHaise <bcrl [at] redhat> wrote:
>On Thu, Oct 04, 2001 at 02:52:39PM -0700, David S. Miller wrote:
>> So the FPU hit is only before/after the runs, not during each and
>> every iteration.
>
>Right. Plus, the original mail mentioned that it was hitting all 8
>CPUs, which is a pretty good example of braindead scheduler behaviour.
Careful.
That's not actually true (the braindead part, that is).
We went through this with Ingo and Larry McVoy, and the sad fact is that
to get the best numbers for lmbench, you simply have to do the wrong
thing.
Could we try to hit just two? Probably, but it doesn't really matter,
though: to make the lmbench scheduler benchmark go at full speed, you
want to limit it to _one_ CPU, which is not sensible in real-life
situations. The amount of concurrency in the context switching
benchmark is pretty small, and does not make up for bouncing the locks
etc between CPU's.
However, that lack of concurrency in lmbench is totally due to the
artificial nature of the benchmark, and the bigger-footprint scheduling
stuff (that isn't reported very much in the summary) is more realistic.
So 2.4.x took the (painful) road of saying that we care less about that
particular benchmark than about some other more realistic loads.
Linus


kravetz at us

Oct 4, 2001, 3:49 PM

Post #9 of 54 (1077 views)
Re: Context switch times [In reply to]

On Thu, Oct 04, 2001 at 03:35:35PM -0700, Davide Libenzi wrote:
> > Right. Plus, the original mail mentioned that it was hitting all 8
> > CPUs, which is a pretty good example of braindead scheduler behaviour.
>
> There was a discussion about process spinning among idle CPUs a couple of
> months ago.
> Mike, did you code the patch that stick the task to an idle between the
> send-IPI and the idle wakeup ?
> At that time we simply left the issue unaddressed.
I believe the 'quick and dirty' patches we came up with substantially
increased context switch times for this benchmark (doubled them).
The reason is that you needed to add IPI time to the context switch
time. Therefore, I did not actively pursue getting these accepted. :)
It appears that something in the 2.2 scheduler did a much better
job of handling this situation. I haven't looked at the 2.2 code.
Does anyone know what feature of the 2.2 scheduler was more successful
in keeping tasks on the CPUs on which they previously executed?
Also, why was that code removed from 2.4? I can research, but I
suspect someone here has firsthand knowledge.
--
Mike Kravetz kravetz [at] us
IBM Peace, Love and Linux Technology Center


bcrl at redhat

Oct 4, 2001, 3:53 PM

Post #10 of 54 (1077 views)
Re: Context switch times [In reply to]

On Thu, Oct 04, 2001 at 10:42:37PM +0000, Linus Torvalds wrote:
> Could we try to hit just two? Probably, but it doesn't really matter,
> though: to make the lmbench scheduler benchmark go at full speed, you
> want to limit it to _one_ CPU, which is not sensible in real-life
> situations. The amount of concurrency in the context switching
> benchmark is pretty small, and does not make up for bouncing the locks
> etc between CPU's.
I don't quite agree with you that it doesn't matter. A lot of tests
(volanomark, other silly things) show that the current scheduler jumps
processes from CPU to CPU on SMP boxes far too easily, in addition to the
lengthy duration of run queue processing when loaded down. Yes, these
applications are leaving too many runnable processes around, but that's
the way some large app server loads behave. And right now it makes linux
look bad compared to other OSes.
Yes, low latency is good, but jumping around cpus adds more latency in
cache misses across slow busses than is needed when the working set is
already present in the 2MB L2 of your high end server.
-ben


kravetz at us

Oct 4, 2001, 4:41 PM

Post #11 of 54 (1075 views)
Re: Context switch times [In reply to]

On Thu, Oct 04, 2001 at 10:42:37PM +0000, Linus Torvalds wrote:
> Could we try to hit just two? Probably, but it doesn't really matter,
> though: to make the lmbench scheduler benchmark go at full speed, you
> want to limit it to _one_ CPU, which is not sensible in real-life
> situations.
Can you clarify? I agree that tuning the system for the best LMbench
performance is not a good thing to do! However, in general on an
8 CPU system with only 2 'active' tasks I would think limiting the
tasks to 2 CPUs would be desirable for cache effects.
I know that running LMbench with 2 active tasks on an 8 CPU system
results in those 2 tasks being 'round-robined' among all 8 CPUs.
Prior analysis leads me to believe the reason for this is due to
IPI latency. reschedule_idle() chooses the 'best/correct' CPU for
a task to run on, but before schedule() runs on that CPU another
CPU runs schedule() and the result is that the task runs on a
'less desirable' CPU. The nature of the LMbench scheduler benchmark
makes this occur frequently. The real question is: how often
does this happen in real-life situations?
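To make that window concrete, here is a stub sketch of the sequence (a
simplification for illustration; these are not the real 2.4 entry points,
and the helpers are empty so the sketch stands alone):

/*
 * The race: the wakeup posts an IPI to the preferred idle CPU, but any
 * other CPU that enters schedule() first can take the task instead.
 */
struct task { int last_cpu; };

static int  cpu_is_idle(int cpu)            { (void)cpu; return 1; }
static void add_to_runqueue(struct task *p) { (void)p; }
static void send_reschedule_ipi(int cpu)    { (void)cpu; }

static void wake_up_task(struct task *p)
{
    int target = p->last_cpu;        /* preferred CPU: warm cache */

    add_to_runqueue(p);
    if (cpu_is_idle(target))
        send_reschedule_ipi(target); /* only posts an interrupt and returns */

    /*
     * Window: until 'target' takes the IPI and calls schedule(), any
     * other CPU that happens to enter schedule() will see p on the
     * runqueue, pick it, and run it with a cold cache.  When the IPI
     * finally lands, 'target' finds nothing left to do.  lat_ctx
     * bounces two tasks fast enough to hit this window constantly,
     * which is how the pair ends up drifting across all 8 CPUs.
     */
}

int main(void)
{
    struct task t = { .last_cpu = 0 };
    wake_up_task(&t);
    return 0;
}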
--
Mike


torvalds at transmeta

Oct 4, 2001, 4:50 PM

Post #12 of 54 (1077 views)
Re: Context switch times [In reply to]

On Thu, 4 Oct 2001, Mike Kravetz wrote:
> On Thu, Oct 04, 2001 at 10:42:37PM +0000, Linus Torvalds wrote:
> > Could we try to hit just two? Probably, but it doesn't really matter,
> > though: to make the lmbench scheduler benchmark go at full speed, you
> > want to limit it to _one_ CPU, which is not sensible in real-life
> > situations.
>
> Can you clarify? I agree that tuning the system for the best LMbench
> performance is not a good thing to do! However, in general on an
> 8 CPU system with only 2 'active' tasks I would think limiting the
> tasks to 2 CPUs would be desirable for cache effects.
Yes, limiting to 2 CPU's probably gets better cache behaviour, and it
might be worth looking into why it doesn't. The CPU affinity _should_
prioritize it down to two, but I haven't thought through your theory about
IPI latency.
However, the reason 2.2.x does so well is that in 2.2.x it will stay on
_one_ CPU if I remember correctly. We basically tuned the scheduler for
lmbench, and not much else.
Linus


davidel at xmailserver

Oct 4, 2001, 4:56 PM

Post #13 of 54 (1075 views)
Re: Context switch times [In reply to]

On Thu, 4 Oct 2001, Mike Kravetz wrote:
> On Thu, Oct 04, 2001 at 10:42:37PM +0000, Linus Torvalds wrote:
> > Could we try to hit just two? Probably, but it doesn't really matter,
> > though: to make the lmbench scheduler benchmark go at full speed, you
> > want to limit it to _one_ CPU, which is not sensible in real-life
> > situations.
>
> Can you clarify? I agree that tuning the system for the best LMbench
> performance is not a good thing to do! However, in general on an
> 8 CPU system with only 2 'active' tasks I would think limiting the
> tasks to 2 CPUs would be desirable for cache effects.
>
> I know that running LMbench with 2 active tasks on an 8 CPU system
> results in those 2 tasks being 'round-robined' among all 8 CPUs.
> Prior analysis leads me to believe the reason for this is due to
> IPI latency. reschedule_idle() chooses the 'best/correct' CPU for
> a task to run on, but before schedule() runs on that CPU another
> CPU runs schedule() and the result is that the task runs on a
> 'less desirable' CPU. The nature of the LMbench scheduler benchmark
> makes this occur frequently. The real question is: how often
> does this happen in real-life situations?
Well, if you remember, the first time this issue was discussed on the
mailing list it was due to a real-life situation, not a bench run.
- Davide


andrea at suse

Oct 4, 2001, 5:45 PM

Post #14 of 54 (1075 views)
Re: Context switch times [In reply to]

On Thu, Oct 04, 2001 at 04:41:02PM -0700, Mike Kravetz wrote:
> I know that running LMbench with 2 active tasks on an 8 CPU system
> results in those 2 tasks being 'round-robined' among all 8 CPUs.
> Prior analysis leads me to believe the reason for this is due to
> IPI latency. reschedule_idle() chooses the 'best/correct' CPU for
> a task to run on, but before schedule() runs on that CPU another
> CPU runs schedule() and the result is that the task runs on a
> 'less desirable' CPU. The nature of the LMbench scheduler benchmark
doesn't lmbench wake up only via pipes? Linux uses the sync-wakeup, which
avoids reschedule_idle in that case, to serialize the pipe load on the
same cpu.
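The difference boils down to something like this sketch (illustrative stubs,
not the actual wake_up code): a sync wakeup queues the task but skips the
hunt for an idle CPU, because the waker is about to block anyway.

/*
 * Normal vs. synchronous wakeup, sketched.
 */
struct task { int dummy; };

static void add_to_runqueue(struct task *p) { (void)p; }
static void reschedule_idle(struct task *p) { (void)p; }  /* may pick a CPU and IPI it */

static void wake_up_common(struct task *p, int sync)
{
    add_to_runqueue(p);
    if (!sync)
        reschedule_idle(p);  /* normal wakeup: find the task a CPU right now */
    /* sync wakeup: nothing more; the waker will call schedule() shortly,
       so the woken task can simply run here on the same CPU */
}

int main(void)
{
    struct task reader = { 0 };
    wake_up_common(&reader, 1);   /* the behaviour a pipe ping-pong wants */
    return 0;
}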
Andrea


kravetz at us

Oct 4, 2001, 9:35 PM

Post #15 of 54 (1074 views)
Re: Context switch times [In reply to]

On Fri, Oct 05, 2001 at 02:45:26AM +0200, Andrea Arcangeli wrote:
> doesn't lmbench wakeup only via pipes? Linux uses the sync-wakeup that
> avoids reschedule_idle in such case, to serialize the pipe load in the
> same cpu.
That's what I thought too. However, kernel profile data of a
lmbench run on 2.4.10 reveals that the pipe routines only call
the non-synchronous form of wake_up. I believe I reached the
same conclusion in the 2.4.7 time frame by instrumenting this
code.
--
Mike


dm at desanasystems

Oct 4, 2001, 11:31 PM

Post #16 of 54 (1076 views)
Re: Context switch times [In reply to]

> I believe the 'quick and dirty' patches we came up with substantially
> increased context switch times for this benchmark (doubled them).
> The reason is that you needed to add IPI time to the context switch
> time. Therefore, I did not actively pursue getting these accepted. :)
> It appears that something in the 2.2 scheduler did a much better
> job of handling this situation. I haven't looked at the 2.2 code.
> Does anyone know what feature of the 2.2 scheduler was more successful
> in keeping tasks on the CPUs on which they previously executed?
> Also, why was that code removed from 2.4? I can research, but I
> suspect someone here has firsthand knowledge.
The reason 2.2 does better is that, under some conditions, if a woken-up
process's preferred CPU is busy it will refrain from moving it to another
CPU even if there are many idle CPUs, in the hope that the preferred CPU
will become available soon. This can cause situations where processes are
sitting on the run queue while CPUs idle, but works great for lmbench. OTOH
2.4 assigns processes to CPUs as soon as possible. IIRC this change
happened in one of the early 2.3.4x kernels.
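In code-sketch form the two policies differ roughly as follows (a paraphrase
for illustration, not lifted from either kernel tree):

/*
 * Where to run a freshly woken task: 2.2-style patience vs. 2.4-style
 * eagerness.
 */
struct task { int preferred_cpu; };

static int  cpu_is_idle(int cpu)         { (void)cpu; return 0; }
static int  find_any_idle_cpu(void)      { return -1; }
static void send_reschedule_ipi(int cpu) { (void)cpu; }

static void place_woken_task(struct task *p, int policy_2_2)
{
    if (cpu_is_idle(p->preferred_cpu)) {
        send_reschedule_ipi(p->preferred_cpu);  /* both policies agree */
        return;
    }

    if (policy_2_2) {
        /*
         * 2.2-style: leave p on the runqueue even though other CPUs may
         * be idle, betting the preferred CPU frees up soon.  Great for
         * lat_ctx (warm cache), but can leave CPUs idle under load.
         */
        return;
    }

    /* 2.4-style: run p somewhere immediately, cache warmth or not. */
    int idle = find_any_idle_cpu();
    if (idle >= 0)
        send_reschedule_ipi(idle);
}

int main(void)
{
    struct task p = { .preferred_cpu = 2 };
    place_woken_task(&p, 1);   /* 2.2-ish placement */
    place_woken_task(&p, 0);   /* 2.4-ish placement */
    return 0;
}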
---
Dimitris Michailidis dm [at] desanasystems


alan at lxorguk

Oct 5, 2001, 8:13 AM

Post #17 of 54 (1075 views)
Re: Context switch times [In reply to]

> I don't quite agree with you that it doesn't matter. A lot of tests
> (volanomark, other silly things) show that the current scheduler jumps
> processes from CPU to CPU on SMP boxes far too easily, in addition to the
> lengthy duration of run queue processing when loaded down. Yes, these
> applications are leaving too many runnable processes around, but that's
> the way some large app server loads behave. And right now it makes linux
> look bad compared to other OSes.
The current scheduler is completely hopeless by the time you hit 3
processors and a bit flaky on two. It's partly the algorithm and partly
the implementation:
#1 We schedule tasks based on a priority that has no cache
consideration
#2 We have an O(tasks) loop refreshing priority that isn't needed
(Montavista fix covers this)
#3 We have an O(runnable) loop which isn't ideal
#4 On x86 we are horribly cache pessimal. All the task structs are
on the same cache colour. Multiple tasks waiting for the same event
put their variables (like the wait queue) on the same cache line.
The poor cache performance comes largely from problem 1. Because we prioritize
blindly on CPU usage with a tiny magic constant fudge factor, we are
executing totally wrong task orders. Instead we need to schedule for cache
optimal behaviour, and the moderator is the fairness requirement, not the
other way around.
The classic example is two steady cpu loads and an occasionally waking
client (like an editor)
We end up scheduling (A, B are the loads, C is the editor)
A C B C A C B C A C B C A
whenever a key is hit we swap CPU hogs because the priority balance shifted.
Really we should have scheduled something looking more like
A C A C A C B C B C B C B
I would argue there are two needed priorities
1. CPU usage based priority - the current one
2. Task priority.
Task priority would be set to maximum when a task sleeps and then bounded by
a function of max(task_priority, fn(cpu_priority)), so that tasks fall down
priority levels as their cpu usage accumulates over the set of time slices.
This causes tasks that are CPU hogs to sit at the bottom of the pile with the
same low task_priority, meaning they run for long bursts and don't get
pre-empted and switched with other hogs through each cycle of the scheduler.
This isn't idle speculation - I've done some minimal playing with this, but
my initial re-implementation didn't handle SMP at all and I am still not 100%
sure how to resolve SMP or how SMP will improve out of the current cunning
plan.
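For concreteness, here is a sketch of one way to read the two-priority idea
(an illustration only, not code from any tree; the constants are invented):
the value used to pick the next task is the larger of a sleep-boosted task
priority and a coarse band derived from remaining timeslice, and the boost
decays band by band as cpu time is burned.

#include <stdio.h>

#define MAX_PRIO   40
#define BAND_WIDTH 5            /* timeslice ticks per priority band */

struct task {
    int task_priority;          /* jumps to MAX_PRIO when the task sleeps */
    int ticks_left;             /* remaining timeslice */
};

static int band(int ticks_left)
{
    return ticks_left / BAND_WIDTH;  /* coarse: changes only on a band crossing */
}

static int effective_priority(const struct task *p)
{
    int cpu_prio = band(p->ticks_left);
    return p->task_priority > cpu_prio ? p->task_priority : cpu_prio;
}

static void on_sleep(struct task *p) { p->task_priority = MAX_PRIO; }

static void on_tick(struct task *p)
{
    if (p->ticks_left > 0)
        p->ticks_left--;
    if (p->task_priority > band(p->ticks_left))
        p->task_priority--;          /* sleep boost decays toward the band */
}

int main(void)
{
    struct task hog = { 0, 3 }, editor = { 0, 20 };

    on_sleep(&editor);               /* the editor just woke up */
    on_tick(&hog);                   /* the hog keeps burning timeslice */
    printf("hog=%d editor=%d\n",
           effective_priority(&hog), effective_priority(&editor));
    return 0;
}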
Alan


ebiederm at xmission

Oct 5, 2001, 8:15 AM

Post #18 of 54 (1077 views)
Re: Context switch times [In reply to]

Linus Torvalds <torvalds [at] transmeta> writes:
> On Thu, 4 Oct 2001, Mike Kravetz wrote:
>
> > On Thu, Oct 04, 2001 at 10:42:37PM +0000, Linus Torvalds wrote:
> > > Could we try to hit just two? Probably, but it doesn't really matter,
> > > though: to make the lmbench scheduler benchmark go at full speed, you
> > > want to limit it to _one_ CPU, which is not sensible in real-life
> > > situations.
> >
> > Can you clarify? I agree that tuning the system for the best LMbench
> > performance is not a good thing to do! However, in general on an
> > 8 CPU system with only 2 'active' tasks I would think limiting the
> > tasks to 2 CPUs would be desirable for cache effects.
>
> Yes, limiting to 2 CPU's probably gets better cache behaviour, and it
> might be worth looking into why it doesn't. The CPU affinity _should_
> prioritize it down to two, but I haven't thought through your theory about
> IPI latency.
I don't know what it is but I have seen this excessive cpu switching
in the wild. In particular on a dual processor machine I ran 2 cpu
intensive jobs, and a handful of daemons. And the cpu intensive jobs
would switch cpus every couple of seconds.
I was investigating it because, on the Athlon I was running on, a
customer was getting a super-linear speed up. With one process it
would take 8 minutes, and with 2 processes one would take 8 minutes
and the other would take 6 minutes. Very strange.
These processes except at their very beginning did no I/O and were
pure cpu hogs until they spit out their results. Very puzzling.
I can't see why we would ever want to take the cache miss penalty of
switching cpus, in this case.
Eric


george at mvista

Oct 5, 2001, 10:49 AM

Post #19 of 54 (1079 views)
Re: Context switch times [In reply to]

Alan Cox wrote:
>
> > I don't quite agree with you that it doesn't matter. A lot of tests
> > (volanomark, other silly things) show that the current scheduler jumps
> > processes from CPU to CPU on SMP boxes far too easily, in addition to the
> > lengthy duration of run queue processing when loaded down. Yes, these
> > applications are leaving too many runnable processes around, but that's
> > the way some large app server loads behave. And right now it makes linux
> > look bad compared to other OSes.
>
> The current scheduler is completely hopeless by the time you hit 3
> processors and a bit flaky on two. Its partly the algorithm and partly
> implementation
>
> #1 We schedule tasks based on a priority that has no cache
> consideration
>
> #2 We have an O(tasks) loop refreshing priority that isnt needed
> (Montavista fix covers this)
>
> #3 We have an O(runnable) loop which isnt ideal
>
> #4 On x86 we are horribly cache pessimal. All the task structs are
> on the same cache colour. Multiple tasks waiting for the same event
> put their variables (like the wait queue) on the same cache line.
>
> The poor cache performance greatly comes from problem 1. Because we prioritize
> blindly on CPU usage with a tiny magic constant fudge factor we are
> executing totally wrong task orders. Instead we need to schedule for cache
> optimal behaviour, and the moderator is the fairness requirement, not the
> other way around.
>
> The classic example is two steady cpu loads and an occasionally waking
> client (like an editor)
>
> We end up scheduling [A, B are the loads, C is the editor)
>
> A C B C A C B C A C B C A
>
> whenever a key is hit we swap CPU hog because the priority balanced shifted.
> Really we should have scheduled something looking more like
>
> A C A C A C B C B C B C B
>
> I would argue there are two needed priorities
>
> 1. CPU usage based priority - the current one
>
> 2. Task priority.
>
> Task priority being set to maximum when a task sleeps and then bounded by
> a function of max(task_priorty, fn(cpupriority)) so that tasks fall down
> priority levels as their cpu usage in the set of time slices. This causes
> tasks that are CPU hogs to sit at the bottom of the pile with the same low
> task_priority meaning they run for long bursts and don't get pre-empted and
> switched with other hogs through each cycle of the scheduler.
Let me see if I have this right. Task priority goes to max on any (?)
sleep regardless of how long. And to min if it doesn't sleep for some
period of time. Where does the time slice counter come into this, if at
all?
For what it's worth I am currently updating the MontaVista scheduler,
so I am open to ideas.
George
>
> This isnt idle speculation - I've done some minimal playing with this but
> my initial re-implementation didnt handle SMP at all and I am still not 100%
> sure how to resolve SMP or how SMP will improve out of the current cunning
> plan.
>
> Alan


alan at lxorguk

Oct 5, 2001, 3:29 PM

Post #20 of 54 (1076 views)
Re: Context switch times [In reply to]

> Let me see if I have this right. Task priority goes to max on any (?)
> sleep regardless of how long. And to min if it doesn't sleep for some
> period of time. Where does the time slice counter come into this, if at
> all?
>
> For what its worth I am currently updating the MontaVista scheduler so,
> I am open to ideas.
The time slice counter is the limit on the amount of time you can execute,
the priority determines who runs first.
So if you used your cpu quota you will get run reluctantly. If you slept
you will get run early, and as you use up your time slice count you will
drop priority bands, but without pre-emption until you cross a band and
there is another task with higher priority.
This damps down task thrashing a bit, and for the cpu hogs it gets the
desired behaviour - which is that they all run their full quantum in the
background one after another instead of thrashing back and forth.
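A tiny sketch of that band rule (illustration only, not from any tree):

struct task { int ticks_left; int band; };

static int should_preempt(const struct task *running,
                          const struct task *best_other)
{
    if (running->ticks_left == 0)
        return 1;                             /* quantum used up */
    return best_other->band > running->band;  /* only on a band crossing */
}

int main(void)
{
    struct task hog = { 7, 0 }, other_hog = { 20, 0 };

    /* Two hogs in the same bottom band: no pre-emption, so they run
       their full quanta one after the other. */
    return should_preempt(&hog, &other_hog);
}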


davidel at xmailserver

Oct 5, 2001, 3:56 PM

Post #21 of 54 (1075 views)
Re: Context switch times [In reply to]

On Fri, 5 Oct 2001, Alan Cox wrote:
> > Let me see if I have this right. Task priority goes to max on any (?)
> > sleep regardless of how long. And to min if it doesn't sleep for some
> > period of time. Where does the time slice counter come into this, if at
> > all?
> >
> > For what its worth I am currently updating the MontaVista scheduler so,
> > I am open to ideas.
>
> The time slice counter is the limit on the amount of time you can execute,
> the priority determines who runs first.
>
> So if you used your cpu quota you will get run reluctantly. If you slept
> you will get run early and as you use time slice count you will drop
> priority bands, but without pre-emption until you cross a band and there
> is another task with higher priority.
>
> This damps down task thrashing a bit, and for the cpu hogs it gets the
> desired behaviour - which is that the all run their full quantum in the
> background one after another instead of thrashing back and forth
What if we give prev a priority boost P=F(T), where T is the time
prev ran before the current schedule?
- Davide


alan at lxorguk

Oct 5, 2001, 4:04 PM

Post #22 of 54 (1076 views)
Re: Context switch times [In reply to]

> > This damps down task thrashing a bit, and for the cpu hogs it gets the
> > desired behaviour - which is that the all run their full quantum in the
> > background one after another instead of thrashing back and forth
>
> What if we give to prev a priority boost P=F(T) where T is the time
> prev is ran before the current schedule ?
That would be the wrong key. You can certainly argue that it is maybe
appropriate to use some function based on remaining scheduler ticks, but
that already occurs, as the remaining scheduler ticks are the upper bound for
the priority band.


davidel at xmailserver

Oct 5, 2001, 4:16 PM

Post #23 of 54 (1075 views)
Re: Context switch times [In reply to]

On Sat, 6 Oct 2001, Alan Cox wrote:
> > > This damps down task thrashing a bit, and for the cpu hogs it gets the
> > > desired behaviour - which is that the all run their full quantum in the
> > > background one after another instead of thrashing back and forth
> >
> > What if we give to prev a priority boost P=F(T) where T is the time
> > prev is ran before the current schedule ?
>
> That would be the wrong key. You can argue certainly that it is maybe
> appropriate to use some function based on remaining scheduler ticks, but
> that already occurs as the scheduler ticks is the upper bound for priority
> band
No, I mean T = (Tend - Tstart) where:
Tstart = time the current ( prev ) task has been scheduled
Tend = current time ( in schedule() )
Basically it's the total time the current ( prev ) task has had the CPU
- Davide


alan at lxorguk

Oct 5, 2001, 4:17 PM

Post #24 of 54 (1074 views)
Re: Context switch times [In reply to]

> No, I mean T = (Tend - Tstart) where:
>
> Tstart = time the current ( prev ) task has been scheduled
> Tend = current time ( in schedule() )
>
> Basically it's the total time the current ( prev ) task has had the CPU
Ok let me ask one question - why ?


davidel at xmailserver

Oct 5, 2001, 4:21 PM

Post #25 of 54 (1075 views)
Re: Context switch times [In reply to]

On Sat, 6 Oct 2001, Alan Cox wrote:
> > No, I mean T = (Tend - Tstart) where:
> >
> > Tstart = time the current ( prev ) task has been scheduled
> > Tend = current time ( in schedule() )
> >
> > Basically it's the total time the current ( prev ) task has had the CPU
>
> Ok let me ask one question - why ?
Because the longer the task has run, the bigger its cache footprint is, and
the bigger the cache footprint is, the more we'd like to pick the exiting
task to rerun.
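In sketch form the boost could look like this (the constants and the
goodness() shape are invented for illustration, not taken from the
scheduler):

#include <stdio.h>

#define MAX_BOOST 4

struct task { long base_goodness; };

static long cache_boost(long t_ran_us)
{
    long boost = t_ran_us / 500;        /* F(T): invented scale */
    return boost > MAX_BOOST ? MAX_BOOST : boost;
}

static long goodness(const struct task *p, const struct task *prev,
                     long prev_ran_us)
{
    long g = p->base_goodness;
    if (p == prev)
        g += cache_boost(prev_ran_us);  /* the P = F(T) bonus, capped */
    return g;
}

int main(void)
{
    struct task prev = { 10 }, other = { 10 };

    /* prev ran 1.5 ms before this schedule(): it now edges out 'other'. */
    printf("prev=%ld other=%ld\n",
           goodness(&prev, &prev, 1500), goodness(&other, &prev, 1500));
    return 0;
}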
- Davide
