
Mailing List Archive: Linux: Kernel

[PATCH] sched/rt: fix SCHED_RR across cgroups

 

 



ccross at android

May 16, 2012, 9:34 PM

Post #1 of 11 (167 views)
[PATCH] sched/rt: fix SCHED_RR across cgroups

task_tick_rt has an optimization to only reschedule SCHED_RR tasks
if they were not the only element on their rq. However, with cgroups
a SCHED_RR task could be the only element on its per-cgroup rq but
still be competing with other SCHED_RR tasks in its parent's
cgroup. In this case, the SCHED_RR task in the child cgroup would
never yield at the end of its timeslice. If the child cgroup
rt_runtime_us was the same as the parent cgroup rt_runtime_us,
the task in the parent cgroup would starve completely.

Modify task_tick_rt to check that the task is the only task on its
rq, and that each of the scheduling entities of its ancestors is
also the only entity on its rq.

Signed-off-by: Colin Cross <ccross [at] android>
---
kernel/sched/rt.c | 15 ++++++++++-----
1 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 44af55e..8f32475 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1983,6 +1983,8 @@ static void watchdog(struct rq *rq, struct task_struct *p)
 
 static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 {
+	struct sched_rt_entity *rt_se = &p->rt;
+
 	update_curr_rt(rq);
 
 	watchdog(rq, p);
@@ -2000,12 +2002,15 @@ static void task_tick_rt(struct rq *rq, struct task_struct *p, int queued)
 	p->rt.time_slice = RR_TIMESLICE;
 
 	/*
-	 * Requeue to the end of queue if we are not the only element
-	 * on the queue:
+	 * Requeue to the end of queue if we (or any of our ancestors) are not
+	 * the only element on the queue
 	 */
-	if (p->rt.run_list.prev != p->rt.run_list.next) {
-		requeue_task_rt(rq, p, 0);
-		set_tsk_need_resched(p);
+	for_each_sched_rt_entity(rt_se) {
+		if (rt_se->run_list.prev != rt_se->run_list.next) {
+			requeue_task_rt(rq, p, 0);
+			set_tsk_need_resched(p);
+			return;
+		}
 	}
 }

--
1.7.7.3

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
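
The check this patch adds can be modelled outside the kernel. What follows
is a minimal userspace sketch, not the kernel code: `struct list_node`,
`struct rt_entity` and `has_rr_competitor()` are hypothetical stand-ins for
list_head, sched_rt_entity and the new loop in task_tick_rt, and the
`parent` pointer stands in for whatever for_each_sched_rt_entity follows.
An entity that is alone on its queue has run_list.prev == run_list.next
(both point at the queue head), so a difference at any level means another
entity is waiting there.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_node {
    struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
    head->prev = head->next = head;
}

static void list_add_tail(struct list_node *node, struct list_node *head)
{
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Hypothetical stand-in for sched_rt_entity: a queued entity plus the
 * group entity that represents its cgroup one level up. */
struct rt_entity {
    struct list_node run_list;
    struct rt_entity *parent;   /* NULL at the top level */
};

/* True if the entity, or any of its ancestors, shares its queue with
 * at least one other entity. */
static bool has_rr_competitor(const struct rt_entity *se)
{
    for (; se != NULL; se = se->parent) {
        if (se->run_list.prev != se->run_list.next)
            return true;
    }
    return false;
}
```

With one task alone in a child cgroup and a second task queued next to the
group entity in the parent, the test at the child task's own level finds
nothing, but the walk reaches the group entity and reports a competitor.
That is the case the old single-level test missed.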


a.p.zijlstra at chello

May 18, 2012, 1:56 AM

Post #2 of 11 (150 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Wed, 2012-05-16 at 21:34 -0700, Colin Cross wrote:
> task_tick_rt has an optimization to only reschedule SCHED_RR tasks
> if they were not the only element on their rq. However, with cgroups
> a SCHED_RR task could be the only element on its per-cgroup rq but
> still be competing with other SCHED_RR tasks in its parent's
> cgroup. In this case, the SCHED_RR task in the child cgroup would
> never yield at the end of its timeslice. If the child cgroup
> rt_runtime_us was the same as the parent cgroup rt_runtime_us,
> the task in the parent cgroup would starve completely.
>
> Modify task_tick_rt to check that the task is the only task on its
> rq, and that each of the scheduling entities of its ancestors is
> also the only entity on its rq.
>
> Signed-off-by: Colin Cross <ccross [at] android>

OK, fair enough.. one does wonder though, WTH is android doing with
SCHED_RR?


ccross at android

May 18, 2012, 10:52 AM

Post #3 of 11 (160 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, May 18, 2012 at 1:56 AM, Peter Zijlstra <a.p.zijlstra [at] chello> wrote:
> On Wed, 2012-05-16 at 21:34 -0700, Colin Cross wrote:
>> task_tick_rt has an optimization to only reschedule SCHED_RR tasks
>> if they were not the only element on their rq.  However, with cgroups
>> a SCHED_RR task could be the only element on its per-cgroup rq but
>> still be competing with other SCHED_RR tasks in its parent's
>> cgroup.  In this case, the SCHED_RR task in the child cgroup would
>> never yield at the end of its timeslice.  If the child cgroup
>> rt_runtime_us was the same as the parent cgroup rt_runtime_us,
>> the task in the parent cgroup would starve completely.
>>
>> Modify task_tick_rt to check that the task is the only task on its
>> rq, and that each of the scheduling entities of its ancestors is
>> also the only entity on its rq.
>>
>> Signed-off-by: Colin Cross <ccross [at] android>
>
> OK, fair enough.. one does wonder though, WTH is android doing with
> SCHED_RR?

Nothing, I was just experimenting with how it interacted with cgroups
and the numbers didn't make sense.


a.p.zijlstra at chello

May 18, 2012, 11:37 AM

Post #4 of 11 (152 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, 2012-05-18 at 10:52 -0700, Colin Cross wrote:

> > OK, fair enough.. one does wonder though, WTH is android doing with
> > SCHED_RR?
>
> Nothing, I was just experimenting with how it interacted with cgroups
> and the numbers didn't make sense.

OK. Thanks anyway!


ccross at android

May 18, 2012, 5:13 PM

Post #5 of 11 (156 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, May 18, 2012 at 11:37 AM, Peter Zijlstra <a.p.zijlstra [at] chello> wrote:
> On Fri, 2012-05-18 at 10:52 -0700, Colin Cross wrote:
>
>> > OK, fair enough.. one does wonder though, WTH is android doing with
>> > SCHED_RR?
>>
>> Nothing, I was just experimenting with how it interacted with cgroups
>> and the numbers didn't make sense.
>
> OK. Thanks anyway!

Even with this patch, scheduling of SCHED_RR tasks in cgroups is a
little odd. Each cgroup is treated as a schedulable entity alongside
the tasks in the same parent cgroup, and then the tasks inside the
child cgroup round robin through the child cgroup's time slices. So
in the setup:
root_cgroup
   task 1
   cgroup
      task 2
      task 3

The RR will be:
task 1, cgroup(task 2), task 1, cgroup(task 3), ...

task 1 will run twice as often, for a full RR_TIMESLICE each time, as
tasks 2 and 3.

Is that the way SCHED_RR is intended to interact with cgroups?
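
The rotation described above can be reproduced with a toy model. This is
only a sketch under simplifying assumptions (one CPU, equal priorities,
every task always runnable, no rt_runtime_us throttling); `simulate_rr()`
and its task numbering are made up for illustration and do not correspond
to any kernel function.

```c
#include <stddef.h>

/* Toy model of the two-level round robin: the top-level queue alternates
 * between task 1 and the child cgroup, and the child cgroup hands each of
 * its turns to the next of its own tasks. Fills out[] with the task id
 * that runs in each of n consecutive time slices. */
static void simulate_rr(int *out, size_t n)
{
    const int child_tasks[] = { 2, 3 };
    size_t child_next = 0;

    for (size_t slice = 0; slice < n; slice++) {
        if (slice % 2 == 0) {
            out[slice] = 1;                        /* task 1's turn */
        } else {
            out[slice] = child_tasks[child_next];  /* the cgroup's turn */
            child_next = (child_next + 1) % 2;
        }
    }
}
```

Over any eight consecutive slices starting with task 1, this model gives
task 1 four slices and tasks 2 and 3 two slices each.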


raistlin at linux

May 19, 2012, 6:11 AM

Post #6 of 11 (157 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, 2012-05-18 at 17:13 -0700, Colin Cross wrote:
> Even with this patch, scheduling of SCHED_RR tasks in cgroups is a
> little odd. Each cgroup is treated as a schedulable entity alongside
> the tasks in the same parent cgroup, and then the tasks inside the
> child cgroup round robin through the child cgroup's time slices. So
> in the setup:
> root_cgroup
>    task 1
>    cgroup
>       task 2
>       task 3
>
> The RR will be:
> task 1, cgroup(task 2), task 1, cgroup(task 3), ...
>
> task 1 will run twice as often, for a full RR_TIMESLICE each time, as
> tasks 2 and 3.
>
That looks right to me...

> Is that the way SCHED_RR is intended to interact with cgroups?
>
I would say it is. That's what you get because of putting task1 and
cgroup at the same level in the "hierarchy". I'm curious, what kind of
behaviour were you expecting?

Of course, the actual schedule also depends on the real-time priority of
the various tasks (groups don't have a priority, they inherit it from
their tasks, or at least it was like this when I used to work with
it :-P), but I guess you're putting all the tasks in the same queue
(i.e., same rt-prio), is that the case?

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


ccross at android

May 19, 2012, 1:37 PM

Post #7 of 11 (147 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Sat, May 19, 2012 at 6:11 AM, Dario Faggioli <raistlin [at] linux> wrote:
> On Fri, 2012-05-18 at 17:13 -0700, Colin Cross wrote:
>> Even with this patch, scheduling of SCHED_RR tasks in cgroups is a
>> little odd.  Each cgroup is treated as a schedulable entity alongside
>> the tasks in the same parent cgroup, and then the tasks inside the
>> child cgroup round robin through the child cgroup's time slices.  So
>> in the setup:
>> root_cgroup
>>    task 1
>>    cgroup
>>       task 2
>>       task 3
>>
>> The RR will be:
>> task 1, cgroup(task 2), task 1, cgroup(task 3), ...
>>
>> task 1 will run twice as often, for a full RR_TIMESLICE each time, as
>> tasks 2 and 3.
>>
> That looks right to me...
>
>> Is that the way SCHED_RR is intended to interact with cgroups?
>>
> I would say it is. That's what you get because of putting task1 and
> cgroup at the same level in the "hierarchy". I'm curious, what kind of
> behaviour were you expecting?

That behavior matches exactly with scheduling of normal tasks and
cgroups with default cpu.shares, but doesn't match too well with what
I can see of the posix SCHED_RR description, which suggests all the
SCHED_RR threads go into a single queue. I was just curious if the
behavior my patch restored was correct, since it can't be adjusted by
tweaking any parameters like cpu.shares.
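
To put numbers on the difference between the two models, here is a small
sketch. It assumes one CPU, equal priorities, always-runnable tasks and no
rt_runtime_us throttling; the function names are hypothetical and only
encode the arithmetic, not anything the kernel computes.

```c
/* Share of CPU time for one task when all SCHED_RR tasks of the same
 * priority sit in a single flat queue, as the POSIX text suggests. */
static double flat_share(int total_tasks)
{
    return 1.0 / total_tasks;
}

/* Share for an entity (a task or a whole cgroup) queued directly at the
 * top level next to the other top-level entities. */
static double top_level_share(int top_level_entities)
{
    return 1.0 / top_level_entities;
}

/* Share for one task inside a child cgroup: the cgroup's top-level share
 * divided evenly among the tasks it contains. */
static double grouped_share(int top_level_entities, int tasks_in_group)
{
    return top_level_share(top_level_entities) / tasks_in_group;
}
```

For the setup in this thread, a flat queue would give each of the three
tasks 1/3 of the CPU, while the hierarchical rotation gives task 1 a share
of 1/2 and tasks 2 and 3 a share of 1/4 each.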

> Of course, the actual schedule also depends on the real-time priority of
> the various tasks (groups don't have a priority, they inherit it from
> their tasks, or at least it was like this when I used to work with
> it :-P), but I guess you're putting all the tasks in the same queue
> (i.e., same rt-prio), is that the case?

Yes.
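
For anyone wanting to set up a similar experiment, the sketch below shows
one way a test program could put itself into SCHED_RR at a chosen
priority using the standard sched_setscheduler() call. The helper names
`clamp_rr_priority()` and `become_sched_rr()` are invented here, the
actual test programs used in this thread were not posted, and the call
normally needs real-time scheduling privileges.

```c
#include <sched.h>

/* Clamp prio into the SCHED_RR priority range reported by the system. */
static int clamp_rr_priority(int prio)
{
    int lo = sched_get_priority_min(SCHED_RR);
    int hi = sched_get_priority_max(SCHED_RR);

    if (prio < lo)
        return lo;
    if (prio > hi)
        return hi;
    return prio;
}

/* Switch the calling process to SCHED_RR at the given priority.
 * Returns 0 on success, or -1 with errno set on failure (for example
 * when the caller lacks the required privileges). */
static int become_sched_rr(int prio)
{
    struct sched_param param = { .sched_priority = clamp_rr_priority(prio) };

    return sched_setscheduler(0, SCHED_RR, &param);
}
```

Running every test task through the same call with the same priority
value puts them all at one rt-prio, which is the situation discussed here.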


raistlin at linux

May 23, 2012, 6:32 AM

Post #8 of 11 (143 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Sat, 2012-05-19 at 13:37 -0700, Colin Cross wrote:
> > I would say it is. That's what you get because of putting task1 and
> > cgroup at the same level in the "hierarchy". I'm curious, what kind of
> > behaviour were you expecting?
>
> That behavior matches exactly with scheduling of normal tasks and
> cgroups with default cpu.shares, but doesn't match too well with what
> I can see of the posix SCHED_RR description, which suggests all the
> SCHED_RR threads go into a single queue. I was just curious if the
> behavior my patch restored was correct, since it can't be adjusted by
> tweaking any parameters like cpu.shares.
>
Again, I really think it is the intended behaviour, and yes, real-time
group scheduling "breaks" the POSIX specification of the SCHED_{FIFO,RR}
policies intentionally (and _proudly_, as Peter would say it, am I
wrong? :-P).

Dario

--
<<This happens because I choose it to happen!>> (Raistlin Majere)
-----------------------------------------------------------------
Dario Faggioli, Ph.D, http://retis.sssup.it/people/faggioli
Senior Software Engineer, Citrix Systems R&D Ltd., Cambridge (UK)


a.p.zijlstra at chello

May 25, 2012, 4:52 AM

Post #9 of 11 (141 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Wed, 2012-05-23 at 15:32 +0200, Dario Faggioli wrote:
> Again, I really think it is the intended behaviour, and yes, real-time
> group scheduling "breaks" the POSIX specification of the SCHED_{FIFO,RR}
> policies intentionally (and _proudly_, as Peter would say it, am I
> wrong? :-P).

No, cgroups are well outside of POSIX ;-) as is SMP in fact.



rostedt at goodmis

May 25, 2012, 6:12 AM

Post #10 of 11 (142 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, 2012-05-25 at 13:52 +0200, Peter Zijlstra wrote:
> On Wed, 2012-05-23 at 15:32 +0200, Dario Faggioli wrote:
> > Again, I really think it is the intended behaviour, and yes, real-time
> > group scheduling "breaks" the POSIX specification of the SCHED_{FIFO,RR}
> > policies intentionally (and _proudly_, as Peter would say it, am I
> > wrong? :-P).
>
> No, cgroups are well outside of POSIX ;-) as is SMP in fact.

That's because the POSIX standards committee is still struggling to come
up with standardized SMP calls to handle NR_CPUS = 0

Isn't Paul on that committee? ;-)

-- Steve




paulmck at linux

May 25, 2012, 10:55 AM

Post #11 of 11 (136 views)
Re: [PATCH] sched/rt: fix SCHED_RR across cgroups

On Fri, May 25, 2012 at 09:12:06AM -0400, Steven Rostedt wrote:
> On Fri, 2012-05-25 at 13:52 +0200, Peter Zijlstra wrote:
> > On Wed, 2012-05-23 at 15:32 +0200, Dario Faggioli wrote:
> > > Again, I really think it is the intended behaviour, and yes, real-time
> > > group scheduling "breaks" the POSIX specification of the SCHED_{FIFO,RR}
> > > policies intentionally (and _proudly_, as Peter would say it, am I
> > > wrong? :-P).
> >
> > No, cgroups are well outside of POSIX ;-) as is SMP in fact.
>
> That's because the POSIX standards committee is still struggling to come
> up with standardized SMP calls to handle NR_CPUS = 0

;-) ;-) ;-)

> Isn't Paul on that committee? ;-)

I have met with them occasionally, but have spent most of my time on
the C/C++ committees. I didn't try sounding them out on NR_CPUS=0,
partly because they were choking pretty hard on NR_CPUS=4096.

At least part of the problem is that every OS out there has different
SMP features, so the only way it would be possible to get this sort of
thing through the committee would be to invent something that was
roughly equally incompatible with everyone.

However, there are some SMP features standardized by various random
committees and aggregated by The Open Group, including:

o The pthread_mutex_lock() API
o pthread_getspecific() and pthread_setspecific()
o pthread_getconcurrency() and pthread_setconcurrency()

But yes, even the aggregated standard is quite limiting.

Thanx, Paul

