Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] fix granularity of task_u/stime(), v2

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


seto.hidetoshi at jp

Nov 11, 2009, 8:33 PM

Post #1 of 16 (335 views)
Permalink
[PATCH] fix granularity of task_u/stime(), v2

Originally task_s/utime() were designed to return clock_t but later
changed to return cputime_t by following commit:

commit efe567fc8281661524ffa75477a7c4ca9b466c63
Author: Christian Borntraeger <borntraeger [at] de>
Date: Thu Aug 23 15:18:02 2007 +0200

It only changed the type of return value, but not the implementation.
As the result the granularity of task_s/utime() is still that of
clock_t, not that of cputime_t.

So using task_s/utime() in __exit_signal() makes values accumulated
to the signal struct to be rounded and coarse grained.

This patch removes casts to clock_t in task_u/stime(), to keep
granularity of cputime_t over the calculation.

v2:
Use div_u64() to avoid error "undefined reference to `__udivdi3`"
on some 32bit systems.

Signed-off-by: Hidetoshi Seto <seto.hidetoshi [at] jp>
---
kernel/sched.c | 22 +++++++++++++---------
1 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/kernel/sched.c b/kernel/sched.c
index 3c11ae0..1f8d028 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5172,41 +5172,45 @@ cputime_t task_stime(struct task_struct *p)
return p->stime;
}
#else
+
+#ifndef nsecs_to_cputime
+# define nsecs_to_cputime(__nsecs) \
+ msecs_to_cputime(div_u64((__nsecs), NSEC_PER_MSEC))
+#endif
+
cputime_t task_utime(struct task_struct *p)
{
- clock_t utime = cputime_to_clock_t(p->utime),
- total = utime + cputime_to_clock_t(p->stime);
+ cputime_t utime = p->utime, total = utime + p->stime;
u64 temp;

/*
* Use CFS's precise accounting:
*/
- temp = (u64)nsec_to_clock_t(p->se.sum_exec_runtime);
+ temp = (u64)nsecs_to_cputime(p->se.sum_exec_runtime);

if (total) {
temp *= utime;
do_div(temp, total);
}
- utime = (clock_t)temp;
+ utime = (cputime_t)temp;

- p->prev_utime = max(p->prev_utime, clock_t_to_cputime(utime));
+ p->prev_utime = max(p->prev_utime, utime);
return p->prev_utime;
}

cputime_t task_stime(struct task_struct *p)
{
- clock_t stime;
+ cputime_t stime;

/*
* Use CFS's precise accounting. (we subtract utime from
* the total, to make sure the total observed by userspace
* grows monotonically - apps rely on that):
*/
- stime = nsec_to_clock_t(p->se.sum_exec_runtime) -
- cputime_to_clock_t(task_utime(p));
+ stime = nsecs_to_cputime(p->se.sum_exec_runtime) - task_utime(p);

if (stime >= 0)
- p->prev_stime = max(p->prev_stime, clock_t_to_cputime(stime));
+ p->prev_stime = max(p->prev_stime, stime);

return p->prev_stime;
}
--
1.6.5.2


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Nov 12, 2009, 6:15 AM

Post #2 of 16 (316 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Thu, 2009-11-12 at 13:33 +0900, Hidetoshi Seto wrote:
> Originally task_s/utime() were designed to return clock_t but later
> changed to return cputime_t by following commit:
>
> commit efe567fc8281661524ffa75477a7c4ca9b466c63
> Author: Christian Borntraeger <borntraeger [at] de>
> Date: Thu Aug 23 15:18:02 2007 +0200
>
> It only changed the type of return value, but not the implementation.
> As the result the granularity of task_s/utime() is still that of
> clock_t, not that of cputime_t.
>
> So using task_s/utime() in __exit_signal() makes values accumulated
> to the signal struct to be rounded and coarse grained.
>
> This patch removes casts to clock_t in task_u/stime(), to keep
> granularity of cputime_t over the calculation.
>
> v2:
> Use div_u64() to avoid error "undefined reference to `__udivdi3`"
> on some 32bit systems.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi [at] jp>

/me hates all the cputime_t and clock_t mess.. but I guess the patch is
good.

Thanks.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 12, 2009, 6:49 AM

Post #3 of 16 (318 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

Hi

On Thu, Nov 12, 2009 at 01:33:45PM +0900, Hidetoshi Seto wrote:
> Originally task_s/utime() were designed to return clock_t but later
> changed to return cputime_t by following commit:
>
> commit efe567fc8281661524ffa75477a7c4ca9b466c63
> Author: Christian Borntraeger <borntraeger [at] de>
> Date: Thu Aug 23 15:18:02 2007 +0200
>
> It only changed the type of return value, but not the implementation.
> As the result the granularity of task_s/utime() is still that of
> clock_t, not that of cputime_t.
>
> So using task_s/utime() in __exit_signal() makes values accumulated
> to the signal struct to be rounded and coarse grained.
>
> This patch removes casts to clock_t in task_u/stime(), to keep
> granularity of cputime_t over the calculation.
>
> v2:
> Use div_u64() to avoid error "undefined reference to `__udivdi3`"
> on some 32bit systems.
>
> Signed-off-by: Hidetoshi Seto <seto.hidetoshi [at] jp>
> ---
> kernel/sched.c | 22 +++++++++++++---------
> 1 files changed, 13 insertions(+), 9 deletions(-)

Patch not fix the issue on my system. I test it alone, together with

posix-cpu-timers: avoid do_sys_times() races with __exit_signal(

and (further) together with

--- a/kernel/posix-cpu-timers.c
+++ b/kernel/posix-cpu-timers.c
@@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk,
struct task_cputime *times)

t = tsk;
do {
- times->utime = cputime_add(times->utime, t->utime);
- times->stime = cputime_add(times->stime, t->stime);
+ times->utime = cputime_add(times->utime, task_utime(t));
+ times->stime = cputime_add(times->stime, task_stime(t));
times->sum_exec_runtime += t->se.sum_exec_runtime;

t = next_thread(t);

What only changed was probability to enter the issue. I can not reproduce
the bug with below patch:

diff --git a/kernel/exit.c b/kernel/exit.c
index f7864ac..b85e384 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -110,8 +110,8 @@ static void __exit_signal(struct task_struct *tsk)
* We won't ever get here for the group leader, since it
* will have been the last reference on the signal_struct.
*/
- sig->utime = cputime_add(sig->utime, task_utime(tsk));
- sig->stime = cputime_add(sig->stime, task_stime(tsk));
+ sig->utime = cputime_add(sig->utime, tsk->utime); //task_utime(tsk));
+ sig->stime = cputime_add(sig->stime, tsk->stime); //task_stime(tsk));
sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
sig->min_flt += tsk->min_flt;
sig->maj_flt += tsk->maj_flt;
diff --git a/kernel/sys.c b/kernel/sys.c
index ce17760..8be5b75 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
struct task_cputime cputime;
cputime_t cutime, cstime;

- thread_group_cputime(current, &cputime);
spin_lock_irq(&current->sighand->siglock);
+ thread_group_cputime(current, &cputime);
cutime = current->signal->cutime;
cstime = current->signal->cstime;
spin_unlock_irq(&current->sighand->siglock);

Perhaps we can remove task_{u,s}time() in some places or maybe at whole ?

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Nov 12, 2009, 7:00 AM

Post #4 of 16 (316 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Thu, 2009-11-12 at 15:49 +0100, Stanislaw Gruszka wrote:
> I can not reproduce
> the bug with below patch:
>
> diff --git a/kernel/exit.c b/kernel/exit.c
> index f7864ac..b85e384 100644
> --- a/kernel/exit.c
> +++ b/kernel/exit.c
> @@ -110,8 +110,8 @@ static void __exit_signal(struct task_struct *tsk)
> * We won't ever get here for the group leader, since it
> * will have been the last reference on the signal_struct.
> */
> - sig->utime = cputime_add(sig->utime, task_utime(tsk));
> - sig->stime = cputime_add(sig->stime, task_stime(tsk));
> + sig->utime = cputime_add(sig->utime, tsk->utime); //task_utime(tsk));
> + sig->stime = cputime_add(sig->stime, tsk->stime); //task_stime(tsk));
> sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
> sig->min_flt += tsk->min_flt;
> sig->maj_flt += tsk->maj_flt;

Yes, this is the thing I suggested and makes sense.

> diff --git a/kernel/sys.c b/kernel/sys.c
> index ce17760..8be5b75 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
> struct task_cputime cputime;
> cputime_t cutime, cstime;
>
> - thread_group_cputime(current, &cputime);
> spin_lock_irq(&current->sighand->siglock);
> + thread_group_cputime(current, &cputime);
> cutime = current->signal->cutime;
> cstime = current->signal->cstime;
> spin_unlock_irq(&current->sighand->siglock);
>
> Perhaps we can remove task_{u,s}time() in some places or maybe at whole ?

I think task_[us]time() still has value for getrusage() since it avoids
the task hiding from top by never running during the tick crap.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 12, 2009, 7:40 AM

Post #5 of 16 (317 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Thu, Nov 12, 2009 at 04:00:38PM +0100, Peter Zijlstra wrote:
> On Thu, 2009-11-12 at 15:49 +0100, Stanislaw Gruszka wrote:
> > I can not reproduce
> > the bug with below patch:
> >
> > diff --git a/kernel/exit.c b/kernel/exit.c
> > index f7864ac..b85e384 100644
> > --- a/kernel/exit.c
> > +++ b/kernel/exit.c
> > @@ -110,8 +110,8 @@ static void __exit_signal(struct task_struct *tsk)
> > * We won't ever get here for the group leader, since it
> > * will have been the last reference on the signal_struct.
> > */
> > - sig->utime = cputime_add(sig->utime, task_utime(tsk));
> > - sig->stime = cputime_add(sig->stime, task_stime(tsk));
> > + sig->utime = cputime_add(sig->utime, tsk->utime); //task_utime(tsk));
> > + sig->stime = cputime_add(sig->stime, tsk->stime); //task_stime(tsk));
> > sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
> > sig->min_flt += tsk->min_flt;
> > sig->maj_flt += tsk->maj_flt;
>
> Yes, this is the thing I suggested and makes sense.

Ok, I'm going to post this.

Spencer,

seems you have more test cases for utime decreasing issues,
could you send links to me ? Somehow I could not find them
by my own. Particularly test case used in development this commit
is interested:

commit 49048622eae698e5c4ae61f7e71200f265ccc529
Author: Balbir Singh <balbir [at] linux>
Date: Fri Sep 5 18:12:23 2008 +0200
sched: fix process time monotonicity

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


spencer at bluehost

Nov 16, 2009, 11:32 AM

Post #6 of 16 (316 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

Stanislaw Gruszka wrote:
> Ok, I'm going to post this.
>
> Spencer,
>
> seems you have more test cases for utime decreasing issues,
> could you send links to me ? Somehow I could not find them
> by my own. Particularly test case used in development this commit
> is interested:
>
> commit 49048622eae698e5c4ae61f7e71200f265ccc529
> Author: Balbir Singh <balbir [at] linux>
> Date: Fri Sep 5 18:12:23 2008 +0200
> sched: fix process time monotonicity

I had originally noticed that in a production web server, so my test
case was designed to mirror what I was seeing there, which was just
running apache with worker mpm, and running a simple apache bench while
watching the utime/stime of the apache children. Unfortunately that
method was not terribly reliable at reproducing the issue, which is why
I felt it necessary to try to come up with a better test case this time
around.

- Spencer
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 17, 2009, 5:08 AM

Post #7 of 16 (312 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Mon, Nov 16, 2009 at 12:32:43PM -0700, Spencer Candland wrote:
> > seems you have more test cases for utime decreasing issues,
> > could you send links to me ? Somehow I could not find them
> > by my own. Particularly test case used in development this commit
> > is interested:
> >
> > commit 49048622eae698e5c4ae61f7e71200f265ccc529
> > Author: Balbir Singh <balbir [at] linux>
> > Date: Fri Sep 5 18:12:23 2008 +0200
> > sched: fix process time monotonicity
>
> I had originally noticed that in a production web server, so my test
> case was designed to mirror what I was seeing there, which was just
> running apache with worker mpm, and running a simple apache bench while
> watching the utime/stime of the apache children. Unfortunately that
> method was not terribly reliable at reproducing the issue, which is why
> I felt it necessary to try to come up with a better test case this time
> around.

No wonder I could not find anything on google and in mailing list
archives :)

Seems issue reported then was exactly the same as reported now by
you. Looks like commit 49048622eae698e5c4ae61f7e71200f265ccc529 just
make probability of bug smaller and you did not note it until now.

Could you please test this patch, if it solve all utime decrease
problems for you:

http://patchwork.kernel.org/patch/59795/

If you confirm it work, I think we should apply it. Otherwise
we need to go to propagate task_{u,s}time everywhere, which is not
(my) preferred solution.

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peterz at infradead

Nov 17, 2009, 5:24 AM

Post #8 of 16 (310 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Tue, 2009-11-17 at 14:08 +0100, Stanislaw Gruszka wrote:
> On Mon, Nov 16, 2009 at 12:32:43PM -0700, Spencer Candland wrote:
> > > seems you have more test cases for utime decreasing issues,
> > > could you send links to me ? Somehow I could not find them
> > > by my own. Particularly test case used in development this commit
> > > is interested:
> > >
> > > commit 49048622eae698e5c4ae61f7e71200f265ccc529
> > > Author: Balbir Singh <balbir [at] linux>
> > > Date: Fri Sep 5 18:12:23 2008 +0200
> > > sched: fix process time monotonicity
> >
> > I had originally noticed that in a production web server, so my test
> > case was designed to mirror what I was seeing there, which was just
> > running apache with worker mpm, and running a simple apache bench while
> > watching the utime/stime of the apache children. Unfortunately that
> > method was not terribly reliable at reproducing the issue, which is why
> > I felt it necessary to try to come up with a better test case this time
> > around.
>
> No wonder I could not find anything on google and in mailing list
> archives :)
>
> Seems issue reported then was exactly the same as reported now by
> you. Looks like commit 49048622eae698e5c4ae61f7e71200f265ccc529 just
> make probability of bug smaller and you did not note it until now.
>
> Could you please test this patch, if it solve all utime decrease
> problems for you:
>
> http://patchwork.kernel.org/patch/59795/
>
> If you confirm it work, I think we should apply it. Otherwise
> we need to go to propagate task_{u,s}time everywhere, which is not
> (my) preferred solution.

That patch will create another issue, it will allow a process to hide
from top by arranging to never run when the tick hits.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


spencer at bluehost

Nov 18, 2009, 2:38 PM

Post #9 of 16 (299 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

Stanislaw Gruszka wrote:
> Could you please test this patch, if it solve all utime decrease
> problems for you:
>
> http://patchwork.kernel.org/patch/59795/

Yes, this seems to solve the problem.

- Spencer
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 19, 2009, 10:17 AM

Post #10 of 16 (289 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Tue, Nov 17, 2009 at 02:24:48PM +0100, Peter Zijlstra wrote:
> > Seems issue reported then was exactly the same as reported now by
> > you. Looks like commit 49048622eae698e5c4ae61f7e71200f265ccc529 just
> > make probability of bug smaller and you did not note it until now.
> >
> > Could you please test this patch, if it solve all utime decrease
> > problems for you:
> >
> > http://patchwork.kernel.org/patch/59795/
> >
> > If you confirm it work, I think we should apply it. Otherwise
> > we need to go to propagate task_{u,s}time everywhere, which is not
> > (my) preferred solution.
>
> That patch will create another issue, it will allow a process to hide
> from top by arranging to never run when the tick hits.

What about that?

diff --git a/kernel/sched.c b/kernel/sched.c
index 1f8d028..9db1cbc 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5194,7 +5194,7 @@ cputime_t task_utime(struct task_struct *p)
}
utime = (cputime_t)temp;

- p->prev_utime = max(p->prev_utime, utime);
+ p->prev_utime = max(p->prev_utime, max(p->utime, utime));
return p->prev_utime;
}

diff --git a/kernel/sys.c b/kernel/sys.c
index ce17760..8be5b75 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
struct task_cputime cputime;
cputime_t cutime, cstime;

- thread_group_cputime(current, &cputime);
spin_lock_irq(&current->sighand->siglock);
+ thread_group_cputime(current, &cputime);
cutime = current->signal->cutime;
cstime = current->signal->cstime;
spin_unlock_irq(&current->sighand->siglock);

It's on top of Hidetoshi patch and fix utime decrease problem
on my system.

Are we not doing something nasty here?

cputime_t utime = p->utime, total = utime + p->stime;
u64 temp;

/*
* Use CFS's precise accounting:
*/
temp = (u64)nsecs_to_cputime(p->se.sum_exec_runtime);

if (total) {
temp *= utime;
do_div(temp, total);
}
utime = (cputime_t)temp;

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


seto.hidetoshi at jp

Nov 19, 2009, 6:00 PM

Post #11 of 16 (291 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

Stanislaw Gruszka wrote:
> On Tue, Nov 17, 2009 at 02:24:48PM +0100, Peter Zijlstra wrote:
>>> Seems issue reported then was exactly the same as reported now by
>>> you. Looks like commit 49048622eae698e5c4ae61f7e71200f265ccc529 just
>>> make probability of bug smaller and you did not note it until now.
>>>
>>> Could you please test this patch, if it solve all utime decrease
>>> problems for you:
>>>
>>> http://patchwork.kernel.org/patch/59795/
>>>
>>> If you confirm it work, I think we should apply it. Otherwise
>>> we need to go to propagate task_{u,s}time everywhere, which is not
>>> (my) preferred solution.
>> That patch will create another issue, it will allow a process to hide
>> from top by arranging to never run when the tick hits.
>

Yes, nowadays there are many threads on high speed hardware,
such process can exist all around, easier than before.

E.g. assume that there are 2 tasks:

Task A: interrupted by timer few times
(utime, stime, se.sum_sched_runtime) = (50, 50, 1000000000)

Task B: interrupted by timer many times
(utime, stime, se.sum_sched_runtime) = (50, 50, 10000000)

You can see task_[su]time() works well for these tasks.

> What about that?
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 1f8d028..9db1cbc 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -5194,7 +5194,7 @@ cputime_t task_utime(struct task_struct *p)
> }
> utime = (cputime_t)temp;
>
> - p->prev_utime = max(p->prev_utime, utime);
> + p->prev_utime = max(p->prev_utime, max(p->utime, utime));
> return p->prev_utime;
> }

I think this makes things worse.

without this patch:
Task A prev_utime: 500 ms (= accurate)
Task B prev_utime: 5 ms (= accurate)
with this patch:
Task A prev_utime: 500 ms (= accurate)
Task B prev_utime: 50 ms (= not accurate)

Note that task_stime() calculates prev_stime using this prev_utime:

without this patch:
Task A prev_stime: 500 ms (= accurate)
Task B prev_stime: 5 ms (= not accurate)
with this patch:
Task A prev_stime: 500 ms (= accurate)
Task B prev_stime: 0 ms (= not accurate)

>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index ce17760..8be5b75 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
> struct task_cputime cputime;
> cputime_t cutime, cstime;
>
> - thread_group_cputime(current, &cputime);
> spin_lock_irq(&current->sighand->siglock);
> + thread_group_cputime(current, &cputime);
> cutime = current->signal->cutime;
> cstime = current->signal->cstime;
> spin_unlock_irq(&current->sighand->siglock);
>
> It's on top of Hidetoshi patch and fix utime decrease problem
> on my system.

How about the stime decrease problem which can be caused by same
logic?

According to my labeling, there are 2 unresolved problem [1]
"thread_group_cputime() vs exit" and [2] "use of task_s/utime()".

Still I believe the real fix for this problem is combination of
above fix for do_sys_times() (for problem[1]) and (I know it is
not preferred, but for [2]) the following:

>> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
>> >> index 5c9dc22..e065b8a 100644
>> >> --- a/kernel/posix-cpu-timers.c
>> >> +++ b/kernel/posix-cpu-timers.c
>> >> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>> >>
>> >> t = tsk;
>> >> do {
>> >> - times->utime = cputime_add(times->utime, t->utime);
>> >> - times->stime = cputime_add(times->stime, t->stime);
>> >> + times->utime = cputime_add(times->utime, task_utime(t));
>> >> + times->stime = cputime_add(times->stime, task_stime(t));
>> >> times->sum_exec_runtime += t->se.sum_exec_runtime;
>> >>
>> >> t = next_thread(t);

Think about this diff, assuming task C is in same group of task A and B:

sys_times() on C while A and B are living returns:
(utime, stime)
= task_[su]time(C) + ([su]time(A)+[su]time(B)+...) + in_signal(exited)
= task_[su]time(C) + ( (50,50) + (50,50) +...) + in_signal(exited)
If A exited, it increases:
(utime, stime)
= task_[su]time(C) + ([su]time(B)+...) + in_signal(exited)+task_[su]time(A)
= task_[su]time(C) + ( (50,50) +...) + in_signal(exited)+(500,500)
Otherwise if B exited, it decreases:
(utime, stime)
= task_[su]time(C) + ([su]time(A)+...) + in_signal(exited)+task_[su]time(B)
= task_[su]time(C) + ( (50,50) +...) + in_signal(exited)+(5,5)

With this fix, sys_times() returns:
(utime, stime)
= task_[su]time(C) + (task_[su]time(A)+task_[su]time(B)+...) + in_signal(exited)
= task_[su]time(C) + ( (500,500) + (5,5) +...) + in_signal(exited)

> Are we not doing something nasty here?
>
> cputime_t utime = p->utime, total = utime + p->stime;
> u64 temp;
>
> /*
> * Use CFS's precise accounting:
> */
> temp = (u64)nsecs_to_cputime(p->se.sum_exec_runtime);
>
> if (total) {
> temp *= utime;
> do_div(temp, total);
> }
> utime = (cputime_t)temp;

Not here, but doing do_div() for each thread could be said nasty.
I mean
__task_[su]time(sum(A, B, ...))
would be better than:
sum(task_[su]time(A)+task_[su]time(B)+...)

However it would bring another issue, because
__task_[su]time(sum(A, B, ...))
might not equal to
__task_[su]time(sum(B, ...)) + task_[su]time(A)


Thanks,
H.Seto

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 23, 2009, 1:52 AM

Post #12 of 16 (281 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Thu, Nov 12, 2009 at 03:49:19PM +0100, Stanislaw Gruszka wrote:
> On Thu, Nov 12, 2009 at 01:33:45PM +0900, Hidetoshi Seto wrote:
> > Originally task_s/utime() were designed to return clock_t but later
> > changed to return cputime_t by following commit:
> >
> > commit efe567fc8281661524ffa75477a7c4ca9b466c63
> > Author: Christian Borntraeger <borntraeger [at] de>
> > Date: Thu Aug 23 15:18:02 2007 +0200
> >
> > It only changed the type of return value, but not the implementation.
> > As the result the granularity of task_s/utime() is still that of
> > clock_t, not that of cputime_t.
> >
> > So using task_s/utime() in __exit_signal() makes values accumulated
> > to the signal struct to be rounded and coarse grained.
> >
> > This patch removes casts to clock_t in task_u/stime(), to keep
> > granularity of cputime_t over the calculation.
> >
> > v2:
> > Use div_u64() to avoid error "undefined reference to `__udivdi3`"
> > on some 32bit systems.
> >
> > Signed-off-by: Hidetoshi Seto <seto.hidetoshi [at] jp>
> > ---
> > kernel/sched.c | 22 +++++++++++++---------
> > 1 files changed, 13 insertions(+), 9 deletions(-)
>
> Patch not fix the issue on my system. I test it alone, together with
>
> posix-cpu-timers: avoid do_sys_times() races with __exit_signal(
>
> and (further) together with
>
> --- a/kernel/posix-cpu-timers.c
> +++ b/kernel/posix-cpu-timers.c
> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk,
> struct task_cputime *times)
>
> t = tsk;
> do {
> - times->utime = cputime_add(times->utime, t->utime);
> - times->stime = cputime_add(times->stime, t->stime);
> + times->utime = cputime_add(times->utime, task_utime(t));
> + times->stime = cputime_add(times->stime, task_stime(t));
> times->sum_exec_runtime += t->se.sum_exec_runtime;
>
> t = next_thread(t);
>
> What only changed was probability to enter the issue.

I was wrong here, that combination fix the problem on my system. I don't
know how I was testing it before, perhaps I booted wrong kernel.

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 23, 2009, 2:09 AM

Post #13 of 16 (287 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Fri, Nov 20, 2009 at 11:00:21AM +0900, Hidetoshi Seto wrote:
> >>> Could you please test this patch, if it solve all utime decrease
> >>> problems for you:
> >>>
> >>> http://patchwork.kernel.org/patch/59795/
> >>>
> >>> If you confirm it work, I think we should apply it. Otherwise
> >>> we need to go to propagate task_{u,s}time everywhere, which is not
> >>> (my) preferred solution.
> >> That patch will create another issue, it will allow a process to hide
> >> from top by arranging to never run when the tick hits.
> >
>
> Yes, nowadays there are many threads on high speed hardware,
> such process can exist all around, easier than before.
>
> E.g. assume that there are 2 tasks:
>
> Task A: interrupted by timer few times
> (utime, stime, se.sum_sched_runtime) = (50, 50, 1000000000)
> => total of runtime is 1 sec, but utime + stime is 100 ms
>
> Task B: interrupted by timer many times
> (utime, stime, se.sum_sched_runtime) = (50, 50, 10000000)
> => total of runtime is 10 ms, but utime + stime is 100 ms

How tis is probable, that task is running very long, but not getting
the ticks ? I know this is possible, otherwise we will not see utime
decreasing after do_sys_times() siglock fix, but how probable?

> You can see task_[su]time() works well for these tasks.
>
> > What about that?
> >
> > diff --git a/kernel/sched.c b/kernel/sched.c
> > index 1f8d028..9db1cbc 100644
> > --- a/kernel/sched.c
> > +++ b/kernel/sched.c
> > @@ -5194,7 +5194,7 @@ cputime_t task_utime(struct task_struct *p)
> > }
> > utime = (cputime_t)temp;
> >
> > - p->prev_utime = max(p->prev_utime, utime);
> > + p->prev_utime = max(p->prev_utime, max(p->utime, utime));
> > return p->prev_utime;
> > }
>
> I think this makes things worse.
>
> without this patch:
> Task A prev_utime: 500 ms (= accurate)
> Task B prev_utime: 5 ms (= accurate)
> with this patch:
> Task A prev_utime: 500 ms (= accurate)
> Task B prev_utime: 50 ms (= not accurate)
>
> Note that task_stime() calculates prev_stime using this prev_utime:
>
> without this patch:
> Task A prev_stime: 500 ms (= accurate)
> Task B prev_stime: 5 ms (= not accurate)
> with this patch:
> Task A prev_stime: 500 ms (= accurate)
> Task B prev_stime: 0 ms (= not accurate)
>
> >
> > diff --git a/kernel/sys.c b/kernel/sys.c
> > index ce17760..8be5b75 100644
> > --- a/kernel/sys.c
> > +++ b/kernel/sys.c
> > @@ -914,8 +914,8 @@ void do_sys_times(struct tms *tms)
> > struct task_cputime cputime;
> > cputime_t cutime, cstime;
> >
> > - thread_group_cputime(current, &cputime);
> > spin_lock_irq(&current->sighand->siglock);
> > + thread_group_cputime(current, &cputime);
> > cutime = current->signal->cutime;
> > cstime = current->signal->cstime;
> > spin_unlock_irq(&current->sighand->siglock);
> >
> > It's on top of Hidetoshi patch and fix utime decrease problem
> > on my system.
>
> How about the stime decrease problem which can be caused by same
> logic?

Yes, above patch screw up stime. Below should be a bit better, but
not solve objections you have:

diff --git a/kernel/exit.c b/kernel/exit.c
index f7864ac..17491ad 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -91,6 +91,8 @@ static void __exit_signal(struct task_struct *tsk)
if (atomic_dec_and_test(&sig->count))
posix_cpu_timers_exit_group(tsk);
else {
+ cputime_t utime, stime;
+
/*
* If there is any task waiting for the group exit
* then notify it:
@@ -110,8 +112,16 @@ static void __exit_signal(struct task_struct *tsk)
* We won't ever get here for the group leader, since it
* will have been the last reference on the signal_struct.
*/
- sig->utime = cputime_add(sig->utime, task_utime(tsk));
- sig->stime = cputime_add(sig->stime, task_stime(tsk));
+
+ utime = task_utime(tsk);
+ stime = task_stime(tsk);
+ if (tsk->utime > utime || tsk->stime > stime) {
+ utime = tsk->utime;
+ stime = tsk->stime;
+ }
+
+ sig->utime = cputime_add(sig->utime, utime);
+ sig->stime = cputime_add(sig->stime, stime);
sig->gtime = cputime_add(sig->gtime, task_gtime(tsk));
sig->min_flt += tsk->min_flt;
sig->maj_flt += tsk->maj_flt;

> According to my labeling, there are 2 unresolved problem [1]
> "thread_group_cputime() vs exit" and [2] "use of task_s/utime()".
>
> Still I believe the real fix for this problem is combination of
> above fix for do_sys_times() (for problem[1]) and (I know it is
> not preferred, but for [2]) the following:
>
> >> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
> >> >> index 5c9dc22..e065b8a 100644
> >> >> --- a/kernel/posix-cpu-timers.c
> >> >> +++ b/kernel/posix-cpu-timers.c
> >> >> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
> >> >>
> >> >> t = tsk;
> >> >> do {
> >> >> - times->utime = cputime_add(times->utime, t->utime);
> >> >> - times->stime = cputime_add(times->stime, t->stime);
> >> >> + times->utime = cputime_add(times->utime, task_utime(t));
> >> >> + times->stime = cputime_add(times->stime, task_stime(t));
> >> >> times->sum_exec_runtime += t->se.sum_exec_runtime;
> >> >>
> >> >> t = next_thread(t);
>

That works for me and I agree that this is right fix. Peter had concerns
about p->prev_utime races and additional need for further propagation of
task_{s,u}time() to posix-cpu-timers code. However I do not understand
these problems.

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


balbir at linux

Nov 23, 2009, 2:25 AM

Post #14 of 16 (278 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

* Stanislaw Gruszka <sgruszka [at] redhat> [2009-11-23 11:09:26]:
[snip]
> +
> + utime = task_utime(tsk);
> + stime = task_stime(tsk);
> + if (tsk->utime > utime || tsk->stime > stime) {
> + utime = tsk->utime;
> + stime = tsk->stime;
> + }

Won't this condition be problematic, since it can reset stime
when tsk->utime > utime for example?
--
Balbir
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


sgruszka at redhat

Nov 23, 2009, 2:46 AM

Post #15 of 16 (279 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

On Mon, 23 Nov 2009 15:55:28 +0530
Balbir Singh <balbir [at] linux> wrote:

> * Stanislaw Gruszka <sgruszka [at] redhat> [2009-11-23 11:09:26]:
> [snip]
> > +
> > + utime = task_utime(tsk);
> > + stime = task_stime(tsk);
> > + if (tsk->utime > utime || tsk->stime > stime) {
> > + utime = tsk->utime;
> > + stime = tsk->stime;
> > + }
>
> Won't this condition be problematic, since it can reset stime
> when tsk->utime > utime for example?

We use both {u,s}time adjusted values or both not adjusted values.
This seems to be right to me. Adjusting not help anyway, if we have
stime/utime ticks disproportion, e.g. task only gets stime ticks but
not gets utime ticks, but it runs in user and kernel space in
the same amount of time.

If this solution whold be accepted, patch will contain additional
comment.

Stanislaw
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


seto.hidetoshi at jp

Nov 23, 2009, 9:33 PM

Post #16 of 16 (278 views)
Permalink
Re: [PATCH] fix granularity of task_u/stime(), v2 [In reply to]

Stanislaw Gruszka wrote:
> On Fri, Nov 20, 2009 at 11:00:21AM +0900, Hidetoshi Seto wrote:
>> E.g. assume that there are 2 tasks:
>>
>> Task A: interrupted by timer few times
>> (utime, stime, se.sum_sched_runtime) = (50, 50, 1000000000)
>> => total of runtime is 1 sec, but utime + stime is 100 ms
>>
>> Task B: interrupted by timer many times
>> (utime, stime, se.sum_sched_runtime) = (50, 50, 10000000)
>> => total of runtime is 10 ms, but utime + stime is 100 ms
>
> How tis is probable, that task is running very long, but not getting
> the ticks ? I know this is possible, otherwise we will not see utime
> decreasing after do_sys_times() siglock fix, but how probable?

For example, assume a task like watchdog that calls sleep soon after
its work. Such task will be woken up by a timer interrupt on other
task and queued to run queue. Once it get a cpu it can finish its
work before next tick. So it can run long without getting any ticks
on it. I suppose you can find such tasks in monitoring tool which
contains sampling threads that behaves like watchdog.

As the side effect, since such tasks tend to use cpu between tick
period, they make other tasks to more likely be interrupted by ticks.

>>>> diff --git a/kernel/posix-cpu-timers.c b/kernel/posix-cpu-timers.c
>>>>>> index 5c9dc22..e065b8a 100644
>>>>>> --- a/kernel/posix-cpu-timers.c
>>>>>> +++ b/kernel/posix-cpu-timers.c
>>>>>> @@ -248,8 +248,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>>>>>>
>>>>>> t = tsk;
>>>>>> do {
>>>>>> - times->utime = cputime_add(times->utime, t->utime);
>>>>>> - times->stime = cputime_add(times->stime, t->stime);
>>>>>> + times->utime = cputime_add(times->utime, task_utime(t));
>>>>>> + times->stime = cputime_add(times->stime, task_stime(t));
>>>>>> times->sum_exec_runtime += t->se.sum_exec_runtime;
>>>>>>
>>>>>> t = next_thread(t);
>
> That works for me and I agree that this is right fix. Peter had concerns
> about p->prev_utime races and additional need for further propagation of
> task_{s,u}time() to posix-cpu-timers code. However I do not understand
> these problems.

I think that one of our concerns is the cost of task_{s,u}time(), which
might bring other problem if they are propagated. But I found we can reduce
the cost (about the half, or more), that is why I posted task_times() patch
in other thread in LKML.


Thanks,
H.Seto

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.