Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

[PATCH] slow_work_thread() should do the exclusive wait

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


oleg at redhat

Apr 13, 2009, 11:17 AM

Post #1 of 13 (367 views)
Permalink
[PATCH] slow_work_thread() should do the exclusive wait

slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up
all kslowd threads. Afaics this is not what we want, change slow_work_thread()
to use prepare_to_wait_exclusive().

Signed-off-by: Oleg Nesterov <oleg [at] redhat>

--- 6.30/kernel/slow-work.c~1_SW_EXCLUSIVE 2009-04-06 00:03:42.000000000 +0200
+++ 6.30/kernel/slow-work.c 2009-04-13 19:40:20.000000000 +0200
@@ -372,8 +372,8 @@ static int slow_work_thread(void *_data)
vsmax *= atomic_read(&slow_work_thread_count);
vsmax /= 100;

- prepare_to_wait(&slow_work_thread_wq, &wait,
- TASK_INTERRUPTIBLE);
+ prepare_to_wait_exclusive(&slow_work_thread_wq, &wait,
+ TASK_INTERRUPTIBLE);
if (!freezing(current) &&
!slow_work_threads_should_exit &&
!slow_work_available(vsmax) &&

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Trond.Myklebust at netapp

Apr 13, 2009, 12:03 PM

Post #2 of 13 (345 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On Mon, 2009-04-13 at 20:17 +0200, Oleg Nesterov wrote:
> slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
> this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up
> all kslowd threads. Afaics this is not what we want, change slow_work_thread()
> to use prepare_to_wait_exclusive().
>
> Signed-off-by: Oleg Nesterov <oleg [at] redhat>
>
> --- 6.30/kernel/slow-work.c~1_SW_EXCLUSIVE 2009-04-06 00:03:42.000000000 +0200
> +++ 6.30/kernel/slow-work.c 2009-04-13 19:40:20.000000000 +0200
> @@ -372,8 +372,8 @@ static int slow_work_thread(void *_data)
> vsmax *= atomic_read(&slow_work_thread_count);
> vsmax /= 100;
>
> - prepare_to_wait(&slow_work_thread_wq, &wait,
> - TASK_INTERRUPTIBLE);
> + prepare_to_wait_exclusive(&slow_work_thread_wq, &wait,
> + TASK_INTERRUPTIBLE);
> if (!freezing(current) &&
> !slow_work_threads_should_exit &&
> !slow_work_available(vsmax) &&
>

Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
in the enclosing for(;;) loop that checks for or handles signals...

Cheers
Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust [at] netapp
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


oleg at redhat

Apr 13, 2009, 12:14 PM

Post #3 of 13 (343 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On 04/13, Trond Myklebust wrote:
>
> On Mon, 2009-04-13 at 20:17 +0200, Oleg Nesterov wrote:
> > slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
> > this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up
> > all kslowd threads. Afaics this is not what we want, change slow_work_thread()
> > to use prepare_to_wait_exclusive().
> >
> > Signed-off-by: Oleg Nesterov <oleg [at] redhat>
> >
> > --- 6.30/kernel/slow-work.c~1_SW_EXCLUSIVE 2009-04-06 00:03:42.000000000 +0200
> > +++ 6.30/kernel/slow-work.c 2009-04-13 19:40:20.000000000 +0200
> > @@ -372,8 +372,8 @@ static int slow_work_thread(void *_data)
> > vsmax *= atomic_read(&slow_work_thread_count);
> > vsmax /= 100;
> >
> > - prepare_to_wait(&slow_work_thread_wq, &wait,
> > - TASK_INTERRUPTIBLE);
> > + prepare_to_wait_exclusive(&slow_work_thread_wq, &wait,
> > + TASK_INTERRUPTIBLE);
> > if (!freezing(current) &&
> > !slow_work_threads_should_exit &&
> > !slow_work_available(vsmax) &&
> >
>
> Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> in the enclosing for(;;) loop that checks for or handles signals...

I guess TASK_INTERRUPTIBLE was chosen to not contribute to calc_load(),
nr_active() returns nr_running + nr_uninterruptible.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dhowells at redhat

Apr 13, 2009, 2:35 PM

Post #4 of 13 (335 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

Oleg Nesterov <oleg [at] redhat> wrote:

> slow_work_thread() sleeps on slow_work_thread_wq without WQ_FLAG_EXCLUSIVE,
> this means that slow_work_enqueue()->__wake_up(nr_exclusive => 1) wakes up
> all kslowd threads. Afaics this is not what we want, change slow_work_thread()
> to use prepare_to_wait_exclusive().

Hmmm... I think you may be right. I think I was assuming that wake_up()
would only wake up the first item on the queue, but that's not strictly what
it does...

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dhowells at redhat

Apr 13, 2009, 2:40 PM

Post #5 of 13 (333 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

Trond Myklebust <Trond.Myklebust [at] netapp> wrote:

> Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> in the enclosing for(;;) loop that checks for or handles signals...

If it were TASK_UNINTERRUPTIBLE, it would sit there in the D-state when not
doing anything. I must admit, I thought I was calling daemonize(), but that
seems to have got lost somewhere.

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


oleg at redhat

Apr 13, 2009, 2:48 PM

Post #6 of 13 (338 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On 04/13, David Howells wrote:
>
> Trond Myklebust <Trond.Myklebust [at] netapp> wrote:
>
> > Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> > in the enclosing for(;;) loop that checks for or handles signals...
>
> If it were TASK_UNINTERRUPTIBLE, it would sit there in the D-state when not
> doing anything. I must admit, I thought I was calling daemonize(), but that
> seems to have got lost somewhere.

daemonize() is not needed, kthread_create() creates the kernel thread which
ignores all signals. So it doesn't matter which state we use to sleep,
TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Trond.Myklebust at netapp

Apr 13, 2009, 2:57 PM

Post #7 of 13 (336 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On Mon, 2009-04-13 at 23:48 +0200, Oleg Nesterov wrote:
> On 04/13, David Howells wrote:
> >
> > Trond Myklebust <Trond.Myklebust [at] netapp> wrote:
> >
> > > Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> > > in the enclosing for(;;) loop that checks for or handles signals...
> >
> > If it were TASK_UNINTERRUPTIBLE, it would sit there in the D-state when not
> > doing anything. I must admit, I thought I was calling daemonize(), but that
> > seems to have got lost somewhere.
>
> daemonize() is not needed, kthread_create() creates the kernel thread which
> ignores all signals. So it doesn't matter which state we use to sleep,
> TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.

Yes, but that is precisely why it is cleaner to use
TASK_UNINTERRUPTIBLE. It documents the fact that signal handling isn't
needed (whether or not the thread is blocking them).

Trond
--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust [at] netapp
www.netapp.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


oleg at redhat

Apr 13, 2009, 3:24 PM

Post #8 of 13 (339 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On 04/13, Trond Myklebust wrote:
>
> On Mon, 2009-04-13 at 23:48 +0200, Oleg Nesterov wrote:
> > On 04/13, David Howells wrote:
> > >
> > > Trond Myklebust <Trond.Myklebust [at] netapp> wrote:
> > >
> > > > Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> > > > in the enclosing for(;;) loop that checks for or handles signals...
> > >
> > > If it were TASK_UNINTERRUPTIBLE, it would sit there in the D-state when not
> > > doing anything. I must admit, I thought I was calling daemonize(), but that
> > > seems to have got lost somewhere.
> >
> > daemonize() is not needed, kthread_create() creates the kernel thread which
> > ignores all signals. So it doesn't matter which state we use to sleep,
> > TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
>
> Yes, but that is precisely why it is cleaner to use
> TASK_UNINTERRUPTIBLE. It documents the fact that signal handling isn't
> needed (whether or not the thread is blocking them).

Agreed. But TASK_UNINTERRUPTIBLE can confuse a user which does
"cat /proc/loadavg" on the idle machine...

Note that, for example, worker_thread() uses TASK_INTERRUPTIBLE too, and I
think for the same reason.

I dunno.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


akpm at linux-foundation

Apr 15, 2009, 4:27 PM

Post #9 of 13 (316 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On Tue, 14 Apr 2009 00:24:51 +0200
Oleg Nesterov <oleg [at] redhat> wrote:

> On 04/13, Trond Myklebust wrote:
> >
> > On Mon, 2009-04-13 at 23:48 +0200, Oleg Nesterov wrote:
> > > On 04/13, David Howells wrote:
> > > >
> > > > Trond Myklebust <Trond.Myklebust [at] netapp> wrote:
> > > >
> > > > > Should that really be TASK_INTERRUPTIBLE? I don't see anything obvious
> > > > > in the enclosing for(;;) loop that checks for or handles signals...
> > > >
> > > > If it were TASK_UNINTERRUPTIBLE, it would sit there in the D-state when not
> > > > doing anything. I must admit, I thought I was calling daemonize(), but that
> > > > seems to have got lost somewhere.
> > >
> > > daemonize() is not needed, kthread_create() creates the kernel thread which
> > > ignores all signals. So it doesn't matter which state we use to sleep,
> > > TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.
> >
> > Yes, but that is precisely why it is cleaner to use
> > TASK_UNINTERRUPTIBLE. It documents the fact that signal handling isn't
> > needed (whether or not the thread is blocking them).
>
> Agreed. But TASK_UNINTERRUPTIBLE can confuse a user which does
> "cat /proc/loadavg" on the idle machine...
>
> Note that, for example, worker_thread() uses TASK_INTERRUPTIBLE too, and I
> think for the same reason.
>

Yup. It's a very common pattern for kernel threads to sleep in state
TASK_INTERRUPTIBLE. It is "well known" (lol) that kernel threads don't
accept signals, and having a kernel thread sleep in state
TASK_UNINTERRUPTIBLE will indeed contribute to load average and we get
distressed emails quite promptly when we do that.

The patch itself is a little worrisome. The wake-all semantics are
very good at covering up little race bugs. And switching to wake-once
is a great way of exposing hitherto-unsuspected races.

<looks for races>

Nothing immediately leaps out, but you know how these things are.

I wonder if slow_work_cull_timeout() should have some sort of barrier,
so the write is suitably visible to the woken thread. Bearing in mind
that the thread might _already_ have been woken by someone else?


off-topic: afacit the code will cull a maximum of one thread per five
seconds. But the rate of thread _creation_ is, afacit, unbound. Are
there scenarios in which we can get a runaway thread count?



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dhowells at redhat

Apr 16, 2009, 2:10 AM

Post #10 of 13 (317 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

Andrew Morton <akpm [at] linux-foundation> wrote:

> The patch itself is a little worrisome. The wake-all semantics are
> very good at covering up little race bugs. And switching to wake-once
> is a great way of exposing hitherto-unsuspected races.

It's something I'm intending to test, once I get MN10300 working again (which
for some reason it isn't).

> I wonder if slow_work_cull_timeout() should have some sort of barrier,
> so the write is suitably visible to the woken thread.

That's an interesting question. Should wake_up() imply a barrier of any sort,
I wonder. Well, __wake_up() does impose a barrier as it uses a spinlock, but
I wonder if that's sufficient.

> Bearing in mind that the thread might _already_ have been woken by someone
> else?

If the thread is woken by someone else, there must be work for it to do, in
which case it wouldn't be culled anyway.

> off-topic: afacit the code will cull a maximum of one thread per five
> seconds. But the rate of thread _creation_ is, afacit, unbound. Are
> there scenarios in which we can get a runaway thread count?

The maximum number of threads is limited (slow_work_max_threads).

David
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


oleg at redhat

Apr 16, 2009, 7:33 AM

Post #11 of 13 (317 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

On 04/16, David Howells wrote:
>
> Andrew Morton <akpm [at] linux-foundation> wrote:
>
> > I wonder if slow_work_cull_timeout() should have some sort of barrier,
> > so the write is suitably visible to the woken thread.
>
> That's an interesting question. Should wake_up() imply a barrier of any sort,
> I wonder. Well, __wake_up() does impose a barrier as it uses a spinlock, but
> I wonder if that's sufficient.

wake_up() does imply the barrier. Note the smp_wmb() in try_to_wake_up().
And in fact this wmb() implies mb(), because spin_lock() itself is STORE,
and the futher LOADs can't leak up before spin_lock().

But afaics, this doesn't matter? prepare_to_wait() sets task->state under
wait_queue_head_t->lock and wake_up() takes this look too, so we can't miss
the event.

Or I completely misunderstood the issue...

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


dhowells at redhat

Apr 23, 2009, 9:00 AM

Post #12 of 13 (288 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

Andrew Morton <akpm [at] linux-foundation> wrote:

> I wonder if slow_work_cull_timeout() should have some sort of barrier,
> so the write is suitably visible to the woken thread. Bearing in mind
> that the thread might _already_ have been woken by someone else?

Perhaps the attached patch?

David
---
From: David Howells <dhowells [at] redhat>
Subject: [PATCH] slow_work_cull_timeout() should have a memory barrier

slow_work_cull_timeout() should have a write memory barrier so that the setting
of the cull flag is seen before the wakeup takes place. This is required
because wake_up() does not guarantee any memory barriership at all.

Concomitant to this, slow_work_thread() should have a read memory barrier
between its return from schedule() and its testing of slow_work_cull() as
finish_wait() isn't a guaranteed barrier either.

Signed-off-by: David Howells <dhowells [at] redhat>
---

kernel/slow-work.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)


diff --git a/kernel/slow-work.c b/kernel/slow-work.c
index 521ed20..96e418d 100644
--- a/kernel/slow-work.c
+++ b/kernel/slow-work.c
@@ -382,6 +382,7 @@ static int slow_work_thread(void *_data)
finish_wait(&slow_work_thread_wq, &wait);

try_to_freeze();
+ smp_rmb();

vsmax = vslow_work_proportion;
vsmax *= atomic_read(&slow_work_thread_count);
@@ -416,6 +417,7 @@ static int slow_work_thread(void *_data)
static void slow_work_cull_timeout(unsigned long data)
{
slow_work_cull = true;
+ smp_wmb();
wake_up(&slow_work_thread_wq);
}

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


oleg at redhat

Apr 23, 2009, 9:18 AM

Post #13 of 13 (276 views)
Permalink
Re: [PATCH] slow_work_thread() should do the exclusive wait [In reply to]

(add Ingo)

On 04/23, David Howells wrote:
>
> Andrew Morton <akpm [at] linux-foundation> wrote:
>
> > I wonder if slow_work_cull_timeout() should have some sort of barrier,
> > so the write is suitably visible to the woken thread. Bearing in mind
> > that the thread might _already_ have been woken by someone else?
>
> Perhaps the attached patch?
>
> David
> ---
> From: David Howells <dhowells [at] redhat>
> Subject: [PATCH] slow_work_cull_timeout() should have a memory barrier
>
> slow_work_cull_timeout() should have a write memory barrier so that the setting
> of the cull flag is seen before the wakeup takes place. This is required
> because wake_up() does not guarantee any memory barriership at all.
>
> Concomitant to this, slow_work_thread() should have a read memory barrier
> between its return from schedule() and its testing of slow_work_cull() as
> finish_wait() isn't a guaranteed barrier either.
>
> Signed-off-by: David Howells <dhowells [at] redhat>
> ---
>
> kernel/slow-work.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
>
> diff --git a/kernel/slow-work.c b/kernel/slow-work.c
> index 521ed20..96e418d 100644
> --- a/kernel/slow-work.c
> +++ b/kernel/slow-work.c
> @@ -382,6 +382,7 @@ static int slow_work_thread(void *_data)
> finish_wait(&slow_work_thread_wq, &wait);
>
> try_to_freeze();
> + smp_rmb();
>
> vsmax = vslow_work_proportion;
> vsmax *= atomic_read(&slow_work_thread_count);
> @@ -416,6 +417,7 @@ static int slow_work_thread(void *_data)
> static void slow_work_cull_timeout(unsigned long data)
> {
> slow_work_cull = true;
> + smp_wmb();
> wake_up(&slow_work_thread_wq);
> }

Confused. If we need this barrier, a lot of similar code is broken.

slow_work_cull_timeout:

slow_work_cull = true;
wake_up(&slow_work_thread_wq);


slow_work_thread:

prepare_to_wait(&slow_work_thread_wq);
if (!slow_work_cull)
schedule();
finish_wait(&slow_work_thread_wq);

if (slow_work_cull)
.....

Both wake_up() and prepare_to_wait() take the same wait_queue_head_t->lock,
and prepare_to_wait() does set_current_state() under this lock.

How can we miss the event? If wake_up() happens before prepare_to_wait(),
slow_work_thread() must see slow_work_cull = T, otherwise the subsequent
wake_up() must see the result of list_add() + set_current_state() and
wake up the sleeping thread.

Could you please clarify?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.