Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly caused by netem)

 

 

First page Previous page 1 2 3 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded


andres at anarazel

Jul 2, 2009, 6:31 PM

Post #1 of 52 (832 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly caused by netem)

On 07/02/2009 01:59 PM, Andres Freund wrote:
> On 07/02/2009 01:54 PM, Jarek Poplawski wrote:
>> On Thu, Jul 02, 2009 at 01:43:49PM +0200, Andres Freund wrote: ...
>>> I will start trying to place the issue by testing with existing
>>> kernels between 2.6.30 and now.
>> If you can afford your time of course this would be very helpful.
> Well. Waiting for the issue to resolve itself would cost time as well
> ;-) I wont be able to finish this today, but perhaps some reduction
> of the search space will be enough.
I lied.

> I placed it between 2.6.30 and
> 03347e2592078a90df818670fddf97a33eec70fb (v2.6.30-5415-g03347e2) so
> far.
Ok. I finally see the light. I bisected the issue down to
eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
pinned timers

Disabling timer migration like provided in the earlier commit stops the issue
from occuring.

That it is related to timers is sensible in the light of my findings, that I
could trigger the issue only when using delay in netem - that is the codepath
using qdisc_watchdog...

Andres

Repasted original problem description for newly CC'ed people:
> While playing around with netem (time, not packet count based loss-
> bursts) I experienced soft lockups several times - to exclude it was
> my modifications causing this I recompiled with the original and it
> is still locking up. I captured several of those traces via the
> thankfully still working netconsole. The simplest policy I could
> reproduce the error with was: tc qdisc add dev eth0 root handle 1:
> netem delay 10ms loss 0
>
> I could not reproduce the error without delay - but that may only be
> a timing issue, as the host I was mainly transferring data to was on
> a local network. I could not reproduce the issue on lo.
>
> The time to reproduce the error varied from seconds after executing
> tc to several minutes.
>
> Traces 5+6 are made with vanilla
> 52989765629e7d182b4f146050ebba0abf2cb0b7
>
> The earlier traces are made with parts of my patches applied, and
> only included for completeness as I don't believe my modifications
> were causing this and all traces are different, so it may give some
> clues.
>
> Lockdep was enabled but did not diagnose anything relevant (one dvb
> warning during bootup).
>
> Any ideas for debugging?

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 2, 2009, 11:12 PM

Post #2 of 52 (804 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
...
> Ok. I finally see the light. I bisected the issue down to
> eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> pinned timers
>
> Disabling timer migration like provided in the earlier commit stops the
> issue from occuring.
>
> That it is related to timers is sensible in the light of my findings,
> that I could trigger the issue only when using delay in netem - that is
> the codepath using qdisc_watchdog...

Andres, thanks for your work and time. It saved me a lot of searching,
because I wasn't able to trigger this on my old box.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 3, 2009, 4:26 AM

Post #3 of 52 (801 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> ...
>
> > Ok. I finally see the light. I bisected the issue down to
> > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > pinned timers
> >
> > Disabling timer migration like provided in the earlier commit stops the
> > issue from occuring.
> >
> > That it is related to timers is sensible in the light of my findings,
> > that I could trigger the issue only when using delay in netem - that is
> > the codepath using qdisc_watchdog...
>
> Andres, thanks for your work and time. It saved me a lot of searching,
> because I wasn't able to trigger this on my old box.
Thanks. It allowed me to go through some of my remaining paperwork ;-)

Does anybody of you have an idea where the problem actually resides?
qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
patch/infrastructure from Arun goes over my head...
I will happily test some ideas/patches.

Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
business purely backward compatibility?

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 3, 2009, 5:03 AM

Post #4 of 52 (797 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> > ...
> >
> > > Ok. I finally see the light. I bisected the issue down to
> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > > pinned timers
> > >
> > > Disabling timer migration like provided in the earlier commit stops the
> > > issue from occuring.
> > >
> > > That it is related to timers is sensible in the light of my findings,
> > > that I could trigger the issue only when using delay in netem - that is
> > > the codepath using qdisc_watchdog...
> >
> > Andres, thanks for your work and time. It saved me a lot of searching,
> > because I wasn't able to trigger this on my old box.
> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>
> Does anybody of you have an idea where the problem actually resides?

Do you mean possibly broken timers are not enough?

> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
> patch/infrastructure from Arun goes over my head...
> I will happily test some ideas/patches.
>
> Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
> business purely backward compatibility?

The whole PSCHED_ conversion was to get finer resolution without
breaking backward compatibility, I hope.;-)

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 3, 2009, 5:30 AM

Post #5 of 52 (812 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Friday 03 July 2009 14:03:01 Jarek Poplawski wrote:
> On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> > On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> > > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> > > ...
> > >
> > > > Ok. I finally see the light. I bisected the issue down to
> > > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> > > > pinned timers
> > > >
> > > > Disabling timer migration like provided in the earlier commit stops
> > > > the issue from occuring.
> > > >
> > > > That it is related to timers is sensible in the light of my findings,
> > > > that I could trigger the issue only when using delay in netem - that
> > > > is the codepath using qdisc_watchdog...
> > > Andres, thanks for your work and time. It saved me a lot of searching,
> > > because I wasn't able to trigger this on my old box.
> > Thanks. It allowed me to go through some of my remaining paperwork ;-)
> > Does anybody of you have an idea where the problem actually resides?
> Do you mean possibly broken timers are not enough?
I have no ideas how/if the timers are actually broken or if the problem does
reside somewhere else and is just made visible by the timer changes.

I would have expected more problem with completely borked timers ;-)


> > Aside from that - is the whole PSCHED_TICKS2NS/PSCHED_NS2TICKS conversion
> > business purely backward compatibility?
> The whole PSCHED_ conversion was to get finer resolution without
> breaking backward compatibility, I hope.;-)
I haven't seen any problems - just curious ;-)

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


davem at davemloft

Jul 3, 2009, 1:22 PM

Post #6 of 52 (793 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

From: Jarek Poplawski <jarkao2 [at] gmail>
Date: Fri, 3 Jul 2009 12:03:01 +0000

> On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
>> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>> > ...
>> >
>> > > Ok. I finally see the light. I bisected the issue down to
>> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
>> > > pinned timers
>> > >
>> > > Disabling timer migration like provided in the earlier commit stops the
>> > > issue from occuring.
>> > >
>> > > That it is related to timers is sensible in the light of my findings,
>> > > that I could trigger the issue only when using delay in netem - that is
>> > > the codepath using qdisc_watchdog...
>> >
>> > Andres, thanks for your work and time. It saved me a lot of searching,
>> > because I wasn't able to trigger this on my old box.
>> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>>
>> Does anybody of you have an idea where the problem actually resides?
>
> Do you mean possibly broken timers are not enough?

Well, if you look at that commit the bisect pointed to Jarek, it is a
change which starts causing a situation which never happened before.
Namely, timers added on one cpu can be migrated and fire on another.

So this could be exposing races in the networking that technically
always existed.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 3, 2009, 3:56 PM

Post #7 of 52 (791 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2 [at] gmail>
> Date: Fri, 3 Jul 2009 12:03:01 +0000
>
> > On Fri, Jul 03, 2009 at 01:26:21PM +0200, Andres Freund wrote:
> >> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
> >> > On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
> >> > ...
> >> >
> >> > > Ok. I finally see the light. I bisected the issue down to
> >> > > eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
> >> > > pinned timers
> >> > >
> >> > > Disabling timer migration like provided in the earlier commit stops the
> >> > > issue from occuring.
> >> > >
> >> > > That it is related to timers is sensible in the light of my findings,
> >> > > that I could trigger the issue only when using delay in netem - that is
> >> > > the codepath using qdisc_watchdog...
> >> >
> >> > Andres, thanks for your work and time. It saved me a lot of searching,
> >> > because I wasn't able to trigger this on my old box.
> >> Thanks. It allowed me to go through some of my remaining paperwork ;-)
> >>
> >> Does anybody of you have an idea where the problem actually resides?
> >
> > Do you mean possibly broken timers are not enough?
>
> Well, if you look at that commit the bisect pointed to Jarek, it is a
> change which starts causing a situation which never happened before.
> Namely, timers added on one cpu can be migrated and fire on another.
>
> So this could be exposing races in the networking that technically
> always existed.

I'm not sure I get your point; could you give some example?
Actually, I've suspected races in timers code.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


davem at davemloft

Jul 3, 2009, 6:55 PM

Post #8 of 52 (788 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

From: Jarek Poplawski <jarkao2 [at] gmail>
Date: Sat, 4 Jul 2009 00:56:40 +0200

> On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
>> Well, if you look at that commit the bisect pointed to Jarek, it is a
>> change which starts causing a situation which never happened before.
>> Namely, timers added on one cpu can be migrated and fire on another.
>>
>> So this could be exposing races in the networking that technically
>> always existed.
>
> I'm not sure I get your point; could you give some example?
> Actually, I've suspected races in timers code.

Let's say that a particular networking timer always gets
re-added on the cpu where the timer fires.

In that case, beforehand, no inter-cpu races could possibly
be tested. But with the new timer code, such races could
now be potentially triggered.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 3, 2009, 11:36 PM

Post #9 of 52 (784 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Fri, Jul 03, 2009 at 06:55:53PM -0700, David Miller wrote:
> From: Jarek Poplawski <jarkao2 [at] gmail>
> Date: Sat, 4 Jul 2009 00:56:40 +0200
>
> > On Fri, Jul 03, 2009 at 01:22:20PM -0700, David Miller wrote:
> >> Well, if you look at that commit the bisect pointed to Jarek, it is a
> >> change which starts causing a situation which never happened before.
> >> Namely, timers added on one cpu can be migrated and fire on another.
> >>
> >> So this could be exposing races in the networking that technically
> >> always existed.
> >
> > I'm not sure I get your point; could you give some example?
> > Actually, I've suspected races in timers code.
>
> Let's say that a particular networking timer always gets
> re-added on the cpu where the timer fires.
>
> In that case, beforehand, no inter-cpu races could possibly
> be tested. But with the new timer code, such races could
> now be potentially triggered.

Maybe I still miss something, but even if it were possible, lockdep
should have reported such things long ago.

Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 4, 2009, 8:18 AM

Post #10 of 52 (777 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

Andres Freund wrote, On 07/03/2009 01:26 PM:

> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>> ...
>>
>>> Ok. I finally see the light. I bisected the issue down to
>>> eea08f32adb3f97553d49a4f79a119833036000a : timers: Logic to move non
>>> pinned timers
>>>
>>> Disabling timer migration like provided in the earlier commit stops the
>>> issue from occuring.
>>>
>>> That it is related to timers is sensible in the light of my findings,
>>> that I could trigger the issue only when using delay in netem - that is
>>> the codepath using qdisc_watchdog...
>> Andres, thanks for your work and time. It saved me a lot of searching,
>> because I wasn't able to trigger this on my old box.
> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>
> Does anybody of you have an idea where the problem actually resides?
> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
> patch/infrastructure from Arun goes over my head...
> I will happily test some ideas/patches.

Since there are still no 100% proofs nor suspects, here are some
suggestions of additional checking. This bisected commit could
probably be additionally verified by applying to 2.6.30 with a
preceding one; I attach both of them below.

Another suggestion is to try this without lockdep e.g. by setting
/sys/module/lockdep/parameters/lock_stat and prove_locking to 0.

Thanks,
Jarek P.

--------------->
commit 597d0275736dad9c3bda6f0a00a1c477dc0f37b1
Author: Arun R Bharadwaj <arun [at] linux>
Date: Thu Apr 16 12:13:26 2009 +0530

timers: Framework for identifying pinned timers

* Arun R Bharadwaj <arun [at] linux> [2009-04-16 12:11:36]:

This patch creates a new framework for identifying cpu-pinned timers
and hrtimers.

This framework is needed because pinned timers are expected to fire on
the same CPU on which they are queued. So it is essential to identify
these and not migrate them, in case there are any.

For regular timers, the currently existing add_timer_on() can be used
queue pinned timers and subsequently mod_timer_pinned() can be used
to modify the 'expires' field.

For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
added to queue cpu-pinned hrtimer.

[. tglx: use .._PINNED mode argument instead of creating tons of new
functions ]

Signed-off-by: Arun R Bharadwaj <arun [at] linux>
Signed-off-by: Thomas Gleixner <tglx [at] linutronix>

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 0d2f7c8..7400900 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -30,8 +30,11 @@ struct hrtimer_cpu_base;
* Mode arguments of xxx_hrtimer functions:
*/
enum hrtimer_mode {
- HRTIMER_MODE_ABS, /* Time value is absolute */
- HRTIMER_MODE_REL, /* Time value is relative to now */
+ HRTIMER_MODE_ABS = 0x0, /* Time value is absolute */
+ HRTIMER_MODE_REL = 0x1, /* Time value is relative to now */
+ HRTIMER_MODE_PINNED = 0x02, /* Timer is bound to CPU */
+ HRTIMER_MODE_ABS_PINNED = 0x02,
+ HRTIMER_MODE_REL_PINNED = 0x03,
};

/*
diff --git a/include/linux/timer.h b/include/linux/timer.h
index 6cdb6f3..ccf882e 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -163,7 +163,10 @@ extern void add_timer_on(struct timer_list *timer, int cpu);
extern int del_timer(struct timer_list * timer);
extern int mod_timer(struct timer_list *timer, unsigned long expires);
extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
+extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);

+#define TIMER_NOT_PINNED 0
+#define TIMER_PINNED 1
/*
* The jiffies value which is added to now, when there is no timer
* in the timer wheel:
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index cb8a15c..c71bcd5 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -193,7 +193,8 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
* Switch the timer base to the current CPU when possible.
*/
static inline struct hrtimer_clock_base *
-switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base)
+switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
+ int pinned)
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
@@ -907,9 +908,9 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
ret = remove_hrtimer(timer, base);

/* Switch the timer base, if necessary: */
- new_base = switch_hrtimer_base(timer, base);
+ new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);

- if (mode == HRTIMER_MODE_REL) {
+ if (mode & HRTIMER_MODE_REL) {
tim = ktime_add_safe(tim, new_base->get_time());
/*
* CONFIG_TIME_LOW_RES is a temporary way for architectures
diff --git a/kernel/timer.c b/kernel/timer.c
index 5c1e84b..3424dfd 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -604,7 +604,8 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
}

static inline int
-__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
+__mod_timer(struct timer_list *timer, unsigned long expires,
+ bool pending_only, int pinned)
{
struct tvec_base *base, *new_base;
unsigned long flags;
@@ -668,7 +669,7 @@ out_unlock:
*/
int mod_timer_pending(struct timer_list *timer, unsigned long expires)
{
- return __mod_timer(timer, expires, true);
+ return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer_pending);

@@ -702,11 +703,33 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
if (timer->expires == expires && timer_pending(timer))
return 1;

- return __mod_timer(timer, expires, false);
+ return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
}
EXPORT_SYMBOL(mod_timer);

/**
+ * mod_timer_pinned - modify a timer's timeout
+ * @timer: the timer to be modified
+ * @expires: new timeout in jiffies
+ *
+ * mod_timer_pinned() is a way to update the expire field of an
+ * active timer (if the timer is inactive it will be activated)
+ * and not allow the timer to be migrated to a different CPU.
+ *
+ * mod_timer_pinned(timer, expires) is equivalent to:
+ *
+ * del_timer(timer); timer->expires = expires; add_timer(timer);
+ */
+int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
+{
+ if (timer->expires == expires && timer_pending(timer))
+ return 1;
+
+ return __mod_timer(timer, expires, false, TIMER_PINNED);
+}
+EXPORT_SYMBOL(mod_timer_pinned);
+
+/**
* add_timer - start a timer
* @timer: the timer to be added
*
@@ -1356,7 +1379,7 @@ signed long __sched schedule_timeout(signed long timeout)
expire = timeout + jiffies;

setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
- __mod_timer(&timer, expire, false);
+ __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
schedule();
del_singleshot_timer_sync(&timer);


commit eea08f32adb3f97553d49a4f79a119833036000a
Author: Arun R Bharadwaj <arun [at] linux>
Date: Thu Apr 16 12:16:41 2009 +0530

timers: Logic to move non pinned timers

* Arun R Bharadwaj <arun [at] linux> [2009-04-16 12:11:36]:

This patch migrates all non pinned timers and hrtimers to the current
idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
are not migrated.

While migrating hrtimers, care should be taken to check if migrating
a hrtimer would result in a latency or not. So we compare the expiry of the
hrtimer with the next timer interrupt on the target cpu and migrate the
hrtimer only if it expires *after* the next interrupt on the target cpu.
So, added a clockevents_get_next_event() helper function to return the
next_event on the target cpu's clock_event_device.

[ tglx: cleanups and simplifications ]

Signed-off-by: Arun R Bharadwaj <arun [at] linux>
Signed-off-by: Thomas Gleixner <tglx [at] linutronix>

diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
index 3a1dbba..20a100f 100644
--- a/include/linux/clockchips.h
+++ b/include/linux/clockchips.h
@@ -143,3 +143,12 @@ extern void clockevents_notify(unsigned long reason, void *arg);
#endif

#endif
+
+#ifdef CONFIG_GENERIC_CLOCKEVENTS
+extern ktime_t clockevents_get_next_event(int cpu);
+#else
+static inline ktime_t clockevents_get_next_event(int cpu)
+{
+ return (ktime_t) { .tv64 = KTIME_MAX };
+}
+#endif
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 6185040..311dec1 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -257,6 +257,7 @@ extern void task_rq_unlock_wait(struct task_struct *p);
extern cpumask_var_t nohz_cpu_mask;
#if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
extern int select_nohz_load_balancer(int cpu);
+extern int get_nohz_load_balancer(void);
#else
static inline int select_nohz_load_balancer(int cpu)
{
@@ -1772,6 +1773,17 @@ int sched_nr_latency_handler(struct ctl_table *table, int write,
struct file *file, void __user *buffer, size_t *length,
loff_t *ppos);
#endif
+#ifdef CONFIG_SCHED_DEBUG
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif
extern unsigned int sysctl_sched_rt_period;
extern int sysctl_sched_rt_runtime;

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c71bcd5..b675a67 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -43,6 +43,8 @@
#include <linux/seq_file.h>
#include <linux/err.h>
#include <linux/debugobjects.h>
+#include <linux/sched.h>
+#include <linux/timer.h>

#include <asm/uaccess.h>

@@ -198,8 +200,19 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
{
struct hrtimer_clock_base *new_base;
struct hrtimer_cpu_base *new_cpu_base;
+ int cpu, preferred_cpu = -1;
+
+ cpu = smp_processor_id();
+#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+ if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
+ preferred_cpu = get_nohz_load_balancer();
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+#endif

- new_cpu_base = &__get_cpu_var(hrtimer_bases);
+again:
+ new_cpu_base = &per_cpu(hrtimer_bases, cpu);
new_base = &new_cpu_base->clock_base[base->index];

if (base != new_base) {
@@ -219,6 +232,40 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
timer->base = NULL;
spin_unlock(&base->cpu_base->lock);
spin_lock(&new_base->cpu_base->lock);
+
+ /* Optimized away for NOHZ=n SMP=n */
+ if (cpu == preferred_cpu) {
+ /* Calculate clock monotonic expiry time */
+#ifdef CONFIG_HIGH_RES_TIMERS
+ ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
+ new_base->offset);
+#else
+ ktime_t expires = hrtimer_get_expires(timer);
+#endif
+
+ /*
+ * Get the next event on target cpu from the
+ * clock events layer.
+ * This covers the highres=off nohz=on case as well.
+ */
+ ktime_t next = clockevents_get_next_event(cpu);
+
+ ktime_t delta = ktime_sub(expires, next);
+
+ /*
+ * We do not migrate the timer when it is expiring
+ * before the next event on the target cpu because
+ * we cannot reprogram the target cpu hardware and
+ * we would cause it to fire late.
+ */
+ if (delta.tv64 < 0) {
+ cpu = smp_processor_id();
+ spin_unlock(&new_base->cpu_base->lock);
+ spin_lock(&base->cpu_base->lock);
+ timer->base = base;
+ goto again;
+ }
+ }
timer->base = new_base;
}
return new_base;
@@ -236,7 +283,7 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
return base;
}

-# define switch_hrtimer_base(t, b) (b)
+# define switch_hrtimer_base(t, b, p) (b)

#endif /* !CONFIG_SMP */

diff --git a/kernel/sched.c b/kernel/sched.c
index 7f1dd56..9fe3774 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -4244,6 +4244,11 @@ static struct {
.load_balancer = ATOMIC_INIT(-1),
};

+int get_nohz_load_balancer(void)
+{
+ return atomic_read(&nohz.load_balancer);
+}
+
/*
* This routine will try to nominate the ilb (idle load balancing)
* owner among the cpus whose ticks are stopped. ilb owner will do the idle
diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
index d13be21..ab20ded 100644
--- a/kernel/time/clockevents.c
+++ b/kernel/time/clockevents.c
@@ -18,6 +18,7 @@
#include <linux/notifier.h>
#include <linux/smp.h>
#include <linux/sysdev.h>
+#include <linux/tick.h>

/* The registered clock event devices */
static LIST_HEAD(clockevent_devices);
@@ -251,4 +252,15 @@ void clockevents_notify(unsigned long reason, void *arg)
spin_unlock(&clockevents_lock);
}
EXPORT_SYMBOL_GPL(clockevents_notify);
+
+ktime_t clockevents_get_next_event(int cpu)
+{
+ struct tick_device *td;
+ struct clock_event_device *dev;
+
+ td = &per_cpu(tick_cpu_device, cpu);
+ dev = td->evtdev;
+
+ return dev->next_event;
+}
#endif
diff --git a/kernel/timer.c b/kernel/timer.c
index 3424dfd..3f841db 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -37,6 +37,7 @@
#include <linux/delay.h>
#include <linux/tick.h>
#include <linux/kallsyms.h>
+#include <linux/sched.h>

#include <asm/uaccess.h>
#include <asm/unistd.h>
@@ -609,9 +610,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
{
struct tvec_base *base, *new_base;
unsigned long flags;
- int ret;
-
- ret = 0;
+ int ret = 0 , cpu;

timer_stats_timer_set_start_info(timer);
BUG_ON(!timer->function);
@@ -630,6 +629,18 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

new_base = __get_cpu_var(tvec_bases);

+ cpu = smp_processor_id();
+
+#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+ if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
+ int preferred_cpu = get_nohz_load_balancer();
+
+ if (preferred_cpu >= 0)
+ cpu = preferred_cpu;
+ }
+#endif
+ new_base = per_cpu(tvec_bases, cpu);
+
if (base != new_base) {
/*
* We are trying to schedule the timer on the local CPU.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


joaomiguelcorreia at gmail

Jul 5, 2009, 9:53 PM

Post #11 of 52 (762 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

Hello

System freezes immediatly after grub, no init processing at all, after
applying those patches on top of vanilla 2.6.30 on my box.

Also, 2nd patch, this piece:

+#ifdef CONFIG_SCHED_DEBUG
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return sysctl_timer_migration;
+}
+#else
+static inline unsigned int get_sysctl_timer_migration(void)
+{
+ return 1;
+}
+#endif

doesnt work on top of 2.6.30. It complains, while compiling, that
sysctl_timer_migration is not defined. So i just replaced that call
with return 1, like on the not debug case. Hope this doesnt defeat
your test case, but it wouldnt compile otherwise. Probably that was
just introduced after 2.6.30?

(Have also tried latest .31-rc2, original bug still exists.)

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal


On Sat, Jul 4, 2009 at 4:18 PM, Jarek Poplawski<jarkao2 [at] gmail> wrote:
> Andres Freund wrote, On 07/03/2009 01:26 PM:
>
>> On Friday 03 July 2009 08:12:13 Jarek Poplawski wrote:
>>> On Fri, Jul 03, 2009 at 03:31:31AM +0200, Andres Freund wrote:
>>> ...
>>>
>>>> Ok. I finally see the light. I bisected the issue down to
>>>> eea08f32adb3f97553d49a4f79a119833036000a :  timers: Logic to move non
>>>> pinned timers
>>>>
>>>> Disabling timer migration like provided in the earlier commit stops the
>>>> issue from occuring.
>>>>
>>>> That it is related to timers is sensible in the light of my findings,
>>>> that I could trigger the issue only when using delay in netem - that is
>>>> the codepath using qdisc_watchdog...
>>> Andres, thanks for your work and time. It saved me a lot of searching,
>>> because I wasn't able to trigger this on my old box.
>> Thanks. It allowed me to go through some of my remaining paperwork ;-)
>>
>> Does anybody of you have an idea where the problem actually resides?
>> qdisc_watchdog_schedule looks innocent enough for my uneducated eyes - and the
>> patch/infrastructure from Arun goes over my head...
>> I will happily test some ideas/patches.
>
> Since there are still no 100% proofs nor suspects, here are some
> suggestions of additional checking. This bisected commit could
> probably be additionally verified by applying to 2.6.30 with a
> preceding one; I attach both of them below.
>
> Another suggestion is to try this without lockdep e.g. by setting
> /sys/module/lockdep/parameters/lock_stat and prove_locking to 0.
>
> Thanks,
> Jarek P.
>
> --------------->
> commit 597d0275736dad9c3bda6f0a00a1c477dc0f37b1
> Author: Arun R Bharadwaj <arun [at] linux>
> Date:   Thu Apr 16 12:13:26 2009 +0530
>
>    timers: Framework for identifying pinned timers
>
>    * Arun R Bharadwaj <arun [at] linux> [2009-04-16 12:11:36]:
>
>    This patch creates a new framework for identifying cpu-pinned timers
>    and hrtimers.
>
>    This framework is needed because pinned timers are expected to fire on
>    the same CPU on which they are queued. So it is essential to identify
>    these and not migrate them, in case there are any.
>
>    For regular timers, the currently existing add_timer_on() can be used
>    queue pinned timers and subsequently mod_timer_pinned() can be used
>    to modify the 'expires' field.
>
>    For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are
>    added to queue cpu-pinned hrtimer.
>
>    [. tglx: use .._PINNED mode argument instead of creating tons of new
>    functions ]
>
>    Signed-off-by: Arun R Bharadwaj <arun [at] linux>
>    Signed-off-by: Thomas Gleixner <tglx [at] linutronix>
>
> diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
> index 0d2f7c8..7400900 100644
> --- a/include/linux/hrtimer.h
> +++ b/include/linux/hrtimer.h
> @@ -30,8 +30,11 @@ struct hrtimer_cpu_base;
>  * Mode arguments of xxx_hrtimer functions:
>  */
>  enum hrtimer_mode {
> -       HRTIMER_MODE_ABS,       /* Time value is absolute */
> -       HRTIMER_MODE_REL,       /* Time value is relative to now */
> +       HRTIMER_MODE_ABS = 0x0,         /* Time value is absolute */
> +       HRTIMER_MODE_REL = 0x1,         /* Time value is relative to now */
> +       HRTIMER_MODE_PINNED = 0x02,     /* Timer is bound to CPU */
> +       HRTIMER_MODE_ABS_PINNED = 0x02,
> +       HRTIMER_MODE_REL_PINNED = 0x03,
>  };
>
>  /*
> diff --git a/include/linux/timer.h b/include/linux/timer.h
> index 6cdb6f3..ccf882e 100644
> --- a/include/linux/timer.h
> +++ b/include/linux/timer.h
> @@ -163,7 +163,10 @@ extern void add_timer_on(struct timer_list *timer, int cpu);
>  extern int del_timer(struct timer_list * timer);
>  extern int mod_timer(struct timer_list *timer, unsigned long expires);
>  extern int mod_timer_pending(struct timer_list *timer, unsigned long expires);
> +extern int mod_timer_pinned(struct timer_list *timer, unsigned long expires);
>
> +#define TIMER_NOT_PINNED       0
> +#define TIMER_PINNED           1
>  /*
>  * The jiffies value which is added to now, when there is no timer
>  * in the timer wheel:
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index cb8a15c..c71bcd5 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -193,7 +193,8 @@ struct hrtimer_clock_base *lock_hrtimer_base(const struct hrtimer *timer,
>  * Switch the timer base to the current CPU when possible.
>  */
>  static inline struct hrtimer_clock_base *
> -switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base)
> +switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
> +                   int pinned)
>  {
>        struct hrtimer_clock_base *new_base;
>        struct hrtimer_cpu_base *new_cpu_base;
> @@ -907,9 +908,9 @@ int __hrtimer_start_range_ns(struct hrtimer *timer, ktime_t tim,
>        ret = remove_hrtimer(timer, base);
>
>        /* Switch the timer base, if necessary: */
> -       new_base = switch_hrtimer_base(timer, base);
> +       new_base = switch_hrtimer_base(timer, base, mode & HRTIMER_MODE_PINNED);
>
> -       if (mode == HRTIMER_MODE_REL) {
> +       if (mode & HRTIMER_MODE_REL) {
>                tim = ktime_add_safe(tim, new_base->get_time());
>                /*
>                 * CONFIG_TIME_LOW_RES is a temporary way for architectures
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 5c1e84b..3424dfd 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -604,7 +604,8 @@ static struct tvec_base *lock_timer_base(struct timer_list *timer,
>  }
>
>  static inline int
> -__mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
> +__mod_timer(struct timer_list *timer, unsigned long expires,
> +                                               bool pending_only, int pinned)
>  {
>        struct tvec_base *base, *new_base;
>        unsigned long flags;
> @@ -668,7 +669,7 @@ out_unlock:
>  */
>  int mod_timer_pending(struct timer_list *timer, unsigned long expires)
>  {
> -       return __mod_timer(timer, expires, true);
> +       return __mod_timer(timer, expires, true, TIMER_NOT_PINNED);
>  }
>  EXPORT_SYMBOL(mod_timer_pending);
>
> @@ -702,11 +703,33 @@ int mod_timer(struct timer_list *timer, unsigned long expires)
>        if (timer->expires == expires && timer_pending(timer))
>                return 1;
>
> -       return __mod_timer(timer, expires, false);
> +       return __mod_timer(timer, expires, false, TIMER_NOT_PINNED);
>  }
>  EXPORT_SYMBOL(mod_timer);
>
>  /**
> + * mod_timer_pinned - modify a timer's timeout
> + * @timer: the timer to be modified
> + * @expires: new timeout in jiffies
> + *
> + * mod_timer_pinned() is a way to update the expire field of an
> + * active timer (if the timer is inactive it will be activated)
> + * and not allow the timer to be migrated to a different CPU.
> + *
> + * mod_timer_pinned(timer, expires) is equivalent to:
> + *
> + *     del_timer(timer); timer->expires = expires; add_timer(timer);
> + */
> +int mod_timer_pinned(struct timer_list *timer, unsigned long expires)
> +{
> +       if (timer->expires == expires && timer_pending(timer))
> +               return 1;
> +
> +       return __mod_timer(timer, expires, false, TIMER_PINNED);
> +}
> +EXPORT_SYMBOL(mod_timer_pinned);
> +
> +/**
>  * add_timer - start a timer
>  * @timer: the timer to be added
>  *
> @@ -1356,7 +1379,7 @@ signed long __sched schedule_timeout(signed long timeout)
>        expire = timeout + jiffies;
>
>        setup_timer_on_stack(&timer, process_timeout, (unsigned long)current);
> -       __mod_timer(&timer, expire, false);
> +       __mod_timer(&timer, expire, false, TIMER_NOT_PINNED);
>        schedule();
>        del_singleshot_timer_sync(&timer);
>
>
> commit eea08f32adb3f97553d49a4f79a119833036000a
> Author: Arun R Bharadwaj <arun [at] linux>
> Date:   Thu Apr 16 12:16:41 2009 +0530
>
>    timers: Logic to move non pinned timers
>
>    * Arun R Bharadwaj <arun [at] linux> [2009-04-16 12:11:36]:
>
>    This patch migrates all non pinned timers and hrtimers to the current
>    idle load balancer, from all the idle CPUs. Timers firing on busy CPUs
>    are not migrated.
>
>    While migrating hrtimers, care should be taken to check if migrating
>    a hrtimer would result in a latency or not. So we compare the expiry of the
>    hrtimer with the next timer interrupt on the target cpu and migrate the
>    hrtimer only if it expires *after* the next interrupt on the target cpu.
>    So, added a clockevents_get_next_event() helper function to return the
>    next_event on the target cpu's clock_event_device.
>
>    [ tglx: cleanups and simplifications ]
>
>    Signed-off-by: Arun R Bharadwaj <arun [at] linux>
>    Signed-off-by: Thomas Gleixner <tglx [at] linutronix>
>
> diff --git a/include/linux/clockchips.h b/include/linux/clockchips.h
> index 3a1dbba..20a100f 100644
> --- a/include/linux/clockchips.h
> +++ b/include/linux/clockchips.h
> @@ -143,3 +143,12 @@ extern void clockevents_notify(unsigned long reason, void *arg);
>  #endif
>
>  #endif
> +
> +#ifdef CONFIG_GENERIC_CLOCKEVENTS
> +extern ktime_t clockevents_get_next_event(int cpu);
> +#else
> +static inline ktime_t clockevents_get_next_event(int cpu)
> +{
> +       return (ktime_t) { .tv64 = KTIME_MAX };
> +}
> +#endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 6185040..311dec1 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -257,6 +257,7 @@ extern void task_rq_unlock_wait(struct task_struct *p);
>  extern cpumask_var_t nohz_cpu_mask;
>  #if defined(CONFIG_SMP) && defined(CONFIG_NO_HZ)
>  extern int select_nohz_load_balancer(int cpu);
> +extern int get_nohz_load_balancer(void);
>  #else
>  static inline int select_nohz_load_balancer(int cpu)
>  {
> @@ -1772,6 +1773,17 @@ int sched_nr_latency_handler(struct ctl_table *table, int write,
>                struct file *file, void __user *buffer, size_t *length,
>                loff_t *ppos);
>  #endif
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> +       return sysctl_timer_migration;
> +}
> +#else
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> +       return 1;
> +}
> +#endif
>  extern unsigned int sysctl_sched_rt_period;
>  extern int sysctl_sched_rt_runtime;
>
> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index c71bcd5..b675a67 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -43,6 +43,8 @@
>  #include <linux/seq_file.h>
>  #include <linux/err.h>
>  #include <linux/debugobjects.h>
> +#include <linux/sched.h>
> +#include <linux/timer.h>
>
>  #include <asm/uaccess.h>
>
> @@ -198,8 +200,19 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
>  {
>        struct hrtimer_clock_base *new_base;
>        struct hrtimer_cpu_base *new_cpu_base;
> +       int cpu, preferred_cpu = -1;
> +
> +       cpu = smp_processor_id();
> +#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +       if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> +               preferred_cpu = get_nohz_load_balancer();
> +               if (preferred_cpu >= 0)
> +                       cpu = preferred_cpu;
> +       }
> +#endif
>
> -       new_cpu_base = &__get_cpu_var(hrtimer_bases);
> +again:
> +       new_cpu_base = &per_cpu(hrtimer_bases, cpu);
>        new_base = &new_cpu_base->clock_base[base->index];
>
>        if (base != new_base) {
> @@ -219,6 +232,40 @@ switch_hrtimer_base(struct hrtimer *timer, struct hrtimer_clock_base *base,
>                timer->base = NULL;
>                spin_unlock(&base->cpu_base->lock);
>                spin_lock(&new_base->cpu_base->lock);
> +
> +               /* Optimized away for NOHZ=n SMP=n */
> +               if (cpu == preferred_cpu) {
> +                       /* Calculate clock monotonic expiry time */
> +#ifdef CONFIG_HIGH_RES_TIMERS
> +                       ktime_t expires = ktime_sub(hrtimer_get_expires(timer),
> +                                                       new_base->offset);
> +#else
> +                       ktime_t expires = hrtimer_get_expires(timer);
> +#endif
> +
> +                       /*
> +                        * Get the next event on target cpu from the
> +                        * clock events layer.
> +                        * This covers the highres=off nohz=on case as well.
> +                        */
> +                       ktime_t next = clockevents_get_next_event(cpu);
> +
> +                       ktime_t delta = ktime_sub(expires, next);
> +
> +                       /*
> +                        * We do not migrate the timer when it is expiring
> +                        * before the next event on the target cpu because
> +                        * we cannot reprogram the target cpu hardware and
> +                        * we would cause it to fire late.
> +                        */
> +                       if (delta.tv64 < 0) {
> +                               cpu = smp_processor_id();
> +                               spin_unlock(&new_base->cpu_base->lock);
> +                               spin_lock(&base->cpu_base->lock);
> +                               timer->base = base;
> +                               goto again;
> +                       }
> +               }
>                timer->base = new_base;
>        }
>        return new_base;
> @@ -236,7 +283,7 @@ lock_hrtimer_base(const struct hrtimer *timer, unsigned long *flags)
>        return base;
>  }
>
> -# define switch_hrtimer_base(t, b)     (b)
> +# define switch_hrtimer_base(t, b, p)  (b)
>
>  #endif /* !CONFIG_SMP */
>
> diff --git a/kernel/sched.c b/kernel/sched.c
> index 7f1dd56..9fe3774 100644
> --- a/kernel/sched.c
> +++ b/kernel/sched.c
> @@ -4244,6 +4244,11 @@ static struct {
>        .load_balancer = ATOMIC_INIT(-1),
>  };
>
> +int get_nohz_load_balancer(void)
> +{
> +       return atomic_read(&nohz.load_balancer);
> +}
> +
>  /*
>  * This routine will try to nominate the ilb (idle load balancing)
>  * owner among the cpus whose ticks are stopped. ilb owner will do the idle
> diff --git a/kernel/time/clockevents.c b/kernel/time/clockevents.c
> index d13be21..ab20ded 100644
> --- a/kernel/time/clockevents.c
> +++ b/kernel/time/clockevents.c
> @@ -18,6 +18,7 @@
>  #include <linux/notifier.h>
>  #include <linux/smp.h>
>  #include <linux/sysdev.h>
> +#include <linux/tick.h>
>
>  /* The registered clock event devices */
>  static LIST_HEAD(clockevent_devices);
> @@ -251,4 +252,15 @@ void clockevents_notify(unsigned long reason, void *arg)
>        spin_unlock(&clockevents_lock);
>  }
>  EXPORT_SYMBOL_GPL(clockevents_notify);
> +
> +ktime_t clockevents_get_next_event(int cpu)
> +{
> +       struct tick_device *td;
> +       struct clock_event_device *dev;
> +
> +       td = &per_cpu(tick_cpu_device, cpu);
> +       dev = td->evtdev;
> +
> +       return dev->next_event;
> +}
>  #endif
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 3424dfd..3f841db 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -37,6 +37,7 @@
>  #include <linux/delay.h>
>  #include <linux/tick.h>
>  #include <linux/kallsyms.h>
> +#include <linux/sched.h>
>
>  #include <asm/uaccess.h>
>  #include <asm/unistd.h>
> @@ -609,9 +610,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>  {
>        struct tvec_base *base, *new_base;
>        unsigned long flags;
> -       int ret;
> -
> -       ret = 0;
> +       int ret = 0 , cpu;
>
>        timer_stats_timer_set_start_info(timer);
>        BUG_ON(!timer->function);
> @@ -630,6 +629,18 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>
>        new_base = __get_cpu_var(tvec_bases);
>
> +       cpu = smp_processor_id();
> +
> +#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +       if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
> +               int preferred_cpu = get_nohz_load_balancer();
> +
> +               if (preferred_cpu >= 0)
> +                       cpu = preferred_cpu;
> +       }
> +#endif
> +       new_base = per_cpu(tvec_bases, cpu);
> +
>        if (base != new_base) {
>                /*
>                 * We are trying to schedule the timer on the local CPU.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 6, 2009, 1:14 AM

Post #12 of 52 (757 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> Hello
Hi,

> System freezes immediatly after grub, no init processing at all, after
> applying those patches on top of vanilla 2.6.30 on my box.
>
> Also, 2nd patch, this piece:
>
> +#ifdef CONFIG_SCHED_DEBUG
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> + return sysctl_timer_migration;
> +}
> +#else
> +static inline unsigned int get_sysctl_timer_migration(void)
> +{
> + return 1;
> +}
> +#endif
>
> doesnt work on top of 2.6.30. It complains, while compiling, that
> sysctl_timer_migration is not defined. So i just replaced that call
> with return 1, like on the not debug case. Hope this doesnt defeat
> your test case, but it wouldnt compile otherwise. Probably that was
> just introduced after 2.6.30?

Yes, it would be too easy if it had worked just like this... (I only
test-compiled it, without such warnings, but without CONFIG_SCHED_DEBUG.)
I'll look at this again.

>
> (Have also tried latest .31-rc2, original bug still exists.)

Btw., could you send your .config? Do you have lockdep enabled (if so
did you try without it)? Is it 'a real' box or something virtual?

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


joaomiguelcorreia at gmail

Jul 6, 2009, 4:28 AM

Post #13 of 52 (754 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

Hello

.config is attached. This is a real box, a fedora 11 with rawhide
updates applied. Its a quad core system (amd phenom 9600) with 4gb
ddr2 ram. The board is an asus m3a-h/hdmi. We have a couple of these
boxes just to test run linux before dropping into production, hence
the availability for tests.

Haven't tried without lockdep, will try later and let you know if any
difference.

Thank you for your time,
Joao Correia
Centro de Informatica
Universidade da Beira Interior
Portugal

On Mon, Jul 6, 2009 at 9:14 AM, Jarek Poplawski<jarkao2 [at] gmail> wrote:
> On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
>> Hello
> Hi,
>
>> System freezes immediatly after grub, no init processing at all, after
>> applying those patches on top of vanilla 2.6.30 on my box.
>>
>> Also, 2nd patch, this piece:
>>
>> +#ifdef CONFIG_SCHED_DEBUG
>> +static inline unsigned int get_sysctl_timer_migration(void)
>> +{
>> +       return sysctl_timer_migration;
>> +}
>> +#else
>> +static inline unsigned int get_sysctl_timer_migration(void)
>> +{
>> +       return 1;
>> +}
>> +#endif
>>
>> doesnt work on top of 2.6.30. It complains, while compiling, that
>> sysctl_timer_migration is not defined. So i just replaced that call
>> with return 1, like on the not debug case. Hope this doesnt defeat
>> your test case, but it wouldnt compile otherwise. Probably that was
>> just introduced after 2.6.30?
>
> Yes, it would be too easy if it had worked just like this... (I only
> test-compiled it, without such warnings, but without CONFIG_SCHED_DEBUG.)
> I'll look at this again.
>
>>
>> (Have also tried latest .31-rc2, original bug still exists.)
>
> Btw., could you send your .config? Do you have lockdep enabled (if so
> did you try without it)? Is it 'a real' box or something virtual?
>
> Thanks,
> Jarek P.
>
Attachments: config (97.9 KB)


jarkao2 at gmail

Jul 6, 2009, 7:19 AM

Post #14 of 52 (754 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> Hello
>
> System freezes immediatly after grub, no init processing at all, after
> applying those patches on top of vanilla 2.6.30 on my box.
...
> doesnt work on top of 2.6.30. It complains, while compiling, that
> sysctl_timer_migration is not defined. So i just replaced that call
> with return 1, like on the not debug case. Hope this doesnt defeat
> your test case, but it wouldnt compile otherwise. Probably that was
> just introduced after 2.6.30?

Yes, my bad, sorry. I've found 2 more patches from this series; can't
guarantee that's all, but seems to work & migrate within my one and
only core without any problems ;-)

Jarek P.
Attachments: timers.597d027 (5.45 KB)
  timers.5c333864a (3.51 KB)
  timers.cd1bb94b4a (1.97 KB)
  timers.eea08f32 (6.95 KB)


andres at anarazel

Jul 6, 2009, 9:13 AM

Post #15 of 52 (751 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Monday 06 July 2009 16:19:16 Jarek Poplawski wrote:
> On Mon, Jul 06, 2009 at 05:53:51AM +0100, Joao Correia wrote:
> > Hello
> >
> > System freezes immediatly after grub, no init processing at all, after
> > applying those patches on top of vanilla 2.6.30 on my box.
>
> ...
>
> > doesnt work on top of 2.6.30. It complains, while compiling, that
> > sysctl_timer_migration is not defined. So i just replaced that call
> > with return 1, like on the not debug case. Hope this doesnt defeat
> > your test case, but it wouldnt compile otherwise. Probably that was
> > just introduced after 2.6.30?

I stupidly sent two emails in private to Jarek. Reposting here:

Jarek:
> > > > > Yes, my bad, sorry. I've found 2 more patches from this series;
can't
> > > > > guarantee that's all, but seems to work & migrate within my one and
> > > > > only core without any problems ;-)
Andres:
> > > > I have some doubt that this will give us new information:
> > > > The commit i bisected the failure to:
> > > > eea08f32adb3f97553d49a4f79a119833036000a
> > > > Is just 2.6.30-rc4 + the four commits you listed...
Jarek:
> > > I guess, you mean 2.6.31-rc1?
Andres:
> > No - I tested the timer development branch to exclude its a problem caused
> > by some other change between 2.6.30 and 2.6.31-git
> > And that branch is based on rc4...
Jarek:
> I misunderstood, sorry! That's just what I needed to know!

Andres:
> > > > And I seperately tested eea08f32adb3f97553d49a4f79a119833036000a^ to
> > > > be sure. So I am pretty sure its those commits which trigger the
> > > > problem - whats causing it is another matter.
Jarek:
> > > It might be true, but it isn't 100% proof. This patchset is special:
> > > by moving timers to other cores it generates much more SMP concurrency,
> > > so it could trigger some hidden races, which otherwise need much more
> > > time to show up. So I'm trying to establish if this could be the case.
> > > Btw., I guess there is nothing to hide from the lists, plus somebody
> > > could verify this idea?
Andres:
> > No, absolutely not. Just hit the wrong key. Sorry.
> > Btw, I ran netem with delay for more than 48h on around 80mbit... That
> > does not exclude such a rarely triggered race, but makes it a bit more
> > unlikely. (With migration thats around 3sec or so)
> This is a very important information: it should give timers' guys some
> incentive to start looking for this, and me less incentive to verify
> network code ;-)
Jarek:
> Btw., there were some strange traces of lockdep and stack overruning;
> did you try if without lockdep maybe there are some more readable
> warnings?
Lockdep was not enabled at first. Actually I think most if not all of the
traces I posted at first were without.

Will verify.
> And once again, consider resending this to the public, please. (At
> least Joao might be interested.)
Sorry once more.


Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 6, 2009, 9:31 AM

Post #16 of 52 (752 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
...
> > > Btw, I ran netem with delay for more than 48h on around 80mbit... That
> > > does not exclude such a rarely triggered race, but makes it a bit more
> > > unlikely. (With migration thats around 3sec or so)
...
> Sorry once more.

Andres, the bisection + the above - you did 'the whole lotta work' and
you shouldn't be sorry at all! ;-)

Thanks again,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


joaomiguelcorreia at gmail

Jul 6, 2009, 10:23 AM

Post #17 of 52 (750 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

Hello

Since i already had the kernel compiled and ready to boot when i read
this, i gave it a go anyway :-).

I can reproduce the freeze with those 4 patches applied, so i can
confirm that its, at least, related to, or exposed by, those patches.
There must be something else too, or its just too much fuzziness, but
the freeze takes a bit more time (approximately five minutes, give or
take) compared to the instant freeze before, but its there with the
patches, and without them, no freeze.

I assume there isnt a "safe" way to get them out of current .31-rc's, right?

Thank you very much for your time,
Joao Correia


On Mon, Jul 6, 2009 at 5:31 PM, Jarek Poplawski<jarkao2 [at] gmail> wrote:
> On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
> ...
>> > > Btw, I ran netem with delay for more than 48h on around 80mbit... That
>> > > does  not exclude such a rarely triggered race, but makes it a bit more
>> > > unlikely. (With migration thats around 3sec or so)
> ...
>> Sorry once more.
>
> Andres, the bisection + the above - you did 'the whole lotta work' and
> you shouldn't be sorry at all! ;-)
>
> Thanks again,
> Jarek P.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 6, 2009, 10:24 AM

Post #18 of 52 (752 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Monday 06 July 2009 18:31:06 Jarek Poplawski wrote:
> On Mon, Jul 06, 2009 at 06:13:29PM +0200, Andres Freund wrote:
> > > > Btw, I ran netem with delay for more than 48h on around 80mbit...
> > > > That does not exclude such a rarely triggered race, but makes it a
> > > > bit more unlikely. (With migration thats around 3sec or so)
> > Sorry once more.
> Andres, the bisection + the above - you did 'the whole lotta work' and
> you shouldn't be sorry at all! ;-)
To be fair I have to admit I did run netem for 48hours for other reasons ;-)
I also ran my modified netem, but all it currently does is adjusing
delay/jitter by other means, so that should have no influence.
I am trying to produce artificial loss patterns similar to really occuring
network loss.

I just switched mail client btw ;-)

ATM I have no further ideas except maybe trying different timers or a lower
timer resolution?

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 6, 2009, 10:26 AM

Post #19 of 52 (750 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Monday 06 July 2009 19:23:18 Joao Correia wrote:
> Hello
>
> Since i already had the kernel compiled and ready to boot when i read
> this, i gave it a go anyway :-).
>
> I can reproduce the freeze with those 4 patches applied, so i can
> confirm that its, at least, related to, or exposed by, those patches.
> There must be something else too, or its just too much fuzziness, but
> the freeze takes a bit more time (approximately five minutes, give or
> take) compared to the instant freeze before, but its there with the
> patches, and without them, no freeze.
>
> I assume there isnt a "safe" way to get them out of current .31-rc's,
> right?
`echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 6, 2009, 11:50 PM

Post #20 of 52 (736 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Mon, Jul 06, 2009 at 07:26:43PM +0200, Andres Freund wrote:
> On Monday 06 July 2009 19:23:18 Joao Correia wrote:
> > Hello
> >
> > Since i already had the kernel compiled and ready to boot when i read
> > this, i gave it a go anyway :-).
> >
> > I can reproduce the freeze with those 4 patches applied, so i can
> > confirm that its, at least, related to, or exposed by, those patches.
> > There must be something else too, or its just too much fuzziness, but
> > the freeze takes a bit more time (approximately five minutes, give or
> > take) compared to the instant freeze before, but its there with the
> > patches, and without them, no freeze.
> >
> > I assume there isnt a "safe" way to get them out of current .31-rc's,
> > right?
> `echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.

I guess it should fix it entirely. Btw., here is a patch disabling the
timers' part, so to make it hrtimers only. Could you try?

Thanks,
Jarek P.
---

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..011429c 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,

cpu = smp_processor_id();

-#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
+#if 0
if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
int preferred_cpu = get_nohz_load_balancer();

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


joaomiguelcorreia at gmail

Jul 7, 2009, 3:40 AM

Post #21 of 52 (737 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

I am now running 2.6.31-rc2 for a couple of hours, no freeze.

Let me know what/if i can help with tracking down the original source
of the problem.

Thank you very much for your time,
Joao Correia

On Tue, Jul 7, 2009 at 7:50 AM, Jarek Poplawski<jarkao2 [at] gmail> wrote:
> On Mon, Jul 06, 2009 at 07:26:43PM +0200, Andres Freund wrote:
>> On Monday 06 July 2009 19:23:18 Joao Correia wrote:
>> > Hello
>> >
>> > Since i already had the kernel compiled and ready to boot when i read
>> > this, i gave it a go anyway :-).
>> >
>> > I can reproduce the freeze with those 4 patches applied, so i can
>> > confirm that its, at least, related to, or exposed by, those patches.
>> > There must be something else too, or its just too much fuzziness, but
>> > the freeze takes a bit more time (approximately five minutes, give or
>> > take) compared to the instant freeze before, but its there with the
>> > patches, and without them, no freeze.
>> >
>> > I assume there isnt a "safe" way to get them out of current .31-rc's,
>> > right?
>> `echo 0 > /proc/sys/kernel/timer_migration` should mitigate the problem.
>
> I guess it should fix it entirely. Btw., here is a patch disabling the
> timers' part, so to make it hrtimers only. Could you try?
>
> Thanks,
> Jarek P.
> ---
>
> diff --git a/kernel/timer.c b/kernel/timer.c
> index 0b36b9e..011429c 100644
> --- a/kernel/timer.c
> +++ b/kernel/timer.c
> @@ -634,7 +634,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
>
>        cpu = smp_processor_id();
>
> -#if defined(CONFIG_NO_HZ) && defined(CONFIG_SMP)
> +#if 0
>        if (!pinned && get_sysctl_timer_migration() && idle_cpu(cpu)) {
>                int preferred_cpu = get_nohz_load_balancer();
>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 7, 2009, 3:47 AM

Post #22 of 52 (736 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Tuesday 07 July 2009 12:40:16 Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.
You dont see the problem anymore with the `echo 0 >
/proc/sys/kernel/timer_migration` change (or equivalently with the patch from
Jarek) or has the problem vanished completely?

Andres
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 7, 2009, 6:18 AM

Post #23 of 52 (722 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about timers. Here is another tiny patch
(the previous one should be removed), which could tell (with oops) if
there's something while migrating. Anyway, the bug should be back :-(

Thanks,
Jarek P.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


jarkao2 at gmail

Jul 7, 2009, 6:20 AM

Post #24 of 52 (723 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

Sorry, here is this tiny patch!

On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> I am now running 2.6.31-rc2 for a couple of hours, no freeze.
>
> Let me know what/if i can help with tracking down the original source
> of the problem.

OK, so we know it's only about timers. Here is another tiny patch
(the previous one should be removed), which could tell (with oops) if
there's something while migrating. Anyway, the bug should be back :-(

Thanks,
Jarek P.
---

kernel/timer.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/timer.c b/kernel/timer.c
index 0b36b9e..61ba855 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -658,6 +658,7 @@ __mod_timer(struct timer_list *timer, unsigned long expires,
spin_unlock(&base->lock);
base = new_base;
spin_lock(&base->lock);
+ BUG_ON(tbase_get_base(timer->base));
timer_set_base(timer, base);
}
}
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


andres at anarazel

Jul 7, 2009, 6:22 AM

Post #25 of 52 (723 views)
Permalink
Re: Soft-Lockup/Race in networking in 2.6.31-rc1+195 ( possibly?caused by netem) [In reply to]

On Tuesday 07 July 2009 15:18:03 Jarek Poplawski wrote:
> On Tue, Jul 07, 2009 at 11:40:16AM +0100, Joao Correia wrote:
> > I am now running 2.6.31-rc2 for a couple of hours, no freeze.
> > Let me know what/if i can help with tracking down the original source
> > of the problem.
> OK, so we know it's only about timers. Here is another tiny patch
> (the previous one should be removed), which could tell (with oops) if
> there's something while migrating. Anyway, the bug should be back :-(
How do we know this? It still could be a race uncovered by timer migration,
right?

Andres

PS: You forgot the patch ;-)
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

First page Previous page 1 2 3 Next page Last page  View All Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.