
Mailing List Archive: Linux: Kernel

2.6.32.21 - uptime related crashes?



johnstul at us

Oct 25, 2011, 3:44 PM

Post #51 of 58 (1439 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Sun, 2011-10-23 at 20:31 +0200, Ruben Kerkhof wrote:
> On Mon, Sep 5, 2011 at 01:26, Faidon Liambotis <paravoid [at] debian> wrote:
> > On Tue, Aug 30, 2011 at 03:38:29PM -0700, Greg KH wrote:
> >> On Thu, Aug 25, 2011 at 09:56:16PM +0300, Faidon Liambotis wrote:
> >> > On Thu, Jul 21, 2011 at 08:45:25PM +0200, Ingo Molnar wrote:
> >> > > * Peter Zijlstra <peterz [at] infradead> wrote:
> >> > >
> >> > > > On Thu, 2011-07-21 at 14:50 +0200, Nikola Ciprich wrote:
> >> > > > > thanks for the patch! I'll put this on our testing boxes...
> >> > > >
> >> > > > With a patch that frobs the starting value close to overflowing I hope,
> >> > > > otherwise we'll not hear from you in like 7 months ;-)
> >> > > >
> >> > > > > Are You going to push this upstream so we can ask Greg to push this to
> >> > > > > -stable?
> >> > > >
> >> > > > Yeah, I think we want to commit this with a -stable tag, Ingo?
> >> > >
> >> > > yeah - and we also want a Reported-by tag and an explanation of how
> >> > > it can crash and why it matters in practice. I can then stick it into
> >> > > the urgent branch for Linus. (probably will only hit upstream in the
> >> > > merge window though.)
> >> >
> >> > Has this been pushed or has the problem been solved somehow? Time is
> >> > against us on this bug as more boxes will crash as they reach 200 days
> >> > of uptime...
> >> >
> >> > In any case, feel free to use me as a Reported-by, my full report of the
> >> > problem being <20110430173905.GA25641 [at] tty>.
> >> >
> >> > FWIW and if I understand correctly, my symptoms were caused by *two*
> >> > different bugs:
> >> > a) the 54 bits wraparound at 208 days that Peter fixed above,
> >> > b) a kernel crash at ~215 days related to RT tasks, fixed by
> >> > 305e6835e05513406fa12820e40e4a8ecb63743c (already in -stable).
> >>
> >> So, what do I do here as part of the .32-longterm kernel? Is there a
> >> fix that is in Linus's tree that I need to apply here?
> >>
> >> confused,
> >
> > Is this even pushed upstream? I checked Linus' tree and the proposed
> > patch is *not* merged there. I'm not really sure if it was fixed some
> > other way, though. I thought this was intended to be an "urgent" fix or
> > something?
> >
> > Regards,
> > Faidon
>
> I just had two crashes on two different machines, both with an uptime
> of 208 days.
> Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000
>
> 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup -
> CPU#0 stuck for 17163091968s! [qemu-kvm:16949]

So were these actual crashes, or just softlockup false positives?

I had thought the fix from PeterZ for the earlier crash issue (div by
zero) had already been pushed upstream, but maybe that was just against
2.6.32 and not 2.6.33?

The softlockup false positive issue should have been fixed by Peter's
"x86, intel: Don't mark sched_clock() as stable" below. But I'm not
seeing it upstream. Peter, is this still the right fix?

thanks
-john


From: Peter Zijlstra <a.p.zijlstra [at] chello>
Subject: x86, intel: Don't mark sched_clock() as stable

Because the x86 sched_clock() implementation wraps at 54 bits and the
scheduler code assumes it wraps at the full 64 bits, we can get into
trouble after 208 days (~7 months) of uptime.

Signed-off-by: Peter Zijlstra <a.p.zijlstra [at] chello>
---
arch/x86/kernel/cpu/intel.c | 7 +++++++
1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index ed6086e..c8dc48b 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -91,8 +91,15 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
 	if (c->x86_power & (1 << 8)) {
 		set_cpu_cap(c, X86_FEATURE_CONSTANT_TSC);
 		set_cpu_cap(c, X86_FEATURE_NONSTOP_TSC);
+		/*
+		 * Unfortunately our __cycles_2_ns() implementation makes
+		 * the raw sched_clock() interface wrap at 54-bits, which
+		 * makes it unsuitable for direct use, so disable this
+		 * for now.
+		 *
 		if (!check_tsc_unstable())
 			sched_clock_stable = 1;
+		 */
 	}

 	/*



--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


w at 1wt

Oct 25, 2011, 4:25 PM

Post #52 of 58 (1417 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

Hi John,

On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> The softlockup false positive issue should have been fixed by Peter's
> "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> seeing it upstream. Peter, is this still the right fix?

I've not seen any other one proposed, and both you and Peter appeared
to like it. I understood that Ingo was waiting for the merge window to
submit it and I think that it simply got lost.

Ingo, can you confirm ?

Thanks,
Willy



ruben at rubenkerkhof

Oct 26, 2011, 11:21 AM

Post #53 of 58 (1421 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Wed, Oct 26, 2011 at 00:44, john stultz <johnstul [at] us> wrote:
> On Sun, 2011-10-23 at 20:31 +0200, Ruben Kerkhof wrote:

>> I just had two crashes on two different machines, both with an uptime
>> of 208 days.
>> Both were 5520's running 2.6.34.8, but with a CONFIG_HZ of 1000
>>
>> 2011-10-23T16:49:18.618029+02:00 phy001 kernel: BUG: soft lockup -
>> CPU#0 stuck for 17163091968s! [qemu-kvm:16949]
>
> So were these actual crashes, or just softlockup false positives?

Just softlockups, I haven't seen the divide_by_zero crash.

Thanks,

Ruben


greg at kroah

Dec 2, 2011, 3:45 PM

Post #54 of 58 (1381 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote:
> Hi John,
>
> On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> > The softlockup false positive issue should have been fixed by Peter's
> > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> > seeing it upstream. Peter, is this still the right fix?
>
> I've not seen any other one proposed, and both you and Peter appeared
> to like it. I understood that Ingo was waiting for the merge window to
> submit it and I think that it simply got lost.
>
> Ingo, can you confirm ?

I'm totally confused here, what's the status of this, and what exactly
is the patch?

greg k-h


johnstul at us

Dec 2, 2011, 4:02 PM

Post #55 of 58 (1374 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote:
> On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote:
> > Hi John,
> >
> > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> > > The softlockup false positive issue should have been fixed by Peter's
> > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> > > seeing it upstream. Peter, is this still the right fix?
> >
> > I've not seen any other one proposed, and both you and Peter appeared
> > to like it. I understood that Ingo was waiting for the merge window to
> > submit it and I think that it simply got lost.
> >
> > Ingo, can you confirm ?
>
> I'm totally confused here, what's the status of this, and what exactly
> is the patch?

Ingo has the fix from Salman queued in -tip, but I'm not sure why it's
not been pushed to Linus yet.

http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9

thanks
-john



greg at kroah

Dec 2, 2011, 5:02 PM

Post #56 of 58 (1366 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote:
> On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote:
> > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote:
> > > Hi John,
> > >
> > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> > > > The softlockup false positive issue should have been fixed by Peter's
> > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> > > > seeing it upstream. Peter, is this still the right fix?
> > >
> > > I've not seen any other one proposed, and both you and Peter appeared
> > > to like it. I understood that Ingo was waiting for the merge window to
> > > submit it and I think that it simply got lost.
> > >
> > > Ingo, can you confirm ?
> >
> > I'm totally confused here, what's the status of this, and what exactly
> > is the patch?
>
> Ingo has the fix from Salman queued in -tip, but I'm not sure why it's
> not been pushed to Linus yet.
>
> http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9

Wonderful, thanks for pointing this out to me.

Ingo, any idea when this will go to Linus's tree?

thanks,

greg k-h


w at 1wt

Dec 2, 2011, 11:00 PM

Post #57 of 58 (1360 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

On Fri, Dec 02, 2011 at 05:02:32PM -0800, Greg KH wrote:
> On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote:
> > On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote:
> > > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote:
> > > > Hi John,
> > > >
> > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> > > > > The softlockup false positive issue should have been fixed by Peter's
> > > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> > > > > seeing it upstream. Peter, is this still the right fix?
> > > >
> > > > I've not seen any other one proposed, and both you and Peter appeared
> > > > to like it. I understood that Ingo was waiting for the merge window to
> > > > submit it and I think that it simply got lost.
> > > >
> > > > Ingo, can you confirm ?
> > >
> > > I'm totally confused here, what's the status of this, and what exactly
> > > is the patch?
> >
> > Ingo has the fix from Salman queued in -tip, but I'm not sure why it's
> > not been pushed to Linus yet.
> >
> > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9
>
> Wonderful, thanks for pointing this out to me.
>
> Ingo, any idea when this will go to Linus's tree?

Yes please, Ingo, do not delay it any further. This is becoming a real
problem: there are people who monitor their uptime to plan a reboot
before 200 days. We shouldn't need to wait for the next merge window;
the patch is already 15 days old and is a fix for a real-world stability
issue!

Thanks,
Willy



mingo at elte

Dec 5, 2011, 8:53 AM

Post #58 of 58 (1370 views)
Re: [stable] 2.6.32.21 - uptime related crashes? [In reply to]

* Greg KH <greg [at] kroah> wrote:

> On Fri, Dec 02, 2011 at 04:02:23PM -0800, john stultz wrote:
> > On Fri, 2011-12-02 at 15:45 -0800, Greg KH wrote:
> > > On Wed, Oct 26, 2011 at 01:25:45AM +0200, Willy Tarreau wrote:
> > > > Hi John,
> > > >
> > > > On Tue, Oct 25, 2011 at 03:44:30PM -0700, john stultz wrote:
> > > > > The softlockup false positive issue should have been fixed by Peter's
> > > > > "x86, intel: Don't mark sched_clock() as stable" below. But I'm not
> > > > > seeing it upstream. Peter, is this still the right fix?
> > > >
> > > > I've not seen any other one proposed, and both you and Peter appeared
> > > > to like it. I understood that Ingo was waiting for the merge window to
> > > > submit it and I think that it simply got lost.
> > > >
> > > > Ingo, can you confirm ?
> > >
> > > I'm totally confused here, what's the status of this, and what exactly
> > > is the patch?
> >
> > Ingo has the fix from Salman queued in -tip, but I'm not sure why it's
> > not been pushed to Linus yet.
> >
> > http://git.kernel.org/?p=linux/kernel/git/tip/tip.git;a=commit;h=4cecf6d401a01d054afc1e5f605bcbfe553cb9b9
>
> Wonderful, thanks for pointing this out to me.
>
> Ingo, any idea when this will go to Linus's tree?

Today, if everything goes fine.

Thanks,

Ingo
