Login | Register For Free | Help
Search for: (Advanced)

Mailing List Archive: Linux: Kernel

BUG: soft lockup detected on CPU#1, ipmi_si

 

 

Linux kernel RSS feed   Index | Next | Previous | View Threaded


peter at palfrader

Jun 13, 2006, 4:35 PM

Post #1 of 9 (6524 views)
Permalink
BUG: soft lockup detected on CPU#1, ipmi_si

Hi,

On 2.6.17-rc6 I get these soft lockup warnings quite regularly on one of
my DL145-G2 systems (dual dual-core (opteron 275), x86_64).

Here's an example of one:

Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
Jun 14 00:57:55 laura kernel: [97133.247456]
Jun 14 00:57:55 laura kernel: [97133.247457] Call Trace: <IRQ> <ffffffff802a6eba>{softlockup_tick+250}
Jun 14 00:57:55 laura kernel: [97133.247496] <ffffffff80293bb7>{update_process_times+87} <ffffffff80279923>{smp_local_timer_interrupt+35}
Jun 14 00:57:55 laura kernel: [97133.247546] <ffffffff80279e61>{smp_apic_timer_interrupt+65} <ffffffff8026712e>{apic_timer_interrupt+98} <EOI>
Jun 14 00:57:55 laura kernel: [97133.247599] <ffffffff802526c3>{try_to_del_timer_sync+83} <ffffffff8026be38>{_spin_unlock_irqrestore+8}
Jun 14 00:57:55 laura kernel: [97133.247652] <ffffffff880eab78>{:ipmi_si:ipmi_thread+72} <ffffffff880eab30>{:ipmi_si:ipmi_thread+0}
Jun 14 00:57:55 laura kernel: [97133.247707] <ffffffff80237099>{kthread+217} <ffffffff8026743a>{child_rip+8}
Jun 14 00:57:55 laura kernel: [97133.247757] <ffffffff8029d280>{keventd_create_kthread+0} <ffffffff80236fc0>{kthread+0}
Jun 14 00:57:55 laura kernel: [97133.247807] <ffffffff80267432>{child_rip+0}

I have several more, all of which include ipmi_si in one way or
another. Usually these lockups happened on CPUs 2 and 3, tho today I
got one of CPU 1.

It's interesting to note that these logs are happening every 30 minutes
(munin queries 'openipmi -I open sensors' about once every 5 minutes).

Jun 12 18:45:40 laura kernel: [ 2876.040688] BUG: soft lockup detected on CPU#2!
Jun 12 19:15:54 laura kernel: [ 4688.141966] BUG: soft lockup detected on CPU#2!
Jun 12 19:46:06 laura kernel: [ 6499.159958] BUG: soft lockup detected on CPU#3!
Jun 12 20:16:17 laura kernel: [ 8309.394462] BUG: soft lockup detected on CPU#2!
Jun 12 20:46:29 laura kernel: [10120.468415] BUG: soft lockup detected on CPU#3!
Jun 12 21:16:42 laura kernel: [11931.614321] BUG: soft lockup detected on CPU#2!
Jun 12 21:46:54 laura kernel: [13742.672285] BUG: soft lockup detected on CPU#2!
[rebooted in between]
Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
Jun 14 01:28:07 laura kernel: [98944.316467] BUG: soft lockup detected on CPU#3!

Not having the ipmi_si module loaded means these warnings don't happen.

I suspect this started happening when I upgraded the firmware of the
iLo 100i BMC from 1.00 to version 1.23 (the other system's firmware
wasn't upgraded yet). I'm still reporting it here on the chance that it
actually is a kernel/ipmi_si module bug.

Cheers,
Peter
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minyard at acm

Jun 18, 2006, 8:59 PM

Post #2 of 9 (6416 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

Sorry, I've been on vacation for a week without internet, so I'm just
getting caught up.

The IPMI driver spawns a low-priority thread that will poll the driver
when it finds there is something to do. It's possible that the hardware
is not setting things properly and is always telling the driver it has
to do something. It's possible that the new version of the firmware
enabled interrupts; I think there's a problem with the driver here; it
should not really enable the kernel thread if interrupts are working.
The driver should also probably call schedule() instead of udelay() in
the kernel thread when a short timeout is requested by the state machine.

In either situation, the kernel thread will sit there and spin, and if
nothing else is scheduled for 10 seconds on that CPU you will get that
warning. Can you check a few things for me?

cat /proc/ipmi/0/si_stats and send me the output.

If you do "top", is the kipmi0 always running?

Is your IPMI interface KCS or SMIC? The IPMI driver should report this
in the system log at startup.

Thanks,

-Corey

Peter Palfrader wrote:
> Hi,
>
> On 2.6.17-rc6 I get these soft lockup warnings quite regularly on one of
> my DL145-G2 systems (dual dual-core (opteron 275), x86_64).
>
> Here's an example of one:
>
> Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
> Jun 14 00:57:55 laura kernel: [97133.247456]
> Jun 14 00:57:55 laura kernel: [97133.247457] Call Trace: <IRQ> <ffffffff802a6eba>{softlockup_tick+250}
> Jun 14 00:57:55 laura kernel: [97133.247496] <ffffffff80293bb7>{update_process_times+87} <ffffffff80279923>{smp_local_timer_interrupt+35}
> Jun 14 00:57:55 laura kernel: [97133.247546] <ffffffff80279e61>{smp_apic_timer_interrupt+65} <ffffffff8026712e>{apic_timer_interrupt+98} <EOI>
> Jun 14 00:57:55 laura kernel: [97133.247599] <ffffffff802526c3>{try_to_del_timer_sync+83} <ffffffff8026be38>{_spin_unlock_irqrestore+8}
> Jun 14 00:57:55 laura kernel: [97133.247652] <ffffffff880eab78>{:ipmi_si:ipmi_thread+72} <ffffffff880eab30>{:ipmi_si:ipmi_thread+0}
> Jun 14 00:57:55 laura kernel: [97133.247707] <ffffffff80237099>{kthread+217} <ffffffff8026743a>{child_rip+8}
> Jun 14 00:57:55 laura kernel: [97133.247757] <ffffffff8029d280>{keventd_create_kthread+0} <ffffffff80236fc0>{kthread+0}
> Jun 14 00:57:55 laura kernel: [97133.247807] <ffffffff80267432>{child_rip+0}
>
> I have several more, all of which include ipmi_si in one way or
> another. Usually these lockups happened on CPUs 2 and 3, tho today I
> got one of CPU 1.
>
> It's interesting to note that these logs are happening every 30 minutes
> (munin queries 'openipmi -I open sensors' about once every 5 minutes).
>
> Jun 12 18:45:40 laura kernel: [ 2876.040688] BUG: soft lockup detected on CPU#2!
> Jun 12 19:15:54 laura kernel: [ 4688.141966] BUG: soft lockup detected on CPU#2!
> Jun 12 19:46:06 laura kernel: [ 6499.159958] BUG: soft lockup detected on CPU#3!
> Jun 12 20:16:17 laura kernel: [ 8309.394462] BUG: soft lockup detected on CPU#2!
> Jun 12 20:46:29 laura kernel: [10120.468415] BUG: soft lockup detected on CPU#3!
> Jun 12 21:16:42 laura kernel: [11931.614321] BUG: soft lockup detected on CPU#2!
> Jun 12 21:46:54 laura kernel: [13742.672285] BUG: soft lockup detected on CPU#2!
> [rebooted in between]
> Jun 14 00:57:55 laura kernel: [97133.246571] BUG: soft lockup detected on CPU#1!
> Jun 14 01:28:07 laura kernel: [98944.316467] BUG: soft lockup detected on CPU#3!
>
> Not having the ipmi_si module loaded means these warnings don't happen.
>
> I suspect this started happening when I upgraded the firmware of the
> iLo 100i BMC from 1.00 to version 1.23 (the other system's firmware
> wasn't upgraded yet). I'm still reporting it here on the chance that it
> actually is a kernel/ipmi_si module bug.
>
> Cheers,
> Peter
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peter at palfrader

Jun 19, 2006, 2:38 AM

Post #3 of 9 (6417 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

On Sun, 18 Jun 2006, Corey Minyard wrote:

> The IPMI driver spawns a low-priority thread that will poll the driver
> when it finds there is something to do. It's possible that the hardware
> is not setting things properly and is always telling the driver it has
> to do something. It's possible that the new version of the firmware
> enabled interrupts; I think there's a problem with the driver here; it
> should not really enable the kernel thread if interrupts are working.
> The driver should also probably call schedule() instead of udelay() in
> the kernel thread when a short timeout is requested by the state machine.
>
> In either situation, the kernel thread will sit there and spin, and if
> nothing else is scheduled for 10 seconds on that CPU you will get that
> warning. Can you check a few things for me?
>
> cat /proc/ipmi/0/si_stats and send me the output.

After running for about 35 minutes (and one instance of the soft lockup
warning):

| interrupts_enabled: 0
| short_timeouts: 8835
| long_timeouts: 263709
| timeout_restarts: 0
| idles: 793108
| interrupts: 0
| attentions: 0
| flag_fetches: 2137
| hosed_count: 0
| complete_transactions: 3516
| events: 0
| watchdog_pretimeouts: 0
| incoming_messages: 0


> If you do "top", is the kipmi0 always running?

Yes, running since the system started around 11:00:

| root 1331 0.8 0.0 0 0 ? SN 10:59 0:17 \_ [kipmi0]

> Is your IPMI interface KCS or SMIC? The IPMI driver should report this
> in the system log at startup.

It's KCS:

| laura:~# dmesg | grep -i ipmi
| [ 85.110244] ipmi message handler version 39.0
| [ 85.111491] ipmi device interface
| [ 85.127866] IPMI System Interface driver.
| [ 85.127929] ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca2, slave address 0x20, irq 0
| [ 85.274699] ipmi: Found new BMC (man_id: 0x000f85, prod_id: 0x0000, dev_id: 0x00)
| [ 85.274852] IPMI KCS interface initialized
| [ 85.284710] IPMI Watchdog: driver initialized

Cheers,
Peter
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minyard at acm

Jun 22, 2006, 7:03 AM

Post #4 of 9 (6425 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

Peter, can you make a code change for me and try something out?

If possible, could you change the call to udelay(1) in the function
ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
instead? I'm guessing that will fix this problem.

-Corey

Peter Palfrader wrote:
> On Sun, 18 Jun 2006, Corey Minyard wrote:
>
>
>> The IPMI driver spawns a low-priority thread that will poll the driver
>> when it finds there is something to do. It's possible that the hardware
>> is not setting things properly and is always telling the driver it has
>> to do something. It's possible that the new version of the firmware
>> enabled interrupts; I think there's a problem with the driver here; it
>> should not really enable the kernel thread if interrupts are working.
>> The driver should also probably call schedule() instead of udelay() in
>> the kernel thread when a short timeout is requested by the state machine.
>>
>> In either situation, the kernel thread will sit there and spin, and if
>> nothing else is scheduled for 10 seconds on that CPU you will get that
>> warning. Can you check a few things for me?
>>
>> cat /proc/ipmi/0/si_stats and send me the output.
>>
>
> After running for about 35 minutes (and one instance of the soft lockup
> warning):
>
> | interrupts_enabled: 0
> | short_timeouts: 8835
> | long_timeouts: 263709
> | timeout_restarts: 0
> | idles: 793108
> | interrupts: 0
> | attentions: 0
> | flag_fetches: 2137
> | hosed_count: 0
> | complete_transactions: 3516
> | events: 0
> | watchdog_pretimeouts: 0
> | incoming_messages: 0
>
>
>
>> If you do "top", is the kipmi0 always running?
>>
>
> Yes, running since the system started around 11:00:
>
> | root 1331 0.8 0.0 0 0 ? SN 10:59 0:17 \_ [kipmi0]
>
>
>> Is your IPMI interface KCS or SMIC? The IPMI driver should report this
>> in the system log at startup.
>>
>
> It's KCS:
>
> | laura:~# dmesg | grep -i ipmi
> | [ 85.110244] ipmi message handler version 39.0
> | [ 85.111491] ipmi device interface
> | [ 85.127866] IPMI System Interface driver.
> | [ 85.127929] ipmi_si: Trying SMBIOS-specified KCS state machine at I/O address 0xca2, slave address 0x20, irq 0
> | [ 85.274699] ipmi: Found new BMC (man_id: 0x000f85, prod_id: 0x0000, dev_id: 0x00)
> | [ 85.274852] IPMI KCS interface initialized
> | [ 85.284710] IPMI Watchdog: driver initialized
>
> Cheers,
> Peter
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Matt_Domsch at dell

Jun 22, 2006, 7:56 PM

Post #5 of 9 (6424 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
> Peter, can you make a code change for me and try something out?
>
> If possible, could you change the call to udelay(1) in the function
> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
> instead? I'm guessing that will fix this problem.

won't that cause the thread to be scheduled out for at least one timer
tick (1-10ms depending on HZ), especially as it's at nice 19? This
will cause the commands to be quite slow, which was the primary reason
for the kthread here in the first place.

Thanks,
Matt

--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minyard at acm

Jun 23, 2006, 7:55 AM

Post #6 of 9 (6433 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

Matt Domsch wrote:
> On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
>
>> Peter, can you make a code change for me and try something out?
>>
>> If possible, could you change the call to udelay(1) in the function
>> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
>> instead? I'm guessing that will fix this problem.
>>
>
> won't that cause the thread to be scheduled out for at least one timer
> tick (1-10ms depending on HZ), especially as it's at nice 19? This
> will cause the commands to be quite slow, which was the primary reason
> for the kthread here in the first place.
>
I think that will only happen if there are other things to run that are
higher priority, and you want those to run, anyway. This doesn't affect
the processes priority like yield() would.

-Corey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


minyard at acm

Jun 23, 2006, 4:44 PM

Post #7 of 9 (6414 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

I also had a report from Matt that running the driver full-bore caused
the mouse to go haywire. I did some testing and the mouse didn't go
haywire, but my keyboard did. I changed the udelay(1) to schedule() and
kipmi0 is running at 100% as I type right now, and things seem to be
running smoothly. Testing the performance, I got 4.5ms per message for
a get device id, which was the same as I saw before the change. So I
think this change is a good idea.

I'll let Peter test it out when he gets his machines back, and if it all
looks good I'll do a patch.

Thanks,

-Corey

Matt Domsch wrote:
> On Thu, Jun 22, 2006 at 09:03:12AM -0500, Corey Minyard wrote:
>
>> Peter, can you make a code change for me and try something out?
>>
>> If possible, could you change the call to udelay(1) in the function
>> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
>> instead? I'm guessing that will fix this problem.
>>
>
> won't that cause the thread to be scheduled out for at least one timer
> tick (1-10ms depending on HZ), especially as it's at nice 19? This
> will cause the commands to be quite slow, which was the primary reason
> for the kthread here in the first place.
>
> Thanks,
> Matt
>
>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


Matt_Domsch at dell

Jun 24, 2006, 12:37 PM

Post #8 of 9 (6415 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

On Fri, Jun 23, 2006 at 06:44:48PM -0500, Corey Minyard wrote:
> I also had a report from Matt that running the driver full-bore caused
> the mouse to go haywire. I did some testing and the mouse didn't go
> haywire, but my keyboard did. I changed the udelay(1) to schedule() and
> kipmi0 is running at 100% as I type right now, and things seem to be
> running smoothly. Testing the performance, I got 4.5ms per message for
> a get device id, which was the same as I saw before the change. So I
> think this change is a good idea.

I tried this here too, and while I'm not at the console to try the
mouse, the times for 'ipmitool sensor list' are still within the same
range (5-10 seconds, average is about 6) as with udelay(1) instead.
Running some load generators to keep the CPUs pegged didn't change
this result either. So I'm inclined to agree changing to schedule()
is the right approach.

Thanks,
Matt

--
Matt Domsch
Software Architect
Dell Linux Solutions linux.dell.com & www.dell.com/linux
Linux on Dell mailing lists @ http://lists.us.dell.com
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


peter at palfrader

Jun 24, 2006, 6:00 PM

Post #9 of 9 (6417 views)
Permalink
Re: [Openipmi-developer] BUG: soft lockup detected on CPU#1, ipmi_si [In reply to]

On Thu, 22 Jun 2006, Corey Minyard wrote:

> Peter, can you make a code change for me and try something out?
>
> If possible, could you change the call to udelay(1) in the function
> ipmi_thread() in drivers/char/ipmi_si_intf.c to be a call to schedule()
> instead? I'm guessing that will fix this problem.

That appears to work. I can still get sensor data using ipmitool, and I
haven't gotten any of the soft lockup warnings in 4 hours.

Thanks!
--
| .''`. ** Debian GNU/Linux **
Peter Palfrader | : :' : The universal
http://www.palfrader.org/ | `. `' Operating System
| `- http://www.debian.org/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Linux kernel RSS feed   Index | Next | Previous | View Threaded
 
 


Interested in having your list archived? Contact Gossamer Threads
 
  Web Applications & Managed Hosting Powered by Gossamer Threads Inc.