
Mailing List Archive: Linux: Kernel

Warning in worker_enter_idle()

 

 



paulmck at linux

May 6, 2012, 8:38 AM

Post #1 of 4
Warning in worker_enter_idle()

Hello!

worker_enter_idle() is complaining that all workers are idle,
but that there is work remaining:

    /* sanity check nr_running */
    WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle &&
                 atomic_read(get_gcwq_nr_running(gcwq->cpu)));

This is running on Power, .config attached. I must confess that I don't
see any sort of synchronization or memory barriers that would keep the
counts straight on a weakly ordered system. Or is there some clever
design constraint that prevents worker_enter_idle() from accessing other
CPUs' gcwq_nr_running variables?

Thanx, Paul

[ 1773.881934] ------------[ cut here ]------------
[ 1773.881954] WARNING: at kernel/workqueue.c:1215
[ 1773.881963] Modules linked in: rcutorture ipv6 dm_mirror dm_region_hash dm_log ses enclosure ehea ext3 jbd mbcache sg sd_mod crc_t10dif ipr radeon drm_kms_helper ttm drm hwmon i2c_algo_bit i2c_core power_supply dm_mod [last unloaded: scsi_wait_scan]
[ 1773.882068] NIP: c00000000009b0f8 LR: c00000000009b124 CTR: c0000000000621b0
[ 1773.882083] REGS: c0000003cacf7b00 TRAP: 0700 Not tainted (3.4.0-rc4-autokern1)
[ 1773.882095] MSR: 8000000000029032 <SF,EE,ME,IR,DR,RI> CR: 28000044 XER: 00000020
[ 1773.882135] SOFTE: 0
[ 1773.882142] CFAR: c00000000009b068
[ 1773.882151] TASK = c0000003c6fdcff0[30348] 'kworker/5:1' THREAD: c0000003cacf4000 CPU: 5
[ 1773.882166] GPR00: 0000000000000001 c0000003cacf7d80 c000000000e7e7e8 0000000000000000
[ 1773.882193] GPR04: 0000000000000000 c0000001d609f798 0000000000000000 0000000000000000
[ 1773.882220] GPR08: 0000000000000000 c000000000dc6c04 0000000000000002 c0000000011b5b00
[ 1773.882247] GPR12: 0000000000000002 c00000000f550f00 0000000001a5fa78 00000000020c9400
[ 1773.882274] GPR16: 0000000003300000 000000000021eeef 000000000021f19b 000000000021f064
[ 1773.882301] GPR20: 0000000000220000 c0000003c94cfbd0 0000000000000000 0000000000000000
[ 1773.882327] GPR24: 0000000000000001 0000000000000001 0000000000000000 c0000000011a4208
[ 1773.882354] GPR28: c000000000ec2580 c0000003c9fe4900 c000000000e09430 c0000000011a4200
[ 1773.882391] NIP [c00000000009b0f8] .worker_enter_idle+0x158/0x1b0
[ 1773.882404] LR [c00000000009b124] .worker_enter_idle+0x184/0x1b0
[ 1773.882415] Call Trace:
[ 1773.882424] [c0000003cacf7d80] [c00000000009b124] .worker_enter_idle+0x184/0x1b0 (unreliable)
[ 1773.882444] [c0000003cacf7e10] [c00000000009fb08] .worker_thread+0x238/0x460
[ 1773.882462] [c0000003cacf7ed0] [c0000000000a992c] .kthread+0xbc/0xd0
[ 1773.882479] [c0000003cacf7f90] [c0000000000216e4] .kernel_thread+0x54/0x70
[ 1773.882493] Instruction dump:
[ 1773.882502] e97e8010 78091f24 e81e8008 7d2b482a 7c004a14 7c0b0378 800b0000 2f800000
[ 1773.882536] 419eff88 e93e8030 88090003 68000001 <0b000000> 2fa00000 41feff70 38000001
[ 1773.882569] ---[ end trace 5e41f99db128c10a ]---
Attachments: .config (84.2 KB)


tj at kernel

May 7, 2012, 12:40 PM

Post #2 of 4
Re: Warning in worker_enter_idle()

Hello, Paul.

On Sun, May 06, 2012 at 08:38:14AM -0700, Paul E. McKenney wrote:
> Hello!
>
> worker_enter_idle() is complaining that all workers are idle,
> but that there is work remaining:
>
>     /* sanity check nr_running */
>     WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle &&
>                  atomic_read(get_gcwq_nr_running(gcwq->cpu)));
>
> This is running on Power, .config attached. I must confess that I don't
> see any sort of synchronization or memory barriers that would keep the
> counts straight on a weakly ordered system. Or is there some clever
> design constraint that prevents worker_enter_idle() from accessing other
> CPUs' gcwq_nr_running variables?

Workers are tied to global cpu workqueues (gcwqs). There's one gcwq
per cpu and one unbound one, so yeah, workers access these counters
under gcwq->lock. Atomic accesses to nr_running are relied on only
while nr_idle is adjusted under gcwq->lock, so there shouldn't be a
discrepancy there. Can you reproduce the problem? What was going on
in the system? Was a CPU being brought up or down?
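
To illustrate the locking rule described above, here is a minimal user-space sketch in C. It is not the kernel code: struct model_gcwq, model_lock(), model_unlock() and model_worker_enter_idle() are hypothetical stand-ins, the lock is modeled by a plain flag, and the sketch assumes that the worker's nr_running decrement and the nr_idle increment both happen before the check, while the lock is held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for one per-CPU gcwq. */
struct model_gcwq {
    bool locked;            /* models gcwq->lock being held */
    int nr_workers;         /* total workers, changed only under the lock */
    int nr_idle;            /* idle workers, changed only under the lock */
    atomic_int nr_running;  /* running workers, accessed atomically */
};

static void model_lock(struct model_gcwq *g)
{
    assert(!g->locked);     /* single-threaded model: no contention */
    g->locked = true;
}

static void model_unlock(struct model_gcwq *g)
{
    assert(g->locked);
    g->locked = false;
}

/* Same predicate as the sanity check quoted in this thread. */
static bool model_sanity_violated(struct model_gcwq *g)
{
    return g->nr_workers == g->nr_idle &&
           atomic_load(&g->nr_running) != 0;
}

/*
 * Model of one worker going idle (assumed ordering, see lead-in).
 * Returns true if the sanity check would have warned.
 */
static bool model_worker_enter_idle(struct model_gcwq *g)
{
    bool warn;

    model_lock(g);
    atomic_fetch_sub(&g->nr_running, 1);  /* worker stops running */
    g->nr_idle++;                         /* and is counted as idle */
    warn = model_sanity_violated(g);
    model_unlock(g);
    return warn;
}
```

In this model, if two running workers go idle one after the other, the check never fires: nr_running has already reached zero by the time nr_idle equals nr_workers.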

Thanks.

--
tejun
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


paulmck at linux

May 7, 2012, 1:55 PM

Post #3 of 4
Re: Warning in worker_enter_idle()

On Mon, May 07, 2012 at 12:40:42PM -0700, Tejun Heo wrote:
> Hello, Paul.
>
> On Sun, May 06, 2012 at 08:38:14AM -0700, Paul E. McKenney wrote:
> > Hello!
> >
> > worker_enter_idle() is complaining that all workers are idle,
> > but that there is work remaining:
> >
> >     /* sanity check nr_running */
> >     WARN_ON_ONCE(gcwq->nr_workers == gcwq->nr_idle &&
> >                  atomic_read(get_gcwq_nr_running(gcwq->cpu)));
> >
> > This is running on Power, .config attached. I must confess that I don't
> > see any sort of synchronization or memory barriers that would keep the
> > counts straight on a weakly ordered system. Or is there some clever
> > design constraint that prevents worker_enter_idle() from accessing other
> > CPUs' gcwq_nr_running variables?
>
> Workers are tied to global cpu workqueues (gcwqs). There's one gcwq
> per cpu and one unbound one, so yeah, workers access these counters
> under gcwq->lock. Atomic accesses to nr_running are relied on only
> while nr_idle is adjusted under gcwq->lock, so there shouldn't be a
> discrepancy there. Can you reproduce the problem? What was going on
> in the system? Was a CPU being brought up or down?

I was running rcutorture with CPU hotplug operations. It has happened
a couple of times on the .config that I attached, but never under any
of the other 13 .configs that I run.

Thanx, Paul



tj at kernel

May 7, 2012, 2:34 PM

Post #4 of 4
Re: Warning in worker_enter_idle()

Hello,

On Mon, May 07, 2012 at 01:55:16PM -0700, Paul E. McKenney wrote:
> I was running rcutorture with CPU hotplug operations. It has happened
> a couple of times on the .config that I attached, but never under any
> of the other 13 .configs that I run.

Ah, okay. The invariant breaks when a CPU detaches. The WARN_ON()
probably just needs a condition to disable it if the gcwq is detached.
I'll look into it tomorrow.
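
As a rough illustration of that idea, and not the eventual patch, the check could be gated on a "detached" condition. In the user-space sketch below, struct sketch_gcwq, its detached flag and sketch_should_warn() are all hypothetical names. How the kernel would actually represent a detached gcwq is not stated in this thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a gcwq; "detached" is an assumed flag. */
struct sketch_gcwq {
    bool detached;          /* assumed: set while the CPU is detached */
    int nr_workers;
    int nr_idle;
    atomic_int nr_running;
};

/*
 * Returns true if the sanity check should warn. The check is skipped
 * entirely while the gcwq is marked detached, since the invariant is
 * not expected to hold then.
 */
static bool sketch_should_warn(struct sketch_gcwq *g)
{
    assert(g->nr_idle <= g->nr_workers);  /* basic precondition */

    if (g->detached)
        return false;

    return g->nr_workers == g->nr_idle &&
           atomic_load(&g->nr_running) != 0;
}
```

With all workers idle and nr_running still nonzero, this sketch warns on an attached gcwq and stays silent once the detached flag is set.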

Thanks.

--
tejun
