Mailing List Archive: Linux: Kernel

crash in kmem_cache_init

 

 



olaf at aepfle

Jan 15, 2008, 7:09 AM

Post #1 of 43 (2832 views)
Permalink
crash in kmem_cache_init

Current linus tree crashes in kmem_cache_init, as shown below. The
system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
Firmware is 240_332, 2.6.23 boots ok with the same config.

There is a series of mm related patches in 2.6.24-rc1:
commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

==> .git/BISECT_LOG <==
git-bisect start
# good: [0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d] Linux 2.6.23
git-bisect good 0b8bc8b91cf6befea20fe78b90367ca7b61cfa0d
# bad: [cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2] Linux 2.6.24-rc1
git-bisect bad cebdeed27b068dcc3e7c311d7ec0d9c33b5138c2
# good: [9ac52315d4cf5f561f36dabaf0720c00d3553162] sched: guest CPU accounting: add guest-CPU /proc/<pid>/stat fields
git-bisect good 9ac52315d4cf5f561f36dabaf0720c00d3553162
# bad: [b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0] add consts where appropriate in fs/nls/Kconfig fs/nls/Makefile fs/nls/nls_ascii.c fs/nls/nls_base.c fs/nls/nls_cp1250.c fs/nls/nls_cp1251.c fs/nls/nls_cp1255.c fs/nls/nls_cp437.c fs/nls/nls_cp737.c fs/nls/nls_cp775.c fs/nls/nls_cp850.c fs/nls/nls_cp852.c fs/nls/nls_cp855.c fs/nls/nls_cp857.c fs/nls/nls_cp860.c fs/nls/nls_cp861.c fs/nls/nls_cp862.c fs/nls/nls_cp863.c fs/nls/nls_cp864.c fs/nls/nls_cp865.c fs/nls/nls_cp866.c fs/nls/nls_cp869.c fs/nls/nls_cp874.c fs/nls/nls_cp932.c fs/nls/nls_cp936.c fs/nls/nls_cp949.c fs/nls/nls_cp950.c fs/nls/nls_euc-jp.c fs/nls/nls_iso8859-1.c fs/nls/nls_iso8859-13.c fs/nls/nls_iso8859-14.c fs/nls/nls_iso8859-15.c fs/nls/nls_iso8859-2.c fs/nls/nls_iso8859-3.c fs/nls/nls_iso8859-4.c fs/nls/nls_iso8859-5.c fs/nls/nls_iso8859-6.c fs/nls/nls_iso8859-7.c fs/nls/nls_iso8859-9.c fs/nls/nls_koi8-r.c fs/nls/nls_koi8-ru.c fs/nls/nls_koi8-u.c fs/nls/nls_utf8.c
git-bisect bad b9ec0339d8e22cadf2d9d1b010b51dc53837dfb0
# bad: [78a26e25ce4837a03ac3b6c32cdae1958e547639] uml: separate timer initialization
git-bisect bad 78a26e25ce4837a03ac3b6c32cdae1958e547639
# good: [4acad72ded8e3f0211bd2a762e23c28229c61a51] [IPV6]: Consolidate the ip6_pol_route_(input|output) pair
git-bisect good 4acad72ded8e3f0211bd2a762e23c28229c61a51
# good: [64da82efae0d7b5f7c478021840fd329f76d965d] Add support for PCMCIA card Sierra WIreless AC850
git-bisect good 64da82efae0d7b5f7c478021840fd329f76d965d
# bad: [37b07e4163f7306aa735a6e250e8d22293e5b8de] memoryless nodes: fixup uses of node_online_map in generic code
git-bisect bad 37b07e4163f7306aa735a6e250e8d22293e5b8de
# good: [64649a58919e66ec21792dbb6c48cb3da22cbd7f] mm: trim more holes
git-bisect good 64649a58919e66ec21792dbb6c48cb3da22cbd7f
# good: [fb53b3094888be0cf8ddf052277654268904bdf5] smbfs: convert to new aops
git-bisect good fb53b3094888be0cf8ddf052277654268904bdf5
# good: [13808910713a98cc1159291e62cdfec92cc94d05] Memoryless nodes: Generic management of nodemasks for various purposes




.............
Please wait, loading kernel...
Allocated 00a00000 bytes for kernel @ 00200000
Elf64 kernel loaded...
OF stdout device is: /vdevice/vty [at] 3000000
Hypertas detected, assuming LPAR !
command line: panic=1 debug xmon=on
memory layout at init:
alloc_bottom : 0000000000ac1000
alloc_top : 0000000010000000
alloc_top_hi : 00000000da000000
rmo_top : 0000000010000000
ram_top : 00000000da000000
Looking for displays
found display : /pci [at] 80000002000000/pci@2/pci@1/display@0, opening ... done
instantiating rtas at 0x000000000f6a1000 ... done
0000000000000000 : boot cpu 0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
0000000000000004 : starting cpu hw idx 0000000000000004... done
0000000000000006 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4
Device tree struct 0x0000000000cc4000 -> 0x0000000000cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #2 SMP Tue Jan 15 14:23:02 CET 2008
-----------------------------------------------------
ppc64_pft_size = 0x1c
physicalMemorySize = 0xda000000
htab_hash_mask = 0x1fffff
-----------------------------------------------------
Linux version 2.6.24-rc7-ppc64 (olaf [at] lingonberr) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #2 SMP Tue Jan 15 14:23:02 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
DMA 0 -> 892928
Normal 892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720
Policy zone: DMA
Kernel command line: panic=1 debug xmon=on
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc000000000437470
cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830]
pc: c000000000437470: ._spin_lock+0x20/0x88
lr: c0000000000f78a8: .cache_grow+0x7c/0x338
sp: c00000000075bab0
msr: 8000000000009032
dar: 40
dsisr: 40000000
current = 0xc000000000665a50
paca = 0xc000000000666380
pid = 0, comm = swapper
enter ? for help
[c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
[c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
[c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
[c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
[c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


olaf at aepfle

Jan 15, 2008, 7:58 AM

Post #2 of 43 (2783 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Tue, Jan 15, Olaf Hering wrote:

>
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

2.6.24-rc6-mm1-ppc64 boots past this point, but crashes later.
Likely unrelated to the kmem_cache_init bug:

...
matroxfb: 640x480x8bpp (virtual: 640x26214)
matroxfb: framebuffer at 0x40178000000, mapped to 0xd000080080080000, size 33554432
Console: switching to colour frame buffer device 80x30
fb0: MATROX frame buffer device
matroxfb_crtc2: secondary head of fb0 was registered as fb1
vio_register_driver: driver hvc_console registering
HVSI: registered 0 devices
Generic RTC Driver v1.07
Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing disabled
pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh [at] kernel>)
input: Macintosh mouse button emulation as /devices/virtual/input/input0
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
ehci_hcd 0000:c8:01.2: EHCI Host Controller
ehci_hcd 0000:c8:01.2: new USB bus registered, assigned bus number 1
ehci_hcd 0000:c8:01.2: irq 85, io mem 0x400a0002000
ehci_hcd 0000:c8:01.2: USB 2.0 started, EHCI 1.00, driver 10 Dec 2004
usb usb1: configuration #1 chosen from 1 choice
hub 1-0:1.0: USB hub found
hub 1-0:1.0: 5 ports detected
Unable to handle kernel paging request for data at address 0x00000050
Faulting instruction address: 0xc0000000000fa1c4
cpu 0x7: Vector: 300 (Data Access) at [c0000000d82e7a70]
pc: c0000000000fa1c4: .cache_reap+0x74/0x29c
lr: c0000000000fa198: .cache_reap+0x48/0x29c
sp: c0000000d82e7cf0
msr: 8000000000009032
dar: 50
dsisr: 40000000
current = 0xc0000000d82d85c0
paca = 0xc000000000668e00
pid = 27, comm = events/7
enter ? for help
[c0000000d82e7cf0] c00000000070be98 vmstat_update+0x0/0x18 (unreliable)
[c0000000d82e7da0] c000000000092994 .run_workqueue+0x120/0x210
[c0000000d82e7e40] c000000000093bb8 .worker_thread+0xcc/0xf0
[c0000000d82e7f00] c000000000097b70 .kthread+0x78/0xc4
[c0000000d82e7f90] c00000000002ab74 .kernel_thread+0x4c/0x68
7:mon>
...
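
Both fault addresses reported so far (0x40 in ._spin_lock called from cache_grow, 0x50 in cache_reap) are small offsets from zero, which is what a field access through a NULL struct pointer looks like. The following is a minimal user-space sketch, not kernel code: mock_kmem_list3 is a hypothetical stand-in whose field order and sizes are assumed to follow struct kmem_list3 in mm/slab.c of that era on a 64-bit build without lock debugging. Under that assumption, list_lock sits at offset 0x40 and alien at 0x50.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a 64-bit struct list_head: two pointers. */
struct mock_list_head {
    uint64_t next;
    uint64_t prev;
};

static_assert(sizeof(struct mock_list_head) == 16,
              "list_head must model two 64-bit pointers");

/*
 * User-space mock of the per-node slab bookkeeping structure.
 * ASSUMPTION: field order and sizes mirror struct kmem_list3 on a
 * 64-bit build with a 4-byte spinlock. This layout is not quoted
 * anywhere in the thread.
 */
struct mock_kmem_list3 {
    struct mock_list_head slabs_partial;
    struct mock_list_head slabs_full;
    struct mock_list_head slabs_free;
    uint64_t free_objects;
    uint32_t free_limit;
    uint32_t colour_next;
    uint32_t list_lock;  /* stand-in for spinlock_t */
    uint32_t pad;        /* padding a 64-bit compiler would insert */
    uint64_t shared;     /* stand-in for struct array_cache * */
    uint64_t alien;      /* stand-in for struct array_cache ** */
};

/* Address touched by &l3->list_lock when l3 is NULL. */
static inline uintptr_t fault_addr_list_lock(void)
{
    return (uintptr_t)offsetof(struct mock_kmem_list3, list_lock);
}

/* Address touched by l3->alien when l3 is NULL. */
static inline uintptr_t fault_addr_alien(void)
{
    return (uintptr_t)offsetof(struct mock_kmem_list3, alien);
}
```

If the real layout matches this mock, a NULL l3 would account for both oopses: spin_lock(&l3->list_lock) faults at 0x40 and a read of l3->alien faults at 0x50.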


penberg at cs

Jan 17, 2008, 4:14 AM

Post #3 of 43 (2768 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

Hi Olaf,

[Adding Christoph as cc.]

On Jan 15, 2008 5:09 PM, Olaf Hering <olaf [at] aepfle> wrote:
> Current linus tree crashes in kmem_cache_init, as shown below. The
> system is a 8cpu 2.2GHz POWER5 system, model 9117-570, with 4GB ram.
> Firmware is 240_332, 2.6.23 boots ok with the same config.
>
> There is a series of mm related patches in 2.6.24-rc1:
> commit 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 seems to break it,

So that's the "Memoryless nodes: Slab support" patch that I think
caused a similar oops a while ago.

> Unable to handle kernel paging request for data at address 0x00000040
> Faulting instruction address: 0xc000000000437470
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075b830]
> pc: c000000000437470: ._spin_lock+0x20/0x88
> lr: c0000000000f78a8: .cache_grow+0x7c/0x338
> sp: c00000000075bab0
> msr: 8000000000009032
> dar: 40
> dsisr: 40000000
> current = 0xc000000000665a50
> paca = 0xc000000000666380
> pid = 0, comm = swapper
> enter ? for help
> [c00000000075bb30] c0000000000f78a8 .cache_grow+0x7c/0x338
> [c00000000075bbf0] c0000000000f7d04 .fallback_alloc+0x1a0/0x1f4
> [c00000000075bca0] c0000000000f8544 .kmem_cache_alloc+0xec/0x150
> [c00000000075bd40] c0000000000fb1c0 .kmem_cache_create+0x208/0x478
> [c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0

Looks similar to the one discussed on linux-mm ("[BUG] at
mm/slab.c:3320" thread). Christoph?


clameter at sgi

Jan 17, 2008, 6:30 AM

Post #4 of 43 (2771 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, 17 Jan 2008, Pekka Enberg wrote:

> Looks similar to the one discussed on linux-mm ("[BUG] at
> mm/slab.c:3320" thread). Christoph?

Right. Try the latest version of the patch to fix it:

Index: linux-2.6/mm/slab.c
===================================================================
--- linux-2.6.orig/mm/slab.c	2008-01-03 12:26:42.000000000 -0800
+++ linux-2.6/mm/slab.c	2008-01-09 15:59:49.000000000 -0800
@@ -2977,7 +2977,10 @@ retry:
 	}
 	l3 = cachep->nodelists[node];
 
-	BUG_ON(ac->avail > 0 || !l3);
+	if (!l3)
+		return NULL;
+
+	BUG_ON(ac->avail > 0);
 	spin_lock(&l3->list_lock);
 
 	/* See if we can refill from the shared array */
@@ -3224,7 +3227,7 @@ static void *alternate_node_alloc(struct
 		nid_alloc = cpuset_mem_spread_node();
 	else if (current->mempolicy)
 		nid_alloc = slab_node(current->mempolicy);
-	if (nid_alloc != nid_here)
+	if (nid_alloc != nid_here && node_state(nid_alloc, N_NORMAL_MEMORY))
 		return ____cache_alloc_node(cachep, flags, nid_alloc);
 	return NULL;
 }
@@ -3439,8 +3442,14 @@ __do_cache_alloc(struct kmem_cache *cach
 	 * We may just have run out of memory on the local node.
 	 * ____cache_alloc_node() knows how to locate memory on other nodes
 	 */
-	if (!objp)
-		objp = ____cache_alloc_node(cache, flags, numa_node_id());
+	if (!objp) {
+		int node_id = numa_node_id();
+		if (likely(cache->nodelists[node_id])) /* fast path */
+			objp = ____cache_alloc_node(cache, flags, node_id);
+		else /* this function can do good fallback */
+			objp = __cache_alloc_node(cache, flags, node_id,
+					__builtin_return_address(0));
+	}
 
   out:
 	return objp;
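
The intent of the three hunks can be illustrated with a small user-space model. This is only a sketch under assumptions: mock_cache, mock_l3, mock_alloc_on_node and mock_alloc are hypothetical names, and the fallback loop is far simpler than what ____cache_alloc_node() and fallback_alloc() do in the kernel. It shows the pattern the patch aims for: a node without an l3 yields "no object" instead of being dereferenced, and the caller then tries other nodes.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_NODES 2

/* Hypothetical stand-in for the per-node list structure (l3). */
struct mock_l3 {
    int free_objects;
};

/* Hypothetical stand-in for struct kmem_cache: one l3 pointer per node. */
struct mock_cache {
    struct mock_l3 *nodelists[MOCK_MAX_NODES];
};

/*
 * Try to take one object from a specific node.
 * Returns 1 on success, 0 if the node has no l3 or no free objects.
 */
static int mock_alloc_on_node(struct mock_cache *c, int node)
{
    struct mock_l3 *l3;

    assert(node >= 0 && node < MOCK_MAX_NODES);
    l3 = c->nodelists[node];
    if (!l3)
        return 0;   /* memoryless node: report failure, do not dereference */
    if (l3->free_objects == 0)
        return 0;
    l3->free_objects--;
    return 1;
}

/*
 * Try the local node first, then fall back to any other node.
 * Returns the node that satisfied the request, or -1 if none could.
 */
static int mock_alloc(struct mock_cache *c, int local_node)
{
    int node;

    if (mock_alloc_on_node(c, local_node))
        return local_node;
    for (node = 0; node < MOCK_MAX_NODES; node++) {
        if (node != local_node && mock_alloc_on_node(c, node))
            return node;
    }
    return -1;
}
```

With nodelists = { NULL, &n1 } and one free object on node 1, a request issued from node 0 is served from node 1, and the next request returns -1 rather than crashing.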


olaf at aepfle

Jan 17, 2008, 10:12 AM

Post #5 of 43 (2765 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Pekka Enberg wrote:
>
> > Looks similar to the one discussed on linux-mm ("[BUG] at
> > mm/slab.c:3320" thread). Christoph?
>
> Right. Try the latest version of the patch to fix it:

The patch does not help.

> Index: linux-2.6/mm/slab.c
> ===================================================================
> --- linux-2.6.orig/mm/slab.c 2008-01-03 12:26:42.000000000 -0800
> +++ linux-2.6/mm/slab.c 2008-01-09 15:59:49.000000000 -0800
> @@ -2977,7 +2977,10 @@ retry:
> }
> l3 = cachep->nodelists[node];
>
> - BUG_ON(ac->avail > 0 || !l3);
> + if (!l3)
> + return NULL;
> +
> + BUG_ON(ac->avail > 0);
> spin_lock(&l3->list_lock);
>
> /* See if we can refill from the shared array */

Is this hunk supposed to go into cache_grow()? There is no NULL check
for l3.

But if I do that, it does not help:

freeing bootmem node 1
Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3
Kernel panic - not syncing: kmem_cache_create(): failed to create slab `size-32'

Rebooting in 1 seconds..



clameter at sgi

Jan 17, 2008, 10:58 AM

Post #6 of 43 (2769 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, 17 Jan 2008, Olaf Hering wrote:

> The patch does not help.

Duh. We need to know more about the problem.

> > --- linux-2.6.orig/mm/slab.c 2008-01-03 12:26:42.000000000 -0800
> > +++ linux-2.6/mm/slab.c 2008-01-09 15:59:49.000000000 -0800
> > @@ -2977,7 +2977,10 @@ retry:
> > }
> > l3 = cachep->nodelists[node];
> >
> > - BUG_ON(ac->avail > 0 || !l3);
> > + if (!l3)
> > + return NULL;
> > +
> > + BUG_ON(ac->avail > 0);
> > spin_lock(&l3->list_lock);
> >
> > /* See if we can refill from the shared array */
>
> Is this hunk supposed to go into cache_grow()? There is no NULL check
> for l3.

No, it's for cache_alloc_refill. cache_grow should only be called for
nodes that have memory. l3 is always used before cache_grow is called.

> freeing bootmem node 1
> Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3

Is there more backtrace information? What function called cache_grow?



clameter at sgi

Jan 17, 2008, 11:03 AM

Post #7 of 43 (2769 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

Could you try Pekka's suggestion of reverting
04231b3002ac53f8a64a7bd142fde3fa4b6808c6 ?



olaf at aepfle

Jan 17, 2008, 11:54 AM

Post #8 of 43 (2798 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, Jan 17, Christoph Lameter wrote:

> > freeing bootmem node 1
> > Memory: 3496632k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> > cache_grow(2781) swapper(0):c0,j4294937299 cp c0000000006a4fb8 !l3
>
> Is there more backtrace information? What function called cache_grow?

I just put an 'if (!l3) return 0;' into cache_grow; the backtrace is the
one from the initial report.
Reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 does not change
anything.


Since -mm boots further, what patch should I try?

The kernel boots on a different p570.
See attached dmesg. huckleberry boots, cranberry crashes.


--- huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:18.510309000 +0100
+++ cranberry.suse.de-2.6.16.57-0.5-ppc64.txt 2008-01-17 20:48:09.425402000 +0100
@@ -1,56 +1,55 @@
Page orders: linear mapping = 24, others = 12
-Found initrd at 0xc000000002700000:0xc000000002a93000
+Found initrd at 0xc000000001300000:0xc0000000016e6c1e
Partition configured for 8 cpus.
Starting Linux PPC64 #1 SMP Wed Dec 5 09:02:21 UTC 2007
-----------------------------------------------------
-ppc64_pft_size = 0x1b
+ppc64_pft_size = 0x1c
ppc64_interrupt_controller = 0x2
platform = 0x101
-physicalMemorySize = 0x158000000
+physicalMemorySize = 0xda000000
ppc64_caches.dcache_line_size = 0x80
ppc64_caches.icache_line_size = 0x80
htab_address = 0x0000000000000000
-htab_hash_mask = 0xfffff
+htab_hash_mask = 0x1fffff
-----------------------------------------------------
[boot]0100 MM Init
[boot]0100 MM Init Done
Linux version 2.6.16.57-0.5-ppc64 (geeko [at] buildhos) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #1 SMP Wed Dec 5 09:02:21 UTC 2007
[boot]0012 Setup Arch
-Node 0 Memory: 0x0-0xb0000000
-Node 1 Memory: 0xb0000000-0x158000000
+Node 0 Memory:
+Node 1 Memory: 0x0-0xda000000
EEH: PCI Enhanced I/O Error Handling Enabled
-PPC64 nvram contains 7168 bytes
+PPC64 nvram contains 8192 bytes
Using dedicated idle loop
-On node 0 totalpages: 720896
- DMA zone: 720896 pages, LIFO batch:31
+On node 0 totalpages: 0
+ DMA zone: 0 pages, LIFO batch:0
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 0 pages, LIFO batch:0
HighMem zone: 0 pages, LIFO batch:0
-On node 1 totalpages: 688128
- DMA zone: 688128 pages, LIFO batch:31
+On node 1 totalpages: 892928
+ DMA zone: 892928 pages, LIFO batch:31
DMA32 zone: 0 pages, LIFO batch:0
Normal zone: 0 pages, LIFO batch:0
HighMem zone: 0 pages, LIFO batch:0
[boot]0015 Setup Done
Built 2 zonelists
-Kernel command line: root=/dev/disk/by-id/scsi-SIBM_ST373453LC_3HW1CPW500007445Q010-part5 xmon=on sysrq=1 quiet
+Kernel command line: root=/dev/system/root xmon=on sysrq=1 quiet
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 131072 bytes)
-time_init: decrementer frequency = 207.052000 MHz
-time_init: processor frequency = 1654.344000 MHz
+time_init: decrementer frequency = 275.070000 MHz
+time_init: processor frequency = 2197.800000 MHz
Console: colour dummy device 80x25
-Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
-Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
-freeing bootmem node 0
+Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
+Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
-Memory: 5524952k/5636096k available (4464k kernel code, 111144k reserved, 1992k data, 836k bss, 264k init)
-Calibrating delay loop... 413.69 BogoMIPS (lpj=2068480)
+Memory: 3494648k/3571712k available (4464k kernel code, 77064k reserved, 1992k data, 836k bss, 264k init)
+Calibrating delay loop... 548.86 BogoMIPS (lpj=2744320)
Security Framework v1.0.0 initialized
Mount-cache hash table entries: 256
checking if image is initramfs... it is
-Freeing initrd memory: 3660k freed
+Freeing initrd memory: 3995k freed
Processor 1 found.
Processor 2 found.
Processor 3 found.
@@ -61,7 +60,7 @@ Processor 7 found.
Brought up 8 CPUs
Node 0 CPUs: 0-3
Node 1 CPUs: 4-7
-migration_cost=41,0,4308
+migration_cost=38,0,3225
NET: Registered protocol family 16
PCI: Probing PCI hardware
IOMMU table initialized, virtual merging enabled
Attachments: huckleberry.suse.de-2.6.16.57-0.5-ppc64.txt (16.3 KB)
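
The page counts in the dmesg diff follow directly from the memory ranges, given the 4 KiB base page size implied by "others = 12". A small sketch of the arithmetic (the helper names pages_in_range and pages_to_kib are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* "others = 12" in the log: base pages are 2^12 = 4096 bytes. */
#define PAGE_SHIFT_4K 12

/* Number of whole base pages in the byte range [start, end). */
static uint64_t pages_in_range(uint64_t start, uint64_t end)
{
    assert(end >= start);
    return (end - start) >> PAGE_SHIFT_4K;
}

/* Size in KiB of a given number of 4 KiB pages. */
static uint64_t pages_to_kib(uint64_t pages)
{
    return pages << (PAGE_SHIFT_4K - 10);
}
```

For cranberry, pages_in_range(0, 0xda000000) gives 892928, all of it on node 1, and pages_to_kib(892928) gives the 3571712k total in the "Memory:" line. For huckleberry, the two node ranges give 720896 and 688128 pages. Node 0 on cranberry has CPUs 0-3 but zero pages.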


olaf at aepfle

Jan 17, 2008, 12:20 PM

Post #9 of 43 (2764 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, Jan 17, Olaf Hering wrote:

> Since -mm boots further, what patch should I try?

rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.


olaf at aepfle

Jan 17, 2008, 1:15 PM

Post #10 of 43 (2783 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, Jan 17, Christoph Lameter wrote:

> On Thu, 17 Jan 2008, Olaf Hering wrote:
>
> > The patch does not help.
>
> Duh. We need to know more about the problem.

cache_grow is called from 3 places. The third call has cleared l3 for
some reason.


....
Allocated 00a00000 bytes for kernel @ 00200000
Elf64 kernel loaded...
OF stdout device is: /vdevice/vty [at] 3000000
Hypertas detected, assuming LPAR !
command line: xmon=on sysrq=1 debug panic=1
memory layout at init:
alloc_bottom : 0000000000ac1000
alloc_top : 0000000010000000
alloc_top_hi : 00000000da000000
rmo_top : 0000000010000000
ram_top : 00000000da000000
Looking for displays
found display : /pci [at] 80000002000000/pci@2/pci@1/display@0, opening ... done
instantiating rtas at 0x000000000f6a1000 ... done
0000000000000000 : boot cpu 0000000000000000
0000000000000002 : starting cpu hw idx 0000000000000002... done
0000000000000004 : starting cpu hw idx 0000000000000004... done
0000000000000006 : starting cpu hw idx 0000000000000006... done
copying OF device tree ...
Building dt strings...
Building dt structure...
Device tree strings 0x0000000000cc2000 -> 0x0000000000cc34e4
Device tree struct 0x0000000000cc4000 -> 0x0000000000cd6000
Calling quiesce ...
returning from prom_init
Partition configured for 8 cpus.
Starting Linux PPC64 #34 SMP Thu Jan 17 22:06:41 CET 2008
-----------------------------------------------------
ppc64_pft_size = 0x1c
physicalMemorySize = 0xda000000
htab_hash_mask = 0x1fffff
-----------------------------------------------------
Linux version 2.6.24-rc8-ppc64 (olaf [at] lingonberr) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #34 SMP Thu Jan 17 22:06:41 CET 2008
[boot]0012 Setup Arch
EEH: PCI Enhanced I/O Error Handling Enabled
PPC64 nvram contains 8192 bytes
Zone PFN ranges:
DMA 0 -> 892928
Normal 892928 -> 892928
Movable zone start PFN for each node
early_node_map[1] active PFN ranges
1: 0 -> 892928
Could not find start_pfn for node 0
[boot]0015 Setup Done
Built 2 zonelists in Node order, mobility grouping on. Total pages: 880720
Policy zone: DMA
Kernel command line: xmon=on sysrq=1 debug panic=1
[boot]0020 XICS Init
xics: no ISA interrupt controller
[boot]0021 XICS Done
PID hash table entries: 4096 (order: 12, 32768 bytes)
time_init: decrementer frequency = 275.070000 MHz
time_init: processor frequency = 2197.800000 MHz
clocksource: timebase mult[e8ab05] shift[22] registered
clockevent: decrementer mult[466a] shift[16] cpu[0]
Console: colour dummy device 80x25
console handover: boot [udbg-1] -> real [hvc0]
Dentry cache hash table entries: 524288 (order: 10, 4194304 bytes)
Inode-cache hash table entries: 262144 (order: 9, 2097152 bytes)
freeing bootmem node 1
Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b880 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb00 c0000000007544c0 0000000000000063
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0
GPR24: 0000000000000000 c0000000006a4fb8 c0000000006a4fb8 c0000000005fdc80
GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb00] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
[c00000000075bc90] [c0000000000f842c] .kmem_cache_alloc+0xd0/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b890 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000492d0
GPR24: 0000000000000000 00000000000080d0 c0000000006a4fb8 c0000000006a4fb8
GPR28: 0000000000000000 00000000000412d0 c0000000006e5b80 0000000000000004
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
[c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 0000000000000000
cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 0000000000000000
------------[ cut here ]------------
Badness at /home/olaf/kernel/git/linux-2.6.24-rc8/mm/slab.c:2779
NIP: c0000000000f78f4 LR: c0000000000f78e0 CTR: 80000000001af404
REGS: c00000000075b890 TRAP: 0700 Not tainted (2.6.24-rc8-ppc64)
MSR: 8000000000029032 <EE,ME,IR,DR> CR: 24000022 XER: 00000001
TASK = c000000000665a50[0] 'swapper' THREAD: c000000000758000 CPU: 0
GPR00: 0000000000000004 c00000000075bb10 c0000000007544c0 0000000000000063
GPR04: 0000000000000001 0000000000000001 0000000000000000 0000000000000000
GPR08: ffffffffffffffff c0000000006a19a0 c0000000007a84b0 c0000000007a84a8
GPR12: 0000000000004000 c000000000666380 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 4000000000200000
GPR20: 0000000000000000 00000000007fbd70 c00000000054f6c8 00000000000080d0
GPR24: 0000000000000001 c0000000d9fe4b00 c0000000006a4fb8 0000000000000000
GPR28: c0000000d8000000 00000000000000d0 c0000000006e5b80 0000000000000004
NIP [c0000000000f78f4] .cache_grow+0xc8/0x39c
LR [c0000000000f78e0] .cache_grow+0xb4/0x39c
Call Trace:
[c00000000075bb10] [c0000000000f78e0] .cache_grow+0xb4/0x39c (unreliable)
[c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4
[c00000000075bc90] [c0000000000f846c] .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] [c0000000000fb4e8] .kmem_cache_create+0x208/0x478
[c00000000075be20] [c0000000005e670c] .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] [c0000000005bf8ec] .start_kernel+0x2f8/0x3fc
[c00000000075bf90] [c000000000008590] .start_here_common+0x60/0xd0
Instruction dump:
e89e80e0 e92a0000 e80b0468 7f4ad378 fbe10070 f8010078 4bf85f01 60000000
381f0001 7c1f07b4 2f9f0004 409effac <0fe00000> 7b091f24 7d29d214 eb690468
Unable to handle kernel paging request for data at address 0x00000040
Faulting instruction address: 0xc0000000004377b8
cpu 0x0: Vector: 300 (Data Access) at [c00000000075b810]
pc: c0000000004377b8: ._spin_lock+0x20/0x88
lr: c0000000000f790c: .cache_grow+0xe0/0x39c
sp: c00000000075ba90
msr: 8000000000009032
dar: 40
dsisr: 40000000
current = 0xc000000000665a50
paca = 0xc000000000666380
pid = 0, comm = swapper
enter ? for help
[c00000000075bb10] c0000000000f790c .cache_grow+0xe0/0x39c
[c00000000075bbe0] c0000000000f7d68 .fallback_alloc+0x1a0/0x1f4
[c00000000075bc90] c0000000000f846c .kmem_cache_alloc+0x110/0x294
[c00000000075bd40] c0000000000fb4e8 .kmem_cache_create+0x208/0x478
[c00000000075be20] c0000000005e670c .kmem_cache_init+0x218/0x4f4
[c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
[c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
0:mon>



--
Used patch:

Index: linux-2.6.24-rc8/include/linux/olh.h
===================================================================
--- /dev/null
+++ linux-2.6.24-rc8/include/linux/olh.h
@@ -0,0 +1,6 @@
+#ifndef __LINUX_OLH_H
+#define __LINUX_OLH_H
+#define olh(fmt,args ...) \
+ printk(KERN_DEBUG "%s(%u) %s(%u):c%u,j%lu " fmt "\n",__FUNCTION__,__LINE__,current->comm,current->pid,smp_processor_id(),jiffies,##args)
+#endif
+
Index: linux-2.6.24-rc8/mm/slab.c
===================================================================
--- linux-2.6.24-rc8.orig/mm/slab.c
+++ linux-2.6.24-rc8/mm/slab.c
@@ -110,6 +110,7 @@
 #include <linux/fault-inject.h>
 #include <linux/rtmutex.h>
 #include <linux/reciprocal_div.h>
+#include <linux/olh.h>
 
 #include <asm/cacheflush.h>
 #include <asm/tlbflush.h>
@@ -2764,6 +2765,7 @@ static int cache_grow(struct kmem_cache
 	size_t offset;
 	gfp_t local_flags;
 	struct kmem_list3 *l3;
+	int i;
 
 	/*
 	 * Be lazy and only check for valid flags here, keeping it out of the
@@ -2772,6 +2774,9 @@ static int cache_grow(struct kmem_cache
 	BUG_ON(flags & GFP_SLAB_BUG_MASK);
 	local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK);
 
+	for (i=0;i<4;i++)
+		olh("cachep %p nodeid %d l3 %p",cachep,i,cachep->nodelists[nodeid]);
+	WARN_ON(1);
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo [at] vger
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


olaf at aepfle

Jan 17, 2008, 10:56 PM

Post #11 of 43 (2769 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, Jan 17, Olaf Hering wrote:

> On Thu, Jan 17, Christoph Lameter wrote:
>
> > On Thu, 17 Jan 2008, Olaf Hering wrote:
> >
> > > The patch does not help.
> >
> > Duh. We need to know more about the problem.
>
> cache_grow is called from 3 places. The third call has cleared l3 for
> some reason.

Typo in debug patch.

calls cache_grow with nodeid 0
> [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
calls cache_grow with nodeid 0
> [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8

calls cache_grow with nodeid 1
> [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4



clameter at sgi

Jan 18, 2008, 10:42 AM

Post #12 of 43 (2764 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
>
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Hmmm... fallback_alloc should not be called during bootstrap.



clameter at sgi

Jan 18, 2008, 10:47 AM

Post #13 of 43 (2768 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, 17 Jan 2008, Olaf Hering wrote:

> early_node_map[1] active PFN ranges
> 1: 0 -> 892928
> Could not find start_pfn for node 0

Corrupted min_pfn?



clameter at sgi

Jan 18, 2008, 10:51 AM

Post #14 of 43 (2800 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, 17 Jan 2008, Olaf Hering wrote:

> Normal 892928 -> 892928
> Movable zone start PFN for each node
> early_node_map[1] active PFN ranges
> 1: 0 -> 892928
> Could not find start_pfn for node 0

We only have a single node that is node 1? And then we initialize nodes 0
to 3?

> Memory: 3496633k/3571712k available (6188k kernel code, 75080k reserved, 1324k data, 1220k bss, 304k init)
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 0 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 1 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 2 l3 c0000000005fddf0
> cache_grow(2778) swapper(0):c0,j4294937299 cachep c0000000006a4fb8 nodeid 3 l3 c0000000005fddf0

???


mel at csn

Jan 18, 2008, 1:30 PM

Post #15 of 43 (2768 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On (18/01/08 10:47), Christoph Lameter didst pronounce:
> On Thu, 17 Jan 2008, Olaf Hering wrote:
>
> > early_node_map[1] active PFN ranges
> > 1: 0 -> 892928
> > Could not find start_pfn for node 0
>
> Corrupted min_pfn?
>

Doubtful. Node 0 has no memory but it is still being initialised.

Still, I looked closer at what is going on when that message gets
displayed and I see this in free_area_init_nodes()

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
		check_for_regular_memory(pgdat);
	}

This "Any memory on that node" thing is new and it says if there is any
memory on the node, set N_HIGH_MEMORY. Fine I guess, I haven't tracked these
changes closely. It calls check_for_regular_memory() which looks like

static void check_for_regular_memory(pg_data_t *pgdat)
{
#ifdef CONFIG_HIGHMEM
	enum zone_type zone_type;

	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
		struct zone *zone = &pgdat->node_zones[zone_type];
		if (zone->present_pages)
			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
	}
#endif
}

i.e. go through the other zones and if any of them have memory, set
N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
POWER.... That sounds bad.

mel [at] arnol:~/git/linux-2.6/mm$ grep -n N_NORMAL_MEMORY slab.c
1593: for_each_node_state(nid, N_NORMAL_MEMORY) {
1971: for_each_node_state(node, N_NORMAL_MEMORY) {
2102: for_each_node_state(node, N_NORMAL_MEMORY) {
3818: for_each_node_state(node, N_NORMAL_MEMORY) {

and one of them is in kmem_cache_init(). That seems very significant.
Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
being set would cause trouble for slab?

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


clameter at sgi

Jan 18, 2008, 1:43 PM

Post #16 of 43 (2767 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Fri, 18 Jan 2008, Mel Gorman wrote:

> static void check_for_regular_memory(pg_data_t *pgdat)
> {
> #ifdef CONFIG_HIGHMEM
> 	enum zone_type zone_type;
>
> 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
> 		struct zone *zone = &pgdat->node_zones[zone_type];
> 		if (zone->present_pages)
> 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
> 	}
> #endif
> }
>
> i.e. go through the other zones and if any of them have memory, set
> N_NORMAL_MEMORY. But... it only does this on CONFIG_HIGHMEM which on
> PPC64 is not going to be set so N_NORMAL_MEMORY never gets set on
> POWER.... That sounds bad.

Argh. We may need to do a

node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY) in the !HIGHMEM case.

> and one of them is in kmem_cache_init(). That seems very significant.
> Christoph, can you think of possibilities of where N_NORMAL_MEMORY not
> being set would cause trouble for slab?

Yes. That results in the per node structures not being created and thus l3
== NULL. Explains our failures.
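
As a rough illustration of that failure mode (a userspace sketch, not the
slab code itself): if per-node lists are only allocated for nodes present in
a state mask, every node missing from the mask is left with a NULL entry, and
a later lookup for that node has nothing to lock. The names list3, cache,
setup_nodelists and grow_would_crash are made up for this sketch, and the
mask merely stands in for N_NORMAL_MEMORY.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3; only what the sketch needs. */
struct list3 {
	int list_lock;
};

/* Hypothetical stand-in for struct kmem_cache. */
struct cache {
	struct list3 *nodelists[MAX_NODES];
};

/*
 * Model of the bootstrap loop: allocate a list3 only for nodes whose bit
 * is set in normal_mask (playing the role of N_NORMAL_MEMORY).
 * Returns the number of per-node structures created.
 */
int setup_nodelists(struct cache *c, unsigned normal_mask)
{
	int created = 0;

	for (int nid = 0; nid < MAX_NODES; nid++) {
		c->nodelists[nid] = NULL;
		if (normal_mask & (1u << nid)) {
			c->nodelists[nid] = calloc(1, sizeof(struct list3));
			if (c->nodelists[nid])
				created++;
		}
	}
	return created;
}

/* Returns 1 if taking the list lock for nodeid would dereference NULL. */
int grow_would_crash(const struct cache *c, int nodeid)
{
	assert(nodeid >= 0 && nodeid < MAX_NODES);
	return c->nodelists[nodeid] == NULL;
}
```

With an empty mask no structures are created at all, so grow_would_crash()
reports a problem for every node; with only node 1 in the mask, node 1 is
fine and node 0 is not.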




clameter at sgi

Jan 18, 2008, 2:16 PM

Post #17 of 43 (2763 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

Could you try this patch?

Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

It seems that we scan through zones to set N_NORMAL_MEMORY only if
CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
in the !CONFIG_HIGHMEM case.

Signed-off-by: Christoph Lameter <clameter [at] sgi>

Index: linux-2.6/mm/page_alloc.c
===================================================================
--- linux-2.6.orig/mm/page_alloc.c 2008-01-18 14:08:41.000000000 -0800
+++ linux-2.6/mm/page_alloc.c 2008-01-18 14:13:34.000000000 -0800
@@ -3812,7 +3812,6 @@ restart:
 /* Any regular memory on that node ? */
 static void check_for_regular_memory(pg_data_t *pgdat)
 {
-#ifdef CONFIG_HIGHMEM
 	enum zone_type zone_type;
 
 	for (zone_type = 0; zone_type <= ZONE_NORMAL; zone_type++) {
@@ -3820,7 +3819,6 @@ static void check_for_regular_memory(pg_
 		if (zone->present_pages)
 			node_set_state(zone_to_nid(zone), N_NORMAL_MEMORY);
 	}
-#endif
 }
 
 /**



nish.aravamudan at gmail

Jan 18, 2008, 2:19 PM

Post #18 of 43 (2763 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On 1/18/08, Christoph Lameter <clameter [at] sgi> wrote:
> Could you try this patch?
>
> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support
> HIGHMEM
>
> It seems that we scan through zones to set N_NORMAL_MEMORY only if
> CONFIG_HIGHMEM and CONFIG_NUMA are set. We need to set N_NORMAL_MEMORY
> in the !CONFIG_HIGHMEM case.

I'm testing this exact patch right now on the machine Mel saw the issues with.

Thanks,
Nish


clameter at sgi

Jan 18, 2008, 2:38 PM

Post #19 of 43 (2756 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Fri, 18 Jan 2008, Christoph Lameter wrote:

> Memoryless nodes: Set N_NORMAL_MEMORY for a node if we do not support HIGHMEM

If !CONFIG_HIGHMEM then

enum node_states {
#ifdef CONFIG_HIGHMEM
	N_HIGH_MEMORY,		/* The node has regular or high memory */
#else
	N_HIGH_MEMORY = N_NORMAL_MEMORY,
#endif

So

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);
		free_area_init_node(nid, pgdat, NULL,
				find_min_pfn_for_node(nid), NULL);

		/* Any memory on that node */
		if (pgdat->node_present_pages)
			node_set_state(nid, N_HIGH_MEMORY);
			                    ^^^ sets N_NORMAL_MEMORY
		check_for_regular_memory(pgdat);
	}
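
The aliasing can be seen in a small userspace model. This is only a sketch:
the enum below is a simplified stand-in for the !CONFIG_HIGHMEM case, and
model_node_set_state(), model_node_state() and the node_states array are
hypothetical helpers, not the kernel's nodemask API. Because both names share
one index, setting N_HIGH_MEMORY for a node also makes it show up under
N_NORMAL_MEMORY.

```c
#include <assert.h>

#define MAX_NODES 4

/* Simplified model of the enum for the !CONFIG_HIGHMEM case. */
enum node_states {
	N_POSSIBLE,
	N_ONLINE,
	N_NORMAL_MEMORY,
	N_HIGH_MEMORY = N_NORMAL_MEMORY,	/* alias: same index */
	N_CPU,
	NR_NODE_STATES
};

/* One bitmask of nodes per state (hypothetical, for illustration only). */
static unsigned node_states[NR_NODE_STATES];

/* Mark node nid as being in the given state. */
void model_node_set_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < MAX_NODES);
	node_states[state] |= 1u << nid;
}

/* Returns 1 if node nid is in the given state, 0 otherwise. */
int model_node_state(int nid, enum node_states state)
{
	assert(nid >= 0 && nid < MAX_NODES);
	return (node_states[state] >> nid) & 1u;
}
```

Calling model_node_set_state(1, N_HIGH_MEMORY) and then querying
model_node_state(1, N_NORMAL_MEMORY) returns 1, which is the point made
above: without HIGHMEM the extra scan in check_for_regular_memory() is not
what decides whether N_NORMAL_MEMORY gets set.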



olaf at aepfle

Jan 18, 2008, 2:57 PM

Post #20 of 43 (2763 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Fri, Jan 18, Christoph Lameter wrote:

> Could you try this patch?

Does not help, same crash.


clameter at sgi

Jan 18, 2008, 8:55 PM

Post #21 of 43 (2757 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Fri, 18 Jan 2008, Olaf Hering wrote:

> calls cache_grow with nodeid 0
> > [c00000000075bbd0] [c0000000000f82d0] .cache_alloc_refill+0x234/0x2c0
> calls cache_grow with nodeid 0
> > [c00000000075bbe0] [c0000000000f7f38] .____cache_alloc_node+0x17c/0x1e8
>
> calls cache_grow with nodeid 1
> > [c00000000075bbe0] [c0000000000f7d68] .fallback_alloc+0x1a0/0x1f4

Okay that makes sense. You have no node 0 with normal memory but the node
assigned to the executing processor is zero (correct?). Thus it needs to
fall back to node 1 and that is not possible during bootstrap. You need to
run kmem_cache_init() on a cpu on a node with memory.

Or we need to revert the patch, which would allocate control
structures again for all online nodes regardless of whether they have memory
or not.

Does reverting 04231b3002ac53f8a64a7bd142fde3fa4b6808c6 change the
situation? (However, we tried this on the other thread without success).



clameter at sgi

Jan 18, 2008, 8:56 PM

Post #22 of 43 (2755 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Thu, 17 Jan 2008, Olaf Hering wrote:

> On Thu, Jan 17, Olaf Hering wrote:
>
> > Since -mm boots further, what patch should I try?
>
> rc8-mm1 crashes as well, l3 passed to reap_alien() is NULL.

Sigh. It looks like we need alien cache structures in some cases for nodes
that have no memory. We must allocate structures for all nodes regardless
of whether they have allocatable memory or not.




mel at csn

Jan 22, 2008, 11:54 AM

Post #23 of 43 (2742 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On (18/01/08 23:57), Olaf Hering didst pronounce:
> On Fri, Jan 18, Christoph Lameter wrote:
>
> > Could you try this patch?
>
> Does not help, same crash.
>

Hi Olaf,

It was suggested this problem was the same as another slab-related boot problem
that was fixed for 2.6.24 by reverting a change. This fix can be found at
http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
Can you please check on your machine if it fixes your problem?

I am 99.9999% sure it will *not* fix your problem because there were two bugs,
not one as previously believed. On two test machines here, this kmem_cache_init
problem still happens even with the revert, which fixed a third machine. I
was delayed in testing because these boxen were unavailable from Friday until
yesterday evening (a stellar display of timing). It was missed on TKO because
it was SLAB-specific and those machines were testing SLUB. I found that the
patch below was necessary to fix the problem.

Olaf, please confirm whether you need the patch below as well as the
revert to make your machine boot.

Christoph/Pekka, this patch is papering over the problem and something
more fundamental may be going wrong. The crash occurs because l3 is NULL
and the cache is kmem_cache so this is early in the boot process. It is
selecting l3 based on node 2, which is correct in terms of available memory,
but it initialises the lists on node 0 because that is the node the CPUs are
located on. Hence later it uses an uninitialised nodelists entry and BLAM.
The relevant parts of the log for seeing the memoryless nodes in relation to
CPUs are:

early_node_map[1] active PFN ranges
2: 0 -> 1048576
Processor 1 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[2]
Processor 2 found.
clockevent: decrementer mult[3cf1] shift[16] cpu[3]
Processor 3 found.
Brought up 4 CPUs
Node 0 CPUs: 0-3
Node 2 CPUs:

Can you see a better solution than this?

====
Recent changes to how slab operates mean a situation can occur on systems
with memoryless nodes whereby the nodeid used when growing the slab does
not map to the correct kmem_list3. The following patch checks the kmem_list3
for the preferred nodeid and, if it is NULL, uses numa_node_id() instead.

Signed-off-by: Mel Gorman <mel [at] csn>

---
mm/slab.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff -rup -X /usr/src/patchset-0.6/bin//dontdiff linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c
--- linux-2.6.24-rc8-005-revert-memoryless-slab/mm/slab.c 2008-01-22 17:46:32.000000000 +0000
+++ linux-2.6.24-rc8-010_handle_missing_l3/mm/slab.c 2008-01-22 18:42:53.000000000 +0000
@@ -2775,6 +2775,11 @@ static int cache_grow(struct kmem_cache
 	/* Take the l3 list lock to change the colour_next on this node */
 	check_irq_off();
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
+	BUG_ON(!l3);
 	spin_lock(&l3->list_lock);
 
 	/* Get colour for the slab, and cal the next value. */
@@ -3317,6 +3322,10 @@ static void *____cache_alloc_node(struct
 	int x;
 
 	l3 = cachep->nodelists[nodeid];
+	if (!l3) {
+		nodeid = numa_node_id();
+		l3 = cachep->nodelists[nodeid];
+	}
 	BUG_ON(!l3);
 
 retry:
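
For readers who want to poke at the fallback outside the kernel, the lookup
in the patch can be modelled in userspace roughly as follows. This is a
sketch under stated assumptions: list3 and pick_l3() are hypothetical names,
the local_node argument plays the role of numa_node_id(), and a NULL return
marks the spot where the patched kernel code would hit BUG_ON(!l3).

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 4

/* Hypothetical stand-in for struct kmem_list3. */
struct list3 {
	int colour_next;
};

/*
 * Model of the patched lookup: try the requested node first and, if it has
 * no list3, switch *nodeid to the node of the running CPU and retry.
 * Returns NULL if neither node has a list3.
 */
struct list3 *pick_l3(struct list3 *nodelists[MAX_NODES], int *nodeid,
		      int local_node)
{
	struct list3 *l3;

	assert(*nodeid >= 0 && *nodeid < MAX_NODES);
	assert(local_node >= 0 && local_node < MAX_NODES);

	l3 = nodelists[*nodeid];
	if (!l3) {
		*nodeid = local_node;
		l3 = nodelists[*nodeid];
	}
	return l3;
}
```

With the topology from the log above (lists set up on node 0 where the CPUs
are, memory on node 2), a request for node 2 is redirected to node 0 and
nodeid is rewritten to 0. If node 0 had no list either, the function returns
NULL, so the fallback narrows the crash window but does not remove it.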


--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab


clameter at sgi

Jan 22, 2008, 12:11 PM

Post #24 of 43 (2754 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On Tue, 22 Jan 2008, Mel Gorman wrote:

> Christoph/Pekka, this patch is papering over the problem and something
> more fundamental may be going wrong. The crash occurs because l3 is NULL
> and the cache is kmem_cache so this is early in the boot process. It is
> selecting l3 based on node 2, which is correct in terms of available memory,
> but it initialises the lists on node 0 because that is the node the CPUs are
> located on. Hence later it uses an uninitialised nodelists entry and BLAM.
> The relevant parts of the log for seeing the memoryless nodes in relation to
> CPUs are:

Would it be possible to run the bootstrap on a cpu that has a
node with memory associated to it? I believe we had the same situation
last year when GFP_THISNODE was introduced?

After you reverted the slab memoryless node patch there should be per node
structures created for node 0 unless the node is marked offline. Is it? If
so then you are booting a cpu that is associated with an offline node.

> Can you see a better solution than this?

Well this means that bootstrap will work by introducing foreign objects
into the per cpu queue (should only hold per cpu objects). They will
later be consumed and then the queues will contain the right objects so
the effect of the patch is minimal.

I thought we fixed the similar situation last year by dropping
GFP_THISNODE for some allocations?


mel at csn

Jan 22, 2008, 1:26 PM

Post #25 of 43 (2742 views)
Permalink
Re: crash in kmem_cache_init [In reply to]

On (22/01/08 12:11), Christoph Lameter didst pronounce:
> On Tue, 22 Jan 2008, Mel Gorman wrote:
>
> > Christoph/Pekka, this patch is papering over the problem and something
> > more fundamental may be going wrong. The crash occurs because l3 is NULL
> > and the cache is kmem_cache so this is early in the boot process. It is
> > selecting l3 based on node 2, which is correct in terms of available memory,
> > but it initialises the lists on node 0 because that is the node the CPUs are
> > located on. Hence later it uses an uninitialised nodelists entry and BLAM.
> > The relevant parts of the log for seeing the memoryless nodes in relation to
> > CPUs are:
>
> Would it be possible to run the bootstrap on a cpu that has a
> node with memory associated to it?

Not in the way the machine is currently configured. All the CPUs appear to
be on a node with no memory. It's best to assume I cannot get the machine
reconfigured (which just hides the bug anyway). Physically, it's thousands
of miles away so I can't do the work. I can get lab support to do the job
but that will take a fair while and at the end of the day, it doesn't tell
us a lot. We know that other PPC64 machines work so it's not a general problem.

> I believe we had the same situation
> last year when GFP_THISNODE was introduced?
>

It feels vaguely familiar but I don't recall it in sufficient detail
to recognise if this is the same problem or not.

> After you reverted the slab memoryless node patch there should be per node
> structures created for node 0 unless the node is marked offline. Is it? If
> so then you are booting a cpu that is associated with an offline node.
>

I'll roll a patch that prints out the online states before startup and
see what it looks like.

> > Can you see a better solution than this?
>
> Well this means that bootstrap will work by introducing foreign objects
> into the per cpu queue (should only hold per cpu objects). They will
> later be consumed and then the queues will contain the right objects so
> the effect of the patch is minimal.
>

By minimal, do you mean that you expect it to break in some other
respect later, or minimal as in "this is bad but should not have any
adverse impact"?

> I thought we fixed the similar situation last year by dropping
> GFP_THISNODE for some allocations?
>

Whether this was a problem fixed in the past or not, it's broken again now
:( . It's possible that there is a __GFP_THISNODE that can be dropped early
at boot-time that would also fix this problem in a way that doesn't
affect runtime (like altering cache_grow in my patch does).

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
