
philipp at example
Jul 16, 2000, 12:53 PM
Post #4 of 5
Olaf,

You said that this happens when a node rejoins the cluster. The still unanswered question is: on which node does the crash happen? Is it the node that is running the service, or the node that is trying to join the cluster?

Your initial message:

>I use drbd together with "heartbeat" to create a
>highly available server. When I switch off the
>power of the primary server, the secondary
>takes over the service (and the drbd-mounted
>disk). This works fine.
>If I start the primary server again, I got (once)
[...]

Did you get the oops on the primary or on the secondary server?

-Philipp

>Ok, here is the ksymoops output:
>Jul 11 01:16:47 ha1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000005c
>Jul 11 01:16:47 ha1 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
>Jul 11 01:16:47 ha1 kernel: *pde = 00000000
>Jul 11 01:16:47 ha1 kernel: Oops: 0000
>Jul 11 01:16:47 ha1 kernel: CPU: 0
>Jul 11 01:16:47 ha1 kernel: EIP: 0010:[<d005ed5d>]
>Using defaults from ksymoops -t elf32-i386 -a i386
>Jul 11 01:16:47 ha1 kernel: EFLAGS: 00010246
>Jul 11 01:16:47 ha1 kernel: eax: 00000000 ebx: c460ff94 ecx: 0000f200 edx: 00000024
>Jul 11 01:16:47 ha1 kernel: esi: 00000058 edi: c460ff94 ebp: 00000000 esp: c460fee4
>Jul 11 01:16:47 ha1 kernel: ds: 0018 es: 0018 ss: 0018
>Jul 11 01:16:47 ha1 kernel: Process drbdd_0 (pid: 530, process nr: 19, stackpage=c460f000)
>Jul 11 01:16:47 ha1 kernel: Stack: c460ff94 00000000 00000297 00000000 d006107d 00000058 0000f200 00000001
>Jul 11 01:16:47 ha1 kernel: c460e000 00000000 c46bfa8c c460e000 00002b00 00000000 01000000 c46c2834
>Jul 11 01:16:47 ha1 kernel: 00000000 0000000c 00000000 c45f4500 c46bf800 00000000 00004000 c46c2838
>Jul 11 01:16:47 ha1 kernel: Call Trace: [<d006107d>] [<d0061b59>] [<d005e29e>] [kernel_thread+35/47]
>Jul 11 01:16:47 ha1 kernel: Code: 8a 46 04 8d 14 40 c1 e2 03 29 c2 c1 e2 03 29 c2 a1 30 4c 06
>
>>>EIP; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc> <=====
>Trace; d006107d <[drbd]drbd_init+33d/368>
>Trace; d0061b59 <[drbd]drbdd+45d/1418>
>Trace; d005e29e <[lockd]__module_parm_nlm_timeout+52e/12dc>
>Code; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc>
>00000000 <_EIP>:
>Code; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc> <=====
> 0: 8a 46 04 mov 0x4(%esi),%al <=====
>Code; d005ed60 <[lockd]__module_parm_nlm_timeout+ff0/12dc>
> 3: 8d 14 40 lea (%eax,%eax,2),%edx
>Code; d005ed63 <[lockd]__module_parm_nlm_timeout+ff3/12dc>
> 6: c1 e2 03 shl $0x3,%edx
>Code; d005ed66 <[lockd]__module_parm_nlm_timeout+ff6/12dc>
> 9: 29 c2 sub %eax,%edx
>Code; d005ed68 <[lockd]__module_parm_nlm_timeout+ff8/12dc>
> b: c1 e2 03 shl $0x3,%edx
>Code; d005ed6b <[lockd]__module_parm_nlm_timeout+ffb/12dc>
> e: 29 c2 sub %eax,%edx
>Code; d005ed6d <[lockd]__module_parm_nlm_timeout+ffd/12dc>
> 10: a1 30 4c 06 00 mov 0x64c30,%eax
>
>
>(I did this after a reboot, I hope, the output is
>reliable. It seems reasonable to me...)
>
>The other error messages:
>> > And the last one: During the fsck I get messages like
>> > set_blocksize: b_count 1, dev drbd(43,0), block 983040!
>> > Everything works fine though. I'm just curious...
>> Please verify that you use revision 1.83 of drbd.c.
>I use revision 1.83.
>
>Thanks in advance. Olaf
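
A quick cross-check of the quoted oops, offered only as a minimal sketch: the faulting instruction is `mov 0x4(%esi),%al` and the register dump shows esi = 00000058, so the access should land at 0x58 + 0x4, which is the reported fault address 0000005c. The Python below just redoes that arithmetic; the helper name `effective_address` is hypothetical, and the constants are copied from the oops text. It says nothing about which module the code really belongs to.

```python
# Values copied from the quoted oops output; nothing here is measured.
ESI = 0x00000058             # "esi: 00000058" from the register dump
DISPLACEMENT = 0x4           # from the instruction "mov 0x4(%esi),%al"
REPORTED_FAULT = 0x0000005C  # "... at virtual address 0000005c"


def effective_address(base, disp):
    """Base + displacement, wrapped to 32 bits as on i386 (hypothetical helper)."""
    return (base + disp) & 0xFFFFFFFF


fault = effective_address(ESI, DISPLACEMENT)
print(hex(fault), fault == REPORTED_FAULT)
```

So the register dump and the disassembly agree with each other: the code reads a byte at offset 4 from a pointer that holds the small value 0x58 rather than a valid kernel address.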