
Mailing List Archive: DRBD: Users

Error in drbd


1gellert at example

Jul 11, 2000, 4:29 AM

Post #1 of 5 (422 views)
Error in drbd

Hello,

I use drbd together with "heartbeat" to create a
highly available server. When I switch off the
power of the primary server, the secondary
takes over the service (and the drbd-mounted
disk). This works fine.

When I started the primary server again, I got (once)
the following message:

kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000005c
kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
kernel: *pde = 00000000
kernel: Oops: 0000
kernel: CPU: 0
kernel: EIP: 0010:[<d005ed5d>]
kernel: EFLAGS: 00010246
kernel: eax: 00000000 ebx: c460ff94 ecx: 0000f200 edx: 00000024
kernel: esi: 00000058 edi: c460ff94 ebp: 00000000 esp: c460fee4
kernel: ds: 0018 es: 0018 ss: 0018
kernel: Process drbdd_0 (pid: 530, process nr: 19, stackpage=c460f000)
kernel: Stack: c460ff94 00000000 00000297 00000000 d006107d 00000058 0000f200 00000001
kernel: c460e000 00000000 c46bfa8c c460e000 00002b00 00000000 01000000 c46c2834
kernel: 00000000 0000000c 00000000 c45f4500 c46bf800 00000000 00004000 c46c2838
kernel: Call Trace: [<d006107d>] [<d0061b59>] [<d005e29e>] [kernel_thread+35/47]
kernel: Code: 8a 46 04 8d 14 40 c1 e2 03 29 c2 c1 e2 03 29 c2 a1 30 4c 06
kernel: drbd0: ack timeout detected!
kernel: drbd0: could not send signal

Any suggestions?
I use a 2.2.14-5 kernel (Red Hat 6.2) on a Pentium III (500 MHz) with an
Intel 440BX chipset. The mounted disk is a SCSI disk; the adapter
is an Adaptec 2940U2. drbd is checked out from CVS (version
0.5.7).

When I install drbd, depmod prints this message:
depmod: *** Unresolved symbols in /lib/modules/2.2.14-5.0/block/drbd.o
Could this be related to the error above?

And the last one: During the fsck I get messages like
set_blocksize: b_count 1, dev drbd(43,0), block 983040!
Everything works fine though. I'm just curious...

Thanx for any help... Olaf


e9525415 at example

Jul 11, 2000, 5:14 AM

Post #2 of 5 (395 views)
Re: Error in drbd [In reply to]

On Tue, 11 Jul 2000, Olaf Gellert wrote:

> Hello,
>
> I use drbd together with "heartbeat" to create a
> highly available server. When I switch off the
> power of the primary server, the secondary
> takes over the service (and the drbd-mounted
> disk). This works fine.
>
> If I start the primary server again, I got (once)
> the following message:
>
> kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000005c
> kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
> kernel: *pde = 00000000
> kernel: Oops: 0000
> kernel: CPU: 0
> kernel: EIP: 0010:[<d005ed5d>]
> kernel: EFLAGS: 00010246
> kernel: eax: 00000000 ebx: c460ff94 ecx: 0000f200 edx: 00000024
> kernel: esi: 00000058 edi: c460ff94 ebp: 00000000 esp: c460fee4
> kernel: ds: 0018 es: 0018 ss: 0018
> kernel: Process drbdd_0 (pid: 530, process nr: 19, stackpage=c460f000)
> kernel: Stack: c460ff94 00000000 00000297 00000000 d006107d 00000058 0000f200 00000001
> kernel: c460e000 00000000 c46bfa8c c460e000 00002b00 00000000 01000000 c46c2834
> kernel: 00000000 0000000c 00000000 c45f4500 c46bf800 00000000 00004000 c46c2838
> kernel: Call Trace: [<d006107d>] [<d0061b59>] [<d005e29e>] [kernel_thread+35/47]
> kernel: Code: 8a 46 04 8d 14 40 c1 e2 03 29 c2 c1 e2 03 29 c2 a1 30 4c 06
> kernel: drbd0: ack timeout detected!
> kernel: drbd0: could not send signal
>

Which server oopsed? The running one, or the server that
was about to rejoin the cluster?

Could you run ksymoops on the oops output?

> Any suggestions?
> I use a 2.2.14-5 kernel (redhat 6.2) on a Pentium III (500MHz),
> Intel BX440 Chipset. The mounted disk is a scsi disk, the adapter
> is an adaptec 2940U2. drbd is checked out from cvs (version
> 0.5.7).
>
> If I install drbd, depmod returns the message:
> depmod: *** Unresolved symbols in /lib/modules/2.2.14-5.0/block/drbd.o
> maybe this could be related to the error above?

Well, everybody has this message, nobody knows why :)

>
> And the last one: During the fsck I get messages like
> set_blocksize: b_count 1, dev drbd(43,0), block 983040!
> Everything works fine though. I'm just curious...

Please verify that you use revision 1.83 of drbd.c.

cvs log drbd/drbd.c

-Philipp


1gellert at example

Jul 11, 2000, 7:16 AM

Post #3 of 5 (395 views)
Re: Error in drbd [In reply to]

Ok, here is the ksymoops output:
Jul 11 01:16:47 ha1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 0000005c
Jul 11 01:16:47 ha1 kernel: current->tss.cr3 = 00101000, %cr3 = 00101000
Jul 11 01:16:47 ha1 kernel: *pde = 00000000
Jul 11 01:16:47 ha1 kernel: Oops: 0000
Jul 11 01:16:47 ha1 kernel: CPU: 0
Jul 11 01:16:47 ha1 kernel: EIP: 0010:[<d005ed5d>]
Using defaults from ksymoops -t elf32-i386 -a i386
Jul 11 01:16:47 ha1 kernel: EFLAGS: 00010246
Jul 11 01:16:47 ha1 kernel: eax: 00000000 ebx: c460ff94 ecx: 0000f200 edx: 00000024
Jul 11 01:16:47 ha1 kernel: esi: 00000058 edi: c460ff94 ebp: 00000000 esp: c460fee4
Jul 11 01:16:47 ha1 kernel: ds: 0018 es: 0018 ss: 0018
Jul 11 01:16:47 ha1 kernel: Process drbdd_0 (pid: 530, process nr: 19, stackpage=c460f000)
Jul 11 01:16:47 ha1 kernel: Stack: c460ff94 00000000 00000297 00000000 d006107d 00000058 0000f200 00000001
Jul 11 01:16:47 ha1 kernel: c460e000 00000000 c46bfa8c c460e000 00002b00 00000000 01000000 c46c2834
Jul 11 01:16:47 ha1 kernel: 00000000 0000000c 00000000 c45f4500 c46bf800 00000000 00004000 c46c2838
Jul 11 01:16:47 ha1 kernel: Call Trace: [<d006107d>] [<d0061b59>] [<d005e29e>] [kernel_thread+35/47]
Jul 11 01:16:47 ha1 kernel: Code: 8a 46 04 8d 14 40 c1 e2 03 29 c2 c1 e2 03 29 c2 a1 30 4c 06

>>EIP; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc> <=====
Trace; d006107d <[drbd]drbd_init+33d/368>
Trace; d0061b59 <[drbd]drbdd+45d/1418>
Trace; d005e29e <[lockd]__module_parm_nlm_timeout+52e/12dc>
Code; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc>
00000000 <_EIP>:
Code; d005ed5d <[lockd]__module_parm_nlm_timeout+fed/12dc> <=====
0: 8a 46 04 mov 0x4(%esi),%al <=====
Code; d005ed60 <[lockd]__module_parm_nlm_timeout+ff0/12dc>
3: 8d 14 40 lea (%eax,%eax,2),%edx
Code; d005ed63 <[lockd]__module_parm_nlm_timeout+ff3/12dc>
6: c1 e2 03 shl $0x3,%edx
Code; d005ed66 <[lockd]__module_parm_nlm_timeout+ff6/12dc>
9: 29 c2 sub %eax,%edx
Code; d005ed68 <[lockd]__module_parm_nlm_timeout+ff8/12dc>
b: c1 e2 03 shl $0x3,%edx
Code; d005ed6b <[lockd]__module_parm_nlm_timeout+ffb/12dc>
e: 29 c2 sub %eax,%edx
Code; d005ed6d <[lockd]__module_parm_nlm_timeout+ffd/12dc>
10: a1 30 4c 06 00 mov 0x64c30,%eax


(I ran ksymoops after a reboot; I hope the output is
still reliable. It seems reasonable to me...)
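
One check on the decoded output that does not depend on the symbol names: the faulting instruction is `mov 0x4(%esi),%al`, and the register dump shows esi = 00000058, so the access goes to 0x58 + 0x4 = 0x5c, which matches the reported fault address 0000005c. A minimal sketch of that arithmetic in Python (the variable and function names are illustrative only, not part of drbd or ksymoops):

```python
# Register values copied from the oops report in this thread.
registers = {
    "eax": 0x00000000,
    "ebx": 0xC460FF94,
    "ecx": 0x0000F200,
    "edx": 0x00000024,
    "esi": 0x00000058,
    "edi": 0xC460FF94,
}

# Faulting instruction according to ksymoops: mov 0x4(%esi),%al
base_register = "esi"
displacement = 0x4


def effective_address(base: int, disp: int) -> int:
    """Return base + displacement, wrapped to 32 bits as on i386."""
    return (base + disp) & 0xFFFFFFFF


fault_address = effective_address(registers[base_register], displacement)
print(f"{fault_address:08x}")  # prints 0000005c
```

This only confirms that the register dump and the disassembly agree with each other. It says nothing about which function the code belongs to, so the symbol names reported after the reboot may still be off.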

The other error messages:
> > And the last one: During the fsck I get messages like
> > set_blocksize: b_count 1, dev drbd(43,0), block 983040!
> > Everything works fine though. I'm just curious...
> Please verify that you use revision 1.83 of drbd.c.
I use revision 1.83.

Thanks in advance. Olaf


philipp at example

Jul 16, 2000, 12:53 PM

Post #4 of 5 (398 views)
Re: Error in drbd [In reply to]

Olaf,

You said that this happens when a node rejoins the cluster.
The still unanswered question is: on which node does the crash happen?
Is it the node that is running the service, or the node that
is trying to join the cluster?

Your initial message:
>I use drbd together with "heartbeat" to create a
>highly available server. When I switch off the
>power of the primary server, the secondary
>takes over the service (and the drbd-mounted
>disk). This works fine.

>If I start the primary server again, I got (once)
[...]

Did you get the oops on the primary or on the secondary server?

-Philipp



1gellert at example

Jul 18, 2000, 12:36 AM

Post #5 of 5 (397 views)
Re: Error in drbd [In reply to]

Hello Philipp,

The crash happens on the node that comes up and wants to rejoin the
cluster. Sorry, I simply forgot to answer this question...

Olaf



Olaf Gellert
gellert [at] example _ - __o
Universitaet Hamburg, FB Informatik _- _`\<,_
http://www.stud.uni-hamburg.de/users/cbx/ - (_)/ (_)
-------------------------------------------------------------------------------
As an adolescent I aspired to lasting fame, I craved factual certainty, and
I thirsted for a meaningful vision of human life -- so I became a scientist.
This is like becoming an archbishop so you can meet girls.
-- Matt Cartmill
-------------------------------------------------------------------------------
