
Mailing List Archive: DRBD: Users

BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

 

 



PieterB at gewis

Sep 21, 2009, 3:06 AM

Post #1 of 7
BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

The dmesg log lists an error when I try to start drbd 8.3.2 on node 1
(guest) with an ocfs2 filesystem. It might be caused by timeserver
issues on that node. I installed ntp recently on the guest.

My configuration:
Node 0: Host: Ubuntu 9.04 Server with 2.6.31 kernel (QEMU 0.91 (kvm-82))
Node 1: Virtual Machine Guest: Ubuntu 9.04 Server with 2.6.31 kernel
Both nodes run DRBD 8.3.2

See attachment for dmesg log.

I couldn't find a bug tracker, so I am posting the bug report here.

Regards,

Pieter
Attachments: drbd-oops.txt (4.25 KB)


lars.ellenberg at linbit

Sep 21, 2009, 4:35 AM

Post #2 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

On Mon, Sep 21, 2009 at 12:06:41PM +0200, PieterB wrote:
> The dmesg log lists an error when I try to start drbd 8.3.2 on node 1
> (guest) with an ocfs2 filesystem. It might be caused by timeserver
> issues on that node. I installed ntp recently on the guest.

It is very unlikely that this has anything to do with "timeserver issues".

> My configuration:
> Node 0: Host: Ubuntu 9.04 Server with 2.6.31 kernel (QEMU 0.91 (kvm-82))
> Node 1: Virtual Machine Guest: Ubuntu 9.04 Server with 2.6.31 kernel
> Both nodes run DRBD 8.3.2
>
> See attachment for dmesg log.
>
> I couldn't find a bug tracker, so I am posting the bug report here.

> [12098.000136] Clocksource tsc unstable (delta = -86462353 ns)
> [12175.298598] JBD: Clearing recovery information on journal
> [12175.373555] kjournald starting. Commit interval 5 seconds
> [12213.202240] drbd: initialised. Version: 8.3.2 (api:88/proto:86-90)
> [12213.202242] drbd: GIT-hash: dd7985327f146f33b86d4bff5ca8c94234ce840e build by root [at] v, 2009-09-21 11:44:56
> [12213.202244] drbd: registered as block device major 147
> [12213.202246] drbd: minor_table @ 0xffff88001f8b4f00
> [12217.220456] block drbd0: Starting worker thread (from cqueue [2647])
> [12217.220662] block drbd0: disk( Diskless -> Attaching )
> [12217.287278] block drbd0: Found 64 transactions (3897 active extents) in activity log.
> [12217.287282] block drbd0: Method to ensure write ordering: barrier
> [12217.287286] block drbd0: max_segment_size ( = BIO size ) = 32768
> [12217.287289] block drbd0: drbd_bm_resize called with capacity == 1953460304
> [12217.320116] block drbd0: resync bitmap: bits=244182538 words=3815353
> [12217.320121] block drbd0: size = 931 GB (976730152 KB)
> [12217.822971] block drbd0: recounting of set bits took additional 3 jiffies
> [12217.822983] block drbd0: 0 KB (0 bits) marked out-of-sync by on disk bit-map.
> [12217.822990] block drbd0: disk( Attaching -> UpToDate )
> [12217.846811] block drbd0: conn( StandAlone -> Unconnected )
> [12217.846834] block drbd0: Starting receiver thread (from drbd0_worker [2651])
> [12217.850216] block drbd0: receiver (re)started
> [12217.850235] BUG: unable to handle kernel NULL pointer dereference at 0000000000000037
> [12217.850973] IP: [<ffffffffa01bbff0>] drbd_connect+0x0/0x790 [drbd]
> [12217.851432] PGD 1bd59067 PUD 1f0aa067 PMD 0
> [12217.851931] Oops: 0002 [#1] SMP
> [12217.852349] last sysfs file: /sys/module/drbd/parameters/cn_idx
> [12217.852795] CPU 0
> [12217.853130] Modules linked in: drbd ipmi_msghandler nfsd lockd nfs_acl auth_rpcgss sunrpc exportfs video output ocfs2_dlmfs ocfs2_dlm lp parport psmouse serio_raw virtio_console pcspkr i2c_piix4 virtio_balloon 8139too floppy virtio_pci virtio_ring virtio 8139cp mii
> [12217.860023] Pid: 2679, comm: drbd0_receiver Not tainted 2.6.31-pb3 #1
> [12217.860023] RIP: 0010:[<ffffffffa01bbff0>] [<ffffffffa01bbff0>] drbd_connect+0x0/0x790 [drbd]


Something that oopses at offset 0 of a function is most likely a compile problem,
i.e. I suspect that your kernel and drbd module do not fit together.
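
For readers following along: the offset is the number after the "+" in the IP line, so "drbd_connect+0x0/0x790" means the fault hit at byte 0x0 of a function that is 0x790 bytes long. The following is only a minimal sketch for pulling that out of a pasted oops. It is not part of DRBD or of any kernel tool, and the function names parse_oops_ip and faults_at_function_entry are made up for illustration.

```python
import re

# Matches the symbol part of an oops "IP:" / "RIP" line, for example:
#   drbd_connect+0x0/0x790 [drbd]
SYMBOL_RE = re.compile(
    r"(?P<func>[A-Za-z_][\w.]*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>[\w-]+)\])?"
)


def parse_oops_ip(line):
    """Return (function, offset, size, module) from an oops IP line, or None."""
    m = SYMBOL_RE.search(line)
    if m is None:
        return None
    return (
        m.group("func"),
        int(m.group("offset"), 16),
        int(m.group("size"), 16),
        m.group("module"),  # None for symbols in the core kernel
    )


def faults_at_function_entry(line):
    """True if the faulting instruction is the very first one of the function."""
    parsed = parse_oops_ip(line)
    return parsed is not None and parsed[1] == 0


if __name__ == "__main__":
    for ip_line in (
        "IP: [<ffffffffa01bbff0>] drbd_connect+0x0/0x790 [drbd]",
        "IP: [<ffffffffa0105373>] w_e_end_ov_req+0x43/0x1a0 [drbd]",
    ):
        print(parse_oops_ip(ip_line), faults_at_function_entry(ip_line))
```

A fault at offset 0 is only a hint that the module and kernel may not match; it does not prove it.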

> [12217.860023] RSP: 0018:ffff8800053c7ed8 EFLAGS: 00010282
> [12217.860023] RAX: 0000000000000037 RBX: ffff88001f8a5c98 RCX: 000000000001ffff
> [12217.860023] RDX: ffff880001c44000 RSI: 0000000000000046 RDI: ffff88001f030000
> [12217.860023] RBP: ffff8800053c7ef0 R08: 0000000000000000 R09: 0000000000000001
> [12217.860023] R10: 000000000000000a R11: 0000000000000000 R12: ffff88001f030000
> [12217.860023] R13: ffff88001f030000 R14: ffffffffa01e5112 R15: ffffffffa01e5112
> [12217.860023] FS: 00007f24b15e56f0(0000) GS:ffff880001c44000(0000) knlGS:0000000000000000
> [12217.860023] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> [12217.860023] CR2: 0000000000000037 CR3: 000000001f17e000 CR4: 00000000000006b0
> [12217.860023] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [12217.860023] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [12217.860023] Process drbd0_receiver (pid: 2679, threadinfo ffff8800053c6000, task ffff88001f0ead60)
> [12217.860023] Stack:
> [12217.860023] ffffffffa01be972 ffff88001f974410 ffff88001f0305f0 ffff8800053c7f40
> [12217.860023] <0> ffffffffa01cc32d ffff880001c59440 ffff88001f030600 ffff880016563400
> [12217.860023] <0> ffff88001f974410 0000000000000a5b ffff88001f030000 ffff88001f0305f0
> [12217.860023] Call Trace:
> [12217.860023] [<ffffffffa01be972>] ? drbdd_init+0x62/0x180 [drbd]
> [12217.860023] [<ffffffffa01cc32d>] drbd_thread_setup+0xed/0x280 [drbd]
> [12217.860023] [<ffffffff81012fca>] child_rip+0xa/0x20
> [12217.860023] [<ffffffffa01cc240>] ? drbd_thread_setup+0x0/0x280 [drbd]
> [12217.860023] [<ffffffff81012fc0>] ? child_rip+0x0/0x20
> [12217.860023] Code: 00 00 00 00 00 00 00 00 00 00 00 d0 fb 1d a0 00 00 00 71 17 48 e1 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [12217.860023] RIP [<ffffffffa01bbff0>] drbd_connect+0x0/0x790 [drbd]
> [12217.860023] RSP <ffff8800053c7ed8>
> [12217.860023] CR2: 0000000000000037
> [12217.946436] ---[ end trace 5b87bcb43da8b955 ]---
> [12232.979600] block drbd0: role( Secondary -> Primary )
> [12232.991728] block drbd0: Creating new current UUID


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


it-beratung at thomasreinhold

Sep 24, 2009, 12:15 PM

Post #3 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

On 21.09.2009 at 13:35, Lars Ellenberg wrote:
>
>
> something that Oopses at offset 0 of some function is most likely a
> compile problem,
> i.e. I suggest that your kernel and drbd module do not fit together.



Hi,

I have experienced a similar looking problem on a cluster system also
using Ubuntu:

Systems (identically configured): Ubuntu 9.04 Server x86_64 server
with Kernel 2.6.28-15-server (standard distro kernel)
DRBD: as included in Ubuntu Server (8.3.0)
Usage: KVM host (Software Raid 1 -> DRBD -> LVM -> KVM VMs)


The problem appeared on the secondary directly after issuing "drbdadm
verify all":

> Sep 23 17:04:58 cluster2 kernel: [ 1579.818149] drbd0: conn( Connected -> VerifyT )
> Sep 23 17:04:59 cluster2 kernel: [ 1579.820914] drbd1: conn( Connected -> VerifyT )
> Sep 23 17:04:59 cluster2 kernel: [ 1579.830819] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
> Sep 23 17:04:59 cluster2 kernel: [ 1579.830877] IP: [<ffffffffa0105373>] w_e_end_ov_req+0x43/0x1a0 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.830924] PGD 0
> Sep 23 17:04:59 cluster2 kernel: [ 1579.830949] Oops: 0000 [#1] SMP
> Sep 23 17:04:59 cluster2 kernel: [ 1579.830979] last sysfs file: /sys/module/drbd/parameters/cn_idx
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831011] Dumping ftrace buffer:
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831037] (ftrace buffer empty)
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831063] CPU 0
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831087] Modules linked in: tun r8169 mii ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp kvm_intel kvm video output input_polldev drbd sha1_generic lp parport iTCO_wdt iTCO_vendor_support pcspkr raid10 multipath linear fbcon tileblit font bitblit softcursor r8168 raid456 async_xor async_memcpy async_tx xor raid1 raid0 3w_9xxx 3w_xxxx
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831340] Pid: 3286, comm: drbd0_worker Not tainted 2.6.28-15-server #49-Ubuntu
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831389] RIP: 0010:[<ffffffffa0105373>] [<ffffffffa0105373>] w_e_end_ov_req+0x43/0x1a0 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831453] RSP: 0018:ffff88023acc9e60 EFLAGS: 00010202
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831482] RAX: 0000000000000000 RBX: ffff880210115720 RCX: ffff88023acc9ecc
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831515] RDX: 0000000000000000 RSI: 00000000000000d0 RDI: ffff88023c8cb000
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831548] RBP: ffff88023acc9e80 R08: 0000000000000004 R09: 0000000000000001
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831581] R10: 0000000000000000 R11: 00000000ffffffff R12: ffff88023c8cb000
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831614] R13: ffff880210115720 R14: ffff88023c8cb118 R15: 0000000000000000
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831648] FS: 0000000000000000(0000) GS:ffffffff80a9a000(0000) knlGS:0000000000000000
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831698] CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831728] CR2: 0000000000000030 CR3: 0000000000201000 CR4: 00000000000026a0
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831761] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831795] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831828] Process drbd0_worker (pid: 3286, threadinfo ffff88023acc8000, task ffff88023bcdd980)
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831879] Stack:
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831901] ffff880210115720 ffff88023c8cb000 ffff88023acc9eb0 ffff88023c8cb118
> Sep 23 17:04:59 cluster2 kernel: [ 1579.831943] ffff88023acc9f00 ffffffffa01046e7 ffff88023c8cb620 ffff88023c8cb100
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832004] ffff88023c8cb120 ffff88023c8cb0f0 ffff88023acc9eb0 ffff88023acc9eb0
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832082] Call Trace:
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832106] [<ffffffffa01046e7>] drbd_worker+0x1e7/0x440 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832147] [<ffffffff806997ad>] ? schedule_timeout+0x5d/0xd0
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832185] [<ffffffffa011de03>] drbd_thread_setup+0xe3/0x200 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832230] [<ffffffff80213979>] child_rip+0xa/0x11
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832264] [<ffffffffa011dd20>] ? drbd_thread_setup+0x0/0x200 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832308] [<ffffffff8021396f>] ? child_rip+0x0/0x11
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832340] Code: 89 1c 24 4c 89 74 24 18 49 89 f5 0f 85 ef 00 00 00 48 8b 46 20 f6 40 18 01 0f 84 d9 00 00 00 48 8b 87 c8 05 00 00 be d0 00 00 00 <44> 8b 70 30 49 63 fe e8 c1 de 1d e0 48 85 c0 48 89 c3 0f 84 b5
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832595] RIP [<ffffffffa0105373>] w_e_end_ov_req+0x43/0x1a0 [drbd]
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832639] RSP <ffff88023acc9e60>
> Sep 23 17:04:59 cluster2 kernel: [ 1579.832664] CR2: 0000000000000030
> Sep 23 17:04:59 cluster2 kernel: [ 1579.833056] ---[ end trace bfafe5c2861ff3a8 ]---



I had done some pretty intensive testing (Bonnie++, dd, etc.) beforehand
and everything was OK. I could not reproduce the problem afterwards,
either.


What bothers me (apart from the bug) is the following:

- Your changelog says that compatibility with Linux 2.6.27, 2.6.28
and 2.6.29 was added in 8.3.1, while Ubuntu Server with kernel 2.6.28
ships with DRBD 8.3.0. By the way, the DRBD module appears to have been
compiled in January, while the kernel was compiled in August.


Did the Ubuntu guys really screw it up that badly? And if so, am I
the only one using the DRBD version included in Ubuntu Server?


Thanks for your time,

Thomas




lars.ellenberg at linbit

Sep 25, 2009, 2:36 AM

Post #4 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

On Thu, Sep 24, 2009 at 09:15:40PM +0200, Thomas Reinhold wrote:
>
> On 21.09.2009 at 13:35, Lars Ellenberg wrote:
>>
>>
>> something that Oopses at offset 0 of some function is most likely a
>> compile problem,
>> i.e. I suggest that your kernel and drbd module do not fit together.
>
>
>
> Hi,
>
> I have experienced a similar looking problem on a cluster system also
> using Ubuntu:

"Similar looking"
as in "just some oops and hex numbers and other gobbledygook".

Not similar looking at all in the details of the oops.

PieterB had a NULL dereference at drbd_connect+0x0,
i.e. at offset zero into a function,
which is extremely unlikely to be a coding bug that seldom triggers.

Your oops is ...:

> Systems (identically configured): Ubuntu 9.04 Server x86_64 server with
> Kernel 2.6.28-15-server (standard distro kernel)
> DRBD: as included in Ubuntu Server (8.3.0)

> Usage: KVM host (Software Raid 1 -> DRBD -> LVM -> KVM VMs)
>
>
> The problem appeared on the secondary directly after issuing "drbdadm
> verify all":
>
>> Sep 23 17:04:58 cluster2 kernel: [ 1579.818149] drbd0: conn( Connected -> VerifyT )
>> Sep 23 17:04:59 cluster2 kernel: [ 1579.820914] drbd1: conn( Connected -> VerifyT )
>> Sep 23 17:04:59 cluster2 kernel: [ 1579.830819] BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
>> Sep 23 17:04:59 cluster2 kernel: [ 1579.830877] IP: [<ffffffffa0105373>] w_e_end_ov_req+0x43/0x1a0 [drbd]

There. w_e_end_ov_req+0x43, this is a place and offset where it is much
more likely to be a coding bug. It may even be fixed already.

Please reproduce with 8.3.3.
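
One way to see the difference between the two reports is the "Code:" line, where the kernel marks the byte at the faulting instruction pointer with angle brackets. The sketch below is only an illustration: the helper names are invented, and it expects the Code line unwrapped onto a single line without mail quoting. It extracts the bytes from the marked one onward and checks whether they are all zero.

```python
# Code lines copied from the two oopses in this thread (unwrapped).
PIETER_CODE = (
    "Code: 00 00 00 00 00 00 00 00 00 00 00 d0 fb 1d a0 00 00 00 71 17 48 e1 "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 <00> "
    "00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)
THOMAS_CODE = (
    "Code: 89 1c 24 4c 89 74 24 18 49 89 f5 0f 85 ef 00 00 00 48 8b 46 20 "
    "f6 40 18 01 0f 84 d9 00 00 00 48 8b 87 c8 05 00 00 be d0 00 00 00 "
    "<44> 8b 70 30 49 63 fe e8 c1 de 1d e0 48 85 c0 48 89 c3 0f 84 b5"
)


def faulting_bytes(code_line):
    """Return the bytes of an oops "Code:" line from the marked byte onward.

    The byte wrapped in <..> is the one the instruction pointer pointed at.
    Returns None if no byte is marked.
    """
    tokens = code_line.split("Code:", 1)[-1].split()
    for i, tok in enumerate(tokens):
        if tok.startswith("<") and tok.endswith(">"):
            marked = [tok[1:-1]] + tokens[i + 1:]
            return bytes(int(b, 16) for b in marked)
    return None


def looks_like_empty_text(code_line):
    """True if every byte from the faulting one onward is zero."""
    tail = faulting_bytes(code_line)
    return tail is not None and not any(tail)


if __name__ == "__main__":
    print("drbd_connect+0x0:    all zero?", looks_like_empty_text(PIETER_CODE))
    print("w_e_end_ov_req+0x43: all zero?", looks_like_empty_text(THOMAS_CODE))
```

In the drbd_connect+0x0 oops the dumped bytes are zeros where the function's first instruction should be, which fits the "module does not match" theory. In the w_e_end_ov_req+0x43 oops the bytes look like ordinary instructions: decoded by hand, 44 8b 70 30 is a 32-bit load from 0x30(%rax), and with RAX = 0 that gives the reported fault address 0x30.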

> I had done some pretty intensive testing (Bonnie++, dd, etc.) before and
> everything was ok...could not reproduce the problem afterwards, either.
>
>
> What bothers me (apart from the bug) is the following:
>
> - Your changelog says that compatibility with Linux 2.6.27, 2.6.28 and
> 2.6.29 was added in 8.3.1, while Ubuntu Server with kernel 2.6.28 ships
> with DRBD 8.3.0. By the way, the DRBD module appears to have been compiled
> in January, while the kernel was compiled in August.
>
>
> Did the Ubuntu guys really screw it up that badly? And if so, am I
> the only one using the DRBD version included in Ubuntu Server?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed


it-beratung at thomasreinhold

Sep 25, 2009, 3:52 AM

Post #5 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

Hi,

thanks for your answer.


On 25.09.2009 at 11:36, Lars Ellenberg wrote:

> There. w_e_end_ov_req+0x43, this is a place and offset where it is much
> more likely to be a coding bug. It may even be fixed already.
>
> Please reproduce with 8.3.3.
>

I'll try, but I couldn't even reproduce it with the same system so
far. What about this part?

> - Your changelog says that compatibility with Linux 2.6.27, 2.6.28
> and 2.6.29 was added in 8.3.1, while Ubuntu Server with kernel 2.6.28
> ships with DRBD 8.3.0.



Take care,

Thomas


lars.ellenberg at linbit

Sep 25, 2009, 4:45 AM

Post #6 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

On Fri, Sep 25, 2009 at 12:52:47PM +0200, Thomas Reinhold wrote:
> I'll try, but I couldn't even reproduce it with the same system so far.
> What about this part?

Bugs that cannot be reproduced are hard to fix.

>> - Your changelog says that compatibility with Linux 2.6.27, 2.6.28
>> and 2.6.29 was added in 8.3.1, while Ubuntu Server with kernel 2.6.28
>> ships with DRBD 8.3.0.

Maybe Ubuntu ships their own compat layer for out-of-tree modules?
Also, double check your packages, maybe you are not using the module you
think you are.
Or maybe the "buildtag" was not updated when the module was rebuilt,
or maybe ... I don't know.
Someone from Ubuntu around?


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
__
please don't Cc me, but send to list -- I'm subscribed


it-beratung at thomasreinhold

Sep 25, 2009, 7:07 AM

Post #7 of 7
Re: BUG: unable to handle kernel NULL pointer dereference at 0...037 (drbd-8.3.2)

On 25.09.2009 at 13:45, Lars Ellenberg wrote:
>
> Maybe Ubuntu ships their own compat layer for out-of-tree modules?
> Also, double check your packages, maybe you are not using the module
> you think you are.
> Or maybe the "buildtag" was not updated when the module was rebuilt,
> or maybe ... I don't know.
> Someone from Ubuntu around?


Here is some further information. The DRBD module is definitely the
one that comes with the Ubuntu kernel, as I have not compiled or
installed anything else.

> version: 8.3.0 (api:88/proto:86-89)
> GIT-hash: 9ba8b93e24d842f0dd3fb1f9b90e8348ddb95829 build by ivoks [at] ubunt, 2009-01-17 07:49:56
>
> filename: /lib/modules/2.6.28-15-server/kernel/ubuntu/drbd/drbd.ko
> alias: block-major-147-*
> license: GPL
> description: drbd - Distributed Replicated Block Device v8.3.0
> author: Philipp Reisner <phil [at] linbit>, Lars Ellenberg <lars [at] linbit>
> srcversion: E9C0B482BF4BE5DAEE8CF2A
> depends:
> vermagic: 2.6.28-15-server SMP mod_unload modversions
> parm: minor_count:Maximum number of drbd devices (1-255) (uint)
> parm: allow_oos:DONT USE! (bool)
> parm: cn_idx:uint
> parm: enable_faults:int
> parm: fault_rate:int
> parm: fault_count:int
> parm: fault_devs:int
> parm: trace_level:int
> parm: trace_type:int
> parm: trace_devs:int
> parm: proc_details:int
> parm: usermode_helper:string
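
For anyone who wants to repeat the "is this the module I think it is" check, the first token of the vermagic field above is the kernel release the module was built for, and it can be compared with the release of the running kernel (what "uname -r" prints). The following is a minimal sketch with made-up helper names. A matching release string only shows that the names agree, not that the module was built against exactly the same kernel configuration.

```python
import platform


def vermagic_kernel(vermagic):
    """Return the kernel release a module was built for (first vermagic token)."""
    return vermagic.split()[0] if vermagic.strip() else None


def module_matches_kernel(vermagic, running_release=None):
    """Compare a module's vermagic release with a kernel release string.

    If running_release is omitted, the release of the running system
    (the same string "uname -r" prints) is used.
    """
    if running_release is None:
        running_release = platform.release()
    return vermagic_kernel(vermagic) == running_release


if __name__ == "__main__":
    vermagic = "2.6.28-15-server SMP mod_unload modversions"
    print(module_matches_kernel(vermagic, "2.6.28-15-server"))
    print(module_matches_kernel(vermagic, "2.6.31-pb3"))
```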
