chrisd1100 at gmail
Apr 12, 2012, 12:51 PM
Post #7 of 7
Re: Testing local-io-error handler -- blkid hangs and ties up drbd device
I can confirm that the issue is present in neither drbd 8.3.13rc1 nor 8.4.1
stable. It must have been caused by code introduced between the 8.4.1
release and the current master.
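
For anyone repeating this comparison across versions, here is a minimal sketch of how the "helper command ... exit code ..." lines quoted further down could be pulled out of saved dmesg output. It assumes each kernel message sits on a single line (the excerpts below are wrapped by the mail client), and the function name helper_exit_codes is made up for illustration.

```python
import re

# Matches the completion message DRBD logs once a helper has returned, e.g.
# "block drbd575: helper command: /sbin/drbdadm local-io-error minor-575 exit code 0 (0x0)"
# The start-of-call message (same text without "exit code ...") does not match.
HELPER_DONE = re.compile(
    r"block (?P<dev>drbd\d+): helper command: \S+ (?P<event>\S+) \S+ "
    r"exit code (?P<code>\d+)"
)


def helper_exit_codes(log_text):
    """Return (device, event, exit_code) for each finished helper call."""
    return [
        (m.group("dev"), m.group("event"), int(m.group("code")))
        for m in HELPER_DONE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "[ 332.772925] block drbd575: helper command: /sbin/drbdadm "
        "local-io-error minor-575\n"
        "[ 332.790163] block drbd575: helper command: /sbin/drbdadm "
        "local-io-error minor-575 exit code 0 (0x0)\n"
    )
    print(helper_exit_codes(sample))
```

A helper that was started but never shows up in the result would suggest the handler itself is still running; an entry with exit code 0 followed by a hang points elsewhere, as in this thread.
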
On Thu, Apr 12, 2012 at 11:18 AM, Chris Dickson <chrisd1100 [at] gmail> wrote:
> A little more info:
> If I set the node with the good disk to primary, then write 100MB to
> the drbd volume, the drbd node with the bad disk calls my handler
> successfully, detaches and does not hang. It seems to only hang when I
> change the node with the bad disk's role to Primary.
> On Thu, Apr 12, 2012 at 9:40 AM, Chris Dickson <chrisd1100 [at] gmail> wrote:
>> Thanks Lars, dmesg indeed reported the exit code of 0:
>> [ 332.733554] block drbd575: role( Secondary -> Primary )
>> [ 332.772827] block drbd575: disk( UpToDate -> Failed )
>> [ 332.772840] block drbd575: Local IO failed in __req_mod. Detaching...
>> [ 332.772925] block drbd575: helper command: /sbin/drbdadm
>> local-io-error minor-575
>> [ 332.790163] block drbd575: helper command: /sbin/drbdadm
>> local-io-error minor-575 exit code 0 (0x0)
>> [ 332.790189] block drbd575: disk( Failed -> Diskless )
>> [ 332.803862] block drbd575: receiver updated UUIDs to effective data
>> uuid: 2B81D15C3E0ADD80
>> The peer node is also locked up, all operations report:
>> r575: State change failed: (-10) State change was refused by peer node
>> One question on 8.3.latest: one of the reasons I wanted to use 8.4 was
>> the support for more minor numbers. It's not that I necessarily need more
>> than 256 on one machine, but with the way my numbering system works it is
>> convenient to be able to assign minor numbers greater than 255. Is there a
>> quick hack somewhere in the source that would raise this limit, or is it a
>> more complex change made for 8.4?
>> Also the prefer-remote read balancing method is something that I was
>> interested in, but not super necessary.
>> On Thu, Apr 12, 2012 at 9:24 AM, Lars Ellenberg <
>> lars.ellenberg [at] linbit> wrote:
>>> On Thu, Apr 12, 2012 at 09:14:38AM -0400, Chris Dickson wrote:
>>> > Thanks for the quick reply,
>>> > My test handler currently isn't doing anything interesting; I just had
>>> > it echo 'hello world' to a file located on a different drive than the
>>> > LVM volume. The echo seems to have completed successfully, as the file
>>> > was written.
>>> > The end goal for the handler is to simply insert a row into a remote
>>> > other than that the default behavior on io-error of detaching is
>>> > what I would like to have happen.
>>> > I just tried filtering out drbd in lvm.conf and that doesn't seem to
>>> > be the issue. After another try I ran a quick ps auxf and this showed up:
>>> > root 340 0.0 0.0 21392 1284 ? Ss 12:59 0:00 udevd
>>> > --daemon
>>> > root 415 0.0 0.0 21384 896 ? S 12:59 0:00 \_
>>> > --daemon
>>> > root 1775 0.0 0.0 8448 724 ? D 13:04 0:00 |
>>> > /sbin/blkid -o udev -p /dev/drbd575
>>> > So it seems like udev is initiating the blkid call; could it be doing
>>> > so before drbd has finished executing the handler?
>>> If the handler finished
>>> (drbd prints "... helper command .... exit code ..." to the kernel log),
>>> there is no reason for anything to hang.
>>> DRBD is supposed to retry failed local requests on the peer, and if that
>>> is not possible (no connection, or no good remote disk either), either
>>> freeze IO (if so configured) or report IO errors back up the stack.
>>> "Supposed to just work".
>>> Maybe rather downgrade to 8.3.latest, I know we fixed some issues
>>> in the retry logic on the way to 8.4.not-yet-but-"soon"-to-be-released.2
>>> : Lars Ellenberg
>>> : LINBIT | Your Way to High Availability
>>> : DRBD/HA support and consulting http://www.linbit.com
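
For reference, the kind of trivial test handler described in the thread (write a line to a file that does not live on the DRBD-backed volume, then return 0) could look roughly like the sketch below. This is not the handler actually used here. The log path is hypothetical, and the environment variable names DRBD_RESOURCE and DRBD_MINOR are assumptions about what drbdadm exports to handlers, so the code falls back to "unknown" if they are absent.

```python
import os
import sys
import time

# Hypothetical location; it should not be on the DRBD-backed volume.
LOG_PATH = "/var/log/drbd-local-io-error.log"


def record_io_error(log_path, environ):
    """Append one line describing the event and return the line written."""
    resource = environ.get("DRBD_RESOURCE", "unknown")  # assumed variable name
    minor = environ.get("DRBD_MINOR", "unknown")        # assumed variable name
    line = "%s local-io-error resource=%s minor=%s\n" % (
        time.strftime("%Y-%m-%dT%H:%M:%S"), resource, minor)
    with open(log_path, "a") as fh:
        fh.write(line)
    return line


def main():
    try:
        record_io_error(LOG_PATH, os.environ)
    except OSError as exc:
        # Report the problem on stderr but still return 0, so that a
        # logging failure is not mistaken for a failed handler.
        sys.stderr.write("could not record io-error: %s\n" % exc)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

The script only appends to a plain file and never opens the DRBD device, which keeps it out of the I/O path that is failing.
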