
Mailing List Archive: DRBD: Users

"fence-peer helper broken, returned 1"

 

 



insyte at gmail

Sep 28, 2009, 5:55 PM

Post #1 of 2 (511 views)
"fence-peer helper broken, returned 1"

I've implemented a simple two-node DRBD cluster with Heartbeat v1. I
have a single back-end Ethernet link for both DRBD replication and
heartbeat traffic. I realize this is not a stable configuration.

If I kill the Ethernet link between the two nodes, DRBD does not fail
over and logs these errors:

Sep 28 19:19:55 test01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0
Sep 28 19:19:55 test01 kernel: block drbd0: helper command: /sbin/drbdadm fence-peer minor-0 exit code 1 (0x100)
Sep 28 19:19:55 test01 kernel: block drbd0: fence-peer helper broken, returned 1
Sep 28 19:19:55 test01 kernel: block drbd0: Considering state change from bad state. Error would be: 'Refusing to be Primary while peer is not outdated'
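
A note on reading the log above: the value in parentheses (0x100) appears to be the raw wait status of the helper process, and on Linux the exit code sits in bits 8-15 of that status. A minimal sketch of the decoding, assuming that layout (the function name is made up for illustration):

```shell
# Hypothetical helper: extract the exit code from a raw wait status,
# assuming the usual Linux layout (exit code in bits 8-15).
exit_code_from_status() {
    echo $(( ($1 >> 8) & 0xff ))
}

# The status from the log line above decodes to exit code 1.
exit_code_from_status 0x100
```

So "exit code 1 (0x100)" is one value shown two ways, not two separate errors.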

Is this caused by the lack of a secondary communications mechanism for
Heartbeat to convey the 'fence-peer' command to the second node? If
so, how does heartbeat/drbd handle a node that fails suddenly and
catastrophically, before the 'fence-peer' command can be conveyed?

Thanks!

-Ben

ha.cf contains these lines:
respawn hacluster /usr/lib64/heartbeat/dopd
apiauth dopd gid=haclient uid=hacluster

drbd.conf:

global {
    usage-count no;
}

common {
    syncer { rate 10M; }
}

resource ha_disk {
    protocol C;
    handlers {
        pri-on-incon-degr "echo o > /proc/sysrq-trigger ; halt -f";
        pri-lost-after-sb "echo o > /proc/sysrq-trigger ; halt -f";
        local-io-error "echo o > /proc/sysrq-trigger ; halt -f";
        outdate-peer "/usr/lib64/heartbeat/drbd-peer-outdater -t 5";
    }

    startup {
        degr-wfc-timeout 120; # 2 minutes.
    }

    disk {
        on-io-error detach;
        fencing resource-only;
    }

    net {
        cram-hmac-alg "sha1";
        shared-secret "eea0a7bfc04b965b9beb5ba37096bc7a";
        after-sb-0pri discard-older-primary;
        after-sb-1pri consensus;
        after-sb-2pri disconnect;
        rr-conflict disconnect;
    }

    syncer {
        rate 10M;
        al-extents 257;
    }

    on test01 {
        device /dev/drbd0;
        disk /dev/vg0/drbd-disk;
        address 10.1.1.210:7788;
        flexible-meta-disk internal;
    }

    on test02 {
        device /dev/drbd0;
        disk /dev/vg0/drbd-disk;
        address 10.1.1.211:7788;
        flexible-meta-disk internal;
    }
}
_______________________________________________
drbd-user mailing list
drbd-user [at] lists
http://lists.linbit.com/mailman/listinfo/drbd-user


insyte at gmail

Sep 28, 2009, 6:21 PM

Post #2 of 2 (456 views)
Re: "fence-peer helper broken, returned 1"

Solved my own problem. According to this link, I've encountered a
known bug in recent versions of DRBD:

http://www.gossamer-threads.com/lists/linuxha/pacemaker/57754


So manually running the drbd outdater works as expected:

[root@test0 ~]# /usr/lib64/heartbeat/drbd-peer-outdater -p test02 -r ha_disk; echo $?
4

And I can also get it to work like so:

[root@test0 ~]# DRBD_PEER=test02 DRBD_RESOURCE=ha_disk /sbin/drbdadm fence-peer minor-0; echo $?
4
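
For readers wondering why 4 counts as success here while 1 is reported as "helper broken": as far as I recall from the drbd.conf man page, the fence-peer handler is expected to return one of a small set of codes (3 through 7), and anything else is treated as a broken helper. A rough sketch of that mapping follows; the function name and the wording of the descriptions are my own, and the meanings should be checked against the man page for your DRBD version:

```shell
# Hypothetical helper: describe a fence-peer handler exit code.
# The code meanings are recalled from the drbd.conf man page and
# should be verified against the installed DRBD version.
describe_fence_peer_exit() {
    case "$1" in
        3) echo "peer inconsistent" ;;
        4) echo "peer outdated" ;;
        5) echo "peer unreachable" ;;
        6) echo "peer refused, it is primary" ;;
        7) echo "peer fenced (STONITH)" ;;
        *) echo "unexpected code, helper treated as broken" ;;
    esac
}

describe_fence_peer_exit 4   # the result of the manual runs above
describe_fence_peer_exit 1   # the result logged by the kernel
```

Under that reading, the manual runs report the peer as outdated, whereas the original failure was the helper itself exiting with an unexpected status.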

I've modified the "outdate-peer" line in drbd.conf to include the
above command, complete with environment variables.

Also, I will be adding a serial link for improved reliability.

Thanks!

-Ben
