Mailing List Archive: DRBD: Users

DRBD Filesystem Pacemaker Resources Stopped

 

 



Robert.Langley at ventura | Mar 23, 2012, 4:09 PM | Post #1 of 6
DRBD Filesystem Pacemaker Resources Stopped

Maybe I need to post this with Pacemaker? Not sure.
I am a bit new to this scene and trying my best to learn all of this (Linux/DRBD/Pacemaker/Heartbeat).

I am in the middle of following this document, "Highly available NFS storage with DRBD and Pacemaker" located at:
http://www.linbit.com/en/education/tech-guides/highly-available-nfs-with-drbd-and-pacemaker/

OS: Ubuntu 11.10
DRBD version: 8.3.11
Pacemaker version: 1.1.5
I have two servers with 2.4 TB of internal hard drive space each, plus mirrored hard drives for the OS. They both have 10 NICs (2 onboard in a bond, and 8 across two 4-port Intel NICs).

Issue: I got to the end of part 4.3 (commit) and that is when things went bad. I actually ended up with a split-brain and I seem to have recovered from that, but now my resources are as follows (running crm_mon -1):
My slave node is actually showing as the Master under the Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
Clone set Started
Resource Group: Only p_lvm_nfs is Started on my slave node. All of the Filesystem resources are Stopped.

Then, I have this at the bottom:
Failed actions:
p_fs_vol01_start_0 (node=ds01, call=46, rc=5, status=complete): not installed
p_fs_vol01_start_0 (node=ds02, call=430, rc=5, status=complete): not installed

Looking in the syslog on ds01 (primary node) does not reveal anything worth mentioning, but looking at the syslog on ds02 (secondary node) shows the following messages:

pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds01
pengine: [11725]: WARN: unpack_rsc_op: Processing failed op p_fs_vol01_start_0 on ds01: not installed (5)
pengine: [11725]: notice: unpack_rsc_op: Operation p_lsb_nfsserver:1_monitor_0 found resource p_lsb_nfsserver:1 active on ds02
pengine: [11725]: notice: unpack_rsc_op: Hard error - p_fs_vol01_start_0 failed with rc=5: Preventing p_fs_vol01 from re-starting on ds02
pengine: [11725]: WARN: unpack_rsc_op: Processing failed op p_fs_vol01_start_0 on ds02: not installed (5)
pengine: [11725]: notice: native_print: failover-ip#011(ocf::heartbeat:IPaddr):#011Stopped
pengine: [11725]: notice: clone_print: Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
...
pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from ds01 after 1000000 failures (max=1000000)
pengine: [11725]: notice: common_apply_stickiness: p_lvm_nfs can fail 999999 more times on ds02 before being forced off
pengine: [11725]: WARN: common_apply_stickiness: Forcing p_fs_vol01 away from ds02 after 1000000 failures (max=1000000)
pengine: [11725]: notice: LogActions: Leave failover-ip#011(Stopped)
pengine: [11725]: notice: LogActions: Leave p_drbd_nfs:0#011(Slave ds01)
pengine: [11725]: notice: LogActions: Leave p_drbd_nfs:1#011(Master ds02)


Thank you in advance for any assistance,
Robert




andreas at hastexo | Mar 26, 2012, 4:18 PM | Post #2 of 6
Re: DRBD Filesystem Pacemaker Resources Stopped

On 03/24/2012 12:09 AM, Robert Langley wrote:
> Maybe I need to post this with Pacemaker? Not sure.
> I am a bit new to this scene and trying my best to learn all of this (Linux/DRBD/Pacemaker/Heartbeat).
>
> I am in the middle of following this document, "Highly available NFS storage with DRBD and Pacemaker" located at:
> http://www.linbit.com/en/education/tech-guides/highly-available-nfs-with-drbd-and-pacemaker/
>
> OS: Ubuntu 11.10
> DRBD version: 8.3.11
> Pacemaker version: 1.1.5
> I have two servers with 2.4 TB of internal hard drive space each, plus mirrored hard drives for the OS. They both have 10 NICs (2 onboard in a bond and 8 between 2, 4 port intel NICs).
>
> Issue: I got to the end of part 4.3 (commit) and that is when things went bad. I actually ended up with a split-brain and I seem to have recovered from that, but now my resources are as follows (running crm_mon -1):
> My slave node is actually showing as the Master under the Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
> Clone set Started
> Resource Group: Only p_lvm_nfs is Started on my slave node. All of the Filesystem resources are Stopped.
>
> Then, I have this at the bottom:
> Failed actions:
> p_fs_vol01_start_0 (node=ds01, call=46, rc=5, status=complete): not installed
> p_fs_vol01_start_0 (node=ds02, call=430, rc=5, status=complete): not installed

Mountpoint created on both nodes, correct device defined and a valid file
system on it? What happens after a cleanup? ... crm resource cleanup
p_fs_vol01 ... Grep for "Filesystem" in your logs to get the error
output from the resource agent.

For more, please share your current DRBD state/configuration and your
cluster configuration.
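Something along these lines should get you started (a sketch, assuming the
default syslog location on Ubuntu and the resource names from your post):

crm resource cleanup p_fs_vol01      # clear the failcount and let Pacemaker retry
grep Filesystem /var/log/syslog      # error output from the Filesystem resource agent
cat /proc/drbd                       # current DRBD state
drbdadm dump                         # DRBD configuration as parsed by drbdadm
crm configure show                   # cluster configuration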

Regards,
Andreas

--
Need help with DRBD?
http://www.hastexo.com/now



Robert.Langley at ventura | Mar 29, 2012, 2:53 PM | Post #3 of 6
Re: DRBD Filesystem Pacemaker Resources Stopped

On 3/27/2012 01:18 AM, Andreas Kurz wrote:
> Mountpoint created on both nodes, defined correct device and valid file system? What happens after a cleanup? ... crm resource cleanup p_fs_vol01 ... grep for "Filesystem" in your logs to get the error output from the resource agent.
>
> For more ... please share current drbd state/configuration and your cluster configuration.
>
> Regards,
> Andreas

* Pardon me if I'm not replying correctly; I'm still learning how the mailing list works. I'll see how this goes. Look out, I'm a noob!

Andreas,
Thank you for your reply.

Mountpoints are done using LVM2 (as mentioned in the LinBit guide; the DRBD resource is then used as the physical volume for the volume group; see the sketch below). The LVs are all showing as available on ds01; their status is NOT available on ds02 at this time. I formatted them with ext4 and specified that difference when going through the LinBit guide (in the Pacemaker config; the guide uses ext3).
I had previously run the cleanup, and it did not appear to make a difference.
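(For context, the volume group and LVs were created roughly as in the guide, i.e. something like the following; the size below is just a placeholder:)

pvcreate /dev/drbd0                # the DRBD device is the physical volume
vgcreate nfs /dev/drbd0            # volume group "nfs"
lvcreate -n vol01 -L <size> nfs    # one LV per export (vol01 ... vol10, dc1)
mkfs.ext4 /dev/nfs/vol01           # ext4 instead of the guide's ext3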

This time I first ran crm_mon -1 and the LVM resource was Started, but the Filesystems were not; that is how it has been.
Then (maybe I shouldn't have worried about this yet) I noticed in my global_common.conf file that I hadn't included any wait-for-connection settings in the startup section (see below for my additions).
After doing so, though I have not restarted anything yet, I ran crm resource cleanup p_fs_vol01; then I saw the LVM resource say "FAILED" and I am now getting the following from crm_mon -1:

Stack: Heartbeat
Current DC: ds02 (8a61ab9e-da93-4b4d-8f37-9523436b5f14) - partition with quorum
Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
2 Nodes configured, unknown expected votes
4 Resources configured.
============

Online: [ ds01 ds02 ]

Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
p_drbd_nfs:0 (ocf::linbit:drbd): Slave ds01 (unmanaged) FAILED
p_drbd_nfs:1 (ocf::linbit:drbd): Slave ds02 (unmanaged) FAILED
Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
Started: [ ds01 ds02 ]

Failed actions:
p_drbd_nfs:0_demote_0 (node=ds01, call=75, rc=5, status=complete): not installed
p_drbd_nfs:0_stop_0 (node=ds01, call=79, rc=5, status=complete): not installed
p_drbd_nfs:1_monitor_30000 (node=ds02, call=447, rc=5, status=complete): not installed
p_drbd_nfs:1_stop_0 (node=ds02, call=458, rc=5, status=complete): not installed

Grep for "Filesystem" in /var/log/syslog on ds01 shows the following for every volume repeatedly:
Mar 29 11:06:19 ds01 pengine: [27987]: notice: native_print: p_fs_vol01#011(ocf::heartbeat:Filesystem):#011Stopped

On ds02, I see the same in the syslog, plus this additional message after the ones above:
Mar 29 11:09:26 ds02 Filesystem[2000]: [2021]: WARNING: Couldn't find device [/dev/nfs/vol01]. Expected /dev/??? to exist

DRBD State from ds01 (Before restarting ds02): Connected and UpToDate with ds01 as the Primary.
DRBD State from ds02 (After restarting ds02; interesting; Pacemaker?): cat: /proc/drbd: No such file or directory
DRBD State from ds01 (After restarting ds02): WFConnection with ds02 as unknown.

---- Configuration below here ---

:::DRBD Resource Config:::
resource nfs {
device /dev/drbd0;
disk /dev/sda1;
meta-disk internal;
on ds01 {
address 192.168.1.11:7790;
}
on ds02 {
address 192.168.1.12:7790;
}
}

:::DRBD Global_common.conf:::
global {
usage-count yes;
}

common {
protocol C;

handlers {
pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
}

startup {
# Added just before replying to the mailing list, 3/29/2012. Please comment if I am mistaken in adding these.
wfc-timeout 120
degr-wfc-timeout 120
outdated-wfc-timeout 120
wait-after-sb 180
}

disk {
on-io-error detach;
}

net {
after-sb-0pri disconnect;
after-sb-1pri disconnect;
after-sb-2pri disconnect;
}

syncer {
rate 100M;
al-extents 257;
}
}

:::Heartbeat ha.cf:::
autojoin none
mcast bond0 239.0.0.1 694 1 0
bcast bond1
keepalive 2
deadtime 15
warntime 5
initdead 60
node ds01
node ds02
pacemaker respawn

:::Pacemaker CIB.XML:::
<cib epoch="60" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.5" have-quorum="1" cib-last-written="Thu Mar 29 10:39:59 2012" dc-uuid="8a61ab9e-da93-4b4d-8f37-9523436b5f14">
<configuration>
<crm_config>
<cluster_property_set id="cib-bootstrap-options">
<nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f"/>
<nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
<nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
<nvpair id="cib-bootstrap-options-last-lrm-refresh" name="last-lrm-refresh" value="1333042796"/>
<nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
</cluster_property_set>
</crm_config>
<nodes>
<node id="b0dff0ec-073e-475b-b7b9-167ae122e5e0" type="normal" uname="ds01"/>
<node id="8a61ab9e-da93-4b4d-8f37-9523436b5f14" type="normal" uname="ds02"/>
</nodes>
<resources>
<primitive class="ocf" id="failover-ip" provider="heartbeat" type="IPaddr">
<instance_attributes id="failover-ip-instance_attributes">
<nvpair id="failover-ip-instance_attributes-ip" name="ip" value="192.168.2.10"/>
</instance_attributes>
<operations>
<op id="failover-ip-monitor-10s" interval="10s" name="monitor"/>
</operations>
<meta_attributes id="failover-ip-meta_attributes">
<nvpair id="failover-ip-meta_attributes-target-role" name="target-role" value="Stopped"/>
</meta_attributes>
</primitive>
<master id="ms_drbd_nfs">
<meta_attributes id="ms_drbd_nfs-meta_attributes">
<nvpair id="ms_drbd_nfs-meta_attributes-master-max" name="master-max" value="1"/>
<nvpair id="ms_drbd_nfs-meta_attributes-master-node-max" name="master-node-max" value="1"/>
<nvpair id="ms_drbd_nfs-meta_attributes-clone-max" name="clone-max" value="2"/>
<nvpair id="ms_drbd_nfs-meta_attributes-clone-node-max" name="clone-node-max" value="1"/>
<nvpair id="ms_drbd_nfs-meta_attributes-notify" name="notify" value="true"/>
</meta_attributes>
<primitive class="ocf" id="p_drbd_nfs" provider="linbit" type="drbd">
<instance_attributes id="p_drbd_nfs-instance_attributes">
<nvpair id="p_drbd_nfs-instance_attributes-drbd_resource" name="drbd_resource" value="nfs"/>
</instance_attributes>
<operations>
<op id="p_drbd_nfs-monitor-15" interval="15" name="monitor" role="Master"/>
<op id="p_drbd_nfs-monitor-30" interval="30" name="monitor" role="Slave"/>
</operations>
</primitive>
</master>
<clone id="cl_lsb_nfsserver">
<primitive class="lsb" id="p_lsb_nfsserver" type="nfs-kernel-server">
<operations>
<op id="p_lsb_nfsserver-monitor-30s" interval="30s" name="monitor"/>
</operations>
</primitive>
</clone>
<group id="g_nfs">
<primitive class="ocf" id="p_lvm_nfs" provider="heartbeat" type="LVM">
<instance_attributes id="p_lvm_nfs-instance_attributes">
<nvpair id="p_lvm_nfs-instance_attributes-volgrpname" name="volgrpname" value="nfs"/>
</instance_attributes>
<operations>
<op id="p_lvm_nfs-monitor-30s" interval="30s" name="monitor"/>
</operations>
<meta_attributes id="p_lvm_nfs-meta_attributes">
<nvpair id="p_lvm_nfs-meta_attributes-is-managed" name="is-managed" value="true"/>
</meta_attributes>
</primitive>
<primitive class="ocf" id="p_fs_vol01" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol01-instance_attributes">
<nvpair id="p_fs_vol01-instance_attributes-device" name="device" value="/dev/nfs/vol01"/>
<nvpair id="p_fs_vol01-instance_attributes-directory" name="directory" value="/srv/nfs/vol01"/>
<nvpair id="p_fs_vol01-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol01-monitor-10s" interval="10s" name="monitor"/>
</operations>
<meta_attributes id="p_fs_vol01-meta_attributes">

</meta_attributes>
</primitive>
<primitive class="ocf" id="p_fs_vol02" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol02-instance_attributes">
<nvpair id="p_fs_vol02-instance_attributes-device" name="device" value="/dev/nfs/vol02"/>
<nvpair id="p_fs_vol02-instance_attributes-directory" name="directory" value="/srv/nfs/vol02"/>
<nvpair id="p_fs_vol02-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol02-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol03" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol03-instance_attributes">
<nvpair id="p_fs_vol03-instance_attributes-device" name="device" value="/dev/nfs/vol03"/>
<nvpair id="p_fs_vol03-instance_attributes-directory" name="directory" value="/srv/nfs/vol03"/>
<nvpair id="p_fs_vol03-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol03-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol04" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol04-instance_attributes">
<nvpair id="p_fs_vol04-instance_attributes-device" name="device" value="/dev/nfs/vol04"/>
<nvpair id="p_fs_vol04-instance_attributes-directory" name="directory" value="/srv/nfs/vol04"/>
<nvpair id="p_fs_vol04-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol04-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol05" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol05-instance_attributes">
<nvpair id="p_fs_vol05-instance_attributes-device" name="device" value="/dev/nfs/vol05"/>
<nvpair id="p_fs_vol05-instance_attributes-directory" name="directory" value="/srv/nfs/vol05"/>
<nvpair id="p_fs_vol05-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol05-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol06" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol06-instance_attributes">
<nvpair id="p_fs_vol06-instance_attributes-device" name="device" value="/dev/nfs/vol06"/>
<nvpair id="p_fs_vol06-instance_attributes-directory" name="directory" value="/srv/nfs/vol06"/>
<nvpair id="p_fs_vol06-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol06-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol07" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol07-instance_attributes">
<nvpair id="p_fs_vol07-instance_attributes-device" name="device" value="/dev/nfs/vol07"/>
<nvpair id="p_fs_vol07-instance_attributes-directory" name="directory" value="/srv/nfs/vol07"/>
<nvpair id="p_fs_vol07-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol07-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol08" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol08-instance_attributes">
<nvpair id="p_fs_vol08-instance_attributes-device" name="device" value="/dev/nfs/vol08"/>
<nvpair id="p_fs_vol08-instance_attributes-directory" name="directory" value="/srv/nfs/vol08"/>
<nvpair id="p_fs_vol08-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol08-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol09" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol09-instance_attributes">
<nvpair id="p_fs_vol09-instance_attributes-device" name="device" value="/dev/nfs/vol09"/>
<nvpair id="p_fs_vol09-instance_attributes-directory" name="directory" value="/srv/nfs/vol09"/>
<nvpair id="p_fs_vol09-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol09-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_vol10" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_vol10-instance_attributes">
<nvpair id="p_fs_vol10-instance_attributes-device" name="device" value="/dev/nfs/vol10"/>
<nvpair id="p_fs_vol10-instance_attributes-directory" name="directory" value="/srv/nfs/vol10"/>
<nvpair id="p_fs_vol10-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_vol10-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<primitive class="ocf" id="p_fs_dc1" provider="heartbeat" type="Filesystem">
<instance_attributes id="p_fs_dc1-instance_attributes">
<nvpair id="p_fs_dc1-instance_attributes-device" name="device" value="/dev/nfs/dc1"/>
<nvpair id="p_fs_dc1-instance_attributes-directory" name="directory" value="/srv/nfs/dc1"/>
<nvpair id="p_fs_dc1-instance_attributes-fstype" name="fstype" value="ext4"/>
</instance_attributes>
<operations>
<op id="p_fs_dc1-monitor-10s" interval="10s" name="monitor"/>
</operations>
</primitive>
<meta_attributes id="g_nfs-meta_attributes">
<nvpair id="g_nfs-meta_attributes-target-role" name="target-role" value="Started"/>
</meta_attributes>
</group>
</resources>
<constraints>
<rsc_order first="ms_drbd_nfs" first-action="promote" id="o_drbd_before_nfs" score="INFINITY" then="g_nfs" then-action="start"/>
<rsc_colocation id="c_nfs_on_drbd" rsc="g_nfs" score="INFINITY" with-rsc="ms_drbd_nfs" with-rsc-role="Master"/>
<rsc_location id="cli-prefer-failover-ip" rsc="failover-ip">
<rule id="cli-prefer-rule-failover-ip" score="INFINITY" boolean-op="and">
<expression id="cli-prefer-expr-failover-ip" attribute="#uname" operation="eq" value="ds01" type="string"/>
</rule>
</rsc_location>
</constraints>
<rsc_defaults>
<meta_attributes id="rsc-options">
<nvpair id="rsc-options-resource-stickiness" name="resource-stickiness" value="200"/>
</meta_attributes>
</rsc_defaults>
</configuration>
</cib>


Thank you,
Robert



Robert.Langley at ventura | Apr 3, 2012, 12:05 PM | Post #4 of 6
DRBD Filesystem Pacemaker Resources Stopped

UPDATE: I just checked and my /dev/drbd0 is gone on both nodes. Also, /dev/drbd is no longer on either of the nodes.




brian at linbit | Apr 3, 2012, 1:13 PM | Post #5 of 6
Re: DRBD Filesystem Pacemaker Resources Stopped

On 04/03/2012 12:05 PM, Robert Langley wrote:
> UPDATE: I just checked and my /dev/drbd0 is gone on both nodes. Also, /dev/drbd is no longer on either of the nodes.
That is probably because the DRBD kernel module is not loaded; check with 'lsmod | grep drbd'.
You can load the module manually with 'modprobe drbd'.
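Roughly (a sketch, assuming the stock Ubuntu drbd8-utils packaging):

lsmod | grep drbd                  # is the kernel module loaded?
modprobe drbd                      # load it by hand if it is not
cat /proc/drbd                     # should exist again once the module is loaded
crm resource cleanup ms_drbd_nfs   # then let Pacemaker bring DRBD back up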

Brian

--

: Brian Hellman
: LINBIT | "Your Way to High Availability"
: 1-503-573-1262 | 1-877-4-LINBIT
: Web: http://www.linbit.com



andreas at hastexo | Apr 3, 2012, 3:14 PM | Post #6 of 6
Re: DRBD Filesystem Pacemaker Resources Stopped

On 03/29/2012 11:53 PM, Robert Langley wrote:
> On 3/27/2012 01:18 AM, Andreas Kurz wrote:
>> Mountpoint created on both nodes, defined correct device and valid file system? What happens after a cleanup? ... crm resource cleanup p_fs_vol01 ... grep for "Filesystem" in your logs to get the error output from the resource agent.
>>
>> For more ... please share current drbd state/configuration and your cluster configuration.
>>
>> Regards,
>> Andreas
>
> * Pardon me if I'm not replying correctly, I'm trying to learn the mailing list usage. I'll see how this goes. Look out, I'm a noob!
>
> Andreas,
> Thank you for your reply.
>
> Mountpoints are done using LVM2 (as mentioned in the LinBit guide; the DRBD resource is the used as the physical volume for the LV Group) and are all showing available on ds01, status is NOT available on ds02 at this time. I formatted them with ext4 and specified that difference when going through the LinBit guide (for the Pacemaker config; they mention ext3 in their guide).
> I had previously run the cleanup, and it did not appear to make a different.
>
> This time I first ran crm_mon -1 and the lvm resource was Started, but the Filesystems were not. That is how it has been.
> Then, and maybe I shouldn't have worried about this yet, but I noticed in my global_common.conf file that I hadn't included any wait-connect in the Startup section (see below for my additions.
> After doing so, though I have not restarted anything yet, I ran crm resource cleanup p_fs_vol01 , then I saw the lvm resource say "FAILED" and I am now getting the following from crm_mon -1:
>
> Stack: Heartbeat
> Current DC: ds02 (8a61ab9e-da93-4b4d-8f37-9523436b5f14) - partition with quorum
> Version: 1.1.5-01e86afaaa6d4a8c4836f68df80ababd6ca3902f
> 2 Nodes configured, unknown expected votes
> 4 Resources configured.
> ============
>
> Online: [ ds01 ds02 ]
>
> Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
> p_drbd_nfs:0 (ocf::linbit:drbd): Slave ds01 (unmanaged) FAILED
> p_drbd_nfs:1 (ocf::linbit:drbd): Slave ds02 (unmanaged) FAILED
> Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
> Started: [ ds01 ds02 ]
>
> Failed actions:
> p_drbd_nfs:0_demote_0 (node=ds01, call=75, rc=5, status=complete): not installed
> p_drbd_nfs:0_stop_0 (node=ds01, call=79, rc=5, status=complete): not installed
> p_drbd_nfs:1_monitor_30000 (node=ds02, call=447, rc=5, status=complete): not installed
> p_drbd_nfs:1_stop_0 (node=ds02, call=458, rc=5, status=complete): not installed

hmm ... not installed ... that looks like a broken config. Are you only
making changes to your DRBD config while Pacemaker is in maintenance mode,
and not switching it live without first testing the DRBD config for validity?
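If not, roughly this workflow (a sketch using the crm shell and drbdadm,
assuming the usual /etc/drbd.d layout):

crm configure property maintenance-mode=true    # Pacemaker stops acting on resources
vi /etc/drbd.d/global_common.conf               # edit the DRBD configuration
drbdadm dump all                                # parse the config; syntax errors show up here
drbdadm adjust all                              # apply the changes to the running resources
crm configure property maintenance-mode=false   # hand control back to Pacemaker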

>
> Grep for "Filesystem" in /var/log/syslog on ds01 shows the following for every volume repeatedly:
> Mar 29 11:06:19 ds01 pengine: [27987]: notice: native_print: p_fs_vol01#011(ocf::heartbeat:Filesystem):#011Stopped
>
> On ds02, I receive the same in the syslog file, with the addition of this message after the above messages:
> Mar 29 11:09:26 ds02 Filesystem[2000]: [2021]: WARNING: Couldn't find device [/dev/nfs/vol01]. Expected /dev/??? to exist
>
> DRBD State from ds01 (Before restarting ds02): Connected and UpToDate with ds01 as the Primary.
> DRBD State from ds02 (After restarting ds02; interesting; Pacemaker?): cat: /proc/drbd: No such file or directory
> DRBD State from ds01 (After restarting ds02): WFConnection with ds02 as unknown.
>
> ---- Configuration below here ---
>
> :::DRBD Resource Config:::
> resource nfs {
> device /dev/drbd0;
> disk /dev/sda1;
> meta-disk internal;
> on ds01 {
> address 192.168.1.11:7790;
> }
> on ds02 {
> address 192.168.1.12:7790;
> }
>
> :::DRBD Global_common.conf:::
> global {
> usage-count yes;
> }
>
> common {
> protocol C;
>
> handlers {
> pri-on-incon-degr "/usr/lib/drbd/notify-pri-on-incon-degr.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> pri-lost-after-sb "/usr/lib/drbd/notify-pri-lost-after-sb.sh; /usr/lib/drbd/notify-emergency-reboot.sh; echo b > /proc/sysrq-trigger ; reboot -f";
> local-io-error "/usr/lib/drbd/notify-io-error.sh; /usr/lib/drbd/notify-emergency-shutdown.sh; echo o > /proc/sysrq-trigger ; halt -f";
> }
>
> startup {
> # Just included before reply on mailing list. 3/29/2012, Please reply with comment to this if I am mistaken for adding these.
> wfc-timeout 120
> degr-wfc-timeout 120
> outdated-wfc-timeout 120
> wait-after-sb 180

ah yes ... missing ';' at the end of those lines. But you don't need those
startup timeouts, because they are only read by the init script, which
should be disabled; DRBD should be managed entirely by Pacemaker.
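On Ubuntu, disabling the init script would be something like:

update-rc.d -f drbd remove    # keep the drbd init script from starting at boot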

The rest of your config looks OK, except that you are not using STONITH or
DRBD resource-level fencing.
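For reference, DRBD resource-level fencing tied into Pacemaker usually looks
roughly like this (the crm-fence-peer scripts ship with the DRBD userland;
treat this as a sketch, not a drop-in config):

disk {
  fencing resource-only;
}
handlers {
  fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
  after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
}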

If the filesystems still don't start because of "not installed" ...
these are the possible reasons, and you should find the corresponding entries in your logs:

ocf_log err "Couldn't find filesystem $FSTYPE in /proc/filesystems"

ocf_log err "Couldn't find device [$DEVICE]. Expected /dev/??? to exist"

ocf_log err "Couldn't find directory [$MOUNTPOINT] to use as a mount point"

So at least p_fs_vol01 suffers from one of the above problems ... check
your mount points and LVs for existence and typos.
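A few quick checks along those lines, using the names from your config:

vgs nfs ; lvs nfs              # is the VG known and are all the LVs defined?
vgchange -ay nfs               # activate the VG if the LVs are missing from /dev
ls -l /dev/nfs/                # vol01 ... vol10 and dc1 should show up here
ls -ld /srv/nfs/vol01          # the mount point must exist on both nodes
grep ext4 /proc/filesystems    # is ext4 support available in the kernel?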

Regards,
Andreas

--
Need help with DRBD?
http://www.hastexo.com/now

