Mailing List Archive: Linux-HA: Pacemaker

GFS2 with Pacemaker on RHEL6.3 restarts with reboot


bhaxo at sgi

Aug 8, 2012, 7:14 PM

Post #1 of 6 (486 views)
GFS2 with Pacemaker on RHEL6.3 restarts with reboot

Greetings.

I have followed the setup instructions of Clusters From Scratch :
Creating Active/Passive and Active/Active Clusters on Fedora, Edition 5,
including locating the new cman pages that do not seem to be linked into
the main document, for example,

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s02s02.html

The stack that I'm implementing includes RHEL6.3, drbd, dlm, gfs2,
Pacemaker (RHEL6.3 build), cman, kvm ... hopefully I didn't leave
anybody off the party list.

I have these all working together to support "live" migration of the
virt client between the two phys hosts, so at that level, all is good.

Questions: Is there a document that fully covers such an
installation, meaning one that extends Clusters from Scratch (and replaces
the Apache example) to the implementation of an HA virtual client? For
instance, should libvirtd be handled as a Pacemaker resource, or should
it be started as a system service at boot? What should be done with
"libvirt-guests"? Should cman be started as a system service at boot?

Problem: When the non-VM host is rebooted and Pacemaker restarts on it,
the gfs2 filesystem gets restarted on the VM host, which causes
the stop and start of the VirtualDomain. The gfs2 filesystem also gets
restarted even without the VirtualDomain resource included.

This behavior does not seem correct ... I think I would have flagged it
in my memory if I'd encountered the behavior when working with the SLES
HAE product. I've been doing a lot of fumbling this past week trying to
get the colocation and order statements correct, without affecting this
behavior.

What am I missing?

Here are the first indications of this restart issue during the restart
of Pacemaker and friends with the boot. I have attached more messages.

Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-hikari2-master-drbd_r0.1, name=master-drbd_r0:1, value=5, magic=NA, cib=0.474.170) : Transient attribute: update
Aug 8 20:00:57 hikari crmd[2734]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 8 20:00:57 hikari pengine[2733]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master hikari2)
Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart virt#011(Started hikari) <<<<<<<<<<<<<<<<<<
Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart shared-gfs2:0#011(Started hikari) <<<<<<<<
Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start shared-gfs2:1#011(hikari2)
Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-hikari2-master-drbd_r1.1, name=master-drbd_r1:1, value=5, magic=NA, cib=0.474.171) : Transient attribute: update
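
For anyone wanting to pull the planned actions out of a longer messages file, here is a minimal Python sketch that filters the pengine "LogActions" lines in the format shown above. The function names and the regular expression are my own illustration, not part of Pacemaker, and the pattern is only checked against the four LogActions lines in this excerpt.

```python
import re

# Pattern for pengine "LogActions" lines as they appear in the excerpt.
# "#011" is the escaped tab that syslog wrote between the resource name
# and the parenthesised detail; plain whitespace is accepted as well.
LOG_ACTION = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+LogActions:\s+"
    r"(?P<action>\w+)\s+(?P<resource>[^\s#]+)(?:#011|\s+)\((?P<detail>[^)]*)\)"
)

# The four LogActions lines from the excerpt (markers at line end dropped).
SAMPLE = [
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master hikari2)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart virt#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart shared-gfs2:0#011(Started hikari)",
    "Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start shared-gfs2:1#011(hikari2)",
]


def planned_actions(lines):
    """Yield (action, resource, detail) for every LogActions line."""
    for line in lines:
        match = LOG_ACTION.search(line)
        if match:
            yield match.group("action"), match.group("resource"), match.group("detail")


def restarts(lines):
    """Return (resource, detail) pairs for the actions logged as Restart."""
    return [(res, detail) for action, res, detail in planned_actions(lines)
            if action == "Restart"]


for resource, detail in restarts(SAMPLE):
    print(resource, "->", detail)
```

Run against the full messages file (for example `restarts(open(path))`, with `path` being wherever the log was saved), this should list every resource the policy engine decided to restart in each transition.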

Here are the current constraints resulting from fumbling (actually,
trying to make sense of all of the information obtained from Google
searches):

colocation co-gfs-on-drbd inf: c_shared-gfs2 drbd_r0_clone:Master
order o-drbd_r0-then-gfs inf: drbd_r0_clone:promote c_shared-gfs2:start
order o-drbd_r1_clone-then-virt inf: drbd_r1_clone virt
order o-gfs-then-virt inf: c_shared-gfs2 virt
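
To make the dependency chain in these constraints easier to see, here is a rough Python sketch that reads the four lines above and lists what is ordered after a given resource. It only handles the simple two-resource "order" form used here, and the helper names are hypothetical. As I understand mandatory (inf) ordering, a restart of the first resource also restarts whatever is ordered after it, which would explain why virt follows c_shared-gfs2 in the log.

```python
from collections import defaultdict

# The four constraints from the post, in crm shell syntax.
CONSTRAINTS = """\
colocation co-gfs-on-drbd inf: c_shared-gfs2 drbd_r0_clone:Master
order o-drbd_r0-then-gfs inf: drbd_r0_clone:promote c_shared-gfs2:start
order o-drbd_r1_clone-then-virt inf: drbd_r1_clone virt
order o-gfs-then-virt inf: c_shared-gfs2 virt
"""


def order_edges(text):
    """Return {first: {then, ...}} for two-resource 'order' lines."""
    edges = defaultdict(set)
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: order <id> <score>: <first[:action]> <then[:action]>
        if len(parts) == 5 and parts[0] == "order":
            first, then = (p.split(":")[0] for p in parts[3:5])
            edges[first].add(then)
    return edges


def dependents(edges, resource):
    """Everything ordered, directly or transitively, after `resource`."""
    seen, stack = set(), [resource]
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


edges = order_edges(CONSTRAINTS)
for name in ("drbd_r0_clone", "drbd_r1_clone", "c_shared-gfs2"):
    print(name, "->", sorted(dependents(edges, name)))
```

This only shows which resources are chained together; it does not explain why the gfs2 clone instance on hikari is restarted in the first place.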

Full config file attached.

For reference, here is "service blah status" for the set of services:

[root@hikari ~]# ha-status
------- service corosync status -------
corosync (pid 1996) is running...
------- service cman status -------
cluster is running.
------- service drbd status -------
drbd driver loaded OK; device status:
version: 8.4.1 (api:1/proto:86-100)
GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
phil@Build64R, 2012-04-17 11:28:08
m:res cs ro ds p mounted fstype
1:r0 Connected Primary/Primary UpToDate/UpToDate C /shared gfs2
2:r1 Connected Primary/Primary UpToDate/UpToDate C
3:r2 Connected Primary/Primary UpToDate/UpToDate C
------- service pacemaker status -------
pacemakerd (pid 8912) is running...
------- service gfs2 status -------
Configured GFS2 mountpoints:
/shared
Active GFS2 mountpoints:
/shared
------- service libvirtd status -------
libvirtd (pid 2510) is running...

[root@hikari ~]# crm_mon -1ro
============
Last updated: Wed Aug 8 21:01:47 2012
Last change: Wed Aug 8 20:48:49 2012 via cibadmin on hikari
Stack: cman
Current DC: hikari - partition with quorum
Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
2 Nodes configured, 2 expected votes
11 Resources configured.
============

Online: [ hikari hikari2 ]

Full list of resources:

Master/Slave Set: drbd_r0_clone [drbd_r0]
Masters: [ hikari hikari2 ]
Master/Slave Set: drbd_r1_clone [drbd_r1]
Masters: [ hikari hikari2 ]
Master/Slave Set: drbd_r2_clone [drbd_r2]
Masters: [ hikari hikari2 ]
ipmi-fencing-1 (stonith:fence_ipmilan): Started hikari
ipmi-fencing-2 (stonith:fence_ipmilan): Started hikari2
virt (ocf::heartbeat:VirtualDomain): Started hikari
Clone Set: c_shared-gfs2 [shared-gfs2]
Started: [ hikari hikari2 ]

Operations:
* Node hikari2:
drbd_r1:1: migration-threshold=1000000
+ (17) monitor: interval=60000ms rc=0 (ok)
+ (26) promote: rc=0 (ok)
drbd_r0:1: migration-threshold=1000000
+ (21) promote: rc=0 (ok)
drbd_r2:1: migration-threshold=1000000
+ (19) monitor: interval=60000ms rc=0 (ok)
+ (27) promote: rc=0 (ok)
ipmi-fencing-2: migration-threshold=1000000
+ (12) start: rc=0 (ok)
+ (13) monitor: interval=240000ms rc=0 (ok)
shared-gfs2:1: migration-threshold=1000000
+ (25) start: rc=0 (ok)
* Node hikari:
drbd_r1:0: migration-threshold=1000000
+ (24) promote: rc=0 (ok)
drbd_r2:0: migration-threshold=1000000
+ (25) promote: rc=0 (ok)
shared-gfs2:0: migration-threshold=1000000
+ (92) start: rc=0 (ok)
drbd_r0:0: migration-threshold=1000000
+ (23) promote: rc=0 (ok)
ipmi-fencing-1: migration-threshold=1000000
+ (12) start: rc=0 (ok)
+ (13) monitor: interval=240000ms rc=0 (ok)
virt: migration-threshold=1000000
+ (120) start: rc=0 (ok)
+ (121) monitor: interval=10000ms rc=0 (ok)

Thanks for reading ...
Bob Haxo
bhaxo [at] sgi
Attachments: crm-configure-save-20120808-1820.crm (2.82 KB)
  messages-startup-20120808-1832.txt (36.8 KB)


andrew at beekhof

Aug 9, 2012, 7:21 PM

Post #2 of 6 (454 views)
Re: GFS2 with Pacemaker on RHEL6.3 restarts with reboot [In reply to]

On Thu, Aug 9, 2012 at 12:14 PM, Bob Haxo <bhaxo [at] sgi> wrote:
> Greetings.
>
> I have followed the setup instructions of Clusters From Scratch :
> Creating Active/Passive and Active/Active Clusters on Fedora, Edition 5,
> including locating the new cman pages that do not seem to be linked into
> the main document, for example,
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s02s02.html

The 1.1 document was updated for corosync 2.x.
I kept the cman/plugin version around but moved it to:

http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/index.html

Look for "Version: 1.1-plugin" on the main docs page.

>
> The stack that I'm implementing includes RHEL6.3, drbd, dlm, gfs2,
> Pacemaker (RHEL6.3 build), cman, kvm ... hopefully I didn't leave
> anybody off the party list.
>
> I have these all working together to support "live" migration of the
> virt client between the two phys hosts, so at that level, all is good.
>
> Questions: Is there a document that fully covers such an
> installation, meaning one that extends Clusters from Scratch (and replaces
> the Apache example) to the implementation of an HA virtual client? For
> instance, should libvirtd be handled as a Pacemaker resource, or should
> it be started as a system service at boot? What should be done with
> "libvirt-guests"?

These things I do not know, sorry.

> Should cman be started as a system service at boot?

I prefer not to, but it's just a personal preference.
I run potentially broken versions of the cluster and have been hit
hard before with processes running amok and putting machines into
reboot cycles.

>
> Problem: When the non-VM host is rebooted and Pacemaker restarts on it,
> the gfs2 filesystem gets restarted on the VM host, which causes
> the stop and start of the VirtualDomain. The gfs2 filesystem also gets
> restarted even without the VirtualDomain resource included.

This sounds like the "starting a clone on A causes a restart of the
clone on B" bug.
I think we've squashed that one now but not in a released version...
how confident are you at creating rpms?


_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


bhaxo at sgi

Aug 12, 2012, 6:27 PM

Post #3 of 6 (440 views)
Re: GFS2 with Pacemaker on RHEL6.3 restarts with reboot [In reply to]

On Fri, 2012-08-10 at 12:21 +1000, Andrew Beekhof wrote:
> On Thu, Aug 9, 2012 at 12:14 PM, Bob Haxo <bhaxo at sgi.com> wrote:
> > Greetings.
> >
> > I have followed the setup instructions of Clusters From Scratch :
> > Creating Active/Passive and Active/Active Clusters on Fedora, Edition 5,
> > including locating the new cman pages that do not seem to be linked into
> > the main document, for example,
> >
> > http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ch08s02s02.html
>
> The 1.1 document was updated for corosync 2.x
> I kept the cman/plugin version around but moved it to:
>
> http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Clusters_from_Scratch/index.html
>
> Look for "Version: 1.1-plugin" on the main docs page.

Andrew, much thanks for the response ... and much thanks here ... I had
not connected the dots regarding use of cman being an *earlier* version
of the docs (and software stack).

>
> >
> > The stack that I'm implementing includes RHEL6.3, drbd, dlm, gfs2,
> > Pacemaker (RHEL6.3 build), cman, kvm ... hopefully I didn't leave
> > anybody off the party list.
> >
> > I have these all working together to support "live" migration of the
> > virt client between the two phys hosts, so at that level, all is good.
> >
> > Questions: Is there a document that fully covers such an
> > installation, meaning one that extends Clusters from Scratch (and replaces
> > the Apache example) to the implementation of an HA virtual client? For
> > instance, should libvirtd be handled as a Pacemaker resource, or should
> > it be started as a system service at boot? What should be done with
> > "libvirt-guests"?
>
> These things I do not know, sorry.
>
> > Should cman be started as a system service at boot?
>
> I prefer not to, but it's just a personal preference.
> I run potentially broken versions of the cluster and have been hit
> hard before with processes running amok and putting machines into
> reboot cycles.

Ah, right. I too in my testing start cman and pacemaker manually. I
was thinking more of when moving from testing to production. I think
you have answered that.

>
> >
> > Problem: When the non-VM host is rebooted and Pacemaker restarts on it,
> > the gfs2 filesystem gets restarted on the VM host, which causes
> > the stop and start of the VirtualDomain. The gfs2 filesystem also gets
> > restarted even without the VirtualDomain resource included.
>
> This sounds like the "starting a clone on A causes a restart of the
> clone on B" bug.
> I think we've squashed that one now but not in a released version...
> how confident are you at creating rpms?

:-) Well, "how confident" depends upon the precise meaning of "creating
rpms" ... if this is building an rpm given a working spec file, then that
I can do. If it is a matter of making mods to an almost working spec
file, that I can do. If it involves creating the spec file from scratch
for a large project, that would be a challenge.

FYI, I'm trying to get Pacemaker accepted for use in a product rather
than rgmanager.

Thanks, Andrew.
Bob Haxo
bhaxo at sgi.com



andrew at beekhof

Aug 12, 2012, 7:04 PM

Post #4 of 6 (439 views)
Re: GFS2 with Pacemaker on RHEL6.3 restarts with reboot [In reply to]

On Mon, Aug 13, 2012 at 11:27 AM, Bob Haxo <bhaxo [at] sgi> wrote:
> :-) Well "how confident" depends upon the precise meaning of "creating
> rpms" .. if this is building a rpm given a working spec file, then that
> I can do. If it is a matter of making mods to an almost working spec
> file, that I can do. If it involves creating the spec file from scratch
> for a large project, that would be a challenge.

Yeah, that would be asking a bit much :)

Depending on how "clean" the machine you're working on is, and if it's
running the same software versions as the machine that the results
will be installed on, you /should/ be able to check out the latest git
and run 'make rpm'.
Otherwise you might need to set up mock and run something like 'make
mock-epel-6-x86_64' from the top of the latest pacemaker git tree.
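
The choice between the two build routes above could be sketched roughly like this in Python. The two make targets are the ones named in this message; the function names, the flags and the dry-run default are hypothetical, and nothing here has been run against a real pacemaker checkout.

```python
import subprocess


def build_command(host_is_clean, same_versions_as_target):
    """Pick the make invocation described above.

    A clean build host running the same software versions as the install
    target can build directly; otherwise fall back to a mock chroot build.
    """
    if host_is_clean and same_versions_as_target:
        return ["make", "rpm"]
    return ["make", "mock-epel-6-x86_64"]


def run_build(tree, command, dry_run=True):
    """Run `command` from the top of the git tree at `tree`.

    With dry_run=True (the default) the command is only returned as a
    string, so the sketch is safe to try without a checkout.
    """
    if dry_run:
        return " ".join(command)
    subprocess.run(command, cwd=tree, check=True)
    return " ".join(command)


# Example: the build host differs from the cluster nodes, so use mock.
print(run_build("pacemaker", build_command(True, False)))
```

Here "pacemaker" is just a placeholder for wherever the git tree was checked out.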

>
> FYI, I'm trying to get Pacemaker accepted for use in a product rather
> than rgmanager.
>
> Thanks, Andrew.
> Bob Haxo
> bhaxo at sgi.com
>
>>
>> > This behavior does not seem correct ... I think I would have flagged it
>> > in my memory if I'd encountered the behavior when working with the SLES
>> > HAE product. I've been doing a lot of fumbling this past week trying to
>> > get the colocation and order statements correct, without affecting this
>> > behavior.
>> >
>> > What am I missing?
>> >
>> > Here are the first indications of this restart issue during the restart
>> > of Pacemaker and friends with the boot. I have attached more messages.
>> >
>> > Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-hikari2-master-drbd_r0.1, name=master-drbd_r0:1, value=5, magic=NA, cib=0.474.170) : Transient attribute: update
>> > Aug 8 20:00:57 hikari crmd[2734]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: unpack_config: On loss of CCM Quorum: Ignore
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Promote drbd_r0:1#011(Slave -> Master hikari2)
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart virt#011(Started hikari) <<<<<<<<<<<<<<<<<<
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Restart shared-gfs2:0#011(Started hikari) <<<<<<<<
>> > Aug 8 20:00:57 hikari pengine[2733]: notice: LogActions: Start shared-gfs2:1#011(hikari2)
>> > Aug 8 20:00:57 hikari crmd[2734]: info: abort_transition_graph: te_update_diff:176 - Triggered transition abort (complete=1, tag=nvpair, id=status-hikari2-master-drbd_r1.1, name=master-drbd_r1:1, value=5, magic=NA, cib=0.474.171) : Transient attribute: update
>> >
>> > Here are the current constraints resulting from fumbling (actually,
>> > trying to make sense of all of the information obtained from a Google
>> > searches):
>> >
>> > colocation co-gfs-on-drbd inf: c_shared-gfs2 drbd_r0_clone:Master
>> > order o-drbd_r0-then-gfs inf: drbd_r0_clone:promote c_shared-gfs2:start
>> > order o-drbd_r1_clone-then-virt inf: drbd_r1_clone virt
>> > order o-gfs-then-virt inf: c_shared-gfs2 virt
>> >
>> > Full config file attached.
>> >
>> > For reference, here is "service blah status" for the set of services:
>> >
>> > [root [at] hikari ~]# ha-status
>> > ------- service corosync status -------
>> > corosync (pid 1996) is running...
>> > ------- service cman status -------
>> > cluster is running.
>> > ------- service drbd status -------
>> > drbd driver loaded OK; device status:
>> > version: 8.4.1 (api:1/proto:86-100)
>> > GIT-hash: 91b4c048c1a0e06777b5f65d312b38d47abaea80 build by
>> > phil [at] Build64R, 2012-04-17 11:28:08
>> > m:res cs ro ds p mounted fstype
>> > 1:r0 Connected Primary/Primary UpToDate/UpToDate C /shared gfs2
>> > 2:r1 Connected Primary/Primary UpToDate/UpToDate C
>> > 3:r2 Connected Primary/Primary UpToDate/UpToDate C
>> > ------- service pacemaker status -------
>> > pacemakerd (pid 8912) is running...
>> > ------- service gfs2 status -------
>> > Configured GFS2 mountpoints:
>> > /shared
>> > Active GFS2 mountpoints:
>> > /shared
>> > ------- service libvirtd status -------
>> > libvirtd (pid 2510) is running...
>> >
>> > [root [at] hikar ~]# crm_mon -1ro
>> > ============
>> > Last updated: Wed Aug 8 21:01:47 2012
>> > Last change: Wed Aug 8 20:48:49 2012 via cibadmin on hikari
>> > Stack: cman
>> > Current DC: hikari - partition with quorum
>> > Version: 1.1.7-6.el6-148fccfd5985c5590cc601123c6c16e966b85d14
>> > 2 Nodes configured, 2 expected votes
>> > 11 Resources configured.
>> > ============
>> >
>> > Online: [ hikari hikari2 ]
>> >
>> > Full list of resources:
>> >
>> > Master/Slave Set: drbd_r0_clone [drbd_r0]
>> > Masters: [ hikari hikari2 ]
>> > Master/Slave Set: drbd_r1_clone [drbd_r1]
>> > Masters: [ hikari hikari2 ]
>> > Master/Slave Set: drbd_r2_clone [drbd_r2]
>> > Masters: [ hikari hikari2 ]
>> > ipmi-fencing-1 (stonith:fence_ipmilan): Started hikari
>> > ipmi-fencing-2 (stonith:fence_ipmilan): Started hikari2
>> > virt (ocf::heartbeat:VirtualDomain): Started hikari
>> > Clone Set: c_shared-gfs2 [shared-gfs2]
>> > Started: [ hikari hikari2 ]
>> >
>> > Operations:
>> > * Node hikari2:
>> > drbd_r1:1: migration-threshold=1000000
>> > + (17) monitor: interval=60000ms rc=0 (ok)
>> > + (26) promote: rc=0 (ok)
>> > drbd_r0:1: migration-threshold=1000000
>> > + (21) promote: rc=0 (ok)
>> > drbd_r2:1: migration-threshold=1000000
>> > + (19) monitor: interval=60000ms rc=0 (ok)
>> > + (27) promote: rc=0 (ok)
>> > ipmi-fencing-2: migration-threshold=1000000
>> > + (12) start: rc=0 (ok)
>> > + (13) monitor: interval=240000ms rc=0 (ok)
>> > shared-gfs2:1: migration-threshold=1000000
>> > + (25) start: rc=0 (ok)
>> > * Node hikari:
>> > drbd_r1:0: migration-threshold=1000000
>> > + (24) promote: rc=0 (ok)
>> > drbd_r2:0: migration-threshold=1000000
>> > + (25) promote: rc=0 (ok)
>> > shared-gfs2:0: migration-threshold=1000000
>> > + (92) start: rc=0 (ok)
>> > drbd_r0:0: migration-threshold=1000000
>> > + (23) promote: rc=0 (ok)
>> > ipmi-fencing-1: migration-threshold=1000000
>> > + (12) start: rc=0 (ok)
>> > + (13) monitor: interval=240000ms rc=0 (ok)
>> > virt: migration-threshold=1000000
>> > + (120) start: rc=0 (ok)
>> > + (121) monitor: interval=10000ms rc=0 (ok)
>> >
>> > Thanks for reading ...
>> > Bob Haxo
>> > bhaxo @ sgi.com
>> >
>> > _______________________________________________
>> > Pacemaker mailing list: Pacemaker [at] oss
>> > http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>> >
>> > Project Home: http://www.clusterlabs.org
>> > Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>> > Bugs: http://bugs.clusterlabs.org
>> >



andrew at beekhof

Aug 12, 2012, 7:09 PM

Post #5 of 6 (440 views)
Permalink
Re: GFS2 with Pacemaker on RHEL6.3 restarts with reboot [In reply to]

On Mon, Aug 13, 2012 at 11:27 AM, Bob Haxo <bhaxo [at] sgi> wrote:
> I had
> not connected the dots regarding use of cman being an *earlier* version
> of the docs (and software stack).

I've updated the listing page to highlight which version is
appropriate for cman :)



bhaxo at sgi

Aug 12, 2012, 8:25 PM

Post #6 of 6 (434 views)
Permalink
Re: GFS2 with Pacemaker on RHEL6.3 restarts with reboot [In reply to]

Thanks Andrew,

I'll check out the latest git and give building a try.

Cheers,
Bob Haxo



On Mon, 2012-08-13 at 12:09 +1000, Andrew Beekhof wrote:
> On Mon, Aug 13, 2012 at 11:27 AM, Bob Haxo <bhaxo @ sgi.com> wrote:
> > I had
> > not connected the dots regarding use of cman being an *earlier* version
> > of the docs (and software stack).
>
> I've updated the listing page to highlight which version is
> appropriate for cman :)


