
Mailing List Archive: Linux-HA: Users

Who takes care of the failover?

 

 



rasca at miamammausalinux

Nov 25, 2009, 12:00 AM

Who takes care of the failover?

Hi everybody,
I'm trying to do some tests with heartbeat and pacemaker (with
ubuntu-server 9.10, heartbeat 2.99.2+sles11r9-5ubuntu1 and
pacemaker-heartbeat 1.0.5+hg20090813-0ubuntu4). This is my configuration:

node $id="2ee6e25d-8bd6-42ba-a2c5-6bc98b6f4715" nas-1 \
attributes standby="off"
node $id="4ea6f84c-841a-4272-903c-e14ad4baefe4" nas-2 \
attributes standby="off"
primitive drbd0 ocf:linbit:drbd \
params drbd_resource="r0" \
op monitor interval="15s" \
meta target-role="Started"
primitive drbd1 ocf:linbit:drbd \
params drbd_resource="r1" \
op monitor interval="15s" \
meta target-role="Started"
primitive fs_hafs ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/hafs" fstype="ext3" \
meta target-role="Started"
primitive fs_mysql ocf:heartbeat:Filesystem \
params device="/dev/drbd1" directory="/mysql" fstype="ext3" \
meta target-role="Started"
primitive ip_hafs ocf:heartbeat:IPaddr2 \
params ip="192.168.1.80" nic="eth0"
primitive ip_mysql ocf:heartbeat:IPaddr2 \
params ip="192.168.1.81" nic="eth0"
primitive mysql-server lsb:mysql
group hafs fs_hafs ip_hafs
group mysql fs_mysql ip_mysql mysql-server
ms ms_drbd0 drbd0 \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
ms ms_drbd1 drbd1 \
meta master-max="1" master-node-max="1" clone-max="2" \
clone-node-max="1" notify="true"
location cli-prefer-hafs hafs \
rule $id="cli-prefer-rule-hafs" inf: #uname eq nas-1
location cli-prefer-mysql mysql \
rule $id="cli-prefer-rule-mysql" inf: #uname eq nas-2
colocation hafs_on_drbd inf: hafs ms_drbd0:Master
colocation mysql_on_drbd inf: mysql ms_drbd1:Master
order hafs_after_drbd inf: ms_drbd0:promote hafs:start
order mysql_after_drbd inf: ms_drbd1:promote mysql:start
property $id="cib-bootstrap-options" \
dc-version="1.0.5-3840e6b5a305ccb803d29b468556739e75532d56" \
cluster-infrastructure="Heartbeat" \
stonith-enabled="false" \
last-lrm-refresh="1259075046"

The two machines are connected to the LAN through eth0, and directly to
each other through two crossover cables bonded into an interface named
bond1 that works in balance-rr mode.
This is the heartbeat configuration:

crm on

use_logd no
debugfile /var/log/ha.debug
logfile /var/log/ha.log
logfacility local0

keepalive 2
deadtime 10
warntime 5
initdead 15

ucast eth0 192.168.1.79
ucast eth0 192.168.1.77
ucast bond1 10.0.0.1
ucast bond1 10.0.0.2

auto_failback on

node nas-1
node nas-2

ping_group lan gateway pdc

Everything works fine (I can move and migrate resources and put a node in
standby mode) until I force a node to be faulty, for example by removing
the Ethernet cable. What I can't understand is why, even though the log
shows the cable failure, the crm does not move any resource.
For example, if I'm in this situation:

Master/Slave Set: ms_drbd0
Masters: [ nas-1 ]
Slaves: [ nas-2 ]
Resource Group: hafs
fs_hafs (ocf::heartbeat:Filesystem): Started nas-1
ip_hafs (ocf::heartbeat:IPaddr2): Started nas-1
Master/Slave Set: ms_drbd1
Masters: [ nas-2 ]
Slaves: [ nas-1 ]
Resource Group: mysql
fs_mysql (ocf::heartbeat:Filesystem): Started nas-2
ip_mysql (ocf::heartbeat:IPaddr2): Started nas-2
mysql-server (lsb:mysql): Started nas-2

and then I remove the Ethernet cable from the nas-2 node, I see this
message in the log:

Nov 24 17:41:45 nas-1 heartbeat: [1489]: info: Link nas-2:eth0 dead.

but no other action is taken by the crm or heartbeat itself...

What's wrong with my reasoning? What am I missing?

Thanks for your help.

--
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
rasca [at] miamammausalinux
http://www.miamammausalinux.org

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


misch at multinet

Nov 25, 2009, 12:07 AM

Re: Who takes care of the failover?

On Wednesday, 25 November 2009 09:00:35, RaSca wrote:
> Hi everybody,
> I'm trying to do some tests with heartbeat and pacemaker (with
> ubuntu-server 9.10, heartbeat 2.99.2+sles11r9-5ubuntu1 and
> pacemaker-heartbeat 1.0.5+hg20090813-0ubuntu4) this is my configuration:
>
> [...]
>
> The two machines are connected to the LAN through eth0, and directly to
> each other through two crossover cables bonded into an interface named
> bond1 that works in balance-rr mode.
> This is the heartbeat configuration:
>
> crm on
>
> use_logd no
> debugfile /var/log/ha.debug
> logfile /var/log/ha.log
> logfacility local0
>
> keepalive 2
> deadtime 10
> warntime 5
> initdead 15
>
> ucast eth0 192.168.1.79
> ucast eth0 192.168.1.77
> ucast bond1 10.0.0.1
> ucast bond1 10.0.0.2

It doesn't make sense for a node to talk to itself. In a two-node cluster
you need to make the nodes talk to each other, so only one ucast line is needed.

> auto_failback on
>
> node nas-1
> node nas-2
>
> ping_group lan gateway pdc

using the ping... option in heartbeat is somehow deprecated. So make the
network tests a pingd resource.

> Everything works fine (I can move and migrate resources and put a node in
> standby mode) until I force a node to be faulty, for example by removing
> the Ethernet cable. What I can't understand is why, even though the log
> shows the cable failure, the crm does not move any resource.
> For example, if I'm in this situation:
>
> Master/Slave Set: ms_drbd0
> Masters: [ nas-1 ]
> Slaves: [ nas-2 ]
> Resource Group: hafs
> fs_hafs (ocf::heartbeat:Filesystem): Started nas-1
> ip_hafs (ocf::heartbeat:IPaddr2): Started nas-1
> Master/Slave Set: ms_drbd1
> Masters: [ nas-2 ]
> Slaves: [ nas-1 ]
> Resource Group: mysql
> fs_mysql (ocf::heartbeat:Filesystem): Started nas-2
> ip_mysql (ocf::heartbeat:IPaddr2): Started nas-2
> mysql-server (lsb:mysql): Started nas-2
>
> and then I remove the Ethernet cable from the nas-2 node, I see this
> message in the log:
>
> Nov 24 17:41:45 nas-1 heartbeat: [1489]: info: Link nas-2:eth0 dead.
>
> but no other action is taken by the crm or heartbeat itself...
>
> What's wrong with my reasoning? What am I missing?
>
> Thanks for your help.

How should pacemaker get the idea that the network is broken? And how did you
tell it to react to any failures? Please see:
http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node
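
A minimal sketch of how the approach on that page might be applied to the
configuration in this thread. It assumes the ocf:pacemaker:pingd agent is
installed and that the names gateway and pdc from the ping_group line
resolve on both nodes. The resource and constraint names (pingd,
pingd_clone, hafs_on_connected_node, mysql_on_connected_node) and the
multiplier, interval and timeout values are illustrative assumptions, not
taken from the original post:

```
# Assumption: "gateway" and "pdc" are resolvable ping targets on the LAN.
# Each pingd instance sets a node attribute (named "pingd" here) to
# multiplier * number of hosts in host_list that answer.
primitive pingd ocf:pacemaker:pingd \
    params name="pingd" host_list="gateway pdc" multiplier="100" \
    op monitor interval="15s" timeout="20s"
# Run one pingd instance on every node.
clone pingd_clone pingd \
    meta globally-unique="false"
# Forbid each group on a node whose pingd attribute is missing or 0,
# i.e. a node that cannot reach any of the ping targets.
location hafs_on_connected_node hafs \
    rule -inf: not_defined pingd or pingd lte 0
location mysql_on_connected_node mysql \
    rule -inf: not_defined pingd or pingd lte 0
```

With something like this in place, pulling the eth0 cable on nas-2 should
drop that node's pingd attribute to 0 while the nodes still see each other
over bond1, and the -inf rule should then make the cluster move the mysql
group to nas-1.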

--
Dr. Michael Schwartzkopff
MultiNET Services GmbH


rasca at miamammausalinux

Nov 25, 2009, 12:31 AM

Re: Who takes care of the failover?

On Wed 25 Nov 2009 09:07:09 CET, Michael Schwartzkopff wrote:
[...]
>> ucast eth0 192.168.1.79
>> ucast eth0 192.168.1.77
>> ucast bond1 10.0.0.1
>> ucast bond1 10.0.0.2
> It doesn't make sense for a node to talk to itself. In a two-node cluster
> you need to make the nodes talk to each other, so only one ucast line is needed.

The heartbeat documentation says: "Note that ucast directives which
go to the local machine are effectively ignored. This allows the ha.cf
directives on all machines to be identical." That is why I keep an
identical ha.cf file on both nodes.

[...]
> using the ping... option in heartbeat is somehow deprecated. So make the
> network tests a pingd resource.

What do you mean by "somehow"? Is it deprecated or not? Are there any
docs that explain this?

> How should pacemaker get the idea that the network is broken? And how did you
> tell it to react on any failures? Please see:
> http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node

Thank you so much, I'll take a look at this page.

Bye,

--
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
rasca [at] miamammausalinux
http://www.miamammausalinux.org



dejanmm at fastmail

Nov 25, 2009, 4:39 AM

Re: Who takes care of the failover?

Hi,

On Wed, Nov 25, 2009 at 09:31:27AM +0100, RaSca wrote:
> On Wed 25 Nov 2009 09:07:09 CET, Michael Schwartzkopff wrote:
> [...]
> >> ucast eth0 192.168.1.79
> >> ucast eth0 192.168.1.77
> >> ucast bond1 10.0.0.1
> >> ucast bond1 10.0.0.2
> > It doesn't make sense for a node to talk to itself. In a two-node cluster
> > you need to make the nodes talk to each other, so only one ucast line is needed.
>
> The heartbeat documentation says: "Note that ucast directives which
> go to the local machine are effectively ignored. This allows the ha.cf
> directives on all machines to be identical." That is why I keep an
> identical ha.cf file on both nodes.

Right.

> [...]
> > using the ping... option in heartbeat is somehow deprecated. So make the
> > network tests a pingd resource.
>
> What do you mean by "somehow"? Is it deprecated or not? Are there any
> docs that explain this?

I'm not sure whether it's deprecated, but the best practice is to
use the pingd resource.

Thanks,

Dejan

> > How should pacemaker get the idea that the network is broken? And how did you
> > tell it to react on any failures? Please see:
> > http://www.clusterlabs.org/wiki/Example_configurations#Failover_IP__Service_in_a_Group_running_on_a_connected_node
>
> Thank you so much, I'll take a look on this page.
