Mailing List Archive: Linux-HA: Users

heartbeat strange behavior

douglas.pasqua at gmail

Apr 30, 2012, 9:52 AM

Post #1 of 3
heartbeat strange behavior

Hi friends,

I created a Linux-HA solution using two nodes: node-a and node-b.

My /etc/ha.d/ha.cf:

use_logd yes
keepalive 1
deadtime 90
warntime 5
initdead 120
bcast eth6
node node-a
node node-b
crm off
auto_failback off
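For reference, the timing directives above are, as far as I know, all in seconds when given as plain numbers: `keepalive` is the interval between heartbeats, `warntime` is how long a peer may be silent before a "late heartbeat" warning, `deadtime` is how long before a silent peer is declared dead, and `initdead` is the deadtime used while heartbeat is first starting up. A minimal Python sketch, assuming plain integer-second values (ha.cf also accepts suffixes such as `ms`, which this parser does not handle) and using hypothetical helper names, parses this file and computes when a silent peer should be declared dead. Taking the shutdown notice at 00:02:57 in the log below as roughly the last time node-a was heard, `deadtime 90` gives 00:04:27, close to the "node node-a: is dead" line at 00:04:29.

```python
from datetime import datetime, timedelta

# The ha.cf from the post above.
HA_CF = """\
use_logd yes
keepalive 1
deadtime 90
warntime 5
initdead 120
bcast eth6
node node-a
node node-b
crm off
auto_failback off
"""

TIMING_KEYS = ("keepalive", "deadtime", "warntime", "initdead")


def parse_ha_cf(text):
    """Parse 'directive value...' lines into {directive: [values, ...]}."""
    directives = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding blanks
        if not line:
            continue
        key, *values = line.split()
        directives.setdefault(key, []).append(" ".join(values))
    return directives


def timing(directives):
    """Timing directives as integer seconds (assumes no 'ms'-style suffixes)."""
    return {k: int(directives[k][-1]) for k in TIMING_KEYS if k in directives}


def expected_dead_at(last_heard, deadtime):
    """When a peer last heard at `last_heard` should be declared dead."""
    return last_heard + timedelta(seconds=deadtime)


cfg = parse_ha_cf(HA_CF)
t = timing(cfg)
print(t)            # {'keepalive': 1, 'deadtime': 90, 'warntime': 5, 'initdead': 120}
print(cfg["node"])  # ['node-a', 'node-b']

last_heard = datetime.strptime("00:02:57", "%H:%M:%S")  # shutdown notice in the log
print(expected_dead_at(last_heard, t["deadtime"]).strftime("%H:%M:%S"))  # 00:04:27
```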

My /etc/ha.d/haresources:
node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 service1 service2 service3
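In heartbeat's v1 `haresources` format, the first field names the node that normally owns the group, and the remaining fields are resources started left to right and stopped right to left. The log below shows the real group as `asterisk sincronismo notificacao` in place of `service1 service2 service3`, and shows a bare address handled by `/etc/ha.d/resource.d/IPaddr` and a bare name by `/etc/init.d/<name>`. A simplified Python sketch of that mapping follows; the address test (any token containing `/` or `.`) is an assumption of this sketch, not heartbeat's exact rule, and the `x.x.x.x/24` placeholders are kept exactly as posted.

```python
def parse_haresources_line(line):
    """Split a haresources line into (preferred_node, [resources in start order])."""
    fields = line.split()
    return fields[0], fields[1:]


def resolve(resource):
    """Map a bare haresources token to (script, args), mirroring the paths in the log.

    Simplification: a token containing '/' or '.' is treated as an address for
    IPaddr; anything else is treated as an init script under /etc/init.d.
    """
    if "/" in resource or "." in resource:
        return "/etc/ha.d/resource.d/IPaddr", [resource]
    return f"/etc/init.d/{resource}", []


def commands(resources, action):
    """Command lines for starting (left to right) or stopping (right to left)."""
    ordered = resources if action == "start" else list(reversed(resources))
    return [" ".join([script, *args, action])
            for script, args in map(resolve, ordered)]


node, rscs = parse_haresources_line(
    "node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 asterisk sincronismo notificacao")
for cmd in commands(rscs, "stop"):
    print(cmd)
```

The first four printed lines follow the same order as the stop commands in the log; the other two address stops presumably follow after the point where the excerpt is cut.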

I booted the two nodes together: node-a became master and node-b became
slave. Then I rebooted node-a, and node-b became master. When node-a
came back from the reboot, it became slave, because *auto_failback is off*, I think.
Everything was as expected up to this point.

With node-a as slave, I decided to halt node-a (using the halt command).
Then heartbeat on node-b went into standby and my cluster went down; the virtual
IPs went down too. I expected node-b to stay up. Why did this happen?
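The sequence described above matches how `auto_failback` is meant to work: with it off, the resource group stays on whichever node holds it when the preferred node (the first field in `haresources`) rejoins; with it on, the group moves back to the preferred node. A toy two-node model in Python, with hypothetical function and event names, replays these events. Note that it predicts node-b keeps the group when node-a is halted, which is what was expected here, so the model alone does not explain the outage.

```python
def replay(events, nodes, preferred, auto_failback):
    """Return who owns the resource group after each event.

    Toy model: the preferred node starts as owner; a node going down hands
    the group to a surviving node; a node coming up takes the group only if
    nobody holds it, or if auto_failback is on and it is the preferred node.
    """
    up = set(nodes)
    owner = preferred
    history = [owner]
    for action, node in events:
        if action == "down":
            up.discard(node)
            if owner == node:
                survivors = [n for n in nodes if n in up]
                owner = survivors[0] if survivors else None
        elif action == "up":
            up.add(node)
            if owner is None or (auto_failback and node == preferred):
                owner = node
        history.append(owner)
    return history


events = [("down", "node-a"),  # node-a rebooted...
          ("up", "node-a"),    # ...and came back
          ("down", "node-a")]  # node-a halted
nodes = ["node-a", "node-b"]
print(replay(events, nodes, "node-a", auto_failback=False))
# ['node-a', 'node-b', 'node-b', 'node-b']
print(replay(events, nodes, "node-a", auto_failback=True))
# ['node-a', 'node-b', 'node-a', 'node-b']
```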

Some log lines from node-b:

Apr 30 00:02:57 node-b heartbeat: [3082]: info: Received shutdown notice from 'node-a'.
Apr 30 00:02:57 node-b heartbeat: [3082]: info: Resources being acquired from node-a.
Apr 30 00:02:57 node-b heartbeat: [4414]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL
Apr 30 00:02:57 node-b harc[4414]: [4428]: info: Running /etc/ha.d/rc.d/status status
Apr 30 00:02:57 node-b heartbeat: [4416]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node-b] to acquire.
Apr 30 00:02:57 node-b heartbeat: [3082]: debug: StartNextRemoteRscReq(): child count 1

Apr 30 00:02:58 node-b ResourceManager[4462]: [4657]: debug: /etc/init.d/asterisk start done. RC=1
Apr 30 00:02:58 node-b ResourceManager[4462]: [4658]: ERROR: Return code 1 from /etc/init.d/asterisk
Apr 30 00:02:58 node-b ResourceManager[4462]: [4659]: CRIT: Giving up resources due to failure of asterisk
Apr 30 00:02:58 node-b ResourceManager[4462]: [4660]: info: Releasing resource group: node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 asterisk sincronismo notificacao
Apr 30 00:02:58 node-b ResourceManager[4462]: [4670]: info: Running /etc/init.d/notificacao stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4671]: debug: Starting /etc/init.d/notificacao stop

Apr 30 00:02:58 node-b ResourceManager[4462]: [4694]: debug: /etc/init.d/notificacao stop done. RC=0
Apr 30 00:02:58 node-b ResourceManager[4462]: [4704]: info: Running /etc/init.d/sincronismo stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4705]: debug: Starting /etc/init.d/sincronismo stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4711]: debug: /etc/init.d/sincronismo stop done. RC=0
Apr 30 00:02:58 node-b ResourceManager[4462]: [4720]: info: Running /etc/init.d/asterisk stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4721]: debug: Starting /etc/init.d/asterisk stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4725]: debug: /etc/init.d/asterisk stop done. RC=0
Apr 30 00:02:58 node-b ResourceManager[4462]: [4741]: info: Running /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop
Apr 30 00:02:58 node-b ResourceManager[4462]: [4742]: debug: Starting /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop

Apr 30 00:03:29 node-b heartbeat: [3082]: info: node-b wants to go standby [foreign]
Apr 30 00:03:39 node-b heartbeat: [3082]: WARN: No reply to standby request. Standby request cancelled.
Apr 30 00:04:29 node-b heartbeat: [3082]: WARN: node node-a: is dead
Apr 30 00:04:29 node-b heartbeat: [3082]: info: Dead node node-a gave up resources.
Apr 30 00:04:29 node-b heartbeat: [3082]: info: Link node-a:eth6 dead.
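The log shows what ResourceManager does when one start fails: it logs CRIT, gives up the whole group, and runs `stop` on every resource in reverse order. That includes `notificacao` and `sincronismo`, which come after asterisk and get no start in this excerpt, and then the IPaddr resources, which is why the virtual IPs disappeared. Assuming, as the reply and follow-up below suggest, that node-b re-ran `start` on resources it was already running, a simplified Python model shows how one non-idempotent start can take down a group that was running fine. The runner is a fake standing in for the real scripts, and `vip1` to `vip3` are hypothetical labels for the three elided addresses.

```python
def take_group(resources, run):
    """Simplified model of the behavior in the log.

    Start resources left to right; on the first non-zero exit code, give up
    the whole group by running stop on every resource in reverse order.
    `run(resource, action)` stands in for the real scripts and returns an rc.
    """
    for rsc in resources:
        if run(rsc, "start") != 0:
            for r in reversed(resources):
                run(r, "stop")
            return False
    return True


def fake_runner(running, non_idempotent, calls):
    """Fake scripts: 'start' on a running resource fails only if non-idempotent."""
    def run(rsc, action):
        calls.append(f"{rsc} {action}")
        if action == "start":
            if rsc in running and rsc in non_idempotent:
                return 1
            running.add(rsc)
            return 0
        running.discard(rsc)  # stop succeeds whether or not it was running
        return 0
    return run


# Hypothetical labels for the three elided x.x.x.x/24 addresses.
group = ["vip1", "vip2", "vip3", "asterisk", "sincronismo", "notificacao"]

# node-b already holds the whole group when node-a's shutdown notice arrives.
running, calls = set(group), []
ok = take_group(group, fake_runner(running, {"asterisk"}, calls))
print(ok, sorted(running))  # False []
print(calls[4:])
# ['notificacao stop', 'sincronismo stop', 'asterisk stop',
#  'vip3 stop', 'vip2 stop', 'vip1 stop']

# Same situation with an idempotent asterisk script: nothing is lost.
running2 = set(group)
ok2 = take_group(group, fake_runner(running2, set(), []))
print(ok2, running2 == set(group))  # True True
```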


Best Regards,
Douglas V. Pasqua
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


lars.ellenberg at linbit

May 2, 2012, 5:25 AM

Post #2 of 3
Re: heartbeat strange behavior

On Mon, Apr 30, 2012 at 01:52:05PM -0300, Douglas Pasqua wrote:
> Apr 30 00:02:58 node-b ResourceManager[4462]: [4657]: debug: /etc/init.d/asterisk start done. RC=1
> Apr 30 00:02:58 node-b ResourceManager[4462]: [4658]: ERROR: Return code 1 from /etc/init.d/asterisk
> Apr 30 00:02:58 node-b ResourceManager[4462]: [4659]: CRIT: Giving up resources due to failure of asterisk

It happened because of the error above when starting asterisk. Maybe your
asterisk init script is simply not idempotent. Maybe it is broken, or maybe
there really was some problem starting asterisk.
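Idempotent here means that repeating an action is harmless: under the LSB init-script conventions, `start` on a service that is already running and `stop` on one that is already stopped should both exit 0. A minimal Python sketch of that logic, assuming a pidfile-based service and using hypothetical `launch`/`terminate` callbacks in place of real daemon handling, is below; an actual init script would express the same checks in shell.

```python
import os
import tempfile


def is_running(pidfile):
    """True if `pidfile` holds the PID of a live process."""
    try:
        with open(pidfile) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False              # no pidfile, or unreadable contents
    if pid <= 0:
        return False              # never signal a process group by accident
    try:
        os.kill(pid, 0)           # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False              # stale pidfile
    except PermissionError:
        return True               # process exists but belongs to another user
    return True


def start(pidfile, launch):
    """Idempotent start: exit status 0 whether or not the service was running."""
    if is_running(pidfile):
        return 0
    pid = launch()                # hypothetical: spawns the daemon, returns its PID
    with open(pidfile, "w") as f:
        f.write(f"{pid}\n")
    return 0


def stop(pidfile, terminate):
    """Idempotent stop: exit status 0 whether or not the service was running."""
    if is_running(pidfile):
        with open(pidfile) as f:
            terminate(int(f.read().strip()))  # hypothetical: e.g. send SIGTERM, wait
    try:
        os.remove(pidfile)        # also clears a stale pidfile
    except FileNotFoundError:
        pass
    return 0


# Demo: the current process stands in for the daemon so its PID is always alive.
pidfile = os.path.join(tempfile.mkdtemp(), "demo.pid")
launched, terminated = [], []


def fake_launch():
    launched.append(os.getpid())
    return os.getpid()


rcs = [start(pidfile, fake_launch), start(pidfile, fake_launch),
       stop(pidfile, terminated.append), stop(pidfile, terminated.append)]
print(rcs, len(launched), len(terminated))  # [0, 0, 0, 0] 1 1
```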


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

douglas.pasqua at gmail

May 7, 2012, 5:39 AM

Post #3 of 3
Re: heartbeat strange behavior

Thanks, Lars.

Problem solved. I changed the asterisk init script to be idempotent.
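One way to check such a fix before handing the script back to heartbeat is to call each action twice and confirm every call exits 0. The Python sketch below does that; the helper names are hypothetical, and the demo runs two throwaway shell scripts (one that fails on a second `start`, mimicking the RC=1 in the log, and one that does not) rather than a real service. Pointed at something like `["/etc/init.d/asterisk"]`, it would really start and stop that service, so it belongs on a test node.

```python
import os
import subprocess
import tempfile

ACTIONS = ("start", "start", "stop", "stop")


def exit_status(cmd, action):
    """Run `cmd action` (cmd is a list, e.g. ['/etc/init.d/foo']) and return its rc."""
    return subprocess.run([*cmd, action], stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL).returncode


def idempotency_failures(cmd):
    """Run start, start, stop, stop; return the (action, rc) pairs that were non-zero."""
    results = [(action, exit_status(cmd, action)) for action in ACTIONS]
    return [(action, rc) for action, rc in results if rc != 0]


# Throwaway init-style script; @ALREADY@ decides what 'start' does when running.
TEMPLATE = """#!/bin/sh
STATE="$(dirname "$0")/running"
case "$1" in
  start) [ -e "$STATE" ] && @ALREADY@; touch "$STATE" ;;
  stop)  rm -f "$STATE" ;;
esac
exit 0
"""


def fake_script(already_running_action):
    """Write a fake script into its own temp dir so state files never collide."""
    path = os.path.join(tempfile.mkdtemp(), "fake-init")
    with open(path, "w") as f:
        f.write(TEMPLATE.replace("@ALREADY@", already_running_action))
    return path


broken = fake_script("exit 1")  # second start fails, like the RC=1 in the log
fixed = fake_script("exit 0")   # second start is a harmless no-op
print(idempotency_failures(["sh", broken]))  # [('start', 1)]
print(idempotency_failures(["sh", fixed]))   # []
```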

Regards,
Douglas
