
Mailing List Archive: Linux-HA: Users

Moving Resources Due to Failure

 

 



mohamed.s at alcatel-lucent

Apr 13, 2012, 3:53 AM

Post #1 of 2
Moving Resources Due to Failure

Hi,
The Pacemaker_Explained.pdf document says that

"A setting of migration-threshold=2 and failure-timeout=60s would cause the
resource to move to a new node after 2 failures, and allow it to move back
(depending on the stickiness and constraint scores) after one minute."

Can you please help me understand what will happen in the following
scenarios in a two-node active/passive configuration?

1 - If a resource fails twice within 60s, it will move to the other
node.
This is clear.

2 - If a resource fails once and there is no further failure within 60s, will
Pacemaker reset the failcount of that resource, so that failures are
tracked freshly?

The failcounts are not reset if migration-threshold is not reached
within the failure-timeout period. Is that a bug in pacemaker-1.0.5-4.1?

Thanks,
Raffi

Note: Sorry for posting earlier with the wrong subject line.

> > -----Original Message-----
> > From: linux-ha-bounces [at] lists [mailto:linux-ha-
> > bounces [at] lists] On Behalf Of Andreas Kurz
> > Sent: Friday, April 13, 2012 2:49 PM
> > To: linux-ha [at] lists
> > Subject: Re: [Linux-HA] problem with pind
> >
> > On 04/12/2012 02:59 PM, Trujillo Carmona, Antonio wrote:
> > >
> > > I'm trying to configure a cluster and I have a problem with pingd.
> > > my config is
> > > crm(live)configure# show
> > > node proxy-00
> > > node proxy-01
> > > primitive ip-segura ocf:heartbeat:IPaddr2 \
> > > params ip="10.104.16.123" nic="lan" cidr_netmask="19" \
> > > op monitor interval="10" \
> > > meta target-role="Started"
> > > primitive pingd ocf:pacemaker:pingd \
> >
> > use ocf:pacemaker:ping
> >
> > > params host_list="10.104.16.157" \
> >
> > and you have to define a monitor operation.
> >
> > Without any constraints to let the cluster react to connectivity changes,
> > the ping resource is useless ... this may help:
> >
> > http://www.hastexo.com/resources/hints-and-kinks/network-connectivity-
> > check-pacemaker
> >
> > Regards,
> > Andreas
> >
> > --
> > Need help with Pacemaker?
> > http://www.hastexo.com/now
> >
> > > meta target-role="Started"
> > > property $id="cib-bootstrap-options" \
> > > dc-version="1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c" \
> > > cluster-infrastructure="openais" \
> > > stonith-enabled="false" \
> > > no-quorum-policy="ignore" \
> > > expected-quorum-votes="2"
> > >
> > > crm(live)# status
> > > ============
> > > Last updated: Thu Apr 12 14:54:21 2012
> > > Last change: Thu Apr 12 14:40:00 2012
> > > Stack: openais
> > > Current DC: proxy-00 - partition WITHOUT quorum
> > > Version: 1.1.6-9971ebba4494012a93c03b40a2c58ec0eb60f50c
> > > 2 Nodes configured, 2 expected votes
> > > 2 Resources configured.
> > > ============
> > >
> > > Online: [ proxy-00 ]
> > > OFFLINE: [ proxy-01 ]
> > >
> > > ip-segura (ocf::heartbeat:IPaddr2): Started proxy-00
> > >
> > > Failed actions:
> > > pingd:0_monitor_0 (node=proxy-00, call=5, rc=2, status=complete):
> > > invalid parameter
> > > pingd_monitor_0 (node=proxy-00, call=8, rc=2, status=complete):
> > > invalid parameter
> > >
> > > crm(live)resource# start pingd
> > > crm(live)resource# status
> > > ip-segura (ocf::heartbeat:IPaddr2) Started
> > > pingd (ocf::pacemaker:pingd) Stopped
> > >
> > > and in the system log I got:
> > >
> > > Apr 12 14:55:18 proxy-00 crm_resource: [27941]: ERROR: unpack_rsc_op:
> > > Hard error - pingd:0_last_failure_0 failed with rc=2: Preventing
> pingd:0
> > > from re-starting on proxy-00
> > > Apr 12 14:55:18 proxy-00 crm_resource: [27941]: ERROR: unpack_rsc_op:
> > > Hard error - pingd_last_failure_0 failed with rc=2: Preventing pingd
> > > from re-starting on proxy-00
> > >
> > > I have stopped node 2 to reduce problems.
> > >
> > > I can't find any reference to this error. Can you help me, please?
> > >
> > >
> > >
> > >
> >
> >
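[Putting Andreas's suggestions together, a minimal sketch in crm shell syntax of what the corrected configuration might look like. The clone name, constraint id, and multiplier value are illustrative, not from the thread; ocf:pacemaker:ping writes a node attribute named "pingd" by default, which the location rule reads.]

```shell
# Sketch only: ocf:pacemaker:ping (not ocf:pacemaker:pingd),
# with a monitor operation, cloned on all nodes, plus a
# constraint so ip-segura only runs where connectivity exists.
primitive ping ocf:pacemaker:ping \
    params host_list="10.104.16.157" multiplier="100" \
    op monitor interval="10s"
clone ping-clone ping
location ip-on-connected-node ip-segura \
    rule -inf: not_defined pingd or pingd lte 0
```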

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Apr 16, 2012, 5:28 AM

Post #2 of 2
Re: Moving Resources Due to Failure

Hi,

On Fri, Apr 13, 2012 at 04:23:38PM +0530, S, MOHAMED (MOHAMED)** CTR ** wrote:
> Hi,
> The Pacemaker_Explained.pdf document says that
>
> " setting of migration-threshold=2 and failure-timeout=60s would cause the
> resource to move to a new node after 2 failures, and allow it to move back
> (depending on the stickiness and constraint scores) after one minute."
>
> Can you please help me understand what will happen on the following
> scenarios in 2 node active passive configuration?
>
> 1 - If one resource failed twice within 60s, it will move to the other
> node.
> This is clear to understand.
>
> 2 - If one resource failed once and there is no failure within 60s, will
> the pacemaker reset the failcounts of that resource, so that the
> failcounts are tracked freshly?

Yes.

> The failcounts are not reset if the migration-threshold didn't occur
> within the failure-timeout period. Is that a bug in pacemaker-1.0.5-4.1?

Pacemaker 1.0.x versions don't reset failcounts in the CIB, but
the failcounts effectively expire and are no longer taken into
account. You can verify this, for instance in the example you
quoted, by making the resource fail twice on both nodes within 60s.
Note that it would still take the next PE run (see
cluster-recheck-interval) for the resource to be started again.
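[For reference, a minimal sketch of the meta attributes under discussion, in crm shell syntax, plus the crm_failcount commands for inspecting and clearing the counter by hand. The resource name my-rsc and node name node-a are illustrative only.]

```shell
# Sketch: migrate after 2 failures; let a failure expire after 60s
primitive my-rsc ocf:heartbeat:Dummy \
    op monitor interval="30s" \
    meta migration-threshold="2" failure-timeout="60s"

# Inspect the current failcount for my-rsc on node-a
crm_failcount -G -r my-rsc -N node-a
# Clear it manually instead of waiting for failure-timeout
crm_failcount -D -r my-rsc -N node-a
```

These commands require a running cluster, so the output is not shown here.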

Thanks,

Dejan

