
Mailing List Archive: Linux-HA: Users

Heartbeat 3 / Openais / ldirectord


uweiss at icrcom

Sep 22, 2009, 3:06 AM

Post #1 of 7
Heartbeat 3 / Openais / ldirectord

Hello

I have been trying to get this working for two days now, but ldirectord
somehow does not work. I had no problems with it on the older Heartbeat 2.
I hope you can give me a hint.


My setup:
- CentOS 5.3
- HA packages from
  "http://download.opensuse.org/repositories/server:/ha-clustering/RHEL_$releasever/":
  - heartbeat-3.0.0-33.2
  - openais-0.80.5-15.1
  - libopenais2-0.80.5-15.1
  - pacemaker-1.0.5-4.1
  - pacemaker-libs-1.0.5-4.1


The goal:
- ldirectord with failover to second node


The current config looks like this:
====================================
crm(live)# configure show
node ovz01.icrcom.ch
node ovz04.icrcom.ch
primitive failover-ip ocf:heartbeat:IPaddr \
params ip="172.30.101.110" nic="eth0" netmask="24" broadcast="172.30.101.255" \
op monitor interval="5s" timeout="15s"
primitive ldirectord_1 ocf:heartbeat:ldirectord \
params 1="ldirectord.cf" target_role="started" \
op monitor interval="120s" role="Started" timeout="60s" start_delay="0" disabled="false"
property $id="cib-bootstrap-options" \
dc-version="1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7" \
cluster-infrastructure="Heartbeat" \
symetric-cluster="true" \
stonith-enabled="false" \
no-quorum-policy="stop" \
default-resource-stickiness="0" \
default-resource-failure-stickiness="0" \
stop-orphan-actions="true" \
stop-orphan-resources="true" \
remove-after-stop="false" \
short-resource-names="true" \
transition-idle-timeout="5min" \
default-action-timeout="15s" \
is-managed-default="true" \
expected-quorum-votes="2" \
last-lrm-refresh="1253609925"
====================================


The IP looks good, but not ldirectord:
====================================
# crm_mon --one-shot

============
Last updated: Tue Sep 22 11:57:06 2009
Stack: openais
Current DC: ovz04.icrcom.ch - partition with quorum
Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ ovz04.icrcom.ch ovz01.icrcom.ch ]

failover-ip (ocf::heartbeat:IPaddr): Started ovz04.icrcom.ch
ldirectord_1 (ocf::heartbeat:ldirectord) Started [ ovz04.icrcom.ch ovz01.icrcom.ch ]

Failed actions:
ldirectord_1_monitor_0 (node=ovz04.icrcom.ch, call=3, rc=1, status=complete): unknown error
ldirectord_1_stop_0 (node=ovz04.icrcom.ch, call=4, rc=1, status=complete): unknown error
ldirectord_1_monitor_0 (node=ovz01.icrcom.ch, call=3, rc=1, status=complete): unknown error
====================================


From the logs:
====================================
Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status: Node ovz04.icrcom.ch is online
Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_monitor_0 on ovz04.icrcom.ch returned 1 (unknown error) instead of the expected value: 7 (not running)
Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_monitor_0 on ovz04.icrcom.ch: unknown error
Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_stop_0 on ovz04.icrcom.ch returned 1 (unknown error) instead of the expected value: 0 (ok)
Sep 22 11:56:40 ovz04 crmd: [12686]: info: process_lrm_event: LRM operation failover-ip_start_0 (call=5, rc=0, cib-update=60, confirmed=true) complete ok
Sep 22 11:56:40 ovz04 crmd: [12686]: info: match_graph_event: Action failover-ip_start_0 (6) confirmed on ovz04.icrcom.ch (rc=0)
Sep 22 11:56:40 ovz04 crmd: [12686]: info: run_graph: ====================================================
Sep 22 11:56:40 ovz04 crmd: [12686]: notice: run_graph: Transition 6 (Complete=2, Pending=0, Fired=0, Skipped=1, Incomplete=0, Source=/var/lib/pengine/pe-warn-336.bz2): Stopped
Sep 22 11:56:40 ovz04 crmd: [12686]: info: te_graph_trigger: Transition 6 is now complete
Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_stop_0 on ovz04.icrcom.ch: unknown error
Sep 22 11:56:40 ovz04 pengine: [12685]: info: native_add_running: resource ldirectord_1 isnt managed
Sep 22 11:56:40 ovz04 pengine: [12685]: info: determine_online_status: Node ovz01.icrcom.ch is online
Sep 22 11:56:40 ovz04 pengine: [12685]: info: unpack_rsc_op: ldirectord_1_monitor_0 on ovz01.icrcom.ch returned 1 (unknown error) instead of the expected value: 7 (not running)
Sep 22 11:56:40 ovz04 pengine: [12685]: WARN: unpack_rsc_op: Processing failed op ldirectord_1_monitor_0 on ovz01.icrcom.ch: unknown error
====================================


Thank you
Urs

_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dejanmm at fastmail

Sep 22, 2009, 3:57 AM

Post #2 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

Hi,

On Tue, Sep 22, 2009 at 12:06:48PM +0200, Urs Weiss wrote:
> (...)

Look for 'lrmd.*ldirector' in the logs on all nodes where it failed.
That should show you what's happening with the resource.
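Dejan's search can be wrapped in a tiny helper and run on each node where the operation failed (ovz01 and ovz04 here). This is a sketch only: the function name `lrmd_ldirector_lines` is hypothetical, and the default log file `/var/log/messages` is an assumption (the usual syslog target on CentOS); pass another file if your cluster logs elsewhere.

```shell
#!/bin/sh
# Print the lrmd log lines that mention the ldirectord resource agent.
# ASSUMPTION: cluster messages go to /var/log/messages unless a
# different log file is given as the first argument.
lrmd_ldirector_lines() {
    grep -E 'lrmd.*ldirector' "${1:-/var/log/messages}"
}
```

For example, `lrmd_ldirector_lines` on each node, or `lrmd_ldirector_lines /path/to/your/ha.log` if the cluster writes its own log file.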

Thanks,

Dejan



uweiss at icrcom

Sep 22, 2009, 4:26 AM

Post #3 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

Hello Dejan

On Tue, 2009-09-22 at 12:57 +0200, Dejan Muhamedagic wrote:
> (...)
> Look for 'lrmd.*ldirector' on all nodes where it failed. That
> should show you what's happening with the resource.
>

Ouch! The log I didn't look at (ldirectord.log)... ldirectord did not
find ldirectord.cf. Oh man, thank you. Simple error, stupid admin...
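The thread does not show the exact fix. One plausible culprit in the posted config is `params 1="ldirectord.cf"`: a relative file name passed as a positional argument, which looks like a leftover from a Heartbeat 2 style setup, plus `target_role` set as a parameter instead of a meta attribute. The sketch below assumes the OCF agent lives under `/usr/lib/ocf` and accepts a `configfile` parameter; check the agent's metadata before relying on either, and treat `/etc/ha.d/ldirectord.cf` as an assumed location.

```shell
# Sketch: list the parameters the ldirectord OCF agent accepts.
# ASSUMPTION: the agent is installed under /usr/lib/ocf.
OCF_ROOT=/usr/lib/ocf /usr/lib/ocf/resource.d/heartbeat/ldirectord meta-data \
    | grep '<parameter name='

# ASSUMPTION: the metadata lists a "configfile" parameter. If so, edit
# the primitive and replace
#     params 1="ldirectord.cf" target_role="started"
# with an absolute path, moving target-role to the meta attributes:
#     params configfile="/etc/ha.d/ldirectord.cf" \
#     meta target-role="Started"
crm configure edit ldirectord_1

# Clear the recorded monitor/stop failures so the cluster re-probes
# the resource.
crm resource cleanup ldirectord_1
```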

But one more (simple) question:
The IP and ldirectord are running now, but not on the same node,
which of course they have to be in this setup. How can I make them
always run on the same node?

Thank you
Urs




misch at multinet

Sep 22, 2009, 4:31 AM

Post #4 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

On Tuesday, 22 September 2009 13:26:23, Urs Weiss wrote:
> Hello Dejan
(...)

> But one more (simple) question:
> The IP and ldirectord are running now, but not on the same node,
> which of course they have to be in this setup. How can I make them
> always run on the same node?

A colocation constraint.
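A minimal sketch of such a constraint with the crm shell, using the resource names from this thread. The constraint IDs `ldirectord_with_ip` and `ip_before_ldirectord` are made up for illustration, and the order constraint is an optional extra that the thread does not mention.

```shell
# "colocation <id> <score>: <rsc> <with-rsc>": the first resource is
# placed on the node where the second one runs, so this keeps
# ldirectord_1 wherever failover-ip is active.
crm configure colocation ldirectord_with_ip inf: ldirectord_1 failover-ip

# Optional: start the IP before ldirectord (and stop it after).
crm configure order ip_before_ldirectord inf: failover-ip ldirectord_1
```

Putting both resources into a group (for example `crm configure group lvs_group failover-ip ldirectord_1`, with `lvs_group` again a hypothetical name) would give the same colocation and ordering in a single object.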

Have you enabled the sync function of the ipvs module? This syncs the states
of ipvs to the other node. All downloads will continue after a failover.

Greetings,
--
Dr. Michael Schwartzkopff
MultiNET Services GmbH
Address: Bretonischer Ring 7; 85630 Grasbrunn; Germany
Tel: +49 - 89 - 45 69 11 0
Fax: +49 - 89 - 45 69 11 21
mob: +49 - 174 - 343 28 75

mail: misch [at] multinet
web: www.multinet.de

Registered office: 85630 Grasbrunn
Register court: Amtsgericht München HRB 114375
Managing directors: Günter Jurgeneit, Hubert Martens

---

PGP Fingerprint: F919 3919 FF12 ED5A 2801 DEA6 AA77 57A4 EDD8 979B
Skype: misch42


Darren.Mansell at opengi

Sep 22, 2009, 4:41 AM

Post #5 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

> (...)
>
> A colocation constraint.
>
> Have you enabled the sync function of the ipvs module? This syncs the states
> of ipvs to the other node. All downloads will continue after a failover.
>
> Greetings,
> --
> Dr. Michael Schwartzkopff

Can you give more details on how to do this? Does it need to go in
ldirectord.cf, or in the cluster resource at all?

Thanks
Darren



uweiss at icrcom

Sep 22, 2009, 5:18 AM

Post #6 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

> > But one more (simple) question:
> > The IP and ldirectord are running now, but not on the same node,
> > which of course they have to be in this setup. How can I make them
> > always run on the same node?
>
> A colocation constraint.
>
> Have you enabled the sync function of the ipvs module? This syncs the states
> of ipvs to the other node. All downloads will continue after a failover.
>

Wow, that's cool. I didn't know that.

I found the corresponding ipvsadm options:
# ipvsadm --start-daemon master
# ipvsadm --start-daemon backup

Are they persistent, or do I have to add them to a file somewhere?
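The thread leaves the persistence question open. The sketch below assumes (not confirmed here) that the sync daemons are runtime kernel state that has to be started again after every reboot, for example from an init script. The function name `start_ipvs_sync` and the `eth0` default are hypothetical choices for this setup; `--mcast-interface` selects the interface that carries the sync multicast traffic.

```shell
#!/bin/sh
# Sketch: start the IPVS connection-sync daemon on one director.
# Use "master" on the active node and "backup" on the standby.
# ASSUMPTIONS: sync traffic goes over eth0 by default, and this must be
# rerun after each reboot. Set IPVSADM=echo for a dry run.
IPVSADM=${IPVSADM:-ipvsadm}

start_ipvs_sync() {
    role=$1              # master | backup
    iface=${2:-eth0}     # interface used for the sync multicast
    case "$role" in
        master|backup) ;;
        *) echo "usage: start_ipvs_sync master|backup [iface]" >&2; return 1 ;;
    esac
    "$IPVSADM" --start-daemon "$role" --mcast-interface "$iface"
}
```

For example, `start_ipvs_sync master` on the active director and `start_ipvs_sync backup` on the other one; `ipvsadm -L --daemon` should then list the running sync daemon.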


To create the colocation constraint, I did this in the crm configure
shell (for my example):
# colocation ldirector_on_ipaddr inf: failover-ip ldirectord_1
# commit

It seems to work...

Thanks
Urs



uweiss at icrcom

Sep 22, 2009, 5:21 AM

Post #7 of 7
Re: Heartbeat 3 / Openais / ldirectord [In reply to]

> > (...)
> > How can I make them always run on the same node?
>
> A colocation constraint.
>


Oh, totally forgot:
The Novell docs are a good resource to find such stuff:
http://www.novell.com/documentation/sle_ha/book_sleha/?page=/documentation/sle_ha/book_sleha/data/sec_ha_manual_config_constraints.html

Urs
