
Mailing List Archive: Linux-HA: Pacemaker

Re: problem configuring DRBD resource in "Floating Peers" mode

 

 



DBOYN at POSTPATH

May 27, 2009, 8:25 PM

Post #1 of 2 (373 views)
Re: problem configuring DRBD resource in "Floating Peers" mode

Thank you, Raoul, for sharing these important changes!

Now my "crm_mon -r -V -i 2" output looks better:
"
============
Last updated: Wed May 27 20:14:45 2009
Current DC: c001mlb_node01b - partition with quorum
Version: 1.0.3-b133b3f19797c00f9189f4b66b513963f9d25db9
26 Nodes configured, 2 expected votes
3 Resources configured.
============

Online: [ c001mlb_node01a c001mlb_node01b ]
OFFLINE: [ c001mlb_node02a c001mlb_node03a c001mlb_node04a c001mlb_node05a c001mlb_node06a c001mlb_node07a c001mlb_node08a c001mlb_node09a c001mlb_node10a c001mlb_node11a c001mlb_node12a c001mlb_node13a c001mlb_node02b c001mlb_node03b c001mlb_node04b c001mlb_node05b c001mlb_node06b c001mlb_node07b c001mlb_node08b c001mlb_node09b c001mlb_node10b c001mlb_node11b c001mlb_node12b c001mlb_node13b ]

Full list of resources:

ip-c001drbd01a  (ocf::heartbeat:IPaddr2):       Started c001mlb_node01a
ip-c001drbd01b  (ocf::heartbeat:IPaddr2):       Started c001mlb_node01b
Master/Slave Set: ms-drbd0
        Stopped: [ drbd0:0 drbd0:1 ]

Failed actions:
    drbd0:0_start_0 (node=c001mlb_node01a, call=6, rc=1, status=complete): unknown error
    drbd0:0_start_0 (node=c001mlb_node01b, call=6, rc=1, status=complete): unknown error
"

I tried changing "clone-overrides-hostname" to "no" and setting drbd.conf to the correct "uname -n" values, and it was working!
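
For reference, a minimal sketch of that change as it would appear in the CIB (reusing the instance-attribute ids from the configuration quoted further down; illustrative only):

  <instance_attributes id="ia-drbd0">
    <nvpair id="ia-drbd0-1" name="drbd_resource" value="drbd0"/>
    <nvpair id="ia-drbd0-2" name="clone_overrides_hostname" value="no"/>
  </instance_attributes>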

Has anybody succeeded recently in configuring floating peers?
Also, Andrew Beekhof [andrew [at] beekhof] told me last night that "Its no longer possible to create colocation constraints for a specific
clone instance", which makes it impossible to enforce limitations like:
<constraints>
  <rsc_colocation id="colocate-drbd0-ip-c001drbd01a" rsc="ip-c001drbd01a" with-rsc="ms-drbd0:0" score="INFINITY"/>
  <rsc_colocation id="colocate-drbd0-ip-c001drbd01b" rsc="ip-c001drbd01b" with-rsc="ms-drbd0:1" score="INFINITY"/>
</constraints>

In my view this makes it impossible to ensure correct resource colocation between the two sites in the context of DRBD - am I missing something?
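
For comparison, colocating by role rather than by instance number should still be expressible in pacemaker 1.0 - a sketch with a hypothetical id, which however does not distinguish instance :0 from :1, i.e. exactly the limitation described above:

  <rsc_colocation id="colocate-ip-a-with-drbd0-master" rsc="ip-c001drbd01a"
                  with-rsc="ms-drbd0" with-rsc-role="Master" score="INFINITY"/>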

From: Raoul Bhatia [IPAX] [mailto:r.bhatia [at] ipax]

Sent: Wednesday, May 27, 2009 1:33 AM

To: pacemaker [at] clusterlabs

Cc: pacemaker [at] clusterlabs

Subject: Re: [Pacemaker] problem configuring DRBD resource in Floating Peers mode



Dimitar Boyn wrote:
>Hi,
>
>My ultimate goal is to run a bunch of servers/nodes that shall be able
>to handle a bunch of floating drbd peers.

the last time i tried to use floating peers, it did not work out properly - especially when using outdating/.. mechanisms.

i recall a discussion last year [1] but cannot remember the conclusion.

i saw your colocation request. did you check/resolve that?

i'll have a look anyways ...


>c001mlb_node01a:root >cat /etc/drbd.conf
>
># /usr/share/doc/drbd82/drbd.conf
looks fine.
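
for context, a rough sketch of what a drbd.conf for the clone_overrides_hostname approach usually looks like (the real file is in the attachment, not quoted here; device, disk and addresses below are placeholders):

  resource drbd0 {
    protocol C;

    # with clone_overrides_hostname=yes the RA hands drbdadm the fake
    # hostnames "node_0"/"node_1" instead of the real uname -n, so the
    # peers are matched by these sections no matter which node runs them
    on node_0 {
      device    /dev/drbd0;
      disk      /dev/sdb1;            # placeholder
      address   192.168.80.213:7788;  # placeholder
      meta-disk internal;
    }
    on node_1 {
      device    /dev/drbd0;
      disk      /dev/sdb1;            # placeholder
      address   192.168.80.214:7788;  # placeholder
      meta-disk internal;
    }
  }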

>c001mlb_node01a:root >cibadmin -Q
*snip*
>
>       <master id="ms-drbd0">
>         <meta_attributes id="ma-ms-drbd0">
>           <nvpair id="ma-ms-drbd0-1" name="clone_max" value="2"/>
all underscores have been replaced by dashes, i.e. clone_max is now clone-max. please recheck your configuration to comply with pacemaker 1.0 and above.

imho, that is the reason for having 26 clone instances of each drbd
device: drbd0:0 to drbd0:25.

moreover, and without looking further down, i would say that this might fix some of your issues.
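
for illustration, a sketch of the same meta attributes with the pacemaker-1.0 spellings (dashes instead of underscores; values mirror the quoted configuration):

  <meta_attributes id="ma-ms-drbd0">
    <nvpair id="ma-ms-drbd0-1" name="clone-max"       value="2"/>
    <nvpair id="ma-ms-drbd0-2" name="clone-node-max"  value="1"/>
    <nvpair id="ma-ms-drbd0-3" name="master-max"      value="1"/>
    <nvpair id="ma-ms-drbd0-4" name="master-node-max" value="1"/>
    <nvpair id="ma-ms-drbd0-5" name="notify"          value="yes"/>
    <nvpair id="ma-ms-drbd0-6" name="globally-unique" value="true"/>
    <nvpair id="ma-ms-drbd0-7" name="target-role"     value="Started"/>
  </meta_attributes>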

>
>           <nvpair id="ma-ms-drbd0-2" name="clone_node_max" value="1"/>
>           <nvpair id="ma-ms-drbd0-3" name="master_max" value="1"/>
>           <nvpair id="ma-ms-drbd0-4" name="master_node_max" value="1"/>
>           <nvpair id="ma-ms-drbd0-5" name="notify" value="yes"/>
>           <nvpair id="ma-ms-drbd0-6" name="globally_unique" value="true"/>
>           <nvpair id="ma-ms-drbd0-7" name="target_role" value="started"/>
>         </meta_attributes>
>
>         <primitive class="ocf" provider="heartbeat" type="drbd" id="drbd0">
>           <instance_attributes id="ia-drbd0">
>             <nvpair id="ia-drbd0-1" name="drbd_resource" value="drbd0"/>
>             <nvpair id="ia-drbd0-2" name="clone_overrides_hostname" value="yes"/>
>           </instance_attributes>
>

>     <constraints>
>       <rsc_location id="location-ip-c001drbd01a" rsc="ip-c001drbd01a">
>         <rule id="ip-c001drbd01a-rule" score="-INFINITY">
>           <expression id="exp-ip-c001drbd01a-rule" value="b" attribute="site" operation="eq"/>
>         </rule>
>       </rsc_location>
>
>       <rsc_location id="location-ip-c001drbd01b" rsc="ip-c001drbd01b">
>         <rule id="ip-c001drbd01b-rule" score="-INFINITY">
>           <expression id="exp-ip-c001drbd01b-rule" value="a" attribute="site" operation="eq"/>
>         </rule>
>       </rsc_location>
>
>       <rsc_location id="drbd0-master-1" rsc="ms-drbd0">
>         <rule id="drbd0-master-on-c001mlb_node01a" role="master" score="100">
>           <expression id="expression-1" attribute="#uname" operation="eq" value="c001mlb_node01a"/>
>         </rule>
>       </rsc_location>
>
>       <rsc_order id="order-drbd0-after-ip-c001drbd01a" first="ip-c001drbd01a" then="ms-drbd0" score="1"/>
>       <rsc_order id="order-drbd0-after-ip-c001drbd01b" first="ip-c001drbd01b" then="ms-drbd0" score="1"/>
>
>       <rsc_colocation rsc="ip-c001drbd01a" score="INFINITY" id="colocate-drbd0-ip-c001drbd01a" with-rsc="ms-drbd0"/>
>       <rsc_colocation rsc="ip-c001drbd01b" score="INFINITY" id="colocate-drbd0-ip-c001drbd01b" with-rsc="ms-drbd0"/>
>     </constraints>
>
>   </configuration>
*snip*

>ip-c001drbd01a  (ocf::heartbeat:IPaddr2):       Started c001mlb_node01a
>
>Master/Slave Set: ms-drbd0
>
>        Stopped: [ drbd0:0 drbd0:1 drbd0:2 drbd0:3 drbd0:4 drbd0:5
>drbd0:6 drbd0:7 drbd0:8 drbd0:9 drbd0:10 drbd0:11 drbd0:12 drbd0:13
>drbd0:14 drbd0:15 drbd0:16 drbd0:17 drbd0:18 drbd0:19 drbd0:20 drbd0:21
>drbd0:22 drbd0:23 drbd0:24 drbd0:25 ]
too many clones, see above.


>
>May 26 19:14:48 c001mlb_node01a cib: [31521]: info: retrieveCib:
>Reading cluster configuration from: /var/lib/heartbeat/crm/cib.2Ij02L (digest:
>/var/lib/heartbeat/crm/cib.G64hBN)
>
>May 26 19:14:48 c001mlb_node01a pengine: [25734]: notice: print_list:
>Stopped: [ drbd0:0 drbd0:1 drbd0:2 drbd0:3 drbd0:4 drbd0:5 drbd0:6
>drbd0:7 drbd0:8 drbd0:9 drbd0:10 drbd0:11 drbd0:12 drbd0:13 drbd0:14
>drbd0:15 drbd0:16 drbd0:17 drbd0:18 drbd0:19 drbd0:20 drbd0:21
>drbd0:22 drbd0:23 drbd0:24 drbd0:25 ]
all resources stopped. seems ok.

*snip*

>May 26 19:14:48 c001mlb_node01a pengine: [25734]: info: master_color:
>Promoting drbd0:0 (Stopped c001mlb_node01a)
>
>May 26 19:14:48 c001mlb_node01a pengine: [25734]: info: master_color:
>ms-drbd0: Promoted 1 instances of a possible 1 to master
>
>May 26 19:14:48 c001mlb_node01a pengine: [25734]: info: master_color:
>ms-drbd0: Promoted 1 instances of a possible 1 to master

2 instances have been promoted. that is correct.

>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info:
>do_state_transition: State transition S_POLICY_ENGINE ->
>S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE
>origin=handle_response ]
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: ERROR: do_fsa_action:
>Action A_DC_TIMER_STOP took 812418818s to complete
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: ERROR: do_fsa_action:
>Action A_INTEGRATE_TIMER_STOP took 812418818s to complete
>
>May 26 19:14:48 c001mlb_node01a pengine: [25734]: WARN:
>process_pe_message: Transition 7: WARNINGs found during PE processing.
>PEngine Input stored in: /var/lib/pengine/pe-warn-771.bz2
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: ERROR: do_fsa_action:
>Action A_FINALIZE_TIMER_STOP took 812418818s to complete
>
>May 26 19:14:48 c001mlb_node01a pengine: [25734]: info:
>process_pe_message: Configuration WARNINGs found during PE processing.
>Please run crm_verify -L to identify issues.

warnings found - please see this pe-warn file and/or run crm_verify -L
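
for example (a sketch, assuming the standard pacemaker 1.0 command line tools are available):

  crm_verify -L -V
  ptest -VVV -x /var/lib/pengine/pe-warn-771.bz2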

>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: unpack_graph:
>Unpacked transition 7: 17 actions in 17 synapses
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: do_te_invoke:
>Processing graph 7 (ref=pe_calc-dc-1243365288-55) derived from
>/var/lib/pengine/pe-warn-771.bz2
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: ERROR: do_fsa_action:
>Action A_TE_INVOKE took 812418817s to complete
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_rsc_command:
>Initiating action 5: start ip-c001drbd01a_start_0 on c001mlb_node01a
>(local)
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: do_lrm_rsc_op:
>Performing key=5:7:0:c58a32ec-ae57-4bc8-8a1e-5d7069c2f2bd
>op=ip-c001drbd01a_start_0 )

drbd seems to be up, starting ip below

>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: rsc:ip-c001drbd01a:
>start
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_pseudo_action:
>Pseudo action 14 fired and confirmed
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_pseudo_action:
>Pseudo action 15 fired and confirmed
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_pseudo_action:
>Pseudo action 16 fired and confirmed
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_pseudo_action:
>Pseudo action 25 fired and confirmed
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_pseudo_action:
>Pseudo action 28 fired and confirmed
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_rsc_command:
>Initiating action 141: notify drbd0:0_post_notify_start_0 on
>c001mlb_node01a (local)
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: do_lrm_rsc_op:
>Performing key=141:7:0:c58a32ec-ae57-4bc8-8a1e-5d7069c2f2bd
>op=drbd0:0_notify_0 )
>
>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: rsc:drbd0:0:
>notify
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: te_rsc_command:
>Initiating action 143: notify drbd0:0_post_notify_promote_0 on
>c001mlb_node01a (local)
>
>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: RA output:
>(ip-c001drbd01a:start:stderr) eth0:0: warning: name may be invalid
>
>May 26 19:14:48 c001mlb_node01a crmd: [25735]: info: do_lrm_rsc_op:
>Performing key=143:7:0:c58a32ec-ae57-4bc8-8a1e-5d7069c2f2bd
>op=drbd0:0_notify_0 )
>
>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: RA output:
>(drbd0:0:notify:stderr) 2009/05/26_19:14:48 INFO: drbd0: Using
>hostname node_0
>
>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: RA output:
>(ip-c001drbd01a:start:stderr) 2009/05/26_19:14:48 INFO: ip -f inet
>addr add 192.168.80.213/32 brd 192.168.80.213 dev eth0 label eth0:0
>
>May 26 19:14:48 c001mlb_node01a lrmd: [25732]: info: RA output:
>(ip-c001drbd01a:start:stderr) 2009/05/26_19:14:48 INFO: ip link set
>eth0 up 2009/05/26_19:14:48 INFO: /usr/lib64/heartbeat/send_arp -i 200
>-r 5 -p /var/run/heartbeat/rsctmp/send_arp/send_arp-192.168.80.213
>eth0
>192.168.80.213 auto not_used not_used

error starting ip.

as far as i can see, this is now repeating for different nodes as pacemaker tries to recover.

please fix the above suggestions and reply with new findings.

i usually log to syslog so that i also see the drbd/daemon/... messages to aid with debugging.
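
for a heartbeat-based stack like this one that is roughly (a sketch; local0 is just an example facility):

  # /etc/ha.d/ha.cf
  logfacility local0    # heartbeat/pacemaker messages go to syslog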

moreover, please attach the files from now on and do not c/p them into one big email. it is a lot of work to read your email...

cheers,
raoul
[1] http://www.gossamer-threads.com/lists/linuxha/dev/50929
[2] http://www.clusterlabs.org/wiki/DRBD_HowTo_1.0
--
____________________________________________________________________
DI (FH) Raoul Bhatia M.Sc.          email.          r.bhatia [at] ipax
Technischer Leiter

IPAX - Aloy Bhatia Hava OEG         web.           http://www.ipax.at
Barawitzkagasse 10/2/2/11           email.            office [at] ipax
1190 Wien                           tel.                +43 1 3670030
FN 277995t HG Wien                  fax.             +43 1 3670030 15
____________________________________________________________________

_______________________________________________
Pacemaker mailing list
Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker


DBOYN at POSTPATH

May 27, 2009, 8:36 PM

Post #2 of 2 (354 views)
Re: problem configuring DRBD resource in "Floating Peers" mode [In reply to]

Hi again!
I forgot to share my changed configurations. Please see the attachments.
Thank you for the support!

Attachments: current-cib.xml (19.9 KB)
  drbf.conf (0.95 KB)
