
Mailing List Archive: Linux-HA: Users

Manual Resource Migration/Move

 

 



tobias.brunner at nine

Jul 31, 2012, 2:38 AM

Post #1 of 4 (313 views)
Manual Resource Migration/Move

Hi list,

While trying to manually migrate a resource from one node to the other, nothing happens. I have run out of ideas, so here is what I did and what the configuration and logs look like:

crm status
----------
============
Last updated: Tue Jul 31 11:20:24 2012
Last change: Tue Jul 31 11:17:25 2012 via crm_resource on halab3
Stack: openais
Current DC: halab4 - partition with quorum
Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
2 Nodes configured, 2 expected votes
5 Resources configured.
============

Online: [ halab3 halab4 ]

Resource Group: groupMysql
resFsMysql (ocf::heartbeat:Filesystem): Started halab3
resIPMysql (ocf::heartbeat:IPaddr2): Started halab3
resMysql (ocf::heartbeat:mysql): Started halab3
Master/Slave Set: ms-resDRBDMysql [resDRBDMysql]
Masters: [ halab3 ]
Slaves: [ halab4 ]

crm configure show
------------------
node halab3
node halab4
primitive resDRBDMysql ocf:linbit:drbd \
params drbd_resource="mysql" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive resFsMysql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="ext4" \
op start interval="0" timeout="60s" \
op stop interval="0" timeout="60s"
primitive resIPMysql ocf:heartbeat:IPaddr2 \
params ip="192.168.1.10" nic="eth0" cidr_netmask="28" \
op monitor interval="30s"
primitive resMysql ocf:heartbeat:mysql \
params config="/etc/mysql/my.cnf" datadir="/var/lib/mysql" log="/var/log/mysql.log" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" additional_parameters="--bind-address=0.0.0.0" \
op start interval="0" timeout="120s" \
op stop interval="0" timeout="120s" \
op monitor interval="60s" timeout="30s"
group groupMysql resFsMysql resIPMysql resMysql
ms ms-resDRBDMysql resDRBDMysql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
location location-groupMysql-on-node1 groupMysql inf: halab3
colocation colo-groupMysql-ms-resDRBDMysql inf: groupMysql ms-resDRBDMysql:Master
order order-groupMysql-after-ms-resDRBDMysql inf: ms-resDRBDMysql:promote groupMysql:start
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false"

crm_simulate -sL
----------------
Current cluster status:
Online: [ halab3 halab4 ]

Resource Group: groupMysql
resFsMysql (ocf::heartbeat:Filesystem): Started halab3
resIPMysql (ocf::heartbeat:IPaddr2): Started halab3
resMysql (ocf::heartbeat:mysql): Started halab3
Master/Slave Set: ms-resDRBDMysql [resDRBDMysql]
Masters: [ halab3 ]
Slaves: [ halab4 ]

Allocation scores:
group_color: groupMysql allocation score on halab3: INFINITY
group_color: groupMysql allocation score on halab4: 0
group_color: resFsMysql allocation score on halab3: INFINITY
group_color: resFsMysql allocation score on halab4: 0
group_color: resIPMysql allocation score on halab3: 0
group_color: resIPMysql allocation score on halab4: 0
group_color: resMysql allocation score on halab3: 0
group_color: resMysql allocation score on halab4: 0
clone_color: ms-resDRBDMysql allocation score on halab3: INFINITY
clone_color: ms-resDRBDMysql allocation score on halab4: 0
clone_color: resDRBDMysql:0 allocation score on halab3: 10001
clone_color: resDRBDMysql:0 allocation score on halab4: 0
clone_color: resDRBDMysql:1 allocation score on halab3: 0
clone_color: resDRBDMysql:1 allocation score on halab4: 10001
native_color: resDRBDMysql:0 allocation score on halab3: 10001
native_color: resDRBDMysql:0 allocation score on halab4: 0
native_color: resDRBDMysql:1 allocation score on halab3: -INFINITY
native_color: resDRBDMysql:1 allocation score on halab4: 10001
resDRBDMysql:0 promotion score on halab3: INFINITY
resDRBDMysql:1 promotion score on halab4: 10000
native_color: resFsMysql allocation score on halab3: INFINITY
native_color: resFsMysql allocation score on halab4: -INFINITY
native_color: resIPMysql allocation score on halab3: 0
native_color: resIPMysql allocation score on halab4: -INFINITY
native_color: resMysql allocation score on halab3: 0
native_color: resMysql allocation score on halab4: -INFINITY

Transition Summary:

What I did
----------
"crm resource migrate groupMysql halab4"

What I see in the logs /var/log/corosync.log
--------------------------------------------
halab4:
Jul 31 11:30:14 halab4 cib: [13557]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=halab3/crm_resource/3, version=0.17.28): ok (rc=0)
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: - <cib admin_epoch="0" epoch="17" num_updates="28" />
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <cib epoch="18" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jul 31 11:28:04 2012" crm_feature_set="3.0.6" update-origin="halab3" update-client="cibadmin" have-quorum="1" dc-uuid="halab4" >
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <configuration >
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <constraints >
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <rsc_location id="cli-prefer-groupMysql" rsc="groupMysql" __crm_diff_marker__="added:top" >
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <rule id="cli-prefer-rule-groupMysql" score="INFINITY" boolean-op="and" >
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <expression id="cli-prefer-expr-groupMysql" attribute="#uname" operation="eq" value="halab4" type="string" />
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </rule>
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </rsc_location>
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </constraints>
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </configuration>
Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </cib>
Jul 31 11:30:14 halab4 crmd: [13562]: info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.18.1) : Non-status change
Jul 31 11:30:14 halab4 cib: [13557]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=halab3/crm_resource/4, version=0.18.1): ok (rc=0)
Jul 31 11:30:14 halab4 crmd: [13562]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Jul 31 11:30:14 halab4 pengine: [13561]: notice: unpack_config: On loss of CCM Quorum: Ignore
Jul 31 11:30:14 halab4 pengine: [13561]: notice: unpack_rsc_op: Operation monitor found resource resDRBDMysql:0 active in master mode on halab3
Jul 31 11:30:14 halab4 crmd: [13562]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Jul 31 11:30:14 halab4 crmd: [13562]: info: do_te_invoke: Processing graph 2499 (ref=pe_calc-dc-1343727014-3833) derived from /var/lib/pengine/pe-input-21.bz2
Jul 31 11:30:14 halab4 crmd: [13562]: notice: run_graph: ==== Transition 2499 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-21.bz2): Complete
Jul 31 11:30:14 halab4 crmd: [13562]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Jul 31 11:30:14 halab4 pengine: [13561]: notice: process_pe_message: Transition 2499: PEngine Input stored in: /var/lib/pengine/pe-input-21.bz2

halab3:
nothing logged at this time
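
A note on reading the halab4 log: the run_graph line reports Complete=0, Pending=0, Fired=0, Skipped=0 and Incomplete=0, which suggests the policy engine accepted the new constraint but computed a transition with no actions in it. The following is a minimal Python sketch, not part of Pacemaker, that pulls those counters out of such a line. The function names are made up for illustration, and reading all-zero counters as "nothing to do" is an interpretation of this log, not a documented guarantee.

```python
import re

# Log line copied from the halab4 excerpt above.
LINE = ('Jul 31 11:30:14 halab4 crmd: [13562]: notice: run_graph: ==== Transition 2499 '
        '(Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, '
        'Source=/var/lib/pengine/pe-input-21.bz2): Complete')

def transition_counters(line):
    """Return (transition id, {counter: value}) for a run_graph line, or None."""
    match = re.search(r'Transition (\d+) \((.*?)\): ', line)
    if match is None:
        return None
    counters = {}
    for field in match.group(2).split(', '):
        key, _, value = field.partition('=')
        if value.isdigit():  # skips the non-numeric Source=<path> field
            counters[key] = int(value)
    return int(match.group(1)), counters

def is_empty_transition(line):
    """True when every numeric counter is zero, i.e. no action was scheduled."""
    parsed = transition_counters(line)
    return parsed is not None and all(v == 0 for v in parsed[1].values())

print(transition_counters(LINE))
print(is_empty_transition(LINE))
```

If the check returns True right after a migrate request, the constraint was processed but did not change the placement decision, so the place to look is the scores and constraints rather than the resource agents.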

What my problem is
------------------
The resource group "groupMysql" should migrate from halab3 to halab4, but it doesn't. If I manually stop corosync on halab3, the resource group "groupMysql" successfully starts on halab4. I don't understand why the manual migration does not work. Does anyone have any ideas? How can I debug such problems?

Thanks for any help!

Cheers,
Tobias
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


lars.ellenberg at linbit

Jul 31, 2012, 12:19 PM

Post #2 of 4 (295 views)
Re: Manual Resource Migration/Move

On Tue, Jul 31, 2012 at 11:38:28AM +0200, Tobias Brunner wrote:
> Hi list,
>
> While trying to manually migrate a resource from one node to the other, nothing happens. I have run out of ideas, so here is what I did and what the configuration and logs look like:
>
> crm status
> ----------
> ============
> Last updated: Tue Jul 31 11:20:24 2012
> Last change: Tue Jul 31 11:17:25 2012 via crm_resource on halab3
> Stack: openais
> Current DC: halab4 - partition with quorum
> Version: 1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff
> 2 Nodes configured, 2 expected votes
> 5 Resources configured.
> ============
>
> Online: [ halab3 halab4 ]
>
> Resource Group: groupMysql
> resFsMysql (ocf::heartbeat:Filesystem): Started halab3
> resIPMysql (ocf::heartbeat:IPaddr2): Started halab3
> resMysql (ocf::heartbeat:mysql): Started halab3
> Master/Slave Set: ms-resDRBDMysql [resDRBDMysql]
> Masters: [ halab3 ]
> Slaves: [ halab4 ]
>
> crm configure show
> ------------------
> node halab3
> node halab4
> primitive resDRBDMysql ocf:linbit:drbd \
> params drbd_resource="mysql" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive resFsMysql ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/mysql" directory="/var/lib/mysql" fstype="ext4" \
> op start interval="0" timeout="60s" \
> op stop interval="0" timeout="60s"
> primitive resIPMysql ocf:heartbeat:IPaddr2 \
> params ip="192.168.1.10" nic="eth0" cidr_netmask="28" \
> op monitor interval="30s"
> primitive resMysql ocf:heartbeat:mysql \
> params config="/etc/mysql/my.cnf" datadir="/var/lib/mysql" log="/var/log/mysql.log" pid="/var/run/mysqld/mysqld.pid" socket="/var/run/mysqld/mysqld.sock" additional_parameters="--bind-address=0.0.0.0" \
> op start interval="0" timeout="120s" \
> op stop interval="0" timeout="120s" \
> op monitor interval="60s" timeout="30s"
> group groupMysql resFsMysql resIPMysql resMysql
> ms ms-resDRBDMysql resDRBDMysql \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> location location-groupMysql-on-node1 groupMysql inf: halab3

So you have a "mandatory" location constraint saying
run this thing only on halab3

And then you add ...

> colocation colo-groupMysql-ms-resDRBDMysql inf: groupMysql ms-resDRBDMysql:Master
> order order-groupMysql-after-ms-resDRBDMysql inf: ms-resDRBDMysql:promote groupMysql:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> no-quorum-policy="ignore" \
> stonith-enabled="false"

...

> What I did
> ----------
> "crm resource migrate groupMysql halab4"
>
> What I see in the logs /var/log/corosync.log
> --------------------------------------------
> halab4:
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=halab3/crm_resource/3, version=0.17.28): ok (rc=0)
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: - <cib admin_epoch="0" epoch="17" num_updates="28" />
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <cib epoch="18" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" cib-last-written="Tue Jul 31 11:28:04 2012" crm_feature_set="3.0.6" update-origin="halab3" update-client="cibadmin" have-quorum="1" dc-uuid="halab4" >
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <configuration >
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <constraints >
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <rsc_location id="cli-prefer-groupMysql" rsc="groupMysql" __crm_diff_marker__="added:top" >
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <rule id="cli-prefer-rule-groupMysql" score="INFINITY" boolean-op="and" >
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + <expression id="cli-prefer-expr-groupMysql" attribute="#uname" operation="eq" value="halab4" type="string" />
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </rule>
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </rsc_location>
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </constraints>
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </configuration>
> Jul 31 11:30:14 halab4 cib: [13557]: info: cib:diff: + </cib>

... yet another constraint. The crm shell syntax equivalents are below.

from the crm resource migrate:
location cli-prefer-groupMysql groupMysql inf: halab4
from your config:
location location-groupMysql-on-node1 groupMysql inf: halab3

So pacemaker gets to choose which infinity is more infinite.
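
To make the tie concrete, here is a minimal Python sketch (not Pacemaker code) that adds up the location scores of the two constraints quoted above. The value 1000000 for INFINITY is the one Pacemaker documents; the helper names `capped_add` and `location_totals` are hypothetical, and the sketch deliberately ignores colocation, ordering and stickiness.

```python
INFINITY = 1_000_000  # Pacemaker treats scores at or beyond this value as INFINITY

def capped_add(a, b):
    """Add two scores and clamp the result (sufficient here: both scores are positive)."""
    return max(-INFINITY, min(INFINITY, a + b))

def location_totals(constraints, nodes):
    """Sum the location scores per node (hypothetical helper, location constraints only)."""
    totals = {node: 0 for node in nodes}
    for _constraint_id, node, score in constraints:
        totals[node] = capped_add(totals[node], score)
    return totals

# (constraint id, node, score), taken from the thread
constraints = [
    ("location-groupMysql-on-node1", "halab3", INFINITY),  # from the config
    ("cli-prefer-groupMysql", "halab4", INFINITY),         # added by "crm resource migrate"
]

totals = location_totals(constraints, ["halab3", "halab4"])
for node, score in totals.items():
    print(node, "INFINITY" if score >= INFINITY else score)
```

Both nodes end up at INFINITY, so the migrate constraint gives halab4 no advantage over halab3. The sketch does not model how Pacemaker breaks such a tie; in this thread the group simply stayed where it was.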

> What my problem is
> ------------------
> The resource group "groupMysql" should migrate from halab3 to halab4,
> but it doesn't. If I manually stop corosync on halab3, the resource
> group "groupMysql" successfully starts on halab4. I don't understand
> why the manual migration does not work. Does anyone have any ideas?

Remove the inf: halab3, or replace it with some not infinite score.

Or did you actually do that, but did not tell us here?

Depending on what behaviour you actually want,
you may need to specify a resource stickiness as well.
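
As a rough illustration of why stickiness matters once the score is finite, here is a small Python sketch. It is a simplified model, not Pacemaker's allocation algorithm: the score of 50 and the stickiness of 100 are assumed example values, the helper names are invented, and colocation with the DRBD master is left out entirely.

```python
INFINITY = 1_000_000  # Pacemaker's numeric value for INFINITY

def node_scores(location, current_node, stickiness):
    """Location scores per node, plus stickiness on the node currently running the resource."""
    scores = dict(location)
    scores[current_node] = min(INFINITY, scores.get(current_node, 0) + stickiness)
    return scores

def preferred(scores):
    """Node with the highest score (the examples below avoid ties)."""
    return max(scores, key=scores.get)

# Assumed finite preference for halab3 instead of "inf: halab3".
location = {"halab3": 50, "halab4": 0}

# 1. Move requested: cli-prefer adds INFINITY for halab4 and outweighs 50 + 100.
moving = node_scores({"halab3": 50, "halab4": INFINITY}, "halab3", 100)
print(preferred(moving))                                  # halab4

# 2. Move constraint removed again, resource now on halab4, stickiness 100: it stays.
print(preferred(node_scores(location, "halab4", 100)))    # halab4

# 3. Same situation without stickiness: the 50 pulls it back to halab3.
print(preferred(node_scores(location, "halab4", 0)))      # halab3
```

Whether case 2 or case 3 is the behaviour you want decides whether stickiness should be larger or smaller than the location score.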

> How can I debug such problems?

Experience helps ;-)

> Thanks for any help!


Lars


--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com


tobias.brunner at nine

Aug 3, 2012, 7:37 AM

Post #3 of 4 (296 views)
Re: Manual Resource Migration/Move

Hi list,

Thanks for the input so far, here are new findings.

> > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> > location location-groupMysql-on-node1 groupMysql inf: halab3
>
> So you have a "mandatory" location constraint saying
> run this thing only on halab3
>

You're right, that's not what I want.

> Remove the inf: halab3, or replace it with some not infinite score.

Ok, that's done! Now here is a "crm configure show" from another cluster on
which "crm resource move groupApache nodeha2" doesn't work (same configuration
as halab3):

node nodeha1
node nodeha2
primitive resApache ocf:heartbeat:apache \
params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
op monitor interval="1min" \
op start interval="0" timeout="40" \
op stop interval="0" timeout="60"
primitive resDRBDApache ocf:linbit:drbd \
params drbd_resource="www-data" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive resDRBDPostgresql ocf:linbit:drbd \
params drbd_resource="postgresql" \
op start interval="0" timeout="240" \
op stop interval="0" timeout="100"
primitive resFsApache ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/www-data" directory="/home/www-data" fstype="ext4" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive resFsPostgresql ocf:heartbeat:Filesystem \
params device="/dev/drbd/by-res/postgresql" directory="/var/lib/postgresql" fstype="ext4" \
op start interval="0" timeout="60" \
op stop interval="0" timeout="60"
primitive resIPApache ocf:heartbeat:IPaddr2 \
params ip="178.209.1.10" nic="eth0" cidr_netmask="28" \
op monitor interval="30s"
primitive resIPPostgresql ocf:heartbeat:IPaddr2 \
params ip="178.209.1.11" nic="eth0" cidr_netmask="28" \
op monitor interval="30s"
primitive resPostgresql ocf:heartbeat:pgsql \
params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" psql="/usr/lib/postgresql/8.4/bin/psql" pgdata="/var/lib/postgresql/8.4/main" pghost="178.209.1.11" config="/etc/postgresql/8.4/main/postgresql.conf" logfile="/var/log/postgresql/postgresql-8.4-main.log" pgdb="template1" monitor_user="monitor" monitor_password="123" \
op monitor interval="30" timeout="30" depth="0" \
op start interval="0" timeout="120" \
op stop interval="0" timeout="120"
group groupApache resFsApache resIPApache resApache
group groupPostgresql resFsPostgresql resIPPostgresql resPostgresql
ms msResDRBDApache resDRBDApache \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
ms msResDRBDPostgresql resDRBDPostgresql \
meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
location location-groupApache-on-node1 groupApache 50: nodeha1
location location-groupPostgresql-on-node1 groupPostgresql 50: nodeha1
colocation colo-groupApache-msResDRBDApache inf: groupApache msResDRBDApache:Master
colocation colo-groupPostgresql-msResDRBDPostgresql inf: groupPostgresql msResDRBDPostgresql:Master
order orderGroupApache-after-msResDRBDApache inf: msResDRBDApache:promote groupApache:start
order orderGroupPostgresql-after-msResDRBDPostgresql inf: msResDRBDPostgresql:promote groupPostgresql:start
property $id="cib-bootstrap-options" \
dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
cluster-infrastructure="openais" \
expected-quorum-votes="2" \
no-quorum-policy="ignore" \
stonith-enabled="false" \
last-lrm-refresh="1343987736"
rsc_defaults $id="rsc-options" \
resource-stickiness="100"


Before "crm resource move groupApache nodeha2":
./showscores.sh
Resource                      Score      Node         Stickiness  #Fail  Migration-Threshold
resApache                     100        clientisha1  100         0
resApache                     -INFINITY  clientisha2  100         0
resDRBDApache:0               0          clientisha2  100         0
resDRBDApache:0               10100      clientisha1  100         0
resDRBDApache:0_(master)      10700      clientisha1  100         0
resDRBDApache:1               100        clientisha2  100         0
resDRBDApache:1               -INFINITY  clientisha1  100         0
resDRBDApache:1_(master)      -1         clientisha2  100         0
resDRBDPostgresql:0           0          clientisha2  100         0
resDRBDPostgresql:0           10100      clientisha1  100         0
resDRBDPostgresql:0_(master)  10700      clientisha1  100         0
resDRBDPostgresql:1           100        clientisha2  100         0
resDRBDPostgresql:1           -INFINITY  clientisha1  100         0
resDRBDPostgresql:1_(master)  -1         clientisha2  100         0
resFsApache                   10450      clientisha1  100         0
resFsApache                   -INFINITY  clientisha2  100         0
resFsPostgresql               10450      clientisha1  100         0
resFsPostgresql               -INFINITY  clientisha2  100         0
resIPApache                   200        clientisha1  100         0
resIPApache                   -INFINITY  clientisha2  100         0
resIPPostgresql               200        clientisha1  100         0
resIPPostgresql               -INFINITY  clientisha2  100         0
resPostgresql                 100        clientisha1  100         0
resPostgresql                 -INFINITY  clientisha2  100         0

After "crm resource move groupApache nodeha2":

The constraint is added:
location cli-prefer-groupApache groupApache \
rule $id="cli-prefer-rule-groupApache" inf: #uname eq nodeha2

./showscores.sh
Resource                      Score      Node         Stickiness  #Fail  Migration-Threshold
resApache                     100        clientisha1  100         0
resApache                     -INFINITY  clientisha2  100         0
resDRBDApache:0               0          clientisha2  100         0
resDRBDApache:0               10100      clientisha1  100         0
resDRBDApache:0_(master)      10700      clientisha1  100         0
resDRBDApache:1               100        clientisha2  100         0
resDRBDApache:1               -INFINITY  clientisha1  100         0
resDRBDApache:1_(master)      -1         clientisha2  100         0
resDRBDPostgresql:0           0          clientisha2  100         0
resDRBDPostgresql:0           10100      clientisha1  100         0
resDRBDPostgresql:0_(master)  10700      clientisha1  100         0
resDRBDPostgresql:1           100        clientisha2  100         0
resDRBDPostgresql:1           -INFINITY  clientisha1  100         0
resDRBDPostgresql:1_(master)  -1         clientisha2  100         0
resFsApache                   10450      clientisha1  100         0
resFsApache                   -INFINITY  clientisha2  100         0
resFsPostgresql               10450      clientisha1  100         0
resFsPostgresql               -INFINITY  clientisha2  100         0
resIPApache                   200        clientisha1  100         0
resIPApache                   -INFINITY  clientisha2  100         0
resIPPostgresql               200        clientisha1  100         0
resIPPostgresql               -INFINITY  clientisha2  100         0
resPostgresql                 100        clientisha1  100         0
resPostgresql                 -INFINITY  clientisha2  100         0

The scores don't look like they are changing.
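
Comparing two score dumps by eye is error-prone, so here is a minimal Python sketch that does it mechanically. It assumes the row layout shown by showscores.sh above (resource, score, node as the first three whitespace-separated fields), uses only an excerpt of the rows, and the function names are invented for this example.

```python
# Excerpt of the showscores.sh rows from this post (groupApache members only).
BEFORE = """\
resApache 100 clientisha1 100 0
resApache -INFINITY clientisha2 100 0
resFsApache 10450 clientisha1 100 0
resFsApache -INFINITY clientisha2 100 0
resIPApache 200 clientisha1 100 0
resIPApache -INFINITY clientisha2 100 0
"""
AFTER = BEFORE  # the post shows identical rows after the move request

def parse_scores(text):
    """Map (resource, node) -> score string, skipping header and short lines."""
    scores = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] == "Resource":
            continue
        resource, score, node = fields[:3]
        scores[(resource, node)] = score
    return scores

def changed(before, after):
    """Entries whose score differs between the two dumps."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}

def blocked_on(scores, node):
    """Resources that carry a -INFINITY score on the given node."""
    return sorted(r for (r, n), s in scores.items() if n == node and s == "-INFINITY")

before, after = parse_scores(BEFORE), parse_scores(AFTER)
print(changed(before, after))
print(blocked_on(after, "clientisha2"))
```

For the excerpt, the first result is empty and the second lists every member of groupApache, which matches the impression that the move request changed nothing for the target node.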

The log looks like this:
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_delete for section constraints (origin=nodeha1/crm_resource/3, version=0.69.2): ok (rc=0)
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: - <cib admin_epoch="0" epoch="69" num_updates="2" />
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <cib epoch="70" num_updates="1" admin_epoch="0" validate-with="pacemaker-1.2" crm_feature_set="3.0.6" update-origin="nodeha1" update-client="crm_resource" cib-last-written="Fri Aug 3 16:31:40 2012" have-quorum="1" dc-uuid="nodeha2" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <configuration >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <constraints >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <rsc_location id="cli-prefer-groupApache" rsc="groupApache" __crm_diff_marker__="added:top" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <rule id="cli-prefer-rule-groupApache" score="INFINITY" boolean-op="and" >
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + <expression id="cli-prefer-expr-groupApache" attribute="#uname" operation="eq" value="nodeha2" type="string" />
Aug 03 16:33:11 nodeha2 crmd: [4173]: info: abort_transition_graph: te_update_diff:126 - Triggered transition abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.70.1) : Non-status change
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </rule>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </rsc_location>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </constraints>
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </configuration>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib:diff: + </cib>
Aug 03 16:33:11 nodeha2 cib: [4168]: info: cib_process_request: Operation complete: op cib_modify for section constraints (origin=nodeha1/crm_resource/4, version=0.70.1): ok (rc=0)
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_config: On loss of CCM Quorum: Ignore
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDPostgresql:0 active in master mode on nodeha1
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: unpack_rsc_op: Operation monitor found resource resDRBDApache:0 active in master mode on nodeha1
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE [ input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response ]
Aug 03 16:33:11 nodeha2 crmd: [4173]: info: do_te_invoke: Processing graph 332 (ref=pe_calc-dc-1344004391-579) derived from /var/lib/pengine/pe-input-87.bz2
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: run_graph: ==== Transition 332 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pengine/pe-input-87.bz2): Complete
Aug 03 16:33:11 nodeha2 crmd: [4173]: notice: do_state_transition: State transition S_TRANSITION_ENGINE -> S_IDLE [ input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=notify_crmd ]
Aug 03 16:33:11 nodeha2 pengine: [4172]: notice: process_pe_message: Transition 332: PEngine Input stored in: /var/lib/pengine/pe-input-87.bz2

Maybe I need to clear some counters or score caches?

> > How can I debug such problems?
>
> Experience helps ;-)

That's really true. And I'm actually in the process of gaining experience =)

Cheers,
Tobias

--
Nine Internet Solutions AG, Albisriederstr. 243a, CH-8047 Zuerich
Support +41 44 637 40 40 | Tel +41 44 637 40 00 | Direct +41 44 637 40 13
Skype nine.ch_support


dejanmm at fastmail

Aug 8, 2012, 7:39 AM

Post #4 of 4 (265 views)
Re: Manual Resource Migration/Move

Hi,

On Fri, Aug 03, 2012 at 04:37:55PM +0200, Tobias Brunner wrote:
> Hi list,
>
> Thanks for the input so far, here are new findings.
>
> > > meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> > > location location-groupMysql-on-node1 groupMysql inf: halab3
> >
> > So you have a "mandatory" location constraint saying
> > run this thing only on halab3
> >
>
> You're right, that's not what I want.
>
> > Remove the inf: halab3, or replace it with some not infinite score.
>
> Ok, that's done! Now here is a "crm configure show" from another cluster on
> which "crm resource move groupApache nodeha2" doesn't work (same configuration
> as halab3):
>
> node nodeha1
> node nodeha2
> primitive resApache ocf:heartbeat:apache \
> params configfile="/etc/apache2/apache2.conf" statusurl="http://localhost/server-status" \
> op monitor interval="1min" \
> op start interval="0" timeout="40" \
> op stop interval="0" timeout="60"
> primitive resDRBDApache ocf:linbit:drbd \
> params drbd_resource="www-data" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive resDRBDPostgresql ocf:linbit:drbd \
> params drbd_resource="postgresql" \
> op start interval="0" timeout="240" \
> op stop interval="0" timeout="100"
> primitive resFsApache ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/www-data" directory="/home/www-data" fstype="ext4" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive resFsPostgresql ocf:heartbeat:Filesystem \
> params device="/dev/drbd/by-res/postgresql" directory="/var/lib/postgresql" fstype="ext4" \
> op start interval="0" timeout="60" \
> op stop interval="0" timeout="60"
> primitive resIPApache ocf:heartbeat:IPaddr2 \
> params ip="178.209.1.10" nic="eth0" cidr_netmask="28" \
> op monitor interval="30s"
> primitive resIPPostgresql ocf:heartbeat:IPaddr2 \
> params ip="178.209.1.11" nic="eth0" cidr_netmask="28" \
> op monitor interval="30s"
> primitive resPostgresql ocf:heartbeat:pgsql \
> params pgctl="/usr/lib/postgresql/8.4/bin/pg_ctl" psql="/usr/lib/postgresql/8.4/bin/psql" pgdata="/var/lib/postgresql/8.4/main" pghost="178.209.1.11" config="/etc/postgresql/8.4/main/postgresql.conf" logfile="/var/log/postgresql/postgresql-8.4-main.log" pgdb="template1" monitor_user="monitor" monitor_password="123" \
> op monitor interval="30" timeout="30" depth="0" \
> op start interval="0" timeout="120" \
> op stop interval="0" timeout="120"
> group groupApache resFsApache resIPApache resApache
> group groupPostgresql resFsPostgresql resIPPostgresql resPostgresql
> ms msResDRBDApache resDRBDApache \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> ms msResDRBDPostgresql resDRBDPostgresql \
> meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true" target-role="Master"
> location location-groupApache-on-node1 groupApache 50: nodeha1
> location location-groupPostgresql-on-node1 groupPostgresql 50: nodeha1
> colocation colo-groupApache-msResDRBDApache inf: groupApache msResDRBDApache:Master
> colocation colo-groupPostgresql-msResDRBDPostgresql inf: groupPostgresql msResDRBDPostgresql:Master
> order orderGroupApache-after-msResDRBDApache inf: msResDRBDApache:promote groupApache:start
> order orderGroupPostgresql-after-msResDRBDPostgresql inf: msResDRBDPostgresql:promote groupPostgresql:start
> property $id="cib-bootstrap-options" \
> dc-version="1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff" \
> cluster-infrastructure="openais" \
> expected-quorum-votes="2" \
> no-quorum-policy="ignore" \
> stonith-enabled="false" \
> last-lrm-refresh="1343987736"
> rsc_defaults $id="rsc-options" \
> resource-stickiness="100"
>
>
> Before "crm resource move groupApache nodeha2":
> ./showscores.sh
> Resource                      Score      Node         Stickiness  #Fail  Migration-Threshold
> resApache                     100        clientisha1  100         0
> resApache                     -INFINITY  clientisha2  100         0
> resDRBDApache:0               0          clientisha2  100         0
> resDRBDApache:0               10100      clientisha1  100         0
> resDRBDApache:0_(master)      10700      clientisha1  100         0
> resDRBDApache:1               100        clientisha2  100         0
> resDRBDApache:1               -INFINITY  clientisha1  100         0
> resDRBDApache:1_(master)      -1         clientisha2  100         0
> resDRBDPostgresql:0           0          clientisha2  100         0
> resDRBDPostgresql:0           10100      clientisha1  100         0
> resDRBDPostgresql:0_(master)  10700      clientisha1  100         0
> resDRBDPostgresql:1           100        clientisha2  100         0
> resDRBDPostgresql:1           -INFINITY  clientisha1  100         0
> resDRBDPostgresql:1_(master)  -1         clientisha2  100         0
> resFsApache                   10450      clientisha1  100         0
> resFsApache                   -INFINITY  clientisha2  100         0
> resFsPostgresql               10450      clientisha1  100         0
> resFsPostgresql               -INFINITY  clientisha2  100         0
> resIPApache                   200        clientisha1  100         0
> resIPApache                   -INFINITY  clientisha2  100         0
> resIPPostgresql               200        clientisha1  100         0
> resIPPostgresql               -INFINITY  clientisha2  100         0
> resPostgresql                 100        clientisha1  100         0
> resPostgresql                 -INFINITY  clientisha2  100         0

abs(-inf) > inf

I guess that you need to do a resource cleanup to remove the
record of old failures.
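
The "abs(-inf) > inf" remark refers to Pacemaker's documented score arithmetic, where -INFINITY wins whenever it meets +INFINITY. A minimal Python sketch of those rules follows; the function names are invented for illustration, and the sketch only shows the arithmetic, not where the -INFINITY on clientisha2 comes from.

```python
INFINITY = 1_000_000  # Pacemaker's numeric value for INFINITY

def add_scores(a, b):
    """Combine two scores using Pacemaker's documented INFINITY rules."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY          # anything combined with -INFINITY is -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY           # otherwise +INFINITY absorbs finite values
    return max(-INFINITY, min(INFINITY, a + b))

def fmt(score):
    """Render a score the way the cluster tools print it."""
    if score >= INFINITY:
        return "INFINITY"
    if score <= -INFINITY:
        return "-INFINITY"
    return str(score)

existing = -INFINITY   # resApache on clientisha2, from the showscores output
cli_prefer = INFINITY  # score of the cli-prefer-groupApache rule

print(fmt(add_scores(existing, cli_prefer)))   # -INFINITY
print(fmt(add_scores(100, cli_prefer)))        # INFINITY
```

So as long as something keeps the group at -INFINITY on the target node, the INFINITY preference added by "crm resource move" cannot lift it.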

It's interesting that you have three sets of node names (one
from the config, another from showscores, and a third from you).
Somebody got confused.

Thanks,

Dejan

