Mailing List Archive: Linux-HA: Pacemaker

Pacemaker cannot start the failed master as a new slave?

quanta.linux at gmail

Jul 8, 2012, 9:11 PM

Post #1 of 5
Pacemaker cannot start the failed master as a new slave?

Related thread:
http://oss.clusterlabs.org/pipermail/pacemaker/2011-December/012499.html

I'm setting up failover for MySQL replication (1 master and 1 slave)
following this guide:
https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst

Here is the output of `crm configure show`:

node serving-6192 \
    attributes p_mysql_mysql_master_IP="192.168.6.192"
node svr184R-638.localdomain \
    attributes p_mysql_mysql_master_IP="192.168.6.38"
primitive p_mysql ocf:percona:mysql \
    params config="/etc/my.cnf" pid="/var/run/mysqld/mysqld.pid" \
    socket="/var/lib/mysql/mysql.sock" replication_user="repl" \
    replication_passwd="x" test_user="test_user" test_passwd="x" \
    op monitor interval="5s" role="Master" OCF_CHECK_LEVEL="1" \
    op monitor interval="2s" role="Slave" timeout="30s" OCF_CHECK_LEVEL="1" \
    op start interval="0" timeout="120s" \
    op stop interval="0" timeout="120s"
primitive writer_vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.6.8" cidr_netmask="32" \
    op monitor interval="10s" \
    meta is-managed="true"
ms ms_MySQL p_mysql \
    meta master-max="1" master-node-max="1" clone-max="2" \
    clone-node-max="1" notify="true" globally-unique="false" \
    target-role="Master" is-managed="true"
colocation writer_vip_on_master inf: writer_vip ms_MySQL:Master
order ms_MySQL_promote_before_vip inf: ms_MySQL:promote writer_vip:start
property $id="cib-bootstrap-options" \
    dc-version="1.0.12-unknown" \
    cluster-infrastructure="openais" \
    expected-quorum-votes="2" \
    no-quorum-policy="ignore" \
    stonith-enabled="false" \
    last-lrm-refresh="1341801689"
property $id="mysql_replication" \
    p_mysql_REPL_INFO="192.168.6.192|mysql-bin.000006|338"
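
The `p_mysql_REPL_INFO` property in the `mysql_replication` set packs three `|`-separated fields. Judging only from the value above, they look like the master's address, a binary log file and a position in that file, i.e. the point from which a new slave would start replicating. The Python sketch below is an illustration under that assumption: `ReplInfo`, `parse_repl_info` and `change_master_sql` are hypothetical helpers, not part of the percona resource agent, and nothing here claims this is how the agent itself builds its `CHANGE MASTER TO` statement.

```python
from typing import NamedTuple


class ReplInfo(NamedTuple):
    """Fields assumed to be packed into the p_mysql_REPL_INFO property."""

    master_host: str
    log_file: str
    log_pos: int


def parse_repl_info(value: str) -> ReplInfo:
    """Split an assumed '<host>|<binlog file>|<position>' value."""
    parts = value.split("|")
    if len(parts) != 3:
        raise ValueError(f"unexpected REPL_INFO value: {value!r}")
    host, log_file, pos = parts
    return ReplInfo(host, log_file, int(pos))


def change_master_sql(info: ReplInfo, user: str, password: str) -> str:
    """Build a MySQL CHANGE MASTER TO statement, for illustration only.

    Values are interpolated without escaping, so never feed it untrusted input.
    """
    return (
        f"CHANGE MASTER TO MASTER_HOST='{info.master_host}', "
        f"MASTER_USER='{user}', MASTER_PASSWORD='{password}', "
        f"MASTER_LOG_FILE='{info.log_file}', MASTER_LOG_POS={info.log_pos};"
    )


if __name__ == "__main__":
    info = parse_repl_info("192.168.6.192|mysql-bin.000006|338")
    print(change_master_sql(info, "repl", "x"))
```

If the stored host is stale after a failover (here it still names 192.168.6.192), a slave configured from it would point at the wrong master, which is one thing worth checking when a former master refuses to start as a slave.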

`crm_mon`:

Last updated: Mon Jul 9 10:30:01 2012
Stack: openais
Current DC: serving-6192 - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ serving-6192 svr184R-638.localdomain ]

Master/Slave Set: ms_MySQL
Masters: [ serving-6192 ]
Slaves: [ svr184R-638.localdomain ]
writer_vip (ocf::heartbeat:IPaddr2): Started serving-6192

To test failover, I put a syntax error into `/etc/my.cnf` on serving-6192,
and failover worked fine:
- svr184R-638.localdomain was promoted to master
- writer_vip moved to svr184R-638.localdomain

Last updated: Mon Jul 9 10:35:57 2012
Stack: openais
Current DC: serving-6192 - partition with quorum
Version: 1.0.12-unknown
2 Nodes configured, 2 expected votes
2 Resources configured.
============

Online: [ serving-6192 svr184R-638.localdomain ]

Master/Slave Set: ms_MySQL
Masters: [ svr184R-638.localdomain ]
Stopped: [ p_mysql:0 ]
writer_vip (ocf::heartbeat:IPaddr2): Started svr184R-638.localdomain

Failed actions:
    p_mysql:0_monitor_5000 (node=serving-6192, call=15, rc=7, status=complete): not running
    p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running
    p_mysql:0_start_0 (node=serving-6192, call=26, rc=-2, status=Timed Out): unknown exec error
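
In these entries `rc` is the exit code of the resource agent action. Under the OCF conventions Pacemaker uses, 7 is `OCF_NOT_RUNNING` and 1 is `OCF_ERR_GENERIC` (which crm_mon reports as "unknown error"). A negative value such as -2 is not an OCF exit code, so for the failed start the `status=Timed Out` field is the informative part. A minimal Python sketch that decodes such entries, assuming each one sits on a single line; `describe_failed_action` and its regular expression are hypothetical and only cover the format shown in this thread:

```python
import re
from typing import Optional

# Exit codes of the OCF resource agent API as used by Pacemaker.
OCF_RC = {
    0: "OCF_SUCCESS",
    1: "OCF_ERR_GENERIC",
    2: "OCF_ERR_ARGS",
    3: "OCF_ERR_UNIMPLEMENTED",
    4: "OCF_ERR_PERM",
    5: "OCF_ERR_INSTALLED",
    6: "OCF_ERR_CONFIGURED",
    7: "OCF_NOT_RUNNING",
    8: "OCF_RUNNING_MASTER",
    9: "OCF_FAILED_MASTER",
}

# One crm_mon "Failed actions" entry, e.g.
# p_mysql:0_demote_0 (node=serving-6192, call=22, rc=7, status=complete): not running
FAILED_ACTION = re.compile(
    r"(?P<op>\S+) \(node=(?P<node>[^,]+), call=(?P<call>-?\d+), "
    r"rc=(?P<rc>-?\d+), status=(?P<status>[^)]+)\): (?P<text>.*)"
)


def describe_failed_action(line: str) -> Optional[str]:
    """Return '<operation> on <node>: <meaning of rc>' or None if no match."""
    m = FAILED_ACTION.search(line)
    if m is None:
        return None
    rc = int(m.group("rc"))
    # Codes outside the OCF table (such as -2) come from the executor itself;
    # fall back to the status field, which says why the call failed.
    meaning = OCF_RC.get(rc, f"non-OCF rc ({m.group('status')})")
    return f"{m.group('op')} on {m.group('node')}: {meaning}"
```

For the first entry above, `describe_failed_action(...)` returns `"p_mysql:0_monitor_5000 on serving-6192: OCF_NOT_RUNNING"`.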

I then removed the syntax error from `/etc/my.cnf` on serving-6192 and
restarted corosync. I expected serving-6192 to start as a new slave, but
it doesn't:

Failed actions:
    p_mysql:0_start_0 (node=serving-6192, call=4, rc=1, status=complete): unknown error

Here is the part of the logs that looks suspicious to me:

Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start
Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output: (p_mysql:0:start:stderr) Error performing operation: The object/attribute does not exist

Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked: /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0

The strange thing is that I can start it manually:

export OCF_ROOT=/usr/lib/ocf
export OCF_RESKEY_config="/etc/my.cnf"
export OCF_RESKEY_pid="/var/run/mysqld/mysqld.pid"
export OCF_RESKEY_socket="/var/lib/mysql/mysql.sock"
export OCF_RESKEY_replication_user="repl"
export OCF_RESKEY_replication_passwd="x"
export OCF_RESKEY_max_slave_lag="60"
export OCF_RESKEY_evict_outdated_slaves="false"
export OCF_RESKEY_test_user="test_user"
export OCF_RESKEY_test_passwd="x"

`sh -x /usr/lib/ocf/resource.d/percona/mysql start`: http://fpaste.org/RVGh/

Did I do something wrong?


andreas at hastexo

Jul 9, 2012, 3:08 PM

Post #2 of 5
Re: Pacemaker cannot start the failed master as a new slave?

On 07/09/2012 06:11 AM, quanta wrote:
> Related thread:
> http://oss.clusterlabs.org/pipermail/pacemaker/2011-December/012499.html
>
> I'm going to setup failover for MySQL replication (1 master and 1 slave)
> follow this guide:
> https://github.com/jayjanssen/Percona-Pacemaker-Resource-Agents/blob/master/doc/PRM-setup-guide.rst

and you also use the latest mysql RA from resource-agents github?

> Here're snippet of the logs which I'm suspecting:
>
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: rsc:p_mysql:0:4: start
> Jul 09 10:46:32 serving-6192 lrmd: [7321]: info: RA output:
> (p_mysql:0:start:stderr) Error performing operation: The
> object/attribute does not exist
>
> Jul 09 10:46:32 serving-6192 crm_attribute: [7420]: info: Invoked:
> /usr/sbin/crm_attribute -N serving-6192 -l reboot --name readable -v 0

Not enough logs ... at least for me ... to give more hints.

Regards,
Andreas

--
Need help with Pacemaker?
http://www.hastexo.com/now




quanta.linux at gmail

Jul 9, 2012, 7:27 PM

Post #3 of 5
Re: Cannot start the failed MySQL master as a new slave?

On 07/10/2012 05:08 AM, Andreas Kurz wrote:
> and you also use the latest mysql RA from resource-agents github?
Yes.
> Not enough logs ... at least for me ... to give more hints. Regards,
> Andreas

Sorry, here they are: http://fpaste.org/AyOZ/

_______________________________________________
Pacemaker mailing list: Pacemaker [at] oss
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


quanta.linux at gmail

Jul 12, 2012, 9:21 PM

Post #4 of 5
Re: Cannot start the failed MySQL master as a new slave?

As Patrick pointed out here:
http://serverfault.com/questions/405982/mysql-pacemaker-cannot-start-the-failed-master-as-a-new-slave

I need to set additional variables to tell the script it's a
master/slave resource.
So I've appended the following variables to my `~/.bash_profile`:

export OCF_RESKEY_CRM_meta_clone_max="2"
export OCF_RESKEY_CRM_meta_role="Slave"

Then I sourced it (`. ~/.bash_profile`) to make it take effect and started
the mysql resource manually:

`sh -x /usr/lib/ocf/resource.d/percona/mysql start`: http://fpaste.org/EMwa/

and it works fine:

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.6.38
Master_User: repl
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000072
Read_Master_Log_Pos: 1428602
Relay_Log_File: mysqld-relay-bin.000006
Relay_Log_Pos: 39370
Relay_Master_Log_File: mysql-bin.000072
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 1428602
Relay_Log_Space: 39527
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 123
1 row in set (0.00 sec)
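
The lines that matter here are `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master`. A minimal Python sketch that parses captured `\G` output and treats the slave as healthy only when both threads report `Yes` and the lag is a number no larger than a threshold. The 60-second default merely mirrors the `OCF_RESKEY_max_slave_lag="60"` export earlier in the thread; `parse_vertical` and `replication_healthy` are hypothetical helpers, and this is not the check the resource agent performs.

```python
from typing import Dict


def parse_vertical(output: str) -> Dict[str, str]:
    """Parse a single row of MySQL vertical (\\G) output into a dict."""
    fields: Dict[str, str] = {}
    for raw in output.splitlines():
        line = raw.strip()
        # Skip blank lines, the echoed prompt, the row banner and the footer.
        if not line or line.startswith(("mysql>", "*")) or "row in set" in line:
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields: Dict[str, str], max_lag: int = 60) -> bool:
    """True if both replication threads run and the lag is known and small."""
    lag = fields.get("Seconds_Behind_Master", "NULL")
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        and lag.isdigit()  # NULL means the lag is unknown
        and int(lag) <= max_lag
    )
```

Applied to the output above, both threads report `Yes` and `Seconds_Behind_Master` is 0, so `replication_healthy` would return `True`.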

I turned on debugging, stopped corosync and started it again. Here are the
logs: http://fpaste.org/mZzS/
As you can see, there is nothing more detailed than 'unknown error':

Jul 13 10:48:06 serving-6192 crmd: [3341]: debug: get_xpath_object: No match for //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
Jul 13 10:48:06 serving-6192 lrmd: [3338]: WARN: Managed p_mysql:1:start process 3416 exited with return code 1.
Jul 13 10:48:06 serving-6192 crmd: [3341]: info: process_lrm_event: LRM operation p_mysql:1_start_0 (call=4, rc=1, cib-update=10, confirmed=true) unknown error

Any thoughts?




andrew at beekhof

Jul 15, 2012, 9:29 PM

Post #5 of 5
Re: Cannot start the failed MySQL master as a new slave?

On Fri, Jul 13, 2012 at 2:21 PM, quanta <quanta.linux [at] gmail> wrote:
> As Patrick pointed here:
> http://serverfault.com/questions/405982/mysql-pacemaker-cannot-start-the-failed-master-as-a-new-slave
>
> I need to set the additional variables to tells the script its a
> master/slave resource.
> So, I've appended the following variables to my ~/.bash_profile:
>
> export OCF_RESKEY_CRM_meta_clone_max="2"
> export OCF_RESKEY_CRM_meta_role="Slave"
>
> Make it take effect . ~/.bash_profile and manually start mysql resource:

Do not, under any circumstances, define OCF_RESKEY_ variables in your
shell's profile.

These are set for you when you configure properties for the resource
in the cluster configuration. That you need to set them indicates
your config is broken.
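
Andrew's point can be made concrete. Pacemaker passes each `params` entry of a primitive to the agent as `OCF_RESKEY_<name>`, and meta attributes (for example `clone-max` from the `ms` definition) as `OCF_RESKEY_CRM_meta_<name>`, with dashes turned into underscores. The Python sketch below only mimics that mapping for a one-off debugging run whose variables live in the child process alone, never in a login profile. `build_ocf_env` and `run_agent` are hypothetical helpers, and Pacemaker sets further variables (such as `OCF_RESOURCE_INSTANCE`) that this sketch does not reproduce.

```python
import os
import subprocess
from typing import Dict, Mapping


def build_ocf_env(params: Mapping[str, str], meta: Mapping[str, str]) -> Dict[str, str]:
    """Approximate how Pacemaker exposes resource settings to an OCF agent."""
    env = {"OCF_ROOT": "/usr/lib/ocf"}
    for name, value in params.items():
        env[f"OCF_RESKEY_{name}"] = value
    for name, value in meta.items():
        # Meta attribute names use dashes; environment variable names use underscores.
        env[f"OCF_RESKEY_CRM_meta_{name.replace('-', '_')}"] = value
    return env


def run_agent(agent: str, action: str, env: Mapping[str, str]) -> int:
    """Run one agent action with the extra variables set for the child only."""
    result = subprocess.run([agent, action], env={**os.environ, **env})
    return result.returncode
```

For example, `run_agent("/usr/lib/ocf/resource.d/percona/mysql", "start", build_ocf_env({"config": "/etc/my.cnf"}, {"clone-max": "2"}))` leaves the login shell untouched. If a variable is needed there but missing under the cluster, the fix still belongs in the cluster configuration, as Andrew says.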
