Mailing List Archive: Linux-HA: Users

MMM conflict with Pacemaker

 

 



marcus at synchromedia

Feb 16, 2012, 1:17 AM

Post #1 of 5 (706 views)
MMM conflict with Pacemaker

I have 5 servers where 2 are running a redundant web front-end with pacemaker (managing a single floating IP), two are running MySQL with mmm agents and the last one is running the mmm monitor node. So at present there is no overlap between these groups. I need to retire one of the web servers and its functions will be moved to the machine currently doing mmm monitoring. Easier said than done.
If I install pacemaker (from the linux-ha PPA for Lucid, with an empty initial config, as per the docs) and start its corosync service, mmm's monitor goes nuts, loses connectivity to the agents and causes them to drop their floating IP (even though it's not on the machines involved with pacemaker). I can appreciate that there is some overlap in functionality, but I don't see why it should conflict like this. Anyone got an explanation? Is anyone else running this combo?

I've temporarily bypassed the front-end so I can work on this, so I'm clear to start entirely from scratch. This is proving difficult too, since the shifting terminology means documentation is mostly out of sync - of the three guides I've tried so far, one doesn't mention ha.cf at all (others do, but with obsolete options), one suggests doing everything with corosync (though appears to be missing any config for pacemaker). One thing that would be very helpful is something to explain the relative merits of ucast, bcast and mcast options, as I suspect they may be part of the problem I'm seeing with mmm.

(and I'm not looking to switch to DRBD!)

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info [at] han CRM solutions
marcus [at] synchromedia | http://www.synchromedia.co.uk/



_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


mark at grennan

Feb 16, 2012, 9:00 AM

Post #2 of 5 (664 views)
Re: MMM conflict with Pacemaker

Hi Marcus,

One issue I can think of is that Pacemaker wants to bind the floating IP as eth#:#, while MMM uses a different method that can only be seen with the ip command. I think they are fighting over who owns the floating IP.

Have you read my full HOWTO at http://www.mysqlfanboy.com/2012/02/the-full-monty-version-2-3/ ?

Yes, HA systems are very confusing. Pacemaker is the name of an older application. Corosync is its new name, but some of the files still keep the old name.

You should know that even the developer of MMM has abandoned it. The author of MMM (Alexey Kovyrin) said in a reply on Baron Schwartz's blog: “…Every time I try to add HA to my clusters I remember MMM and want to stab myself because I simply could not trust my data to the tool…”. See http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/

Pacemaker is the way to go. But, yes, it is difficult. I hope my HOWTO helps.




marcus at synchromedia

Feb 16, 2012, 10:30 AM

Post #3 of 5 (676 views)
Re: MMM conflict with Pacemaker

On 16 Feb 2012, at 18:00, Mark Grennan wrote:

> Yes HA systems are very confusing.

It's not so much that - it's more that heartbeat/crm/pacemaker/corosync is confusing, not least because it keeps changing its name. Constant changing of names, nomenclature and config settings guarantees that any articles written about it won't work for long.

> Pacemaker is the name of an older application. Corosync is its new name, but some of the files still keep the old name.

Huh? So why does corosync need setting up to work with pacemaker if it is now pacemaker? Even your doc installs them (and heartbeat) from separate packages!

> One Issue I can think of is, Pacemaker wants to bind the floating IP as eth#:#, while MMM wants to use a different method that can only be seen with the IP command. I think they are fighting over who owns the floating IP.

But pacemaker isn't even running on the machines the mmm float is on! It's somehow interfering with the monitoring node, not the float that it's managing. I don't have a problem with using the ip command - I was under the impression it's how things are supposed to be done now? I've seen mixtures of ifconfig-style network config coexisting quite happily with ip-style ones before.
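To illustrate the two styles being discussed: an address added with the ip command and no label does not show up as an eth#:# alias, but it is listed by `ip -o -4 addr show`. A minimal Python sketch, assuming the usual one-line-per-address output of that command; the sample text, interface names and addresses are invented for illustration and are not from my servers:

```python
import re

# Hypothetical sample of `ip -o -4 addr show` output (all addresses invented).
SAMPLE = "\n".join([
    r"1: lo    inet 127.0.0.1/8 scope host lo\       valid_lft forever preferred_lft forever",
    r"2: eth0    inet 192.168.1.10/24 brd 192.168.1.255 scope global eth0\       valid_lft forever preferred_lft forever",
    r"2: eth0    inet 192.168.1.100/32 scope global eth0\       valid_lft forever preferred_lft forever",
    r"2: eth0    inet 192.168.1.101/24 brd 192.168.1.255 scope global secondary eth0:0\       valid_lft forever preferred_lft forever",
])

LINE = re.compile(r"^\d+:\s+(\S+)\s+inet\s+([\d.]+)/(\d+)\s+(.*)$")


def list_addresses(text):
    """Return (device, address, label) for every IPv4 address in the output."""
    found = []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue
        dev, addr, _prefix, rest = match.groups()
        # The label is the last token before the backslash; it equals the
        # device name unless an ifconfig-style alias such as eth0:0 was used.
        label = rest.split("\\")[0].split()[-1]
        found.append((dev, addr, label))
    return found


def unlabelled_extras(text):
    """Extra addresses on a device that carry no alias label."""
    seen, extras = set(), []
    for dev, addr, label in list_addresses(text):
        if dev in seen and label == dev:
            extras.append((dev, addr))
        seen.add(dev)
    return extras


print(unlabelled_extras(SAMPLE))
```

Running the same functions on the real output from each node would show which floats are held as plain secondary addresses and which as labelled aliases.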

My original config:

server1: pacemaker
server2: pacemaker
server3: mmm monitor
server4: mmm agent
server5: mmm agent

There is a floating IP on servers 1 and 2, and another one on servers 4 and 5.

What I want to change to:

server2: pacemaker
server3: pacemaker + mmm monitor
server4: mmm agent
server5: mmm agent

Here there is a floating IP on 2 and 3, and another on 4 and 5. I don't see any reason they should conflict since there is no overlap of machines that floats are on. What seems to happen is that as soon as corosync is started, the mmm monitor can no longer see the network at all. I suspect this could be something to do with the suggested setting of using the network address for bindnetaddr in corosync.
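For reference, the "network address" that the guides suggest for bindnetaddr can be derived from a host IP and its netmask. A small Python sketch using the standard ipaddress module; the addresses are placeholders, not my real ones, and this only shows the arithmetic, not whether that setting is the cause of the problem:

```python
import ipaddress


def bind_net_addr(host_ip, netmask):
    """Return the network address for a host IP and netmask."""
    iface = ipaddress.ip_interface(f"{host_ip}/{netmask}")
    return str(iface.network.network_address)


# Placeholder addresses for illustration only.
print(bind_net_addr("192.168.1.23", "255.255.255.0"))   # 192.168.1.0
print(bind_net_addr("10.0.5.130", "255.255.255.192"))   # 10.0.5.128
```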

I'm still mystified by whether I should use ucast, mcast or bcast - previous setups I've done with crm have used ucast. I see in your example you're binding to a private IP for corosync, but I can't understand why you're using a public IP for mcast, or why it's even there at all.

Your guide wasn't one of the ones I'd found, so thanks for the pointer. The most interesting one for me was this one, since it is closest to my own config and seems quite recent (i.e. it even mentions corosync): https://wiki.ubuntu.com/ClusterStack/LucidTesting
The official 'cluster from scratch' PDF skips over quite a few bits of vital info, so I found I couldn't really use it.

My mmm config was originally installed by Percona, and I've done several others since. mmm has always worked beautifully for me (even through multiple hardware and network failures), and the main complaint I've seen about it (1062 errors) is nothing to do with mmm. I fully understand that it has problems, however it has the advantage of being very stable and trivially easy to understand and configure. While I keep reading good things about pacemaker, the practical aspects of getting it to work have always turned into a yak-shaving festival, so I've always been put off pursuing it for anything beyond management of a single IP. One critical aspect of an HA system is that it should be really easy to deal with when things go wrong; I'd put xtrabackup in this category - it's great (though I hope you have automated tests for your restores as it went through a patch late last year when they were broken!).

Marcus
--
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info [at] han CRM solutions
marcus [at] synchromedia | http://www.synchromedia.co.uk/





andrew at beekhof

Feb 17, 2012, 2:27 AM

Post #4 of 5 (666 views)
Re: MMM conflict with Pacemaker

On Fri, Feb 17, 2012 at 4:00 AM, Mark Grennan <mark [at] grennan> wrote:
> Hi Marcus,
>
> One Issue I can think of is, Pacemaker wants to bind the floating IP as eth#:#, while MMM wants to use a different method that can only be seen with the IP command. I think they are fighting over who owns the floating IP.
>
> Have you read my full HOWTO at http://www.mysqlfanboy.com/2012/02/the-full-monty-version-2-3/ ?
>
> Yes, HA systems are very confusing. Pacemaker is the name of an older application. Corosync is its new name, but some of the files still keep the old name.

Not quite. Pacemaker uses Corosync to send messages to instances of
itself on other nodes.

Recommended reading:
http://theclusterguy.clusterlabs.org/post/1262495133/pacemaker-heartbeat-corosync-wtf



mark at grennan

Feb 17, 2012, 8:29 AM

Post #5 of 5 (675 views)
Re: MMM conflict with Pacemaker

Did I say all this is confusing? Here is a really great talk on HA. There is a good history of Linux HA and Pacemaker starting at about 9:00 minutes in. Some parts of these systems have been broken out into projects of their own, while others have been combined. Cluster resources are at the heart of what you want done, but this is also where some of the smoke and mirrors in these packages lies. Some just call your init scripts (/etc/init.d), some call their own init scripts (Heartbeat), and still others expect you to write your own.

> But pacemaker isn't even running on the machines the mmm float is on!

Remember that MMM, and MHA for that matter, use SSH with certs to reach out and run stuff on remote systems. Where pacemaker or MMM is running, and what can be affected, is a bit tricky.


> ...I can't understand why you're using a public IP for mcast, or why it's even there at all....

There are many refinements I could make to my setup document. I took some shortcuts to help people just get it working, but not so short that they would have to rebuild the system to make it better. Networking is one of them. I like to use multiple network interfaces to isolate the database traffic from all the "systems" traffic. Multiple NICs are also good because pacemaker can check through a different path.

> I'm still mystified by whether I should use ucast, mcast or bcast...

If I am using a crossover cable to connect two hosts together, I just broadcast the heartbeat out of the appropriate interface (bcast eth3). If there are more than two hosts in the Pacemaker cluster on the same private network, I use mcast.
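That rule of thumb can be written down as a tiny Python sketch. The function name and parameters are made up for illustration, and it covers only the two cases I described; anything else returns None rather than guessing:

```python
def suggest_transport(node_count, crossover_link=False):
    """Encode the rule of thumb above; returns None when it does not apply."""
    if node_count < 2:
        raise ValueError("a cluster needs at least two nodes")
    if node_count == 2 and crossover_link:
        return "bcast"   # two hosts joined by a crossover cable
    if node_count > 2:
        return "mcast"   # several hosts on the same private network
    return None          # e.g. two hosts on a shared LAN: not covered here


print(suggest_transport(2, crossover_link=True))  # bcast
print(suggest_transport(4))                       # mcast
```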

> ...can't understand why you're using a public IP for mcast...

I'm using mcast because it's the best way to talk to multiple nodes and I expect some people will try that. 239.255.42.0 is not a public IP (http://tldp.org/HOWTO/Multicast-HOWTO-2.html). The range 224.0.0.0 - 239.255.255.255 is reserved for multicast, and within it the range 239.0.0.0 to 239.255.255.255 is reserved for administrative scoping.
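If you want to check an address against those ranges yourself, here is a short Python sketch using the standard ipaddress module. The classify function and its labels are just illustrative names; the two networks are the ranges quoted above:

```python
import ipaddress

MULTICAST = ipaddress.ip_network("224.0.0.0/4")      # 224.0.0.0 - 239.255.255.255
ADMIN_SCOPED = ipaddress.ip_network("239.0.0.0/8")   # 239.0.0.0 - 239.255.255.255


def classify(addr):
    """Say whether an IPv4 address falls in the multicast ranges above."""
    ip = ipaddress.ip_address(addr)
    if ip in ADMIN_SCOPED:
        return "administratively scoped multicast"
    if ip in MULTICAST:
        return "multicast"
    return "not multicast"


print(classify("239.255.42.0"))   # administratively scoped multicast
print(classify("224.0.0.1"))      # multicast
print(classify("192.168.1.10"))   # not multicast
```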

> My mmm config was originally installed by Percona, and I've done several others since.

MMM was the way to go until just recently. If it's working for you, keep using it. But it may already be at its end of life. Here is another resource (http://technocation.org/content/oursql-episode-67%3A-ha-and-replication) on what's been happening.

> One critical aspect of an HA system is that it should be really easy to deal with when things go wrong;

This may be the biggest problem with HA/MySQL systems. If you can't fix it when it breaks, what good is it? And complicated is the enemy of reliability.

