
Mailing List Archive: Linux-HA: Users

Interesting scenario, is there a solution?

 

 



davies147 at gmail

Sep 21, 2010, 8:06 AM

Interesting scenario, is there a solution?

Hi,

I have a very simple setup with 2 nodes, using the basic resource
manager. We are space-constrained on the servers, so we cannot easily
install the many dependencies of the more complex resource managers,
but our needs are simple :)

- Start with 2 running servers in master/slave mode, and all happy.
- Kill the master (A).
- The slave (B) is coming up
- Some transient issue prevents the RC scripts running on (B).
- (B) backs down and requests to become slave again
- (A) is down, so (B) never gets confirmation of its slave request.

Nothing more happens. A is down and B is sulking!

Can a node be persuaded to retry under these circumstances?

Perhaps there is a way to identify this odd intermediate state so we
can force a heartbeat restart or reinitialise?

Thanks for any pointers.

Regards,
Steve
_______________________________________________
Linux-HA mailing list
Linux-HA [at] lists
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


dmaziuk at bmrb

Sep 22, 2010, 9:42 AM

Re: Interesting scenario, is there a solution?

On 9/21/2010 10:06 AM, Steve Davies wrote:

> - Kill the master (A).
> - The slave (B) is coming up
> - Some transient issue prevents the RC scripts running on (B).
> - (B) backs down and requests to become slave again
> - (A) is down, so (B) never gets confirmation of its slave request.
>
> Nothing more happens. A is down and B is sulking!
>
> Can a node be persuaded to retry under these circumstances?

Generally, no: there is no way to know how "transient" the "issue" is.
E.g. if a backhoe ate your uplink fiber and telco techies will fix the
cut "in a day or two" -- do you want the other node retrying for a day?
Or two? Or a week?

> Perhaps there is a way to identify this odd intermediate state so we
> can force a heartbeat restart or reinitialise?

What I have is a separate Nagios setup that monitors the cluster IP and
services and sends me nastygrams if they disappear.
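
For illustration only, an external check of this kind could look roughly
like the sketch below. It is not the actual setup described above: the
script name, the choice of a plain TCP connect, and the timeout are all
assumptions. It only follows the usual Nagios plugin convention of
printing one status line and returning 0 for OK, 2 for CRITICAL and 3
for UNKNOWN.

```python
#!/usr/bin/env python3
"""Sketch of a Nagios-style check: is a service reachable on the cluster IP?"""
import socket
import sys

# Conventional Nagios plugin exit codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_tcp(host, port, timeout=5.0):
    """Try a TCP connect to host:port and return (exit_code, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} is accepting connections"
    except OSError as exc:
        # Covers refused connections, timeouts and unreachable hosts.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable: {exc}"


def main(argv):
    """Parse 'HOST PORT' from argv, print one status line, return the code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_cluster_ip.py HOST PORT")
        return UNKNOWN
    try:
        port = int(argv[2])
    except ValueError:
        print(f"UNKNOWN - port must be an integer, got {argv[2]!r}")
        return UNKNOWN
    code, message = check_tcp(argv[1], port)
    print(message)
    return code
```

To run it as a plugin, end the file with `sys.exit(main(sys.argv))` and
point it at the cluster IP and a service port. The monitoring host then
alerts a human when the address stops answering, rather than having the
node retry on its own.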

Dima