
Mailing List Archive: Linux-HA: Users

Primary/Secondary + Secondary/Primary

 

 



wyldfury at gmail

Mar 11, 2008, 8:47 AM

Post #1 of 7
Primary/Secondary + Secondary/Primary

Hi,

I'm still testing a setup with 2 nodes and a primary partition on each
machine with the secondary on the opposite machine. I'm running drbd8
with Heartbeat 2.1.3 + CRM. It works fine if one of the machines fails
completely (holding the power button until it shuts off, or stopping Heartbeat).
I'm running into a problem when I just pull the network cable on one
machine and then plug it in again after a while. Obviously nothing has
failed as far as heartbeat is concerned so the "failed" node takes
back its primary drbd partition and causes a split brain when I plug
the cable back in. Would I need to add something like a pingd
primitive and base the promotion of a drbd partition on the result
from pingd?
Or would STONITH be what I need to look at? I've not looked at STONITH
at all yet.

If/when I use this in a production environment the machines are going
to be administered remotely so the machines need to sort this sort of
thing out from rules rather than intervention from me.

I have one other stupid question: once I've brought a failed node back
up and checked that it's OK, how do I switch the primary partition
back to the recovered node? Do I make the changes through drbdadm, or
with crm_resource and similar programs? Like I said, the machines would be
administered remotely so I can't use a GUI. I need to be able to do it
all from the command line.

Thanks
Guy

--
Don't just do something...sit there!
_______________________________________________
Linux-HA mailing list
Linux-HA[at]lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


pc at matrixonline

Mar 11, 2008, 9:08 AM

Post #2 of 7
Re: Primary/Secondary + Secondary/Primary

Guy wrote:
> Hi,
>
> I'm still testing a setup with 2 nodes and a primary partition on each
> machine with the secondary on the opposite machine. I'm running drbd8
> with Heartbeat 2.1.3 + CRM. It works fine if one of the machines fails
> completely (holding the power button until it shuts off, or stopping Heartbeat).
> I'm running into a problem when I just pull the network cable on one
> machine and then plug it in again after a while. Obviously nothing has
> failed as far as heartbeat is concerned so the "failed" node takes
> back its primary drbd partition and causes a split brain when I plug
> the cable back in. Would I need to add something like a pingd
> primitive and base the promotion of a drbd partition on the result
> from pingd?

Do you have a secondary communication line for Heartbeat, such as a serial
cable or a crossover network cable? If so, then you need to look into dopd.
With dopd, one node can tell the other that it has been outdated,
using Heartbeat's secondary communication channel.

Then, yes, you need some sort of pingd based rule so that the node that
has lost its network connections knows it is in a weaker state than the
other one.

With both these things together, you should be able to have a very
reliable cluster without the need for STONITH (but you will still need
to take care that, should the problem fix itself, the node doesn't come
back online and start messing things up).

Paul
(aka Gargoyle)


wyldfury at gmail

Mar 13, 2008, 2:01 AM

Post #3 of 7
Re: Primary/Secondary + Secondary/Primary

On 11/03/2008, Paul Court <pc[at]matrixonline.co.uk> wrote:
> Do you have a secondary communication line for Heartbeat, such as a serial
> cable or a crossover network cable? If so, then you need to look into dopd.
> With dopd, one node can tell the other that it has been outdated,
> using Heartbeat's secondary communication channel.

I do now. But I don't see much in the way of howtos or wiki entries
for either the Heartbeat peer outdater or the one that comes with drbd. Are
there any good docs on how it works and suchlike?

> Then, yes, you need some sort of pingd based rule so that the node that
> has lost its network connections knows it is in a weaker state than the
> other one.

I'm assuming dopd would need pingd to tell it whether it's the weaker
machine? I'm assuming I can do the whole dopd thing through crm as
with pingd?

Thanks
Guy



florian.schmidt at altroconsult

Mar 13, 2008, 2:52 AM

Post #4 of 7
Re: Primary/Secondary + Secondary/Primary

Kind regards

> On 11/03/2008, Paul Court <pc[at]matrixonline.co.uk> wrote:
> > Do you have a secondary communication line for Heartbeat, such as a serial
> > cable or a crossover network cable? If so, then you need to look into dopd.
> > With dopd, one node can tell the other that it has been outdated,
> > using Heartbeat's secondary communication channel.
>
> I do now. But I don't see much in the way of howto's or wiki entries
> for either the heartbeat peer outdater or the one with drbd. Are there
> any good docs on how it works and suchlike?

Here's something to read about dopd:
http://blogs.linbit.com/florian/2007/10/01/an-underrated-cluster-admins-companion-dopd/


> > Then, yes, you need some sort of pingd based rule so that the node that
> > has lost its network connections knows it is in a weaker state than the
> > other one.
>
> I'm assuming dopd would need pingd to tell it whether it's the weaker
> machine? I'm assuming I can do the whole dopd thing through crm as
> with pingd?

These are two different things:
dopd is for outdating the secondary's DRBD resources in case of a DRBD split brain, which is not the same as a Heartbeat split brain. It only works if there are Heartbeat communication lines that are separate from the DRBD connection.

pingd is for checking connectivity to ping nodes (such as routers or switches), and optionally for switching resources over when another cluster node has better connectivity.
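
For reference, the dopd side of this can be sketched roughly as follows, along the lines of the DRBD user's guide. The resource name r0 is a placeholder, and the /usr/lib/heartbeat paths are an assumption that depends on the distribution (some install to /usr/lib64/heartbeat), so check where your packages put them.

```
# /etc/ha.d/ha.cf (both nodes): start dopd and let it use the Heartbeat API
#   respawn hacluster /usr/lib/heartbeat/dopd
#   apiauth dopd gid=haclient uid=hacluster

# /etc/drbd.conf: "r0" is a placeholder resource name
resource r0 {
  handlers {
    # Called when the replication link is lost; asks dopd (over Heartbeat's
    # remaining communication paths) to mark the peer's data as outdated.
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  disk {
    # Enable resource-level fencing so the handler above is actually used.
    fencing resource-only;
  }
  # ... existing device, disk and address settings unchanged ...
}
```

An outdated resource refuses to become primary until it has resynchronized, which is what keeps a cut-off node from taking its partition back on its own.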

Regards
Florian


wyldfury at gmail

Mar 13, 2008, 4:24 AM

Post #5 of 7
Re: Primary/Secondary + Secondary/Primary

On 13/03/2008, Schmidt, Florian <florian.schmidt[at]altroconsult.de> wrote:
> Here's something to read about dopd:
> http://blogs.linbit.com/florian/2007/10/01/an-underrated-cluster-admins-companion-dopd/

I read that and then used the instructions from
http://www.drbd.org/users-guide/s-heartbeat-dopd.html.

> These are two different things:
> dopd is for outdating the secondary's DRBD resources in case of a DRBD split brain, which is not the same as a Heartbeat split brain. It only works if there are Heartbeat communication lines that are separate from the DRBD connection.
>
> pingd is for checking connectivity to ping nodes (such as routers or switches), and optionally for switching resources over when another cluster node has better connectivity.

I'm using the Heartbeat outdater at the moment. My setup uses two
NICs: one to the local network and one with a crossover cable to the
other node (the production machines will have three NICs: one
external/internet, one internal network and one crossover). When I
take out the network cable from node1 it fences the secondary
partition (/dev/drbd1) on node1 correctly, but the primary partition
(/dev/drbd0) on node1 becomes split-brain. Node2 in the meantime
keeps its primary partition (/dev/drbd1) and changes /dev/drbd0 to
primary just fine. So I still end up having to intervene manually to
fix the split brain of /dev/drbd0 on node1 before I can bring it back
in. After the manual intervention, node1 takes back control of
/dev/drbd0, and NFS etc. switches nodes successfully.

I need node1 to outdate both partitions since it has lost connectivity
so that they both sync from node2 when node1 returns. Then I can check
node1 and let it take back its primary drbd partition. (I'm still not
totally sure how to do this without just restarting heartbeat on
node1.)

I've attached my ha.cf, cib.xml and drbd.conf. I'd be grateful if someone
has a little time to skim them and point out any errors. Besides the split
brain when the network is lost, the setup seems to be working nicely.

Thanks for all the help and patience from some of you guys so far.

Thanks for any further help.
Guy

Attachments: ha.cf (11.1 KB)
  drbd.conf (1.84 KB)
  cib.xml (10.7 KB)


andreas.kurz at gmail

Mar 13, 2008, 5:47 AM

Post #6 of 7
Re: Primary/Secondary + Secondary/Primary

On Thu, Mar 13, 2008 at 12:24 PM, Guy <wyldfury[at]gmail.com> wrote:
> On 13/03/2008, Schmidt, Florian <florian.schmidt[at]altroconsult.de> wrote:
> > Here's something to read about dopd:
> > http://blogs.linbit.com/florian/2007/10/01/an-underrated-cluster-admins-companion-dopd/
>
> I read that and then used the instructions from
> http://www.drbd.org/users-guide/s-heartbeat-dopd.html.
>
>
> > These are two different things:
> > dopd is for outdating the secondary's DRBD resources in case of a DRBD split brain, which is not the same as a Heartbeat split brain. It only works if there are Heartbeat communication lines that are separate from the DRBD connection.
> >
> > pingd is for checking connectivity to ping nodes (such as routers or switches), and optionally for switching resources over when another cluster node has better connectivity.
>
> I'm using the Heartbeat outdater at the moment. My setup uses two
> NICs: one to the local network and one with a crossover cable to the
> other node (the production machines will have three NICs: one
> external/internet, one internal network and one crossover). When I
> take out the network cable from node1 it fences the secondary
> partition (/dev/drbd1) on node1 correctly, but the primary partition
> (/dev/drbd0) on node1 becomes split-brain. Node2 in the meantime
> keeps its primary partition (/dev/drbd1) and changes /dev/drbd0 to
> primary just fine. So I still end up having to intervene manually to
> fix the split brain of /dev/drbd0 on node1 before I can bring it back
> in. After the manual intervention, node1 takes back control of
> /dev/drbd0, and NFS etc. switches nodes successfully.

Compare the paths of the outdate-peer handlers in your drbd.conf ....
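
One way to rule out a path mismatch between resources is to define the handler once in a common section, which all resources inherit from in DRBD 8. A rough sketch, with r0 and r1 as placeholder resource names and the handler path an assumption to be checked against the installed package:

```
common {
  handlers {
    # Defined once here, so every resource uses the same path.
    outdate-peer "/usr/lib/heartbeat/drbd-peer-outdater";
  }
  disk {
    fencing resource-only;
  }
}

resource r0 {
  # No handlers section here, so the one from "common" applies.
  # ... existing per-resource settings ...
}

resource r1 {
  # ... existing per-resource settings ...
}
```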

Regards,
Andreas



wyldfury at gmail

Mar 13, 2008, 8:54 AM

Post #7 of 7
Re: Primary/Secondary + Secondary/Primary

Hi Andreas,

On 13/03/2008, Andreas Kurz <andreas.kurz[at]gmail.com> wrote:
> compare the paths of the outdate-peer handlers in your drbd.conf ....
>

I think I've found the problem. I had after-sb-0pri etc. all set to
disconnect, so it was doing exactly what I'd told it to do.
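
For comparison, the automatic split-brain recovery policies live in the net section of drbd.conf. Here is a sketch of a less strict setup than "disconnect" everywhere, using a placeholder resource name r0. These particular policies throw away one node's changes automatically, so whether they are acceptable depends on the data:

```
resource r0 {
  net {
    # Neither node is primary when the split brain is detected:
    # if one side made no changes, sync it from the other.
    after-sb-0pri discard-zero-changes;
    # One node is primary: discard the secondary's changes.
    after-sb-1pri discard-secondary;
    # Both nodes are primary: do not resolve automatically, stay disconnected.
    after-sb-2pri disconnect;
  }
  # ... existing settings unchanged ...
}
```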

The best way to learn is to do something stupid; I'm not likely to forget this one now.

Thanks!
Guy

