
Mailing List Archive: Linux-HA: Users

Antw: Bond mode for 2 node direct link

 

 



Ulrich.Windl at rz

Jul 16, 2012, 11:17 PM

Post #1 of 8 (292 views)
Antw: Bond mode for 2 node direct link

>>> Volker Poplawski <volker.poplawski [at] atrics> wrote on 16.07.2012 at 11:53 in
message <5003E4B3.5060508 [at] atrics>:
> Hello everyone.
>
> Could you please tell me the recommended mode for a bonded network
> interface, which is used as the direct link in a two machine cluster?
>
> There are 'balance-rr', 'active-backup', 'balance-xor' etc
> Which one to choose for maximum fault tolerance?

Hi!

Active-backup is probably the most fool-proof mode. We had some issues with balance-rr here. If your infrastructure can do it, I'd go for LACP; everything else looks like a work-around. (MHO)
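
For reference, the mode names used in this thread correspond to the numeric mode values documented for the Linux bonding driver. A minimal Python sketch of that mapping follows; the helper names `mode_id` and `mode_name` are made up for illustration and are not part of any tool.

```python
# Bonding mode names and their numeric values as documented for the
# Linux bonding driver. "802.3ad" is the LACP mode mentioned above.
BONDING_MODES = {
    "balance-rr": 0,
    "active-backup": 1,
    "balance-xor": 2,
    "broadcast": 3,
    "802.3ad": 4,
    "balance-tlb": 5,
    "balance-alb": 6,
}


def mode_id(name: str) -> int:
    """Return the numeric value for a mode name (KeyError if unknown)."""
    return BONDING_MODES[name.strip().lower()]


def mode_name(number: int) -> str:
    """Return the mode name for a numeric value (ValueError if unknown)."""
    for name, value in BONDING_MODES.items():
        if value == number:
            return name
    raise ValueError(f"unknown bonding mode number: {number}")
```

Either the name or the number can normally be given as the `mode` option of the bonding driver.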

Regards,
Ulrich

>
> I'm using corosync as cluster communication layer - if that matters
>
>
> Regards
> ....Volker
>
> _______________________________________________
> Linux-HA mailing list
> Linux-HA [at] lists
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>






misch at clusterbau

Jul 16, 2012, 11:47 PM

Post #2 of 8 (294 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

> >>> Volker Poplawski <volker.poplawski [at] atrics> wrote on 16.07.2012 at
> >>> 11:53 in
>
> message <5003E4B3.5060508 [at] atrics>:
> > Hello everyone.
> >
> > Could you please tell me the recommended mode for a bonded network
> > interface, which is used as the direct link in a two machine cluster?
> >
> > There are 'balance-rr', 'active-backup', 'balance-xor' etc
> > Which one to choose for maximum fault tolerance?
>
> Hi!
>
> Active-backup is probably the most fool-proof mode. We had some issues with
> balance-rr here. If your infrastructure can do it, I'd go for LACP;
> everything else looks like a work-around. (MHO)
>
> Regards,
> Ulrich

You have to get your fault-detection right. Independent of the mode.

I saw a lot of problems with miimon, depending on the network card and driver.
Just test and tweak it until it works.
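
One way to watch what the bonding driver itself believes while testing is to read its status file under /proc/net/bonding/. The following Python sketch parses the "Key: value" lines of that file. The sample text, the interface names and the function names are assumptions for illustration; real output contains more fields.

```python
from typing import Dict, List

# Shortened, illustrative sample in the layout of /proc/net/bonding/<bond>.
SAMPLE = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Link Failure Count: 0

Slave Interface: eth1
MII Status: down
Link Failure Count: 3
"""


def parse_bond_status(text: str) -> List[Dict[str, str]]:
    """Return one dict per slave holding the fields listed below it."""
    slaves: List[Dict[str, str]] = []
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = {"name": value}
            slaves.append(current)
        elif current is not None:
            # Lines before the first slave describe the bond itself; skip them.
            current[key] = value
    return slaves


def slaves_down(text: str) -> List[str]:
    """Names of all slaves whose MII status is not 'up'."""
    return [s["name"] for s in parse_bond_status(text)
            if s.get("MII Status") != "up"]
```

On a real host the text would come from something like `open("/proc/net/bonding/bond0").read()`, where `bond0` is an assumed interface name. Pulling a cable and checking whether the status and the failure count actually change is the kind of test meant above.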

Greetings,

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


misch at clusterbau

Jul 17, 2012, 12:14 AM

Post #3 of 8 (294 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

> >>> Volker Poplawski <volker.poplawski [at] atrics> wrote on 16.07.2012 at
> >>> 11:53 in
>
> message <5003E4B3.5060508 [at] atrics>:
> > Hello everyone.
> >
> > Could you please tell me the recommended mode for a bonded network
> > interface, which is used as the direct link in a two machine cluster?
> >
> > There are 'balance-rr', 'active-backup', 'balance-xor' etc
> > Which one to choose for maximum fault tolerance?

active/backup can lead to trouble when the servers are directly connected.

Imagine one node reboots and then uses the "other" interface as active. It
might then happen that both nodes detect both lines as "down" and close the
communication.
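
The scenario can be illustrated with a deliberately simplified model in Python. It assumes that cable i connects slave i on node A to slave i on node B, and that in active-backup only the active slave carries traffic while the inactive one discards what it receives. The function name and the model itself are illustrative assumptions, not a description of the driver's exact behaviour.

```python
from typing import Sequence


def can_communicate(active_a: int, active_b: int,
                    cable_up: Sequence[bool] = (True, True)) -> bool:
    """Simplified active-backup model for two directly connected nodes.

    active_a / active_b: index of the active slave on node A / node B.
    cable_up[i]: whether the cable between slave i on A and slave i on B works.
    The nodes reach each other only if both are active on the same working cable.
    """
    return active_a == active_b and cable_up[active_a]
```

With both cables intact, `can_communicate(0, 1)` is false in this model even though every link has carrier, which is why a pure link-state check would not notice the problem.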

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98


volker.poplawski at atrics

Jul 17, 2012, 2:11 AM

Post #4 of 8 (288 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

On 17.07.2012 09:14, Michael Schwartzkopff wrote:
>>>>> Volker Poplawski<volker.poplawski [at] atrics> wrote on 16.07.2012 at
>>>>> 11:53 in
>>
>> message<5003E4B3.5060508 [at] atrics>:
>>> Hello everyone.
>>>
>>> Could you please tell me the recommended mode for a bonded network
>>> interface, which is used as the direct link in a two machine cluster?
>>>
>>> There are 'balance-rr', 'active-backup', 'balance-xor' etc
>>> Which one to choose for maximum fault tolerance?
>
> active/backup can lead to trouble when the servers are directly connected.
>
> Imagine one node reboots and then uses the "other" interface as active. It
> might then happen that both nodes detect both lines as "down" and close the
> communication.

That's what I was wondering about with active-backup: whether there is an
order defined on the enslaved NICs, so that you have to connect the matching
ones on both sides, or whether this is somehow detected.

I figure I don't have that problem with balance-rr.



arnold at arnoldarts

Jul 17, 2012, 2:44 PM

Post #5 of 8 (279 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

On 17.07.2012 11:11, Volker Poplawski wrote:
> On 17.07.2012 09:14, Michael Schwartzkopff wrote:
>>>>>> Volker Poplawski<volker.poplawski [at] atrics> wrote on 16.07.2012 at
>>>>>> 11:53 in
>>> message<5003E4B3.5060508 [at] atrics>:
>>>> Could you please tell me the recommended mode for a bonded network
>>>> interface, which is used as the direct link in a two machine cluster?
>>>>
>>>> There are 'balance-rr', 'active-backup', 'balance-xor' etc
>>>> Which one to choose for maximum fault tolerance?
>> active/backup can lead to trouble when the servers are directly connected.
>> Imagine one node reboots and then uses the "other" interface as active. It
>> might then happen that both nodes detect both lines as "down" and close the
>> communication.
> That's what I was wondering about with active-backup: whether there is an
> order defined on the enslaved NICs, so that you have to connect the matching
> ones on both sides, or whether this is somehow detected.
> I figure I don't have that problem with balance-rr.

Additionally: if it's two direct links dedicated to your storage network,
there is no reason to go active/backup and discard half of the
available bandwidth.
Just use one of the real modes: balance-rr, balance-xor, 802.3ad or
maybe balance-alb. But the last one is only needed when there are
components involved that don't understand the bonding, which is not the
case when it's just two machines directly connected...
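
How two of these modes pick the outgoing slave can be sketched in a few lines of Python. balance-rr rotates over the slaves packet by packet. For balance-xor the sketch uses the formula documented for the default layer2 hash policy, (source MAC XOR destination MAC) modulo slave count; other xmit_hash_policy settings hash differently. The function names and MAC addresses are made up for illustration.

```python
from typing import List


def mac_to_int(mac: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' into an integer."""
    return int(mac.replace(":", ""), 16)


def rr_slaves(n_slaves: int, n_packets: int) -> List[int]:
    """balance-rr: consecutive packets leave via the slaves in rotation."""
    return [i % n_slaves for i in range(n_packets)]


def xor_slave(src_mac: str, dst_mac: str, n_slaves: int) -> int:
    """balance-xor, layer2 policy as documented:
    (source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % n_slaves
```

Under this formula the result depends only on the two MAC addresses, so between two directly connected machines the layer2 policy would keep choosing the same slave, while balance-rr alternates.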

Arnold
--
This email was created electronically and is valid without a handwritten
signature.


lmb at suse

Jul 17, 2012, 3:32 PM

Post #6 of 8 (280 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

On 2012-07-17T23:44:13, Arnold Krille <arnold [at] arnoldarts> wrote:

> Additionally: If it's two direct links dedicated to your storage network,
> there is no reason going active/backup and discarding half of the
> available bandwidth.

Since the system must be designed for one link to have adequate
bandwidth to provide the service (otherwise it could not actually cope
with one link failing), this shouldn't be a significant problem ;-)

> Just use one of the real modes, balance-rr, balance-xor, 802.3ad or
> maybe balance-alb. But the last one is only needed when there are
> components involved that don't understand the bonding. Which is not the
> case when it's just two machines directly connected...

They also increase the interdependency between the supposedly
independent network links. The simpler the better.

And for maximum FT, broadcast is the best choice.


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde



arnold at arnoldarts

Jul 18, 2012, 11:01 AM

Post #7 of 8 (274 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

On Wednesday 18 July 2012 00:32:16 Lars Marowsky-Bree wrote:
> On 2012-07-17T23:44:13, Arnold Krille <arnold [at] arnoldarts> wrote:
> > Additionally: If it's two direct links dedicated to your storage network,
> > there is no reason going active/backup and discarding half of the
> > available bandwidth.
> Since the system must be designed for one link to have adequate
> bandwidth to provide the service (otherwise it could not actually cope
> with one link failing), this shouldn't be a significant problem ;-)

That would mean that your system runs the same whether one or two links are
present. But fault-tolerant doesn't necessarily mean that performance is the
same in the clean and in the faulty state.

And selling "two links with double the throughput, which also keep working in
the fault state" sells better than "two links, but you won't notice them
except for slightly better fault tolerance". Then factor in the failure
probability of a direct link, compared to the correlated failure probability
of a two-port network card, where both links go down when the card breaks.
Two-port cards only protect against a single cable failure (with a direct link
in the same rack?) or a failure of the (physical) network drivers. When the
chip or the card's power supply fails, both links are down, no matter whether
you use active-backup, balance-rr, LACP, or broadcast...
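
The argument can be put into rough numbers with a small Python sketch. It looks at one node only, assumes independent failures, and uses made-up probabilities; the function names are hypothetical.

```python
def p_down_shared_card(p_cable: float, p_card: float) -> float:
    """Both links on one dual-port card: the bond is down if the card
    fails, or if the card works but both cables fail."""
    return p_card + (1.0 - p_card) * p_cable ** 2


def p_down_separate_cards(p_cable: float, p_card: float) -> float:
    """One card per link: a link is down if its cable or its card fails,
    and the bond is down only if both links are down."""
    p_link = 1.0 - (1.0 - p_cable) * (1.0 - p_card)
    return p_link ** 2


if __name__ == "__main__":
    # Purely illustrative figures, not measurements.
    print(p_down_shared_card(0.01, 0.001))
    print(p_down_separate_cards(0.01, 0.001))
```

With these illustrative figures the shared card dominates the result, regardless of which bonding mode runs on top of it.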

And when the scenario is the prototype of HA, one service provided in HA by
an active-backup setup of two machines that do nothing else, then I want the
interlink to be as reliable _and_ as fast as possible, so that I don't lose the
last bit of information because the disk mirroring hasn't pushed out the data
fast enough over only one 1 Gbit link where two were available.

Have fun,

Arnold


lmb at suse

Jul 18, 2012, 1:15 PM

Post #8 of 8 (273 views)
Re: Antw: Bond mode for 2 node direct link [In reply to]

On 2012-07-18T20:01:35, Arnold Krille <arnold [at] arnoldarts> wrote:

> That would mean that your system runs the same whether one or two links are
> present.

That's not what I said. What I said (or at least meant ;-) is that, even
in the degraded state, the performance must still be within acceptable
range.

Hence, the performance boost provided by actually utilizing the
redundancy during the fault-free phase can't be critical but only
optional. (Or put differently: nice, but not required.)
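
As a sizing rule this amounts to a simple check, sketched here in Python. The function name, the units and the example figures are assumptions for illustration.

```python
def acceptable_when_degraded(required_mbit: float, link_mbit: float,
                             n_links: int, failed: int = 1) -> bool:
    """True if the links that remain after `failed` link failures still
    provide at least the bandwidth the service requires."""
    remaining = link_mbit * max(n_links - failed, 0)
    return remaining >= required_mbit
```

For example, a service needing 800 Mbit/s over two 1000 Mbit/s links passes the check, while one needing 1500 Mbit/s depends on both links and therefore does not tolerate a link failure.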

> And selling "two with double throughput but it also works when in fault-state"
> sells better than "two but you won't see it except for a slightly better
> fault-tolerance". And then count in the failure-probability of a direct link.

Agreed. But the question was what provides maximum fault tolerance.

I've seen too many cases where link down events were not detected
through bonding (because of intermittent switches or weird failure
modes). I'd be much happier if instead of dumb bonding something like
OSPF was used across the hosts ;-)

FWIW, heartbeat used to broadcast its traffic all the time.

> And when the scenario is the prototype of HA: one service provided in HA with
> an active-backup-setup of two machines that do nothing else? then I want the
> interlink to be as reliable _and_ as fast as possible so I don't lose the
> last bit of information because the disk-mirroring hasn't pushed out the data
> fast enough because of using only one 1GB link where two were available.

You don't lose it, because mirroring is not asynchronous, and writes
aren't confirmed before fsync() et al. return. The performance impact is,
of course, granted.
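
From the application side, the point about fsync() looks roughly like the following Python sketch. It only shows the write-then-fsync pattern. That the underlying block device is mirrored synchronously is an assumption about the setup, and the function name `durable_write` is made up.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> int:
    """Write data to path and return only after fsync() has completed.

    On a synchronously mirrored device (an assumption here), a slower
    interlink makes this call take longer, but data is not reported as
    written before the storage stack has confirmed it.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        os.fsync(fd)
    finally:
        os.close(fd)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(durable_write(os.path.join(tmp, "demo.bin"), b"payload"))
```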


Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde

