doug at warner
Mar 5, 2012, 1:11 PM
Post #3 of 3
Re: LACP drops simultaneously across multiple switches/products/versions
So far we've received the same suggestion from F10 to increase the LACP timers,
and I agree that it basically means losing the feature we're trying to use.
Unfortunately I don't really have a whole lot of additional logging; I see the
LACP groups ungroup, RSTP changes, then LACP regroups, more RSTP changes, etc.
I *finally* got a CPU interrupt watchdog notice on my S50n stack, but I've
seen this over half a dozen times now with no other error messages.
I appreciate the anecdotal support that others are seeing the same thing.
On 03/05/2012 03:37 PM, Matt Hite wrote:
> You don't go into detail as to the log messages you see during the
> failure, so it's certainly hard to diagnose with anything but
> anecdote. However, here's my anecdote....
> I have encountered similar sporadic LACP issues across numerous
> switches on an extremely large scale. The best Force10 could suggest
> was to try using 30 second LACP heartbeat timers, presumably so their
> control plane had sufficient time to reply to heartbeat messages. To
> be honest, this particular scenario was not acceptable so I didn't
> even bother to validate if this actually "fixed" anything.
> This is pretty much why we dropped all our layer 2 link aggregation
> and moved to L3 ECMP load balancing across links.
> In my opinion, a lot of these problems are fundamental design issues
> with regard to control plane management.
> On Thu, Mar 1, 2012 at 6:34 AM, Doug Warner <doug [at] warner> wrote:
>> We're having a strange issue where LACP will bounce on multiple switches
>> simultaneously, typically several times in a row.
>> We previously would see this on our S50n stack when it was our core switch,
>> but it hadn't happened in over a year. Now that we have a C300 in addition to
>> the S50n stack we've seen it 5 times in 4 days.
>> What we've seen so far is two LACP groups from the C300 to our only two S55s
>> will bounce, then all the LACP groups on the C300 will bounce as well as all
>> the LACP groups on the S50n stack.
>> We don't get any CPU watchdog notices, and traces don't show that the LACP
>> process has restarted.
>> Has anyone experienced these types of problems? I have an open TAC case
>> currently but want to get others' experiences here.