
doug at warner
Mar 5, 2012, 1:11 PM
Post #3 of 3
Re: LACP drops simultaneously across multiple switches/products/versions
So far we've received the same suggestion from F10 to increase the LACP
timers and I agree that it basically means losing the feature we're trying
to use.

Unfortunately I don't really have a whole lot of additional logging; I see
the LACP groups ungroup, RSTP changes, then LACP regroups, more RSTP
changes, etc. I *finally* got a CPU interrupt watchdog notice on my S50n
stack, but I've seen this over half a dozen times now with no other error
messages.

I appreciate the anecdotal support that others are seeing the same thing.

-Doug

On 03/05/2012 03:37 PM, Matt Hite wrote:
> You don't go into detail as to the log messages you see during the
> failure, so it's certainly hard to diagnose with anything but
> anecdote. However, here's my anecdote....
>
> I have encountered similar sporadic LACP issues across numerous
> switches on an extremely large scale. The best Force10 could suggest
> was to try using 30 second LACP heartbeat timers, presumably so their
> control plane had sufficient time to reply to heartbeat messages. To
> be honest, this particular scenario was not acceptable so I didn't
> even bother to validate if this actually "fixed" anything.
>
> This is pretty much why we dropped all our layer 2 link aggregation
> and moved to L3 ECMP load balancing across links.
>
> In my opinion, a lot of these problems are fundamental design issues
> with regards to control plane management.
>
> On Thu, Mar 1, 2012 at 6:34 AM, Doug Warner <doug [at] warner> wrote:
>> We're having a strange issue where LACP will bounce on multiple switches
>> simultaneously, typically several times in a row.
>>
>> We previously would see this on our S50n stack when it was our core switch,
>> but it hadn't happened in over a year. Now that we have a C300 in addition to
>> the S50n stack we've seen it 5 times in 4 days.
>>
>> What we've seen so far is two LACP groups from the C300 to our only two S55s
>> will bounce, then all the LACP groups on the C300 will bounce as well as all
>> the LACP groups on the S50n stack.
>>
>> We don't get any CPU watchdog notices, and traces don't show that the LACP
>> process has restarted.
>>
>> Has anyone experienced these types of problems? I have an open TAC case
>> currently but want to get others experiences here.
>>
>> -Doug