jeff.cleverley at avagotech
Apr 25, 2012, 2:55 PM
We upgraded a pair of 6080s and 6040s from 7.3.3P5 to 126.96.36.199P4 at the
first of the year. Within 1 hour of the upgrade, our OpenNMS server
started getting timeouts for SNMP polling. We saw ~130 timeouts per head
on the 6080s over the next 2 days, versus 0 in the prior 2 months that I
checked. We also noticed the average CPU load on each head went from
~40% to close to 60% utilization, comparing December to January.
Any time the cluster is in failover mode, we get quite a few customer
complaints about slowness. While I can't say for certain we didn't
have this slowness before, none of us remembers any complaints from
earlier failovers. Obviously, trying to run at 120% utilization on 1
filer instead of 80% will cause this issue :-) The 6040 cluster did
not give the polling errors, but it looks like the CPU load is higher
on those heads as well. Their load is usually low enough to gracefully
cover a cluster failover.
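The failover arithmetic above can be sketched as a quick back-of-the-envelope check. This is illustrative only: the ~40%/~60% per-head figures come from this post, and the function names are mine, not anything from ONTAP or OpenNMS.

```python
# Back-of-the-envelope check of whether one head can absorb its
# partner's load during a cluster failover. Illustrative only;
# ignores takeover overhead and assumes CPU load adds linearly.

def failover_load(head_a_pct: float, head_b_pct: float) -> float:
    """Approximate CPU load on the surviving head when it takes
    over its partner's work."""
    return head_a_pct + head_b_pct

def survives_failover(head_a_pct: float, head_b_pct: float,
                      capacity_pct: float = 100.0) -> bool:
    """True if the combined load still fits on one head."""
    return failover_load(head_a_pct, head_b_pct) <= capacity_pct

# Before the upgrade: two 6080 heads at ~40% each -> ~80% on one head.
print(survives_failover(40, 40))   # fits within one head

# After the upgrade: ~60% each -> ~120%, which one filer can't serve.
print(survives_failover(60, 60))   # exceeds one head's capacity
```

The point of the sketch is simply that the pre-upgrade load left headroom for a takeover, while the post-upgrade load did not.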
I've had a case open since the first of the year, and it looks like
we are now out of options and ideas, with no explanation or resolution.
At this point our plan is to roll back to the 7.3.3P5 OS, since it
seemed to behave better. With no identifiable problem or solution, we
are unwilling to jump to a newer OS, since we have no reason to
believe the issue isn't inherited by later revisions. Rolling back to
a known-good OS seems to make the most sense. We will roll the 6080
cluster back first and see how it works out before we roll back the
6040 cluster.
My question to the group: has anyone else had a similar issue and
found that jumping to a newer ONTAP release fixed it?
The vast majority of our data access is via NFS from RHEL5 clients and
Unix Systems Administrator
4380 Ziegler Road
Fort Collins, Colorado 80525
Toasters mailing list
Toasters [at] teaparty