bicknell at ufp
Apr 16, 2012, 5:39 AM
Post #39 of 49
In a message written on Sun, Apr 15, 2012 at 09:54:14PM -0400, Luke S. Crawford wrote:
> On my current fleet (well under 100 servers) single bit errors are so rare
> that if I get one, I schedule that machine for removal from production.
In a previous life, in a previous time, I worked at a place that
had a bunch of Cisco routers with parity RAM. For the time, these boxes
had a lot of RAM, as they had distributed line cards each with their
own processor memory.
Cisco was rather famous for these parity errors, mostly because of
their stock answer: sunspots. The answer was in fact largely
correct, but it's just not a great response from a vendor. They
had a bunch of statistics though, collected from many of these
We ran the statistics, and given hundreds of routers, each with
many line cards, the math told us we should expect roughly one
router every 9-10 months to get a parity error from sunspots or
other random activity (i.e. not a failing RAM module with hundreds
of repeatable errors). This was, in fact, close to what we observed.
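
As a rough sketch of that arithmetic: the fleet size and cards-per-router
below are made-up placeholders (the real numbers were only "hundreds" and
"many"), and the function names are just for illustration. It assumes errors
on each card are independent and equally likely, and works backwards from
one fleet-wide event every ~9.5 months to the per-card rate that implies.

```python
def implied_per_unit_interval(units, fleet_interval_months):
    """Months between errors on a single unit, given how often the
    whole fleet sees one (assumes independent, identical units)."""
    return units * fleet_interval_months


def fleet_interval(units, per_unit_interval_months):
    """Months between errors somewhere in the fleet, given the
    per-unit interval (the inverse of the function above)."""
    return per_unit_interval_months / units


# Hypothetical fleet: these two figures are assumptions, not data.
routers = 300
cards_per_router = 8
cards = routers * cards_per_router

# Observed: roughly one random parity error every 9-10 months fleet-wide.
per_card_months = implied_per_unit_interval(cards, 9.5)
print(f"{cards} cards, one error per card every "
      f"{per_card_months / 12:.0f} years on average")

# The same per-card rate looks very different at different scales.
for n in (1, 100, cards, 10 * cards):
    print(f"{n:>6} cards: one error every "
          f"{fleet_interval(n, per_card_months):.1f} months")
```

With those placeholder numbers a single card would go many lifetimes
between errors, while a fleet ten times larger would see one roughly
every month.
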
This experience gave me two takeaways. First, single bit flips are
rare, but when you have enough boxes rare shows up often. It's
very similar for anyone with petabytes of storage: disks fail every
couple of days because there are so many of them. At the same time
a home user might not see a failure in their lifetime (of disk or
Second though, if you're running a business, ECC is a must because
the message is so bad. "This was caused by sunspots" is not a
response that inspires customer confidence, no matter how correct.
"We could have prevented this by spending an extra $50 on proper
RAM for your $1M box" is even worse.
A quick look at Newegg: 4GB DDR3 1333 ECC DIMM, $33.99; 4GB
DDR3 1333 non-ECC DIMM, $21.99. Savings: $12. (Yes, I realize the
motherboard also needs some extra circuitry, but I expect that costs
less than $1 in quantity.)
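
For anyone who wants to scale that up, a minimal sketch of the price
difference per box. The two prices are the Newegg figures quoted above;
the DIMM count per server is an assumption for illustration, and the
motherboard cost is left out.

```python
from decimal import Decimal

# Prices quoted above for 4GB DDR3 1333 DIMMs.
ECC_PRICE = Decimal("33.99")
NON_ECC_PRICE = Decimal("21.99")


def ecc_premium(dimms):
    """Extra cost of buying ECC instead of non-ECC for `dimms` modules."""
    return (ECC_PRICE - NON_ECC_PRICE) * dimms


print(f"Per DIMM: ${ecc_premium(1)}")

# Hypothetical server with 8 DIMM slots filled (an assumed configuration).
print(f"Per 8-DIMM server: ${ecc_premium(8)}")
```
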
Pretty much everyone I know values their data at more than $12 if it
Leo Bicknell - bicknell [at] ufp - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/