
Mailing List Archive: NANOG: users

Most energy efficient (home) setup

 

 



mysidia at gmail

Apr 14, 2012, 2:26 PM

Post #26 of 49
Re: Most energy efficient (home) setup

On Wed, Feb 22, 2012 at 3:48 PM, Joe Greco <jgreco [at] ns> wrote:
> The current Mac mini "Server" model sports an i7 2.0GHz quad-core CPU
> and up to 16GB RAM (see OWC for that, IIRC).  Two drives, up to 750GB
> each, or SSD's if you prefer.

The Mac mini server is quite intriguing with that low power
requirement. Unfortunately... 16 GB of _non-ECC_ memory. I sure
would not want to run a NAS VM on a server with non-ECC memory that
cannot correct single-bit errors, at least not with any data I cared
much about.

When you have such a large quantity of RAM, single-bit/fade errors
caused by background radiation happen regularly, even though the
per-bit rate is fairly low. Usually on a workstation it's not an issue,
because there is not a massive quantity of idle memory.

If you're running this 24x7 with VMs and non-ECC memory, it's only a
question of time before silent memory corruption shows up in one of
the VMs.

And silent memory corruption can make its way to the filesystem, or
applications' internal saved data structures (such as the contents
of a VM's registry database).

True, this can be partially mitigated with backups; but the idea of VMs
blue-screening or ESXi crashing with a purple screen every 3 or 4
months sounds annoying.

> 12 frickin' watts when idle.  Or thereabouts.  Think about 40 watts
> when running full tilt, maybe a bit more.

--
-JH


jgreco at ns

Apr 14, 2012, 11:46 PM

Post #27 of 49
Re: Most energy efficient (home) setup

> On Wed, Feb 22, 2012 at 3:48 PM, Joe Greco <jgreco [at] ns> wrote:
> > The current Mac mini "Server" model sports an i7 2.0GHz quad-core CPU
> > and up to 16GB RAM (see OWC for that, IIRC).  Two drives, up to 750GB
> > each, or SSD's if you prefer.
>
> The Mac mini server is quite intriguing with that low power
> requirement. Unfortunately... 16 GB _Non-ECC_ memory. I sure
> would not want to run a NAS VM on a server with non-ECC memory that
> cannot correct single-bit errors, at least with any data I cared
> much about..
>
> When you have such a large quantity of RAM, single-bit/fade errors
> caused by background irradiation happen often, although at a fairly
> low rate. Usually on a workstation it's not an issue, because there
> is not a massive quantity of idle memory.
>
> If you're running this 24x7 with VMs and Non-ECC memory, it's only a
> question of time,
> before silent memory corruption results in one of the VMs.
>
> And silent memory corruption can make its way to the filesystem, or
> applications' internal saved data structures (such as the contents
> of a VM's registry database).
>
> True, this can be partially mitigated with backups; but the idea of VMs
> blue-screening or ESXi crashing with purple screen every 3 or 4
> months sounds annoying.

While I don't disagree with the general thought, one could also say
it's just a matter of time before your server's power supply fails, or
a fan fails, or a hard drive fails.

Since we don't hear about Mac mini server users screaming about how
their servers are constantly crashing, the severity and frequency of
memory corruption events may not be anywhere near what you suggest.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


Valdis.Kletnieks at vt

Apr 15, 2012, 2:54 AM

Post #28 of 49
Re: Most energy efficient (home) setup

On Sun, 15 Apr 2012 01:46:29 -0500, Joe Greco said:

> Since we don't hear about Mac mini server users screaming about how
> their servers are constantly crashing, the severity and frequency of

Googling for 'mac mini server crash' gets about 11.6M hits. I gave up after
10 pages of results, but up till that point most did in fact seem to be about
crashes on Mac mini servers (the mail you replied to was on page 8 at
the time).

> memory corruption events may not be anywhere near what you suggest.

"the severity and frequency of *noticed* memory corruption events".

FTFY.

(Keep in mind that if the box doesn't have ECC or at least parity, you *won't
know* you had a bit flip until you dereference that memory location. At which
point if you're *lucky* you'll get a random crash that forces you to reboot right
away. If you're unlucky, you won't notice till you try to re-mount the disks after
a reboot 2-3 months later....)


george.herbert at gmail

Apr 15, 2012, 4:28 AM

Post #29 of 49
Re: Most energy efficient (home) setup

With RAID 4, the parity disk IOPS on write will rate-limit the whole LUN...

No big deal on a 4-drive LUN; terror on a 15-drive LUN...
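
A rough way to see the bottleneck, as a sketch and not a model of any real array: count writes per disk when small writes are spread round-robin over the data disks of a RAID 4 LUN. The function name and the round-robin workload are assumptions for illustration, and a real small write is a read-modify-write that also reads the old data and old parity, which only makes the parity disk busier.

```python
from collections import Counter

def raid4_write_counts(n_drives, n_writes):
    """Count writes per disk for small writes on a RAID 4 LUN.

    Assumes one dedicated parity disk and small writes spread
    round-robin over the data disks (an illustrative workload).
    """
    n_data = n_drives - 1
    counts = Counter()
    for i in range(n_writes):
        counts[f"data{i % n_data}"] += 1  # the data block itself
        counts["parity"] += 1             # parity is rewritten on every write
    return counts

for drives in (4, 15):
    c = raid4_write_counts(drives, 4200)
    busiest_data = max(v for k, v in c.items() if k != "parity")
    print(f"{drives} drives: parity writes={c['parity']}, "
          f"busiest data disk writes={busiest_data}")
```

The ratio between the parity disk and any single data disk grows with the number of data disks, which is the 4-drive versus 15-drive difference.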


George William Herbert
Sent from my iPhone

On Apr 14, 2012, at 8:04, Chris Adams <cmadams [at] hiwaay> wrote:

> Once upon a time, Jeroen van Aart <jeroen [at] mompl> said:
>> There may be a performance penalty using raid4, because it uses one
>> parity disk. Although that system looks like it can be useful for some
>> purposes it looks less ideal for home use. Also I don't see how it would
>> allow you to install your own OS.
>
> For read-mostly storage, there's no penalty as long as there's no disk
> failure. The parity drive wouldn't even spin up for reads.
> --
> Chris Adams <cmadams [at] hiwaay>
> Systems and Network Administrator - HiWAAY Internet Services
> I don't speak for anybody but myself - that's enough trouble.
>


cmorris at cs

Apr 15, 2012, 8:45 AM

Post #30 of 49
Re: Most energy efficient (home) setup

>> And silent memory corruption can make its way to the filesystem, or
>> applications' internal saved data structures (such as the contents
>> of a VM's registry database).

> Since we don't hear about Mac mini server users screaming about how
> their servers are constantly crashing, the severity and frequency of
> memory corruption events may not be anywhere near what you suggest.
>

ECC is an absolute MUST. Case closed,
unless you like corrupt encryption keys that blow away an entire volume.


mysidia at gmail

Apr 15, 2012, 8:52 AM

Post #31 of 49
Re: Most energy efficient (home) setup

On Sun, Apr 15, 2012 at 1:46 AM, Joe Greco <jgreco [at] ns> wrote:
> Since we don't hear about Mac mini server users screaming about how
Do you hear of lots of Mac mini server users loading up 16GB of RAM?
----
> it's just a matter of time before your server's power supply fails, or

The difference is power supplies don't fail nearly as often as 1-bit
DRAM errors, except when subject to harsh conditions. HDD errors
are comparably rare also; and yet, the drive surface of any HDD has
error correction codes, because disk surfaces are subject to similar
problems.

Consumer desktop hard drives use non-ECC memory inside the drive for
the cache/buffer memory, to save $$$. But it's typically only
12MB or so of memory, so it's approximately 300 days before you
have a 50% chance of a single bit error caused by background
radiation. Those are good odds; nevertheless, people get corrupted
files, so maybe they aren't that good.

Consider the probability that 16GB of SDRAM experiences at least one
single-bit error, at sea level, in a given 6 hour period:
1 - (1 - 1.3e-12 * 6)^(16 * 2^30 * 8), which is about 66%. In any given
24 hour period, the probability of at least one single-bit error
exceeds 98%. That assumes the memory is good and functioning correctly.

It's expected to see on average approximately 3 to 4 1-bit errors
per day. More are frequently seen.

Now if most of this 16GB of memory is unused, you will never notice
that over 30 days, 120 or so bits have been flipped from their
proper value..
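
As a sanity check on this arithmetic, here is a minimal Python sketch. The 1.3e-12 figure is taken from the formula and is assumed to mean upsets per bit per hour; the function names are made up for illustration.

```python
import math

BITS = 16 * 2**30 * 8   # 16 GB expressed in bits
RATE = 1.3e-12          # assumed upsets per bit per hour (from the formula)

def p_at_least_one(hours, bits=BITS, rate=RATE):
    """Probability of at least one bit flip among `bits` within `hours`."""
    # 1 - (1 - p)^N, written with log1p/expm1 so the tiny p keeps its precision
    return -math.expm1(bits * math.log1p(-rate * hours))

def expected_flips(hours, bits=BITS, rate=RATE):
    """Mean number of flipped bits within `hours`."""
    return bits * rate * hours

for hours in (6, 24, 24 * 30):
    print(f"{hours:4d} h: P(at least one) = {p_at_least_one(hours):.4f}, "
          f"expected flips = {expected_flips(hours):.1f}")
```

With these inputs the 6 hour probability comes out at roughly 66%, the 24 hour probability a little over 98%, and the expected count is about 4.3 flips per day, or around 130 over 30 days. Everything hinges on the assumed per-bit rate.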


On the other hand, if you have some filesystem read cache for a NAS
VM or database application in the affected space, and moderately
important data is being damaged, well, that's just plain uncool.



>
> ... JG
--
-JH


laurent at guerby

Apr 15, 2012, 2:26 PM

Post #32 of 49
Re: Most energy efficient (home) setup

On Sun, 2012-04-15 at 10:52 -0500, Jimmy Hess wrote:
> In any given 24 hour period, the probability of at least
> one single bit error exceeds 98%. Assuming the memory is good and
> functioning correctly;
>
> It's expected to see on average approximately 3 to 4 1-bit errors
> per day. More are frequently seen.
>
> Now if most of this 16GB of memory is unused, you will never notice
> that over 30 days, 120 or so bits have been flipped from their
> proper value..

Hi,

I've been operating 4 desktop PCs, each with the following configuration:
16 GB of RAM (4x4GB Kingston), running Linux with about 15 VMs (KVM) on
DRBD disks, using more than 10 GB of RAM, for nearly a year now in a room
without cooling. Over the year I've had one dead HDD and one dead SSD
(both replaced) but no data corruption and no host or VM crash.

Do you have references to recent papers with experimental data about
non-ECC memory errors? It should be a fairly easy experiment (write and
read scan memory in a loop) and given your computations you should get
bit errors in less than a day.
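
A minimal sketch of such a write-and-read scan, in Python. This is only an illustration: the pattern, sizes and function names are assumptions, and a user-space process only sees virtual memory that the OS may move, cache or swap, so a serious run would need something closer to the hardware.

```python
import time

PATTERN = 0x55  # alternating bits; any known value would do

def fill(size_bytes):
    """Allocate a buffer and write the known pattern into every byte."""
    return bytearray([PATTERN]) * size_bytes

def count_flipped_bits(buf):
    """Read the buffer back and count bits that no longer match the pattern."""
    if buf.count(PATTERN) == len(buf):  # fast path: nothing changed
        return 0
    return sum(bin(b ^ PATTERN).count("1") for b in buf if b != PATTERN)

def scan(size_bytes, hold_seconds, passes):
    """Write, wait, read back; repeat. Returns flipped-bit counts per pass."""
    results = []
    for _ in range(passes):
        buf = fill(size_bytes)
        time.sleep(hold_seconds)
        results.append(count_flipped_bits(buf))
    return results

if __name__ == "__main__":
    # Small numbers so the sketch finishes quickly; a real experiment
    # would hold gigabytes for hours.
    print(scan(size_bytes=16 * 2**20, hold_seconds=1, passes=3))
```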

I remember this paper in 2003 but this was using abnormal heat:
http://www.cs.princeton.edu/~sudhakar/papers/memerr-slashdot-commentary.html

Thanks in advance,

Sincerely,

Laurent


ispbuilder at gmail

Apr 15, 2012, 3:35 PM

Post #33 of 49
Re: Most energy efficient (home) setup

I think the simple test for this problem is to take a non-ECC machine, boot
from a CD/USB Key/etc with memtest or memtest86+ on it, and see if you get
errors over the course of a few days.

Getting errors will certainly prove that this problem exists (or that you
have bad ram).


mysidia at gmail

Apr 15, 2012, 5:12 PM

Post #34 of 49
Re: Most energy efficient (home) setup

On Sun, Apr 15, 2012 at 5:35 PM, Mike <ispbuilder [at] gmail> wrote:

It's not like ECC memory requires a lot of power, a full-blown ATX
board or something; there is the Intel S1200KP Mini-ITX board.

See,
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.117.5936&rep=rep1&type=pdf

But the exact rate of single bit errors in non-ECC memory today is not
necessarily predictable based on past studies from the 90s, and
depends on environment also -- local lightning, solar activity, which
is increasing lately; how much extra shielding you have in place
(Server placed inside a Faraday cage/Lead box ?), etc --- you'd
need measurements for your specific hardware; there are likely
dependencies on the size of the memory cells, the vertical cross
section, other components in the system.


> I think the simple test for this problem is to take a non-ECC machine, boot
> from a CD/USB Key/etc with memtest or memtest86+ on it, and see if you get
> errors over the course of a few days.

Memtest86+ contains a series of tests that help uncover specific
kinds of common memory faults. At any particular point in time during
a memtest, only a confined range of physical memory addresses is
under test; a bit flip anywhere else won't be detected.

Which means that Memtest is not likely to detect the error.

Test #11 Bit-Fade with modifications could have some promise; you
need a 24 hour delay instead of a 5 minute delay. You need to
have close to the entire physical address space under test.
And you need truly random bit values stored to some "reliable"
medium, instead of the shortcut of storing known bit patterns.

*Memtest86+ itself and the system BIOS have to be stored in memory or
CPU cache somewhere, so not quite all of memory can be under test, and
a random bit flip in non-ECC CPU L2 cache is also a possibility.
Still, software like memtest, suitably modified, could be made to
detect a 1-bit error that showed up in the majority of the memory
addresses.


--
-JH


lsc at prgmr

Apr 15, 2012, 6:54 PM

Post #35 of 49
Re: Most energy efficient (home) setup

On Sun, Apr 15, 2012 at 10:52:51AM -0500, Jimmy Hess wrote:
> Consider that the probability 16GB of SDRAM experiences at least one
> single bit error at sea level,
> in a given 6 hour period exceeds 66% = 1 - (1 - 1.3e-12 * 6)^(16 *
> 2^30 * 8). In any given 24 hour period, the probability of at least
> one single bit error exceeds 98%. Assuming the memory is good and
> functioning correctly;
>
> It's expected to see on average approximately 3 to 4 1-bit errors
> per day. More are frequently seen.
>
> Now if most of this 16GB of memory is unused, you will never notice
> that over 30 days, 120 or so bits have been flipped from their
> proper value..

I think that is an overestimate, at least if single-bit (corrected)
ecc errors are as common as flipped bits on non-ecc ram.

Now, first, count me in the "ECC is a must, full stop." crowd. I
insist on ECC for even my customers' dedicated servers, even though most
of the customers don't care that much. "It's not for you, it's for me."
With ECC, if you have EDAC/bluesmoke set up correctly on a supported
motherboard, you get console spew whenever you have a single-bit error.

This means I can do a very simple grep on the conserver logs
and find all the failing ram modules I am responsible for.
Without ECC, I have no real way of telling the difference between broken
software and broken ram.
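
A related check, sketched in Python: instead of grepping console logs, read the per-controller counters that the Linux EDAC driver exposes under sysfs. This is a different route to the same information, not the log grep described here, and it assumes a Linux box with an EDAC driver loaded for the chipset. The function name is made up.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")  # Linux EDAC sysfs tree

def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} from EDAC counters."""
    counts = {}
    for mc in sorted(Path(root).glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text())
            ue = int((mc / "ue_count").read_text())
        except (OSError, ValueError):
            continue  # controller without readable counters
        counts[mc.name] = (ce, ue)
    return counts

if __name__ == "__main__":
    for name, (ce, ue) in read_edac_counts().items():
        flag = "  <-- look at this box" if ce or ue else ""
        print(f"{name}: corrected={ce} uncorrected={ue}{flag}")
```

On a machine without EDAC support the function simply returns an empty dict, which is itself worth knowing.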

That said, I still think the 120 bits a month estimate is large; I
believe that ECC ram should report correctable errors (assuming a
correctly configured EDAC/bluesmoke module and supported chipset)
about as often as non-ecc ram would get a bit flip.

In a past role, I did spend the time grepping through such a properly
configured cluster, with tens of thousands of nodes, looking for failing
hardware. I should have done a proper paper with statistics, but
I did not. The vast majority of servers had zero correctable ecc errors,
while a few had a lot, which is consistent with the theory that ECC errors
are more often caused by bad ram.

(Of course, all these servers were in proper cases in a proper data center,
which probably gives you a fair bit of shielding.)

On my current fleet (well under 100 servers) single bit errors are so rare
that if I get one, I schedule that machine for removal from production.


jgreco at ns

Apr 15, 2012, 7:00 PM

Post #36 of 49
Re: Most energy efficient (home) setup

> >> And silent memory corruption can make its way to the filesystem, or
> >> applications' internal saved data structures (such as the contents
> >> of a VM's registry database).
>
> > Since we don't hear about Mac mini server users screaming about how
> > their servers are constantly crashing, the severity and frequency of
> > memory corruption events may not be anywhere near what you suggest.
>
> ECC is an absolute MUST. Case closed-
> unless you like corrupt encryption keys that blow away an entire volume.

You might want to go tell that to all those Mac users who have full
disk encryption...

... JG


jgreco at ns

Apr 15, 2012, 7:15 PM

Post #37 of 49
Re: Most energy efficient (home) setup

> In a past role, I did spend the time grepping through such a properly
> configured cluster, with tens of thousands of nodes, looking for failing
> hardware. I should have done a proper paper with statistics, but
> I did not. The vast majority of servers had zero correctable ecc errors,
> while a few had a lot, which is consistent with the theory that ECC errors
> are more often caused by bad ram.

I'd have to say that that's been the experience here as well, ECC is
great, yes, but it just doesn't seem to be something that is "absolutely
vital" on an ongoing basis, as some of the other posters here have
implied, to correct the constant bit errors that are(n't) showing up.

Maybe I'll get bored one of these days and find some devtools to stick
on one of the Macs.

... JG


jamie at photon

Apr 16, 2012, 5:08 AM

Post #38 of 49
RE: Most energy efficient (home) setup

> From: Joe Greco [mailto:jgreco [at] ns]

> I'd have to say that that's been the experience here as well, ECC is
> great, yes, but it just doesn't seem to be something that is
> "absolutely
> vital" on an ongoing basis, as some of the other posters here have
> implied, to correct the constant bit errors that are(n't) showing up.
>
> Maybe I'll get bored one of these days and find some devtools to stick
> on one of the Macs.

In all the years I've been playing with high end hardware, the best sample machine I have is an SGI Origin 200 that I had in production for over ten years, with the only downtime during that time being once to add more memory, once to replace a failed drive, once to move the rack and the occasional OS upgrade (I tended to skip a 6.5.x release or two between updates, and after 6.5.30 there were of course no more). That machine was down less than 24 hours cumulative for that entire period. In that ten year span, I saw TWO ECC parity errors (both single bit correctable). On any machine that saw regular ECC errors it was a sign of failing hardware (usually, but not necessarily the memory, there are other parts in there that have to carry that data too).

As much as I prefer ECC, it's not a show stopper for me if it's not there.

Jamie


bicknell at ufp

Apr 16, 2012, 5:39 AM

Post #39 of 49
Re: Most energy efficient (home) setup

In a message written on Sun, Apr 15, 2012 at 09:54:14PM -0400, Luke S. Crawford wrote:
> On my current fleet (well under 100 servers) single bit errors are so rare
> that if I get one, I schedule that machine for removal from production.

In a previous life, in a previous time, I worked at a place that
had a bunch of Cisco's with parity RAM. For the time, these boxes
had a lot of RAM, as they had distributed line cards each with their
own processor memory.

Cisco was rather famous for these parity errors, mostly because of
their stock answer: sunspots. The answer was in fact largely
correct, but it's just not a great response from a vendor. They
had a bunch of statistics though, collected from many of these
deployed boxes.

We ran the statistics, and given hundreds of routers, each with
many line cards, the math told us we should have approximately 1
router every 9-10 months get one parity error from sunspots and
other random activity (i.e. not a failing RAM module with hundreds
of repeatable errors). This was, in fact, close to what we observed.

This experience gave me two takeaways. First, single bit flips are
rare, but when you have enough boxes rare shows up often. It's
very similar to anyone with petabytes of storage: disks fail every
couple of days because you have so many of them. At the same time
a home user might not see a failure in their lifetime (of disk or
memory).

Second though, if you're running a business, ECC is a must because
the message is so bad. "This was caused by sunspots" is not a
customer inspiring response, no matter how correct. "We could have
prevented this by spending an extra $50 on proper RAM for your $1M
box" is even worse.

Some quick looking at Newegg, 4GB DDR3 1333 ECC DIMM, $33.99. 4GB
DDR3 1333 Non-ECC DIMM, $21.99. Savings, $12. (Yes, I realize the
Motherboard also needs some extra circuitry, I expect it's less than $1
in quantity though).

Pretty much everyone I know values their data at more than $12 if it
is lost.

--
Leo Bicknell - bicknell [at] ufp - CCIE 3440
PGP keys at http://www.ufp.org/~bicknell/


jgreco at ns

Apr 16, 2012, 6:14 AM

Post #40 of 49
Re: Most energy efficient (home) setup

> Some quick looking at Newegg, 4GB DDR3 1333 ECC DIMM, $33.99. 4GB
> DDR3 1333 Non-ECC DIMM, $21.99. Savings, $12. (Yes, I realize the
> Motherboard also needs some extra circuitry, I expect it's less than $1
> in quantity though).
>
> Pretty much everyone I know values their data at more than $12 if it
> is lost.

The problem is that if you want to move past the 4GB modules, things
can get expensive. Bearing in mind the subject line, consider for
example the completely awesome Intel Sandy Bridge E3-1230 with a
board like the Supermicro X9SCL+-F, which can be built into a low
power system that idles around 45W if you're careful.

Problem is, the 8GB modules tend to cost an arm and a leg;

http://www.google.com/products/catalog?q=MEM-DR380L-CL01-EU13&oe=utf-8&rls=org.mozilla:en-US:official&client=firefox-a&um=1&hl=en&bav=on.2,or.r_gc.r_pw.r_qf.,cf.osb&biw=1043&bih=976&ie=UTF-8&tbm=shop&cid=8556948603121267780&sa=X&ei=HxmMT5btB8_PgAfLs5TvCQ&ved=0CD8Q8wIwAA

to outfit a machine with 32GB several months ago cost around *$400*
per module, or $1600 for the machine, whereas the average cost for
a 4GB module was only around $30.

So then you start looking at the less expensive options. When the
average going price for 8GB non-ECC modules is between $50 and $100,
then you're "only" looking at a cost premium of $1200 for ECC.

For $1200, I'm willing to at least consider non-ECC. You can infer
from this message that I'm actually waiting for more reasonable ECC
prices to show up; we're finally seeing somewhat more reasonable prices,
but by that I mean "only" around $130/8GB.

... JG


nanog at voipro

Apr 16, 2012, 11:22 AM

Post #41 of 49
Re: Most energy efficient (home) setup

Have you looked at the HP ProLiant MicroServer?

Cheers,

Henk


On 13-04-12 12:06, Jeroen van Aart wrote:
> Leo Bicknell wrote:
>> But what's really missing is storage management. RAID5 (and similar)
>> require all drives to be online all the time. I'd love an intelligent
>> file system that could spin down drives when not in use, and even for
>> many workloads spin up only a portion of the drives. It's easy to
>> imagine a system with a small SSD and a pair of disks. Reads spin one
>> disk. Writes go to that disk and the SSD until there are enough, which
>> spins up the second drive and writes them out as a proper mirror. In a
> home file server drive motors, by the time you have 4-6 drives, eat most of the
>> power. CPU's speed step down nicely, drives don't.
>
> Late reply by me, but excellent points.
>
> A combination of mdadm and hdparm on linux should suffice to have a raid
> that will spin down the disks when not in use. I have used for years a
> G4 system with a mdadm raid1 (and a separate boot disk) and hdparm
> configured to spin the raid disks down after 10 minutes and it worked
> great.
>
> I think in a raid10 this would only spin up the disk pair that has the
> data you need, but leave the rest asleep. But I didn't try that yet.
>
> What I'd like is to have a small disk enclosure that includes a whole (low
> power) computer capable of having linux installed on some flash memory.
> Say you have an enclosure with space for 4 2.5 inch disks, install
> linux, set it up as a raid10, connect through USB to your computer for
> back up purposes.
>
> Greetings,
> Jeroen
>
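
For the hdparm spin-down mentioned in the quoted text, a small Python sketch of how a 10 minute timeout maps onto the `hdparm -S` value. The encoding (values 1 to 240 in units of 5 seconds, 241 to 251 in units of 30 minutes) is the one documented for `hdparm -S`; the device names and helper functions are hypothetical, and the command is only printed, not run.

```python
def standby_value(minutes):
    """Encode a spin-down timeout for `hdparm -S`.

    Values 1..240 are multiples of 5 seconds (up to 20 minutes);
    values 241..251 are multiples of 30 minutes.
    """
    seconds = int(minutes * 60)
    if seconds <= 0:
        return 0  # 0 disables the standby timer
    if seconds <= 240 * 5:
        return max(1, seconds // 5)
    half_hours = min(11, max(1, round(minutes / 30)))
    return 240 + half_hours

def spin_down_command(device, minutes):
    """Build the hdparm argument list for one disk."""
    return ["hdparm", "-S", str(standby_value(minutes)), device]

if __name__ == "__main__":
    # /dev/sdb and /dev/sdc stand in for the RAID member disks
    for dev in ("/dev/sdb", "/dev/sdc"):
        print(" ".join(spin_down_command(dev, 10)))
```

Running the printed commands needs root, and the array only stays asleep if nothing touches the md device in the meantime.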


eugen at leitl

Apr 16, 2012, 12:21 PM

Post #42 of 49
Re: Most energy efficient (home) setup

On Mon, Apr 16, 2012 at 11:22:20AM -0700, Henk Hesselink wrote:
> Have you looked at the HP ProLiant MicroServer?

Notice it takes up to 8 GByte ECC memory and supports zfs
via napp-it/Illumos. A hacked BIOS was required to use
the 5th internal SATA port in AHCI mode, maybe that's
no longer necessary with N40L.


jgreco at ns

Apr 16, 2012, 12:38 PM

Post #43 of 49
Re: Most energy efficient (home) setup

> On Mon, Apr 16, 2012 at 11:22:20AM -0700, Henk Hesselink wrote:
> > Have you looked at the HP ProLiant MicroServer?
>
> Notice it takes up to 8 GByte ECC memory and supports zfs
> via napp-it/Illumos. A hacked BIOS was required to use
> the 5th internal SATA port in AHCI mode, maybe that's
> no longer necessary with N40L.

The MicroServer is actually a nice little platform, one little bright
spot in the small-home-server market.

It does have some other issues though:

1) It's not particularly low-power, as in, I managed to build some Xeon
based systems that run rings around it for only maybe a dozen watts
more, and some of the NAShead guys over at one of the Linux based
projects have a similar but lower-power platform for a lower price,

2) While it has a remote management card available, it's known to not
work with certain things, including FreeBSD,

3) Various problems noted with the eSATA port, such as the inability
to use an external port multiplier.

On the flip side, some people have tossed one of those 4-2.5"-in-a-5.25"
bay racks into the optical bay, along with a PCI controller, to allow
the addition of SSD's or whatever for NAS use. Pretty cool and the
thing *is* pretty compact.

... JG


jeroen at mompl

Apr 17, 2012, 6:05 PM

Post #44 of 49
Re: Most energy efficient (home) setup

Jimmy Hess wrote:
> Consider that the probability 16GB of SDRAM experiences at least one
> single bit error at sea level,
> in a given 6 hour period exceeds 66% = 1 - (1 - 1.3e-12 * 6)^(16 *
> 2^30 * 8). In any given 24 hour period, the probability of at least
> one single bit error exceeds 98%. Assuming the memory is good and
> functioning correctly;

> application in the effected space, and moderately important data is
> being damaged
> well, that's just plain uncool

Having limited knowledge of which consumer devices support ECC memory
and which don't, I was pleasantly surprised to find out the always-on IBM
ThinkPad I ran for years refused to work with non-ECC memory.

Greetings,
Jeroen

--
Earthquake Magnitude: 6.2
Date: Tuesday, April 17, 2012 19:03:55 UTC
Location: east of the South Sandwich Islands
Latitude: -59.0988; Longitude: -16.6928
Depth: 1.00 km


jeroen at mompl

Apr 18, 2012, 12:35 PM

Post #45 of 49
Re: Most energy efficient (home) setup

Laurent GUERBY wrote:
> Do you have reference to recent papers with experimental data about non
> ECC memory errors? It should be fairly easy to do

Maybe this provides some information:

http://en.wikipedia.org/wiki/ECC_memory#Problem_background

"Work published between 2007 and 2009 showed widely varying error rates
with over 7 orders of magnitude difference, ranging from 10^-10 to 10^-17
error/bit·h, roughly one bit error, per hour, per gigabyte of memory to
one bit error, per century, per gigabyte of memory.[2][4][5] A very
large-scale study based on Google's very large number of servers was
presented at the SIGMETRICS/Performance’09 conference.[4] The actual
error rate found was several orders of magnitude higher than previous
small-scale or laboratory studies, with 25,000 to 70,000 errors per
billion device hours per megabit (about 3–10×10^-9 error/bit·h), and more
than 8% of DIMM memory modules affected by errors per year."
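
To put the quoted units on one scale, here is a small Python sketch that converts "errors per billion device hours per megabit" into errors per bit-hour and per gigabyte-hour. It assumes a megabit is 10^6 bits and a gigabyte is 2^30 bytes; the function names are made up.

```python
def per_bit_hour(errors_per_billion_device_hours_per_mbit, bits_per_mbit=1e6):
    """Convert 'errors per 10^9 device-hours per Mbit' to errors per bit-hour."""
    return errors_per_billion_device_hours_per_mbit / 1e9 / bits_per_mbit

def per_gib_hour(rate_per_bit_hour):
    """Scale a per-bit-hour rate up to one gigabyte (2^30 bytes) of memory."""
    return rate_per_bit_hour * 8 * 2**30

for label, errs in (("low", 25_000), ("high", 70_000)):
    r = per_bit_hour(errs)
    print(f"{label}: {r:.2e} per bit-hour, {per_gib_hour(r):.3f} per GiB-hour")
```

By this arithmetic the 25,000 to 70,000 range works out to roughly 2.5e-11 to 7e-11 errors per bit-hour, a fraction of an error per gigabyte per hour. That is lower than the parenthetical figure in the quoted passage, so it is worth checking against the paper itself.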


--
Earthquake Magnitude: 4.9
Date: Wednesday, April 18, 2012 16:21:41 UTC
Location: Solomon Islands
Latitude: -7.4630; Longitude: 156.7916
Depth: 414.30 km


dotis at mail-abuse

Apr 18, 2012, 2:55 PM

Post #46 of 49
Re: Most energy efficient (home) setup

On 4/18/12 12:35 PM, Jeroen van Aart wrote:
> Laurent GUERBY wrote:
> > Do you have reference to recent papers with experimental data about
> > non ECC memory errors? It should be fairly easy to do
> Maybe this provides some information:
>
> http://en.wikipedia.org/wiki/ECC_memory#Problem_background
>
> "Work published between 2007 and 2009 showed widely varying error
> rates with over 7 orders of magnitude difference, ranging from
> 10^-10 to 10^-17 error/bit·h, roughly one bit error, per hour, per
> gigabyte of memory to one bit error, per century, per gigabyte of
> memory.[2][4][5] A very large-scale study based on Google's very
> large number of servers was presented at the
> SIGMETRICS/Performance’09 conference.[4] The actual error rate found
> was several orders of magnitude higher than previous small-scale or
> laboratory studies, with 25,000 to 70,000 errors per billion device
> hours per megabit (about 3–10×10^-9 error/bit·h), and more than 8% of
> DIMM memory modules affected by errors per year."
Dear Jeroen,

In the work that led up to RFC3309, many of the errors found on the
Internet pertained to single interface bits, and not single data bits.
At a large chip manufacturer where I worked, removing internal memory
error detection (foolishly, to save space) cost them dearly, because they
then needed to do far more exhaustive four corner testing. Checksums used
by TCP and UDP are able to detect single bit data errors, but may miss as
much as 2% of single interface bit errors. It would be surprising to find
memory designs lacking internal error detection logic.
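
The single bit data error case can be illustrated with a short Python sketch of the 16-bit one's-complement checksum described in RFC 1071. It leaves out the TCP/UDP pseudo-header and uses the sample bytes from that RFC; the helper names are made up. It does not attempt to reproduce the 2% figure for interface bit errors.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of `data` with exactly one bit inverted."""
    out = bytearray(data)
    out[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(out)

payload = bytes.fromhex("0001f203f4f5f6f7")
good = internet_checksum(payload)

# Try every possible single-bit flip and count the ones that go unnoticed.
missed = sum(
    internet_checksum(flip_bit(payload, i)) == good
    for i in range(len(payload) * 8)
)
print(f"checksum {good:#06x}, single-bit flips missed: {missed}")

# Reordered 16-bit words are a different story: the sum does not change.
swapped = payload[2:4] + payload[0:2] + payload[4:]
print("word swap detected:", internet_checksum(swapped) != good)
```

Every single-bit flip changes the sum, so none are missed, while swapping two 16-bit words leaves the checksum identical. That is one example of the kind of non-random error a simple sum cannot see.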

Regards,
Douglas Otis


smb at cs

Apr 18, 2012, 8:09 PM

Post #47 of 49
Re: Most energy efficient (home) setup

On Apr 18, 2012, at 5:55 32PM, Douglas Otis wrote:

> On 4/18/12 12:35 PM, Jeroen van Aart wrote:
>> Laurent GUERBY wrote:
>> > Do you have reference to recent papers with experimental data about
>> > non ECC memory errors? It should be fairly easy to do
>> Maybe this provides some information:
>>
>> http://en.wikipedia.org/wiki/ECC_memory#Problem_background
>>
>> "Work published between 2007 and 2009 showed widely varying error
>> rates with over 7 orders of magnitude difference, ranging from
>> 10^-10 to 10^-17 error/bit·h, roughly one bit error, per hour, per
>> gigabyte of memory to one bit error, per century, per gigabyte of
>> memory.[2][4][5] A very large-scale study based on Google's very
>> large number of servers was presented at the
>> SIGMETRICS/Performance’09 conference.[4] The actual error rate found
>> was several orders of magnitude higher than previous small-scale or
>> laboratory studies, with 25,000 to 70,000 errors per billion device
>> hours per megabit (about 3–10×10^-9 error/bit·h), and more than 8% of
>> DIMM memory modules affected by errors per year."
> Dear Jeroen,
>
> In the work that led up to RFC3309, many of the errors found on the Internet pertained to single interface bits, and not single data bits. Working at a large chip manufacturer that removed internal memory error detection to foolishly save space, cost them dearly in then needing to do far more exhaustive four corner testing. Checksums used by TCP and UDP are able to detect single bit data errors, but may miss as much as 2% of single interface bit errors. It would be surprising to find memory designs lacking internal error detection logic.


mallet:~ smb$ head -14 doc/ietf/rfc/rfc3309.txt | sed 1,7d | sed 2,5d; date
Request for Comments: 3309 Stanford
September 2002

Wed Apr 18 23:07:53 EDT 2012


We are not in a static field... (3309 is one of my favorite RFCs -- but
the specific findings (errors happen more often than you think), as
opposed to the general lesson (understand your threat model), may be OBE.)


--Steve Bellovin, https://www.cs.columbia.edu/~smb


dotis at mail-abuse

Apr 19, 2012, 3:31 PM

Post #48 of 49 (602 views)
Permalink
Re: Most energy efficient (home) setup [In reply to]

On 4/18/12 8:09 PM, Steven Bellovin wrote:
>
> On Apr 18, 2012, at 5:55 32PM, Douglas Otis wrote:
> > Dear Jeroen,
> >
> > In the work that led up to RFC3309, many of the errors found on the
> > Internet pertained to single interface bits, and not single data
> > bits. Working at a large chip manufacturer that removed internal
> > memory error detection to foolishly save space, cost them dearly in
> > then needing to do far more exhaustive four corner testing.
> > Checksums used by TCP and UDP are able to detect single bit data
> > errors, but may miss as much as 2% of single interface bit errors.
> > It would be surprising to find memory designs lacking internal
> > error detection logic.
>
> mallet:~ smb$ head -14 doc/ietf/rfc/rfc3309.txt | sed 1,7d | sed 2,5d; date
> Request for Comments: 3309 Stanford
> September 2002
>
> Wed Apr 18 23:07:53 EDT 2012
>
> We are not in a static field... (3309 is one of my favorite RFCs --
> but the specific findings (errors happen more often than you think),
> as opposed to the general lesson (understand your threat model), may
> be OBE.)
Dear Steve,

You may be right. However, back then most were also considering only
random single bit errors. Although there was plentiful evidence
for where errors might be occurring, it seems many worked hard to ignore
the clues.

Reminiscent of a drunk searching for keys dropped in the dark under a
light post, mathematics for random single bit errors offers easier
calculations and simpler solutions. While there are indeed fewer
parallel buses today, these structures still exist in memory modules and
other networking components. Manufacturers confront increasingly
temperamental bit storage elements, where most include internal error
correction to minimize manufacturing and testing costs. Error sources
are not easily ascertained with simple checksums when errors are not random.

Regards,
Douglas Otis
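
The distinction drawn above between random single data-bit errors and non-random interface errors can be illustrated with a small Python sketch. It implements the 16-bit one's-complement Internet checksum in the style of RFC 1071 and applies two kinds of damage to a short payload. The fault model is a hypothetical one chosen for illustration, not something taken from the thread or from RFC 3309: a flaky line that inverts the same bit position in two different 16-bit words. The helper names are invented, and `zlib.crc32` stands in for a CRC only because it ships with Python; it uses a different polynomial from the CRC-32c that RFC 3309 specifies for SCTP.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def flip_bit(data: bytes, word_index: int, bit: int) -> bytes:
    """Invert one bit (0 = least significant) of one 16-bit word.

    Assumes len(data) is even.
    """
    words = [int.from_bytes(data[i:i + 2], "big")
             for i in range(0, len(data), 2)]
    words[word_index] ^= 1 << bit
    return b"".join(w.to_bytes(2, "big") for w in words)


# Example words taken from the worked example in RFC 1071.
payload = bytes.fromhex("0001f203f4f5f6f7")

# Case 1: one random data-bit error. The sum changes, so it is caught.
single = flip_bit(payload, word_index=0, bit=1)

# Case 2: the same bit position inverted in two words. Bit 1 goes
# 0 -> 1 in word 0 and 1 -> 0 in word 1, so the changes cancel.
paired = flip_bit(single, word_index=1, bit=1)

print(f"original checksum: {internet_checksum(payload):#06x}")
print(f"single-bit error : {internet_checksum(single):#06x}")
print(f"paired-bit error : {internet_checksum(paired):#06x}")
print("CRC-32 notices paired error:",
      zlib.crc32(paired) != zlib.crc32(payload))
```

The paired error leaves the one's-complement sum unchanged because the checksum is a plain sum of words: adding 2 to one word and subtracting 2 from another cancels. A CRC is sensitive to where in the message each changed bit sits, which is why it notices this particular pattern.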


smb at cs

Apr 19, 2012, 3:37 PM

Post #49 of 49 (605 views)
Permalink
Re: Most energy efficient (home) setup [In reply to]

On Apr 19, 2012, at 6:31 43PM, Douglas Otis wrote:

> On 4/18/12 8:09 PM, Steven Bellovin wrote:
>>
>> On Apr 18, 2012, at 5:55 32PM, Douglas Otis wrote:
>> > Dear Jeroen,
>> >
>> > In the work that led up to RFC3309, many of the errors found on the
>> > Internet pertained to single interface bits, and not single data
>> > bits. Working at a large chip manufacturer that removed internal
>> > memory error detection to foolishly save space, cost them dearly in
>> > then needing to do far more exhaustive four corner testing.
>> > Checksums used by TCP and UDP are able to detect single bit data
>> > errors, but may miss as much as 2% of single interface bit errors.
>> > It would be surprising to find memory designs lacking internal
>> > error detection logic.
>>
>> mallet:~ smb$ head -14 doc/ietf/rfc/rfc3309.txt | sed 1,7d | sed 2,5d; date
>> Request for Comments: 3309 Stanford
>> September 2002
>>
>> Wed Apr 18 23:07:53 EDT 2012
>>
>> We are not in a static field... (3309 is one of my favorite RFCs --
>> but the specific findings (errors happen more often than you think),
>> as opposed to the general lesson (understand your threat model), may
>> be OBE.)
> Dear Steve,
>
> You may be right. However, back then most were also considering only random single bit errors. Although there was plentiful evidence for where errors might be occurring, it seems many worked hard to ignore the clues.
>
> Reminiscent of a drunk searching for keys dropped in the dark under a light post, mathematics for random single bit errors offers easier calculations and simpler solutions. While there are indeed fewer parallel buses today, these structures still exist in memory modules and other networking components. Manufacturers confront increasingly temperamental bit storage elements, where most include internal error correction to minimize manufacturing and testing costs. Error sources are not easily ascertained with simple checksums when errors are not random.
>

Yes -- that's precisely why I like that RFC so much.


--Steve Bellovin, https://www.cs.columbia.edu/~smb
