
Mailing List Archive: NANOG: users

HE.net, Fremont-2 outage?

 

 



tico-nanog at raapid

Nov 3, 2009, 10:50 AM

Post #1 of 41 (1489 views)
Permalink
HE.net, Fremont-2 outage?

Hey guys,

I can't get through to Hurricane Electric, and they seem to be having an
outage at their Fremont-2 facility again (as of 17:30 UTC or thereabouts) --
ticket system is unanswered, phones go to voicemail, all equipment is
unreachable.

Does anyone here have a presence at 48233 Warm Springs Blvd, that can
provide any information about this? I got hit by the ATS failure last
month, so I guess it's possible that that equipment may have flaked again.

-t


stef-list at memberwebs

Nov 3, 2009, 11:13 AM

Post #2 of 41 (1457 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Tico wrote:
> Hey guys,
>
> I can't get through to Hurricane Electric, and they seem to be having an
> outage at their Fremont-2 facility again (as of 17:30 UTC or
> thereabouts) --
> ticket system is unanswered, phones go to voicemail, all equipment is
> unreachable.

Yes, there was a power outage. Confirmed with Hurricane Electric. All
our equipment was offline for 5 minutes or so. Discussed over on
outages [at] outages. This is the second such data center wide power
outage in 2 months.

I'm unimpressed with their lack of transparency on these issues. It
seems not even their own employees know the causes or their remedial
actions. The impression you get is that it's pretty wild and crazy
over at Hurricane Electric, without real disaster recovery plans or
procedures.

Cheers,

Stef


nenolod at systeminplace

Nov 3, 2009, 11:28 AM

Post #3 of 41 (1459 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Yeah. They had yet another power outage. The fourth in 16 months.

Luckily we have already begun plans to leave their facility.

William
------Original Message------
From: Tico
To: nanog [at] nanog
Subject: HE.net, Fremont-2 outage?
Sent: Nov 3, 2009 1:50 PM

Hey guys,

I can't get through to Hurricane Electric, and they seem to be having an
outage at their Fremont-2 facility again (as of 17:30 UTC or thereabouts) --
ticket system is unanswered, phones go to voicemail, all equipment is
unreachable.

Does anyone here have a presence at 48233 Warm Springs Blvd, that can
provide any information about this? I got hit by the ATS failure last
month, so I guess it's possible that that equipment may have flaked again.

-t



--
William Pitcock
SystemInPlace - Simple Hosting Solutions
1-866-519-6149


jeffrey.lyon at blacklotus

Nov 3, 2009, 5:04 PM

Post #4 of 41 (1456 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

FWIW: http://www.he.net/releases/release18.html

Jeff

On Tue, Nov 3, 2009 at 2:28 PM, William Pitcock
<nenolod [at] systeminplace> wrote:
> Yeah.  They had yet another power outage.  The fourth in 16 months.
>
> Luckily we have already begun plans to leave their facility.
>
> William
> ------Original Message------
> From: Tico
> To: nanog [at] nanog
> Subject: HE.net, Fremont-2 outage?
> Sent: Nov 3, 2009 1:50 PM
>
> Hey guys,
>
> I can't get through to Hurricane Electric, and they seem to be having an
> outage at their Fremont-2 facility again (as of 17:30 UTC or thereabouts) --
> ticket system is unanswered, phones go to voicemail, all equipment is
> unreachable.
>
> Does anyone here have a presence at 48233 Warm Springs Blvd, that can
> provide any information about this? I got hit by the ATS failure last
> month, so I guess it's possible that that equipment may have flaked again.
>
> -t
>
>
>
> --
> William Pitcock
> SystemInPlace - Simple Hosting Solutions
> 1-866-519-6149
>
>



--
Jeffrey Lyon, Leadership Team
jeffrey.lyon [at] blacklotus | http://www.blacklotus.net
Black Lotus Communications of The IRC Company, Inc.

Platinum sponsor of HostingCon 2010. Come to Austin, TX on July 19 -
21 to find out how to "protect your booty."


mike at m5computersecurity

Nov 3, 2009, 5:41 PM

Post #5 of 41 (1451 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

That release is from 10/31/01.


On Tue, 2009-11-03 at 20:04 -0500, Jeffrey Lyon wrote:
> FWIW: http://www.he.net/releases/release18.html
>
> Jeff
>
> On Tue, Nov 3, 2009 at 2:28 PM, William Pitcock
> <nenolod [at] systeminplace> wrote:
> > Yeah. They had yet another power outage. The fourth in 16 months.
> >
> > Luckily we have already begun plans to leave their facility.
> >
> > William
> > ------Original Message------
> > From: Tico
> > To: nanog [at] nanog
> > Subject: HE.net, Fremont-2 outage?
> > Sent: Nov 3, 2009 1:50 PM
> >
> > Hey guys,
> >
> > I can't get through to Hurricane Electric, and they seem to be having an
> > outage at their Fremont-2 facility again (as of 17:30 UTC or thereabouts) --
> > ticket system is unanswered, phones go to voicemail, all equipment is
> > unreachable.
> >
> > Does anyone here have a presence at 48233 Warm Springs Blvd, that can
> > provide any information about this? I got hit by the ATS failure last
> > month, so I guess it's possible that that equipment may have flaked again.
> >
> > -t
> >
> >
> >
> > --
> > William Pitcock
> > SystemInPlace - Simple Hosting Solutions
> > 1-866-519-6149
> >
> >
>
>
>
--
************************************************************
Michael J. McCafferty
Principal, Security Engineer
M5 Hosting
http://www.m5hosting.com

You can have your own custom Dedicated Server up and running today !
RedHat Enterprise, CentOS, Ubuntu, Debian, OpenBSD, FreeBSD, and more
************************************************************


lyndon at orthanc

Nov 3, 2009, 5:49 PM

Post #6 of 41 (1445 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> FWIW: http://www.he.net/releases/release18.html

How long can they go on those 3000 gallons under their current
load?


max.clark at gmail

Nov 3, 2009, 7:15 PM

Post #7 of 41 (1445 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

http://www.dieselserviceandsupply.com/Diesel_Fuel_Consumption.aspx

On Tue, Nov 3, 2009 at 5:49 PM, Lyndon Nerenberg (VE6BBM/VE7TFX)
<lyndon [at] orthanc> wrote:
>> FWIW: http://www.he.net/releases/release18.html
>
> How long can they go on those 3000 gallons under their current
> load?
>
>
>


stef-list at memberwebs

Nov 3, 2009, 7:33 PM

Post #8 of 41 (1444 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Jeffrey Lyon wrote:
> FWIW: http://www.he.net/releases/release18.html

No date on that 'press release' but the way back machine helps put it
somewhere in 2002. A lot of good this "Alameda" sized generator has done
recently...

http://web.archive.org/web/*/http://www.he.net/releases/release18.html

Cheers,

Stef


jgreco at ns

Nov 3, 2009, 8:03 PM

Post #9 of 41 (1446 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> Jeffrey Lyon wrote:
> > FWIW: http://www.he.net/releases/release18.html
>
> No date on that 'press release' but the way back machine helps put it
> somewhere in 2002. A lot of good this "Alameda" sized generator has done
> recently...
>
> http://web.archive.org/web/*/http://www.he.net/releases/release18.html

2MW isn't super huge or anything. I would expect that, given the size
I have been led to believe HE is, they've got a lot more than that now.

My memory is that Alameda isn't huge, but it isn't small either. I'm
not sure .. ah, here

http://www.reuters.com/article/pressRelease/idUS179594+03-Apr-2009+BW20090403

peak 70MW

I'm not sure what the basis for the claim is that a 2MW generator is
"large enough to power the entire city of Alameda" ... 2MW gensets are
common enough in this business and it's possible to burn through 2MW in
a few hundred racks. It isn't *that* much power.
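
To put rough numbers on that comparison, here is a minimal Python sketch.
The 2MW genset and the 70MW Alameda peak are the figures quoted in this
thread; the rack counts are only illustrative guesses at what "a few
hundred racks" might mean, not HE data.

```python
# Rough sizing arithmetic, not a capacity plan.
GENSET_KW = 2000.0        # 2MW generator, from the press release under discussion
ALAMEDA_PEAK_MW = 70.0    # peak load figure quoted from the Reuters link


def kw_per_rack(genset_kw: float, racks: int) -> float:
    """Average power available per rack if the genset carries only racks."""
    if racks <= 0:
        raise ValueError("racks must be positive")
    return genset_kw / racks


# Hypothetical rack counts, chosen only to illustrate "a few hundred".
for racks in (200, 300, 400):
    print(f"{racks} racks -> {kw_per_rack(GENSET_KW, racks):.1f} kW per rack")

# Share of Alameda's quoted peak that a 2MW genset could cover.
share_of_city_peak = (GENSET_KW / 1000.0) / ALAMEDA_PEAK_MW
print(f"2MW is {share_of_city_peak:.1%} of a 70MW peak")
```

This ignores cooling and distribution losses, so the real per-rack budget
would be lower.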

A more conventional comparison might be to something like a hospital; one
of our local hospitals installed a 1.25MW generator which, IIRC, powers
all critical circuits.

http://hhenergyservices.com/electrical/photos.php?category_id=2845&subcategory_id=5027&id=196&number=7

Sometimes it is easier to picture things that way.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


tico-nanog at raapid

Nov 3, 2009, 11:09 PM

Post #10 of 41 (1439 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Joe Greco wrote:
>> Jeffrey Lyon wrote:
>>
>>> FWIW: http://www.he.net/releases/release18.html
>>>
>> No date on that 'press release' but the way back machine helps put it
>> somewhere in 2002. A lot of good this "Alameda" sized generator has done
>> recently...
>>
>> http://web.archive.org/web/*/http://www.he.net/releases/release18.html
>>
>
> 2MW isn't super huge or anything. I would expect that, given the size
> I have been led to believe HE is, they've got a lot more than that now.
>
> My memory is that Alameda isn't huge, but it isn't small either. I'm
> not sure .. ah, here
>
> http://www.reuters.com/article/pressRelease/idUS179594+03-Apr-2009+BW20090403
>
> peak 70MW
>
> I'm not sure what the basis for the claim is that a 2MW generator is
> "large enough to power the entire city of Alameda" ... 2MW gensets are
> common enough in this business and it's possible to burn through 2MW in
> a few hundred racks. It isn't *that* much power.
>
> A more conventional comparison might be to something like a hospital; one
> of our local hospitals installed a 1.25MW generator which, IIRC, powers
> all critical circuits.
>
> http://hhenergyservices.com/electrical/photos.php?category_id=2845&subcategory_id=5027&id=196&number=7
>
> Sometimes it is easier to picture things that way.
>

Regardless of generator sizing issues or disparities, if the ATS fails,
then no amount of grid or generator power will keep the cabinets juiced up.

Since this is the second time in recent history that this building has
experienced a short power outage caused by ATS flakiness, perhaps
keeping a small UPS in the cabinet isn't such a bad idea? Even if the
distribution switches/routers lose power, at least the servers wouldn't
have to go through fscks and DB integrity checks due to unplanned power
loss, and the recovery time would be significantly faster.

Hell, for a 5 minute power outage, some of my services were down for 20
minutes. I'll happily take a 75% reduction in downtime for the cost of a
UPS, though clearly redundancy across more reliable datacenters is a better
solution.


> ... JG
>


msa at latt

Nov 3, 2009, 11:17 PM

Post #11 of 41 (1443 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

On Wed, Nov 04, 2009 at 07:09:48AM +0000, Tico wrote:
> Since this is the second time in recent history that this building
> has experienced a short power outage caused by ATS flakiness,
> perhaps keeping a small UPS in the cabinet isn't such a bad idea?

It sounds like a great idea....until one of those small
UPSes smokes out, triggering the fire suppression (or at least
preaction), possibly also causing the power to be cut to the floor.

The customer with the small UPS that smoked out generally
does not like receiving the bill for everyone else's equipment
cleaning, too.

--msa


scott at doc

Nov 3, 2009, 11:22 PM

Post #12 of 41 (1442 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

On Tue, Nov 3, 2009 at 11:09 PM, Tico <tico-nanog [at] raapid> wrote:

> Since this is the second time in recent history that this building has
> experienced a short power outage caused by ATS flakiness, perhaps keeping a
> small UPS in the cabinet isn't such a bad idea? Even if


Although this time it was "short", the outage 5 weeks ago was about 90
minutes.

Scott


jgreco at ns

Nov 3, 2009, 11:23 PM

Post #13 of 41 (1445 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> Regardless of generator sizing issues or disparities, if the ATS fails,
> then no amount of grid or generator power will keep the cabinets juiced up.

Sure. Having no direct knowledge of the HE DC in question, I was merely
commenting on the issue I replied to.

> Since this is the second time in recent history that this building has
> experienced a short power outage caused by ATS flakiness,

Has this been verified?

> perhaps
> keeping a small UPS in the cabinet isn't such a bad idea? Even if the
> distribution switches/routers lose power, at least the servers wouldn't
> have to go through fscks and DB integrity checks due to unplanned power
> loss, and the recovery time would be significantly faster.

Small UPS's have their own set of ugly failure modes. For example, we
find that the APC Smart-UPS 1400's have a tendency to cook their
batteries; if you don't have monitoring of some sort, you may not find
out that your batteries are cooked until the UPS decides it is hopeless
and shuts itself off. In the meantime, the lingering sulfur smell may
panic someone... or cause a falsing of the fire system...

Colos frequently forbid the use of small UPS's for a variety of reasons.

> Hell, for a 5 minute power outage, some of my services were down for 20
> minutes. I'll happily take a 75% reduction in downtime for the cost of a
> UPS, though clearly redundancy across more reliable datacenters is a better
> solution.

So is redundancy across power systems within the colo, but only for well-
designed colos. Stories omitted.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


dpeterson at sixapart

Nov 3, 2009, 11:36 PM

Post #14 of 41 (1438 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> Colos frequently forbid the use of small UPS's for a variety of
> reasons.

In my experience they always need to be connected to the EPO switch,
which poses its own risks. Plus try to find a UPS with that feature
for reasonable prices.

Which leads me to this question: What questions do you ask any
potential colocation provider to determine if they are built out to
your needs?

-Dave


jgreco at ns

Nov 4, 2009, 12:36 AM

Post #15 of 41 (1440 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> > Colos frequently forbid the use of small UPS's for a variety of
> > reasons.
>
> In my experience they always need to be connected to the EPO switch,
> which poses its own risks. Plus try to find a UPS with that feature
> for reasonable prices.

APC says it's available on the SUA2200RM2U and SUA3000RM2U, and lists
it as optional for the APC SUA1500RM2U.

I would consider all of these to be reasonably priced.

> Which leads me to this question: What questions do you ask any
> potential colocation provider to determine if they are built out to
> your needs?

See if they'll guarantee diverse power as part of the contract. :-)
It's disappointing to find a colo that feeds you your primary and
redundant power off the same UPS.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


dan.syn.ack at gmail

Nov 4, 2009, 8:00 AM

Post #16 of 41 (1424 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

On Wed, Nov 4, 2009 at 7:09 AM, Tico <tico-nanog [at] raapid> wrote:

>
>> Sometimes it is easier to picture things that way.
>>
>>
>
> Regardless of generator sizing issues or disparities, if the ATS fails,
> then no amount of grid or generator power will keep the cabinets juiced up.
>
> Since this is the second time in recent history that this building has
> experienced a short power outage caused by ATS flakiness, perhaps keeping a
> small UPS in the cabinet isn't such a bad idea? Even if the distribution
> switches/routers lose power, at least the servers wouldn't have to go
> through fscks and DB integrity checks due to unplanned power loss, and the
> recovery time would be significantly faster.
>
> Hell, for a 5 minute power outage, some of my services were down for 20
> minutes. I'll happily take a 75% reduction in downtime for the cost of a
> UPS, though clearly redundancy across more reliable datacenters is a better
> solution.
>
>
Maybe some of us [[soon-to-be-]ex-]customers of Hurricane can bake them a
cake and beg for UPSes.
Or reliable power.
Or for someone to actually answer the voicemails much less phone calls
within even a few hours of an outage.
Or for there to be at the very least a status page notifying customers that
they are, in fact, screwed, and for how long, and that it's useless to
continue trying to get through at such time.

Who's with me?


michael.holstein at csuohio

Nov 4, 2009, 8:19 AM

Post #17 of 41 (1423 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

>> FWIW: http://www.he.net/releases/release18.html
>>
>
>

Well, they say it's a Cat unit, so probably one like this :
http://www.cat.com/cda/components/securedFile/displaySecuredFileServletJSP?x=7&fileId=1081064

> How long can they go on those 3000 gallons under their current
> load?
>
>

That engine is rated to consume 70.9 gal/hr at 50% load, so using a
conservative estimate, I'd say about 42 hours.


Cheers,

Michael Holstein
Cleveland State University
>
>


mpetach at netflight

Nov 4, 2009, 9:56 AM

Post #18 of 41 (1422 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

On Wed, Nov 4, 2009 at 8:19 AM, Michael Holstein
<michael.holstein [at] csuohio> wrote:
>>> FWIW: http://www.he.net/releases/release18.html
>>>
> Well, they say it's a Cat unit, so probably one like this :
> http://www.cat.com/cda/components/securedFile/displaySecuredFileServletJSP?x=7&fileId=1081064
>
>> How long can they go on those 3000 gallons under their current
>> load?
>>
> That engine is rated to consume 70.9g/hr at 50% .. so using a conservative
> estimate, I'd say about 42 hours.

Wouldn't the conservative estimate be 21 hours? (3000 gallons, 142
gal/hr at 100% load);
you'd get more hours out by guessing at what fraction of full load the
generator is
running, but anything longer than 21 hours is fudge-factor guesstimate
based, and not
to be counted on.
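
The two estimates can be checked with a minimal Python sketch. The tank
size and burn rates are the figures quoted in this thread (3000 gallons,
70.9 gal/hr at 50% load, 142 gal/hr at 100% load); the function name is
made up for illustration, and the actual load at Fremont-2 is not known
here.

```python
def runtime_hours(tank_gallons: float, burn_gal_per_hour: float) -> float:
    """Hours until the tank runs dry at a constant burn rate."""
    if burn_gal_per_hour <= 0:
        raise ValueError("burn rate must be positive")
    return tank_gallons / burn_gal_per_hour


TANK_GALLONS = 3000.0

half_load = runtime_hours(TANK_GALLONS, 70.9)    # 50% load figure
full_load = runtime_hours(TANK_GALLONS, 142.0)   # 100% load figure

print(f"50% load:  {half_load:.1f} hours")   # about 42.3
print(f"100% load: {full_load:.1f} hours")   # about 21.1
```

Assuming full load gives the floor; anything above it depends on a load
fraction nobody outside the facility can verify.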

> Cheers,
>
> Michael Holstein
> Cleveland State University

Matt


mksmith at adhost

Nov 4, 2009, 10:21 AM

Post #19 of 41 (1411 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

On 11/4/09 11:44 AM, "Alex Rubenstein" <alex [at] corp> wrote:

>>> Regardless of generator sizing issues or disparities, if the ATS fails,
>>> then no amount of grid or generator power will keep the cabinets juiced up.
>
> That is patently false.
>
At its root it's true - if an ATS fails the power between the source and
destination will be interrupted.

> Assume N+1 UPS, with each UPS module having its own ATS fed from a utility and
> emergency bus. Then you can even individually maintain each UPS module and
> ATS. Bonus and score.
>
> And if it's a really good place, you have two of the above (2(n+1)) and each
> of your power cords goes to one each.
>
Which doesn't address the failure of one piece of equipment. Of course, if
you're dual corded from your server through fully redundant switch gear to
multiple, diverse vaults then a single ATS failure shouldn't affect you.

Regards,

Mike


jgreco at ns

Nov 4, 2009, 10:26 AM

Post #20 of 41 (1418 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> On Wed, Nov 4, 2009 at 8:19 AM, Michael Holstein
> <michael.holstein [at] csuohio> wrote:
> >>> FWIW: http://www.he.net/releases/release18.html
> >>>
> > Well, they say it's a Cat unit, so probably one like this :
> > http://www.cat.com/cda/components/securedFile/displaySecuredFileServletJSP?x=7&fileId=1081064
> >
> >> How long can they go on those 3000 gallons under their current
> >> load?
> >>
> > That engine is rated to consume 70.9g/hr at 50% .. so using a conservative
> > estimate, I'd say about 42 hours.
>
> Wouldn't the conservative estimate be 21 hours? (3000 gallons, 142
> gal/hr at 100% load);
> you'd get more hours out by guessing at what fraction of full load the
> generator is
> running, but anything longer than 21 hours is fudge-factor guesstimate
> based, and not
> to be counted on.

The mildly conservative estimate is 21 hours minus the guaranteed
turnaround time for your fuel vendor to show up, minus some more
fudge factor to allow for someone to actually hook up and actually
refuel, etc.

The paranoid conservative estimate is more complex; you have to assume
you call the primary vendor, they don't show, and then you have to
call your backup(s). If you have a three hour guarantee in the contract,
you have to remember that this can still represent some scrambling by
your vendor, and if you're lights out, it's quite possible that others
are as well, and hospitals and city hall might rate as more urgent.
It's also possible that the truck'll have a flat, mechanical problems,
or try to rush through the railroad crossing about to be rendered
unpassable by a slow-moving freight train. It'll probably take you
an additional hour to panic and call your backup supplier; now you are
a bunch of hours shorter on capacity than you thought.

Of course, a lot of this is simply how you look at the problem. If
we're talking runtime-until-dry, yeah, 21 hours. If we're talking a
practical number of how long can you go until it's proper for some
panic to set in and calls to get made, it's more like half that. ;-)
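
As a rough illustration of that "practical number", here is a minimal
Python sketch that subtracts delays from the runtime-until-dry figure.
The 3 hour vendor guarantee and the extra hour to call a backup come from
the scenario above; the backup vendor's turnaround and the hookup time
are assumptions picked only for the example.

```python
def hours_before_trouble(runtime_until_dry: float, delays: dict) -> float:
    """Runtime left after subtracting every delay, never below zero."""
    return max(0.0, runtime_until_dry - sum(delays.values()))


# 3000 gallons at 142 gal/hr, the full-load figure from earlier in the thread.
RUNTIME_UNTIL_DRY = 3000.0 / 142.0

delays_hours = {
    "primary vendor guarantee, vendor never shows": 3.0,  # from the scenario above
    "panic and call the backup supplier": 1.0,            # from the scenario above
    "backup vendor turnaround": 3.0,                      # assumed
    "hook up and actually refuel": 1.0,                   # assumed
}

margin = hours_before_trouble(RUNTIME_UNTIL_DRY, delays_hours)
print(f"Call for fuel no later than {margin:.1f} hours into the outage")
```

With these guesses the 21 hours shrinks to roughly 13, which is in the
same spirit as "more like half that".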

With power:

N+1 is usually better than N
Best to assume full load when doing math
Things will go wrong, predict common failures
The best plans are still prone to failure
Safety margins can save your rear
etc

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


sethm at rollernet

Nov 4, 2009, 10:39 AM

Post #21 of 41 (1411 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Joe Greco wrote:
>
> With power:
>
> N+1 is usually better than N
> Best to assume full load when doing math
> Things will go wrong, predict common failures
> The best plans are still prone to failure
> Safety margins can save your rear
> etc
>

I find that electrical panelboards, busways, transfer switches, etc. are
often put in the category of things that don't need maintenance or
routine inspections. Big deal if you can start your fancy generator once
a month (I prefer on-load weekly) but the in between stuff is in
disrepair or full of mice. Even a simple dusty transfer switch could arc
weld itself to one side of the contacts.

~Seth


scott at doc

Nov 4, 2009, 10:41 AM

Post #22 of 41 (1417 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

Has anyone managed to get a root cause from HE yet regarding what happened?

I'm still waiting for them to get back to me over 24 hours later...

Scott


On Tue, Nov 3, 2009 at 10:50 AM, Tico <tico-nanog [at] raapid> wrote:

> I can't get through to Hurricane Electric, and they seem to be having an
> outage at their Fremont-2 facility again (as of 17:30 UTC or thereabouts) --
> ticket system is unanswered, phones go to voicemail, all equipment is
> unreachable.
>
> Does anyone here have a presence at 48233 Warm Springs Blvd, that can
> provide any information about this? I got hit by the ATS failure last month,
> so I guess it's possible that that equipment may have flaked again.
>
> -t
>
>


alex at corp

Nov 4, 2009, 10:44 AM

Post #23 of 41 (1412 views)
Permalink
RE: HE.net, Fremont-2 outage? [In reply to]

> > Regardless of generator sizing issues or disparities, if the ATS fails,
> > then no amount of grid or generator power will keep the cabinets juiced up.

That is patently false.

Assume N+1 UPS, with each UPS module having its own ATS fed from a utility and emergency bus. Then you can even individually maintain each UPS module and ATS. Bonus and score.

And if it's a really good place, you have two of the above (2(n+1)) and each of your power cords goes to one each.





"Question everything, assume nothing, discuss all, and resolve quickly."

-- Alex Rubenstein, AR97, K2AHR, alex [at] nac, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --


jgreco at ns

Nov 4, 2009, 10:54 AM

Post #24 of 41 (1419 views)
Permalink
Re: HE.net, Fremont-2 outage? [In reply to]

> Joe Greco wrote:
> >
> > With power:
> >
> > N+1 is usually better than N
> > Best to assume full load when doing math
> > Things will go wrong, predict common failures
> > The best plans are still prone to failure
> > Safety margins can save your rear
> > etc
>
> I find that electrical panelboards, busways, transfer switches, etc. are
> often put in the category of things that don't need maintenance or
> routine inspections. Big deal if you can start your fancy generator once
> a month (I prefer on-load weekly) but the in between stuff is in
> disrepair or full of mice. Even a simple dusty transfer switch could arc
> weld itself to one side of the contacts.

Yup. Related: "100% availability" is a marketing person's dream; it
sounds good in theory but is unattainable in practice, and is a reliable
sign of non-100%-reliability.

The most common way to gain "100% availability" is to avoid testing
under load. This surely protects the equipment against a whole slew of
failures in the less-used portions of your power systems, but also
protects you from detecting them outside your Hour(s) Of Greatest Need.

And even for those who follow best practices... You can inspect and
maintain things until you're blue in the face. One day a contractor
will drop a wrench into a PDU or UPS or whatever and spectacular things
will happen. Or a battery develops a strange fault.

You do live load testing, you'll lose now and then. It's best to simply
assume no single circuit is 100% reliable. You should be able to get
two circuits from separate power systems and the combination of the two
should really closely approximate 100%, but even there... it isn't.
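
A minimal Python sketch of why two circuits get close to 100% but never
reach it. It assumes the two power systems fail independently, which is
the optimistic case (a shared ATS, shared UPS, or a dropped wrench breaks
that assumption), and the 99.9% per-circuit figure is purely illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def combined_availability(a1: float, a2: float) -> float:
    """Availability of two circuits where either one alone keeps you up.

    Assumes the circuits fail independently of each other.
    """
    return 1.0 - (1.0 - a1) * (1.0 - a2)


def downtime_minutes_per_year(availability: float) -> float:
    """Expected minutes of outage per year at a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


single = 0.999                                # illustrative per-circuit figure
dual = combined_availability(single, single)

print(f"one circuit:  {downtime_minutes_per_year(single):.1f} min/year down")
print(f"two circuits: {downtime_minutes_per_year(dual):.2f} min/year down")
```

The dual figure is small but still not zero, and any common failure
point between the two feeds makes the real number worse.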

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


alex at corp

Nov 4, 2009, 11:06 AM

Post #25 of 41 (1417 views)
Permalink
RE: HE.net, Fremont-2 outage? [In reply to]

> Yup. Related: "100% availability" is a marketing person's dream; it
> sounds good in theory but is unattainable in practice, and is a
> reliable sign of non-100%-reliability.

You are confusing two different things.

Availability != Reliability.

For instance, an airplane is designed to be 100% reliable, but much less available. To keep a 747 from crashing (100% reliability) it needs significant downtime (not 100% available).



> And even for those who follow best practices... You can inspect and
> maintain things until you're blue in the face. One day a contractor
> will drop a wrench into a PDU or UPS or whatever and spectacular things
> will happen.

That's where policies, procedures and methods come in (read: SAS70)


> Or a battery develops a strange fault.

Get more than one string, more than one UPS, with monitoring. Batteries are NOT the Achilles heel everyone wants to make you believe they are.




"Question everything, assume nothing, discuss all, and resolve quickly."

-- Alex Rubenstein, AR97, K2AHR, alex [at] nac, latency, Al Reuben --
-- Net Access Corporation, 800-NET-ME-36, http://www.nac.net --
