
Mailing List Archive: NANOG: users

cost of misconfigurations

 

 



yuksem at cse

Aug 1, 2012, 4:27 PM

Post #1 of 11 (927 views)
cost of misconfigurations

Hi all,

I am looking for literature on the (monetary) costs of misconfigurations in an operational ISP network. Are there any such studies I can benefit from?

In a larger context, are there any thorough studies exploring the cost of building and running a large ISP network?

Best,

-Murat
========================================
Murat Yuksel
Associate Professor
Graduate Director
Department of Computer Science and Engineering
University of Nevada, Reno
1664 N. Virginia Street, MS 171, Reno, NV 89557.
Phone: +1 (775) 327 2246, Fax: +1 (775) 784 1877
E-mail: yuksem [at] cse
Web: http://www.cse.unr.edu/~yuksem
========================================


diogo.montagner at gmail

Aug 1, 2012, 5:08 PM

Post #2 of 11 (910 views)
Re: cost of misconfigurations [In reply to]

Hi Murat,

I have never seen any literature on this topic, but I think it is not
too difficult to calculate (or estimate).

A misconfiguration has at least two impacts: network outage and
re-work. For the outage, you use the SLAs to calculate the cost (how
much customer revenue you lost) due to that outage. Then there is the
time spent fixing the misconfiguration. The fix could mean removing the
misconfig and applying a new, correct config, or just editing the
misconfig until it matches the correct config. This re-work translates
into time, and time translates into money spent.
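
As a rough illustration of the two terms above (SLA cost of the outage
plus cost of the re-work), here is a minimal Python sketch. The field
names and the figures are hypothetical, not taken from any real network,
and the model deliberately ignores anything that is not a direct cost.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """One misconfiguration incident (all fields are illustrative assumptions)."""
    outage_hours: float         # customer-facing downtime
    sla_credit_per_hour: float  # hypothetical SLA credit owed per outage hour
    engineer_hours: float       # time spent finding and fixing the misconfig
    hourly_rate: float          # hypothetical loaded labor rate


def direct_cost(incident: Incident) -> float:
    """Direct cost = SLA cost of the outage + labor cost of the re-work."""
    outage_cost = incident.outage_hours * incident.sla_credit_per_hour
    rework_cost = incident.engineer_hours * incident.hourly_rate
    return outage_cost + rework_cost


example = Incident(outage_hours=2.0, sla_credit_per_hour=500.0,
                   engineer_hours=6.0, hourly_rate=100.0)
print(direct_cost(example))  # 1600.0
```

This only covers the costs that are known once the incident is over.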

Regards


--
Sent from my mobile device

./diogo -montagner
JNCIE-SP 0x41A


djahandarie at gmail

Aug 1, 2012, 5:13 PM

Post #3 of 11 (905 views)
Re: cost of misconfigurations [In reply to]

On Wed, Aug 1, 2012 at 8:08 PM, Diogo Montagner
<diogo.montagner [at] gmail> wrote:
> A misconfiguration will, at least, impact on two points: network
> outage and re-work. For the network outage, you have to use the SLAs
> to calculate the cost (how much you lost from the customers' revenue)
> due to that outage. On the other hand, there is the time efforts spent
> to fix the misconfiguration. Under the fix, it could be removing the
> misconfig and applying a new one correct. Or just fixing the misconfig
> targeting the correct config. This re-work will translate in time, and
> time can be translated in money spent.

Isn't the largest cost omitted (or at least glossed over) here?
Namely, lost customers due to the outage. That's why people have SLAs
and rework the network at all -- to avoid that cost.


--
Darius Jahandarie


randy at psg

Aug 1, 2012, 5:23 PM

Post #4 of 11 (906 views)
Re: cost of misconfigurations [In reply to]

> I am looking for literature on the (monetary) costs of
> misconfigurations in an operational ISP network. Are there any such
> studies I can benefit from?

jgs, who should know, says 42 quatloos

randy


diogo.montagner at gmail

Aug 1, 2012, 5:32 PM

Post #5 of 11 (904 views)
Re: cost of misconfigurations [In reply to]

Hi Darius,

You are right: losing a customer because of an outage is a cost too.
However, I would classify this as an unknown (in terms of risk
analysis), because the others I mentioned can be calculated and
estimated (they are known). It is very hard to estimate whether a
customer will cancel the contract because of 1 or n network outages. In
theory, if the customer's SLA is repeatedly not met, there is a real
chance he will cancel the contract.

Regards


--
Sent from my mobile device

./diogo -montagner
JNCIE-SP 0x41A


mysidia at gmail

Aug 1, 2012, 6:14 PM

Post #6 of 11 (912 views)
Re: cost of misconfigurations [In reply to]

On 8/1/12, Diogo Montagner <diogo.montagner [at] gmail> wrote:

I think it's more complicated than that. The cost of misconfiguration
is almost inseparable in some cases from the cost of configuration in
general. Not all misconfigs are equal, so you might want to concentrate
on a specific kind of misconfiguration, or a specific misconfig impact,
e.g. "an erroneous filter is applied, causing routes to be accepted
from an EGP peer without restriction". Especially with
misconfigurations whose impact is not discovered immediately, the
business impact beyond the cost to discover and resolve may not be
apparent; it depends on details of the misconfig, such as how trivial
or 'obvious' the error should be and how consistent the problems it
causes are.

At least if you concentrate on one specific type of misconfig and one
specific impact, you have a basis for comparison and approximation,
though only for that type.


The "fix" to some types of misconfigs might sometimes be to update the
design documentation, so the "misconfig" is no longer a
misconfiguration; so then you can start asking about how you
define "misconfig" in the first place, and the costs of having
erroneous or missing documentation.

That is hard, because the "costs" of updating documentation and
finding errors, less-than-optimal practices, or possible improvements
in configurations are affected by the long-term "costs" or loss of
efficiency that result from failing to correct documentation and
failing to review and improve arguably suboptimal configurations.


Some misconfigs or suboptimal configs are discovered by review or
other measures before there is any operational impact. Some
misconfigs are "safe" or "harmless" by coincidence, but can cause
issues later when the network is expanded further according to a design
that does not anticipate the misconfig, so the cost there is
increased risk.

Not all possible misconfigurations of a network cause an outage. Some
misconfigurations are actually design errors, not operator errors.
Not all network issues are outages; some configuration errors are
just things like

"Some entries in an access-list are dead weight, e.g. they can never
be reached, or are not necessary"; the impact of this error is
wasted memory resources, or increased complexity / more unnecessary
stuff for humans to look at.

(The entry might not have been dead-weight when originally added.)
Correcting the deadweight ACL entry situation then is an improvement
in efficiency.
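
To make the dead-weight case concrete, here is a minimal Python sketch
that flags entries which can never be reached. It assumes a deliberately
simplified ACL model that no particular vendor uses as-is: each entry is
a hypothetical (action, source prefix) pair, matching is first-match-wins
on the source prefix only, and an entry counts as dead when an earlier
entry already covers its whole prefix.

```python
import ipaddress


def shadowed_entries(acl):
    """Return indexes of entries fully covered by an earlier entry.

    `acl` is a list of (action, prefix) tuples evaluated top-down with
    first-match-wins semantics (a simplifying assumption).
    """
    nets = [(action, ipaddress.ip_network(prefix)) for action, prefix in acl]
    dead = []
    for i, (_, net) in enumerate(nets):
        for _, earlier in nets[:i]:
            # subnet_of() raises TypeError across IP versions, so check first.
            if net.version == earlier.version and net.subnet_of(earlier):
                dead.append(i)
                break
    return dead


acl = [
    ("permit", "10.0.0.0/8"),
    ("deny", "10.1.0.0/16"),    # never reached: 10.0.0.0/8 matches first
    ("deny", "192.0.2.0/24"),
]
print(shadowed_entries(acl))  # [1]
```

Real access-lists also match on destination, protocol and ports, so a
production check would have to compare all of those fields.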

Not all misconfigurations are detected either, sometimes not even
misconfigs that caused issues.

One kind of misconfiguration that would occur frequently in some
kinds of environments and might not break an uptime SLA is one that
causes suboptimal performance or reduced cost-effectiveness (e.g. an
early upgrade required because of an unrecognized misconfiguration).

Or configuration dead weight uses so much memory that hardware
upgrades become needed. On some networks, there might not be a
formal SLA, and the end user might not notice or take issue with it.

Another is loss of fault resilience (e.g. the failover path won't
work). No SLA is violated if the fault tolerance wasn't required by
the SLA, and the configuration error might go undetected for years if
no regular failover testing is performed.

It might be corrected before there is an issue... then the cost of
"increased risk" during the period in which the misconfig wasn't
service-affecting could be quite nebulous.

> I never saw any literature about this topic. But I think it is not too
> difficult to calculate (or estimate).
[snip]
> A misconfiguration will, at least, impact on two points: network
> outage and re-work. For the network outage, you have to use the SLAs
> to calculate the cost (how much you lost from the customers' revenue)
> due to that outage. On the other hand, there is the time efforts spent
> to fix the misconfiguration. Under the fix, it could be removing the
[snip]

--
-JH


george.herbert at gmail

Aug 1, 2012, 6:16 PM

Post #7 of 11 (899 views)
Re: cost of misconfigurations [In reply to]

On Wed, Aug 1, 2012 at 5:32 PM, Diogo Montagner
<diogo.montagner [at] gmail> wrote:
> Hi Darius,
>
> You are right. The lost of a customer due to those things. However, I
> would classify this as an unknown situation (in terms of risk
> analisys) because the others I mentioned are possible to calculate and
> estimate (they are known). But it is very hard to estimate if a
> customer will cancel the contract because 1 or n network outages. In
> theory, if the customer SLA is not being met consecutively, there is a
> potential probability he will cancel the contract.
>
> Regards

On the end customer side, I've done a bunch of reliability / risk cost
assessments for various customers over the years. It's never easy.

For an ISP... customers are fairly locked in, but for big networks and
customers, especially multihoming customers, business goes where they
want it.

SLA costs are easy. Predicting the final financial impact is hard.


--
-george william herbert
george.herbert [at] gmail


simon.knight at gmail

Aug 1, 2012, 9:22 PM

Post #8 of 11 (890 views)
Re: cost of misconfigurations [In reply to]

Quantifying the business costs would be very complex.

Here are some reports and research papers that may be a starting point:
[1] Juniper Networks, Inc., “What's Behind Network Downtime?,” pp.
1–12, May 2008.
[2] R. Mahajan, D. Wetherall, and T. Anderson, “Understanding BGP
misconfiguration,” Proceedings of the 2002 conference on Applications,
2002.
[3] A. Medem, R. Teixeira, N. Feamster, and M. Meulle, “Joint analysis
of network incidents and intradomain routing changes,” Network and
Service Management (CNSM), 2010 International Conference on, pp.
198–205, 2010.
[4] D. Turner, K. Levchenko, A. C. Snoeren, and S. Savage, “California
fault lines: understanding the causes and impact of network failures,”
presented at the SIGCOMM '10: Proceedings of the ACM SIGCOMM 2010
conference on SIGCOMM, 2010.
[5] Z. Yin, X. Ma, J. Zheng, Y. Zhou, L. N. Bairavasundaram, and S.
Pasupathy, “An empirical study on configuration errors in commercial
and open source systems,” presented at the SOSP '11: Proceedings of
the Twenty-Third ACM Symposium on Operating Systems Principles, 2011.
[6] Z. Kerravala, “As the Value of Enterprise Networks Escalates, So
Does the Need for Configuration Management,” cs.princeton.edu,
01-Jan.-2004. [Online]. Available:
https://www.cs.princeton.edu/courses/archive/fall10/cos561/papers/Yankee04.pdf.
[Accessed: 09-May-2012].
[7] W. Enck, P. McDaniel, S. Sen, and P. Sebos, “Configuration
management at massive scale: System design and experience,” USENIX
'07, Jun. 2007.
[8] R. D. Doverspike, K. K. Ramakrishnan, and C. Chase, “Structural
overview of ISP networks,” Guide to Reliable Internet Services and
Applications, pp. 19–93, 2010.




EWieling at nyigc

Aug 2, 2012, 4:08 AM

Post #9 of 11 (887 views)
RE: cost of misconfigurations [In reply to]

I do not think occasional outages cause significant loss of customers. Customers get angry easily, but once an issue is fixed, they get happy quickly. Customers have very short memories and the cost and hassle of changing services is often significant. Outages are never good, but it is better to concentrate on fixing the issue than panic about customers canceling their service.

Many times the cause of an outage is totally out of your control. For example, most of our outages are caused by Verizon's aging and neglected copper cable plant. I often wish some company had the balls to file a class action lawsuit over Verizon's neglect of their copper plant, but NOBODY wants to piss off their ILEC, including us.



ralph.brandt at pateam

Aug 2, 2012, 7:31 AM

Post #10 of 11 (883 views)
RE: cost of misconfigurations [In reply to]

The misconfiguration cost is usually not calculable in itself. But I
think the more important issue is, "How do we prevent it?" I would
spend more time on prevention than assessing the cost.

I can think of several minor provisioning issues that cost us more in
customer relations than everything else put together, and a couple of
significant ones after which it seemed like nothing had happened. And I
am not sure I could have predicted the outcome the day before the event
if someone had handed me the scenario to assess. The reason: when it
happens, the CURRENT situation is as much a driver of the impact as the
actual event. It even goes back to the emotional state of the customer:
whether his toast was burned this morning, whether he/she had a fight
with the spouse, who flipped him the bird during his drive in, and a lot
of other things that dictate mental state.

I would be very reluctant to use a vendor whose approach is that all
they are concerned about is what an error costs them. I want them to be
more concerned about what that costs their customer (me) and what they
can do to prevent it.

Proper Prior Preparation Prevents Piss Poor Performance.

Training, sound processes, good management practices, good maintenance,
good personnel selection go a long way.

To quote Chief Gassaway (fire chief with good stuff on the web for any
business) "Luck validates bad practices." The REB translation, "We did
it this way for years and nothing bad happened."

In Chief Gassaway's business, bad practices cause Line of Duty Deaths.
In ours they cause outages, lost revenue and possibly bankruptcy.
Remember, if your company goes belly up, you are out of a job...

http://www.samatters.com/2012/07/31/positive-reinforcement-of-undesirable-behavior/


Ralph Brandt



jared at puck

Aug 9, 2012, 7:43 AM

Post #11 of 11 (840 views)
Re: cost of misconfigurations [In reply to]

On Aug 2, 2012, at 10:31 AM, Brandt, Ralph wrote:

> The misconfiguration cost is usually not calculable in itself. But I
> think the more important issue is, "How do we prevent it?" I would
> spend more time on prevention than assessing the cost.

Lots of people have developed best practices on these topics. The
problem is pushing against the business side and keeping these in
place, and not letting the bar be low at your upstream and peers.

There is a secondary issue that is still unaddressed. Some vendors
still send all routes they receive out to all external peers in the
absence of a policy. This is something I want to see corrected as it
will require a bit more intelligence when it comes to BGP policy to
provide the expected behavior.
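
The difference between the two defaults can be sketched in a few lines
of Python. This is only a toy model of the export decision, not any
vendor's implementation: the `Route` and `Policy` types and the
`default_deny` flag are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, List, Optional

Route = str                       # a prefix as a string, for illustration
Policy = Callable[[Route], bool]  # returns True if the route may be exported


def routes_to_advertise(received: Iterable[Route],
                        export_policy: Optional[Policy],
                        default_deny: bool = True) -> List[Route]:
    """Pick the routes to send to an external peer.

    With no policy configured, a default-deny speaker sends nothing,
    while a default-permit speaker leaks everything it received.
    """
    if export_policy is None:
        return [] if default_deny else list(received)
    return [route for route in received if export_policy(route)]


received = ["192.0.2.0/24", "198.51.100.0/24"]
print(routes_to_advertise(received, None))                      # []
print(routes_to_advertise(received, None, default_deny=False))  # both routes
```

With default-deny, forgetting the policy fails closed: the peer gets no
routes until an operator states explicitly what may be exported.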

- Jared
