Mailing List Archive: NANOG: users

dns and software, was Re: Reliable Cloud host ?

 

 


drais at icantclick

Feb 27, 2012, 12:43 PM

Post #1 of 58 (1325 views)
Permalink
dns and software, was Re: Reliable Cloud host ?

On Mon, 27 Feb 2012, William Herrin wrote:

> In some cases this is because of carelessness: The application does a
> gethostbyname once when it starts, grabs the first IP address in the
> list and retains it indefinitely. The gethostbyname function doesn't
> even pass the TTL to the application. Ntpd is/used to be one of the
> notable offenders, continuing to poll the dead address for years after
> the server moved.

While yes it often is carelessness - it's been reported by hardcore
development sorts that I trust that there is no standardized API to obtain
the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
programmers have better tools.



--
david raistrick http://www.netmeister.org/news/learn2quote.html
drais [at] icantclick http://www.expita.com/nomime.html


bill at herrin

Feb 27, 2012, 3:50 PM

Post #3 of 58 (1290 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais [at] icantclick> wrote:
> On Mon, 27 Feb 2012, William Herrin wrote:
>> In some cases this is because of carelessness: The application does a
>> gethostbyname once when it starts, grabs the first IP address in the
>> list and retains it indefinitely. The gethostbyname function doesn't
>> even pass the TTL to the application. Ntpd is/used to be one of the
>> notable offenders, continuing to poll the dead address for years after
>> the server moved.
>
> While yes it often is carelessness - it's been reported by hardcore
> development sorts that I trust that there is no standardized API to obtain
> the TTL...  What needs to get fixed is get[hostbyname,addrinfo,etc] so
> programmers have better tools.

Meh. What should be fixed is that connect() should receive a name
instead of an IP address. Having an application deal directly with the
IP address should be the exception rather than the rule. Then, deal
with the TTL issues once in the standard libraries instead of
repeatedly in every single application.

In theory, that'd even make the app code protocol agnostic so that it
doesn't have to be rewritten yet again for IPv12.
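
A minimal sketch of what a name-taking connect could look like, assuming Python's standard socket module. The function name connect_by_name and its timeout parameter are made up for illustration. Python's own socket.create_connection() already behaves roughly like this.

```python
import socket

def connect_by_name(host, port, timeout=5.0):
    """Resolve host at call time and return a connected TCP socket.

    Hypothetical helper: the caller passes a name and never sees an
    address or an address family.
    """
    last_error = None
    # AF_UNSPEC asks for IPv4 and IPv6 results alike.
    for family, socktype, proto, _canon, sockaddr in socket.getaddrinfo(
            host, port, socket.AF_UNSPEC, socket.SOCK_STREAM):
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(timeout)
            sock.connect(sockaddr)
            return sock
        except OSError as exc:
            # Remember the failure and move on to the next address.
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or OSError("no addresses returned for %r" % (host,))
```

Because the lookup happens inside every call, any caching or TTL handling is left to the resolver underneath rather than to the application.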

Regards,
Bill Herrin

--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


owen at delong

Feb 27, 2012, 4:07 PM

Post #4 of 58 (1297 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Feb 27, 2012, at 3:50 PM, William Herrin wrote:

> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais [at] icantclick> wrote:
>> On Mon, 27 Feb 2012, William Herrin wrote:
>>> In some cases this is because of carelessness: The application does a
>>> gethostbyname once when it starts, grabs the first IP address in the
>>> list and retains it indefinitely. The gethostbyname function doesn't
>>> even pass the TTL to the application. Ntpd is/used to be one of the
>>> notable offenders, continuing to poll the dead address for years after
>>> the server moved.
>>
>> While yes it often is carelessness - it's been reported by hardcore
>> development sorts that I trust that there is no standardized API to obtain
>> the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
>> programmers have better tools.
>
> Meh. What should be fixed is that connect() should receive a name
> instead of an IP address. Having an application deal directly with the
> IP address should be the exception rather than the rule. Then, deal
> with the TTL issues once in the standard libraries instead of
> repeatedly in every single application.
>
> In theory, that'd even make the app code protocol agnostic so that it
> doesn't have to be rewritten yet again for IPv12.
>

While I agree with the principle of what you are trying to say, I would argue
that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().

It is perfectly reasonable for connect() to deal with an address structure.

If people are not using getnameinfo()/getaddrinfo() from the standard libraries,
then, I don't see any reason to believe that they would use connect() from the
standard libraries if it incorporated their functionality.

Owen


bill at herrin

Feb 27, 2012, 4:59 PM

Post #5 of 58 (1294 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Mon, Feb 27, 2012 at 7:07 PM, Owen DeLong <owen [at] delong> wrote:
> On Feb 27, 2012, at 3:50 PM, William Herrin wrote:
>> Meh. What should be fixed is that connect() should receive a name
>> instead of an IP address. Having an application deal directly with the
>> IP address should be the exception rather than the rule. Then, deal
>> with the TTL issues once in the standard libraries instead of
>> repeatedly in every single application.
>>
>> In theory, that'd even make the app code protocol agnostic so that it
>> doesn't have to be rewritten yet again for IPv12.
>
> While I agree with the principle of what you are trying to say, I would argue
> that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().
>
> It is perfectly reasonable for connect() to deal with an address structure.

Yes, well, that's why we're still using a layer 4 protocol (TCP) that
can't dynamically rebind to the protocol level below (IP). God help us
when folks start overriding the ethernet MAC address to force machines
to keep the same IPv6 address that's been hardcoded somewhere or is
otherwise too much trouble to change.

Regards,
Bill Herrin



--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


george.herbert at gmail

Feb 27, 2012, 5:12 PM

Post #6 of 58 (1296 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Mon, Feb 27, 2012 at 4:59 PM, William Herrin <bill [at] herrin> wrote:
> ....
> Yes, well, that's why we're still using a layer 4 protocol (TCP) that
> can't dynamically rebind to the protocol level below (IP).

This is somewhat irritating, but on the scale of 0 (all is well) to 10
(you want me to do WHAT with DHCPv6???) this is about a 2.

The application can re-connect from the TCP layer if something wiggy
happens to the layer below. This is an application layer solution, is
well established, and works fine. One just has to notice something's
amiss and retry connection rather than abort the application.

> God help us
> when folks start overriding the ethernet MAC address to force machines
> to keep the same IPv6 address that's been hardcoded somewhere or is
> otherwise too much trouble to change.

It could be worse. Back in the day I worked for a company that did
one of the earlier two-on-motherboard ethernet chip servers. The Boot
PROM (from another vendor) had no clue about multiple ethernet
interfaces. It came up with both interfaces set to the same NVRAM-set
MAC. We wanted to fix it in firmware but kept having issues with
that.

I had to get an init script to rotate the MAC for the second interface
up one, and ensure that it was in the OS and run before the interfaces
got plumbed, get it bundled into the OS distribution, and ensure that
factory MACs were only set to even numbers to start with.

One of these steps ultimately failed rather spectacularly.



--
-george william herbert
george.herbert [at] gmail


marka at isc

Feb 27, 2012, 5:47 PM

Post #7 of 58 (1291 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

In message <CAP-guGVA4eHv0K=U=x2B-WPYDy2RQ7ZE1Di2AHc+dmA_huyGzA [at] mail>,
William Herrin writes:
> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais [at] icantclick> wrote:
> > On Mon, 27 Feb 2012, William Herrin wrote:
> >> In some cases this is because of carelessness: The application does a
> >> gethostbyname once when it starts, grabs the first IP address in the
> >> list and retains it indefinitely. The gethostbyname function doesn't
> >> even pass the TTL to the application. Ntpd is/used to be one of the
> >> notable offenders, continuing to poll the dead address for years after
> >> the server moved.
> >
> > While yes it often is carelessness - it's been reported by hardcore
> > development sorts that I trust that there is no standardized API to obtain
> > the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
> > programmers have better tools.
>
> Meh. What should be fixed is that connect() should receive a name
> instead of an IP address. Having an application deal directly with the
> IP address should be the exception rather than the rule. Then, deal
> with the TTL issues once in the standard libraries instead of
> repeatedly in every single application.

No. connect() should stay the way it is. Most developers cut and paste
the connection code. It's just that the code they cut and paste is very
old and is often IPv4 only.

> In theory, that'd even make the app code protocol agnostic so that it
> doesn't have to be rewritten yet again for IPv12.

The getaddrinfo() man page has IP-version-agnostic code examples. It
is, however, simplistic code which doesn't behave well when an address
is unreachable. For examples of how to behave better for TCP see:

https://www.isc.org/community/blog/201101/how-to-connect-to-a-multi-homed-server-over-tcp
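
One way to keep a single unreachable address from stalling the whole connection attempt is to share a fixed time budget across all getaddrinfo() results. The sketch below is not the code from the article linked above. The name connect_first_reachable and the even split of the remaining budget are assumptions made for illustration.

```python
import socket
import time

def connect_first_reachable(addrinfos, overall_timeout=10.0):
    """Try each getaddrinfo() result in order under one shared time budget.

    Hypothetical helper; addrinfos is the list returned by
    socket.getaddrinfo().
    """
    deadline = time.monotonic() + overall_timeout
    last_error = None
    for index, (family, socktype, proto, _canon, sockaddr) in enumerate(addrinfos):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        # Split what is left evenly over the addresses not yet tried, so
        # one dead address cannot consume the whole budget.
        attempt_timeout = remaining / (len(addrinfos) - index)
        sock = None
        try:
            sock = socket.socket(family, socktype, proto)
            sock.settimeout(attempt_timeout)
            sock.connect(sockaddr)
            sock.settimeout(None)  # back to blocking mode for the caller
            return sock
        except OSError as exc:
            last_error = exc
            if sock is not None:
                sock.close()
    raise last_error or TimeoutError("no address could be reached in time")
```

A caller would pass in socket.getaddrinfo(host, port, socket.AF_UNSPEC, socket.SOCK_STREAM). Attempts here are sequential; connecting to several addresses in parallel is a further refinement this sketch does not attempt.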

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka [at] isc


matt.addison at lists

Feb 27, 2012, 8:57 PM

Post #8 of 58 (1290 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Feb 27, 2012, at 19:10, Owen DeLong <owen [at] delong> wrote:

>
> On Feb 27, 2012, at 3:50 PM, William Herrin wrote:
>
>> On Mon, Feb 27, 2012 at 3:43 PM, david raistrick <drais [at] icantclick> wrote:
>>> On Mon, 27 Feb 2012, William Herrin wrote:
>>>> In some cases this is because of carelessness: The application does a
>>>> gethostbyname once when it starts, grabs the first IP address in the
>>>> list and retains it indefinitely. The gethostbyname function doesn't
>>>> even pass the TTL to the application. Ntpd is/used to be one of the
>>>> notable offenders, continuing to poll the dead address for years after
>>>> the server moved.
>>>
>>> While yes it often is carelessness - it's been reported by hardcore
>>> development sorts that I trust that there is no standardized API to obtain
>>> the TTL... What needs to get fixed is get[hostbyname,addrinfo,etc] so
>>> programmers have better tools.
>>
>> Meh. What should be fixed is that connect() should receive a name
>> instead of an IP address. Having an application deal directly with the
>> IP address should be the exception rather than the rule. Then, deal
>> with the TTL issues once in the standard libraries instead of
>> repeatedly in every single application.
>>
>> In theory, that'd even make the app code protocol agnostic so that it
>> doesn't have to be rewritten yet again for IPv12.
>
> While I agree with the principle of what you are trying to say, I would argue
> that it should be dealt with in getnameinfo() / getaddrinfo() and not connect().
>
> It is perfectly reasonable for connect() to deal with an address structure.
>
> If people are not using getnameinfo()/getaddrinfo() from the standard libraries,
> then, I don't see any reason to believe that they would use connect() from the
> standard libraries if it incorporated their functionality.

gai/gni do not return TTL values on any platform I'm aware of; the
only way to get the TTL currently is to use a non-standard resolver (e.g.
lwres). The issue is application developers not calling gai every time
they connect (due to aforementioned security concerns, at least in the
browser realm), instead opting to hold onto the original resolved
address for unreasonable amounts of time. Modifying gai to provide TTL
has been proposed in the past (dnsop '04) but afaik was shot down to
prevent inconsistencies in the API. Maybe when happy eyeballs
stabilizes someone will propose an API for inclusion in the standard
library that implements HE style connections. Looks like there was
already some talk on v6ops headed this way, but as always there's
resistance to standardizing it.

~Matt


marka at isc

Feb 27, 2012, 9:45 PM

Post #9 of 58 (1289 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

getaddrinfo was designed to be extensible, as was struct
addrinfo. Part of the problem with TTL is that not all data sources
used by getaddrinfo have TTL information. Additionally, for
many uses you want to reconnect to the same server rather
than the same name. Note there is nothing to prevent a
getaddrinfo implementation maintaining its own cache, though
if I were implementing such a cache I would have a flag
to force a refresh.
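
A rough sketch of such a cache with a force-refresh flag, written in Python at the application level rather than inside a real getaddrinfo implementation. Since getaddrinfo() exposes no TTL, the cache uses a fixed maximum age instead. The class name ResolverCache, the max_age default, and the injectable resolver and clock are all assumptions for illustration.

```python
import socket
import time

class ResolverCache:
    """Cache getaddrinfo() results for a fixed maximum age.

    Hypothetical class: max_age stands in for the TTL that
    getaddrinfo() does not provide.
    """

    def __init__(self, max_age=30.0, resolver=socket.getaddrinfo,
                 clock=time.monotonic):
        self._max_age = max_age
        self._resolver = resolver
        self._clock = clock
        self._entries = {}  # (host, port) -> (time stored, results)

    def lookup(self, host, port, force_refresh=False):
        key = (host, port)
        now = self._clock()
        cached = self._entries.get(key)
        if (not force_refresh and cached is not None
                and now - cached[0] < self._max_age):
            return cached[1]
        # Entry is missing or stale, or the caller asked for fresh data.
        results = self._resolver(host, port, socket.AF_UNSPEC,
                                 socket.SOCK_STREAM)
        self._entries[key] = (now, results)
        return results
```

An application would call lookup(host, port, force_refresh=True) after a connection failure, so that a stale address is not retried until the entry ages out.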

--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka [at] isc


owen at delong

Feb 28, 2012, 1:32 AM

Post #10 of 58 (1277 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Feb 27, 2012, at 9:45 PM, Mark Andrews wrote:

>
> getaddrinfo was designed to be extensible, as was struct
> addrinfo. Part of the problem with TTL is that not all data sources
> used by getaddrinfo have TTL information. Additionally, for
> many uses you want to reconnect to the same server rather
> than the same name. Note there is nothing to prevent a
> getaddrinfo implementation maintaining its own cache, though
> if I were implementing such a cache I would have a flag
> to force a refresh.
>

Sorry if I wasn't clear... My point to Bill was that we should be using calls
that don't have TTL information (GAI/GNI in their default forms), and that we
don't need to abuse connect() to achieve that. If people use GAI/GNI, then any
brokenness is system-wide brokenness in the system's resolver library and
should be addressed there.

Owen


bill at herrin

Feb 28, 2012, 5:11 AM

Post #11 of 58 (1280 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Tue, Feb 28, 2012 at 12:45 AM, Mark Andrews <marka [at] isc> wrote:
>        getaddrinfo was designed to be extensible as was struct
>        addrinfo.  Part of the problem with TTL is not [all] data sources
>        used by getaddrinfo have TTL information.

Hi Mark,

By the time getaddrinfo replaced gethostbyname, NIS and similar
systems were on their way out. It was reasonably well understood that
many if not most of the calls would return information gained from the
DNS. Depending on how you look at it, choosing not to propagate TTL
knowledge was either a belligerent choice to continue disrespecting
the DNS Time To Live or it was fatalistic acceptance that the DNS TTL
isn't and would not become functional at the application level.

The TTL still works fine deeper in the query system, though, timing out
which server holds the records.


>  Additionally for
>        many uses you want to reconnect to the same server rather
>        than the same name.

The SRV record was designed to solve that whole class of problems
without damaging the operation of the TTL. No one uses it.


It's all really very unfortunate. The recipe for SOHO multihoming, the
end of routing table bloat and IP roaming without pivoting off a home
base all boils down to two technologies: (1) a layer 4 protocol that
can dynamically rebind to the layer 3 IP address the same way IP uses
ARP to rebind to a changing ethernet MAC and (2) a DNS TTL that
actually works so that the DNS supports finding a connection's current
IP address.

Regards,
Bill Herrin

--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


marka at isc

Feb 28, 2012, 1:06 PM

Post #12 of 58 (1286 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

In message <CAP-guGV09HF7in+vZbKpGk0RR1Q4gpMMo5jQREUZVEj+ewzmkg [at] mail>,
William Herrin writes:
> On Tue, Feb 28, 2012 at 12:45 AM, Mark Andrews <marka [at] isc> wrote:
> > getaddrinfo was designed to be extensible as was struct
> > addrinfo. Part of the problem with TTL is not [all] data sources
> > used by getaddrinfo have TTL information.
>
> Hi Mark,
>
> By the time getaddrinfo replaced gethostbyname, NIS and similar
> systems were on their way out. It was reasonably well understood that
> many if not most of the calls would return information gained from the
> DNS. Depending on how you look at it, choosing not to propagate TTL
> knowledge was either a belligerent choice to continue disrespecting
> the DNS Time To Live or it was fatalistic acceptance that the DNS TTL
> isn't and would not become functional at the application level.

No. Propagating TTL is still an issue, especially when you do not always
have one. You can't just wave the problem away. As for DNS TTL, addresses
are about the only thing which have multiple sources. You also don't
have to use getaddrinfo. It really is designed to be the first step in
connecting to a host. If you need to reconnect you call it again.

> Still works fine deeper in the query system, timing out which server
> holds the records though.
>
>
> > Additionally for
> > many uses you want to reconnect to the same server rather
> > than the same name.
>
> The SRV record was designed to solve that whole class of problems
> without damaging the operation of the TTL. No one uses it.

You don't need to know the TTL to use SRV.

> It's all really very unfortunate. The recipe for SOHO multihoming, the
> end of routing table bloat and IP roaming without pivoting off a home
> base all boils down to two technologies: (1) a layer 4 protocol that
> can dynamically rebind to the layer 3 IP address the same way IP uses
> ARP to rebind to a changing ethernet MAC and (2) a DNS TTL that
> actually works so that the DNS supports finding a connection's current
> IP address.

DNS TTL works. Applications that don't honour it aren't an indication that
it doesn't work.

> Regards,
> Bill Herrin
>
> --
> William D. Herrin ................ herrin [at] dirtside bill [at] herrin
> 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
> Falls Church, VA 22042-3004
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka [at] isc


bill at herrin

Feb 28, 2012, 1:21 PM

Post #13 of 58 (1274 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
> DNS TTL works. Applications that don't honour it aren't an indication that
> it doesn't work.

Mark,

If three people died and the building burned down then the sprinkler
system didn't work. It may have sprayed water, but it didn't *work*.

Regards,
Bill Herrin


--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


marka at isc

Feb 28, 2012, 5:46 PM

Post #14 of 58 (1270 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q [at] mail>,
William Herrin writes:
> On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
> > DNS TTL works. Applications that don't honour it aren't an indication that
> > it doesn't work.
>
> Mark,
>
> If three people died and the building burned down then the sprinkler
> system didn't work. It may have sprayed water, but it didn't *work*.

Not enough evidence to say if it worked or not. Sprinkler systems
are designed to handle particular classes of fire, not every fire.

A 0 TTL means use this information for this transaction. We don't
tear down TCP sessions on DNS TTL going to zero.

If one really wants to deprecate addresses we need something a lot
more complicated than A and AAAA records in the DNS. We need stuff
like "use this address for new transactions", "this address is going
away soon, don't use it unless no other works". One also has to use
multiple addresses at the same time.

Mark
--
Mark Andrews, ISC
1 Seymour St., Dundas Valley, NSW 2117, Australia
PHONE: +61 2 9871 4742 INTERNET: marka [at] isc


jgreco at ns

Feb 29, 2012, 4:57 AM

Post #15 of 58 (1260 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q [at] mail>,
> William Herrin writes:
> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
> > > DNS TTL works. Applications that don't honour it aren't an indication that
> > > it doesn't work.
> >
> > Mark,
> >
> > If three people died and the building burned down then the sprinkler
> > system didn't work. It may have sprayed water, but it didn't *work*.
>
> Not enough evidence to say if it worked or not. Sprinkler systems
> are designed to handle particular classes of fire, not every fire.

It is also worth noting that many fire systems are not intended to
put out the fire, but to provide warning and then provide an extended
window for people to exit the affected building through use of sprinklers
and other measures to slow the spread of the fire. As you suggest, most
sprinkler systems are not actually designed to be able to completely
extinguish fires - but that even applies to fires they are intended to be
able to "handle". This is a common misunderstanding of the technology.

> A 0 TTL means use this information for this transaction. We don't
> tear down TCP sessions on DNS TTL going to zero.
>
> If one really wants to deprecate addresses we need something a lot
> more complicated than A and AAAA records in the DNS. We need stuff
> like "use this address for new transactions", "this address is going
> away soon, don't use it unless no other works". One also has to use
> multiple addresses at the same time.

This has always been a weakness of the technology and documentation.
The common usage scenario of static hosts and merely needing to be able
to resolve a hostname to reach the traditional example of a "departmental
server" or something like that is what most code and code examples are
intended to tackle; very little of what developers are actually given (in
real practical terms) even hints at needing to consider aspects such as
TTL or periodically refreshing host->ip mappings, and most of the
documentation I've seen fails to discuss the implications of overloading
things like TTL for deliberate load-balancing or geo purposes. Shocking
it's poorly understood by developers who just want their poor little
program to connect over the Internet.

It's funny how these two technologies are both often misunderstood. I
would not have thought of comparing DNS to fire suppression. :-)

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


bill at herrin

Feb 29, 2012, 6:18 AM

Post #16 of 58 (1263 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco [at] ns> wrote:
>> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q [at] mail>,
>>  William Herrin writes:
>> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
>> > > DNS TTL works. Applications that don't honour it aren't an indication that
>> > > it doesn't work.
>> >
>> > Mark,
>> >
>> > If three people died and the building burned down then the sprinkler
>> > system didn't work. It may have sprayed water, but it didn't *work*.
>>
>> Not enough evidence to say if it worked or not.  Sprinkler systems
>> are designed to handle particular classes of fire, not every fire.
>
> It is also worth noting that many fire systems are not intended to
> put out the fire, but to provide warning and then provide an extended
> window for people to exit the affected building through use of sprinklers
> and other measures to slow the spread of the fire.

Hi Joe,

The sprinkler system is designed to delay the fire long enough for
everyone to safely escape. As a secondary objective, it reduces the
fire damage that occurs while waiting for firefighters to arrive and
extinguish the fire. If "three people died" then the system failed.
Perhaps the design was inadequate. Perhaps some age-related issue
prevented the sprinkler heads from melting. Perhaps someone stacked
boxes to the ceiling and it blocked the water. Perhaps the water was
shut off and nobody knew it. Perhaps an initial explosion damaged the
sprinkler system so it could no longer work effectively. Whatever the
exact details, that sprinkler system failed.

Whoever you want to blame, DNS TTL dysfunction at the application
level is the same way. It's a failed system. With the TTL on an A
record set to 60 seconds, you can't change the address attached to the
A record and expect that 60 seconds later no one will continue to
connect to the old address. Nor 600 seconds later nor 6000 seconds
later. The "system" for renumbering a service of which the TTL setting
is a part consistently fails to reliably function in that manner.

Regards,
Bill Herrin



--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


owen at delong

Feb 29, 2012, 10:01 AM

Post #17 of 58 (1266 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Feb 29, 2012, at 6:18 AM, William Herrin wrote:

> On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco [at] ns> wrote:
>>> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q [at] mail>,
>>> William Herrin writes:
>>>> On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
>>>>> DNS TTL works. Applications that don't honour it aren't an indication that
>>>>> it doesn't work.
>>>>
>>>> Mark,
>>>>
>>>> If three people died and the building burned down then the sprinkler
>>>> system didn't work. It may have sprayed water, but it didn't *work*.
>>>
>>> Not enough evidence to say if it worked or not. Sprinkler systems
>>> are designed to handle particular classes of fire, not every fire.
>>
>> It is also worth noting that many fire systems are not intended to
>> put out the fire, but to provide warning and then provide an extended
>> window for people to exit the affected building through use of sprinklers
>> and other measures to slow the spread of the fire.
>
> Hi Joe,
>
> The sprinkler system is designed to delay the fire long enough for
> everyone to safely escape. As a secondary objective, it reduces the
> fire damage that occurs while waiting for firefighters to arrive and
> extinguish the fire. If "three people died" then the system failed.
> Perhaps the design was inadequate. Perhaps some age-related issue
> prevented the sprinkler heads from melting. Perhaps someone stacked
> boxes to the ceiling and it blocked the water. Perhaps the water was
> shut off and nobody knew it. Perhaps an initial explosion damaged the
> sprinkler system so it could no longer work effectively. Whatever the
> exact details, that sprinkler system failed.

Bill, you are blaming the sprinkler system for what could, in fact, be not
a failure of the sprinkler system, but, of the 3 humans.

If they were too intoxicated or stoned to react, for example, the sprinkler
system is not to blame. If they were overcome by smoke before the
sprinklers went off, that may be a failure of the smoke detectors, but, it
is not a failure of the sprinklers. If they were killed or rendered unconscious
and/or unresponsive in the preceding explosion you mentioned and did
not die in the subsequent fire, then, that is not a failure in the sprinkler
system.

>
> Whoever you want to blame, DNS TTL dysfunction at the application
> level is the same way. It's a failed system. With the TTL on an A
> record set to 60 seconds, you can't change the address attached to the
> A record and expect that 60 seconds later no one will continue to
> connect to the old address. Nor 600 seconds later nor 6000 seconds
> later. The "system" for renumbering a service of which the TTL setting
> is a part consistently fails to reliably function in that manner.

Yes, the assumption by developers that gni/gai is a fire-and-forget
mechanism and that the data received is static is a failure. It is not a
failure of DNS TTL. It is a failure of the application developers that
code that way. Further analysis of the underlying causes of that failure
to properly understand name resolution technology and the environment
in which it operates is left as an exercise for the reader.

The fact that people playing interesting games with DNS TTLs don't
necessarily understand or well document the situation to raise awareness
among application developers could also be argued to be a failure
on the part of those people.

It is not, in either case, a failure of the technology.

One should always call gni/gai in close temporal (and ideally close
in the code as well) proximity to calling connect(). Obviously one
should call these resolver functions prior to calling connect().

Most example code is designed for short-lived non-recovering flows,
so, it's designed along the lines of resolve->(iterate through results
calling connect() for each result until connect() succeeds)->process->
close->exit.

Examples for persistent connections and/or connections that recover
or re-establish after a failure and/or browsers that stay running for a
long time and connect to the same system again significantly later
are few and far between. As a result, most code doing that ends up
being poorly written.
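
A minimal sketch of a long-lived client that holds on to a name rather than an address, so that every reconnect resolves again immediately before connecting. The class name NamedConnection and its retry count are hypothetical. socket.create_connection() is the Python standard library call that runs getaddrinfo() and then tries connect() on each result.

```python
import socket

class NamedConnection:
    """Long-lived client that remembers a name, never an address.

    Hypothetical class for illustration.
    """

    def __init__(self, host, port, timeout=5.0):
        self.host, self.port, self.timeout = host, port, timeout
        self._sock = None

    def _connect(self):
        # Resolution happens here, right next to the connect, on every
        # (re)connection.
        self._sock = socket.create_connection((self.host, self.port),
                                              self.timeout)

    def send(self, data, attempts=2):
        for attempt in range(attempts):
            try:
                if self._sock is None:
                    self._connect()
                self._sock.sendall(data)
                return
            except OSError:
                # Drop the broken socket so the next attempt re-resolves.
                self.close()
                if attempt == attempts - 1:
                    raise

    def close(self):
        if self._sock is not None:
            self._sock.close()
            self._sock = None
```

Whether the second lookup returns fresh data still depends on the system's resolver and any caches in front of it. A retry may also resend bytes the peer already received, so this only suits protocols that tolerate that.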

Further, DNS performance issues in the past have led developers of
such applications to "take matters into their own hands" to try and
improve the performance/behavior of their application in spite of
DNS. This is one of the things that led to many of the TTL ignorant
application-level DNS caches which you are complaining about.

Again, not a failure of DNS technology, but, of the operators of that
technology and the developers that tried to compensate for those
failures. They introduced a cure that is often worse than the disease.

Owen

>
> Regards,
> Bill Herrin
>
>
>
> --
> William D. Herrin ................ herrin [at] dirtside bill [at] herrin
> 3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
> Falls Church, VA 22042-3004


lanning at lanning

Feb 29, 2012, 10:38 AM

Post #18 of 58 (1259 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On 02/29/12 10:01, Owen DeLong wrote:
> Further, DNS performance issues in the past have led developers of
> such applications to "take matters into their own hands" to try and
> improve the performance/behavior of their application in spite of
> DNS. This is one of the things that led to many of the TTL ignorant
> application-level DNS caches which you are complaining about.

I have found some carriers to run hacked nameservers. Several years
ago I was moving a website and found that Cox was overriding the TTL
for all "www" names. At least for their residential customers in
Oklahoma. The TTL value our test subject was getting was larger than
it had ever been set.

--
Mr. Flibble
King of the Potato People


jmkeller at houseofzen

Feb 29, 2012, 1:02 PM

Post #19 of 58 (1260 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On 2/29/2012 1:38 PM, Robert Hajime Lanning wrote:
> On 02/29/12 10:01, Owen DeLong wrote:
>> Further, DNS performance issues in the past have led developers of
>> such applications to "take matters into their own hands" to try and
>> improve the performance/behavior of their application in spite of
>> DNS. This is one of the things that led to many of the TTL ignorant
>> application-level DNS caches which you are complaining about.
>
> I have found some carriers to run hacked nameservers. Several years
> ago I was moving a website and found that Cox was overriding the TTL
> for all "www" names. At least for their residential customers in
> Oklahoma. The TTL value our test subject was getting was larger than
> it had ever been set.
>

Back in the day, the uu.net cache servers were set for 24 hours (can't
remember if they claimed it was a performance issue or some other
justification). Several other large ISPs of the day also did this, so
you typically got the "allow 24 hours for full propagation of DNS
changes ..." response when updating external DNS entries. Nominal best
practice is to expect that and to run the service on old and new IPs for
at least 24 hours then start doing redirection (where possible by
protocol) or stop servicing the protocols on the old IP.


I'm sure other providers are doing the same to slow down fast flux
entries being used for spam site hosting today.

--
---
James M Keller


jgreco at ns

Feb 29, 2012, 1:02 PM

Post #20 of 58 (1257 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

> On Wed, Feb 29, 2012 at 7:57 AM, Joe Greco <jgreco [at] ns> wrote:
> >> In message <CAP-guGXK3WQGPLpmnVsnM0xnnU8==4zONK=UWTLkYWuduA6T9Q [at] mail>,
> >>  William Herrin writes:
> >> > On Tue, Feb 28, 2012 at 4:06 PM, Mark Andrews <marka [at] isc> wrote:
> >> > > DNS TTL works.  Applications that don't honour it aren't an indication that
> >> > > it doesn't work.
> >> >
> >> > Mark,
> >> >
> >> > If three people died and the building burned down then the sprinkler
> >> > system didn't work. It may have sprayed water, but it didn't *work*.
> >>
> >> Not enough evidence to say if it worked or not.  Sprinkler systems
> >> are designed to handle particular classes of fire, not every fire.
> >
> > It is also worth noting that many fire systems are not intended to
> > put out the fire, but to provide warning and then provide an extended
> > window for people to exit the affected building through use of sprinklers
> > and other measures to slow the spread of the fire.
>
> Hi Joe,
>
> The sprinkler system is designed to delay the fire long enough for
> everyone to safely escape.

Hi Bill,

No, the sprinkler system is *intended* to delay the fire long enough
for everyone to safely escape, however, in order to accomplish this,
the designer chooses from some reasonable options to meet certain
goals that are commonly accepted to allow that. For example, the
suppression design applied to a multistory dwelling where people
live, cook, and sleep is typically different from the single-story
light office space. Neither design will be effective against all
possible types of fire.

> As a secondary objective, it reduces the
> fire damage that occurs while waiting for firefighters to arrive and
> extinguish the fire. If "three people died" then the system failed.

That's silly. The system fails if the system *fails* or doesn't
behave as designed. No system is capable of guaranteeing survival.

Just yesterday, here in Milwaukee, we had a child killed at a
railroad crossing. The crossing was well-marked, with signals
and gates. Visibility of approaching trains for close to a mile
in either direction. The crew on the train saw him crossing,
blew their horn, laid on the emergency brakes. CP Rail inspected
the gates and signals for any possible faults, but eyewitness
accounts were that the gates and signals were working, and the
train made every effort to make itself known.

The 11 year old kid had his hood up and earbuds in, and apparently
didn't see the signals or look up and down the track before crossing,
and for whatever reason, didn't hear the train horn blaring at him.

At a certain point, you just can't protect against every possible
bad thing that can happen. I have a hard time seeing this as a
failure of the railroad's fully functional railroad crossing and
related safety mechanisms. The system doesn't guarantee survival.

> Whoever you want to blame, DNS TTL dysfunction at the application
> level is the same way. It's a failed system. With the TTL on an A
> record set to 60 seconds, you can't change the address attached to the
> A record and expect that 60 seconds later no one will continue to
> connect to the old address. Nor 600 seconds later nor 6000 seconds
> later. The "system" for renumbering a service of which the TTL setting
> is a part consistently fails to reliably function in that manner.

It's a failure because people don't understand the intent of the system,
and it is pretty safe to argue that it is a multifaceted failure, due
to failures by client implementations, server implementations, sample
code, attempts to use the system for things it wasn't meant for, etc.
This is by no means limited to TTL; we've screwed up multiple addresses,
IPv6 handling, negative caching, um, do I need to go on...?

In the specific case of TTL, the problem is made much worse due to the
way most client code has hidden this data from developers, so that many
developers don't even have any idea that such a thing exists.

I'm not sure how to see that as a design failure of the TTL mechanism.

I don't see developers ignoring DNS and hardcoding IP addresses into
code as a failure of the DNS system.

I see both as naive implementation errors. The difference with TTL is
that the implementation errors are so widespread as to render any sane
implementation relatively useless.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.


bill at herrin

Feb 29, 2012, 1:20 PM

Post #21 of 58 (1261 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Wed, Feb 29, 2012 at 4:02 PM, Joe Greco <jgreco [at] ns> wrote:
> In the specific case of TTL, the problem is made much worse due to the
> way most client code has hidden this data from developers, so that many
> developers don't even have any idea that such a thing exists.
>
> I'm not sure how to see that as a design failure of the TTL mechanism.

Hi Joe,

You shouldn't see that as a design failure of the TTL mechanism. It
isn't. It's a failure of the system of which DNS TTL is a component.
The TTL component itself was reasonably designed.

The failure is likened to installing a well designed sprinkler system
(the DNS with a TTL) and then shutting off the water valve
(gethostbyname/getaddrinfo).


> I don't see developers ignoring DNS and hardcoding IP addresses into
> code as a failure of the DNS system.

It isn't. It's a failure of the sockets API design which calls on
every application developer to (a) translate the name to a set of
addresses with a mechanism that discards the TTL knowledge and (b)
implement his own glue between name to address mapping and connect by
address.

It would be like telling an app developer: here's the ARP function and
the SEND function. When you Send to an IP address, make sure you
attach the right destination MAC. Of course the app developer gets it
wrong most of the time.

Regards,
Bill Herrin



--
William D. Herrin ................ herrin [at] dirtside  bill [at] herrin
3005 Crane Dr. ...................... Web: <http://bill.herrin.us/>
Falls Church, VA 22042-3004


mysidia at gmail

Feb 29, 2012, 10:15 PM

Post #22 of 58 (1250 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Mon, Feb 27, 2012 at 10:57 PM, Matt Addison
<matt.addison [at] lists> wrote:
> gai/gni do not return TTL values on any platforms I'm aware of, the
> only way to get TTL currently is to use a non standard resolver (e.g.
> lwres). The issue is application developers not calling gai every time

GAI/GNI do not return TTL values, but this should not be a problem.
If they were to return anything, it should not be a TTL, but a time()
value, after which
the result may no longer be used.

One way to achieve that would be for GAI to return an opaque structure
that contained the IP and such a value, in a manner consumable by the
sockets API, and adjust connect() to return an error if passed a
structure containing a ' returned time + TTL' in the past.
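
Purely as an illustration of that idea, here is a user-space sketch in
C. Everything named here (struct timed_addr, timed_resolve(),
timed_connect(), the choice of ESTALE) is hypothetical and not part of
any sockets API. Because getaddrinfo() does not expose the TTL, the
caller-supplied max_age_s stands in for it, which is an assumption of
this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical: an address bundled with the time after which it must
 * no longer be used. */
struct timed_addr {
    struct sockaddr_storage addr;
    socklen_t addrlen;
    int family, socktype, protocol;
    time_t not_after;
};

/* Hypothetical wrapper: resolve and stamp the first result with a
 * deadline. max_age_s is a stand-in for the real TTL. */
int timed_resolve(const char *host, const char *port,
                  time_t max_age_s, struct timed_addr *out)
{
    struct addrinfo hints, *res;

    assert(host != NULL && out != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    memset(out, 0, sizeof *out);
    memcpy(&out->addr, res->ai_addr, res->ai_addrlen);
    out->addrlen = res->ai_addrlen;
    out->family = res->ai_family;
    out->socktype = res->ai_socktype;
    out->protocol = res->ai_protocol;
    out->not_after = time(NULL) + max_age_s;

    freeaddrinfo(res);
    return 0;
}

/* Hypothetical connect() variant: refuses an address whose deadline
 * has passed, forcing the caller to resolve again. */
int timed_connect(const struct timed_addr *ta)
{
    int fd, saved;

    assert(ta != NULL);

    if (time(NULL) > ta->not_after) {
        errno = ESTALE;               /* result expired; resolve again */
        return -1;
    }
    fd = socket(ta->family, ta->socktype, ta->protocol);
    if (fd < 0)
        return -1;
    if (connect(fd, (const struct sockaddr *)&ta->addr, ta->addrlen) != 0) {
        saved = errno;                /* keep connect()'s error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

An application that held on to a struct timed_addr for too long would
get an error from timed_connect() instead of silently connecting to an
address that may have moved.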


TTL values are a DNS resolver function; the application consuming the
sockets API
should not be concerned about details of the DNS protocol.

All the application developer should need to know is that you invoke
GAI/GNI and wait for a response.
Once you have that response, it is permissible to use the value immediately,
but you may not store or re-use that value for more than a few seconds.

If you require that value again later, then you invoke GAI/GNI again;
any caching details
are the concern of the resolver library developer who has implemented GAI/GNI.

--
-JH


tim at pelican

Mar 1, 2012, 2:54 AM

Post #23 of 58 (1239 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

> GAI/GNI do not return TTL values, but this should not be a problem.
> If they were to return anything, it should not be a TTL, but a time()
> value, after which the result may no longer be used.
>
> One way to achieve that would be for GAI to return an opaque structure
> that contained the IP and such a value, in a manner consumable by the
> sockets API, and adjust connect() to return an error if passed a
> structure containing a ' returned time + TTL' in the past.

AF_INET_TTL and AF_INET6_TTL, with correspondingly expanded struct sockaddr_* ?

Code that explicitly requests AF_INET or AF_INET6 would get what it was expecting, code that requests AF_UNSPEC on a system with modified getaddrinfo() would get the expanded structs with the different ai_family set, and could pass them straight into a modified connect().

I'm sure I'm grossly oversimplifying somewhere though...

Regards,
Tim.


owen at delong

Mar 1, 2012, 4:20 AM

Post #24 of 58 (1238 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

On Feb 29, 2012, at 10:15 PM, Jimmy Hess wrote:

> On Mon, Feb 27, 2012 at 10:57 PM, Matt Addison
> <matt.addison [at] lists> wrote:
>> gai/gni do not return TTL values on any platforms I'm aware of, the
>> only way to get TTL currently is to use a non standard resolver (e.g.
>> lwres). The issue is application developers not calling gai every time
>
> GAI/GNI do not return TTL values, but this should not be a problem.
> If they were to return anything, it should not be a TTL, but a time()
> value, after which
> the result may no longer be used.
>
> One way to achieve that would be for GAI to return an opaque structure
> that contained the IP and such a value, in a manner consumable by the
> sockets API, and adjust connect() to return an error if passed a
> structure containing a ' returned time + TTL' in the past.
>
>
> TTL values are a DNS resolver function; the application consuming the
> sockets API
> should not be concerned about details of the DNS protocol.
>
> All the application developer should need to know is that you invoke
> GAI/GNI and wait for a response.
> Once you have that response, it is permissible to use the value immediately,
> but you may not store or re-use that value for more than a few seconds.
>
> If you require that value again later, then you invoke GAI/GNI again;
> any caching details
> are the concern of the resolver library developer who has implemented GAI/GNI.
>
> --
> -JH

The simpler approach, perfectly viable without mucking up what is already implemented and working:

Don't keep returns from GAI/GNI around longer than it takes to cycle through your connect() loop immediately after the GAI/GNI call.

If you write your code to the standard of:

getaddrinfo();
/* do something with the results */
freeaddrinfo();

with a very limited amount of time passing between getaddrinfo() and freeaddrinfo(), then, you don't need TTLs and it doesn't matter.

The system resolver library should do the right thing with DNS TTLs for records retrieved from DNS and a subsequent call to getaddrinfo() within the DNS TTL for the previously retrieved record should be a relatively cheap, fast in-memory operation.
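
One rough way to check that claim on a given host is to time the call
itself. The C sketch below is only a measurement aid; time_lookup() is
a made-up name, and whether a repeat lookup is actually served from
memory depends on what sits behind getaddrinfo() on that system (a
local caching daemon or stub resolver, for example), which this sketch
does not assume.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

/* Hypothetical helper: time one getaddrinfo()/freeaddrinfo() round
 * trip for host. Returns elapsed seconds, or a negative value if
 * resolution failed. */
double time_lookup(const char *host)
{
    struct addrinfo hints, *res;
    struct timespec t0, t1;

    assert(host != NULL);

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (getaddrinfo(host, NULL, &hints, &res) != 0)
        return -1.0;
    freeaddrinfo(res);                /* results are not kept around */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling time_lookup() twice in a row for the same name and comparing
the two values gives a feel for how much a repeat lookup costs on that
particular machine.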

Owen


jgreco at ns

Mar 1, 2012, 5:25 AM

Post #25 of 58 (1233 views)
Permalink
Re: dns and software, was Re: Reliable Cloud host ? [In reply to]

>
> On Wed, Feb 29, 2012 at 4:02 PM, Joe Greco <jgreco [at] ns> wrote:
> > In the specific case of TTL, the problem is made much worse due to the
> > way most client code has hidden this data from developers, so that many
> > developers don't even have any idea that such a thing exists.
> >
> > I'm not sure how to see that as a design failure of the TTL mechanism.
>
> Hi Joe,
>
> You shouldn't see that as a design failure of the TTL mechanism. It
> isn't. It's a failure of the system of which DNS TTL is a component.
> The TTL component itself was reasonably designed.

Think that's pretty much what I said.

> The failure is likened to installing a well designed sprinkler system
> (the DNS with a TTL) and then shutting off the water valve
> (gethostbyname/getaddrinfo).

No, the water still works as intended. I think your analogy starts to
fail here. It's more like expecting a water suppression system to put
out a grease fire. The TTL mechanism is completely suitable for what
it was originally meant for, and in an environment where everyone has
followed the rules, it works fine. If you take a light office space
with sprinklers and remodel it into a short order grill, the fire
inspector will require you to rework the fire suppression system to
an appropriate system.

Problem is, TTL is a relatively light-duty system that people have felt
free to ignore, overload for other purposes, etc., but there's no fire
inspector to come around and tell people how and why what they've done
is broken. In the case of TTL, the system is even largely hidden from
users, so that it is rarely thought about except now and then on NANOG,
dns-operations, etc. ;-) No wonder it is even poorly understood.

> > I don't see developers ignoring DNS and hardcoding IP addresses into
> > code as a failure of the DNS system.
>
> It isn't. It's a failure of the sockets API design which calls on
> every application developer to (a) translate the name to a set of
> addresses with a mechanism that discards the TTL knowledge and (b)
> implement his own glue between name to address mapping and connect by
> address.
>
> It would be like telling an app developer: here's the ARP function and
> the SEND function. When you Send to an IP address, make sure you
> attach the right destination MAC. Of course the app developer gets it
> wrong most of the time.

That's correct - and it doesn't imply that the system that was engineered
is faulty. In all likelihood, the fault lies with what the app developer
was told.

You originally said:

"If three people died and the building burned down then the sprinkler
system didn't work. It may have sprayed water, but it didn't *work*."

That's not true. If it sprayed water in the manner it was designed to,
then it worked. If three people took sleeping pills and didn't wake up
when the alarms blared, and an arsonist poured ten gallons of gas
everywhere before lighting the fire, the system still worked. It failed
to save those lives or protect the building from burning down, but I
am aware of no fire suppression system that realistically attempts to
address that. It is an unreasonable expectation.

I have a hard time seeing the many self-inflicted wounds of people who
have attempted to abuse TTL for various purposes as a failure of the TTL
design. The design is reasonable.

... JG
--
Joe Greco - sol.net Network Services - Milwaukee, WI - http://www.sol.net
"We call it the 'one bite at the apple' rule. Give me one chance [and] then I
won't contact you again." - Direct Marketing Ass'n position on e-mail spam(CNN)
With 24 million small businesses in the US alone, that's way too many apples.
