Mailing List Archive: NANOG: users

latency (was: RE: cooling door)


swmike at swm

Mar 29, 2008, 5:20 PM

Post #1 of 16
latency (was: RE: cooling door)

On Sat, 29 Mar 2008, Frank Coluccio wrote:

> We often discuss the empowerment afforded by optical technology, but we've barely
> scratched the surface of its ability to effect meaningful architectural changes.

If you talk to the server people, they have an issue with this:

Latency.

I've talked to people who have collapsed layers in their LAN because they
can see performance degradation for each additional switch that packets in
their NFS mount have to pass through. Yes, higher speeds mean lower serialisation
delay, but there is still a lookup time involved and 10GE is
substantially more expensive than GE.

--
Mikael Abrahamsson email: swmike[at]swm.pp.se


frank at dticonsulting

Mar 29, 2008, 6:10 PM

Post #2 of 16
Re: latency (was: RE: cooling door) [In reply to]

Please clarify. To which network element are you referring in connection with
extended lookup times? Is it the collapsed optical backbone switch, or the
upstream L3 element, or perhaps both?

Certainly, some applications will demand far less latency than others. Gamers and
some financial (program) traders, for instance, will not tolerate delays caused
by access provisions that are extended over vast WAN, or even large Metro,
distances. But in a local/intramural setting, where optical paths amount to no
more than a klick or so, the impact is almost negligible, even to the class of
users mentioned above. Worst case, run the enterprise over the optical model and
treat those latency-sensitive users as the one-offs that they actually are by
tying them into colos that are closer to their targets. That's what a growing
number of financial firms from around the country have done in NY and CHI colos,
in any case.

As for cost, while individual ports may be significantly more expensive in one
scenario than another, the architectural decision is seldom based on a single
element cost. It's the TCO of all architectural considerations that must be taken
into account. Going back to my original multi-story building example-- better
yet, let's use one of the forty-story structures now being erected at Ground Zero
as a case in point:

When all is said and done it will have created a minimum of two internal data
centers (main/backup/load-sharing) and a minimum of eighty (80) LAN enclosures,
with each room consisting of two L2 access switches (where each of the latter
possesses multiple 10Gbps uplinks, anyway), UPS/HVAC/Raised flooring,
firestopping, sprinklers, and a commitment to consume power for twenty years in
order to keep all this junk purring. I think you see my point.

So even where cost may appear to be the issue when viewing cost comparisons of
discrete elements, in most cases that qualify for this type of design, i.e. where
an organization reaches critical mass beyond so many users, I submit that it
really is not an issue. In fact, a pervasively-lighted environment may actually
cost far less.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile


swmike at swm

Mar 29, 2008, 6:30 PM

Post #3 of 16
Re: latency (was: RE: cooling door) [In reply to]

On Sat, 29 Mar 2008, Frank Coluccio wrote:

> Please clarify. To which network element are you referring in connection with
> extended lookup times? Is it the collapsed optical backbone switch, or the
> upstream L3 element, or perhaps both?

I am talking about the fact that the following topology:

server - 5 meter UTP - switch - 20 meter fiber - switch - 20 meter
fiber - switch - 5 meter UTP - server

has worse NFS performance than:

server - 25 meter UTP - switch - 25 meter UTP - server

Imagine bringing this into metro with 1-2ms delay instead of 0.1-0.5ms.

This is one of the issues that the server/storage people have to deal
with.
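To put rough numbers on the metro comparison, here is a minimal sketch that assumes a metadata-heavy NFS workload is dominated by strictly sequential request/response operations (one full RTT per operation, no pipelining). The operation count and the RTT values are illustrative assumptions, not measurements from the thread; `sync_ops_time_s` is a hypothetical helper name.

```python
def sync_ops_time_s(n_ops: int, rtt_s: float, service_s: float = 0.0) -> float:
    """Wall-clock time for n_ops strictly sequential request/response operations.

    Assumes no pipelining: each operation waits for the previous reply, so every
    operation costs one full round trip plus any server-side service time.
    """
    return n_ops * (rtt_s + service_s)


if __name__ == "__main__":
    n_ops = 10_000  # hypothetical: e.g. one lookup/getattr per file in a large tree
    for label, rtt_s in [("LAN 0.1 ms", 0.1e-3), ("LAN 0.5 ms", 0.5e-3),
                         ("metro 1 ms", 1e-3), ("metro 2 ms", 2e-3)]:
        print(f"{label:>10}: {sync_ops_time_s(n_ops, rtt_s):5.1f} s")
```

Under these assumptions the same 10,000 operations go from about 1 s at a 0.1 ms RTT to about 20 s at 2 ms, while the links themselves stay nearly idle.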

--
Mikael Abrahamsson email: swmike[at]swm.pp.se


frank at dticonsulting

Mar 29, 2008, 9:54 PM

Post #4 of 16
Re: latency (was: RE: cooling door) [In reply to]

Understandably, some applications fall into a class that requires very-short
distances for the reasons you cite, although I'm still not comfortable with the
setup you've outlined. Why, for example, are you showing two Ethernet switches
for the fiber option (which would naturally double the switch-induced latency),
but only a single switch for the UTP option?

Now, I'm comfortable in ceding this point. I should have made allowances for this
type of exception in my introductory post, but didn't, as I also omitted mention
of other considerations for the sake of brevity. For what it's worth, propagation
over copper is faster than propagation over fiber, as copper has a higher nominal
velocity of propagation (NVP) rating than does fiber, but not so much greater
as to cause the difference you've cited.
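A quick check on the size of the copper/fiber difference. This sketch assumes NVP values of 0.67 for fiber and 0.70 for twisted pair; real datasheet figures vary by cable type, so both are assumptions. It uses a 50 m path (both topologies in the thread total 50 m of cable) and compares the propagation time with the time to serialize one 1500-byte frame at 1 Gbit/s.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum, metres per second


def propagation_delay_s(length_m: float, nvp: float) -> float:
    """One-way propagation delay over a cable with the given nominal velocity of propagation."""
    return length_m / (nvp * C_M_PER_S)


def serialization_delay_s(frame_bytes: int, link_bps: float) -> float:
    """Time to clock one frame onto the wire (ignores preamble and inter-frame gap)."""
    return frame_bytes * 8 / link_bps


if __name__ == "__main__":
    path_m = 50.0  # both topologies in the thread total 50 m of cable
    for medium, nvp in [("fiber, assumed NVP 0.67", 0.67), ("copper, assumed NVP 0.70", 0.70)]:
        print(f"{medium}: {propagation_delay_s(path_m, nvp) * 1e9:.0f} ns over {path_m:.0f} m")
    print(f"one 1500-byte frame at 1 Gbit/s: {serialization_delay_s(1500, 1e9) * 1e6:.0f} us")
```

With these assumed values the two media differ by roughly 10 ns over 50 m, while a single full-size frame takes 12 us to serialize at 1 Gbit/s, so propagation speed alone cannot account for the NFS difference.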

As an aside, the manner in which o-e-o and e-o-e conversions take place when
transitioning from electronic to optical states, and back, affects latency
differently across differing link assembly approaches used. In cases where 10Gbps
or greater is being sent across a "multi-mode" fiber link in a data center or
other in-building venue, for instance, "parallel optics" are most often used,
i.e., multiple optical channels (either fibers or wavelengths) that undergo
multiplexing and de-multiplexing (collectively: inverse multiplexing or channel
bonding) -- as opposed to a single fiber (or a single wavelength) operating at
the link's rated wire speed.

By chance, is the "deserialization" you cited earlier, perhaps related to this
inverse muxing process? If so, then that would explain the disconnect, and if it
is so, then one shouldn't despair, because there is a direct path to avoiding this.

In parallel optics, e-o processing at the transmitting end and o-e processing at
the receiving end of the 10G link are both intensive. These have the effect of
adding more latency than a single-channel approach would. Yet most of the TIA
activity taking place today that is geared to increasing data rates over
in-building fiber links continues to favor multi-mode and the use of parallel
optics, as opposed to specifying single-mode supporting a single channel. But
single-mode solutions are also available to those who dare to be different.

I'll look more closely at these issues and your original exception during the
coming week, since they represent an important aspect in assessing the overall
model. Thanks.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile


adrian at creative

Mar 29, 2008, 10:03 PM

Post #5 of 16
Re: latency (was: RE: cooling door) [In reply to]

On Sun, Mar 30, 2008, Mikael Abrahamsson wrote:
>
> On Sat, 29 Mar 2008, Frank Coluccio wrote:
>
> > Please clarify. To which network element are you referring in connection with
> > extended lookup times? Is it the collapsed optical backbone switch, or the
> > upstream L3 element, or perhaps both?
>
> I am talking about the fact that the following topology:
>
> server - 5 meter UTP - switch - 20 meter fiber - switch - 20 meter
> fiber - switch - 5 meter UTP - server
>
> has worse NFS performance than:
>
> server - 25 meter UTP - switch - 25 meter UTP - server
>
> Imagine bringing this into metro with 1-2ms delay instead of 0.1-0.5ms.
>
> This is one of the issues that the server/storage people have to deal
> with.

That's because the LAN protocols need to be re-jiggled a little to start
looking less like LAN protocols and more like WAN protocols. Similar
things need to happen for applications.

I helped a friend debug an NFS throughput issue between some Linux servers
running Fortran-77 based numerical analysis code and a 10GE storage backend.
The storage backend can push 10GE without too much trouble but the application
wasn't poking the kernel in the right way (large fetches and prefetching, basically)
to fully utilise the infrastructure.
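The effect of "large fetches and prefetching" can be approximated with a deliberately simplified model: reads are issued in fixed-size requests, `outstanding` requests are in flight at once, each batch costs one round trip, and the payload costs its wire time. This is an assumption for illustration, not the behaviour of any particular NFS client; `read_time_s`, the file size, RTT and link speed are all hypothetical.

```python
import math


def read_time_s(total_bytes: int, request_bytes: int, rtt_s: float,
                link_bps: float, outstanding: int = 1) -> float:
    """Rough time to read total_bytes using fixed-size read requests.

    Simplified model (an assumption, not the behaviour of any specific NFS client):
    requests go out in batches of `outstanding`, each batch costs one round trip,
    and the payload costs total_bytes * 8 / link_bps on the wire.
    """
    n_requests = math.ceil(total_bytes / request_bytes)
    n_rounds = math.ceil(n_requests / outstanding)
    return n_rounds * rtt_s + total_bytes * 8 / link_bps


if __name__ == "__main__":
    size = 1 << 30            # hypothetical 1 GiB file
    rtt, link = 0.2e-3, 10e9  # assumed 0.2 ms RTT on a 10 Gbit/s path
    small = read_time_s(size, 4096, rtt, link)                    # 4 KiB, one at a time
    large = read_time_s(size, 1 << 20, rtt, link, outstanding=8)  # 1 MiB, 8 in flight
    print(f"4 KiB synchronous reads:  {small:6.2f} s ({size * 8 / small / 1e9:.2f} Gbit/s)")
    print(f"1 MiB reads, 8 in flight: {large:6.2f} s ({size * 8 / large / 1e9:.2f} Gbit/s)")
```

In this model the wire time is under a second either way; the gap (roughly 53 s versus just under 0.9 s) comes almost entirely from how many round trips the application makes the kernel wait on.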

Oh, and kernel hz tickers can have similar effects on network traffic, if the
application does dumb stuff. If you're (un)lucky then you may see 1 or 2ms
of delay between packet input and the scheduled processing. This doesn't matter
so much over links with 250ms+ of latency but matters on links with 0.1ms - 1ms of latency.
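Why a timer tick matters on a 0.1 ms path but not on a 250 ms one: this minimal sketch assumes a packet's processing is deferred until the next periodic tick and that the packet arrives at a uniformly random point within the tick. HZ values of 100, 250 and 1000 are used only as examples of kernel tick rates; `tick_deferral_s` is a hypothetical helper.

```python
def tick_deferral_s(hz: int) -> tuple[float, float]:
    """Mean and worst-case wait when work is deferred to the next periodic timer tick.

    Assumes the work becomes ready at a uniformly random point in a tick of 1/hz
    seconds, so on average it waits half a tick and at worst a full tick.
    """
    tick_s = 1.0 / hz
    return tick_s / 2, tick_s


if __name__ == "__main__":
    for hz in (100, 250, 1000):
        mean_s, worst_s = tick_deferral_s(hz)
        for rtt_s in (0.1e-3, 250e-3):
            print(f"HZ={hz:>4}, path RTT {rtt_s * 1e3:6.1f} ms: mean added wait "
                  f"{mean_s * 1e3:.1f} ms (worst {worst_s * 1e3:.0f} ms) = "
                  f"{mean_s / rtt_s:.1%} of the RTT")
```

Even at HZ=1000 the mean added wait is several times a 0.1 ms LAN round trip, but only a fraction of a percent of a 250 ms one.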

(Can someone please apply some science to this and publish best practices please?)



adrian


swmike at swm

Mar 30, 2008, 1:17 AM

Post #6 of 16
Re: latency (was: RE: cooling door) [In reply to]

On Sat, 29 Mar 2008, Frank Coluccio wrote:

> Understandably, some applications fall into a class that requires very-short
> distances for the reasons you cite, although I'm still not comfortable with the
> setup you've outlined. Why, for example, are you showing two Ethernet switches
> for the fiber option (which would naturally double the switch-induced latency),
> but only a single switch for the UTP option?

Yes, I am showing a case where you have switches in each rack so each rack
is uplinked with a fiber to a central aggregation switch, as opposed to
having a lot of UTP from the rack directly into the aggregation switch.

> Now, I'm comfortable in ceding this point. I should have made allowances for this
> type of exception in my introductory post, but didn't, as I also omitted mention
> of other considerations for the sake of brevity. For what it's worth, propagation
> over copper is faster than propagation over fiber, as copper has a higher nominal
> velocity of propagation (NVP) rating than does fiber, but not so much greater
> as to cause the difference you've cited.

The 2/3 speed of light in fiber as opposed to propagation speed in copper
was not in my mind.

> As an aside, the manner in which o-e-o and e-o-e conversions take place when
> transitioning from electronic to optical states, and back, affects latency
> differently across differing link assembly approaches used. In cases where 10Gbps

My opinion is that the major factors of added end-to-end latency in my
example are that the packet has to be serialised three times as opposed to
once and there are three lookups instead of one. Lookups take time, and
putting the packet on the wire takes time.

Back in the 10 megabit/s days, there were switches that did cut-through,
ie if the output port was not being used the instant the packet came in,
it could start to send out the packet on the outgoing port before it was
completely taken in on the incoming port (when the header was received,
the forwarding decision was taken and the equipment would start to send
the packet out before it was completely received from the input port).

> By chance, is the "deserialization" you cited earlier, perhaps related to this
> inverse muxing process? If so, then that would explain the disconnect, and if it
> is so, then one shouldn't despair, because there is a direct path to avoiding this.

No, it's the store-and-forward architecture used in all modern equipment
(that I know of). A packet has to be completely taken in over the wire
into a buffer, a lookup has to be done as to where this packet should be
put out, it needs to be sent over a bus or fabric, and then it has to be
clocked out on the outgoing port from another buffer. This adds latency in
each switch hop on the way.
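The per-hop accounting above can be written as a small idle-network model. This is a sketch, not a measurement: the 2 microsecond lookup time is a hypothetical placeholder, the 0.67 NVP is assumed for every segment, queueing is ignored, and `CablePath`/`one_way_latency_s` are illustrative names. It compares the per-rack topology with the flat one in store-and-forward and cut-through modes.

```python
from dataclasses import dataclass

C_M_PER_S = 299_792_458.0  # speed of light in vacuum


@dataclass
class CablePath:
    """Cable segments from source to destination, with a switch between each pair."""
    segment_lengths_m: list[float]
    nvp: float = 0.67  # assumed velocity factor, applied to every segment

    @property
    def n_switches(self) -> int:
        return len(self.segment_lengths_m) - 1


def one_way_latency_s(path: CablePath, frame_bytes: int, link_bps: float,
                      lookup_s: float, cut_through: bool = False,
                      header_bytes: int = 14) -> float:
    """First bit leaving the source NIC to last bit arriving, on an idle path.

    Store-and-forward: every segment carries a full serialization, plus one
    lookup per switch. Cut-through (assumes equal link speeds and idle output
    ports): one full serialization end to end, plus header time and lookup per switch.
    """
    frame_s = frame_bytes * 8 / link_bps
    prop_s = sum(path.segment_lengths_m) / (path.nvp * C_M_PER_S)
    if cut_through:
        header_s = header_bytes * 8 / link_bps
        return frame_s + prop_s + path.n_switches * (header_s + lookup_s)
    return len(path.segment_lengths_m) * frame_s + prop_s + path.n_switches * lookup_s


if __name__ == "__main__":
    per_rack = CablePath([5, 20, 20, 5])  # server-UTP-switch-fiber-switch-fiber-switch-UTP-server
    flat = CablePath([25, 25])            # server-UTP-switch-UTP-server
    lookup = 2e-6                         # hypothetical per-switch lookup time
    for ct in (False, True):
        a = one_way_latency_s(per_rack, 1500, 1e9, lookup, cut_through=ct)
        b = one_way_latency_s(flat, 1500, 1e9, lookup, cut_through=ct)
        mode = "cut-through" if ct else "store-and-forward"
        print(f"{mode:>17}: per-rack {a * 1e6:5.1f} us, flat {b * 1e6:5.1f} us")
```

With 1500-byte frames at 1 Gbit/s, store-and-forward makes the per-rack path pay two extra full serializations (24 us) plus two extra lookups in each direction; in cut-through mode the extra hops cost only header time and lookups.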

As Adrian Chadd mentioned in the email sent after yours, this can of
course be handled by modifying or creating new protocols that handle this
fact. It's just that with what is available today, this is a problem. Each
directory listing or file access takes a bit longer over NFS with added
latency, and this reduces performance in current protocols.

Programmers who do client/server applications are starting to notice this
and I know of companies that put latency-inducing applications in the
development servers so that the programmer is exposed to the same
conditions in the development environment as in the real world. This means
for some that they have to write more advanced SQL queries to get
everything done in a single query instead of asking multiple and changing
the queries depending on what the first query result was.
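Adding latency to development environments can be done at the network level; on Linux the netem queueing discipline is a common way (for example `tc qdisc add dev eth0 root netem delay 1ms`, with `eth0` standing in for the relevant interface). The sketch below shows an application-level alternative under stated assumptions: a hypothetical decorator, `with_simulated_rtt`, that sleeps for one simulated round trip before every call to a wrapped client function. `fetch_row` is a made-up stand-in for a database query, not any real API.

```python
import functools
import time


def with_simulated_rtt(rtt_s: float):
    """Decorator that sleeps for one simulated round trip before each call."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            time.sleep(rtt_s)  # stand-in for the request/response trip over a slower network
            return fn(*args, **kwargs)
        return wrapper
    return decorate


@with_simulated_rtt(0.002)  # hypothetical 2 ms metro RTT
def fetch_row(key):
    # Hypothetical stand-in for a database query that returns instantly on the dev LAN.
    return {"id": key}


if __name__ == "__main__":
    start = time.perf_counter()
    rows = [fetch_row(k) for k in range(50)]  # 50 dependent queries = 50 simulated round trips
    print(f"{len(rows)} queries took {time.perf_counter() - start:.3f} s")
```

A developer who sees 50 chained queries take a tenth of a second here has a concrete reason to fold them into one.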

Also, protocols such as SMB and NFS that use message blocks over TCP have
to be abandoned and replaced with real streaming protocols and large
window sizes. Xmodem wasn't a good idea back then, it's not a good idea
now (even though the blocks now are larger than the 128 bytes of 20-30
years ago).
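The contrast here is between stop-and-wait block protocols and sliding-window streaming. This minimal sketch ignores headers, loss and ACK transmission time, and the 1 Gbit/s link and 1 ms RTT are assumptions for illustration.

```python
def stop_and_wait_bps(block_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Throughput when every block must be acknowledged before the next is sent (Xmodem-style)."""
    per_block_s = block_bytes * 8 / link_bps + rtt_s
    return block_bytes * 8 / per_block_s


def windowed_bps(window_bytes: int, rtt_s: float, link_bps: float) -> float:
    """Steady-state ceiling for a sliding-window sender: at most one window per round trip."""
    return min(link_bps, window_bytes * 8 / rtt_s)


if __name__ == "__main__":
    link, rtt = 1e9, 1e-3  # assumed 1 Gbit/s path with a 1 ms round trip
    for blk in (128, 4096, 65536):
        print(f"stop-and-wait, {blk:>6} B blocks: {stop_and_wait_bps(blk, rtt, link) / 1e6:8.1f} Mbit/s")
    for win in (65535, 1 << 20):
        print(f"window of {win:>7} B: {windowed_bps(win, rtt, link) / 1e6:8.1f} Mbit/s")
```

Larger blocks help a stop-and-wait protocol, but only a window that covers the whole round trip lets the sender reach line rate.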

--
Mikael Abrahamsson email: swmike[at]swm.pp.se


vixie at isc

Mar 30, 2008, 7:34 AM

Post #7 of 16
Re: latency (was: RE: cooling door) [In reply to]

swmike[at]swm.pp.se (Mikael Abrahamsson) writes:

> ...
> Back in the 10 megabit/s days, there were switches that did cut-through,
> ie if the output port was not being used the instant the packet came in,
> it could start to send out the packet on the outgoing port before it was
> completely taken in on the incoming port (when the header was received,
> the forwarding decision was taken and the equipment would start to send
> the packet out before it was completely received from the input port).

had packet sizes scaled with LAN transmission speed, i would agree. but
the serialization time for 1500 bytes at 10MBit was ~1.2ms, and went down
by a factor of 10 for FastE (~120us), another factor of 10 for GigE (~12us)
and another factor of 10 for 10GE (~1.2us). even those of us using jumbograms
are getting less serialization delay at 10GE (~7us) than we used to
get on a DEC LANbridge 100 which did cutthrough after the header (~28us).
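The arithmetic behind these serialization figures is bits divided by bit rate. This sketch reproduces the 1500-byte and 9000-byte numbers; it ignores the preamble and inter-frame gap, so real wire times are slightly longer. `serialization_us` is a hypothetical helper name.

```python
def serialization_us(frame_bytes: int, link_bps: float) -> float:
    """Microseconds to clock frame_bytes onto a link (ignores preamble and inter-frame gap)."""
    return frame_bytes * 8 / link_bps * 1e6


if __name__ == "__main__":
    for name, bps in [("10 Mbit/s", 10e6), ("FastE", 100e6), ("GigE", 1e9), ("10GE", 10e9)]:
        print(f"{name:>9}: 1500 B = {serialization_us(1500, bps):7.1f} us, "
              f"9000 B = {serialization_us(9000, bps):7.1f} us")
```

This gives 1200 us (1.2 ms) for 1500 bytes at 10 Mbit/s, 1.2 us at 10GE, and 7.2 us for a 9000-byte jumbo frame at 10GE, in line with Paul's figures.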

> ..., it's the store-and-forward architecture used in all modern equipment
> (that I know of). A packet has to be completely taken in over the wire
> into a buffer, a lookup has to be done as to where this packet should be
> put out, it needs to be sent over a bus or fabric, and then it has to be
> clocked out on the outgoing port from another buffer. This adds latency in
> each switch hop on the way.

you may be right about the TCAM lookup times having an impact, i don't know
if they've kept pace with transmission speed either. but as for someone's theory
here yesterday that software (kernel and IP stack) architecture is more
likely to be at fault: there are still plenty of "queue it here, it'll go
out next time the device or timer interrupt handler fires" code paths, and these
can be in the ~1ms or even ~10ms range. this doesn't show up on file transfer
benchmarks since packet trains usually do well, but miss an ACK, or send
a ping, and you'll see a shelf.

> As Adrian Chadd mentioned in the email sent after yours, this can of
> course be handled by modifying or creating new protocols that handle this
> fact. It's just that with what is available today, this is a problem. Each
> directory listing or file access takes a bit longer over NFS with added
> latency, and this reduces performance in current protocols.

here again it's not just the protocols, it's the application design, that
has to be modernized. i've written plenty of code that tries to cut down
the number of bytes of RAM that get copied or searched, which ends up not
going faster on modern CPUs (or sometimes going slower) because of the
minimum transfer size between L2 and DRAM. similarly, a program that sped
up on a VAX 780 when i taught it to match the size domain of its disk I/O
to the 512-byte size of a disk sector, either fails to go faster on modern
high-bandwidth I/O and log structured file systems, or actually goes slower.

in other words you don't need NFS/SMB, or E-O-E, or the WAN, to erode what
used to be performance gains through efficiency. there's plenty enough new
latency (expressed as a factor of clock speed) in the path to DRAM, the
path to SATA, and the path through ZFS, to make it necessary that any
application that wants modern performance has to be re-oriented to take
a modern (which in this case means streaming) approach. correspondingly,
applications which take this approach, don't suffer as much when they move
from SATA to NFS or iSCSI.

> Programmers who do client/server applications are starting to notice this
> and I know of companies that put latency-inducing applications in the
> development servers so that the programmer is exposed to the same
> conditions in the development environment as in the real world. This
> means for some that they have to write more advanced SQL queries to get
> everything done in a single query instead of asking multiple and changing
> the queries depending on what the first query result was.

while i agree that turning one's SQL into transactions that are more like
applets (such that, for example, you're sending over the content for a
potential INSERT that may not happen depending on some SELECT, because the
end-to-end delay of getting back the SELECT result is so much higher than
the cost of the lost bandwidth from occasionally sending a useless INSERT)
will take better advantage of modern hardware and software architecture
(which means in this case, streaming), it's also necessary to teach our
SQL servers that ZFS "recordsize=128k" means what it says, for file system
reads and writes. a lot of SQL users who have moved to a streaming model
using a lot of transactions have merely seen their bottleneck move from the
network into the SQL server.
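The speculative-INSERT idea trades a little bandwidth for a round trip. Here is a back-of-the-envelope expected-time comparison, assuming each dependent exchange costs one RTT plus wire time and ignoring server processing; the function names, message sizes and the 30% INSERT probability are hypothetical.

```python
def two_step_s(rtt_s: float, select_bytes: int, insert_bytes: int,
               p_insert: float, link_bps: float) -> float:
    """Expected time for SELECT, wait for the answer, then INSERT only when needed."""
    select_s = rtt_s + select_bytes * 8 / link_bps
    insert_s = rtt_s + insert_bytes * 8 / link_bps
    return select_s + p_insert * insert_s


def speculative_s(rtt_s: float, select_bytes: int, insert_bytes: int,
                  link_bps: float) -> float:
    """Send SELECT plus a conditional INSERT in one batch: one RTT, INSERT bytes always sent."""
    return rtt_s + (select_bytes + insert_bytes) * 8 / link_bps


if __name__ == "__main__":
    link = 1e9  # assumed 1 Gbit/s
    for rtt in (0.05e-3, 1e-3, 2e-3):
        a = two_step_s(rtt, 200, 2000, 0.3, link)
        b = speculative_s(rtt, 200, 2000, link)
        print(f"RTT {rtt * 1e3:4.2f} ms: two-step {a * 1e6:7.1f} us, speculative {b * 1e6:7.1f} us")
```

Subtracting the two expressions shows speculation wins whenever p_insert * RTT exceeds (1 - p_insert) * the INSERT's wire time, which for small INSERTs on a fast link holds unless p_insert is close to zero.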

> Also, protocols such as SMB and NFS that use message blocks over TCP have
> to be abandoned and replaced with real streaming protocols and large
> window sizes. Xmodem wasn't a good idea back then, it's not a good idea
> now (even though the blocks now are larger than the 128 bytes of 20-30
> years ago).

i think xmodem and kermit moved enough total data volume (expressed as a
factor of transmission speed) back in their day to deserve an honourable
retirement. but i'd agree, if an application is moved to a new environment
where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
etc) is 10X faster, but the application only runs 2X faster, then it's time
to rethink more. but the culprit will usually not be new network latency.
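The 10X-hardware, 2X-application example can be read through Amdahl's law. This sketch assumes a simple two-part model, where a fraction of the original runtime gains the full hardware speedup and the rest gains nothing, and solves for the part that did not speed up. `stuck_fraction` is a hypothetical helper name.

```python
def stuck_fraction(observed_speedup: float, hardware_speedup: float) -> float:
    """Fraction of the original runtime that got no faster, under a two-part Amdahl model.

    Model (an assumption): a fraction f of the runtime speeds up by hardware_speedup
    and the remaining 1 - f does not speed up at all, so
    observed_speedup = 1 / ((1 - f) + f / hardware_speedup).
    """
    if hardware_speedup <= 1:
        raise ValueError("hardware_speedup must be greater than 1")
    if not 1 <= observed_speedup <= hardware_speedup:
        raise ValueError("observed_speedup must lie between 1 and hardware_speedup")
    return (1 / observed_speedup - 1 / hardware_speedup) / (1 - 1 / hardware_speedup)


if __name__ == "__main__":
    print(f"{stuck_fraction(2, 10):.1%} of the original runtime did not speed up")
```

For the 10X/2X case this gives 4/9, about 44% of the original runtime spent in something the faster hardware did not help at all; that part is what needs rethinking.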
--
Paul Vixie


freimer at ctiusa

Mar 30, 2008, 9:11 AM

Post #8 of 16
RE: latency (was: RE: cooling door) [In reply to]

> -----Original Message-----
> From: owner-nanog[at]merit.edu [mailto:owner-nanog[at]merit.edu] On Behalf Of
> Paul Vixie
> Sent: Sunday, March 30, 2008 10:35 AM
> To: nanog[at]merit.edu
> Subject: Re: latency (was: RE: cooling door)
>
>
> swmike[at]swm.pp.se (Mikael Abrahamsson) writes:
>
> > Programmers who do client/server applications are starting to notice this
> > and I know of companies that put latency-inducing applications in the
> > development servers so that the programmer is exposed to the same
> > conditions in the development environment as in the real world. This
> > means for some that they have to write more advanced SQL queries to get
> > everything done in a single query instead of asking multiple and changing
> > the queries depending on what the first query result was.
>
> while i agree that turning one's SQL into transactions that are more like
> applets (such that, for example, you're sending over the content for a
> potential INSERT that may not happen depending on some SELECT, because the
> end-to-end delay of getting back the SELECT result is so much higher than
> the cost of the lost bandwidth from occasionally sending a useless INSERT)
> will take better advantage of modern hardware and software architecture
> (which means in this case, streaming), it's also necessary to teach our
> SQL servers that ZFS "recordsize=128k" means what it says, for file system
> reads and writes. a lot of SQL users who have moved to a streaming model
> using a lot of transactions have merely seen their bottleneck move from the
> network into the SQL server.

I have seen first-hand (I worked for a company and diagnosed issues with their
applications from a network perspective, prompting a major re-write of the
software) cases where developers work with their SQL servers, application
servers, and clients all on the same L2 switch. They often do not duplicate
the environment they are going to be deploying the application into, and
therefore assume that the "network" is going to perform the same. So, when
there are problems they blame the network. Often the root problem is the
architecture of the application itself and not the "network." All the
servers and client workstations have Gigabit connections to the same L2
switch, and they are honestly astonished when there are issues running the
same application over a typical enterprise network with clients of different
speeds (10/100/1000, full and/or half duplex). Surprisingly, to me, they
even expect the same performance out of a WAN.

Application developers today need a "network" guy on their team. One who
can help them understand how their proposed application architecture would
perform over various customer networks, and that can make suggestions as to
how the architecture can be modified to allow the performance of the
application to take advantage of the networks' capabilities. Mikael (seems
to) complain that developers have to put latency inducing applications into
the development environment. I'd say that those developers are some of the
few who actually have a clue, and are doing the right thing.

> > Also, protocols such as SMB and NFS that use message blocks over TCP have
> > to be abandoned and replaced with real streaming protocols and large
> > window sizes. Xmodem wasn't a good idea back then, it's not a good idea
> > now (even though the blocks now are larger than the 128 bytes of 20-30
> > years ago).
>
> i think xmodem and kermit moved enough total data volume (expressed as a
> factor of transmission speed) back in their day to deserve an honourable
> retirement. but i'd agree, if an application is moved to a new environment
> where everything (DRAM timing, CPU clock, I/O bandwidth, network bandwidth,
> etc) is 10X faster, but the application only runs 2X faster, then it's time
> to rethink more. but the culprit will usually not be new network latency.
> --
> Paul Vixie

It may be difficult to switch to a streaming protocol if the underlying data
sets are block-oriented.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697


swmike at swm

Mar 30, 2008, 9:29 AM

Post #9 of 16
RE: latency (was: RE: cooling door) [In reply to]

On Sun, 30 Mar 2008, Fred Reimer wrote:

> application to take advantage of the networks' capabilities. Mikael (seems
> to) complain that developers have to put latency inducing applications into
> the development environment. I'd say that those developers are some of the
> few who actually have a clue, and are doing the right thing.

I was definitely not complaining; I brought it up as an example where
developers have clue and where they're doing the right thing.

I've too often been involved in customer complaints which ended up being
the fault of Microsoft SMB, with the customers firmly convinced that it
must be a network problem since MS is a world standard and that can't be
changed. Even proposing to change TCP window settings to get FTP transfers
to go faster is met with the same scepticism.

Even after I explain the propagation delay of light in fiber and the
physical limitations, they're still very suspicious about it all.

--
Mikael Abrahamsson email: swmike[at]swm.pp.se


smb at cs

Mar 30, 2008, 10:08 AM

Post #10 of 16
Re: latency (was: RE: cooling door) [In reply to]

On Sun, 30 Mar 2008 13:03:18 +0800
Adrian Chadd <adrian[at]creative.net.au> wrote:

> Oh, and kernel hz tickers can have similar effects on network
> traffic, if the application does dumb stuff. If you're (un)lucky then
> you may see 1 or 2ms of delay between packet input and scheduling
> processing. This doesn't matter so much over 250ms + latent links but
> matters on 0.1ms - 1ms latent links.
>
> (Can someone please apply some science to this and publish best
> practices please?)
>
There's been a lot of work done on TCP throughput. Roughly speaking,
and holding everything else constant, throughput is inversely proportional
to the round trip time. That is, if you double the RTT -- even from .1 ms to .2 ms
-- you halve the throughput on (large) file transfers. See
http://www.slac.stanford.edu/comp/net/wan-mon/thru-vs-loss.html for one
summary; feed "tcp throughput equation" into your favorite search
engine for a lot more references. Another good reference is RFC 3448,
which relates throughput to packet size (also a linear factor, but if
serialization delay is significant then increasing the packet size will
increase the RTT), packet loss rate, the TCP retransmission timeout
(which can be approximated as 4x the RTT), and the number of packets
acknowledged by a single TCP acknowledgement.
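The RFC 3448 throughput equation (section 3.1) can be transcribed directly. In this sketch `s` is the packet size in bytes, `R` the round-trip time in seconds, `p` the loss event rate, `b` the packets acknowledged per ACK (RFC 3448 uses 1), and `t_rto` defaults to the 4R approximation. The 1460-byte packets and 1% loss are illustrative assumptions. Because both terms of the denominator then scale with R, doubling the RTT halves the computed rate. Note that the equation models a loss-limited rate and is not capped at link speed.

```python
import math
from typing import Optional


def tfrc_rate_Bps(s: float, R: float, p: float, b: float = 1.0,
                  t_rto: Optional[float] = None) -> float:
    """TCP throughput equation as given in RFC 3448, section 3.1.

    s: packet size in bytes; R: round-trip time in seconds; p: loss event rate;
    b: packets acknowledged per ACK; t_rto: retransmission timeout in seconds,
    defaulting to the 4 * R simplification. Returns bytes per second.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if t_rto is None:
        t_rto = 4 * R
    denom = (R * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8) * p * (1 + 32 * p ** 2)))
    return s / denom


if __name__ == "__main__":
    for rtt_ms in (0.1, 0.2, 1.0, 2.0):
        rate = tfrc_rate_Bps(1460, rtt_ms / 1e3, p=0.01)  # assumed 1460-byte packets, 1% loss
        print(f"RTT {rtt_ms:3.1f} ms: {rate * 8 / 1e6:9.1f} Mbit/s")
```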

On top of that, there are lots of application issues, as a number of
people have pointed out.

--Steve Bellovin, http://www.cs.columbia.edu/~smb


freimer at ctiusa

Mar 30, 2008, 10:18 AM

Post #11 of 16
RE: latency (was: RE: cooling door) [In reply to]

Thanks for the clarification; that's why I put the "seems to" in the reply.

Fred Reimer, CISSP, CCNP, CQS-VPN, CQS-ISS
Senior Network Engineer
Coleman Technologies, Inc.
954-298-1697


gtb at slac

Mar 30, 2008, 12:15 PM

Post #12 of 16
RE: latency (was: RE: cooling door) [In reply to]

> ... feed "tcp throughput equation" into your favorite search
> engine for a lot more references.

There has been a lot of work in some OS stacks
(Vista and recent linux kernels) to enable TCP
auto-tuning (of one form or another), which is
attempting to hide some of the worst of the TCP
uglinesses from the application/end-users. I
am not convinced this is always a good thing,
since having the cruft exposed to the developers
(in particular) means one needs to plan for
errors and less than ideal cases.

Gary


vixie at isc

Mar 30, 2008, 2:00 PM

Post #13 of 16
Re: latency (was: RE: cooling door) [In reply to]

gtb[at]slac.stanford.edu ("Buhrmaster, Gary") writes:

> > ... feed "tcp throughput equation" into your favorite search
> > engine for a lot more references.
>
> There has been a lot of work in some OS stacks
> (Vista and recent linux kernels) to enable TCP
> auto-tuning (of one form or another), ...

on <http://www.onlamp.com/pub/a/bsd/2008/02/26/whats-new-in-freebsd-70.html>
i'd read that freebsd 7 also has some tcp auto tuning logic.
--
Paul Vixie


smb at cs

Mar 30, 2008, 2:25 PM

Post #14 of 16
Re: latency (was: RE: cooling door) [In reply to]

On 30 Mar 2008 21:00:25 +0000
Paul Vixie <vixie[at]isc.org> wrote:

>
> gtb[at]slac.stanford.edu ("Buhrmaster, Gary") writes:
>
> > > ... feed "tcp throughput equation" into your favorite search
> > > engine for a lot more references.
> >
> > There has been a lot of work in some OS stacks
> > (Vista and recent linux kernels) to enable TCP
> > auto-tuning (of one form or another), ...
>
> on
> <http://www.onlamp.com/pub/a/bsd/2008/02/26/whats-new-in-freebsd-70.html>
> i'd read that freebsd 7 also has some tcp auto tuning logic.

There are certain things that the stack can do, like auto-adjusting the
window size, tuning retransmission intervals, etc. But other problems
are at the application layer, as you noted a few posts ago.
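Window auto-tuning is largely about sizing the TCP window to the bandwidth-delay product: to keep a path full, roughly bandwidth times RTT bytes must be in flight. This sketch computes that for a few LAN and metro cases (the link speeds and RTTs are illustrative assumptions) and compares it with 65,535 bytes, the largest window TCP can advertise without the window scale option from RFC 1323. `bdp_bytes` is a hypothetical helper name.

```python
def bdp_bytes(link_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes that must be in flight to keep the path busy."""
    return link_bps * rtt_s / 8


UNSCALED_MAX_WINDOW = 65_535  # largest value of TCP's 16-bit window field

if __name__ == "__main__":
    for bps, rtt in [(1e9, 0.1e-3), (1e9, 1e-3), (10e9, 0.5e-3), (10e9, 2e-3)]:
        need = bdp_bytes(bps, rtt)
        verdict = "fits" if need <= UNSCALED_MAX_WINDOW else "needs window scaling"
        print(f"{bps / 1e9:>4.0f} Gbit/s, RTT {rtt * 1e3:.1f} ms: {need / 1024:8.1f} KiB ({verdict})")
```

Even a 1 ms metro RTT at 1 Gbit/s needs about 122 KiB in flight, which is why a stack that cannot grow its window leaves most of the path idle.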

--Steve Bellovin, http://www.cs.columbia.edu/~smb


frank at dticonsulting

Mar 30, 2008, 2:47 PM

Post #15 of 16
Re: latency (was: RE: cooling door) [In reply to]

Mikael, I see your points more clearly now in respect to the number of turns
affecting latency. In analyzing this further, however, it becomes apparent that
the collapsed backbone regimen may, in many scenarios, offer far fewer
opportunities for turns, and in others, more.

The former benefits a class of winning applications, because it eliminates local
access/distribution/aggregation switches and then an entire lineage of
hierarchical in-building routing elements.

As for the latter class of losing applications: no doubt, if a collapsed backbone
design were to be drop-shipped in place on a Friday evening, as is, there
would surely be some losers that would require re-designing, or maybe simply some
re-tuning, or they may need to be treated as one-offs entirely.

BTW, in case there is any confusion concerning my earlier allusion to "SMB", it
had nothing to do with the size of message blocks, protocols, or anything else
affecting a transaction profile's latency numbers. Instead, I was referring to
the "_s_mall-to-_m_edium-sized _b_usiness" class of customers that the cable
operator Bright House Networks was targeting with its passive optical network
business-grade offering, fwiw.
--

Mikael, All, I truly appreciate the comments and criticisms you've offered on
this subject up until now in connection with the upstream hypothesis that began
with a post by Michael Dillon. However, I shall not impose this topic on the
larger audience any further. I would, however, welcome a continuation _offlist_
with anyone so inclined. If anything worthwhile results I'd be pleased to post it
here at a later date. TIA.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile

On Sun Mar 30 3:17 , Mikael Abrahamsson sent:

>On Sat, 29 Mar 2008, Frank Coluccio wrote:
>
>> Understandably, some applications fall into a class that requires very-short
>> distances for the reasons you cite, although I'm still not comfortable with the
>> setup you've outlined. Why, for example, are you showing two Ethernet switches
>> for the fiber option (which would naturally double the switch-induced latency),
>> but only a single switch for the UTP option?
>
>Yes, I am showing a case where you have switches in each rack so each rack
>is uplinked with a fiber to a central aggregation switch, as opposed to
>having a lot of UTP from the rack directly into the aggregation switch.
>
>> Now, I'm comfortable in ceding this point. I should have made allowances for this
>> type of exception in my introductory post, but didn't, as I also omitted mention
>> of other considerations for the sake of brevity. For what it's worth, propagation
>> over copper is faster than propagation over fiber, as copper has a higher nominal
>> velocity of propagation (NVP) rating than does fiber, but not significantly
>> enough to cause the difference you've cited.
>
>The 2/3 speed of light in fiber as opposed to propagation speed in copper
>was not in my mind.
>
>> As an aside, the manner in which o-e-o and e-o-e conversions take place when
>> transitioning from electronic to optical states, and back, affects latency
>> differently across differing link assembly approaches used. In cases where 10Gbps
>
>My opinion is that the major factors adding end-to-end latency in my
>example are that the packet has to be serialised three times as opposed to
>once, and that there are three lookups instead of one. Lookups take time;
>putting the packet on the wire takes time.
>
>Back in the 10 megabit/s days, there were switches that did cut-through,
>i.e. if the output port was not being used the instant the packet came in,
>it could start to send out the packet on the outgoing port before it was
>completely taken in on the incoming port (when the header was received,
>the forwarding decision was taken and the equipment would start to send
>the packet out before it was completely received from the input port).
>
>> By chance, is the "deserialization" you cited earlier perhaps related to this
>> inverse muxing process? If so, then that would explain the disconnect, and if it
>> is so, then one shouldn't despair, because there is a direct path to avoiding
>> this.
>
>No, it's the store-and-forward architecture used in all modern equipment
>(that I know of). A packet has to be completely taken in over the wire
>into a buffer, a lookup has to be done as to where this packet should be
>put out, it needs to be sent over a bus or fabric, and then it has to be
>clocked out on the outgoing port from another buffer. This adds latency in
>each switch hop on the way.
>
>As Adrian Chadd mentioned in the email sent after yours, this can of
>course be handled by modifying or creating new protocols that handle this
>fact. It's just that with what is available today, this is a problem. Each
>directory listing or file access takes a bit longer over NFS with added
>latency, and this reduces performance in current protocols.
>
>Programmers who do client/server applications are starting to notice this
>and I know of companies that put latency-inducing applications in the
>development servers so that the programmer is exposed to the same
>conditions in the development environment as in the real world. This means
>for some that they have to write more advanced SQL queries to get
>everything done in a single query instead of issuing several and changing
>later queries depending on what the first query returned.
>
>Also, protocols such as SMB and NFS that use message blocks over TCP have
>to be abandoned and replaced with real streaming protocols and large
>window sizes. Xmodem wasn't a good idea back then, it's not a good idea
>now (even though the blocks now are larger than the 128 bytes of 20-30
>years ago).
>
>--
>Mikael Abrahamsson email: swmike[at]swm.pp.se
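Mikael's points about NFS directory listings, chains of SQL queries, and Xmodem-style block protocols share one mechanism. Each step waits a full round trip before the next one can start, so added latency multiplies instead of being paid once. The Python sketch below models two such cases: strictly sequential request/response exchanges, and a stop-and-wait block transfer. The request count, block size, link rate, and RTT values are hypothetical, and the stop-and-wait model is a deliberate simplification that ignores processing time and protocol overhead.

```python
# Why added latency hurts request/response ("chatty") protocols.
# All figures in the example are hypothetical, chosen only for illustration.


def dependent_round_trips_s(n_requests, rtt_s, service_s=0.0):
    """Wall-clock time for n requests issued strictly one after another,
    each waiting for the previous reply (one RPC per file, one SQL query
    per result, ...). Every request pays the full round trip."""
    return n_requests * (rtt_s + service_s)


def stop_and_wait_bps(block_bytes, rtt_s, link_bps):
    """Throughput of an Xmodem-style protocol: send one block, then wait
    a full round trip for the acknowledgement before sending the next."""
    per_block_s = block_bytes * 8 / link_bps + rtt_s
    return block_bytes * 8 / per_block_s


if __name__ == "__main__":
    GE = 1e9
    for rtt in (0.0001, 0.0005):  # 0.1 ms vs 0.5 ms round trip
        t = dependent_round_trips_s(1000, rtt)
        saw = stop_and_wait_bps(64 * 1024, rtt, GE)
        print(f"RTT {rtt * 1e3:.1f} ms: 1000 serial requests take {t * 1e3:.0f} ms, "
              f"64 KiB stop-and-wait reaches {saw / 1e6:.0f} Mbit/s")
```

With these assumed numbers, raising the RTT from 0.1 ms to 0.5 ms stretches 1000 back-to-back requests from about 100 ms to 500 ms. Over the same change, 64 KiB stop-and-wait throughput on a 1 Gbit/s link drops from roughly 840 to about 512 Mbit/s. A streaming protocol with a window larger than the bandwidth-delay product would stay close to link rate in both cases.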


frank at dticonsulting

Mar 30, 2008, 2:51 PM

Post #16 of 16 (2140 views)
Re: latency (was: RE: cooling door) [In reply to]

Silly me. I didn't mean "turns" alone, but also intended to include the number of
state "transitions" (e-o, o-e, e-e, etc.) in my preceding reply, as well.

Frank A. Coluccio
DTI Consulting Inc.
212-587-8150 Office
347-526-6788 Mobile



