TCP/IP Performance Factors

Author
Peter Welcher
Architect, Operations Technical Advisor

(Updated 9/30/09)
For a while I’ve been loosely tracking TCP/IP performance information, and I’d like to share some of what I’ve learned here. I claim some insight but freely admit I’m still learning.

Motivation: It seems like over the last two years, a number of application WAN (MPLS VPN, etc.) performance issues have been coming our (my) way. If you’re having problems, this blog will provide some pointers that I think will help.

In particular, note that adding bandwidth to a WAN link may NOT make backup or file transfer go faster!

There are three things I’d like to talk about here:

  • TCP/IP enhancements
  • Round trip time and ping-pong behavior
  • TCP/IP with packet loss

Brief historical perspective

If you Google, you’ll find a number of links on improving file transfer or TCP performance. Apply a large grain of salt: as with any web info, some authors don’t have a clue and are putting out incorrect information.

Having said that, over the years the various supercomputer and super-collider sites have been great sources of advice on topics like tuning OS settings, especially Windows registry settings, to get better performance. Historically, I believe Windows carried along some behavior settings that were probably better suited to dial-up than to a high-speed LAN.

The reason they did this: if you’re a Pittsburgh Supercomputing Center, CERN, or SLAC researcher trying to transfer a 10 GB file quickly, some tuning used to really help!

http://www.psc.edu/networking/projects/tcptune/

http://en.wikipedia.org/wiki/TCP_tuning

http://icfa-scic.web.cern.ch/ICFA-SCIC/docs/20020712-TcpBW-SR.ppt

TCP/IP enhancements

There are a number of enhancements for TCP/IP: selective ACK (SACK), revised congestion response for situations like high-speed lossy links (e.g. some satellite or wireless links), and the like. For example, if you’re trying to blast along a fast but lossy link, you don’t want a delayed ACK or a single dropped packet to slow you down per standard TCP behavior, because that costs more delay and throughput than SACK and “selective makeup” of the missing data. In other words, you make TCP a little less sensitive to loss hints, so it takes a couple of lost packets before it really throttles back the way it does on a LAN.

This sort of thing may really help if your computer’s TCP stack supports it. Microsoft Vista added a number of clever TCP/IP functions, and from what I’ve read, Microsoft is putting them into upcoming server code as well.

If your OS stack doesn’t support them, well, then the technical complexity goes up. Up for changing out your TCP stack? I didn’t think so…

Round trip time and ping-pong behavior

This was (and sometimes still is) one of our hiring interview questions. I’ve written about it in my articles.

TCP (hence FTP, SFTP, etc.) is geared towards streaming data. However, database operations running over a TCP connection, Microsoft file shares, Sun NFS, and other applications may operate in ping-pong fashion, where they send something and wait for a reply, over and over. Databases that fetch one row at a time (rather than spewing out a batch of matches) do this.

Such ping-pong behavior fully or partially defeats the send-window mechanism that allows TCP to “fill the bandwidth pipe”. Streaming requires having buffering to handle the bandwidth-delay product (bits per second times round-trip-time in seconds). That product is how many bits it takes to “fill the pipe”. So for example, older Windows file system operations over TCP sent 32KB at a time, then waited. Newer versions apparently leverage the ability NetBEUI has had to request the next 32 KB chunk before the current one is fully received.
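
To make the bandwidth-delay product concrete, here is a tiny Python sketch. The 45 Mbps DS-3 and 225 ms RTT are just illustrative numbers (they match the Singapore example below), not anything specific about your network:

    # Bandwidth-delay product: the bits that must be "in flight" to fill the pipe.
    def bdp_bits(link_bps, rtt_seconds):
        return link_bps * rtt_seconds

    # Illustrative numbers: a 45 Mbps DS-3 and the 225 ms Singapore RTT used below.
    bits = bdp_bits(45_000_000, 0.225)
    print(f"BDP = {bits:,.0f} bits = {bits / 8 / 1024:,.0f} KB")
    # About 10 Mbits, i.e. roughly 1.2 MB -- far more than a 32 KB window covers.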

When you ping-pong, the computer sends some amount of data, say 32,000 bytes (the default TCP window size). The stack then waits roughly the round trip time (RTT) for an acknowledgement. So your throughput is at best 32 KB * 8 bits/byte divided by the RTT.

max bps throughput = Bytes per ping-pong * 8 / RTT

Here’s a crude calculation I recently did for a consulting customer. Say you’re sending data to Singapore, with RTT = 225 milliseconds = 0.225 seconds.

32,000 * 8 / 0.225 = 1,137,778 bps

So the best you’re going to do is a bit over 1 Mbps throughput. Realistically, standard TCP congestion avoidance gives you about 70-75% of that (cycling down to 1/2 speed, then back up to full). Older stacks ran at about 50% of the theoretical max. That’s for longer transfers; for small files, the TCP connection overhead and slow start lower the throughput somewhat.
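
Here’s that back-of-the-envelope arithmetic as a small Python sketch; the 70% factor is only the rough congestion-avoidance rule of thumb mentioned above, not a measured value:

    def pingpong_max_bps(bytes_per_exchange, rtt_seconds):
        # One window's worth of data per round trip, nothing in between.
        return bytes_per_exchange * 8 / rtt_seconds

    theoretical = pingpong_max_bps(32_000, 0.225)   # the Singapore example: 32 KB per RTT
    realistic = theoretical * 0.70                  # rough congestion-avoidance discount

    print(f"Theoretical max: {theoretical / 1e6:.2f} Mbps")   # ~1.14 Mbps
    print(f"Realistic:       {realistic / 1e6:.2f} Mbps")     # ~0.80 Mbps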

Interestingly enough, the predicted performance was about what we saw. (Instant credibility!) We did see some much faster numbers, but it turned out the Linux file transfer app was compressing, so with very compressible small files you saw much faster apparent transfer speeds. The actual wire speed was the same; the compression just meant far fewer bytes had to be sent.

Be aware, I did say “maximum throughput”. Other factors, like server turn-around time, or client “data digestion time”, might add to the network RTT (delay) and lower throughput. They can be cranked into the above formula if you want.

By the way, this is very practical. Many sites bump up their bandwidth to get better throughput. This site had a DS-3 as I recall. Note the formula doesn’t have link bandwidth anywhere in it. You could throw all the bandwidth you want at it, but a single file transfer using TCP isn’t going to beat the number above!

Some tricks to go faster:

Transfer bigger chunks per round-trip: If you can persuade TCP to use bigger send and receive windows, then you can transfer more per RTT; the sketch after these tricks shows the effect. (This is the usual sort of tuning the super-computing centers do.)

Parallelize: some peer-to-peer apps and browsers open many TCP connections. Each one is subject to the same max, but since you’re doing them all at once, the aggregate transfers more. Note that this is selfish, since it defeats the TCP congestion avoidance (aka “politeness”) and tries to grab all the bandwidth for the file transfer.

Reduce the RTT: Whip those electrons or photons?
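
Here’s a rough Python sketch of how the first two tricks move the ceiling, reusing the same window-per-RTT arithmetic as above. The 256 KB window and the 8 parallel connections are just illustrative choices:

    def window_limited_bps(window_bytes, rtt_seconds, connections=1):
        # Ceiling for window-limited transfers: one window per RTT, per connection.
        return connections * window_bytes * 8 / rtt_seconds

    rtt = 0.225  # the Singapore example again

    print(f"32 KB window:             {window_limited_bps(32_000, rtt) / 1e6:.2f} Mbps")    # ~1.1
    print(f"256 KB window (scaling):  {window_limited_bps(256_000, rtt) / 1e6:.2f} Mbps")   # ~9.1
    print(f"32 KB window, 8 streams:  {window_limited_bps(32_000, rtt, 8) / 1e6:.2f} Mbps") # ~9.1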

TCP/IP with packet loss

(Thanks to James Ventre for passing some info along that got me started here. James has been doing throughput research in a data center / database context. That includes some neat testing of latency of Cisco switches, and studying the impact of new very fast optical technologies with slightly higher error rates than we’re perhaps used to.)

The key word is “Mathis formula”. This is a formula that has been experimentally verified to estimate the maximum throughput of TCP in the presence of packet loss.

See:

http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

Summary:

Max DATA throughput rate < (MSS/RTT)*(1 / sqrt(p))

where:

  • Max Rate is the TCP transfer rate, in bps
  • MSS is the maximum segment size, in bytes (fixed for each Internet path, typically 1460)
  • RTT is the round trip time as measured by TCP, in seconds (ping isn’t quite as good an RTT value)
  • p is the packet loss rate, as a fraction

From skimming the discussion, there is a scaling constant that is approximately 1, which explains why the units (bytes per second times a dimensionless number on one side, bps on the other) don’t seem to quite add up.

Also note the formula isn’t valid with p = 0. That’s OK; the real world always has some packet loss in it.

Taking our Singapore link above, let’s say it loses 0.1% = 0.001 of all packets. We’ll use an MSS of 1460 bytes, the data that fits into one TCP packet, since we can’t raise the MTU across the Internet: most routers run an MTU of 1500, and subtracting the 20-byte IP and 20-byte TCP headers leaves 1460 bytes of data per packet.

(1460 / 0.225) *  (1 / sqrt (0.001)) = 6489 * 31.6 = 205,052 bps

Note how much that 1 in 1000 packet loss rate lowered that throughput, due to TCP slowing down and also retransmitting.
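
For reference, here is the Mathis estimate as a few lines of Python, treating the scaling constant as 1 per the units note above; it reproduces the Singapore figure:

    from math import sqrt

    def mathis_max_bps(mss_bytes, rtt_seconds, loss_rate):
        # Mathis et al. estimate of maximum TCP throughput under packet loss.
        # Treats the scaling constant as ~1, so the result is read directly as bps.
        return (mss_bytes / rtt_seconds) * (1 / sqrt(loss_rate))

    # Singapore example: MSS 1460 bytes, RTT 225 ms, 0.1% packet loss.
    print(f"{mathis_max_bps(1460, 0.225, 0.001):,.0f} bps")   # roughly 205,000 bps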

Sample results

Here is a table showing the results for MSS = 1460 bytes and different values of RTT and p. I used a data table formula in MS Excel to generate this.

Max Theoretical TCP Throughput in Mbps, per the Mathis Formula
Calculations assume MSS = 1460 bytes. Of course, the throughput will never exceed the actual link bandwidth.

Round Trip          Probability of dropped packet (p)
Time (ms)     0.00001     0.0001      0.001       0.01        0.1
     5        92.33851    29.2        9.233851    2.92        0.923385
    10        46.16925    14.6        4.616925    1.46        0.461693
    20        23.08463     7.3        2.308463    0.73        0.230846
    40        11.54231     3.65       1.154231    0.365       0.115423
    60         7.694876    2.433333   0.769488    0.243333    0.076949
    80         5.771157    1.825      0.577116    0.1825      0.057712
   100         4.616925    1.46       0.461693    0.146       0.046169
   120         3.847438    1.216667   0.384744    0.121667    0.038474
   140         3.297804    1.042857   0.32978     0.104286    0.032978
   160         2.885578    0.9125     0.288558    0.09125     0.028856
   180         2.564959    0.811111   0.256496    0.081111    0.02565
   200         2.308463    0.73       0.230846    0.073       0.023085
   220         2.098602    0.663636   0.20986     0.066364    0.020986
   240         1.923719    0.608333   0.192372    0.060833    0.019237
   260         1.775741    0.561538   0.177574    0.056154    0.017757
   280         1.648902    0.521429   0.16489     0.052143    0.016489
   300         1.538975    0.486667   0.153898    0.048667    0.01539
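
If you’d rather not fire up Excel, a few lines of Python regenerate the same table, reusing the mathis_max_bps sketch from the previous section:

    losses = (0.00001, 0.0001, 0.001, 0.01, 0.1)
    rtts_ms = (5, 10, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 220, 240, 260, 280, 300)

    print("RTT (ms)" + "".join(f"{p:>12g}" for p in losses))
    for rtt_ms in rtts_ms:
        cells = (mathis_max_bps(1460, rtt_ms / 1000, p) / 1e6 for p in losses)   # Mbps
        print(f"{rtt_ms:>8}" + "".join(f"{mbps:>12.3f}" for mbps in cells))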

Some other good links

The SLAC link above has several other links in it, to more detailed / accurate formulas. Caution: technical complexity ensues!

http://sd.wareonearth.com/~phil/

James Ventre pointed me at the following info about some well known/accepted BERs, from http://www.wcisd.hpc.mil/~phil/sc2004/SC2004M6_files/frame.htm, page 155:

Medium             Accepted BER
Hard Drives        1 error in 10^14 bits
Fibre Channel      1 error in 10^12 bits
SONET Equipment    1 error in 10^12 bits
SONET Circuits     1 error in 10^10 bits
1000BaseT          1 error in 10^10 bits
Copper T1          1 error in 10^6 bits
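
To tie these back to the Mathis formula’s p: if you assume bit errors are independent and any errored packet is effectively lost, then a BER of b on a 1500-byte (12,000-bit) packet gives p ≈ 1 - (1 - b)^12000, roughly 12000 * b for small b. That’s my own back-of-the-envelope extrapolation, not something from the slides:

    def ber_to_packet_loss(ber, packet_bits=12_000):
        # Probability at least one bit in the packet is errored (independent errors assumed).
        return 1 - (1 - ber) ** packet_bits

    # Copper T1 at BER 1e-6: roughly 1.2% of 1500-byte packets corrupted/lost.
    print(f"{ber_to_packet_loss(1e-6):.4f}")   # ~0.0119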

CNC’s  Terry Slattery recommends:

http://www.linuxsa.org.au/meetings/2003-09/tcpperformance.print.pdf

2 responses to “TCP/IP Performance Factors”

  1. I believe there’s a mistake in this calculation: (1460 / 0.225) * (1 / sqrt (0.001)) = 6489 * 31.6 = 205,052 bps, and also in the sample calculations. It should be 1460 * 8, since the result is written in bits.

  2. Normally you’d be right; I had the same thought when writing the article (“units analysis” is the physics term for it). But the formula is empirical, so the units get taken care of by the constant in there, in effect.

    I am pretty sure I checked two sources which both said MSS was in bytes not bits.
