
(Updated 9/30/09)
For a while I’ve been loosely tracking TCP/IP performance topics, and I’d like to share some of what I’ve found. I claim some insight but freely admit I’m still learning.

Motivation: It seems like over the last two years, a number of application-over-WAN (MPLS VPN, etc.) performance issues have been coming our (my) way. If you’re having such problems, this blog will provide some pointers that I think will help.

In particular, note that adding bandwidth to a WAN link may NOT make backup or file transfer go faster!

There are three things I’d like to talk about here:

  • TCP/IP enhancements
  • Round trip time and ping-pong behavior
  • TCP/IP with packet loss

Brief historical perspective

If you Google, you’ll find a number of links on improving file transfer or TCP performance. Apply a large grain of salt: as with any web information, some authors don’t have a clue and are putting out incorrect information.

Having said that, over the years the various supercomputer and super-collider sites have been great sources of advice on topics like tuning OS settings, especially Windows registry settings, to get better performance. Historically, I believe Windows carried along some behavior settings that were probably better suited to dial-up than to high-speed LANs.

The reason for all this tuning advice: if you’re a Pittsburgh Supercomputing Center, CERN, or SLAC researcher trying to transfer a 10 GB file quickly, some tuning used to really help!

http://www.psc.edu/networking/projects/tcptune/

http://en.wikipedia.org/wiki/TCP_tuning

http://icfa-scic.web.cern.ch/ICFA-SCIC/docs/20020712-TcpBW-SR.ppt

TCP/IP enhancements

There are a number of enhancements to TCP/IP: selective ACK (SACK), revised congestion responses for situations like high-speed lossy links (e.g. some satellite or wireless links), and the like. For example, if you’re trying to blast data along a fast but lossy link, you don’t want a delayed response or a dropped packet to slow you down per standard TCP behavior, because that costs more delay and throughput than SACK and “selective makeup” retransmission do. In other words, you make TCP a little less sensitive to loss hints, so it takes a couple of lost packets before it really throttles back the way it does on a LAN.

These enhancements may really help, if your computer’s TCP stack supports them. Microsoft Vista added a number of clever TCP/IP functions, and from what I’ve read, Microsoft is putting them into upcoming server code as well.

If your OS stack doesn’t support them, well, then the technical complexity goes up. Up for changing out your TCP stack? I didn’t think so…
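
If you’re curious whether a given Linux host has some of the relevant options turned on, here’s a minimal Python sketch that reads the standard Linux sysctl values (Windows exposes its equivalents through the registry and netsh instead):

# Quick check of a few relevant TCP options on a Linux host.
# These are the standard Linux sysctl paths; other OSes expose this differently.

from pathlib import Path

for name in ("tcp_sack", "tcp_window_scaling", "tcp_timestamps"):
    path = Path("/proc/sys/net/ipv4") / name
    value = path.read_text().strip() if path.exists() else "n/a"
    print(f"{name}: {'enabled' if value == '1' else value}")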

Round trip time and ping-pong behavior

This was (and sometimes still is) one of our hiring interview questions. I’ve written about it in my articles.

TCP (hence FTP, SFTP, etc.) is geared towards streaming data. However, database operations running over a TCP connection, Microsoft file shares, Sun NFS, and other applications may operate in ping-pong fashion, where they send something and wait for a reply, over and over. Databases that fetch one row at a time (instead of streaming back a batch of matches) do this.

Such ping-pong behavior fully or partially defeats the send-window mechanism that allows TCP to “fill the bandwidth pipe”. Streaming requires enough buffering to cover the bandwidth-delay product (bits per second times round-trip time in seconds); that product is how many bits it takes to “fill the pipe”. For example, older Windows file-sharing operations over TCP sent 32 KB at a time, then waited. Newer versions apparently leverage the ability NetBEUI has had to request the next 32 KB chunk before the current one is fully received.
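
To make the bandwidth-delay product concrete, here’s a minimal Python sketch. The 45 Mbps DS-3 speed and the 225 ms Singapore RTT used below are just illustrative assumptions:

# Rough sketch: bandwidth-delay product, and the TCP window needed to "fill the pipe".
# The link speed and RTT below are illustrative assumptions, not from a real circuit.

def bdp_bytes(link_bps: float, rtt_seconds: float) -> float:
    """Bits in flight needed to keep the link full, expressed in bytes."""
    return link_bps * rtt_seconds / 8

link_bps = 45_000_000   # a DS-3, roughly
rtt = 0.225             # 225 ms round trip

needed_window = bdp_bytes(link_bps, rtt)
print(f"Window needed to fill the pipe: {needed_window:,.0f} bytes")     # ~1.27 MB
print(f"A 32 KB window covers only {32_000 / needed_window:.1%} of that")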

When you ping-pong, the computer sends some amount of data, say 32,000 bytes (a common default TCP window size). The stack then waits roughly one round trip time (RTT) for the acknowledgement. So your throughput is at best 32 KB * 8 bits/byte divided by the RTT.

max bps throughput = Bytes per ping-pong * 8 / RTT

Here’s a crude calculation I recently did for a consulting customer. Say you’re sending data to Singapore, with RTT = 225 milliseconds = 0.225 seconds.

32,000 * 8 / 0.225 = 1,137,778 bps

So the best you’re going to do is a little over 1 Mbps throughput. Realistically, standard TCP congestion avoidance gives you about 70-75% of that (cycling down to 1/2 speed, then back up to full). Older stacks ran at about 50% of the theoretical max. That’s for longer transfers; for small files, the TCP connection overhead and slow start may lower the throughput somewhat.

Interestingly enough, the predicted performance was about what we saw. (Instant credibility!) We did see some much faster numbers, but it turned out the Linux file transfer app was compressing, so with very compressible small files you saw much faster apparent transfer speeds. The actual wire speed was the same; the compression just meant far fewer bytes had to be sent.

Be aware, I did say “maximum throughput”. Other factors, like server turn-around time, or client “data digestion time”, might add to the network RTT (delay) and lower throughput. They can be cranked into the above formula if you want.
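
Here’s a minimal Python sketch of the formula, with an optional turn-around term added; the 50 ms server turn-around figure is purely a made-up illustration:

# Ping-pong throughput ceiling: chunk size and RTT cap a single TCP connection.

def pingpong_max_bps(bytes_per_turn: float, rtt_s: float, turnaround_s: float = 0.0) -> float:
    """Best-case bps when the app sends a chunk, then waits a full round trip for the reply."""
    return bytes_per_turn * 8 / (rtt_s + turnaround_s)

# The Singapore example: 32,000-byte chunks, 225 ms RTT
print(f"{pingpong_max_bps(32_000, 0.225):,.0f} bps")          # ~1.14 Mbps ceiling
# Add a hypothetical 50 ms of server turn-around time per request
print(f"{pingpong_max_bps(32_000, 0.225, 0.050):,.0f} bps")   # drops to ~0.93 Mbps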

By the way, this is very practical. Many sites bump up their bandwidth to get better throughput. This site had a DS-3 as I recall. Note the formula doesn’t have link bandwidth anywhere in it. You could throw all the bandwidth you want at it, but a single file transfer using TCP isn’t going to beat the number above!

Some tricks to go faster:

Transfer bigger chunks per round trip: If you can persuade TCP to use bigger send and receive windows, then you can transfer more per RTT. (This is the usual sort of tuning the supercomputing centers do; see the socket-buffer sketch after this list.)

Parallelize: Some peer-to-peer apps and browsers open many TCP connections. Each one is subject to the same max, but since you’re doing them all at once, the aggregate transfers more. Note that this is selfish, since it defeats TCP congestion avoidance (aka “politeness”) and tries to grab all the bandwidth for the file transfer.

Reduce the RTT: Whip those electrons or photons?
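
As an illustration of the first trick, here’s a minimal Python sketch that asks the OS for larger socket buffers. The 4 MB figure is an arbitrary example; the OS may silently cap it (e.g. via Linux net.core.rmem_max), and modern stacks with window scaling and autotuning often handle this for you:

# Request larger TCP send/receive buffers on one socket (sizes are illustrative).

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4 * 1024 * 1024)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)

# See what the OS actually granted (Linux typically reports double the requested value)
print("send buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))
print("recv buffer:", s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))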

TCP/IP with packet loss

(Thanks to James Ventre for passing along some info that got me started here. James has been doing throughput research in a data center / database context. That includes some neat latency testing of Cisco switches, and studying the impact of new, very fast optical technologies with slightly higher error rates than we’re perhaps used to.)

The key search phrase is “Mathis formula”. This is a formula that has been experimentally verified to estimate the maximum throughput of TCP in the presence of packet loss.

See:

http://www.slac.stanford.edu/comp/net/wan-mon/tutorial.html

Summary:

Max DATA throughput rate < (MSS/RTT)*(1 / sqrt(p))

where:

  • Max Rate: the maximum TCP transfer rate, in bps
  • MSS: the maximum segment size (fixed for each Internet path, typically 1460 bytes)
  • RTT: the round trip time as measured by TCP, in seconds (ping isn’t quite as good an RTT value)
  • p: the packet loss rate, as a fraction

From skimming the discussion, there is a scaling constant that is approximately 1, which explains why the units (bytes per second on the right versus bps on the left) don’t seem to quite add up.

Also note the formula isn’t valid with p = 0. That’s OK; the real world always has some packet loss in it.

Taking our Singapore link above, let’s say it loses 0.1% = 0.001 of all packets. We’ll use an MSS (the data that fits into one TCP segment) of 1460 bytes, since we can’t alter the MTU on the Internet. Most links run an MTU of 1500 bytes; subtract the IP and TCP header sizes (20 bytes each) and you get 1460 bytes of data per packet.

(1460 / 0.225) * (1 / sqrt(0.001)) = 6489 * 31.6 = 205,052 bps

Note how much that 1 in 1000 packet loss rate lowered that throughput, due to TCP slowing down and also retransmitting.
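
Here’s a minimal Python sketch of that calculation, using the same units convention as the text (the scaling constant of roughly 1 is simply folded in):

# Mathis et al. estimate: rate < (MSS / RTT) * (1 / sqrt(p)), treated here as bps.

from math import sqrt

def mathis_max_bps(mss_bytes: float, rtt_s: float, loss_rate: float) -> float:
    """Rough ceiling on TCP throughput over a lossy path (loss_rate must be > 0)."""
    return (mss_bytes / rtt_s) * (1 / sqrt(loss_rate))

# The Singapore example: MSS 1460 bytes, RTT 225 ms, 0.1% packet loss
print(f"{mathis_max_bps(1460, 0.225, 0.001):,.0f} bps")   # roughly 205,000 bps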

Sample results

Here is a table showing the results for MSS = 1460 bytes and different values of RTT and p. I used a data table formula in MS Excel to generate it. Of course, the throughput will never exceed the actual link bandwidth.

Max theoretical TCP throughput in Mbps, per the Mathis formula (MSS = 1460 bytes); the probability of a dropped packet (p) runs across the columns, the round trip time (ms) down the rows:

RTT (ms)   p = 0.00001   p = 0.0001   p = 0.001   p = 0.01    p = 0.1
  5        92.33851      29.2         9.233851    2.92        0.923385
 10        46.16925      14.6         4.616925    1.46        0.461693
 20        23.08463      7.3          2.308463    0.73        0.230846
 40        11.54231      3.65         1.154231    0.365       0.115423
 60         7.694876     2.433333     0.769488    0.243333    0.076949
 80         5.771157     1.825        0.577116    0.1825      0.057712
100         4.616925     1.46         0.461693    0.146       0.046169
120         3.847438     1.216667     0.384744    0.121667    0.038474
140         3.297804     1.042857     0.32978     0.104286    0.032978
160         2.885578     0.9125       0.288558    0.09125     0.028856
180         2.564959     0.811111     0.256496    0.081111    0.02565
200         2.308463     0.73         0.230846    0.073       0.023085
220         2.098602     0.663636     0.20986     0.066364    0.020986
240         1.923719     0.608333     0.192372    0.060833    0.019237
260         1.775741     0.561538     0.177574    0.056154    0.017757
280         1.648902     0.521429     0.16489     0.052143    0.016489
300         1.538975     0.486667     0.153898    0.048667    0.01539
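
For reference, here’s a short Python sketch that reproduces the table above, in place of the Excel data table:

# Regenerate the Mathis-formula table: max throughput in Mbps for various RTT and p.

from math import sqrt

MSS = 1460  # bytes
loss_rates = [0.00001, 0.0001, 0.001, 0.01, 0.1]
rtts_ms = [5, 10] + list(range(20, 301, 20))

print("RTT (ms)  " + "".join(f"p={p:<12g}" for p in loss_rates))
for rtt_ms in rtts_ms:
    row = [(MSS / (rtt_ms / 1000)) * (1 / sqrt(p)) / 1e6 for p in loss_rates]
    print(f"{rtt_ms:<10}" + "".join(f"{r:<14.4f}" for r in row))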

Some other good links

The SLAC link above has several other links in it, to more detailed / accurate formulas. Caution: technical complexity ensues!

http://sd.wareonearth.com/~phil/

James Ventre pointed me at the following info about some well-known / accepted BERs (roughly one bit error per N bits), from http://www.wcisd.hpc.mil/~phil/sc2004/SC2004M6_files/frame.htm, page 155:

Hard drives        1 in 10^14
Fibre Channel      1 in 10^12
SONET equipment    1 in 10^12
SONET circuits     1 in 10^10
1000BaseT          1 in 10^10
Copper T1          1 in 10^6
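
As a rough back-of-the-envelope connection between those BERs and the Mathis formula (my own sketch, not from the referenced slides): approximate the packet loss probability as bits-per-packet times BER, then plug it in.

# Back-of-envelope: convert a BER into an approximate packet loss rate, then apply Mathis.
# The 1500-byte packet size and the 1-in-10^10 BER are illustrative assumptions.

from math import sqrt

def mathis_max_bps(mss_bytes, rtt_s, loss_rate):
    return (mss_bytes / rtt_s) * (1 / sqrt(loss_rate))

bits_per_packet = 1500 * 8
ber = 1e-10                    # the SONET circuit / 1000BaseT figure above
p = bits_per_packet * ber      # ~1.2e-06 packet loss from bit errors alone

print(f"p = {p:.1e}")
print(f"Mathis ceiling at 225 ms RTT: {mathis_max_bps(1460, 0.225, p):,.0f} bps")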

CNC’s Terry Slattery recommends:

http://www.linuxsa.org.au/meetings/2003-09/tcpperformance.print.pdf

Peter Welcher

Architect, Operations Technical Advisor

A principal consultant with broad knowledge and experience in high-end routing and network design, as well as data centers, Pete has provided design advice and done assessments of a wide variety of networks. CCIE #1773, CCDP, CCSI (#94014)