
Not all network engineers understand the impact of interface errors on TCP performance. Interface errors can cause a BIG impact, although it may not be intuitive at first glance.
We recently pointed out some interfaces with extremely high error rates to a customer. We mentioned that the links with the highest percentage of loss were likely getting very little useful data through them, and that the customer should investigate the cause of these errors. Initially the customer did not appear to be very concerned because the error rate was below 3%. We personally find error rates greater than 0.001% to be a cause for concern.

Based on this experience, I thought I’d write up an article to illustrate the impact of interface errors.

Best TCP/IP Performance Expected
Perhaps the first question to consider is “What is the best TCP/IP performance you can expect on a Gigabit Ethernet link in the campus?”
First let’s look at the buffering required for TCP, which is the bandwidth-delay product (BDP). With a Gigabit Ethernet link, the buffering required in a receiving system for maximum performance is the amount of data that can be sent between ACKs. The bandwidth of a Gigabit link is 1000 Mbps. If the data exchange is inside a campus, say between a data center server and a user, the RTT should be very small, perhaps 2 milliseconds or .002 seconds. So for a Gigabit link, the receiving system needs to be able to buffer bandwidth * delay:

BDP = 1000 Mb/s * .002 seconds
BDP = 1000 Mb/s * (1 byte/8 bits) * .002 seconds
BDP = 125,000,000 bytes/s * .002 seconds
BDP = 250,000 bytes
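
If it helps to see the arithmetic spelled out, here is a minimal Python sketch of the same calculation (the 1000 Mbps rate and 2 ms RTT are the assumptions stated above):

# Bandwidth-delay product for a Gigabit Ethernet campus link
link_rate_bps = 1_000_000_000      # Gigabit Ethernet line rate in bits per second
rtt_seconds = 0.002                # assumed campus round-trip time (2 ms)

bdp_bytes = (link_rate_bps / 8) * rtt_seconds    # bits/s -> bytes/s, then multiply by RTT
print(f"BDP = {bdp_bytes:,.0f} bytes")           # prints: BDP = 250,000 bytes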

When the BDP is less than the TCP window size, the path bandwidth is the limiting factor in throughput. For a Gigabit Ethernet link, the BDP of 250,000 bytes is greater than the default TCP window of 32,000 bytes, so the path bandwidth will not be the limiting factor.
When the TCP window size is less than the buffering required to keep the pipe filled, the mechanics of TCP operation limit the maximum throughput. In this case, the sending system sends a full TCP window worth of data, waits for an acknowledgement from the receiver, then sends again. The application is not using the send-window mechanism that would allow TCP to fill the bandwidth pipe. Only when an ACK is received can more data be sent. Therefore, the maximum throughput that can be achieved between a source and destination is the window size divided by the time it takes to get back an ACK (i.e., the round trip time). In other words, the best throughput you can achieve is the chunk size (the amount of data sent per window) divided by the round trip time:

Max Throughput = chunk size / RTT
Max Throughput in bps = [Bytes * 8 (bits/byte) ] / RTT

Another question to consider is “What is the maximum throughput for a GE link in the data center?”

For this best case calculation, I assume the application sends a chunk of 64,000 Bytes of data across multiple TCP segments and waits for an ACK before sending more data. If the data exchange is inside a campus, say between a data center server and a user, the RTT should be very small, perhaps 2 milliseconds or .002 seconds. So the maximum rate for a single file transfer would be

64,000 * 8 / .002 = 256,000,000 bps, or 256 Mbps

Conclusion: If the RTT is 2ms, a maximum rate of about 256Mbps is possible in the campus across a Gigabit Ethernet link.
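
The same ACK-paced calculation as a short Python sketch, using the assumed 64,000-byte chunk and 2 ms RTT:

# Throughput when the sender ships one chunk per round trip and waits for the ACK
chunk_bytes = 64_000     # data sent per window before waiting for an ACK
rtt_seconds = 0.002      # assumed campus round-trip time (2 ms)

max_throughput_bps = (chunk_bytes * 8) / rtt_seconds
print(f"Max throughput = {max_throughput_bps / 1e6:.0f} Mbps")   # prints: Max throughput = 256 Mbps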

Expected TCP/IP Performance With Errors
A third question to consider is “What is the impact of errors on TCP/IP performance on a Gigabit Ethernet link in the campus?”

Note: There are several potential sources of interface errors, including interface discards when there is insufficient bandwidth to support the traffic volume, misconfigured duplex and speed settings, excessive buffering on interfaces, misconfigured EtherChannels, and faulty cables or hardware.

First we consider what an acceptable error rate is. Based on the IEEE 802.3ab standard, the Bit Error Rate (BER) considered acceptable for 1000BaseT circuits is 1 in 1*10^10 bits.

1 bit lost in 1*10^10 bits = 1 bit lost in 1.25*10^9 bytes

If we assume an average packet is 1,000 bytes long, the 1000BaseT BER corresponds to 1 packet lost in 1.25*10^6 packets. On a percentage basis, 1 packet lost / 1.25*10^6 packets = 8*10^-7 = .00008%.
Therefore we could round this up and expect to see at most .0001% packet loss on a Gigabit Ethernet cable.

Note: This is a very generous packet size; 300 to 450 bytes may be a more common average for enterprises with VoIP traffic. However, the 1,000-byte packet size was chosen for easier math.
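
Here is the same conversion as a small Python sketch; the 1,000-byte average packet size is the simplifying assumption from the note above:

# Translate the 1000BaseT BER target into an expected packet loss percentage
ber = 1e-10                  # acceptable bit error rate per IEEE 802.3ab (1 errored bit in 10^10)
avg_packet_bytes = 1_000     # assumed average packet size (chosen for easy math)

bits_per_packet = avg_packet_bytes * 8
packet_loss_fraction = ber * bits_per_packet     # chance that a given packet contains an errored bit
print(f"Expected packet loss = {packet_loss_fraction * 100:.5f}%")   # prints: Expected packet loss = 0.00008%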

However, the TCP path can experience packet loss due to performance and configuration issues with the servers and network devices. TCP performance degrades as packets are lost and need to be retransmitted. The Mathis equation approximates the impact of loss on the maximum throughput rate:

Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

where

  • MSS = maximum segment size in bytes
  • RTT = round trip time in seconds
  • p = the probability of packet loss

Note that this formula includes a constant with a value of approximately 1, which resolves the bytes to bits. The formula is known as the Mathis equation, from a 1997 paper titled The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm.

Now we can apply the Mathis equation to the example GE link. For the MSS we will use 1460 bytes, since this fits into one TCP packet when the MTU of the network gear is 1500 bytes. We assume that the application sends a chunk of 1460 bytes of data and waits for an ACK before sending more data. Since this data exchange is inside the campus, we again assume that the RTT is .002 seconds. So the maximum rate for a single file transfer at the standard 1000BaseT BER of 0.0001% loss would be:

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))
Max rate in bps < (1460/.002)*(1/ sqrt(.000001))
Max rate in bps < 7.3*10^8 bps
Max rate in bps < 730 Mbps
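
The same calculation as a minimal Python sketch, following the article's convention of MSS in bytes with the result read directly as bits per second:

from math import sqrt

def mathis_max_rate_bps(mss_bytes, rtt_seconds, loss_probability):
    # Approximate upper bound on TCP throughput for a given packet loss probability
    return (mss_bytes / rtt_seconds) * (1 / sqrt(loss_probability))

# 1460-byte MSS, 2 ms RTT, 0.0001% loss (1 packet in 1,000,000)
rate = mathis_max_rate_bps(1460, 0.002, 1e-6)
print(f"Mathis max rate = {rate / 1e6:.0f} Mbps")   # prints: Mathis max rate = 730 Mbps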

The predicted Mathis rate exceeds the maximum rate of 256 Mbps we calculated without losses, so the maximum rate will be the lesser of the two calculations, or 256 Mbps. This result is reasonable: circuits that meet the acceptable BER for Gigabit Ethernet do not adversely impact TCP performance.
What happens at our threshold rate of concern? In this case, we have 0.001% losses, or 1 packet in 100,000.

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))
Max rate in bps < (1460/.002)*(1/ sqrt(.00001))
Max rate in bps < 2.3*10^8 bps
Max rate in bps < 231 Mbps

Since this is within 10% of the predicted 256 Mbps, we deem it “acceptable.”
However, what happens if the line has 0.01% losses, or 1 lost packet in 10,000 packets?

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))
Max rate in bps < (1460/.002)*(1/ sqrt(.0001))
Max rate in bps < 7.3*10^7 bps
Max rate in bps < 73 Mbps

This is significantly below the predicted 256Mbps. This reduced rate will cause a noticeable impact on application performance.
Looking back at the beginning of the article, what is the impact of less than 3% errors? If we use an error rate of 3%, the maximum throughput on the Gigabit Ethernet link is drastically reduced:

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))
Max rate in bps < (1460/.002)*(1/ sqrt(.03))
Max rate in bps < 4.2*10^6 bps
Max rate in bps < 4.2 Mbps

This drastically reduced rate could easily cause sluggish application performance.
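
To see these cases side by side, here is a short Python sketch that repeats the calculations above and caps each result at the 256 Mbps ACK-paced ceiling computed earlier (same assumptions: 1460-byte MSS, 2 ms RTT):

from math import sqrt

MSS_BYTES = 1460
RTT_SECONDS = 0.002            # assumed campus round-trip time (2 ms)
NO_LOSS_CEILING_BPS = 256e6    # window/ACK-limited maximum from the earlier example

for loss in (1e-6, 1e-5, 1e-4, 3e-2):     # 0.0001%, 0.001%, 0.01%, 3% packet loss
    mathis_bps = (MSS_BYTES / RTT_SECONDS) * (1 / sqrt(loss))
    effective_bps = min(mathis_bps, NO_LOSS_CEILING_BPS)
    print(f"loss {loss * 100:.4f}%:  Mathis limit {mathis_bps / 1e6:7.1f} Mbps,"
          f"  effective rate {effective_bps / 1e6:6.1f} Mbps")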


Conclusion:

The following diagram illustrates the impact of interface errors on TCP throughput. We see that host-to-host performance quickly degrades as the error rate climbs.

[Figure: Interface errors vs. TCP throughput]

I believe that any interface error rates that exceed 0.01% should be a cause for alarm and immediate investigation/resolution. I hope this discussion helps explain why you should be very concerned about interface errors!

— cwr

_________________________________________________________________________________________

References on TCP Performance, Reliability, and the Mathis Equation

This article summarizes ideas from several sources of information, including:

  • M. Mathis, J. Semke, J. Mahdavi, and T. Ott, “The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm,” ACM SIGCOMM Computer Communication Review, July 1997.

