
Not all network engineers understand the impact of interface errors on TCP performance. Interface errors can have a BIG impact, although it may not be intuitive at first glance.

We recently pointed out some interfaces with extremely high errors to a customer. We mentioned that the links with the highest percentage loss were likely getting very little useful data through them, and that they should investigate the cause of these errors. Initially the customer did not appear to be very concerned because the percent of errors was below 3%. We personally find error rates of greater than 0.001% to be a cause for concern.

Based on this experience, I thought I’d write up an article to illustrate the impact of interface errors.

Best TCP/IP Performance Expected

Perhaps the first question to consider is “What is the best TCP/IP performance you can expect on a Gigabit Ethernet link in the campus?”

First, let’s look at the buffering TCP requires, which is the bandwidth delay product (BDP). For maximum performance, the receiving system must be able to buffer the amount of data that can be sent between ACKs. The bandwidth of a Gigabit link is 1000 Mbps. If the data exchange is inside a campus, say between a data center server and a user, the RTT should be very small, perhaps 2 milliseconds or .002 seconds. So for a Gigabit link, the receiving system needs to be able to buffer bandwidth * delay:

BDP = 1000 Mb/s * .002 seconds

BDP = 1,000,000,000 bits/s * (1 byte/8 bits) * .002 seconds

BDP = 125,000,000 bytes/s * .002 seconds

BDP = 250,000 Bytes
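The same arithmetic can be written as a small Python sketch. The function name and parameters here are my own for illustration; the 1 Gbps bandwidth and 2 ms RTT are the values assumed above.

```python
def bandwidth_delay_product_bytes(bandwidth_bps: float, rtt_seconds: float) -> float:
    """Return the bandwidth delay product in bytes."""
    bytes_per_second = bandwidth_bps / 8  # 8 bits per byte
    return bytes_per_second * rtt_seconds


# Gigabit Ethernet link with a 2 ms campus round trip time (assumed values).
bdp = bandwidth_delay_product_bytes(1_000_000_000, 0.002)
print(f"BDP = {bdp:,.0f} bytes")  # prints: BDP = 250,000 bytes
```

Changing the RTT argument shows how quickly the required buffering grows on longer paths.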

When the BDP is less than the TCP window size, the path bandwidth is the limiting factor in throughput. For a Gigabit Ethernet link, the BDP of 250,000 Bytes is greater than the default TCP window size of 32,000 Bytes, so the path bandwidth will not be the limiting factor.

When the TCP window size is less than the buffering required to keep the pipe filled, the mechanics of TCP operation affect the maximum throughput. In this case, the sending system sends a full TCP window worth of data, waits for an acknowledgement from the receiver, then sends again. The application is not using the send-window mechanism that would allow TCP to fill the bandwidth pipe. Only when an ACK is received can more data be sent. Therefore, the maximum throughput that can be achieved between a source and destination is the window size divided by the time it takes to get back an ACK (i.e., the round trip time). In this case, the best throughput you can achieve is the chunk size (amount of data sent per window) divided by the round trip time, or

Max Throughput = chunk size / RTT

Max Throughput in bps = [Bytes * 8 (bits/byte) ] / RTT

Another question to consider is “What is the maximum throughput for a GE link in the data center?”

For this best case calculation, I assume the application sends a chunk of 64,000 Bytes of data across multiple TCP segments and waits for an ACK before sending more data. As before, I assume the data exchange is inside a campus, so the RTT is about 2 milliseconds or .002 seconds. So the maximum rate for a single file transfer would be

64,000 * 8 / .002 = 256,000,000 bps or 256 Mbps

Conclusion: If the RTT is 2ms, a maximum rate of about 256Mbps is possible in the campus across a Gigabit Ethernet link.
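Here is a minimal sketch of the window-limited calculation, assuming the sender really does wait for an ACK after each chunk. The function name is hypothetical; the 64,000 and 32,000 Byte sizes and the 2 ms RTT come from the discussion above.

```python
def window_limited_throughput_bps(chunk_bytes: int, rtt_seconds: float) -> float:
    """Max throughput when one chunk is sent per round trip: chunk size / RTT."""
    return chunk_bytes * 8 / rtt_seconds  # bytes -> bits, then divide by RTT


RTT_SECONDS = 0.002  # assumed campus round trip time

for chunk in (32_000, 64_000):
    rate = window_limited_throughput_bps(chunk, RTT_SECONDS)
    print(f"{chunk:,} bytes per RTT -> {rate / 1e6:.0f} Mbps")
```

Doubling the chunk size doubles the ceiling, which is why the window size matters so much on a low-loss path.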

Expected TCP/IP Performance With Errors

A third question to consider is “What is the impact of errors on TCP/IP performance on a Gigabit Ethernet link in the campus?”

Note: There are several potential sources of interface errors, including interface discards when there is insufficient bandwidth to support the traffic volume, misconfigured duplex and speed settings, excessive buffering on interfaces, misconfigured EtherChannels, and faulty cables or hardware.

First we consider what an acceptable error rate is. Based on the IEEE 802.3ab standard, the Bit Error Rate (BER) considered acceptable for 1000BaseT circuits is 1 in 1*10^10 bits.

1 bit loss in 1*10^10 bits = 1 bit loss in 1.25*10^9 bytes

If we assume an average packet is 1000 bytes long, the 1000BaseT BER would be 1 packet loss in 1.25*10^6 packets. On a percentage basis, 1 packet lost/1.25*10^6 = 8*10^-7 = .00008%

Therefore we could round this up and really expect to see at most .0001% packet loss on the Gigabit Ethernet cable.

Note: This is a very generous packet size, perhaps 300 to 450 bytes may be a more common average for enterprises including VoIP. However, the 1000 byte packet size was chosen for easier math.
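The conversion from BER to packet loss can be sketched as follows. This assumes, as the calculation above does, that each bit error costs exactly one packet and that errors are rare enough for a simple multiplication to hold. The function name is my own, and the 400 byte case is only there to show the effect of a smaller average packet.

```python
def packet_loss_from_ber(bit_error_rate: float, packet_bytes: int) -> float:
    """Approximate packet loss probability for a given bit error rate.

    Assumes one bit error loses one packet, and that errors are rare,
    so loss is roughly BER * bits per packet.
    """
    bits_per_packet = packet_bytes * 8
    return bit_error_rate * bits_per_packet


BER_1000BASET = 1e-10  # 1 bit error in 1*10^10 bits

for size in (1000, 400):
    loss = packet_loss_from_ber(BER_1000BASET, size)
    print(f"{size} byte packets: {loss * 100:.6f}% packet loss")
```

Smaller packets carry fewer bits each, so the same BER produces a lower packet loss percentage.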

However, the TCP path can experience packet loss due to performance and configuration issues with the servers and network devices. TCP performance is degraded as packets are lost and need to be retransmitted. The Mathis equation is a formula that approximates the actual impact of loss on the maximum throughput rate:

Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

where

- MSS = maximum segment size in bytes
- RTT = round trip time in seconds
- p = the probability of packet loss

Note that this formula includes a constant with a value of approximately 1 that resolves the bytes to bits… The formula is known as the Mathis equation, from a 1997 paper titled "The Macroscopic Behavior of the TCP Congestion Avoidance Algorithm."

Now we can apply the Mathis equation to the example GE link. For the MSS we will use 1460 bytes, since this will fit into one TCP packet (when the MTU of the network gear is 1500 bytes). We assume that the application sends a chunk of 1460 Bytes of data and waits for an ACK before sending more data. Since this data exchange is inside the campus, we again assume that the RTT is .002 seconds. So the maximum rate for a single file transfer, with the standard 1000BaseT cable BER giving 0.0001% losses, will be:

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

Max rate in bps < (1460/.002)*(1/ sqrt(.000001))

Max rate in bps < 7.3*10^8 bps

Max rate in bps < 730 Mbps

The predicted Mathis rate exceeds the maximum rate of 256Mbps we calculated without losses, so the maximum rate will be the lesser of these two calculations, or 256Mbps. This result is reasonable: circuits that meet the acceptable BER for Gigabit Ethernet do not adversely impact TCP performance.
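As a rough sketch, the comparison above looks like this in Python. The function name is hypothetical. The code follows this article's arithmetic exactly, treating the result of (MSS/RTT)*(1/sqrt(p)) as bps with the constant folded in, which is an assumption carried over from the note above rather than something the code checks.

```python
import math


def mathis_max_rate(mss_bytes: float, rtt_seconds: float, loss_probability: float) -> float:
    """Upper bound on rate, following the article's (MSS/RTT)*(1/sqrt(p)) form."""
    return (mss_bytes / rtt_seconds) * (1 / math.sqrt(loss_probability))


WINDOW_LIMIT_BPS = 256_000_000  # loss-free ceiling calculated earlier

# 0.0001% packet loss is a probability of 0.000001.
mathis_rate = mathis_max_rate(1460, 0.002, 0.000001)
effective_rate = min(mathis_rate, WINDOW_LIMIT_BPS)

print(f"Mathis bound:   {mathis_rate / 1e6:.0f} Mbps")
print(f"Effective rate: {effective_rate / 1e6:.0f} Mbps")
```

Taking the minimum of the two limits is the "lesser of these two calculations" step described above.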

What happens at our threshold rate of concern? In this case, we have 0.001% losses, or 1 packet in 100,000.

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

Max rate in bps < (1460/.002)*(1/ sqrt(.00001))

Max rate in bps < 2.3*10^8 bps

Max rate in bps < 231 Mbps

Since this is within 10% of the predicted 256Mbps, we deem it “acceptable.”

However, what happens if the line has 0.01% losses, or 1 lost packet in 10,000 packets?

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

Max rate in bps < (1460/.002)*(1/ sqrt(.0001))

Max rate in bps < 7.3*10^7 bps

Max rate in bps < 73 Mbps

This is significantly below the predicted 256Mbps. This reduced rate will cause a noticeable impact on application performance.

Looking back at the beginning of the article, what is the impact of less than 3% errors? If we use an error rate of 3%, the maximum throughput on the Gigabit Ethernet link is drastically reduced:

Mathis Max Rate in bps < (MSS/RTT)*(1 / sqrt(p))

Max rate in bps < (1460/.002)*(1/ sqrt(.03))

Max rate in bps < 4.2*10^6 bps

Max rate in bps < 4.2 Mbps

This drastically reduced rate could easily cause sluggish application performance.
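To see all four cases side by side, here is a short sketch that tabulates the loss rates worked through above. The constant and function names are my own, and it keeps this article's assumptions: an MSS of 1460 bytes, an RTT of 2 ms, a 64,000 Byte chunk per round trip, and the Mathis result read as bps.

```python
import math

MSS_BYTES = 1460
RTT_SECONDS = 0.002
WINDOW_LIMIT_BPS = 64_000 * 8 / RTT_SECONDS  # loss-free ceiling, 256 Mbps


def effective_rate_bps(loss_percent: float) -> float:
    """Lesser of the window-limited rate and the Mathis bound."""
    p = loss_percent / 100  # percent -> probability
    mathis = (MSS_BYTES / RTT_SECONDS) / math.sqrt(p)
    return min(mathis, WINDOW_LIMIT_BPS)


# Loss rates discussed in the article, in percent.
rates = {pct: effective_rate_bps(pct) for pct in (0.0001, 0.001, 0.01, 3)}

for pct, bps in rates.items():
    print(f"{pct:>7}% loss -> {bps / 1e6:6.1f} Mbps")
```

Because the bound scales with 1/sqrt(p), every hundredfold increase in loss cuts the achievable rate by a factor of ten once the Mathis limit takes over.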

Conclusion:

The following diagram illustrates the impact of interface errors on TCP throughput. We see that host-to-host system performance quickly degrades.

I believe that any interface error rate that exceeds 0.01% should be a cause for alarm and immediate investigation/resolution. I hope this discussion helps explain why you should be very concerned about interface errors!

— cwr

_________________________________________________________________________________________

References on TCP Performance, Reliability, and the Mathis Equation

This article summarizes ideas from several sources of information:
