Good network engineers know about TCP performance over Long, Fat Networks (LFNs – see RFC1323) and how to use bandwidth, delay, and window size to calculate the maximum throughput of a connection. But it seems that not many people know about the Mathis Equation, which describes how packet loss factors into the throughput calculations.
For those not familiar with it, the buffering required in a receiving system for maximum performance is based on the BW-Delay Product (BDP): the amount of data that must be in flight, sent but not yet acknowledged, to keep the path full. You multiply the path’s minimum (bottleneck) bandwidth by the round-trip delay. The table below shows the buffering requirements for several examples, with the BW converted to Bytes/Sec and the Delay converted to Sec, resulting in BDP units of Bytes.
| BW (Mbps) | RT Delay (ms) | Bytes/Sec | Delay (Sec) | BDP (Bytes) |
|---|---|---|---|---|
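As a sketch of how the table’s columns relate, the short Python function below converts bandwidth in Mbps and round-trip delay in ms into Bytes/Sec, seconds, and the BDP in bytes. The 45 Mbps / 60 ms row comes from the T3 example in this article; the other rows are illustrative values chosen for this sketch, not the original table’s entries.

```python
def bdp_row(bw_mbps: float, rtt_ms: float) -> tuple[float, float, float]:
    """Return (bytes_per_sec, delay_sec, bdp_bytes) for one table row."""
    bytes_per_sec = bw_mbps * 1_000_000 / 8   # Mbps -> bits/s -> Bytes/s
    delay_sec = rtt_ms / 1000                 # ms -> s
    # Multiply before dividing so whole-number inputs give exact results.
    bdp_bytes = bw_mbps * 1_000_000 * rtt_ms / (8 * 1000)
    return bytes_per_sec, delay_sec, bdp_bytes


# Illustrative rows (hypothetical, except 45 Mbps / 60 ms from the T3 example).
EXAMPLES = [(10, 20), (45, 60), (100, 80), (1000, 60)]

if __name__ == "__main__":
    print(f"{'BW (Mbps)':>10} {'RT Delay (ms)':>14} {'Bytes/Sec':>14} "
          f"{'Delay (Sec)':>12} {'BDP (Bytes)':>12}")
    for bw, rtt in EXAMPLES:
        bps, sec, bdp = bdp_row(bw, rtt)
        print(f"{bw:>10} {rtt:>14} {bps:>14,.0f} {sec:>12.3f} {bdp:>12,.0f}")
```

Computing the BDP from the raw inputs in one expression, rather than multiplying the two converted columns, avoids a small floating-point rounding error in the last column.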
When the TCP window size is more than the BDP, the path BW is the limiting factor in throughput. But when the TCP window size is less than the buffering required to keep the pipe filled, we use another equation to calculate the maximum throughput of the path. The sending system will send a full TCP window worth of data and then wait for an acknowledgement from the receiver. When the ACKs are received, more data can be sent. Therefore, the maximum throughput is the window size divided by the time it takes to get back an ACK (i.e., the round trip time).
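The two regimes described above reduce to a single expression: throughput is the smaller of the path bandwidth and the window divided by the round-trip time. A minimal Python sketch, assuming a lossless path as the text does (the function and parameter names are illustrative):

```python
def max_tcp_throughput_bps(path_bw_bps: float, window_bytes: int, rtt_sec: float) -> float:
    """Upper bound on TCP throughput over a lossless path, in bits per second."""
    if rtt_sec <= 0:
        raise ValueError("round-trip time must be positive")
    # At most one full window can be delivered per round trip.
    window_limited_bps = window_bytes * 8 / rtt_sec
    # If the window covers the BDP, the path bandwidth is the bottleneck instead.
    return min(path_bw_bps, window_limited_bps)
```

For example, `max_tcp_throughput_bps(45_000_000, 65535, 0.06)` is roughly 8.74 Mbps (window-limited), while a 1,000,000-byte window over the same path is capped at the 45 Mbps link rate.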
An example is useful to illustrate what happens here. Let’s say that we want to transfer a large file from a server to our workstation, and that the workstation supports a TCP window size of 65,535 bytes. The path between the server and the workstation runs at T3 speed (45 Mbps), and the two systems are separated by a terrestrial link that’s 3000 miles long. One-way delay on terrestrial links is nominally 10 ms per 1000 miles, so our link has a one-way delay of 30 ms and a round-trip delay of 60 ms.
The amount of buffering required is the link speed, converted to bytes, times the delay, in seconds.
Buffering = BW * Delay = 45,000,000/8 Bytes/Sec * .06 Sec = 337,500 Bytes. That’s bigger than our 65,535-byte window, so the window, not the link, limits throughput and we use the window-based equation:
Throughput = TCPWindow / round-trip-delay = 65535 Bytes / .06 Sec = 1,092,250 Bytes/Sec
= 8,738,000 bits/sec
That’s considerably less than the 45 Mbps T3 bandwidth that a non-technical person might expect. If both systems support the TCP window scale option (RFC1323), the advertised window is multiplied by a power-of-two scaling factor so that it can cover the BDP and fill the pipe. But many systems don’t have window scaling enabled, or don’t allocate enough buffer space to handle the high throughput of modern network paths, so throughput suffers and the people using these systems are less productive.
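The T3 walk-through, and the window scaling that would fix it, can be reproduced in a short Python sketch. The 10 ms per 1000 miles figure is the nominal terrestrial delay quoted above, and RFC1323 expresses the scale factor as a left-shift count (a power of two) capped at 14. The function names are our own, and a real receiver must also allocate buffers as large as the scaled window for it to help.

```python
MS_PER_1000_MILES = 10.0  # nominal one-way terrestrial delay used in the text
MAX_WINDOW_SHIFT = 14     # RFC1323 caps the window-scale shift count at 14


def rtt_sec_from_miles(miles: float) -> float:
    """Round-trip delay in seconds for a terrestrial path of the given length."""
    one_way_ms = miles / 1000 * MS_PER_1000_MILES
    return 2 * one_way_ms / 1000


def min_window_shift(bdp_bytes: float, base_window: int = 65535) -> int:
    """Smallest shift s such that base_window << s covers the BDP."""
    for shift in range(MAX_WINDOW_SHIFT + 1):
        if base_window << shift >= bdp_bytes:
            return shift
    raise ValueError("BDP exceeds the largest window RFC1323 scaling allows")


def t3_example() -> dict[str, float]:
    link_bps = 45_000_000           # T3
    window_bytes = 65535            # unscaled receive window
    rtt = rtt_sec_from_miles(3000)  # 0.06 s
    bdp_bytes = link_bps / 8 * rtt  # buffering needed to fill the pipe
    # The window is smaller than the BDP, so one window per RTT is the limit.
    bytes_per_sec = window_bytes / rtt
    return {
        "rtt_sec": rtt,
        "bdp_bytes": bdp_bytes,
        "bytes_per_sec": bytes_per_sec,
        "bits_per_sec": bytes_per_sec * 8,
        "window_shift_needed": min_window_shift(bdp_bytes),
    }


if __name__ == "__main__":
    for name, value in t3_example().items():
        print(f"{name}: {value:,.3f}")
```

For the T3 path this reports a BDP of about 337,500 bytes, about 8.74 Mbps of window-limited throughput, and a shift of 3: 65,535 * 2^3 = 524,280 bytes, enough to cover the BDP.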
The above calculations all assume a lossless path, which raises the question of what happens when the path experiences packet loss. An approximation of TCP throughput under loss was published in 1997 by Mathis, Semke, Mahdavi & Ott in Computer Communication Review, 27(3), July 1997, in a paper titled The macroscopic behavior of the TCP congestion avoidance algorithm.
Rate < (MSS/RTT)*(1 / sqrt(p))
where MSS is the maximum segment size, RTT is the round-trip time, and p is the probability of packet loss.
This equation seems to have become known as the Mathis Equation. Obviously, TCP throughput goes down as packet loss increases, but the equation shows how quickly: the rate is inversely proportional to the square root of the loss probability, so quadrupling the loss halves the maximum rate. Fiber Bit Error Rates (BER) are typically 10^-13. Some optical gear treats a link as down at a BER of 10^-6 (one bad bit in 1M bits). Assuming a stream of 1460-byte packets (11,680 bits), that’s one bad packet approximately every 85 packets. Mathis, et al., did a thorough review of what happens with high packet loss probability, which I’ll not try to duplicate here; see the paper cited above.
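The figures in this paragraph can be checked with a short Python sketch that converts a BER into a per-packet loss probability and feeds it into the simplified Mathis bound. It assumes bit errors are independent (our assumption, not the text’s), so a packet is lost if any of its bits is corrupted. The function names are illustrative, and the 1460-byte figure is used as both the packet size and the MSS, a simplification, with the 60 ms round trip from the T3 example.

```python
import math


def packet_loss_prob(ber: float, packet_bytes: int) -> float:
    """Probability that at least one bit of the packet is in error.

    Assumes independent bit errors: p = 1 - (1 - BER) ** bits.
    log1p/expm1 keep the result accurate when BER is tiny.
    """
    bits = packet_bytes * 8
    return -math.expm1(bits * math.log1p(-ber))


def mathis_rate_bps(mss_bytes: int, rtt_sec: float, loss_prob: float) -> float:
    """Simplified Mathis bound, Rate < (MSS/RTT) * (1/sqrt(p)), in bits/s."""
    if rtt_sec <= 0:
        raise ValueError("round-trip time must be positive")
    if not 0 < loss_prob <= 1:
        raise ValueError("loss probability must be in (0, 1]")
    return (mss_bytes * 8 / rtt_sec) / math.sqrt(loss_prob)


if __name__ == "__main__":
    for ber in (1e-13, 1e-10, 1e-6):
        p = packet_loss_prob(ber, 1460)
        rate = mathis_rate_bps(1460, 0.06, p)
        print(f"BER {ber:g}: p = {p:.3e}, one bad packet in {1 / p:,.0f}, "
              f"bound = {rate / 1e6:,.1f} Mbps")
```

At the 10^-6 BER where some optical gear declares the link down, the bound for this 60 ms path falls below 2 Mbps, a small fraction of the T3’s 45 Mbps; at the typical fiber BER of 10^-13 the bound is far above 45 Mbps, so the window or the link speed is the limit instead.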
What it means
What is important for us as network engineers is to understand the impact on throughput so we can decide when to do something about it. We should be looking for links whose BER exceeds 10^-10. A rough engineering assumption is that a packet nominally contains 10,000 bits (1250 Bytes), or 10^4 bits. This makes the math easier and gives us a gut feel for the figures. Multiplying the BER by the number of bits per packet gives the packet loss rate: 10^-10 (BER) * 10^4 (bits per packet) = 10^-6 (packet loss rate). Converted from scientific notation to a decimal, that is .000001, or .0001%. That’s a number that we can use with our network monitoring systems to highlight lossy links and paths. Assuming an average packet size of 300 Bytes (2,400 bits) instead gives 2.4 * 10^-7, only about a factor of four lower, so for thresholding purposes there isn’t much difference. I prefer the simpler calculation and base my network management thresholds on it. Also note that LAN interfaces should typically have a loss rate that’s several orders of magnitude less than a WAN link or wireless link, so you may want to use different figures for different types of links.
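The threshold arithmetic, and one way a monitoring script might apply it, are sketched below in Python. The per-link counters are a hypothetical input format (packets sent and packets lost per polling interval), not any particular monitoring system’s API; the 10^-10 BER and the 10,000-bit packet come from the text.

```python
def loss_threshold(ber: float, packet_bits: int) -> float:
    """Packet loss rate implied by a BER: BER * bits per packet (first-order)."""
    return ber * packet_bits


# The text's rule of thumb: BER 10^-10 and ~10,000-bit (1250-byte) packets.
WAN_LOSS_THRESHOLD = loss_threshold(1e-10, 10_000)  # ~1e-6, i.e. 0.0001 %


def lossy_links(counters: dict[str, tuple[int, int]], threshold: float) -> list[str]:
    """Return link names whose observed loss rate exceeds the threshold.

    counters maps a link name to (packets_sent, packets_lost); this format
    is hypothetical and would come from your own interface polling.
    """
    flagged = []
    for link, (sent, lost) in counters.items():
        if sent > 0 and lost / sent > threshold:
            flagged.append(link)
    return flagged


if __name__ == "__main__":
    print(f"threshold: {WAN_LOSS_THRESHOLD:.0e} ({WAN_LOSS_THRESHOLD * 100:.4f} %)")
    # Hypothetical counters for one polling interval.
    sample = {"wan-east": (20_000_000, 60), "wan-west": (20_000_000, 4)}
    print(lossy_links(sample, WAN_LOSS_THRESHOLD))
```

Because the threshold is a parameter, LAN or wireless links can be checked against their own, separately chosen values rather than the WAN figure.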
There is a good tutorial on TCP performance, with examples of fiber BER and different delays at
A more detailed model, along with its empirical validation, was reported in Modelling TCP throughput: A simple model and its empirical validation by J. Padhye, V. Firoiu, D. Towsley and J. Kurose, in Proc. SIGCOMM Symp. Communications Architectures and Protocols, Aug. 1998, pp. 304-314. While this may be a more accurate analysis, it goes well beyond the level of detail that most network engineers need.
Pete Welcher also took a look at this and posted his own view and links on the topic at: TCP/IP Performance Factors
I also recommend that you read RFC1323 (May 1992) to learn more about the fundamental mechanisms that have been in place for a long time. There are also some great references on the Internet on how to tune various TCP stacks for optimum operation over LFNs. Understanding how to measure performance and what to do about it when it seems slow is valuable knowledge.
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html