Amazon Transit VPCs and Transit Gateway
I was recently checking out a product that does syslog correlation and noticed that it had not reported a couple of events that I could see in syslog-ng’s log. I use syslog-ng because it is free, easy to install and configure, performs filtering, and forwards to other destinations. I normally have it configured to log everything to the local filesystem and to filter and forward specific events to other systems. It provides a good de-coupling mechanism between the network devices that are sending syslog messages and the systems that must process syslog. For example, NetMRI needs to receive Cisco CONFIG_I events indicating that a configuration change has been made.
The product that I was configuring was running on a separate server. It needed to receive syslog events, and its display wasn’t showing me all the events that syslog-ng was recording. At first I blamed the product, but I then decided to replace it with another copy of syslog-ng to simplify the test. In the test setup, syslog-ng ran on System A, a RedHat EL5 server, receiving syslog events from all the network equipment. System B, a CentOS 5.3 server, was configured with a second copy of syslog-ng, also logging to the filesystem. System A forwarded all Cisco syslog events to System B. The rate of syslog events was on the order of 10 packets per second during peaks, and each packet was small, because Cisco syslog messages tend to be short. I was very surprised to find that a measurable percentage of the syslog messages were being dropped on System B, even with syslog-ng. So it wasn’t a problem within the product that I was trying to install.
The next step was to verify that the UDP packets were making it from System A to System B. I ran tcpdump on both systems and verified that System A was sending the forwarded packets and that System B was receiving them. But syslog-ng was still not logging all the events. Comparing System B’s syslog-ng log against its tcpdump capture, I could see that the packets were arriving at the system but were never delivered to syslog-ng.
There are a number of web sites that discuss UDP packet loss. A good one is 29West.com’s UDP Buffer Sizing page, which includes commands for reporting the number of dropped UDP packets for several operating systems. On my system, it showed a lot of UDP packet errors:
$ netstat -us
…
Udp:
    29582255 packets received
    6898 packets to unknown port received.
    15597 packet receive errors
    29934317 packets sent
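On Linux, the same counters that netstat -us reports can also be read from /proc/net/snmp, which makes it easy to check them periodically from a script. The following is a minimal sketch, not a finished monitoring tool. The function names are made up for illustration, and treating InDatagrams plus InErrors as the total number of arriving datagrams is an assumption that gives only a rough error rate.

```python
from pathlib import Path

SNMP_PATH = Path("/proc/net/snmp")  # Linux-specific location


def parse_udp_counters(snmp_text):
    """Return the Udp counters from /proc/net/snmp-style text as a dict.

    The file holds pairs of lines per protocol: a header line with the
    counter names and a value line with the numbers, both prefixed "Udp:".
    """
    udp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Udp:")]
    if len(udp_lines) < 2:
        raise ValueError("no Udp header/value pair found")
    header, values = udp_lines[0], udp_lines[1]
    return {name: int(value) for name, value in zip(header[1:], values[1:])}


def udp_receive_error_rate(counters):
    """Rough fraction of arriving datagrams counted as receive errors.

    Assumption: InDatagrams counts delivered datagrams only, so the
    total is approximated as InDatagrams + InErrors.
    """
    received = counters["InDatagrams"]
    errors = counters["InErrors"]
    total = received + errors
    return errors / total if total else 0.0


if __name__ == "__main__" and SNMP_PATH.exists():
    udp = parse_udp_counters(SNMP_PATH.read_text())
    print("received:", udp["InDatagrams"])
    print("receive errors:", udp["InErrors"])
    print("error rate: {:.4%}".format(udp_receive_error_rate(udp)))
```

Running this before and after a test period and comparing the InErrors values shows whether datagrams were discarded during that interval, which is more useful than the totals since boot.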
That definitely looked like the problem. So I worked through a number of recommendations for adjusting the UDP packet buffers. Some recommendations consume a lot of buffer space, as described in the 29West.com article above. I still had packet drops. I then switched System B to a RedHat release and the packet errors dropped significantly. It turns out that the CentOS 5.3 release drops many UDP packets, even at relatively low packet rates.
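For readers who have not adjusted UDP buffers before, the per-socket side of that tuning looks roughly like the sketch below. It is an illustration only, not how syslog-ng sets its buffers: the function name and the 4 MB request are arbitrary choices for the example. On Linux the kernel caps the request at net.core.rmem_max and reports back a doubled value to account for bookkeeping overhead, so it is worth reading the size back rather than assuming the request was honored.

```python
import socket


def open_udp_receiver(port, requested_rcvbuf=4 * 1024 * 1024, host="0.0.0.0"):
    """Open a UDP socket and ask the kernel for a larger receive buffer.

    Returns the socket and the buffer size the kernel actually reports,
    which may be smaller than requested if a system-wide limit applies.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Request the larger buffer before binding so it is in place
    # when the first datagrams arrive.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested_rcvbuf)
    sock.bind((host, port))
    granted = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return sock, granted


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port; a real syslog receiver
    # would bind to 514 and needs the privileges to do so.
    receiver, granted_bytes = open_udp_receiver(0, host="127.0.0.1")
    print("kernel reports a receive buffer of", granted_bytes, "bytes")
    receiver.close()
```

If the reported size is far below the request, the system-wide maximum has to be raised first, and that is where the memory cost mentioned above comes from.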
I would have expected any modern Linux kernel to be able to handle a load of hundreds of UDP packets per second on a single-core server with no other competing process. But for some reason CentOS 5.3 had a problem handling even modest UDP packet loads. Switching to RedHat EL5 eliminated most (but not all) of the packet loss.
This brings me to another point that I often find myself making to network management vendors: syslog and traps are inherently unreliable because they are transported over UDP. My recommendation to vendors: don’t write your network management application as if UDP were a reliable protocol. Use multiple mechanisms or multiple requests to get the data that’s needed to create informative answers to common questions. My recommendation to users: verify that the syslog and trap receivers are not dropping packets.
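One simple way to do that verification is to send a known number of sequence-numbered test messages at the receiver and then look for gaps in what it logged. The sketch below assumes the receiver writes each message to a text log that can be read back. The message format, the "udp-loss-test" tag, and the function names are invented for the example. The priority value 134 encodes facility local0 with severity informational.

```python
import re
import socket

SEQ_PATTERN = re.compile(r"udp-loss-test: seq=(\d+)")


def send_numbered_messages(host, port, count):
    """Send `count` small syslog-style UDP datagrams, numbered from 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for seq in range(count):
            payload = "<134>udp-loss-test: seq={}".format(seq)
            sock.sendto(payload.encode("ascii"), (host, port))


def missing_sequences(log_lines, count):
    """Return the sorted sequence numbers in 0..count-1 absent from the log."""
    seen = set()
    for line in log_lines:
        match = SEQ_PATTERN.search(line)
        if match:
            seen.add(int(match.group(1)))
    return sorted(set(range(count)) - seen)


def loss_percent(log_lines, count):
    """Percentage of the `count` test messages that never reached the log."""
    if count == 0:
        return 0.0
    return 100.0 * len(missing_sequences(log_lines, count)) / count
```

After calling send_numbered_messages against the receiver, pass the lines of its log file to missing_sequences. Note that the sender transmits as fast as it can, which is a much harsher burst than normal syslog traffic, so a production test would pace the messages to match the expected event rate.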
NetCraftsmen would like to acknowledge Infoblox for their permission to re-post this article which originally appeared in the Applied Infrastructure blog under http://www.infoblox.com/en/communities/blogs.html