Biggest Mistakes Companies Make in the Cloud
I recently got the chance to revisit the topic of filtering HSRP traffic between datacenters. It was in the initial stages of setup for testing a LISP VM Mobility scenario — but that’s a topic for a later blog. The big picture for this blog: Data Center Interconnect (DCI), filtering HSRP between datacenters as part of optimal forwarding. This topic is also known as “DCI FHRP Isolation”. Along the way, I and those working with me found a couple of surprises. I’d like to share those with you.
Technical Background: In OTV it has historically been necessary to manually block HSRP or any other FHRP (First Hop Routing Protocol) between datacenters. It appears that is still the case, despite the expectation of automated FHRP blocking for OTV.
Lab Background: We were testing various forms of LISP optimal forwarding and VM-mobility in a dual datacenter setting, using 6500 VSS port-channeled Layer 2 links. I hope to provide more information about that in a subsequent blog, if / when time permits. To achieve VM / host computer localization, FHRP filtering is needed. We also did BPDU filtering, to emulate another benefit of OTV: Spanning Tree Protocol (STP) termination, so that STP is not propagated between datacenters.
The first semi-surprise repeated a lesson I’ve learned a lot of times: unless you lab test it, you can’t count on it being right.
For a while now, there has been a documented VACL approach for FHRP filtering. I documented it in a blog titled Cisco Overlay Transport Virtualization. See also FHRP Isolation — the example there is fairly clearly for a Nexus 7K, and I have not tested it on that platform. Cisco probably tested it rather well, unless post-testing edits crept in.
Minor surprise: the VACL approach shown there doesn’t work on a 6500 switch. The problem: on the 6500 I was using, with fairly recent code, the VACL cannot have multiple match conditions in a block. And even with the IP access list entries omitted, only using MAC entries, the VACL did not seem to be working.
Due to lab time constraints, I chose a different approach, namely filtering with both IP and MAC port access lists.
Caveat: On the Nexus, it appears that IP and MAC filtering cannot be done simultaneously on Layer 2 interfaces (cf. “mac packet-classify”). So the best bet appears to be using VACLs on Nexus, which is where most of us will likely be doing this sort of thing.
The port access lists that worked are:
interface …
 description TO OTHER DATACENTER
 switchport trunk encapsulation dot1q
 switchport trunk allowed vlan …
 switchport mode trunk
 access-group mode prefer port
 ip access-group HSRP-VIP in
 mac access-group HSRP-VMAC in
 spanning-tree bpdufilter enable
mac access-list extended HSRP-VMAC
 deny 0000.0c07.ac00 0000.0000.00ff any
 deny any 0000.0c07.ac00 0000.0000.00ff
 deny 0000.0c9f.f000 0000.0000.0fff any
 deny any 0000.0c9f.f000 0000.0000.0fff
 permit any any
ip access-list extended HSRP-VIP
 remark VIP of the HSRP default gateway
 deny ip any host 10.10.3.1
 deny udp any host 224.0.0.2 eq 1985
 deny udp any host 224.0.0.102 eq 1985
 permit ip any any
The entry blocking traffic to the virtual IP is likely unnecessary. I put it in out of an abundance of caution.
Note the “access-group mode prefer port” command: the 6500 platform does not normally apply MAC filters to IP traffic (for efficiency, I presume).
This works (well, mostly; keep reading)! When the above was configured in the lab, it definitely blocked the HSRP hellos, and the FHRP routers (separate from the 6500 switches for reasons I’ll go into in the sequel to this) clearly stopped seeing each other.
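A quick way to confirm this kind of isolation is to check HSRP state on the FHRP router in each datacenter and to look at the ACLs on the 6500. The sketch below uses only standard IOS show commands; it is not output captured from this lab. With the hellos filtered, one would expect each router to list itself as active, with no standby router known across the DCI link.

```
! On the FHRP router in each datacenter: check the HSRP role
show standby brief

! On the 6500: list the configured IP and MAC access lists
show access-lists
```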
If you’re wondering why 6500 switches and not Nexus, well, their capacity was adequate and cost lower, i.e. business requirement, customer’s choice.
After making a lot of lab progress, we were doing some ping testing and noticed a lot of dropped packets. This was rather perplexing. I finally tracked it down to the following (assisted by muttered four letter words for an unwanted digression at the end of a long day):
It turned out that the 6500 was learning the HSRP virtual MAC address, and in fact the VMAC was flapping between the port the FHRP router was on and the trunk to the other “datacenter” and its FHRP router. A Google search quickly revealed that the 6500 switch learns MAC addresses even though a MAC ACL blocks that MAC. And apparently the Nexus does similarly. See also http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Data_Center/DCI/4-0/EMC/implementation_guide/DCI4_EMC_IG/EMC_1.html and other recent DCI documents.
My workaround was to use a static MAC entry, which you’ll note the above URL recommends.
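As a rough sketch of that workaround on the 6500: a static entry pins the HSRP virtual MAC to the port facing the local FHRP router, so the switch no longer relearns it on the DCI trunk. The VLAN number, the HSRP group (and hence the last byte of the MAC), and the interface name below are placeholders for illustration, not values from the lab, and the exact form recommended in the referenced guide may differ.

```
! Before: check which port the VMAC is currently learned on
! (repeat a few times to see it move between the local port and the DCI trunk)
show mac address-table address 0000.0c07.ac01

! Global configuration mode. Assumed values: VLAN 10, HSRPv1 group 1,
! local FHRP router attached to GigabitEthernet1/1.
! Older code spells the command "mac-address-table static ...".
mac address-table static 0000.0c07.ac01 vlan 10 interface GigabitEthernet1/1

! After: the entry should now be listed as static on the local port
show mac address-table address 0000.0c07.ac01
```

One such entry is needed per HSRP group and VLAN, which is tolerable for a handful of groups but gets tedious at scale.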
Another planned blog will go into how this also led us down the path of writing some EEM scripts, along with some basic aspects of DCI and optimal forwarding. I highly advise not having a single FHRP router in either datacenter — and plan to explain why!
Per the above reference, a VACL is effective outbound, whereas a port ACL cannot be applied outbound. Using a VACL would therefore also solve the flapping MAC problem noted above: if the other datacenter never sees frames with the HSRP VMAC, it won’t learn that MAC address on the DCI port.
The catch is that a VACL applies to all traffic in the VLAN on that switch, not just to the DCI port. As the above article notes, that might be acceptable on a Nexus OTV VDC, since the separate Aggregation VDCs with the FHRP configuration would still be able to exchange MAC traffic and would likely provide the FHRP routers as well. In our lab scenario, the 6500 VSS pair in each datacenter would block HSRP within each datacenter if we used a VACL. Offsetting that, the particular design involved only one FHRP router in each datacenter, so it looks like VACLs might have worked for us.
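For comparison, here is a minimal sketch of what a Nexus-style VACL for FHRP isolation might look like. It matches the HSRP virtual MAC ranges and the HSRP hello multicast groups (224.0.0.2 for version 1, 224.0.0.102 for version 2). The ACL names, the access-map name, and the VLAN list are placeholders, and like the documented example, this has not been tested here. Note that inside a VACL the ACLs only select traffic (hence "permit"); the access-map action is what drops or forwards it.

```
mac access-list HSRP_VMAC
  10 permit 0000.0c07.ac00 0000.0000.00ff any
  20 permit 0000.0c9f.f000 0000.0000.0fff any
ip access-list HSRP_IP
  10 permit udp any 224.0.0.2/32 eq 1985
  20 permit udp any 224.0.0.102/32 eq 1985
mac access-list ALL_MACS
  10 permit any any
ip access-list ALL_IPS
  10 permit ip any any
!
! Sequence 10 drops HSRP; sequence 20 forwards everything else
vlan access-map HSRP_LOCALIZATION 10
  match mac address HSRP_VMAC
  match ip address HSRP_IP
  action drop
vlan access-map HSRP_LOCALIZATION 20
  match mac address ALL_MACS
  match ip address ALL_IPS
  action forward
!
! VLAN 10 is a placeholder for the VLANs extended across the DCI
vlan filter HSRP_LOCALIZATION vlan-list 10
```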
Hashtags: #CiscoChampion #DCI #OTV #FHRPfilter