OTV best practices have come to the forefront lately, as various sites start to implement OTV. The ones I know of so far understand they are taking a minor risk (immature technology). They have chosen to go ahead anyway because they are migrating to new data centers, and OTV is potentially very helpful in doing so. You don’t need to be doing live vMotion to see the benefit: even if you are moving physical servers, OTV can be helpful. So if you’re going to do OTV, doing it the best way is obviously the way to go. This blog assumes you know how OTV works, and focuses on best practices (Cisco’s, mine, lessons learned).
For those who somehow missed all the Cisco press and my prior blogs: OTV is a way to transport Layer 2 between data centers over any (sufficiently high speed) Layer 3 IP network. I consider it the best of the Data Center Interconnect (DCI) approaches Cisco provides. OTV includes technology to reduce ARP broadcast traffic across the WAN, isolate STP instances to each data center, etc. That does not make it perfect: any DCI technique necessarily allows BUM (Broadcast, Unknown Unicast, Multicast) traffic to slosh between sites. It has to, or various protocols and applications would break. OTV reduces broadcast traffic by doing ARP caching and filtering. Further tools to filter broadcast or BUM traffic may apparently appear in future releases, but they are not yet available.
See the OTV Technology Introduction and Deployment Considerations document, at http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/DCI/whitepaper/DCI_1.html. It contains lots of good design and other information. I intend to summarize the main points from it and from my brain, to give you something of a checklist you can use. Being a consultant, of course I also highly recommend professional design advice and review!
I’m going to start with the part that isn’t in the above document, i.e. “hidden” OTV best practices.
From recent experience, there is one best practice you will want to incorporate in your action plan, one that is hinted at but not really spelled out in the above document. With any Data Center Interconnect technique, you really should implement the various functions that limit broadcast traffic and mitigate its consequences: traffic storm control, hardware rate limiting, and CoPP (Control Plane Policing).
See also my blog about Traffic Storm Control
By the way, you really should also implement all the other STP loop defenses that everyone knows about but nobody deploys (at least not until burned): BPDU Guard, Root Guard, Loop Guard, and UDLD or Bridge Assurance. If you don’t have an STP loop in the first place, then you won’t need traffic storm control, hardware rate limiting, and CoPP to contain it.
(I feel the lack of a diagram here … see the Cisco document above for many diagrams.)
The OTV AED (Authoritative Edge Device) behavior should prevent OTV itself from causing an STP loop.
However, there is still a risk of “spillover”. That is, if either data center experiences an STP loop, the flood of BUM traffic will spill over across your OTV link. If there is a device with an older CPU at the “old data center”, it may experience problems under a sufficiently severe load.
This is why I recommend the traffic storm control and hardware rate limiting above.
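To put rough numbers on the spillover argument, here is a minimal Python sketch of the arithmetic. It assumes the storm control threshold is set as a percentage of port bandwidth (which is how the level is normally expressed). The port speed, inter-site link speed, and levels are hypothetical values chosen for illustration, not recommendations, and the function names are mine.

```python
def storm_control_cap_bps(port_speed_bps, level_percent):
    """Traffic rate still allowed through when the suppression level is
    expressed as a percentage of the port's bandwidth."""
    if not 0.0 <= level_percent <= 100.0:
        raise ValueError("level_percent must be between 0 and 100")
    return port_speed_bps * level_percent / 100.0


def spillover_share(port_speed_bps, level_percent, dci_link_bps):
    """Fraction of the inter-site link that the capped flood could still
    occupy, limited to 1.0 (a saturated link)."""
    cap = storm_control_cap_bps(port_speed_bps, level_percent)
    return min(cap / dci_link_bps, 1.0)


if __name__ == "__main__":
    # Hypothetical example: a 10 Gbps port feeding a 1 Gbps inter-site link.
    port_speed = 10e9
    dci_link = 1e9
    for level in (100.0, 5.0, 1.0):  # 100% means no effective suppression
        share = spillover_share(port_speed, level, dci_link)
        print(f"level {level:5.1f}% -> up to {share:.0%} of the DCI link")
```

The point of the sketch is only that a percentage which sounds small on a fast LAN port can still be a large slice of a slower WAN link, so it is worth picking the level with the inter-site bandwidth in mind.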
(Hey, it’s not often my crystal ball coughs up a Best Practice that’s back to the future!)
If / when Cisco ASA and/or ACE allow stateful clustering, do you cluster across OTV? I personally think it a risk or Worst Practice: if the stateful replication gets messed up, both datacenters could be adversely affected or off the air. (I’ve seen it happen in a single data center with CheckPoint firewalls.)
You can use the comments capability below to provide your own opinion. I’d love to hear what people think!