Replicating at Speed
More and more sites are deploying Cisco Nexus 9K-based fabrics. If the time has come for a datacenter switch refresh, you basically have two choices: Nexus 7700 and friends with a “classic Nexus” design (core, distribution, Top of Rack, FEX), or a Nexus 9K-based fabric. Fabrics are cool, and they look like the future direction for Cisco and for most of the industry.
Design-wise, you can do classic VLANs, a Nexus vPC-based design, or a fabric. Most sites are opting for fabrics. Classic VLANs suffer the weaknesses and risks of Spanning Tree Protocol (STP). vPC mitigates that risk somewhat by allowing MLAG, but at the price of a little complexity, mostly behavioral rather than design complexity.
Caution: do not accidentally duplicate a vPC domain ID within an L2 domain.
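To illustrate (the domain IDs and peer-keepalive addresses below are made up), each vPC pair in the same L2 domain should get its own domain ID, since the domain ID is used to derive the vPC system MAC:

```
! Leaf pair 1 (same domain ID on both switches in the pair)
vpc domain 10
  peer-keepalive destination 10.0.0.2 source 10.0.0.1

! Leaf pair 2: use a DIFFERENT domain ID, or the two pairs
! will derive the same vPC system MAC and confuse LACP
vpc domain 20
  peer-keepalive destination 10.0.0.4 source 10.0.0.3
```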
Using VXLAN increases the complexity but allows any L2 VLAN anywhere, from the edge systems’ perspective. Ivan Pepelnjak recently pointed out that VXLAN gives you a more robust control plane but does not help much with the broadcast radiation problem of VLANs. So I will emphasize that Control Plane Protection and storm control features are wise additional precautions in any such environment.
Note: I’ve previously blogged about storm control; the N9K is documented as having the same counter-intuitive configuration behavior as the other Nexus switches.
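For reference, a minimal storm control sketch (the interface and thresholds here are hypothetical; the level is a percentage of interface bandwidth, and, as noted above, the exact behavior can be counter-intuitive, so test before relying on it):

```
interface Ethernet1/1
  storm-control broadcast level 0.50
  storm-control multicast level 0.50
```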
OK, so with some safety measures, a datacenter running VXLAN may well be more robust than one running STP.
Concerning Datacenter Interconnect (DCI), I prefer VXLAN to STP. Routing fails closed, whereas STP fails open, which is not good when dealing with WAN latency and lost packets. The same applies to clustering (VMware NSX, firewalls, or servers) over the WAN: stretching control backplanes over links that are inherently less reliable than those within a datacenter strikes me as risky.
The next question is building and managing your fabric.
You can configure it manually. VXLAN configuration is a little verbose, but it can be templated; NetCraftsmen has done that in some smallish fabrics. Ongoing management can be done with show commands.
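As a trivial sketch of what “templated” might look like (the VLAN, VNI, and multicast group are made up, and this uses plain Python rather than a real templating engine like Jinja2), here is the per-VLAN piece of a VXLAN leaf config generated from a template:

```python
from string import Template

# Hypothetical per-VLAN/VNI snippet for a VXLAN leaf; a real template
# would also cover the NVE interface, BGP EVPN, and underlay config.
VLAN_VNI = Template("""\
vlan $vlan_id
  vn-segment $vni
interface nve1
  member vni $vni
    mcast-group $mcast_group
""")

def render_vlan(vlan_id: int, vni: int, mcast_group: str) -> str:
    """Render the config snippet for one VLAN-to-VNI mapping."""
    return VLAN_VNI.substitute(vlan_id=vlan_id, vni=vni,
                               mcast_group=mcast_group)

print(render_vlan(100, 10100, "239.1.1.1"))
```

Feed the function a list of VLAN/VNI pairs and you have per-leaf configs; the point is just that the repetitive parts of VXLAN configuration template cleanly.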
At the scale of two spines and four to eight leaf switches, all that is doable. Still, automation ought to make it easier. And maybe, while the tools are at it, they could provide some operational support as well, including error checking?
I know of several VXLAN automation alternatives: Cisco Nexus Fabric Manager (NFM), Cisco DCNM, Cisco ACI, and Apstra.

Let’s discuss them. Well, the latter three, seeing as NFM is off the table now.
Cisco’s DCNM has been evolving from a click-to-configure GUI for the datacenter. A couple of years ago, its primary user base seemed to be SAN admins. It used to be (more or less) separate LAN and SAN products, and I see they are still separate “installations”. DCNM also seems to have shifted from click-to-configure to more of an automation, management, and operations focus.
Anyway, DCNM version 11 supports VXLAN in various ways. It has a fabric builder for fabric initialization, and it does consistency monitoring. One of the very technical Cisco experts in the VXLAN and DCI areas, Yves Louis, has blogged about DCNM for VXLAN at length, complete with demo / how-to video clips. You might find it interesting to look at Yves’ prior blogs as well.
Cisco ACI is the obvious alternative. With the starter bundle, small to fairly large sites can buy a three-controller ACI deployment for almost the same cost as plain Nexus. That provides you with automation plus fabric monitoring and management.
Cisco ACI is great in a lot of ways. It also changes some standard network concepts in subtle ways, i.e., it has a learning and experience curve associated with it. Some of that revolves around backing up the fabric configuration and dealing with controller failure, should that ever happen. The rest involves learning the various GUI knobs and how they influence the way ACI forwards traffic, and thinking a bit differently. ACI effectively turns the fabric into one big router, switch, and L4 firewall, which can have external L2 and L3 connections.
There are two ways to use ACI. The first is full-blown ACI, using contracts for connectivity and security. As noted, this really changes how you do datacenter, which might be a good thing. The second is what I call “legacy network mode”, where you do VLANs and routing more or less as usual, with L2Out and L3Out policies connecting ACI to the external world. ACI documents tend to talk about this more as a migration technique.
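To give a flavor of what sits under the ACI GUI (the APIC hostname and tenant name below are hypothetical), everything in ACI is an object in a tree, and automation amounts to POSTing JSON to the APIC REST API:

```
# Create a tenant via the APIC REST API
# (after authenticating and capturing the session cookie)
POST https://apic.example.com/api/mo/uni.json

{
  "fvTenant": {
    "attributes": { "name": "DemoTenant" }
  }
}
```

Bridge domains, EPGs, and contracts are created the same way, which is why ACI lends itself to scripting once you learn the object model.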
I bring this up because you may want to limit the learning curve and complexity if your primary goal is fabric management rather than contracts and security. Finding staff with ACI skills might be a related concern. As I’ve blogged elsewhere, configuring ACI so that it’s not write-once / read-never might also be a consideration.
ACI is certainly well-documented. Online you’ll find several thick books’ worth of information, plus many more documents covering fine details. Cisco has also documented various ways of interconnecting datacenters, using either CLI-based VXLAN or ACI. There’s at least one Cisco Press book about VXLAN / EVPN, and there are Cisco Live presentations as well. For that matter, I attended the Cisco Live all-day VXLAN / DCI techtorial two years ago (all good, but not necessarily simple). Does the cumulative page count make my point about ACI not being simple?
If you have a heavily virtualized environment, that should probably factor into any ACI decision. That’s a long discussion (alternatives, pros, cons) that I’ll mostly save for another time. If 90% of your apps are in VMware, it may make the most sense to manage security there, especially if you’re doing NSX.
The third choice is Apstra, a startup with some significant customers, providing intent-based datacenter fabrics for switches from a variety of vendors (Cisco, Arista, Cumulus), including some open-source platforms. The Apstra product automates configuration, validates connectivity on an ongoing basis, and (reportedly) can do scalable telemetry. Yes, it carries risk (startup!).
I’m not going to try to scoreboard Apstra versus ACI on all the possible VXLAN features, e.g., anycast gateway, IP multicast, IPv6, multi-site VXLAN. I googled several of those topics and did not come up with Apstra links (exercise for the reader!).
I asked Apstra about some of the technical details. The short version: fair VXLAN / EVPN support, currently lacking some of the scaling features around BUM (broadcast, unknown unicast, multicast) traffic. That may be a necessary consequence of being a multi-vendor platform: lowest-common-denominator support may not leverage, or may only partially inherit, some of the advanced features in Cisco hardware.
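To make the BUM-handling point concrete: on Nexus hardware, each VNI’s BUM traffic can be flooded either via an underlay multicast group or via ingress (head-end) replication signaled by BGP EVPN. A sketch, with hypothetical VNIs and group address:

```
interface nve1
  host-reachability protocol bgp
  member vni 10100
    ! Option 1: flood BUM via an underlay multicast group
    mcast-group 239.1.1.1
  member vni 10200
    ! Option 2: ingress replication via BGP EVPN
    ingress-replication protocol bgp
```

Whether and how a multi-vendor tool exposes those per-VNI choices is exactly the kind of scaling detail in question.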
Positioning-wise, Apstra does not currently, as far as I know, attempt to do security (yet, if ever). That’s one reason I included it in my list. It quite possibly has far fewer features and options than ACI, which might be a Good Thing for those who want fabric automation and management and not much more (at least currently). Also, according to some off-camera #NFD19 discussion, Apstra appears to have some good features in the works.
I should probably note: beware, Apstra comes with a learning curve as well, just like ACI. Apstra is a startup, so it might be less well-documented, and it comes with some of the other aspects of buying from a fairly new company.
In short, there are several ways you can automate a Cisco (or other) fabric, should you wish to do so.
Comments are welcome, whether in agreement or constructive disagreement with the above. I enjoy hearing from readers and carrying on deeper discussions via comments. Thanks in advance!
Hashtags: #CiscoChampion #TheNetCraftsmenWay #Cisco #Nexus #Datacenter #VXLAN #Fabric
Did you know that NetCraftsmen does network / datacenter / security / collaboration design / design review? Or that we have deep UC&C experts on staff, including @ucguerilla? For more information, contact us at firstname.lastname@example.org.
John is our CTO and the practice lead for a talented team of consultants focused on designing and delivering scalable and secure infrastructure solutions to customers across multiple industry verticals and technologies. Previously he has held several positions including Executive Director/Chief Architect for Global Network Services at JPMorgan Chase. In that capacity, he led a team managing network architecture and services. Prior to his role at JPMorgan Chase, John was a Distinguished Engineer at Cisco working across a number of verticals including Higher Education, Finance, Retail, Government, and Health Care.
He is an expert in working with groups to identify business needs and align technology strategies with business strategies, building in agility and scalability to allow for future changes. John is experienced in the architecture and design of highly available, secure network infrastructure and data centers, and has worked on projects worldwide. He has worked in both business and regulatory environments on the design and deployment of complex IT infrastructures.