Amazon Transit VPCs and Transit Gateway
I’ve had Layer 2 on the brain for a while. Or rather, mitigating Layer 2. Several prior blog articles reflect aspects of this:
I’ve got some additional thoughts to share. I’d like to recap the situation as I see it, with lots of useful links. (When I started writing this, I optimistically thought I could deliver some diagrams and conclusions, but framing the setting took enough time and space that the best part will now come in a second blog article.)
In a few words, Spanning Tree Protocol (STP) melt-down. I’ve seen an entire data center go down twice now, with UDLD helping spread the joy. In one case, a mis-configured port channel hard-coded “on” in two new access switches caused the data center core switch CPUs to get spun up. The resulting lack of UDLD messages then caused 16 or 18 access 6500s to errdisable their uplinks. And the site didn’t have errdisable timeout configured, so the uplinks stayed down until someone re-enabled them by hand. At the other site, a high-priority server was built and attached by 10 Gbps dual-homing to the 6500 Sup720-10G core switches, until two 6708 10G blades could be ordered. Something went wrong (the story gets a bit fuzzy here: there’s no evidence, and no obvious way a “bare bones unconfigured Windows server install” could have bridged two ports together). In both cases, the result was a Spanning Tree loop, UDLD errdisable and/or heavy flooding to servers on 100 M ports, and the data center down for hours.
That wouldn’t be quite so bad except for two things:
Routing problems, by way of contrast, tend to only affect the lost prefix(es), and tend to damp down traffic (less, not more).
The other problem I’ve seen in a large L2 campus is that your VLAN numbers become global. You generally end up with a large wall-chart in many colors, showing which VLANs go where. When VLANs are non-localized, it breaks modularity, including modularity of diagrams (core, building A, datacenter B, etc.). When your network diagram starts requiring an advertising billboard due to size (half-joking), your network isn’t modular. (Or you or your boss like big diagrams?) I like 8.5 x 11 or 11 x 17: I can read them in Visio on my PC, without mega-zooming.
Actually, it isn’t, most of the time. Closets get along fine with Layer 3 routing, either from the closet up through the distribution and core layers, or from the distribution layer up.
Data centers are where we need increasing amounts of Layer 2. That’s because Microsoft clusters, Oracle RAC clusters, and VMware VMotion all require Layer 2 adjacency. Cisco is now recommending containing L2 within the access layer if possible, otherwise within the access / distribution pod. No L2 across the data center core. (And how much of the data center do YOU want to put at risk of STP loops?)
In part due to this, in part due to the inefficiency of having L2 links that don’t get used, we have the IETF TRILL (Transparent Interconnection of Lots of Links) effort, based on the RBridge concept from Radia Perlman. The basic idea is, get all the L2 links usable. When you’re paying for a 10 Gbps link, you definitely want to be able to use it all of the time!
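To make the “L2 links that don’t get used” point concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how STP is actually implemented: the four-switch topology and the names `dist1`, `dist2`, `acc1`, `acc2` are hypothetical, all links are treated as equal cost, and a simple breadth-first tree from the root stands in for the real root election and path-cost comparison.

```python
from collections import deque


def spanning_tree(links, root):
    """Return the links kept by a breadth-first tree rooted at `root`.

    Simplification: every link has equal cost, so BFS order stands in
    for STP's bridge ID and path cost comparison.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)

    visited = {root}
    forwarding = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for peer in neighbors[node]:
            if peer not in visited:
                visited.add(peer)
                forwarding.add(frozenset((node, peer)))
                queue.append(peer)
    return forwarding


# Hypothetical pod: two access switches dual-homed to two distribution switches.
LINKS = [
    ("dist1", "dist2"),
    ("dist1", "acc1"),
    ("dist2", "acc1"),
    ("dist1", "acc2"),
    ("dist2", "acc2"),
]

forwarding = spanning_tree(LINKS, root="dist1")
blocked = [link for link in LINKS if frozenset(link) not in forwarding]
print(f"{len(blocked)} of {len(LINKS)} links blocked: {blocked}")
```

In this toy pod, two of the five links end up blocked, and both are uplinks to `dist2`: each access switch forwards on only one of the two uplinks it is paying for. That idle capacity is what TRILL and the EtherChannel-based approaches are trying to win back.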
Cisco’s short-term answer to that seems to be taking 8-way EtherChannel or LACP to 16-fold, so you can have Really Big Uplinks. The VSS or VPC technologies allow such EtherChannels to be split across dual chassis, increasing their survivability.
For that matter, VSS plus EtherChannel takes Spanning Tree off the table (mostly, except when your 6500 switches are having a bad day). That’s Yet Another Answer to Spanning Tree woes. Logically, your two switches with bowtie uplinks to both upstream switches look like one switch, one connection, another switch: no loop, no Spanning Tree.
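The “one switch, one connection, another switch” picture can be checked with a short Python sketch. This is only a model under assumptions of mine: the chassis names (`acc-a`, `dist-a`, and so on) and the logical names `acc-vss` and `dist-vss` are hypothetical, both pairs are assumed to be VSS (or VPC) pairs, and all links between two logical switches are assumed to sit in a single port channel.

```python
def has_loop(links):
    """Union-find cycle check over an undirected list of (a, b) links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # this link closes a cycle
        parent[root_a] = root_b
    return False


def logical_view(links, member_of):
    """Collapse chassis into logical switches and bundle parallel links.

    `member_of` maps a physical chassis to its logical switch name.
    Links inside one logical switch are dropped; parallel links between
    the same two logical switches become a single port channel.
    """
    bundles = set()
    for a, b in links:
        logical_a = member_of.get(a, a)
        logical_b = member_of.get(b, b)
        if logical_a != logical_b:
            bundles.add(frozenset((logical_a, logical_b)))
    return [tuple(sorted(bundle)) for bundle in bundles]


# Hypothetical bowtie: each access chassis uplinks to both distribution chassis.
PHYSICAL = [
    ("acc-a", "dist-a"),
    ("acc-a", "dist-b"),
    ("acc-b", "dist-a"),
    ("acc-b", "dist-b"),
]
MEMBER_OF = {
    "acc-a": "acc-vss", "acc-b": "acc-vss",
    "dist-a": "dist-vss", "dist-b": "dist-vss",
}

logical = logical_view(PHYSICAL, MEMBER_OF)
print("physical loop:", has_loop(PHYSICAL))
print("logical links:", logical, "loop:", has_loop(logical))
```

The physical bowtie contains a loop, while the logical view is two switches joined by one bundled link, so there is nothing for Spanning Tree to block. The sketch says nothing about what happens when the pairing itself misbehaves, which is the “bad day” caveat above.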
The Cisco Bridge Assurance features can also be viewed as carrying on the theme, of let’s make Spanning Tree more robust. Since one of my colleagues and friends has already written about it (for Netcordia), let me refer you to Terry Slattery’s blog about the topic, at http://www.netcordia.com/community/blogs/terrys_blog/archive/2010/01/06/what-is-bridge-assurance.aspx.
Where is this technology headed, in terms of design? It looks to me like “small” or “moderate” amounts of L2 at the data center access layer, possibly extended through the distribution layer where needed for scaling / migration. (As a physicist might say, “for various values of ‘small’”.) That is, “small” may become larger as time goes on and the technology matures.
There are two situations I know of where the L2 need can be more severe:
DCI is sometimes used for “geocluster” applications. I love the term! (And Cisco has a couple of mildly older but good documents tying SAN into the discussion as well — google “geocluster site:cisco.com”, I liked the Design Guides.)
The above are situations where you’ve carefully bounded “failure domains” with L3, but you need controlled, safe L2 connectivity across the L3 in the middle. Preferably in such a way that Spanning Tree melt-down in one data center doesn’t take out the Business Continuity / Disaster Recovery (BC/DR) data center.
Cisco has published a number of ways to tackle the DCI setting (various documents in the SRND / Design Zone series; see the top hits when you Google “dci site:cisco.com”). The technology choices: optical technologies, QinQ, VPLS, EoMPLS, EoMPLS with semaphores, etc. For a good summary document, see Data Center Interconnect (DCI): Layer 2 Extension Between Remote Data Centers, at http://www.cisco.com/en/US/prod/collateral/switches/ps5718/ps708/white_paper_c11_493718.html. One consideration leading to complexity is High Availability / redundancy, compounded with the recommendation to not run Spanning Tree Protocol between data centers.
The latest addition (which looks pretty clever, powerful, and well thought out) is OTV (Overlay Transport Virtualization), currently only available on the Nexus 7000 series. Rumor has it that OTV really stands for “Over the Top Virtualization”. In any case, OTV looks like the cleanest and lowest in user complexity among the solutions I’ve seen in print. I’m not sure I’d want to scale it to 6 or 12 data centers. At least not for the next week or two. Or 6-12 months.
For what information is presently available about OTV, see:
Within the data center, EoMPLS is a fairly simple and workable solution, as long as you don’t insist on redundant pseudo-wires, or are prepared to deal with the ensuing complexity. (Using VSS chassis on both ends might help.)
Let me also throw in “Long Distance VMotion”, which is another highly desirable capability that DCI opens up (at least, within the distances tested). Reference: Virtual Machine Mobility with VMware VMotion and Cisco Data Center Interconnect Technologies, at http://www.cisco.com/en/US/solutions/collateral/ns340/ns517/ns224/ns836/white_paper_c11-557822.pdf.
Putting together my first two section headings, I think we can safely conclude that: L2 is a Necessary Evil. (Mostly joking!)
What I propose to examine in my next blog on this topic (complete with diagrams) is what the implications are in terms of traffic flows. There are some definite performance implications that you will want to understand. The technologies mentioned above strike me as the classic case of “just because you can do it, doesn’t mean you should do it.” EoMPLS is so easy, I worry about the “beer effect” (too much leads to a headache). With L2 DCI, I see the potential for lots of good consulting work, diagnosing mysterious performance issues. Well, maybe mysterious only to those who didn’t take the time to understand the implications of the technology (or read the next blog).