
8/13
2013
Carole Warner Reece

EtherChannel, MAC Persistency, and the 3850 Switch Stack

I have been thinking about the interaction between LACP system IDs, port-channels on switch stacks, and virtual port-channels on Nexus gear. I saw some ‘interesting’ behaviour last week when helping a customer update the software on his 3850 stack. His stack was connected to a vPC port-channel on a pair of Nexus 7000s. We saw a stack event taking down the entire port channel when the active stack member failed over to the standby member. (This event did not impact the vPC peer link, or the SVIs on the VLANs on the port-channel.)

I am happy to report that the new stack software (CAT3K_CAA-UNIVERSALK9-M, Version 03.02.02.SE) is pretty solid: when I tried to replicate the issue this weekend in a maintenance window, I could not break the port-channel with the 3.2.2 3850X image. I think the current port-channel behavior is still worth discussing.

Background

When a switch stack forms a port-channel, the active stack member’s MAC address is used in the LACP ID that identifies the port-channel. Similarly, when a Nexus 7000 or 5000 vPC pair forms a vPC port-channel, the vPC virtual system-ID is used in the LACP ID.
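To see which LACP system ID each side is actually using, show commands along the following lines can help. This is only a sketch: it assumes a Catalyst stack on one end and a Nexus vPC pair on the other, and the exact command set and output format can vary by platform and software release.

```
! On the Catalyst switch stack (IOS / IOS XE)
show lacp sys-id
show lacp neighbor
show etherchannel summary

! On each Nexus vPC peer (NX-OS)
show lacp system-identifier
show vpc role
show port-channel summary
```

Comparing the stack's system ID before and after a failover shows whether it changed, which is the condition that would cause the EtherChannel to flap.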

Earlier stack port-channel code has had issues. For example, the ASA 9.1 configuration documentation states:

The ASA does not support connecting an EtherChannel to a switch stack.

The next line of the docs somewhat explains why:

If the ASA EtherChannel is connected cross stack, and if the Master switch is powered down, then the EtherChannel connected to the remaining switch will not come up.

However, a port-channel to a 6500 VSS or a Nexus 7000 vPC is supported. There has been some discussion in the Cisco Support Community on this issue with the firewall and the 3750 switch stack: https://supportforums.cisco.com/thread/2198683

I believe I saw the EtherChannel failing to stay up on the remaining switch last week — before the software upgrade. Before the upgrade, when we failed over from the active switch member in a stack to the standby switch, the port-channel on the new active switch went down and stayed down until the previous master switch was reloaded. This was not very desirable!

Switch Stack Documentation
The 3850 and 3750 Configuring EtherChannels documentation mentions:

With LACP, the system-id uses the stack MAC address from the stack master, and if the stack master changes, the LACP system-id can change. If the LACP system-id changes, the entire EtherChannel will flap, and there will be an STP reconvergence. Use the stack-mac persistent timer command to control whether or not the stack MAC address changes during a master failover.

However, this document does not actually tell you how to set the timer. The ‘Managing Switch Stacks’ guide has more information:

Use the persistent MAC address feature to set a time delay before the stack MAC address changes. During this time period, if the previous active switch rejoins the stack, the stack continues to use its MAC address as the stack MAC address, even if the switch is now a stack member and not an active switch. If the previous active switch does not rejoin the stack during this period, the switch stack takes the MAC address of the new active switch as the stack MAC address.
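As a minimal sketch, setting the timer looks something like the following in global configuration mode. The value 8 is only an illustration (it is in minutes, matching the "mins" shown by show switch); the accepted range and the default behaviour may differ between platforms and releases, so check the command reference for your image.

```
! Set a delay (in minutes) before the stack MAC address changes
! after the active switch leaves the stack
configure terminal
 stack-mac persistent timer 8
 end

! Confirm the "Mac persistency wait time" line reflects the new value
show switch

! Save the change
copy running-config startup-config
```

Picking a value a little longer than a typical member reload gives the previous active switch time to rejoin before the stack MAC address changes.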

Default Configuration
When we looked at the issue last week, with no stack-mac persistent timer configured, the Mac persistency wait time was indefinite:

AS-01#sh swit  
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: Indefinite
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State 
------------------------------------------------------------
*1       Active   44ad.d96c.ad00     10     V02     Ready               
 2       Standby  44ad.d912.c000     1      V02     Ready               

AS-01#

We updated the image to 3.2.2, set the stack-mac persistent timer to 8 minutes (since the typical reload seemed to take under 7 minutes), did some testing, and it all seemed to work fine.

Testing After Software Image Update
So this weekend I tried to replicate the issue under the new 3.2.2 image. I removed the stack-mac persistent timer command, saved the configs, forced a failover, and saw no issues with the port-channel status on the new active switch. I then forced a failover back, again with no issues on the new active switch. With no stack-mac persistent timer command, the Mac persistency wait time was still Indefinite. (This means forever, I believe…)

I did learn how to update the RSA key on my Mac pretty quickly. If you are a Mac user, and you do testing of this sort while trying to SSH to a device, you may get a message like:

~ cwr$ ssh -l admin 10.18.2.15
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
56:6d:da:2f:b8:64:87:ad:53:b6:b8:7d:13:4d:8f:8f.
Please contact your system administrator.
Add correct host key in /Users/cwr/.ssh/known_hosts to get rid of this message.
Offending RSA key in /Users/cwr/.ssh/known_hosts:225
RSA host key for 10.18.2.15 has changed and you have requested strict checking.
Host key verification failed.
~ cwr$

You can clean this up pretty easily by removing the RSA host key for the IP address:

~ cwr$ ssh-keygen -R 10.18.2.15
/Users/cwr/.ssh/known_hosts updated.
Original contents retained as /Users/cwr/.ssh/known_hosts.old
~ cwr$ ssh -l admin 10.18.2.15
The authenticity of host '10.18.2.15 (10.18.2.15)' can't be established.
RSA key fingerprint is 56:6d:da:2f:b8:64:87:ad:53:b6:b8:7d:13:4d:8f:8f.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '10.18.2.15' (RSA) to the list of known hosts.
Password:

I tried several combinations to break the EtherChannel: no stack-mac persistent timer, stack-mac persistent timer 8, stack-mac persistent timer 4, and stack-mac persistent timer 1.

The good news for high availability is that the IOS appears to ignore this command when you are reloading a switch in the stack. As the previously-active switch goes through several role and state changes, the newly-active switch keeps the cross-stack port-channel up. Here is what the roles and states look like:

AS-01# sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Foreign Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   0000.0000.0000     0      0       Removed
*2       Active   44ad.d912.c000     1      V02     Ready


DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     0       Initializing
*2       Active   44ad.d912.c000     1      V02     Ready


DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     V02     Syncing
*2       Active   44ad.d912.c000     1      V02     Ready

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Member   44ad.d96c.ad00     10     V02     Ready
*2       Active   44ad.d912.c000     1      V02     Ready

 !! Note - not really ready, HA synch has not happened....

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Standby  44ad.d96c.ad00     10     V02     HA sync in progress
*2       Active   44ad.d912.c000     1      V02     Ready

DCH-AS-INT-01#sh switch
Switch/Stack Mac Address : 44ad.d96c.ad00 - Local Mac Address
Mac persistency wait time: 4 mins
                                             H/W   Current
Switch#   Role    Mac Address     Priority Version  State
------------------------------------------------------------
 1       Standby  44ad.d96c.ad00     10     V02     Ready
*2       Active   44ad.d912.c000     1      V02     Ready

My guess, based on my experiments with timers of various sizes, is that the IOS knows a switch is loading and ignores the stack-mac persistent timer until the loading switch is up and its Mac address can be reviewed. This will help with high availability. It should also remove the ASA caveat about ‘not supporting an EtherChannel with a switch stack…’

Summary

If you are running a cross-stack EtherChannel on a switch stack, you probably should update to 3.2.2 (or the 3750 equivalent) for improved EtherChannel high availability.

— cwr

Carole Warner Reece

Architect

A senior network consultant with more than fifteen years of industry experience, Carole is one of our most highly experienced network professionals. Her current focus is on the data center and on network infrastructure.
