Migrating to Rapid Spanning Tree

Author
Carole Warner Reece
Architect

One of my customers is planning to migrate his data center network to Rapid Spanning Tree. Although RSTP is a well established technology, it will be new to his site. Cisco has a migration discussion here: Spanning Tree from PVST+ to Rapid-PVST Migration Configuration Example

Building on that discussion, I wanted to gather some metrics for him. What would be the impact of moving the access switches first, then the core switches?

Note: The migration is planned for a maintenance window, but I still wanted to be able to let him know what would be the likely disruption to his production network.

As part of the migration, we will also be changing the root bridge on the cores for a handful of VLANs to align with the HSRP primary devices. I wanted metrics on that as well.

Background

For my test environment, I had two 3550s (all Fa interfaces) and two 3560s (all Gi interfaces). I tweaked the configured speeds a bit so all the port costs were 19. The lab network looked like this:

[Diagram: lab topology for the RSTP migration tests]

I set up SVI 300 on all the devices, and confirmed all the switches could reach each other. Core-1 and Core-2 were configured with the spanning-tree vlan 300 priority shown in the diagram, cdc-j3 and cdc-j4 had default spanning-tree priorities.
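On the cores, the relevant baseline is just a few lines per switch. A sketch of what that looks like — the SVI addressing is hypothetical, and Core-2's priority is an assumed typical backup-root value; only Core-1's 4096 is stated in the post:

```
! Core-1 – intended root bridge for VLAN 300 (priority from the diagram)
spanning-tree mode pvst
spanning-tree vlan 300 priority 4096
!
interface Vlan300
 ip address 192.168.30.1 255.255.255.0
!
! Core-2 – backup root (assumed priority value)
spanning-tree vlan 300 priority 8192
```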

Pre-Tests

For a quick check with STP implemented on all 4 switches, I set up some long pings of 30,000 packets between cdc-j3 and cdc-j4. (I call this “cdc-j3/4 are pinging”.) Then I changed the spanning-tree VLAN priority on Core-1 to 12288. VLAN 300 went down for about 28 seconds, and the cdc-j switches lost about 15 packets during the ping test. I then returned Core-1 to spanning-tree VLAN priority 4096. Next I tried changing Core-1 (priority 4096) to spanning-tree mode rapid-pvst while cdc-j3/4 were pinging. With cdc-j3 and cdc-j4 still in STP mode, 15 or 16 packets were lost. I returned Core-1 to default spanning-tree mode.
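The pre-test consisted of a long ping plus one-line config changes, roughly like this (the ping target address is hypothetical):

```
! On cdc-j3, start the long ping toward cdc-j4
cdc-j3# ping 192.168.30.4 repeat 30000

! On Core-1, move the VLAN 300 root away, then restore it
Core-1(config)# spanning-tree vlan 300 priority 12288
Core-1(config)# spanning-tree vlan 300 priority 4096

! Then flip Core-1 to Rapid-PVST and back to legacy PVST+
Core-1(config)# spanning-tree mode rapid-pvst
Core-1(config)# spanning-tree mode pvst
```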

Migration Testing

I then did a sequence of tests while pinging between cdc-j3 and cdc-j4 to look for loss of connectivity based on lost packets.

  1. When cdc-j4 changes to spanning-tree mode rapid-pvst while cdc-j3 is pinging it, VLAN 300 on cdc-j4 goes down for about 2 seconds, and cdc-j3 loses 3 packets.
  2. When cdc-j3 changes to spanning-tree mode rapid-pvst while cdc-j4 is pinging it, VLAN 300 on cdc-j3 goes down for about 2 seconds, and cdc-j4 loses 3 packets.
  3. When Core-2 changes to spanning-tree mode rapid-pvst while cdc-j3/4 are pinging, no packets are lost. (Note – Core-2 not in forwarding path; VLAN 300 on Core-2 does not bounce.)
  4. When Core-1 changes to spanning-tree mode rapid-pvst while cdc-j3/4 are pinging, 1 packet is lost. VLAN 300 on Core-1 does not bounce.
  5. When Core-1 changes to spanning-tree vlan 300 priority 12288 while cdc-j3/4 are pinging, 1 packet is lost. VLAN 300 on Core-1 does not bounce. Core-1 is initial traffic path.
  6. When Core-2 changes to spanning-tree vlan 300 priority 16384 while cdc-j3/4 are pinging, 1 packet is lost. VLAN 300 on Core-2 does not bounce. Core-2 is initial traffic path.
  7. When Core-2 changes to spanning-tree vlan 300 priority 24576 while cdc-j3/4 are pinging, no packets are lost. VLAN 300 on Core-2 does not bounce. Core-2 is NOT initial traffic path.
  8. When Core-1 changes to spanning-tree vlan 300 priority 16384 while cdc-j3/4 are pinging, 0 or 1 packets are lost. VLAN 300 on Core-1 does not bounce. Core-1 is initial traffic path.
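Taken together, tests 1–8 map to a migration sequence like this (switch names are from the lab; the final priority value is the one used in test 8):

```
! Step 1: access switches first – each briefly drops its own VLAN 300 SVI (~3 packets)
cdc-j4(config)# spanning-tree mode rapid-pvst
cdc-j3(config)# spanning-tree mode rapid-pvst

! Step 2: cores – non-forwarding core first (no loss), then the forwarding core (~1 packet)
Core-2(config)# spanning-tree mode rapid-pvst
Core-1(config)# spanning-tree mode rapid-pvst

! Step 3: with everything on RSTP, re-tune root priorities per VLAN as needed
Core-1(config)# spanning-tree vlan 300 priority 16384
```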

I did one more test. With the whole network in RSTP, I wanted to make cdc-j4 the new root. I set up a ping from cdc-j3 to Core-2, then configured spanning-tree vlan 300 priority 4096 on cdc-j4. When cdc-j4 became root, cdc-j3 lost 1 ping to Core-2.
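After each change, root placement can be verified with the standard show commands:

```
cdc-j4# show spanning-tree vlan 300
cdc-j4# show spanning-tree vlan 300 root
Core-1# show spanning-tree summary
```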

The results were clear: migrate the network to RSTP starting at the end/access switches, then migrate the cores to RSTP. Once the whole network is running RSTP, update the STP root bridge.

Summary

– Rapid Spanning Tree is good.
– Changing the end nodes to Rapid STP first provides the smallest overall impact. This supports the steps illustrated in Cisco’s example to migrate the access switches first.
– Changing the root bridge priority after the whole network is in Rapid STP mode provides the smallest overall impact to a specific VLAN.
– Worsening the root bridge priority on the backup root after the whole network is in Rapid STP mode appears to have no impact with dual-connected edge devices.
– Improving the root bridge priority on the root switch after the whole network is in Rapid STP mode appears to have very small or no impact with dual-connected edge devices.

— cwr

3 responses to “Migrating to Rapid Spanning Tree”

  1. Nice post Carole!

    RSTP is very fast to synchronize. It’s interesting that there was a difference when you changed to RSTP on the access vs. the core. There was more loss on the access switches, and their interfaces bounced, which they did not in the core.

    The only thing I can think of is that the cores would have all links forwarding vs access which has one link forwarding. Do you have any theory on that?

    Hopefully as products such as CML come along, we can start to test these things out more easily; it’s tough today to simulate customer environments.

  2. Daniel –

    The goal was to minimize impact throughout the data center.

    One difference is that the access switch changes happened with all standard STP neighbors. When Core-2 changed, most of its neighbors were already running RSTP, and it was not on the forwarding path for VLAN 300 between the access switches. When Core-1 changed, all its neighbors were already on RSTP.

    It looked to me like the RSTP changes were locally significant. VLAN 300 only went down on the specific access switch that changed modes, and it was quick. After all access switches were migrated, changing the backup and then the primary root bridge did not bounce the access switch SVIs. So depending on how the VLANs are deployed in a data center, you could manage small localized rolling changes.

    fwiw, while I was Telnetted into a switch, I could make the change and not lose connectivity.

    Carole

  3. Nice post indeed. I always like to see some empirical data in a blog post. 😀

    Lately I’ve been a fan of setting up the core as a VSS cluster (6500 or 4500) or VPC pair (Nexus). That way, the access can use multichassis Etherchannel and both (or more) links are simultaneously forwarding for all VLANs.
