Troubleshooting a Large-Scale CUCM Deployment
Editor’s Note: Sharing expertise with other industry professionals is part of being a true Master Craftsman, and so it’s an important aspect of the NetCraftsmen mission. The following blog post originally appeared on UCGuerrilla.com, a site where our own William Bell writes about unified communication technologies. He was kind enough to let us repost it here, and we think many technologists will find it valuable. Enjoy.
The Cisco Mobile and Remote Access (MRA) feature is a “client edge” solution that allows external software and hardware clients to register to enterprise Cisco Unified Communication (UC) solutions without requiring a VPN. Like most things, there are a lot of moving parts working together to create a relatively seamless user experience. And, like most things, the first time you deploy MRA there are a few “gotchas” that can eat up a significant amount of troubleshooting time.
This blog entry captures procedures I use when troubleshooting or validating an MRA deployment. These procedures can be used to validate the initial deployment or they can be used to troubleshoot connectivity problems for an individual user.
Proper troubleshooting technique requires that you have a thorough understanding of how things should work during normal operations. I presented on the MRA registration process during a NetCraftsmen Cisco Mid-Atlantic User Group (CMUG) meeting last year. If you need a review of the architecture with a walk-through of the Jabber client discovery and registration process, then a PDF of that presentation is available here.
At a high level, the MRA registration process follows this flow:
This blog entry is focused on a scenario where we are using corporate presence services and UCM for call control. We are also roughly following the sequence of transactions that are actually used by a Jabber client. Procedures were originally developed with the 10.x version of Jabber running on Mac OS X and Windows.
Upon initialization, the Jabber client enters into a “Service Discovery” mode. At this stage, the client is trying to determine if it is inside the corporate network or outside. The mechanism that is used is DNS. Specifically, the Jabber client will query for specific DNS service records (SRV record) based on the assigned service domain.
The service domain is derived from the Jabber ID (JID) assigned to the end user. For example, if my JID is user@company.com then the service domain is company.com. The service domain is usually specified by the user the first time they attempt to log into Jabber, though the service domain can also be administratively assigned in the jabber-config.xml file.
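The derivation itself is just a split on the @ sign. A minimal sketch, assuming a plain user@domain JID with no resource part (the function name service_domain is mine for illustration, not something exposed by Jabber):

```python
def service_domain(jid: str) -> str:
    """Return the domain portion of a Jabber ID of the form user@domain."""
    user, sep, domain = jid.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not a valid JID: {jid!r}")
    # DNS names are case-insensitive, so normalize for the SRV lookups.
    return domain.lower()


print(service_domain("user@company.com"))  # company.com
```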
Once the service domain is known, the Jabber client will go through the sub-process of Service Discovery. This starts with DNS SRV queries for:
_cisco-uds._tcp.company.com: Points to the UDS service on a UCM cluster
_cuplogin._tcp.company.com: Legacy record that points to the XMPP service on an IM&P node
In an MRA scenario, where a client is outside of the corporate network, the above SRV records should not be resolvable. If the client fails to receive a positive response to the UDS/cuplogin queries, it will then send a DNS SRV query for _collab-edge._tls.company.com. In a properly implemented solution, this query should return one or more records that point to your Expressway Edge (or VCS-E) cluster.
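To make the lookup order concrete, here is a small sketch of the decision as described above. It is a simplified model, not Jabber's actual implementation; the discover function, the mode labels, and the injected resolver are all hypothetical names chosen for illustration. The resolver is passed in so the logic can be exercised without live DNS.

```python
from typing import Callable, List

# A resolver takes an SRV name and returns its targets (empty list if the
# name does not resolve). In practice it could wrap dig, nslookup, or a
# DNS library; here it is injected so the logic stays testable offline.
Resolver = Callable[[str], List[str]]


def discover(domain: str, resolve_srv: Resolver) -> str:
    """Follow the SRV lookup order described above and report the result."""
    internal = [f"_cisco-uds._tcp.{domain}", f"_cuplogin._tcp.{domain}"]
    if any(resolve_srv(name) for name in internal):
        return "internal"  # client behaves as if it is inside the network
    if resolve_srv(f"_collab-edge._tls.{domain}"):
        return "edge"      # client will connect through the Edge appliance
    return "none"          # nothing resolved; discovery fails


# Hypothetical external DNS view: only the edge record resolves.
external_dns = {"_collab-edge._tls.company.com": ["expway-e.company.com"]}
print(discover("company.com", lambda name: external_dns.get(name, [])))  # edge
```

If the same function returns "internal" when fed what an outside client actually resolves, the UDS or cuplogin records are leaking into external DNS.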
Assuming everything is configured correctly and fully operational, the Jabber client will attempt to establish a TLS connection to one of your Edge appliances.
Once the client establishes a TLS connection to port 8443 on the Edge appliance, the user credentials are authenticated. At this point, the proxy connection is established and the client will start downloading configuration information from the UCM cluster. This configuration information is used to complete the service registration phases.
If the Jabber client is provisioned for IM&P presence services, the client will attempt to establish a connection on TCP port 5222. Registration requests are sent to the Edge appliance, which then proxies the transaction through the Core appliance to the IM&P cluster node(s).
If the Jabber client is provisioned as a voice/video soft phone, the client will attempt to establish a connection on TCP port 5061. Registration requests are sent to the Edge appliance, which then proxies them through the Core appliance to the UCM cluster node(s). Successful registration is required for voice/video call functionality.
If the Jabber client is provisioned with visual voicemail, the Jabber client will submit registration requests to the Edge appliance using the already established TLS connection on port 8443. The Edge appliance proxies the request through the Core to the REST API on Unity Connection.
All of these procedures are performed from the client perspective.
This step is fairly straightforward. We need to determine if the client can resolve the proper DNS SRV records. Using dig or nslookup, verify that the client can resolve the collaboration edge SRV records. For example:
dig srv _collab-edge._tls.company.com
nslookup -type=srv _collab-edge._tls.company.com
It is also a good idea to verify that the client is unable to resolve the UDS and cuplogin SRV records.
If the client can resolve the UDS records, then the Jabber client will never attempt to connect to the Edge. If the client receives a positive response to the UDS query and/or the client fails to receive a positive response to the Edge discovery, then review your external DNS configuration.
This troubleshooting step is a little more involved. The web-based API on the Edge appliance uses API calls comprised of Base64 values. Therefore, you need a way to generate Base64 values (such as openssl). The API calls also are built using specific application hostnames in your environment. So, you will need to have that information handy.
Let’s start with the base64 conversion process. On a Mac OS X system, you can use openssl to generate the base64 string. For example:
echo -n 'human readable string' | openssl base64
For Windows users, there are several tools that you can download and install. That is a pain, so you may be better off using this online encoder.
Now that we have a method to create the base64 values for the API calls, we’ll need to compile a set of values that we can use for our testing. Specifically, we’ll need to create base64 values for the following strings:
String 1: company.com
String 2: company.com/https/ucmpub.company.com/8443
String 3: company.com/http/ucmtftp.company.com/6970
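If you would rather not depend on openssl or an online encoder, the same values can be produced with a few lines of Python on any platform. This is a minimal sketch using the example hostnames from this post; the helper names b64 and target_string are mine.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode a string exactly as given (no trailing newline added)."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def target_string(domain: str, scheme: str, host: str, port: int) -> str:
    """Build a 'domain/scheme/host/port' string like Strings 2 and 3."""
    return f"{domain}/{scheme}/{host}/{port}"


domain = "company.com"
strings = [
    domain,
    target_string(domain, "https", "ucmpub.company.com", 8443),
    target_string(domain, "http", "ucmtftp.company.com", 6970),
]
for s in strings:
    print(f"{s} -> {b64(s)}")
```

For String 1 this yields Y29tcGFueS5jb20=. A stray trailing newline, which is what you get when echo is run without -n, changes the tail of the value (it would end in b20K instead of b20=), so it is worth decoding a value back to text if an API call misbehaves.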
Basic TLS Connection
OK, we are now armed with almost all of the tools we need. The last tool you want to use is a web browser. I use Google Chrome (tested with version 45.0.2454.101). The first step is to confirm that we can establish the basic TLS connection and can authenticate the user through the Edge appliance.
Sticking with our example, the base64 value for String 1 (above) is Y29tcGFueS5jb20=. Now, assume that the Edge appliance hostname is expway-e.company.com. Armed with this information, we can use our web browser to go to the following URL:
If everything is provisioned correctly, your browser should render a login window where you will enter the Jabber user ID and assigned password. Similar to the following:
After a successful login, the browser window should render the XML content that is returned from the Edge appliance. For example:
At this point, we are testing a few things:
A clue is revealed in item 3. The internal DNS SRV records we identified during the Service Discovery phase are used by the Core appliance to do its job. So, if you have misconfigurations on your internal DNS, errors will be seen in the XML response above.
Verify UDS Discovery
The next step in troubleshooting is to verify that your client can communicate with the UDS service on your UCM cluster. To do this we use the base64 value of String 2 (above). Using our example:
The URL we are going to test is:
As with the previous test, a successful transaction will render an XML response. If you are running this after you completed the basic TLS connection then you won’t be challenged for authentication credentials.
If you are receiving a response then UDS is operational. If not then the UDS service on the UCM may be experiencing a problem.
You can also test querying a list of UCM UDS servers:
Verify TFTP Configurations
If the previous validation procedures are successful then you have determined that the Jabber client can communicate to the Edge appliance for the purposes of Service Provisioning. Certificates are validated, credentials are validated, and basic UDS functionality is confirmed.
The next step that a Jabber client would take is to identify device configurations. As with standard telephony devices, the Cisco TFTP service has configuration files that Jabber can download to retrieve device specifications.
To do this test we use the base64 value of String 2 (above). The URL we will put in our browser to get a list of devices for the Jabber user is:
You may or may not be prompted to authenticate. If you are prompted then enter the same Jabber user credentials as before. If all goes well, then you will receive a list of devices that are associated to the user in UCM (Edit User pages). For example:
Once you have a list of the devices associated with the user, you can then pull the detailed configuration for a specific device. The Jabber client (or DX80 or whatever you are using for the Edge registration) will be able to identify which device configuration to retrieve (by device type). To test this yourself, you will need to look at the “name” child node associated to the device identified as a “Cisco Unified Client Services Framework” model identifier.
To test retrieval of the configuration file via the UCM TFTP service we use the base64 value of String 3 (above). Using our example:
The URL we can test with is:
The “devicename” is the name as provided in the UDS device list query in the previous step. A successful response will provide XML content that provides a complete device configuration file.
If you get to this point then the Core/Edge proxy function is fully tested and functional. Next, we need to verify service registration.
We are now done with the funky base64 strings (yay!). To test basic XMPP and SIP connectivity we are going to dumb things down a bit. We can use telnet from a command prompt to verify connectivity to the appropriate ports.
galactus-2:utils wjb$ telnet expway-e.company.com 5222
Connected to expway-e.company.com.
Escape character is '^]'.
^]
Connection closed by foreign host.
galactus-2:utils wjb$ telnet expway-e.company.com 5061
Connected to expway-e.company.com.
The fact that we received a “Connected” response means that we were able to connect to the Edge device using port 5222 (XMPP) and port 5061 (SIP/TLS). If you receive a Connection Refused response then you may be running into a firewall issue.
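If telnet is not installed on the client, the same reachability check can be scripted. The sketch below is one way to do it with Python's standard socket module; port_open, check_edge, and the MRA_PORTS table are illustrative names of my own, and the port numbers are the ones discussed in this post.

```python
import socket
from typing import Dict

# Ports the client uses toward the Edge appliance, per the walkthrough above.
MRA_PORTS = {"HTTPS provisioning": 8443, "XMPP": 5222, "SIP/TLS": 5061}


def port_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures alike.
        return False


def check_edge(host: str, timeout: float = 5.0) -> Dict[str, bool]:
    """Probe each MRA port on the given Edge host and report reachability."""
    return {name: port_open(host, port, timeout)
            for name, port in MRA_PORTS.items()}
```

Calling check_edge("expway-e.company.com") from the client returns a dictionary of service name to True or False. As with telnet, a True result only shows that the TCP handshake completed; it says nothing about the TLS negotiation or the registration that follows.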
This covers all of the basic connectivity tests that you can use to verify or troubleshoot your MRA implementation. Used in conjunction with event logs and validation tools on the Expressway appliances, you should be well on your way to buttoning this up and calling it a day.