How a Lync client finds its secondary registrar is an important topic. After some research, and after reading an excellent blog post by Doug Deitterick on TechNet about the process, some very clear best practices stand out to me. The reason I bring it up here is that my opinion on using a Director for secondary registrar discovery differs slightly from Doug’s. There is a more critical component than the Director for this function. Let me explain.
Let’s quickly recap how the process works. The diagram below gives a good outline of how this works when you have a Director pool in place.
The most important functions related to our discussion are the prioritized DNS SRV responses and the SIP 301 redirect message that is sent back to the client. The DNS SRV records point to the Director as the primary logon and authentication pool, with an alternate pool as a backup. The 301 message contains the primary and secondary registrar. Both of these items are very important to the logon process. First, should the Director pool (or any pool related to the DNS SRV record) be unavailable, you have an alternative place to log on. Second, once you have authenticated, you can learn where your primary and secondary registrars are located for failover.
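To make the prioritized SRV idea concrete, here is a minimal Python sketch of how a client could pick a logon target from a set of SRV answers. It assumes the standard SRV rule that the lowest priority value is tried first, and it ignores weights for simplicity. The hostnames and the `is_up` callback are hypothetical, and this is not the Lync client's actual code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int  # lower value is tried first
    weight: int    # ignored in this sketch
    port: int
    target: str


def order_by_priority(records):
    """Return the records in the order a client would try them."""
    return sorted(records, key=lambda r: r.priority)


def first_available(records, is_up):
    """Return the first reachable target in priority order, or None."""
    for record in order_by_priority(records):
        if is_up(record.target):
            return record.target
    return None


# Hypothetical answers: the Director first, a pool in another datacenter second.
records = [
    SrvRecord(priority=20, weight=0, port=5061, target="pool-b.example.com"),
    SrvRecord(priority=10, weight=0, port=5061, target="director.example.com"),
]
```

With both hosts up, `first_available` returns the Director. If the Director is down, it returns the alternate pool, which is exactly the safety net the second SRV record provides.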
It is best practice to have prioritized DNS SRV records regardless of whether you have a Director. Every registrar in your deployment, including the SBA, can use the 301 redirect process. The only time you do not receive your backup registrar information via a 301 is when you log directly on to your primary registrar. Once you have successfully logged on to your primary registrar and the client has cached it (in the EndpointConfiguration.cache file), you are very unlikely to receive a 301 message unless there are issues. As Doug points out, EndpointConfiguration.cache doesn’t cache your secondary registrar. In other words, the Director is of little help after you have successfully logged on for the first time.
I did a small experiment in my home lab. I started up my Lync client and, as it had a cached primary registrar, it did not perform the DNS SRV record lookup and logged on using the cached information. I then stopped the front-end service on my Lync Server. The Lync client then performed the DNS SRV lookup to find an alternate location to log on. Once it received the alternate SRV records, it was able to locate an alternate place to log on. It then received a SIP 301 redirect. Had I actually had a real alternate SE running in my environment, the client would have logged in there as its secondary registrar after reading the secondary registrar information in the SIP 301 redirect.
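The behaviour seen in this experiment can be modelled with a short Python sketch. It is a simplified simulation under stated assumptions, not the client's real logic: `sign_in`, its parameters, and the hostnames are all hypothetical, the SRV targets are assumed to be already sorted by priority, and the set `up` stands in for real network reachability.

```python
def sign_in(cached, srv_targets, up, primary, backup):
    """Simulate one sign-in attempt and return (registrar_used, steps)."""
    steps = []
    # A reachable cached primary registrar is used without any DNS SRV lookup.
    if cached is not None and cached in up:
        steps.append(f"cached registrar {cached}")
        return cached, steps
    # Otherwise fall back to DNS SRV and walk the targets in priority order.
    steps.append("DNS SRV lookup")
    for host in srv_targets:
        if host not in up:
            steps.append(f"{host} unreachable")
            continue
        if host == primary:
            # Logging directly on to the primary: no 301 is involved.
            steps.append(f"direct logon to {host}")
            return host, steps
        # Any other registrar answers with a 301 naming primary and backup.
        steps.append(f"301 from {host}")
        for registrar in (primary, backup):
            if registrar in up:
                return registrar, steps
    return None, steps


srv = ["director.example.com", "se2.example.com"]
primary, backup = "se1.example.com", "se2.example.com"

# Primary has failed: the client leaves the cache, uses SRV, and lands on the backup.
registrar, steps = sign_in(primary, srv, {"director.example.com", backup}, primary, backup)
```

If the only SRV record points at the Director and the Director is also down, the same function returns `None`, which is the case the alternate SRV record is there to prevent.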
So how would you create your DNS SRV records? There are two options:
1. Deploy a Director with prioritized DNS SRV records that point to the Director(s) as the first choice and a pool (whether it be an SE or EE pool) in another datacenter as your second choice.
2. With no Director(s), point your prioritized DNS SRV records at the pools that make sense, most likely pools located in different datacenters.
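As a rough illustration, the two options could be written down as SRV data like the Python sketch below. The record name `_sipinternaltls._tcp.<domain>` on port 5061 is the usual internal sign-in record for Lync clients; the domain, the hostnames, and the priority values are placeholders I made up, and `has_fallback` is a hypothetical helper that only checks that a second choice exists.

```python
# Internal sign-in SRV name; "example.com" is a placeholder SIP domain.
SRV_NAME = "_sipinternaltls._tcp.example.com"

# Each tuple is (priority, weight, port, target). All hostnames are hypothetical.
option_1 = [  # Director first, pool in another datacenter second
    (10, 0, 5061, "dirpool.dc1.example.com"),
    (20, 0, 5061, "pool02.dc2.example.com"),
]
option_2 = [  # no Director: pools in different datacenters
    (10, 0, 5061, "pool01.dc1.example.com"),
    (20, 0, 5061, "pool02.dc2.example.com"),
]


def has_fallback(records):
    """True when there are at least two distinct targets at different priorities."""
    priorities = {priority for priority, _, _, _ in records}
    targets = {target for _, _, _, target in records}
    return len(priorities) >= 2 and len(targets) >= 2
```

Both layouts pass the check. A record set that lists only the Director does not, because nothing is left to answer with a 301 when the Director is down.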
Doug’s blog post does make a great point that you will not get backup registrar information every time you log on, but that is true regardless of whether you are using a Director, because of the cached primary registrar. This is why it is important to ensure your DNS SRV records are correctly configured when using datacenter voice resiliency features, so don’t leave home without them.
Comments welcomed.
VoIPNorm
Another great article on this. I tested a similar scenario without any DNS SRV records for the pools, only the Director pool’s DNS SRV record. I created two SE pools and one Director pool, and I published the Director pool’s DNS SRV record. Once the client logged on to SE pool 1 via the Director, I noticed that it got primary and backup registrar information. I stopped services on SE pool 1, and my client logged on to the other pool (within a few minutes, with a red pop-up). My assumption is that, even without DNS SRV records for the pools, if your Director pool is up the client will still find the secondary registrar information from the Director at the time of primary registrar failure. Makes sense?
Thanks,
Sachin Desai
Just changing my comment slightly after a reread.
The problem with what you tested is that it was your first logon. Now that your client has cached your primary registrar, it won’t go back to using the Director to log on and will in fact go straight to your SE. To see this behaviour you will need to monitor DNS requests on the client when you log in and out. You will notice it won’t do a DNS SRV lookup or log on to the Director. It will only perform a DNS SRV lookup if there is a failure of the cached primary registrar. So this means it no longer uses the Director to log on and receive a redirect. Also, you have to have an alternate for when the Director(s) go down, and this can only be done with SRV records. In the end you don’t need a Director for this functionality, and DNS SRV records are the only way to ensure you have a backup place to log on and get the SIP 301 redirect.
Yes, the Director does help, but it is not a requirement to make this feature work, which is the point I am trying to get across.
Thanks, great. Have you done any testing for external users? I have always wanted to find out the following. Let’s assume that there are two datacenters with two Lync pools (Pool A and Pool B), and both pools have their own external edge servers (edge pool A and edge pool B).
Publish DNS SRV records with different priorities, with edge pool A having a lower value than edge pool B. An external user hosted on internal pool B will always use edge pool A to log on to the Lync infrastructure. If edge pool A goes down, will the external user fail over to edge pool B? (I assume the answer is yes, if certificates are provisioned correctly.) I would like your opinion on this.
If both edge pools and internal pools are up, what would the SIP/audio call flow be once an external user hosted on pool B logs on using external edge pool A? Would it use its own edge pool (edge pool B) for SIP/audio/video traffic?
Thanks,
Sachin Desai
I have the same questions about Edge failover ability. In previous versions of OCS you could have only one access edge pool because DNS load balancing was not an option.
What is the guidance around building resilient edge pools which span geographic locations?
Cheers!
....CUCiMOC, what a crock!
When you’re external, your client will use DNS to sign in and, yes, it will use the SRV record with the lower priority value, but that is not necessarily the edge you will use for media. After signing in, your client will discover which edge it will use for media, no matter which edge you signed in through. That depends on your pool’s configuration. SIP signaling will, however, still use the same edge you signed in on.
Hi Chris,
Suppose I have 2 SE pools and 2 central sites, and each site has its own edge servers. Can I still configure primary and backup registrars? What would the DNS load balancing look like for the edge? Can I fail over from one SE pool to another SE pool? In your post on Jan 20, you said the user will sign in to the lower priority edge first. Is that applicable to 2 separate pools?
I am curious about this as well. We have a west coast datacenter with edge, and plan to roll out a 2nd datacenter on the east coast with its own edge. Branch offices will connect to whichever edge server and datacenter makes the most sense latency-wise, but we would like them to be able to fail over to the other in case one datacenter goes down. How would we do this?
What about the Mediation Servers running on the FE pool that goes down? How do we get the PSTN gateways that use the failed Mediation Server over to the backup registrar?
Hi Eric,
Most gateways have the ability to perform a hunt group type function to send calls to an alternate Mediation Server. On Cisco ISRs, as an example, you can set a preference on the dial peers to control the order in which they are used, so if a Mediation Server goes down you can route to a different server.
Cheers
Chris
Does this resiliency only work for voice?
We have 2 pools, one in NA and the other in AP. If I set the AP pool as the backup registrar pool for the NA pool and vice versa, and users only use IM, presence, and web conferencing, will this feature work and provide resiliency to the users?
Thanks
Harry