SpeakerTrack 60 Overview

I think SpeakerTrack 60 has surprised a lot of people, especially me. I was a little cautious of this type of technology at first, mainly because of previous experience with its competitors. They did a lot of zooming in and out, which can make you feel a little nauseated. SpeakerTrack 60 has changed my perception, at least, of what this type of technology can do: it takes what was once an average video conference room and makes the experience more immersive.

If you haven't seen SpeakerTrack 60 before, take a look at the video below.

SpeakerTrack 60

As you may have noticed in the video, SpeakerTrack 60 switches directly between active speakers. In the Polycom Eagle Eye Director demo below, by contrast, an establishing shot of the entire room is shown between speakers while the camera changes presenters. You will also notice that with Polycom the camera moves to center the speaker even when two speakers are seated close to one another. SpeakerTrack 60, on the other hand, doesn't move when both speakers are already in frame; there is not much point, really. These seem like subtle differences, but after experiencing both in meetings and demos, they add up to two very different experiences.

Polycom Eagle Eye Director

I am sure some will comment that Cisco is a little late to the game with this type of technology, and that is a true statement. The benefit of coming in a little later is that the technology to drive this experience is much better now. Things like 4K camera sensors, magnetic motors, etc. all come together to drive a much cleaner experience. In my opinion, at least, direct switching gives a much cleaner experience than the others in the market.

How does SpeakerTrack 60 work?


There are four parts to how SpeakerTrack 60 does what it does.

  • Audio triangulation - The microphone array behind the fabric panel that is positioned behind the camera pictured above is able to accurately locate voices within the room. The microphones are used only for audio triangulation.
  • Facial detection - Identification of a full or partial face at the same location as the voice is required to form a positive match. One camera quickly finds a close-up of the active speaker while the other gets ready to seek and display the next active speaker.
  • Camera control - With a positive match, the processor in the camera base instructs the cameras directly where to move.
  • Camera switching - The processor in the camera base instructs the codec which camera to use. The codec does the actual camera switching.

To give more of a feel for how this works, SpeakerTrack 60 has a diagnostic mode. Below is a screenshot with diagnostic mode turned on: a face is detected, but there is no audio match.


Now the engineer is speaking below, and the green square indicates a positive match for facial detection and voice. Therefore the system will zoom in on this person.

The indicators down the bottom of the screen match up with the following:

F = 10.1% detected voice
T = 91.6% non-noise
E = 0% voice from far end
C = 28.1% camera movement
U = 0% ultrasound detected
N = 89.9% silence
S = 178 samples from sound algorithm
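If you want to jot these readings down and compare them between tests, a small parser is enough. The sketch below simply reads the listing exactly as I typed it above; I am not claiming the system exports the overlay as text in this form, and parse_indicators is a made-up helper name.

```python
import re

# The readings from the screenshot, transcribed as listed above.
SAMPLE = """\
F = 10.1% detected voice
T = 91.6% non-noise
E = 0% voice from far end
C = 28.1% camera movement
U = 0% ultrasound detected
N = 89.9% silence
S = 178 samples from sound algorithm
"""

# One letter, "=", a number, an optional percent sign, then the description.
LINE = re.compile(r"^([A-Z])\s*=\s*(\d+(?:\.\d+)?)(%?)\s+(.*)$")


def parse_indicators(text):
    """Turn 'X = value[%] label' lines into a dict keyed by indicator letter."""
    readings = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            key, value, percent, label = m.groups()
            readings[key] = {
                "value": float(value),
                "percent": percent == "%",  # False for the raw sample count
                "label": label,
            }
    return readings


readings = parse_indicators(SAMPLE)
for key, r in readings.items():
    unit = "%" if r["percent"] else ""
    print(f"{key}: {r['value']}{unit} ({r['label']})")
```

In this particular sample the detected voice (F) and silence (N) percentages add up to 100, which is a handy sanity check when copying values by hand.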


The SpeakerTrack functions are turned on and off at the touch panel, and when SpeakerTrack is not in use both cameras can be used as normal cameras.

Hopefully this helps explain the technology a little and some of the differences from its competitors.



  1. Good technical review however the Polycom EagleEye Director solution has had a 'direct cut' mode as an option since March this year as demonstrated in this YouTube video https://www.youtube.com/watch?v=0fseLGxGiNo

  2. Hi VoIPNorm. Good article! How did you enable SpeakerTrack diagnostic mode? We are having some issues with SpeakerTrack not panning to participants in the far corner of the room (the cameras can be manually positioned there though) and thought maybe this mode might divulge some clues. Thank you.