Zoom Rooms Audio Guidelines
Audio is the most important aspect of your Zoom Room!
Without audio, the meeting cannot happen. In this article, we will discuss many of the concepts and features that are most important when thinking about audio performance in a Zoom Room design.
Before we dive into the technical details of the audio process from end to end, please review our article on Acoustics & Audio Concepts. Once the room sounds good, we can move on to how we capture that experience.
Think about everything that happens from the word being spoken to that word being heard by a participant on the far end. The following are all factors along the way:
- Good Sounding Room (Talker)
- Reduced Reverberance
- Reduced Environmental Noise
- Microphone Quality & Proximity to Source
- Input Processing Quality
- Network Connectivity - Encode
- Network Connectivity - Decode
- Output Processing Quality
- Speaker Quality
- Good Sounding Room (Listener)
This article covers:
- What is Audio?
- Audio Processing Methods
- Signal Processing Types
What is Audio?
Audio is vibration that travels through the air and is perceived when it reaches an ear. In a video conference, a few extra steps are added. A microphone stands in for the ear and picks up the sound. The Zoom Room takes that audio, processes it if needed, and transmits it over the internet. On the far end, speakers turn the signal back into sound waves so that it can be heard. Every step along the way affects what the listener perceives.
Audio Processing Methods
Digital Signal Processors (DSPs) are software-based audio processors, sometimes with associated hardware, that optimize audio for different applications. There are two approaches to audio processing in a Zoom Room:
- Zoom's Software Audio Processing is enabled, and the external mic and speaker are independent and unaware of each other.
- The DSP is external to Zoom: all processing, including the relationship between mic and speaker, is handled externally, and Zoom's Software Audio Processing is disabled.
If the input and output are the same device, such as a Logitech Rally System, Logitech Meetup, Aver VB342, Polycom Trio, or rack-mounted DSP, that device handles all of the audio logic needed for an optimal audio experience. Since Zoom is not handling the DSP in this case, the Software Audio Processing setting should be disabled.
Note that certain devices automatically disable Zoom Software Audio Processing (SAP) when they are selected. If any adjustment is made after the initial setup, Zoom SAP may be re-enabled automatically, which is not desired in this situation.
For external DSP designs, please reference:
- Phone Room (1-2 people)
- Huddle (2-7 People)
- ProAV Conference (7-13 People w/table mics)
- ProAV Conference (9-19 People w/ceiling mics)
- ProAV 3-Screen (6-10 people w/speaker tracking camera)
- ProAV All Hands Space (w/ presenter mics only)
- ProAV Training Room/Classroom (w/ ceiling & presenter mics)
- ProAV Divisible Space (w/ ceiling & presenter mics)
Zoom Rooms Software Audio Processing
Here is an overview of Zoom Rooms Software Audio Processing:
If you use a mixer or another microphone source that is not integrated with a speaker output, you do not need an external device to do this processing. Zoom uses adaptive processing to learn the room and optimize the audio. In certain applications, Zoom can receive multiple independent channels of audio and apply processing to each channel for an optimized experience. To enable Zoom Software Audio Processing, on the Zoom Room controller, tap Settings, then Microphone, then tap the Software Audio Processing toggle.
This setting is selected automatically whenever the input and output devices differ. In other words, if the mic cannot reference the speaker within the device itself, Software Audio Processing can be enabled to provide echo cancellation and audio optimization.
There is another Zoom SAP setting that suppresses some of the room noise and reverberation. Keep in mind that highly reverberant and noisy rooms will still sound reverberant, but the added processing may make the result more tolerable.
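The rule above can be sketched as a small decision function. This is only an illustration of the logic described in this article, not how Zoom Rooms actually detects devices: the `AudioSelection` class, the `sap_should_be_enabled` function, and the idea of comparing devices by name are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioSelection:
    """Hypothetical record of the devices chosen on the controller."""
    microphone: str
    speaker: str


def sap_should_be_enabled(selection: AudioSelection) -> bool:
    """Return True when the mic and speaker are different devices.

    Assumption: a combined device (same name for input and output) runs
    its own echo cancellation, so Zoom's processing should stay off.
    """
    return selection.microphone != selection.speaker


combined = AudioSelection(microphone="Logitech Rally", speaker="Logitech Rally")
separate = AudioSelection(microphone="Table mic mixer", speaker="Ceiling speakers")

print(sap_should_be_enabled(combined))   # False
print(sap_should_be_enabled(separate))   # True
```

A rack-mounted DSP that presents a single input and output to the Zoom Room falls into the first case, which is why Software Audio Processing is left off for external DSP designs.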
On the ZR Controller, tap Settings, then Microphone, then tap Advanced Noise Suppression. Then select Moderate, Aggressive, or Disabled.
Note: If you select a different speaker, such as the internal computer speaker, and then switch back to the speaker that matches the microphone, this setting may turn on when it is not wanted.
For Zoom DSP designs, please reference:
- Mobile Cart (2-5 people)
- Collaboration (2-7 people)
- Conference (7-13 People w/table mics)
- Conference (9-19 People w/ceiling mics)
- Broadcast Using a Zoom Room
Here we will discuss some utilities to test your Zoom Rooms environment. It is always recommended to do a test call with at least a couple of peers to hear the space, check each microphone and validate performance.
Test the Speakers
- Tap Settings, then Speaker.
- Tap Test Speaker.
You will hear the Zoom ring tone played through the speakers to verify the output is working.
Test the Microphone
- Tap Settings, then Microphone.
- Tap Test Microphone.
This will start a process of alternating recording and playback so that you can hear the microphones within the room.
Audio Echo Test
Now that we know the inputs and outputs are working, verify that the Software Audio Processing toggle is set correctly.
- Tap Settings, then Room.
- Tap Start Audio Echo Test.
A progress bar will appear on the Zoom Room controller and display. Tap Cancel at any time to end the test. Once the test is over, the Zoom Room controller and display will show the results of the test.
Also see: Zoom Rooms Daily Audio Testing
Once the test has passed, you are ready to set up your test call with peers to validate the room's performance. Based on their feedback, you may need to check firmware, adjust DSP site files, adjust microphone placement, increase the number of microphones, and so on.
Signal Processing Types
For the Zoom Rooms application, there are four key components, which we will elaborate on:
- Noise Reduction
- Acoustic Echo Cancellation
- Automatic Gain Control
- Equalization
Noise Reduction
Noise Reduction is the reduction of steady noise such as HVAC or electrical hum. The DSP identifies steady noises and attenuates the frequencies it determines to be recurring and detrimental to the signal. With the steady noise attenuated, speech passes through the system without reduction and is more intelligible.
Note: Noise Reduction will not reduce traffic noise, paper shuffling, typing, or, most importantly, reverberation. A reverberant room will always sound reverberant, both to the ears of the participants in the room and to the microphones.
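To make "reduced at those frequencies" concrete, here is a minimal sketch that removes a steady 60 Hz hum with a fixed notch filter while leaving a 1 kHz tone almost untouched. It is a simplification: a real DSP identifies the steady noise adaptively, whereas this example assumes the hum frequency is already known. The biquad coefficients follow the widely used audio EQ cookbook form; the function names, the 16 kHz sample rate, and the Q value are illustrative choices.

```python
import math


def notch_coefficients(freq_hz, sample_rate, q=5.0):
    """Biquad notch coefficients (audio EQ cookbook form), normalized so a0 == 1."""
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def apply_biquad(samples, b, a):
    """Run a direct-form I biquad over a list of samples."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def tone(freq_hz, sample_rate, seconds=1.0):
    """Generate a unit-amplitude sine wave."""
    n = int(sample_rate * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 16000  # assumed sample rate for the example
b, a = notch_coefficients(60.0, SAMPLE_RATE)

hum = tone(60.0, SAMPLE_RATE)            # stands in for electrical hum
speech_like = tone(1000.0, SAMPLE_RATE)  # stands in for speech content

# Measure the second half only, after the filter has settled.
half = SAMPLE_RATE // 2
hum_ratio = rms(apply_biquad(hum, b, a)[half:]) / rms(hum[half:])
speech_ratio = rms(apply_biquad(speech_like, b, a)[half:]) / rms(speech_like[half:])

print(f"60 Hz hum kept:  {hum_ratio:.4f}")
print(f"1 kHz tone kept: {speech_ratio:.4f}")
```

The same sketch also shows why this kind of processing does not help with typing, traffic, or reverberation: those sounds are not confined to one steady frequency, so there is nothing narrow to remove.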
Acoustic Echo Cancellation (AEC)
AEC removes your voice from the far end's microphone signal after it has been played through the far end's speaker, so that it is not sent back to you. Here is a diagram that explains the concept for two endpoints:
If AEC is working properly, you will not hear your own voice back in the call. If it is not working, you will hear an echo of your voice, which the far-end microphone is picking up from its speaker and sending back to you.
Note: The issue exists at the endpoint that does not hear the echo, because that endpoint's microphone is picking up its own speaker output.
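As a rough illustration of the idea, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach to echo cancellation. It is not a description of Zoom's or any vendor's actual AEC. The simulated room (two delayed, attenuated copies of the speaker signal), the tap count, and the step size are assumptions chosen to keep the example small.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=16, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples sent to the local speaker (the reference).
    mic:     samples captured by the local microphone (contains the echo).
    Returns the residual signal that would be sent back to the far end.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end sample first
    residual = []
    for ref, captured in zip(far_end, mic):
        history = [ref] + history[:-1]
        estimate = sum(w * x for w, x in zip(weights, history))
        error = captured - estimate
        energy = sum(x * x for x in history) + eps
        # Nudge each weight in proportion to the error, normalized by input energy.
        weights = [w + step * error * x / energy for w, x in zip(weights, history)]
        residual.append(error)
    return residual


def power(samples):
    """Mean squared value of a list of samples."""
    return sum(s * s for s in samples) / len(samples)


random.seed(0)
far_end = [random.uniform(-1.0, 1.0) for _ in range(4000)]

# Hypothetical room: the speaker signal reaches the mic 5 samples late at
# 60% level, plus a weaker reflection 9 samples late. Nobody is talking locally.
mic = [0.0] * len(far_end)
for n in range(len(far_end)):
    if n >= 5:
        mic[n] += 0.6 * far_end[n - 5]
    if n >= 9:
        mic[n] += 0.25 * far_end[n - 9]

residual = nlms_echo_canceller(far_end, mic)

echo_power = power(mic[-1000:])
residual_power = power(residual[-1000:])
print(f"echo power before: {echo_power:.6f}")
print(f"echo power after:  {residual_power:.6f}")
```

The canceller needs the speaker signal as its reference, which is why the processing must sit where the microphone and speaker can both be seen: either inside a combined device or external DSP, or in Zoom's Software Audio Processing.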
Automatic Gain Control
Automatic Gain Control (AGC) delivers a consistent, optimum volume to the system regardless of the circumstances. The biggest variable is people: some have loud voices and others have soft voices. Whichever is the primary audio source is adjusted down or up accordingly. AGC is automatic within Zoom's DSP; on an external DSP that offers the feature, it may need to be enabled and configured.
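A minimal sketch of the idea follows: the gain is adjusted block by block so that both a loud and a soft talker settle at the same target level. The target level, block size, smoothing factor, and gain limit are illustrative assumptions, not settings taken from Zoom or any external DSP.

```python
import math


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def auto_gain(samples, target_rms=0.1, block=160, smoothing=0.2, max_gain=20.0):
    """Scale each block toward target_rms, moving the gain gradually."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        level = rms(chunk)
        if level > 1e-6:  # skip silence so the gain does not run away
            wanted = min(target_rms / level, max_gain)
            gain += smoothing * (wanted - gain)  # ease toward the wanted gain
        out.extend(s * gain for s in chunk)
    return out


def tone(amplitude, n=16000, freq_hz=200.0, sample_rate=16000):
    """Generate a sine wave standing in for a talker at a given loudness."""
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


loud = auto_gain(tone(0.8))    # loud talker
soft = auto_gain(tone(0.02))   # soft talker

loud_level = rms(loud[-1600:])
soft_level = rms(soft[-1600:])
print(f"loud talker settles at {loud_level:.3f}")
print(f"soft talker settles at {soft_level:.3f}")
```

The smoothing factor is the main design choice: a gain that moves too quickly pumps audibly between words, while one that moves too slowly leaves a new talker too loud or too quiet for the first few seconds.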
Equalization
Equalization (EQ) is a means of cutting unwanted frequencies and boosting wanted ones. Human speech sits in a range from about 250 Hz to about 6,000 Hz, which is well within the range of human hearing, about 20 Hz to 20,000 Hz. This means that anything between 20 and 250 Hz, or between 6,000 and 20,000 Hz, will be heard unless it is removed, even though it is not part of the speech we want to hear.
It is best practice to include a boost around 2,000 to 4,000 Hz, as the human ear is most sensitive to this range and emphasizing it improves intelligibility.
Scooping, which means cutting a narrow band of frequencies, is another technique that may improve a room where a particular frequency is being emitted in the space or where there is unwanted reverberation at a specific pitch. For example, scooping low-mid frequencies may alleviate some of the resonance in the room.
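The boost and scoop described above can both be modeled with a peaking filter, where a positive gain boosts and a negative gain scoops. The sketch below computes the combined response at a few frequencies. The coefficients follow the widely used audio EQ cookbook form; the specific settings (a 3 dB boost at 3,000 Hz, a 4 dB scoop at 300 Hz, the Q values, and the 48 kHz sample rate) are illustrative assumptions, and the right values depend on the room.

```python
import cmath
import math


def peaking_coefficients(freq_hz, gain_db, sample_rate, q=1.0):
    """Biquad peaking-EQ coefficients (audio EQ cookbook form).

    A positive gain_db boosts around freq_hz; a negative gain_db scoops.
    """
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * freq_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a


def response_db(b, a, freq_hz, sample_rate):
    """Gain of the filter, in dB, at a single frequency."""
    z_inv = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20.0 * math.log10(abs(numerator / denominator))


SAMPLE_RATE = 48000  # assumed sample rate for the example

# Intelligibility boost in the 2,000 to 4,000 Hz range.
boost_b, boost_a = peaking_coefficients(3000.0, 3.0, SAMPLE_RATE)
# Hypothetical low-mid room resonance scooped out with a narrower cut.
scoop_b, scoop_a = peaking_coefficients(300.0, -4.0, SAMPLE_RATE, q=2.0)

for freq in (100, 300, 1000, 3000, 10000):
    total = (response_db(boost_b, boost_a, freq, SAMPLE_RATE)
             + response_db(scoop_b, scoop_a, freq, SAMPLE_RATE))
    print(f"{freq:>6} Hz: {total:+.2f} dB")
```

Removing content below 250 Hz and above 6,000 Hz would be done with separate high-pass and low-pass filters, which are not shown here.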