In real-world scenarios, phone calls often face audio quality challenges such as background noise, echo, or other unwanted sounds.
We offer two denoising modes, plus an interruption sensitivity setting, to help you handle these challenges.
Remove noise: this is the default mode that removes background noise. It introduces almost no distortion to the waveform, so it does not affect speech-to-text accuracy. It cannot remove loud background speech.
Remove noise + background speech: this is a more aggressive mode that removes both background noise and background speech. It may distort the waveform, which can lower speech-to-text accuracy in certain cases. This option incurs a $0.005/min surcharge because it requires more processing power.
If your use cases often involve loud background speech, like a TV in the background or a construction site, try the remove noise + background speech mode. For regular use cases, we recommend the remove noise mode for optimal speech-to-text accuracy.
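To gauge what the $0.005/min surcharge on the remove noise + background speech mode adds up to, here is a minimal Python sketch. The function name and the boolean flag are hypothetical and for illustration only, and the sketch assumes the surcharge scales linearly with call minutes with no rounding or minimum charge, which may not match actual billing.

```python
# Surcharge quoted for the "remove noise + background speech" mode.
SURCHARGE_PER_MINUTE_USD = 0.005


def denoising_surcharge(call_minutes: float, removes_background_speech: bool) -> float:
    """Estimate the extra cost in USD for the chosen denoising mode.

    Hypothetical helper: assumes linear, unrounded per-minute billing.
    """
    if call_minutes < 0:
        raise ValueError("call_minutes must be non-negative")
    # The default "remove noise" mode carries no surcharge.
    if not removes_background_speech:
        return 0.0
    return call_minutes * SURCHARGE_PER_MINUTE_USD


if __name__ == "__main__":
    monthly_minutes = 10_000
    extra = denoising_surcharge(monthly_minutes, removes_background_speech=True)
    print(f"Estimated surcharge for {monthly_minutes} minutes: ${extra:.2f}")
```

Under these assumptions, 10,000 call minutes in the more aggressive mode add about $50 in surcharges, which you can weigh against how often loud background speech actually affects your calls.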
The denoising mode combats background speech and noise before the transcription is generated. Even with it in place, unwanted interruptions can still occur. You can configure the interruption sensitivity to reduce these cases.
In the same settings panel
Set the “Interruption Sensitivity” to 0.8
This setting helps the agent reduce false interruptions from background speech and noise
For extremely noisy environments, you may need to experiment with even higher settings. Note that this setting also makes it harder for callers to interrupt your agent, so it is a trade-off you need to weigh.