This guide applies only to cascading agents. If you are using speech-to-speech models, this feature does not apply.
Transcription modes

- optimize for speed: uses the latest interim results with a low endpointing setting for downstream processing.
- optimize for accuracy: uses results produced with a higher endpointing setting for downstream processing. It waits longer, so it has more context and generates more accurate transcripts. It incurs about 200 ms of latency.
Which mode to use?
From our benchmarking, we found that the optimize for speed mode and the optimize for accuracy mode have similar WER (Word Error Rate). The difference lies mainly in capturing entities such as numbers and dates. If your use case relies heavily on capturing these entities well, use the optimize for accuracy mode. Otherwise, use the optimize for speed mode for the best latency.
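
The following is a minimal sketch of why two transcripts can have the same WER while differing on entities, together with a small helper that encodes the selection guidance above. Everything in it is illustrative: the sample sentences are invented, and the `choose_mode` function and the mode strings `"optimize_for_speed"` and `"optimize_for_accuracy"` are hypothetical names, not the product's actual configuration values. WER is computed here in the standard way, as the word-level edit distance divided by the number of reference words.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def choose_mode(relies_on_entities: bool) -> str:
    """Hypothetical helper: the returned strings are placeholder names."""
    return "optimize_for_accuracy" if relies_on_entities else "optimize_for_speed"


# Invented example sentences (not benchmark data).
reference = "please ship my order to gate fifteen on march third"
wrong_entity = "please ship my order to gate fifty on march third"
wrong_filler = "please ship the order to gate fifteen on march third"

# Both hypotheses contain exactly one substituted word out of ten,
# so their WER is identical even though only one corrupts an entity.
wer_entity = wer(reference, wrong_entity)
wer_filler = wer(reference, wrong_filler)
```

Both hypotheses score the same WER, yet only the first one would send the order to the wrong gate. This is why an aggregate WER comparison can look like a tie while entity-heavy use cases (order numbers, dates, phone numbers) still benefit from the optimize for accuracy mode.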