Reduce Latency
What impacts latency in your agent, and what you can do to reduce it.
Latency is a core problem in conversational AI, and for some use cases it's important to keep it as low as possible to ensure a smooth conversation. This doc covers which configuration options impact latency and what you can do to reduce it in your agent.
What Impacts Latency
There are a few factors that impact latency in your agent:
LLM Response Time
The time it takes for the LLM to generate a response. This is usually the biggest factor in latency.
To minimize latency here, you can choose a faster (smaller) LLM model like gpt-3.5-turbo or claude-haiku. We do notice that under heavier traffic, the model provider's API can get slower and show more variance in latency.
Longer prompts (including tool calls) also lead to longer response times, so try to keep the prompt short and concise.
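If you want to compare models empirically, the sketch below times a single completion. It is a minimal example assuming the OpenAI Python SDK (`pip install openai`) with an `OPENAI_API_KEY` in the environment; adapt it for your own provider.

```python
import time
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def time_completion(model: str, prompt: str) -> float:
    """Return the wall-clock seconds one completion takes for a given model."""
    start = time.perf_counter()
    client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=50,  # cap output length so runs measure comparable work
    )
    return time.perf_counter() - start

# A short, concise prompt keeps input tokens (and latency) down.
prompt = "Reply with a one-sentence greeting."
for model in ("gpt-4", "gpt-3.5-turbo"):
    print(model, f"{time_completion(model, prompt):.2f}s")
```

Note that when responses are streamed, time-to-first-token is the more relevant metric for conversational latency than total completion time.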
Audio Generation Time
The time it takes to generate audio from the text response varies, and the choice of TTS provider is the main factor here.
Here’s a general latency comparison of different TTS providers:
- ElevenLabs: ~450ms
- OpenAI: ~650ms
- Deepgram: ~300ms
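Real numbers vary with region, voice, and payload, so it's worth measuring with your own text. The harness below is a minimal sketch; each entry in `providers` is a placeholder callable you would implement against that provider's actual SDK.

```python
import time
from statistics import median
from typing import Callable

def benchmark_tts(synthesize: Callable[[str], bytes], text: str, runs: int = 5) -> float:
    """Return the median seconds `synthesize` takes to return audio bytes."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        synthesize(text)  # placeholder: call the provider's TTS API here
        timings.append(time.perf_counter() - start)
    return median(timings)

# Placeholder callables -- wire these up to the real provider SDKs.
providers: dict[str, Callable[[str], bytes]] = {
    # "elevenlabs": elevenlabs_synthesize,
    # "openai": openai_synthesize,
    # "deepgram": deepgram_synthesize,
}

for name, fn in providers.items():
    print(name, f"{benchmark_tts(fn, 'Hello, how can I help you today?'):.3f}s")
```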
Language
Languages with more traffic (en, es, multi) will have better latency compared to other languages.
Responsiveness
This setting controls how responsive the agent is. If it is set to a lower value, the agent tends to wait longer before speaking, which can increase perceived latency.
Features that Add Latency
Some features we provide add latency to the overall pipeline because they require additional processing. If you want to reduce latency, you may want to avoid using these features.
Audio Speed Adjustment
This feature controls how fast the agent speaks. It requires ~50ms of additional processing time.
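For intuition on where that time goes: speed adjustment is typically a time-stretch pass over the synthesized audio, i.e., an extra DSP step on every utterance. Below is a generic illustration using librosa (`pip install librosa soundfile`); this is not necessarily how the feature is implemented internally.

```python
import librosa
import soundfile as sf

# Load synthesized speech (any WAV file works for this demo).
audio, sr = librosa.load("utterance.wav", sr=None)

# Time-stretch: rate > 1.0 speeds speech up without changing pitch.
faster = librosa.effects.time_stretch(audio, rate=1.15)

sf.write("utterance_faster.wav", faster, sr)
```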
Normalize Text for Speech
This feature converts numbers, dates, and other entities into their spoken form for more consistent speech synthesis. It requires ~75ms of additional processing time.
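To get a sense of what this step does, the sketch below expands integers into words using the num2words library (`pip install num2words`); the actual feature also covers dates and other entities.

```python
import re
from num2words import num2words

def normalize_numbers(text: str) -> str:
    """Replace each integer in the text with its spoken-word form."""
    return re.sub(r"\d+", lambda m: num2words(int(m.group())), text)

print(normalize_numbers("Your appointment is at 3 on May 21."))
# -> "Your appointment is at three on May twenty-one."
```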
Boosted Keywords & Disable Transcript Formatting
These features will add around 300-500ms of additional processing time.
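Putting the above together, a latency-lean agent configuration picks a fast model and leaves the optional processing features off. The field names below are hypothetical, for illustration only; check the API reference for the real ones.

```python
# Hypothetical configuration -- field names are illustrative, not an actual API.
low_latency_agent = {
    "model": "gpt-3.5-turbo",         # smaller model responds faster
    "responsiveness": 0.9,            # higher value: respond promptly rather than waiting
    "audio_speed": 1.0,               # default speed skips the ~50ms stretch pass
    "normalize_for_speech": False,    # skips the ~75ms normalization pass
    "boosted_keywords": [],           # empty list avoids 300-500ms of extra processing
    "disable_transcript_formatting": False,
}
```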
V1 vs V2 APIs
V2 APIs are generally faster than V1 APIs, as more optimizations are baked in.
Network
Our servers are deployed mostly in US West. Locating resources closer to our servers can reduce latency.
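To gauge how far your resources are from a given endpoint, you can time a TCP handshake from wherever your resources run. This is a minimal sketch; replace the placeholder host with the endpoint you actually connect to.

```python
import socket
import time

host = "api.example.com"  # placeholder: use the endpoint you actually call

start = time.perf_counter()
# One TCP handshake is roughly one network round trip (plus a DNS lookup on the first call).
conn = socket.create_connection((host, 443), timeout=5)
rtt_ms = (time.perf_counter() - start) * 1000
conn.close()

print(f"TCP connect to {host}: {rtt_ms:.0f}ms")
```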