Check latency and throughput. Because we start streaming as soon as the first sentence is generated, what matters is the time to first token plus how quickly the rest of that first sentence is generated.
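A minimal sketch of streaming at the first sentence: tokens are buffered, and each sentence is yielded as soon as it is complete, along with the time to first token and the elapsed time, so it can be handed to TTS without waiting for the full response. `fake_llm` is a hypothetical stand-in for a streaming LLM client, and the sentence-boundary regex is a naive assumption (it will split on abbreviations such as "Dr.").

```python
import re
import time
from typing import Iterable, Iterator, List, Optional, Tuple

# Naive boundary heuristic (an assumption): sentence-ending punctuation followed by
# whitespace. Waiting for the whitespace avoids splitting "3.14" mid-number.
SENTENCE_END = re.compile(r"[.!?]\s")


def fake_llm(tokens: List[str], first_token_delay: float = 0.2,
             per_token_delay: float = 0.02) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM: waits, then emits tokens."""
    time.sleep(first_token_delay)  # simulates time to first token
    for token in tokens:
        yield token
        time.sleep(per_token_delay)  # simulates decoding throughput


def stream_sentences(tokens: Iterable[str]) -> Iterator[Tuple[str, float, float]]:
    """Yield (sentence, ttft, elapsed) as soon as each sentence is complete."""
    start = time.perf_counter()
    ttft: Optional[float] = None
    buffer = ""
    for token in tokens:
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first token
        buffer += token
        match = SENTENCE_END.search(buffer)
        while match:  # one token may complete more than one sentence
            sentence, buffer = buffer[:match.end()].strip(), buffer[match.end():]
            yield sentence, ttft, time.perf_counter() - start
            match = SENTENCE_END.search(buffer)
    if buffer.strip():  # flush whatever remains when the stream ends
        yield buffer.strip(), ttft, time.perf_counter() - start


if __name__ == "__main__":
    tokens = ["Sure", ",", " I", " can", " help", ".", " What", " time", " works", "?"]
    for sentence, ttft, elapsed in stream_sentences(fake_llm(tokens)):
        # In a real agent, this is where the sentence would be sent to TTS.
        print(f"{elapsed:.2f}s (ttft {ttft:.2f}s): {sentence}")
```

The first sentence's `elapsed` value is the latency the caller actually hears; shortening either the time to first token or the first sentence itself reduces it.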
Keep responses short and concise.
Filler words and some degree of stammering can make the agent sound more humanlike.
Keep prompts concise: longer prompts can actually hurt performance, and they also increase the time to first token.
If you have a large knowledge base, consider using RAG (retrieval-augmented generation) to pull only the relevant information into the prompt instead of including the whole knowledge base.
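A minimal retrieval sketch, assuming a small in-memory knowledge base. It ranks chunks by bag-of-words cosine similarity, a simple stand-in for the embedding model and vector store a real RAG setup would typically use, and puts only the top `k` chunks into the prompt. The stopword list, function names, and sample data are all hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Tiny stopword list (an assumption) so common words do not dominate the scores.
STOPWORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "or", "our",
             "the", "to", "we", "what"}


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts minus stopwords: a stand-in for a real embedding."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[str]:
    """Return the k chunks most similar to the query."""
    q = bag_of_words(query)
    return sorted(chunks, key=lambda c: cosine(q, bag_of_words(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Include only the retrieved chunks, not the whole knowledge base."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nUser: {query}"


# Hypothetical knowledge base for illustration.
KNOWLEDGE_BASE = [
    "Our clinic is open Monday to Friday, 9am to 5pm.",
    "Parking is free in the lot behind the building.",
    "To reschedule an appointment, call at least 24 hours in advance.",
    "We accept most major insurance plans.",
]

if __name__ == "__main__":
    print(build_prompt("What hours is the clinic open?", KNOWLEDGE_BASE, k=1))
```

Keeping `k` small also keeps the prompt short, which helps with the latency point above.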
When using function calling, setting a lower temperature can help improve accuracy.
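Why a lower temperature helps can be seen from how temperature rescales the model's scores before sampling: dividing the logits by a temperature below 1 concentrates probability on the top-scoring choice, so the model is less likely to sample a wrong function. The sketch below applies a temperature-scaled softmax to hypothetical scores for three candidate functions; the function names and scores are made up for illustration, and in practice you would set temperature as a request parameter rather than compute this yourself.

```python
import math
from typing import Dict


def softmax_with_temperature(logits: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = {name: score / temperature for name, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scaled.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical scores a model might assign to candidate function calls.
LOGITS = {"book_appointment": 2.0, "cancel_appointment": 1.0, "transfer_to_human": 0.5}

if __name__ == "__main__":
    for t in (1.0, 0.5, 0.2):
        p = softmax_with_temperature(LOGITS, t)["book_appointment"]
        print(f"temperature={t}: P(book_appointment)={p:.2f}")
    # temperature=1.0: P(book_appointment)=0.63
    # temperature=0.5: P(book_appointment)=0.84
    # temperature=0.2: P(book_appointment)=0.99
```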
To constrain the agent's behavior, consider splitting it into internal states (similar to an IVR tree), with a different prompt and set of functions for each state.
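A minimal sketch of the state-based approach, assuming each LLM turn is configured with the prompt and function list of the current state, and that calling certain functions moves the agent to another state. The states, prompts, and function names are hypothetical; in a real agent the function list would be translated into your provider's tool definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class State:
    """One node of the dialog tree: its own prompt, allowed functions, and exits."""
    prompt: str
    functions: List[str]
    transitions: Dict[str, str] = field(default_factory=dict)  # function name -> next state


class Agent:
    def __init__(self, states: Dict[str, State], start: str):
        self.states = states
        self.current = start

    def config(self) -> State:
        """Prompt and function list to send to the LLM for the current turn."""
        return self.states[self.current]

    def handle_function_call(self, function: str) -> str:
        """Reject functions not allowed here, then follow the transition (if any)."""
        state = self.config()
        if function not in state.functions:
            raise ValueError(f"{function!r} is not allowed in state {self.current!r}")
        self.current = state.transitions.get(function, self.current)
        return self.current


# Hypothetical dialog tree for an appointment line.
STATES = {
    "greeting": State("Greet the caller and ask whether they want to book or cancel.",
                      ["start_booking", "start_cancellation"],
                      {"start_booking": "booking", "start_cancellation": "cancellation"}),
    "booking": State("Collect a date and time, then confirm the booking.",
                     ["check_availability", "book_appointment"],
                     {"book_appointment": "done"}),
    "cancellation": State("Ask for the booking reference and cancel it.",
                          ["cancel_appointment"], {"cancel_appointment": "done"}),
    "done": State("Thank the caller and end the call.", []),
}

if __name__ == "__main__":
    agent = Agent(STATES, "greeting")
    for call in ("start_booking", "check_availability", "book_appointment"):
        print(call, "->", agent.handle_function_call(call))
```

Because each state exposes only a few functions, the model cannot, for example, cancel an appointment while it is in the booking flow.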