• Prompt Engineering Guide
    • Note that for conversational AI, latency is critical, so chaining multiple LLM calls may not be favorable.
  • LLM Benchmark
    • Look at both latency and throughput. We start streaming at the first sentence, so time to first token plus the throughput of the first sentence is what matters (see the measurement sketch after this item).
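A minimal measurement sketch, assuming an OpenAI-style streaming client; the model name and query are illustrative, not from the original notes. It records time to first token (TTFT), then streams until a crude end-of-sentence marker to estimate first-sentence throughput.

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
first_token_at = None
first_sentence = ""

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Hi, what are your opening hours?"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # some final chunks carry no choices
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()  # time to first token
    first_sentence += delta
    if any(p in delta for p in ".!?"):  # crude end-of-first-sentence check
        break

if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.3f}s")
    gen_time = time.perf_counter() - first_token_at
    print(f"First sentence: {len(first_sentence)} chars in {gen_time:.3f}s")
```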
  • Make responses short and concise
    • Filler words and some degree of stammering can make the agent more humanlike.
  • Keep the prompts concise: longer prompts can actually harm performance
    • If you have a large knowledge base, consider using RAG so that only the relevant information reaches the prompt (a retrieval sketch follows this item)
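A minimal RAG sketch, assuming precomputed chunk embeddings (`kb_chunks` and `kb_embeddings` built offline, rows pre-normalized) and an OpenAI-style embeddings API; only the top-scoring chunks get injected into the prompt instead of the full knowledge base.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    # Embed one string; model choice is an illustrative assumption.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k_chunks(query: str, kb_chunks: list[str],
                 kb_embeddings: np.ndarray, k: int = 3) -> list[str]:
    q = embed(query)
    # Cosine similarity: kb_embeddings is (n_chunks, dim) with unit rows.
    scores = kb_embeddings @ q / (np.linalg.norm(q) + 1e-9)
    return [kb_chunks[i] for i in np.argsort(scores)[-k:][::-1]]

# The retrieved chunks go into the system prompt in place of the full KB:
# context = "\n".join(top_k_chunks(user_query, kb_chunks, kb_embeddings))
```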
  • When using function calling, setting the temperature lower can help boost accuracy (sketch below)
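A short sketch of a low-temperature function-calling request; the tool schema, department example, and model name are illustrative assumptions, not part of the original notes.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: transfer the caller to a human agent.
tools = [{
    "type": "function",
    "function": {
        "name": "transfer_call",
        "description": "Transfer the caller to a human agent.",
        "parameters": {
            "type": "object",
            "properties": {"department": {"type": "string"}},
            "required": ["department"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "I need to talk to billing."}],
    tools=tools,
    temperature=0.2,  # low temperature for more deterministic tool selection
)
print(response.choices[0].message.tool_calls)
```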
  • If you want to bound the agent's behavior, consider combining internal states (similar to an IVR tree) with different prompts & functions at each state (a minimal sketch follows).
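A minimal sketch of state-scoped prompts and tools; the state names, prompts, and tool names are hypothetical. Each state pins its own system prompt and allowed functions, and detected events drive transitions, much like an IVR tree.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    system_prompt: str
    tools: list = field(default_factory=list)   # tools allowed in this state
    transitions: dict = field(default_factory=dict)  # event -> next state name

STATES = {
    "greeting": AgentState(
        system_prompt="Greet the caller and ask how you can help.",
        transitions={"intent_billing": "billing", "intent_support": "support"},
    ),
    "billing": AgentState(
        system_prompt="Only discuss billing. Do not give tech support.",
        tools=["lookup_invoice", "transfer_call"],  # hypothetical tool names
        transitions={"resolved": "closing"},
    ),
    "support": AgentState(
        system_prompt="Troubleshoot the caller's issue step by step.",
        tools=["create_ticket"],  # hypothetical tool name
        transitions={"resolved": "closing"},
    ),
    "closing": AgentState(system_prompt="Thank the caller and say goodbye."),
}

def step(state_name: str, event: str) -> str:
    """Advance the state machine; unknown events keep the current state."""
    state = STATES[state_name]
    return state.transitions.get(event, state_name)

# At each turn, build the LLM call from STATES[current].system_prompt and
# STATES[current].tools, then update: current = step(current, detected_event)
```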