• Prompt Engineering Guide
    • Note that for conversational AI, latency is critical, so chaining multiple LLM calls may not be favorable.
  • LLM Benchmark
    • Look at both latency and throughput. We start streaming at the first sentence, so time to first token plus the throughput of the first sentence is what matters (see the measurement sketch after this item).
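A minimal measurement sketch, assuming an OpenAI-style streaming client; the model name and query are illustrative, not from the original notes. It records time to first token (TTFT), then streams until a crude end-of-sentence marker to estimate first-sentence throughput.

```python
import time
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

start = time.perf_counter()
first_token_at = None
first_sentence = ""

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": "Hi, what are your opening hours?"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:  # some final chunks carry no choices
        continue
    delta = chunk.choices[0].delta.content or ""
    if delta and first_token_at is None:
        first_token_at = time.perf_counter()  # time to first token
    first_sentence += delta
    if any(p in delta for p in ".!?"):  # crude end-of-first-sentence check
        break

if first_token_at is not None:
    print(f"TTFT: {first_token_at - start:.3f}s")
    gen_time = time.perf_counter() - first_token_at
    print(f"First sentence: {len(first_sentence)} chars in {gen_time:.3f}s")
```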
  • Make responses short and concise
    • Filler words and some degree of stammering can make the agent more humanlike.
  • Keep the prompts concise: longer prompts can actually harm performance
    • If you have a large knowledge base, consider using RAG so that only the relevant information reaches the prompt (a retrieval sketch follows this item)
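A minimal RAG sketch, assuming precomputed chunk embeddings (`kb_chunks` and `kb_embeddings` built offline, rows pre-normalized) and an OpenAI-style embeddings API; only the top-scoring chunks get injected into the prompt instead of the full knowledge base.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    # Embed one string; model choice is an illustrative assumption.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def top_k_chunks(query: str, kb_chunks: list[str],
                 kb_embeddings: np.ndarray, k: int = 3) -> list[str]:
    q = embed(query)
    # Cosine similarity: kb_embeddings is (n_chunks, dim) with unit rows.
    scores = kb_embeddings @ q / (np.linalg.norm(q) + 1e-9)
    return [kb_chunks[i] for i in np.argsort(scores)[-k:][::-1]]

# The retrieved chunks go into the system prompt in place of the full KB:
# context = "\n".join(top_k_chunks(user_query, kb_chunks, kb_embeddings))
```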
  • When using function calling, setting the temperature lower can help boost accuracy (sketch below)
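A short sketch of a low-temperature function-calling request; the tool schema, department example, and model name are illustrative assumptions, not part of the original notes.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical tool: transfer the caller to a human agent.
tools = [{
    "type": "function",
    "function": {
        "name": "transfer_call",
        "description": "Transfer the caller to a human agent.",
        "parameters": {
            "type": "object",
            "properties": {"department": {"type": "string"}},
            "required": ["department"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "I need to talk to billing."}],
    tools=tools,
    temperature=0.2,  # low temperature for more deterministic tool selection
)
print(response.choices[0].message.tool_calls)
```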
  • If you want to bound the agent's behavior, consider combining internal states (similar to an IVR tree) with different prompts & functions at each state (a minimal sketch follows).
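A minimal sketch of state-scoped prompts and tools; the state names, prompts, and tool names are hypothetical. Each state pins its own system prompt and allowed functions, and detected events drive transitions, much like an IVR tree.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    system_prompt: str
    tools: list = field(default_factory=list)   # tools allowed in this state
    transitions: dict = field(default_factory=dict)  # event -> next state name

STATES = {
    "greeting": AgentState(
        system_prompt="Greet the caller and ask how you can help.",
        transitions={"intent_billing": "billing", "intent_support": "support"},
    ),
    "billing": AgentState(
        system_prompt="Only discuss billing. Do not give tech support.",
        tools=["lookup_invoice", "transfer_call"],  # hypothetical tool names
        transitions={"resolved": "closing"},
    ),
    "support": AgentState(
        system_prompt="Troubleshoot the caller's issue step by step.",
        tools=["create_ticket"],  # hypothetical tool name
        transitions={"resolved": "closing"},
    ),
    "closing": AgentState(system_prompt="Thank the caller and say goodbye."),
}

def step(state_name: str, event: str) -> str:
    """Advance the state machine; unknown events keep the current state."""
    state = STATES[state_name]
    return state.transitions.get(event, state_name)

# At each turn, build the LLM call from STATES[current].system_prompt and
# STATES[current].tools, then update: current = step(current, detected_event)
```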