Custom LLM Best Practices
- Prompt Engineering Guide
  - Note that for conversational AI, latency is critical, so chaining multiple LLM calls may not be favorable.
- LLM Benchmark
  - Check out latency and throughput. We start streaming at the first sentence, so time to first token plus the throughput of the first sentence is what matters (see the measurement sketch after this list).
- Keep responses short and concise.
- Filler words and some degree of stammering can make the agent more humanlike.
- Keep the prompts concise: longer prompts can actually harm performance.
- If you have a large knowledge base, consider using RAG to filter out only the relevant information (a retrieval sketch follows this list).
- When using function calling, setting a lower temperature can help boost accuracy (see the sketch after this list).
- If you want to bound the agent's behavior, you can consider combining internal states (similar to an IVR tree) with different prompts & functions at each state (a minimal state-machine sketch follows this list).
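Since streaming begins at the first sentence, the metric that matters is time to first token plus how quickly that first sentence completes. Below is a minimal sketch of how you might measure both, assuming the OpenAI Python SDK; the model name and the sentence-boundary heuristic are illustrative, so adapt them to whatever streaming client you actually use.

```python
# Measure time-to-first-token (TTFT) and time-to-first-sentence for a
# streaming chat completion. Model name is an assumption, not a requirement.
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

start = time.perf_counter()
first_token_at = None
text = ""

stream = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: substitute your own model
    messages=[{"role": "user", "content": "Hi, who am I speaking with?"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content
    if not delta:
        continue
    if first_token_at is None:
        first_token_at = time.perf_counter()
    text += delta
    # Crude sentence-boundary check: stop timing at the first terminator.
    if any(p in text for p in ".!?"):
        break

end = time.perf_counter()
print(f"TTFT: {(first_token_at - start) * 1000:.0f} ms")
print(f"First sentence ({len(text)} chars) ready in {(end - start) * 1000:.0f} ms")
```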
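For the RAG point above, here is one minimal way to filter a knowledge base down to the relevant chunks before building the prompt. This sketch assumes the OpenAI embeddings API and a small in-memory store; the sample chunks, model name, and `top_k` value are all illustrative.

```python
# Retrieve only the knowledge-base chunks relevant to the caller's question.
import numpy as np
from openai import OpenAI

client = OpenAI()

chunks = [
    "Our support line is open 9am-5pm EST on weekdays.",      # hypothetical KB
    "Refunds are processed within 5 business days.",
    "The premium plan includes priority routing.",
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vecs = embed(chunks)  # precompute once, not on every turn

def top_k_chunks(question: str, k: int = 2) -> list[str]:
    q = embed([question])[0]
    # Cosine similarity of the question against every chunk embedding.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

context = "\n".join(top_k_chunks("When can I call support?"))
system_prompt = f"Answer using only this context:\n{context}"
```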
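And for the function-calling point, a sketch of passing a lower temperature alongside a tool definition, assuming the OpenAI chat completions tools API; the `transfer_call` tool and the temperature value are hypothetical examples.

```python
# Function calling with a lower temperature to reduce malformed or
# off-target tool calls.
import json

from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "transfer_call",  # hypothetical tool for a voice agent
        "description": "Transfer the caller to a human agent.",
        "parameters": {
            "type": "object",
            "properties": {
                "department": {"type": "string", "enum": ["billing", "support"]},
            },
            "required": ["department"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: substitute your own model
    messages=[{"role": "user", "content": "I want to talk to billing."}],
    tools=tools,
    temperature=0.2,  # lower temperature -> more deterministic tool calls
)

call = resp.choices[0].message.tool_calls[0]  # present when the model calls a tool
print(call.function.name, json.loads(call.function.arguments))
```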
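Finally, one way to realize the IVR-like internal states: keep an explicit state machine where each state carries its own prompt and its own allowed functions, and transitions gate what the agent can do next. The states, prompts, and function names below are hypothetical.

```python
# Bound agent behavior with explicit states: each state has its own prompt
# and allowed functions, and only defined transitions can change the state.
from dataclasses import dataclass, field

@dataclass
class State:
    prompt: str
    allowed_functions: list[str]
    transitions: dict[str, str] = field(default_factory=dict)  # event -> next state

STATES = {
    "greeting": State(
        prompt="Greet the caller and ask how you can help.",
        allowed_functions=[],
        transitions={"intent_billing": "billing", "intent_support": "support"},
    ),
    "billing": State(
        prompt="You handle billing questions only. Politely decline other topics.",
        allowed_functions=["lookup_invoice", "transfer_call"],
        transitions={"resolved": "goodbye"},
    ),
    "support": State(
        prompt="You handle technical support questions only.",
        allowed_functions=["create_ticket", "transfer_call"],
        transitions={"resolved": "goodbye"},
    ),
    "goodbye": State(
        prompt="Thank the caller and end the call.",
        allowed_functions=["end_call"],
    ),
}

def advance(current: str, event: str) -> str:
    """Move to the next state if the event has a transition, else stay put."""
    return STATES[current].transitions.get(event, current)

state = "greeting"
state = advance(state, "intent_billing")  # -> "billing"
print(STATES[state].prompt, STATES[state].allowed_functions)
```

At each turn you would send only the current state's prompt and expose only its allowed functions to the model, which keeps the agent from drifting into behavior that belongs to a different branch of the tree.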