# Custom LLM Best Practices

> Best practices for using a custom LLM with Retell: keep latency low, write concise responses, avoid chained calls, and benchmark time to first token and throughput.

* [Prompt Engineering Guide](https://www.promptingguide.ai/)
  * Latency is critical for conversational AI,
    so chaining multiple LLM calls in a single turn is usually a poor fit.
* [LLM Benchmark](https://artificialanalysis.ai/models)
  * Compare latency and throughput. We start streaming at the first sentence, so what matters is
    time to first token plus the time needed to generate the first sentence.
* Keep responses short and concise
  * Filler words and a bit of stammering can make the agent sound more humanlike.
* Keep the prompts concise: longer prompts can actually harm performance
  * If you have a large knowledge base, consider using RAG to retrieve only the relevant information
* When using function calling, a lower temperature can improve accuracy
* To constrain the agent's behavior, consider combining internal states (similar to an IVR tree)
  with different prompts and functions for each state.
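
The benchmark point above can be made concrete with a little arithmetic: since streaming starts at the first sentence, a rough estimate of the wait is time to first token plus first-sentence length divided by throughput. The sketch below compares two models on that basis. The model names, latency figures, and the 20-token sentence length are all hypothetical placeholders, not measurements; substitute numbers from the benchmark page or your own tests.

```python
def time_to_first_sentence(ttft_s: float, tokens_per_s: float, first_sentence_tokens: int) -> float:
    """Estimate seconds until the first full sentence has been generated."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + first_sentence_tokens / tokens_per_s


# Hypothetical benchmark figures, for illustration only.
CANDIDATES = {
    "model_a": {"ttft_s": 0.30, "tokens_per_s": 100.0},
    "model_b": {"ttft_s": 0.60, "tokens_per_s": 200.0},
}
FIRST_SENTENCE_TOKENS = 20  # assumed length of a short opening sentence

estimates = {
    name: time_to_first_sentence(m["ttft_s"], m["tokens_per_s"], FIRST_SENTENCE_TOKENS)
    for name, m in CANDIDATES.items()
}
best = min(estimates, key=estimates.get)

for name, seconds in estimates.items():
    print(f"{name}: {seconds:.2f}s to first sentence")
print("lowest estimate:", best)
```

With these made-up numbers, the model with half the throughput still reaches the first sentence sooner because its time to first token is lower, which is why both figures are worth checking together.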
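
For the knowledge-base point above, the idea is to put only the relevant snippets into the prompt rather than the whole knowledge base. The sketch below is a deliberately simplified stand-in: it ranks snippets by keyword overlap with the user's question, whereas a production RAG setup would typically use embeddings and a vector store. `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are hypothetical names introduced for illustration.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return up to top_k documents sharing at least one word with the query."""
    query_terms = tokenize(query)
    scored = [(len(query_terms & tokenize(doc)), doc) for doc in documents]
    relevant = [pair for pair in scored if pair[0] > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)  # highest overlap first
    return [doc for _, doc in relevant[:top_k]]


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = [
    "Our clinic is open Monday to Friday from 9am to 5pm.",
    "Appointments can be cancelled up to 24 hours in advance.",
    "We accept most major insurance plans.",
]


def build_prompt(base_prompt: str, query: str) -> str:
    """Append only the retrieved snippets to a short base prompt."""
    snippets = retrieve(query, KNOWLEDGE_BASE)
    context = "\n".join(f"- {s}" for s in snippets)
    return f"{base_prompt}\n\nRelevant information:\n{context}"


prompt = build_prompt(
    "You are a friendly clinic receptionist. Keep answers short.",
    "What time does the clinic open on Monday?",
)
print(prompt)
```

The prompt stays short because snippets with no overlap are dropped entirely, which keeps the two goals in that bullet (concise prompts, large knowledge base) compatible.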
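
The last point, internal states with their own prompts and functions, can be sketched as a small state table. This is one possible shape under stated assumptions, not a Retell API: the `State` class, the state names, and the function names are all hypothetical. Each state carries its own prompt and the only functions the LLM is offered in that state, and a successful function call may move the conversation to another state.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class State:
    """One node of the conversation tree (all names here are hypothetical)."""
    prompt: str                 # state-specific system prompt
    functions: list[str]        # functions the LLM may call in this state
    transitions: dict[str, str] = field(default_factory=dict)  # function -> next state


STATES = {
    "greeting": State(
        prompt="Greet the caller and ask whether they want to book or cancel.",
        functions=["start_booking", "start_cancellation"],
        transitions={"start_booking": "booking", "start_cancellation": "cancellation"},
    ),
    "booking": State(
        prompt="Collect a date and time, then confirm the appointment.",
        functions=["book_appointment"],
        transitions={"book_appointment": "wrap_up"},
    ),
    "cancellation": State(
        prompt="Confirm which appointment to cancel.",
        functions=["cancel_appointment"],
        transitions={"cancel_appointment": "wrap_up"},
    ),
    "wrap_up": State(prompt="Thank the caller and end the call.", functions=["end_call"]),
}


def next_state(current: str, called_function: str) -> str:
    """Reject functions not allowed in the current state, else follow the transition."""
    state = STATES[current]
    if called_function not in state.functions:
        raise ValueError(f"{called_function!r} is not allowed in state {current!r}")
    # Functions without a transition keep the conversation in the same state.
    return state.transitions.get(called_function, current)


current = "greeting"
for call in ["start_booking", "book_appointment"]:
    current = next_state(current, call)
print("final state:", current)
```

On each turn, the server would send only `STATES[current].prompt` and `STATES[current].functions` to the LLM, so the model cannot, for example, cancel an appointment while it is still in the greeting state.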