Interaction Overview Diagram
The upper part of the diagram is interaction between your backend response generating server and Retell server.
-
A phone or web call is made with the AI agent. Our server establishes the
audio WebSocket. -
Our server will connect with
llm_websocket_urlyou provided in the agent. - Your LLM server needs to send the message upon the WebSocket connection is ready. If you want the agent to speak first, set the content; otherwise, set content to empty string.
- User says, “My name is Mike”.
- Our model detects a high chance of turntaking, or the user pauses, we request a response from your LLM.
-
Your server checks for
interaction_typein our json. If it isresponse_required, you need to send the response. After receiving your response, we have our model check if AI should speak - User continues and says “My name is Mike Trump”.
- Same as step 3
- Our server receives the response from your LLM and decides to speak
-
We send the AI voice in the
audio websocket. Meanwhile, we will send you json withinteraction_typeasupdate_only. You don’t need to update but you can get the transcript from the json body.
Example Custom LLM Demo Repositories
Fork the complete code used in the following guides to follow along to integrate your custom LLM solutions. These demo repos show how to build an LLM solution withopenai / azure openai, how to start an LLM websocket server,
and how to use Twilio to make phone calls with Retell agents programmatically.
- Backend Server:

