Integrating AI with domain-specific knowledge involves setting up an LLM WebSocket server. Our API manages the audio interactions, while your LLM (or any other response system) adds domain expertise. This setup allows our system to communicate directly with your server via WebSocket.

In this guide, you will see a step-by-step walkthrough of how to set up a WebSocket server and integrate it with our API using a dummy response system (don’t worry, we’ll cover how to connect to an LLM in the next section). The guide contains code snippets for Node.js (with Express.js) and Python (with FastAPI); for other languages and tech stacks, feel free to adapt the underlying concepts as necessary.

The example repos are currently a bit outdated.

This guide provides a step-by-step tutorial; the code is taken from the Node.js Express.js Demo / Python FastAPI Demo.

Secure your endpoint by only accepting incoming requests from this allowlisted Retell IP address: 100.20.5.228
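One simple way to enforce this allowlist is to check the client IP before accepting a connection. The sketch below uses illustrative names (`ALLOWED_IPS`, `is_allowed_ip`); it is not part of our API, and in production you may prefer to enforce the allowlist at your load balancer or reverse proxy instead.

```python
# Sketch: accept only connections originating from the Retell IP allowlist.
# `ALLOWED_IPS` and `is_allowed_ip` are illustrative names, not part of our API.
ALLOWED_IPS = {"100.20.5.228"}


def is_allowed_ip(client_ip: str) -> bool:
    """Return True only for IPs on the Retell allowlist."""
    return client_ip in ALLOWED_IPS


# In a FastAPI WebSocket endpoint, you could check the peer address before
# accepting the connection and close it otherwise, e.g.:
#   if not is_allowed_ip(websocket.client.host):
#       await websocket.close(code=1008)  # 1008 = policy violation
```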

Understanding WebSockets

Unlike the request-response model of HTTP, WebSockets maintain an open connection between the client and server. This facilitates two-way message exchange without needing to reestablish connections, enabling faster data streaming. For more details on WebSockets, check out this blog and the WebSocket API doc.

Understanding the Communication Protocol

We have defined the protocol that our server uses to communicate with your server. We recommend reading it first before following this guide.

Generally, the protocol requires:

  • Your server sends the first message: an empty response that lets the user speak first.
  • We send live transcripts to your server and expect responses when we need them.
  • You stream what you want your agent to say to our server, and we speak it out.
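To make the flow above concrete, here is a sketch of the kind of JSON events your server sends back. The field names (`response_id`, `content`, `content_complete`, `end_call`) follow the protocol; treat this as illustrative and check the protocol doc for the authoritative schema.

```python
import json

# First message your server sends: an "empty" response so the user speaks first.
first_event = {
    "response_id": 0,
    "content": "",             # nothing to say yet
    "content_complete": True,  # this utterance is finished
    "end_call": False,
}

# When a response is required, stream content back under the same response_id
# that arrived with the transcript event.
reply_event = {
    "response_id": 1,
    "content": "I am sorry, can you say that again?",
    "content_complete": True,
    "end_call": False,
}

# Over the wire, each event is sent as a JSON-encoded text frame.
payload = json.dumps(first_event)
```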

Step 1: Add a basic websocket endpoint to your server

In this step, you will add a basic WebSocket endpoint to your server to receive messages.

If you already have a server up and running, you can add the following code next to your other routes.

Using Postman, you can make a WebSocket call to your localhost. First click “Connect”, then enter “Hello” in the Message tab and click “Send”.

You should be able to see the message received on your server.

Step 2: Create a Dummy Response System

In this step, you will not connect your LLM yet. Instead, let’s build a dummy response system that greets the user with “How may I help you?” and replies to every user question with “I am sorry, can you say that again?”.

Don’t worry about how dumb the agent is; we will connect your LLM and make it smart later.

Update your WebSocket endpoint: after receiving a “message” event, call llmClient.DraftResponse() to get a response.
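A dummy response system along these lines can be a plain class with one method for the greeting and one for drafting replies. The class and method names below (`LlmClient`, `draft_begin_message`, `draft_response`) are illustrative; the demo repos use similar but not necessarily identical names.

```python
class LlmClient:
    """Dummy response system: greets first, then always asks the user to repeat."""

    def draft_begin_message(self) -> dict:
        # The first message greets the user and lets them take over.
        return {
            "response_id": 0,
            "content": "How may I help you?",
            "content_complete": True,
            "end_call": False,
        }

    def draft_response(self, request: dict) -> dict:
        # Ignore the transcript entirely and always reply with the same
        # sentence, echoing back the response_id we were asked to answer.
        return {
            "response_id": request.get("response_id", 0),
            "content": "I am sorry, can you say that again?",
            "content_complete": True,
            "end_call": False,
        }
```

In your WebSocket handler, send `draft_begin_message()` right after accepting the connection, and call `draft_response(request)` whenever an incoming event asks for a response.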

Step 3: Test your basic agent on Dashboard

At this point, you are ready to make your basic agent speak in the dashboard.

  1. If you have deployed your server, you can construct a URL using your domain: wss://your_domain_name/llm-websocket/

  2. If you want to test your code locally, you can use ngrok to generate a public URL that forwards requests to your local endpoint. You can watch this video to learn how to do that. After getting your ngrok URL, you will have a URL like wss://xxxxx.ngrok-free.app/llm-websocket/

Add either the ngrok URL or your production URL into the dashboard.

Click “Make a web call” and you should be able to hear the agent talking. It will greet you with “How may I help you?” and reply to every question with “I am sorry, can you say that again?”.

Congrats! You just connected your WebSocket server to ours. Next, let’s connect your LLM to make the agent smarter.

If you still cannot hear the agent talking, check out our troubleshooting guide.