In the last guide, you learned how to set up a WebSocket server and integrate it with our API using a dummy response system. In this guide, you will integrate with an LLM of your choice. The guide contains code snippets for Node.js (with Express.js) and Python (with FastAPI); for other languages / tech stacks, feel free to adapt the underlying concepts as needed.

The example repos are currently a bit outdated.

This guide provides a step-by-step tutorial; the code snippets are taken from the Node.js Express.js Demo / Python FastAPI Demo.

Selecting an LLM & LLM Best Practices

We start streaming at the first sentence, so your response system’s time to first sentence (time to first token + time to generate a sentence) is factored into the overall latency. Low-latency LLM inference is crucial to keep the overall experience smooth. Check out LLM Best Practices for tips and tricks.
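To illustrate the idea, here is a minimal sketch of sentence-boundary streaming, assuming the OpenAI Python SDK (openai>=1.0); the model name and the `stream_first_sentence` helper are illustrative, not part of our API.

```python
import re
from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

SENTENCE_END = re.compile(r"[.!?](\s|$)")

async def stream_first_sentence(messages: list[dict]) -> str:
    """Return the first complete sentence of the LLM reply as soon as it exists."""
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=messages,
        stream=True,
    )
    buffer = ""
    async for chunk in stream:
        if not chunk.choices or chunk.choices[0].delta.content is None:
            continue
        buffer += chunk.choices[0].delta.content
        # Flush as soon as the first full sentence exists instead of waiting
        # for the whole completion: this "time to first sentence" is what
        # dominates perceived latency.
        match = SENTENCE_END.search(buffer)
        if match:
            return buffer[: match.end()].strip()
    return buffer.strip()
```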

Connect to your LLM Client

Here we provide a simple example of integrating with an LLM provider. Feel free to modify it with any customization you need, such as a different LLM or provider, RAG, internal state, dynamic prompts, etc.

Our GitHub demo repos may contain more examples.

You are going to replace the dummy class you wrote in the last step with a real LLM client. We are not doing anything fancy here, just a prompt fed into the LLM. Feel free to customize the prompt to make the agent behave differently.
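Below is a rough Python sketch of what that replacement could look like, assuming the OpenAI Python SDK; the class name, method name, and prompt are placeholders and should be adapted to the shape of the dummy class from the previous guide.

```python
from openai import AsyncOpenAI

# Placeholder agent prompt; customize it to change the agent's behavior.
AGENT_PROMPT = (
    "You are a friendly voice agent. Keep replies short and conversational, "
    "since they will be spoken out loud."
)

class LlmClient:
    def __init__(self):
        self.client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

    async def draft_response(self, transcript: list[dict]):
        """Stream the agent's reply token by token for the given conversation."""
        stream = await self.client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "system", "content": AGENT_PROMPT}, *transcript],
            stream=True,
        )
        async for chunk in stream:
            if chunk.choices and chunk.choices[0].delta.content:
                # Yield partial text immediately so the websocket handler can
                # forward it without waiting for the full completion.
                yield chunk.choices[0].delta.content
```

Your websocket handler can then iterate over `draft_response` with `async for` and forward each partial reply to the API in whatever message format the previous guide established.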

If you are using Azure OpenAI, you can find the example client class here.
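In the sketch above, switching to Azure OpenAI mostly comes down to how the client is constructed; the endpoint, key, and API version below are placeholders for your own resource.

```python
from openai import AsyncAzureOpenAI

# Placeholder endpoint, key, and API version for your Azure OpenAI resource.
client = AsyncAzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-azure-openai-key>",
    api_version="2024-02-01",
)
# With Azure, the `model=` argument in the calls above refers to your
# deployment name rather than an OpenAI model id.
```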

If you have your own custom LLM, you can adapt the examples above to call it instead.
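For instance, if your custom LLM is served behind an OpenAI-compatible endpoint (as servers like vLLM can provide), you may only need to point the client at your own base URL; the URL, key, and model name below are placeholders.

```python
from openai import AsyncOpenAI

# Placeholder base URL for a self-hosted, OpenAI-compatible server.
client = AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed-for-local-servers",  # many local servers ignore the key
)
# The streaming code above stays the same; only `model=` changes to whatever
# model name your server exposes.
```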

Try it in Dashboard

Now that you are connected to an LLM, try it out in the dashboard by following the same steps from the last guide to see it in action.