Orchestration Overview

Retell provides a sophisticated voice AI orchestration layer that seamlessly integrates frontier audio technologies to create natural, responsive voice interactions optimized for phone call conditions.

Building Blocks

Traditional voice AI systems consist of three core components:

Speech-to-Text (STT): Converts spoken words into written text
Large Language Model (LLM): Processes and generates contextual responses
Text-to-Speech (TTS): Converts text responses into natural speech
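
As a rough illustration of how these three components chain together, the sketch below wires stub functions into a single turn of a conversation. Every name here (`VoicePipeline`, `fake_stt`, `fake_llm`, `fake_tts`) is hypothetical and stands in for a real provider call; none of it is part of Retell's API.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class VoicePipeline:
    """Hypothetical STT -> LLM -> TTS chain for one conversational turn."""
    stt: Callable[[bytes], str]   # audio in, transcript out
    llm: Callable[[str], str]     # transcript in, reply text out
    tts: Callable[[str], bytes]   # reply text in, audio out

    def respond(self, audio_in: bytes) -> bytes:
        transcript = self.stt(audio_in)
        reply_text = self.llm(transcript)
        return self.tts(reply_text)


# Stub components (assumptions for illustration only): "audio" is just
# UTF-8 bytes so the example runs without any external service.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = VoicePipeline(stt=fake_stt, llm=fake_llm, tts=fake_tts)
reply_audio = pipeline.respond(b"hello")
```

Because the stages run one after another, the delay before the caller hears a reply is roughly the sum of the three stage latencies. This is one reason a naive chain tends to feel slow.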

Recently, Speech-to-Speech (S2S) models have emerged as a new building block in the voice AI stack. These models understand audio input and generate audio output directly, without first generating text. However, simply wiring these building blocks together often results in interactions that are high-latency, unnatural, easily disrupted by background noise, and missing critical capabilities.

Our Orchestration Solution

Retell’s orchestration layer addresses the challenges of optimizing real-time operations, managing scalable infrastructure, and ensuring human-like conversations. It organizes and connects the following systems:

Audio Models
- Manages and scales the building blocks described in the previous section, so you do not need to worry about rate limits or latency
- Multiple choices of models and providers to meet different use cases
- Advanced status checks and automatic fallback mechanisms to ensure minimal disruption to calls
- Unified configuration options, providing flexibility with ease of use
- Security and compliance propagated down to every underlying provider
Noise Management
- Advanced streaming background noise filtering
- Echo cancellation
Intelligent Endpointing & Turn-taking
- Precise detection of speech completion
- Context-aware turn-taking with configurable thresholds
Dynamic Interruption Handling
- Graceful handling of mid-conversation interruptions
- Adaptive response timing based on user speech patterns
- Configurable interruption sensitivity levels
Reminders & Backchanneling
- Reminders for when the user is not responding
- Backchanneling to keep the conversation engaging and natural
Background Sound
- Background sound to create a more natural calling experience
Telephony Features
- Telephony features such as voicemail detection, call transfer, and pressing digits (DTMF)
- Seamless integration via function calling
End-call Criteria
- End the call via a function call, or end it when the user is not responding
- Maximum call time to prevent runaway charges
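
To make the reminder and end-call behavior above more concrete, here is a minimal sketch of how such rules might be combined into a single decision. The `CallPolicy` fields, their default values, and the `next_action` function are all hypothetical; they are not Retell's configuration names or defaults.

```python
from dataclasses import dataclass


@dataclass
class CallPolicy:
    """Hypothetical thresholds; the values are illustrative only."""
    reminder_after_s: float = 10.0   # silence before a reminder is sent
    max_reminders: int = 2           # reminders allowed before giving up
    max_call_s: float = 3600.0       # hard cap on total call duration


def next_action(policy: CallPolicy, elapsed_s: float,
                silence_s: float, reminders_sent: int) -> str:
    """Return "continue", "remind", or "end_call" for the current state."""
    # The duration cap takes priority over everything else.
    if elapsed_s >= policy.max_call_s:
        return "end_call"
    # The user has been silent long enough to act on.
    if silence_s >= policy.reminder_after_s:
        if reminders_sent >= policy.max_reminders:
            return "end_call"   # already reminded enough times
        return "remind"
    return "continue"


policy = CallPolicy()
print(next_action(policy, elapsed_s=30, silence_s=12, reminders_sent=0))
```

In this sketch the maximum call time is checked first, so a call can never outlive the cap, regardless of how the silence rules are configured.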
