Normalize text for speech

Note: For MiniMax, the normalizeForSpeech parameter is passed directly to MiniMax’s API for server-side normalization. The rest of this documentation applies to other TTS providers, where normalization occurs as a pre-processing step.

Normalization converts parts of the text (numbers, currency, dates, etc.) to their spoken form for more consistent speech synthesis, since TTS models sometimes misread unnormalized text. For example, before audio generation starts, it converts

Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment

to

Call my number two one three seven one one two three four two on july fifth, twenty twenty four for the twenty four dollars twelve cents payment

Note that this feature adds roughly 100 ms of latency to the overall process.
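
To make the idea concrete, here is a minimal Python sketch of this kind of pre-processing. It is not the platform's implementation: the function names, the regular expressions, and the rule that a run of seven or more digits is read digit by digit are all assumptions made for illustration. It handles only long digit runs and dollar amounts under $100; dates are left untouched.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digit_words(n: int) -> str:
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spell_digits(match: re.Match) -> str:
    """Read a digit run one digit at a time (hypothetical phone-number rule)."""
    return " ".join(ONES[int(d)] for d in match.group(0))


def spell_currency(match: re.Match) -> str:
    """Spell a dollar amount such as $24.12 (always uses plural units)."""
    dollars, cents = int(match.group(1)), int(match.group(2))
    return f"{two_digit_words(dollars)} dollars {two_digit_words(cents)} cents"


def normalize_sketch(text: str) -> str:
    """Illustrative normalizer: currency first, then long digit runs."""
    # Dollar amounts under $100 with exactly two decimal places.
    text = re.sub(r"\$(\d{1,2})\.(\d{2})\b", spell_currency, text)
    # Assumption: runs of 7+ digits are phone numbers, read digit by digit.
    text = re.sub(r"\b\d{7,}\b", spell_digits, text)
    return text


print(normalize_sketch("Call my number 2137112342 for the $24.12 payment"))
```

Running the sketch prints "Call my number two one three seven one one two three four two for the twenty four dollars twelve cents payment". Doing the rewrite before synthesis means the TTS model only ever sees words, which is why the feature trades a little latency for more consistent output.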

Language setting

Currently, for non-MiniMax TTS providers, speech normalization is supported for the following languages:

English
Spanish
French
German

For any other language, this step is a no-op and the text is left unmodified. If you selected a specific (non-multilingual) language, that language code is used to normalize the text (e.g. 1 is normalized to one when using English). If you selected multilingual, the language is detected automatically from the generated text, which is then normalized accordingly.
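
The selection rules above can be sketched as a small Python function. This is an illustration only: the language codes ("en", "es", "fr", "de"), the "multi" value for the multilingual setting, and the detect callable are hypothetical stand-ins, not the platform's actual identifiers or detector.

```python
from typing import Callable, Optional

# Assumed short codes for the four supported languages.
SUPPORTED = {"en", "es", "fr", "de"}


def pick_normalization_language(
    agent_language: str,
    text: str,
    detect: Callable[[str], str],
) -> Optional[str]:
    """Return the language to normalize in, or None for a no-op."""
    if agent_language == "multi":
        # Multilingual: detect the language from the generated text.
        lang = detect(text)
    else:
        # Fixed language: use its base code, e.g. "en-US" -> "en".
        lang = agent_language.split("-")[0].lower()
    # Unsupported languages fall through to a no-op (text left unmodified).
    return lang if lang in SUPPORTED else None


# Example with a stub detector that always answers French.
print(pick_normalization_language("multi", "un, deux, trois", lambda _: "fr"))
```

Returning None for unsupported languages keeps the no-op case explicit, so a caller can skip normalization entirely instead of passing text through a normalizer that does nothing.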
