When the TTS provider is MiniMax, the normalizeForSpeech parameter is passed directly to MiniMax’s API for server-side normalization. The rest of this documentation applies to other TTS providers, where normalization occurs as a pre-processing step.
Normalizes parts of the text (numbers, currency, dates, etc.) to their spoken form for more consistent speech synthesis, since TTS models sometimes misread unnormalized text.
For example, before starting audio generation, it will convert
Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment
to
Call my number two one three seven one one two three four two on july fifth, twenty twenty four for the twenty four dollars twelve cents payment
Note that this feature adds a bit of latency (~100ms) to the whole process.
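To make the example conversion concrete, here is a minimal Python sketch of this kind of pre-processing step. It is not the service's actual implementation: the function names, the regular expressions, and the rule that runs of seven or more digits are read digit by digit are all assumptions made for illustration. It covers only long digit runs and dollar amounts under $100, and leaves dates untouched.

```python
import re

# Word tables for spelling out integers from 0 to 99.
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def two_digit_words(n: int) -> str:
    """Spell out an integer in the range 0-99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]} {ONES[ones]}"


def spell_currency(match: re.Match) -> str:
    """Turn a matched amount such as $24.12 into dollars-and-cents words."""
    dollars, cents = int(match.group(1)), int(match.group(2))
    return f"{two_digit_words(dollars)} dollars {two_digit_words(cents)} cents"


def spell_digits(match: re.Match) -> str:
    """Read a matched run of digits one digit at a time."""
    return " ".join(ONES[int(d)] for d in match.group(0))


def normalize_for_speech_sketch(text: str) -> str:
    """Hypothetical helper: rewrite currency and long digit runs as words."""
    # Currency goes first so its digits are not touched by the digit rule.
    text = re.sub(r"\$(\d{1,2})\.(\d{2})", spell_currency, text)
    # Assumption: 7 or more consecutive digits is a phone-style number.
    text = re.sub(r"\b\d{7,}\b", spell_digits, text)
    return text


print(normalize_for_speech_sketch(
    "Call my number 2137112342 for the $24.12 payment"))
```

A production normalizer would also need rules for dates, ordinals, larger amounts, and each supported language, which is part of why the step adds measurable latency.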
Language setting
Currently, for non-MiniMax TTS providers, speech normalization is supported for the following languages:
- English
- Spanish
- French
- German
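
A client can check the configured language before enabling normalization. The following Python sketch is hypothetical: the function name, the use of ISO 639-1 codes, and the tag parsing are assumptions, and the set contains only the languages listed above.

```python
# Languages listed above; mapping them to ISO 639-1 codes is an assumption.
SUPPORTED_NORMALIZATION_LANGUAGES = {
    "en": "English",
    "es": "Spanish",
    "fr": "French",
    "de": "German",
}


def can_normalize(language_tag: str) -> bool:
    """Return True if the tag's primary language is in the listed set."""
    # Accept tags such as "en-US" or "es_419" and keep the primary subtag.
    primary = language_tag.strip().lower().replace("_", "-").split("-")[0]
    return primary in SUPPORTED_NORMALIZATION_LANGUAGES


print(can_normalize("en-US"))
print(can_normalize("ja-JP"))
```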