- Speech recognition: the language the agent expects when transcribing the caller's speech.
- Voice pronunciation: the language the voice uses to pronounce words and shape its accent.
- Agent text: the agent is automatically instructed to respond in the configured language, so you do not need to add a “respond in X” instruction to your prompt.
Pick a single language (recommended)
A single-language agent is the most accurate setup. The agent transcribes in one language only, the voice speaks with that language’s pronunciation, and the agent always responds in that language. No language detection is involved.
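To make the idea concrete, here is a minimal sketch of how one language choice fans out to the three behaviours described above. It is purely illustrative: `AgentLanguageSettings`, `single_language_settings`, and the field names are hypothetical and are not the platform's actual API or configuration schema. The `"es"` language code is likewise just an example value.

```python
from dataclasses import dataclass


# Hypothetical illustration only: these names are NOT the platform's real API.
@dataclass(frozen=True)
class AgentLanguageSettings:
    transcription_language: str  # language used for speech recognition
    voice_language: str          # language used for pronunciation and accent
    response_language: str       # language the agent is instructed to reply in


def single_language_settings(language: str) -> AgentLanguageSettings:
    """Derive all three language-dependent behaviours from one language code."""
    code = language.strip().lower()
    if not code:
        raise ValueError("language must be a non-empty code such as 'es'")
    # One setting drives everything, so no language detection step is needed.
    return AgentLanguageSettings(
        transcription_language=code,
        voice_language=code,
        response_language=code,
    )


settings = single_language_settings("es")
print(settings)
```

The point of the sketch is that a single-language agent has exactly one source of truth for language, so transcription, pronunciation, and response language cannot disagree with each other.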
Need more than one language?
Configure a multilingual agent
For agents that serve callers in different languages, see the multilingual guide.