POST /create-agent
curl --request POST \
  --url https://api.retellai.com/create-agent \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
    "llm_websocket_url": "wss://your-websocket-endpoint",
    "voice_id": "11labs-Adrian"
  }'
{
  "agent_id": "oBeDLoLOeuAbiuaMFXRtDOLriTJ5tSxD",
  "llm_websocket_url": "wss://your-websocket-endpoint",
  "agent_name": "Jarvis",
  "voice_id": "11labs-Adrian",
  "voice_temperature": 1,
  "voice_speed": 1,
  "responsiveness": 1,
  "interruption_sensitivity": 1,
  "enable_backchannel": true,
  "backchannel_frequency": 0.9,
  "backchannel_words": [
    "yeah",
    "uh-huh"
  ],
  "reminder_trigger_ms": 10000,
  "reminder_max_count": 2,
  "ambient_sound": "coffee-shop",
  "language": "en-US",
  "webhook_url": "https://webhook-url-here",
  "boosted_keywords": [
    "retell",
    "kroger"
  ],
  "opt_out_sensitive_data_storage": true,
  "pronunciation_dictionary": [
    {
      "word": "actually",
      "alphabet": "ipa",
      "phoneme": "ˈæktʃuəli"
    }
  ],
  "normalize_for_speech": true,
  "last_modification_timestamp": 1703413636133
}

Authorizations

Authorization
string
header
required

Authentication header containing your API key (find it in the dashboard). The format is "Bearer YOUR_API_KEY".
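
As a rough illustration, the same request can be assembled in Python using only the standard library. This is a sketch rather than an official client: `build_create_agent_request` is a hypothetical helper name, the API key is a placeholder, and the request is only constructed here, not sent.

```python
import json
import urllib.request

API_URL = "https://api.retellai.com/create-agent"  # endpoint from the curl example


def build_create_agent_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for create-agent."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # documented "Bearer YOUR_API_KEY" format
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_agent_request(
    "YOUR_API_KEY",  # placeholder: replace with the key from your dashboard
    {
        # The two fields marked as required in the Body section.
        "llm_websocket_url": "wss://your-websocket-endpoint",
        "voice_id": "11labs-Adrian",
    },
)
# To actually send it: urllib.request.urlopen(request)
```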

Body

application/json
llm_websocket_url
string
required

The URL to which we establish the LLM WebSocket connection to get responses, usually your server. See the LLM WebSocket documentation for the request format (sent from us) and the response format (sent to us).

agent_name
string | null

The name of the agent. Only used for your own reference.

voice_id
string
required

Unique voice ID used for the agent. Find the list of available voices and their previews in the Dashboard.

voice_temperature
number

Controls how stable the voice is. The value ranges over [0, 2]. Lower values produce more stable speech; higher values produce more varied speech. Currently this setting applies only to 11labs voices. If unset, the default value of 1 applies.

voice_speed
number

Controls the speed of the voice. The value ranges over [0.5, 2]. Lower values mean slower speech; higher values mean faster speech. If unset, the default value of 1 applies.

responsiveness
number

Controls how responsive the agent is. The value ranges over [0, 1]. Lower values make the agent less responsive (it waits longer and responds more slowly); higher values make for faster exchanges (it responds as soon as it can). If unset, the default value of 1 applies.

interruption_sensitivity
number

Controls how sensitive the agent is to user interruptions. The value ranges over [0, 1]. Lower values mean it takes longer, or more words, for the user to interrupt the agent; higher values make it easier to interrupt. If unset, the default value of 1 applies. When set to 0, the agent is never interrupted.

enable_backchannel
boolean

Controls whether the agent backchannels, that is, interjects with phrases like "yeah" or "uh-huh" while the user is speaking to signal interest and engagement. When enabled, backchannels tend to appear more in longer user utterances. If unset, the agent does not backchannel.

backchannel_frequency
number

Only applicable when enable_backchannel is true. Controls how often the agent backchannels when a backchannel is possible. The value ranges over [0, 1]. Lower values mean less frequent backchannels; higher values mean more frequent ones. If unset, the default value of 0.8 applies.
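
The numeric ranges documented for the fields above can be checked on the client before sending a request. The following is a minimal sketch under that assumption; `RANGES` and `out_of_range_fields` are hypothetical names, and the server's own validation behavior is not described here.

```python
# Allowed ranges as stated in the field descriptions above.
RANGES = {
    "voice_temperature": (0.0, 2.0),
    "voice_speed": (0.5, 2.0),
    "responsiveness": (0.0, 1.0),
    "interruption_sensitivity": (0.0, 1.0),
    "backchannel_frequency": (0.0, 1.0),
}


def out_of_range_fields(body: dict) -> list:
    """Return the names of numeric fields whose values fall outside their documented range."""
    errors = []
    for name, (low, high) in RANGES.items():
        # Unset fields are skipped: the documented defaults apply to them.
        if name in body and not (low <= body[name] <= high):
            errors.append(name)
    return errors


print(out_of_range_fields({"voice_speed": 0.25, "responsiveness": 1}))
# ['voice_speed']
```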

backchannel_words
string[]

Only applicable when enable_backchannel is true. A list of words that the agent uses as backchannels. If unset, the default backchannel words apply; see the backchannel default words documentation for details. Certain voices do not work well with certain words, so experiment before adding any words.

reminder_trigger_ms
number

The duration of user silence, in milliseconds, after some agent speech that triggers a reminder for the agent to speak. Must be a positive number. If unset, the default value of 10000 ms (10 s) applies.

reminder_max_count
integer

Controls how many times the agent reminds the user when the user is unresponsive. Must be a non-negative integer. If unset, the default value of 1 applies (remind once). Set to 0 to disable reminders.
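
To make the defaults and constraints of the two reminder fields concrete, here is a small sketch that resolves them the way the descriptions above state. `resolve_reminder_settings` is a hypothetical helper, not part of the API.

```python
DEFAULT_REMINDER_TRIGGER_MS = 10000  # documented default (10 s)
DEFAULT_REMINDER_MAX_COUNT = 1       # documented default (remind once)


def resolve_reminder_settings(body: dict) -> tuple:
    """Apply the documented defaults and check the documented constraints."""
    trigger_ms = body.get("reminder_trigger_ms", DEFAULT_REMINDER_TRIGGER_MS)
    max_count = body.get("reminder_max_count", DEFAULT_REMINDER_MAX_COUNT)
    if trigger_ms <= 0:
        raise ValueError("reminder_trigger_ms must be a positive number")
    if not isinstance(max_count, int) or max_count < 0:
        raise ValueError("reminder_max_count must be a non-negative integer")
    return trigger_ms, max_count


print(resolve_reminder_settings({}))                         # (10000, 1)
print(resolve_reminder_settings({"reminder_max_count": 0}))  # (10000, 0): reminders disabled
```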

ambient_sound
enum<string> | null

If set, adds ambient environment sound to the call to make the experience more realistic. Currently supports the options listed below. Set to null to remove ambient sound from this agent.

Available options:
coffee-shop,
convention-hall,
summer-outdoor,
mountain-outdoor,
static-noise

language
enum<string>

Specifies the language (and dialect) in which speech recognition operates. For instance, selecting en-GB optimizes speech recognition for British English. If unset, the default value en-US is used.

Available options:
en-US,
en-IN,
en-GB,
de-DE,
es-ES,
es-419,
hi-IN,
ja-JP,
pt-PT,
pt-BR,
fr-FR

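
The two enum fields above (ambient_sound and language) can be checked against their listed options before a request is sent. This is a sketch only; `check_enums` is a hypothetical helper, and the option sets are copied from the lists above.

```python
AMBIENT_SOUNDS = {
    "coffee-shop", "convention-hall", "summer-outdoor", "mountain-outdoor", "static-noise",
}
LANGUAGES = {
    "en-US", "en-IN", "en-GB", "de-DE", "es-ES", "es-419",
    "hi-IN", "ja-JP", "pt-PT", "pt-BR", "fr-FR",
}


def check_enums(body: dict) -> list:
    """Return the names of enum fields whose values are not among the listed options."""
    problems = []
    ambient = body.get("ambient_sound")
    # None (JSON null) is allowed: it removes ambient sound from the agent.
    if ambient is not None and ambient not in AMBIENT_SOUNDS:
        problems.append("ambient_sound")
    # An unset language falls back to the documented default, en-US.
    if body.get("language", "en-US") not in LANGUAGES:
        problems.append("language")
    return problems


print(check_enums({"ambient_sound": "coffee-shop", "language": "en-AU"}))
# ['language']
```
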
webhook_url
string | null

The webhook URL at which this agent's call events are received. See the webhook documentation for the events sent. If set, webhook events for this agent are bound to the specified URL and the account-level webhook is ignored for this agent. Set to null to remove the webhook URL from this agent.
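
The precedence described above can be sketched as a small function. Note the assumption: the text states only that an agent-level URL overrides the account-level webhook; falling back to the account-level webhook when the agent has none is inferred here, and `effective_webhook_url` is a hypothetical name.

```python
from typing import Optional


def effective_webhook_url(
    agent_webhook_url: Optional[str],
    account_webhook_url: Optional[str],
) -> Optional[str]:
    """Pick the URL that would receive call events for an agent.

    Assumption: when the agent has no webhook_url (null), events go to the
    account-level webhook, if one is configured.
    """
    if agent_webhook_url is not None:
        return agent_webhook_url  # agent-level URL overrides the account-level one
    return account_webhook_url


print(effective_webhook_url("https://webhook-url-here", "https://account-level-url"))
# https://webhook-url-here
```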

boosted_keywords
string[] | null

A custom list of keywords used to bias the transcription model so that these words are more likely to be transcribed. Commonly used for names, brands, street names, etc.

opt_out_sensitive_data_storage
boolean

Whether this agent opts out of storing sensitive data such as transcripts, recordings, and logs. This data can still be accessed securely via webhooks. If unset, the default value of false applies.

pronunciation_dictionary
object[] | null

A list of words or phrases and their pronunciations, used to guide audio synthesis toward consistent pronunciation. Currently supported only for English and 11labs voices. Set to null to remove the pronunciation dictionary from this agent.
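
An entry in this list has the shape shown in the example response at the top of this page (word, alphabet, phoneme). The sketch below builds such an entry; `pronunciation_entry` is a hypothetical helper, and "ipa" is the only alphabet value that appears on this page, so other values are not assumed.

```python
import json


def pronunciation_entry(word: str, phoneme: str, alphabet: str = "ipa") -> dict:
    """Build one pronunciation_dictionary entry in the shape used by the example response."""
    return {"word": word, "alphabet": alphabet, "phoneme": phoneme}


body_fragment = {
    "pronunciation_dictionary": [pronunciation_entry("actually", "ˈæktʃuəli")],
}
# ensure_ascii=False keeps the IPA characters readable in the serialized JSON.
print(json.dumps(body_fragment, ensure_ascii=False))
```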

normalize_for_speech
boolean

If set to true, normalizes some parts of the text (numbers, currency, dates, etc.) to their spoken form for more consistent speech synthesis (the voice synthesis system itself sometimes misreads the raw text). For example, it converts "Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment" to "Call my number two one three seven one one two three four two on july fifth, twenty twenty four for the twenty four dollars twelve cents payment" before starting audio generation.
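
To illustrate what "spoken form" means for one piece of the example above, the sketch below spells out a digit string one digit at a time. It is only an illustration of the idea, not the service's actual normalizer; `spell_out_digits` is a hypothetical name, and dates and currency are not handled.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def spell_out_digits(number: str) -> str:
    """Convert a string of digits to its digit-by-digit spoken form."""
    return " ".join(DIGIT_WORDS[int(ch)] for ch in number)


print(spell_out_digits("2137112342"))
# two one three seven one one two three four two
```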

Response

201 - application/json
agent_id
string
required

Unique ID of the agent.

llm_websocket_url
string
required

The URL to which we establish the LLM WebSocket connection to get responses, usually your server. See the LLM WebSocket documentation for the request format (sent from us) and the response format (sent to us).

agent_name
string | null

The name of the agent. Only used for your own reference.

voice_id
string
required

Unique voice ID used for the agent. Find the list of available voices and their previews in the Dashboard.

voice_temperature
number

Controls how stable the voice is. The value ranges over [0, 2]. Lower values produce more stable speech; higher values produce more varied speech. Currently this setting applies only to 11labs voices. If unset, the default value of 1 applies.

voice_speed
number

Controls the speed of the voice. The value ranges over [0.5, 2]. Lower values mean slower speech; higher values mean faster speech. If unset, the default value of 1 applies.

responsiveness
number

Controls how responsive the agent is. The value ranges over [0, 1]. Lower values make the agent less responsive (it waits longer and responds more slowly); higher values make for faster exchanges (it responds as soon as it can). If unset, the default value of 1 applies.

interruption_sensitivity
number

Controls how sensitive the agent is to user interruptions. The value ranges over [0, 1]. Lower values mean it takes longer, or more words, for the user to interrupt the agent; higher values make it easier to interrupt. If unset, the default value of 1 applies. When set to 0, the agent is never interrupted.

enable_backchannel
boolean

Controls whether the agent backchannels, that is, interjects with phrases like "yeah" or "uh-huh" while the user is speaking to signal interest and engagement. When enabled, backchannels tend to appear more in longer user utterances. If unset, the agent does not backchannel.

backchannel_frequency
number

Only applicable when enable_backchannel is true. Controls how often the agent backchannels when a backchannel is possible. The value ranges over [0, 1]. Lower values mean less frequent backchannels; higher values mean more frequent ones. If unset, the default value of 0.8 applies.

backchannel_words
string[]

Only applicable when enable_backchannel is true. A list of words that the agent uses as backchannels. If unset, the default backchannel words apply; see the backchannel default words documentation for details. Certain voices do not work well with certain words, so experiment before adding any words.

reminder_trigger_ms
number

The duration of user silence, in milliseconds, after some agent speech that triggers a reminder for the agent to speak. Must be a positive number. If unset, the default value of 10000 ms (10 s) applies.

reminder_max_count
integer

Controls how many times the agent reminds the user when the user is unresponsive. Must be a non-negative integer. If unset, the default value of 1 applies (remind once). Set to 0 to disable reminders.

ambient_sound
enum<string> | null

If set, adds ambient environment sound to the call to make the experience more realistic. Currently supports the options listed below. Set to null to remove ambient sound from this agent.

Available options:
coffee-shop,
convention-hall,
summer-outdoor,
mountain-outdoor,
static-noise

language
enum<string>

Specifies the language (and dialect) in which speech recognition operates. For instance, selecting en-GB optimizes speech recognition for British English. If unset, the default value en-US is used.

Available options:
en-US,
en-IN,
en-GB,
de-DE,
es-ES,
es-419,
hi-IN,
ja-JP,
pt-PT,
pt-BR,
fr-FR

webhook_url
string | null

The webhook URL at which this agent's call events are received. See the webhook documentation for the events sent. If set, webhook events for this agent are bound to the specified URL and the account-level webhook is ignored for this agent. Set to null to remove the webhook URL from this agent.

boosted_keywords
string[] | null

A custom list of keywords used to bias the transcription model so that these words are more likely to be transcribed. Commonly used for names, brands, street names, etc.

opt_out_sensitive_data_storage
boolean

Whether this agent opts out of storing sensitive data such as transcripts, recordings, and logs. This data can still be accessed securely via webhooks. If unset, the default value of false applies.

pronunciation_dictionary
object[] | null

A list of words or phrases and their pronunciations, used to guide audio synthesis toward consistent pronunciation. Currently supported only for English and 11labs voices. Set to null to remove the pronunciation dictionary from this agent.

normalize_for_speech
boolean

If set to true, normalizes some parts of the text (numbers, currency, dates, etc.) to their spoken form for more consistent speech synthesis (the voice synthesis system itself sometimes misreads the raw text). For example, it converts "Call my number 2137112342 on Jul 5th, 2024 for the $24.12 payment" to "Call my number two one three seven one one two three four two on july fifth, twenty twenty four for the twenty four dollars twelve cents payment" before starting audio generation.

last_modification_timestamp
integer
required

Last modification timestamp (milliseconds since epoch). This is the time of the last update, or the creation time if the agent has never been updated.
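
Because the timestamp is in milliseconds rather than seconds, it needs scaling before it is interpreted as a date. A minimal sketch using the value from the example response, assuming the epoch is the Unix epoch in UTC (`timestamp_ms_to_datetime` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

# Assumption: "epoch" means the Unix epoch, 1970-01-01T00:00:00 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_ms_to_datetime(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # timedelta keeps integer precision, avoiding float rounding of the milliseconds.
    return EPOCH + timedelta(milliseconds=timestamp_ms)


modified_at = timestamp_ms_to_datetime(1703413636133)
print(modified_at.isoformat())
# 2023-12-24T10:27:16.133000+00:00
```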