Retell enforces the following limits to keep your agents running smoothly and to prevent misuse of the service. These limits can be adjusted on a case-by-case basis to fit your operational needs.

Concurrency

Concurrency refers to the number of simultaneous active voice calls that can be handled by your system at any given moment. For example, if 15 users are engaged in voice calls with your agents at the same time, that counts as 15 concurrent calls.
Concurrency limits apply per workspace, not per account. Each workspace has its own quota, burst settings, and reserved inbound capacity, and traffic in one workspace does not consume slots in another.
Pay-As-You-Go workspaces are allocated a quota of 20 concurrent calls by default. If your operational needs require more concurrency, you can adjust your limit from the dashboard. See Manage limits in the dashboard below. You can check your current number of concurrent calls in the dashboard.
  • Handling Multiple Calls per Agent: You don’t need to create multiple agents to manage multiple calls concurrently. Each agent within your plan is capable of handling an unlimited number of calls, provided that the total concurrency remains within your designated quota. This means you can efficiently manage your workload without unnecessary agent duplication.

Reserved Inbound Concurrency

Reserved inbound concurrency protects inbound calls from being crowded out by outbound traffic. When reserved_inbound_concurrency is configured, outbound calls can use at most your concurrency limit minus the reserved amount. Inbound calls can still use the full concurrency limit when capacity is available. For example, if your concurrency limit is 100 and reserved inbound concurrency is 20:
  • Outbound calls can use up to 80 slots.
  • Inbound calls can use the reserved 20 slots, plus any other available slots up to the full 100.
You can check the configured value with the Get Concurrency API. Reserved inbound concurrency must be lower than your standard concurrency limit.
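The split described above can be sketched in a few lines. This is only an illustration of the arithmetic, not Retell's implementation: the function names and the "inbound"/"outbound" labels are hypothetical, and burst is ignored.

```python
def outbound_capacity(concurrency_limit: int, reserved_inbound: int = 0) -> int:
    """Maximum number of slots outbound calls may occupy."""
    # The text requires the reservation to be lower than the concurrency limit.
    if not 0 <= reserved_inbound < concurrency_limit:
        raise ValueError("reserved inbound concurrency must be lower than the limit")
    return concurrency_limit - reserved_inbound


def can_start_call(direction: str, active_inbound: int, active_outbound: int,
                   concurrency_limit: int, reserved_inbound: int = 0) -> bool:
    """Return True if a new call of the given direction fits (burst not modeled)."""
    if active_inbound + active_outbound >= concurrency_limit:
        return False  # every slot is in use
    if direction == "outbound":
        # Outbound calls are capped at limit minus the reserved amount.
        return active_outbound < outbound_capacity(concurrency_limit, reserved_inbound)
    return True  # inbound calls may use any free slot up to the full limit


print(outbound_capacity(100, 20))  # 80
```

With a limit of 100 and a reservation of 20, an 81st outbound call is refused while inbound calls can still take the remaining 20 slots.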

Inbound Queue and Fallback

When inbound call traffic reaches your concurrency limit, Retell briefly keeps new inbound calls waiting for an available slot. If a slot opens, the inbound call proceeds. If no slot opens after about 40 seconds, Retell handles the call as follows:
  1. If the phone number has a fallback_number configured, Retell transfers the caller to that number.
  2. If the fallback transfer succeeds, the Retell call record ends with no_concurrency_fallback.
  3. If there is no fallback number, or the fallback transfer fails, the call ends with concurrency_limit_reached.
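The decision flow above can be written as a small function. This is a sketch for reasoning about outcomes, not Retell's code: the function, its parameters, and the "call_proceeds" label are hypothetical, and the 40-second value is the approximate figure from the text. Only concurrency_limit_reached and no_concurrency_fallback come from the documentation.

```python
from typing import Optional

QUEUE_TIMEOUT_SECONDS = 40  # "about 40 seconds" per the text; treat as approximate


def inbound_outcome(slot_wait_seconds: Optional[float],
                    fallback_number: Optional[str],
                    fallback_transfer_ok: bool = False) -> str:
    """Outcome of an inbound call that arrives while the workspace is at its limit.

    slot_wait_seconds: how long until a slot frees up, or None if none ever does.
    """
    if slot_wait_seconds is not None and slot_wait_seconds <= QUEUE_TIMEOUT_SECONDS:
        return "call_proceeds"  # hypothetical label: the call connects normally
    if fallback_number and fallback_transfer_ok:
        return "no_concurrency_fallback"
    # No fallback number configured, or the transfer failed.
    return "concurrency_limit_reached"


print(inbound_outcome(None, None))  # concurrency_limit_reached
```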

Concurrency Burst

Concurrency Burst allows you to temporarily exceed your standard concurrency limit during peak demand periods. When enabled, calls that would normally be rejected due to hitting your concurrency limit will instead be allowed to proceed with an additional surcharge.

How It Works

When concurrency burst is enabled:
  1. Normal calls: Calls within your standard concurrency limit proceed as usual with no additional charges.
  2. Burst calls: Calls that exceed your standard limit (but stay within the burst limit) proceed with an additional $0.10/min surcharge applied to the entire call duration.

Burst Limit Calculation

Your burst limit is calculated as the lower of:
  • 3× your concurrency limit, OR
  • Your concurrency limit + 300
For example:
  • If your limit is 50, burst allows up to 150 concurrent calls (3 × 50 = 150)
  • If your limit is 200, burst allows up to 500 concurrent calls (200 + 300 = 500, which is less than 3 × 200 = 600)
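The calculation above is a single min() expression. The sketch below reproduces the two examples and adds a hypothetical classifier showing how a new call would be treated; the function names and the "over_limit" label are illustrative, not part of any Retell API.

```python
def burst_limit(concurrency_limit: int) -> int:
    """Lower of 3x the concurrency limit and the limit plus 300."""
    return min(3 * concurrency_limit, concurrency_limit + 300)


def classify_new_call(active_calls: int, concurrency_limit: int,
                      burst_enabled: bool) -> str:
    """Classify a call that starts while `active_calls` are already in progress."""
    if active_calls < concurrency_limit:
        return "normal"
    if burst_enabled and active_calls < burst_limit(concurrency_limit):
        return "burst"  # proceeds, with the per-minute surcharge
    return "over_limit"  # hypothetical label: no normal or burst slot is free


print(burst_limit(50))   # 150
print(burst_limit(200))  # 500
```

The two formulas give the same result at a limit of 150 (both yield 450). Below that the 3× rule is the lower value; above it the +300 rule is.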

Enabling Concurrency Burst

You can enable or disable concurrency burst from the Settings > Limits page in your dashboard.
Figure: Concurrency Burst settings.

Pricing

Call Type | Additional Cost
Normal (within standard limit) | No additional charge
Burst (above standard limit) | $0.10/min for the entire call duration
The burst surcharge applies to the entire duration of any call that started while in burst mode, not just the portion of time spent above the normal limit.
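As a rough illustration of this rule, the sketch below computes the surcharge for one call. It assumes the surcharge is pro-rated by the second and rounded to the cent; the source does not state how partial minutes are billed, so treat that (and the function name) as assumptions.

```python
from decimal import Decimal

BURST_SURCHARGE_PER_MIN = Decimal("0.10")  # rate from the pricing table


def burst_surcharge(duration_seconds: int, started_in_burst: bool) -> Decimal:
    """Surcharge in dollars for a single call.

    Assumption: per-second pro-rating, rounded to cents. Actual billing
    granularity is not specified in the documentation.
    """
    if not started_in_burst:
        return Decimal("0.00")
    minutes = Decimal(duration_seconds) / Decimal(60)
    # The whole duration is charged, even if concurrency later drops below the limit.
    return (BURST_SURCHARGE_PER_MIN * minutes).quantize(Decimal("0.01"))


print(burst_surcharge(600, True))  # 1.00
```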

Use Cases

Concurrency burst is ideal for:
  • Unpredictable traffic spikes: Handle sudden increases in call volume without rejected calls
  • Campaign launches: Support higher-than-normal call volumes during marketing campaigns
  • Seasonal peaks: Manage increased demand during busy periods without permanently upgrading your concurrency limit
While concurrency burst provides flexibility, consistently high usage above your standard limit may indicate that increasing your base concurrency allocation would be more cost-efficient.

Manage limits in the dashboard

You can view and adjust your concurrency and CPS limits from the Settings > Limits page.
Figure: Settings > Limits page with the Concurrent Calls Limit card and CPS cards.
Adjusting concurrency and CPS limits is available to the Admin and Developer roles. Reserving inbound capacity and toggling concurrency burst change workspace settings and require the Admin role. See Access Control for details.

Adjust your concurrency limit

On the Concurrent Calls Limit card, click Adjust Concurrency. The dialog shows your current limit and how high you can go.
Figure: Adjust Concurrency dialog.

Reserve inbound capacity

On the same card, click Reserve Inbound Capacity to set how many slots are held for inbound calls (see Reserved Inbound Concurrency above). The remainder is available to outbound and web calls.
Figure: Reserve Inbound Capacity dialog showing the inbound and outbound split.

Adjust CPS (calls per second)

CPS is the rate at which new calls can be started, and it is set per telephony path. The Limits page has a separate card for each of Telnyx CPS, Twilio CPS, and Custom Telephony CPS. Click Adjust Limit on a card to change that provider’s CPS; each has its own allowed range. Custom Telephony CPS scales with your concurrency, so a higher CPS there may require more concurrency.
Figure: Adjust CPS Limit dialog (Telnyx shown; the same dialog is used for each provider).

Estimate values with the calculator

If you’re unsure what to set, open the calculator from the top of the Limits page. Enter your inbound and outbound traffic (calls per busy hour, average durations, and pickup rate) and it returns a recommended concurrency, inbound reservation, and CPS, each with headroom for spikes. These are suggestions; you still apply them with the cards above.
Figure: Concurrency and CPS calculator dialog with traffic sliders and recommended values.
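For a quick sanity check outside the dashboard, average concurrency can be approximated with Little's law: calls in progress ≈ arrival rate × average duration. The sketch below is not the calculator's formula, which the source does not describe; the function name, parameters, and default headroom are assumptions chosen for illustration.

```python
import math


def estimate_concurrency(calls_per_busy_hour: float,
                         avg_duration_seconds: float,
                         pickup_rate: float = 1.0,
                         headroom: float = 0.2) -> int:
    """Rough concurrency estimate from busy-hour traffic (Little's law).

    pickup_rate: fraction of calls that are answered (1.0 for inbound).
    headroom: extra fraction added for spikes; 0.2 is an arbitrary default.
    """
    # Average number of answered calls in progress at any moment.
    average_in_progress = calls_per_busy_hour * pickup_rate * avg_duration_seconds / 3600
    return math.ceil(average_in_progress * (1 + headroom))


print(estimate_concurrency(600, 180, headroom=0.0))  # 30
```

Because this is an average, real peaks can exceed it; that is why the estimate adds headroom, and why the dashboard calculator remains the better source for values you intend to apply.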

Max Call Duration

By default, a call ends automatically after 1 hour. You can increase this maximum to up to 2 hours in your agent settings. Should your operational needs require longer calls, please reach out to our team at support@retellai.com to discuss options.

Max Prompt Token Length

The maximum prompt length when using the Retell LLM framework is 32,768 tokens by default, and longer prompts will be rejected when creating or updating the LLM. Note that prompts over 3,500 tokens are charged extra; see Billing Exceptions for details. Should your operational needs require longer context, please reach out to our team at support@retellai.com to discuss options.
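The two thresholds can be checked before you submit a prompt. The sketch below assumes you already have a token count from whatever tokenizer applies to your model; the function name and return labels are hypothetical, and the exact boundary behavior (a prompt of exactly 32,768 tokens is accepted) is an interpretation of the wording above.

```python
MAX_PROMPT_TOKENS = 32768       # default maximum from the text
EXTRA_CHARGE_THRESHOLD = 3500   # prompts over this are charged extra


def check_prompt_tokens(token_count: int) -> str:
    """Classify a prompt by token count against the documented thresholds."""
    if token_count > MAX_PROMPT_TOKENS:
        return "rejected"
    if token_count > EXTRA_CHARGE_THRESHOLD:
        return "accepted_with_extra_charge"
    return "accepted"


print(check_prompt_tokens(4000))  # accepted_with_extra_charge
```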