Overview
This page defines all metrics, terms, and concepts used in AI QA to help you understand your call analysis results.

Performance Metrics
Latency
Average Latency: Measures the end-to-end delay between a user finishing their utterance and the Voice AI beginning its spoken response. Lower latency indicates more responsive interactions.

Latency P50: The 50th percentile (median) of latency measurements. This metric shows the typical response time, with half of all responses being faster and half being slower.

Latency is measured in seconds (s). Lower values indicate better performance.
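As an illustrative sketch only (the per-response latencies below are hypothetical values you might export from call logs), the two figures relate as follows:

```python
import statistics

# Hypothetical per-response latencies (seconds) collected from call logs.
latencies = [0.8, 1.1, 0.9, 2.4, 1.0, 0.7, 1.3]

average_latency = statistics.mean(latencies)   # end-to-end average
latency_p50 = statistics.median(latencies)     # 50th percentile (median)

print(f"Average Latency: {average_latency:.2f}s")
print(f"Latency P50:     {latency_p50:.2f}s")
```

Note that a single slow outlier pulls the average up while leaving the P50 largely unchanged, which is why both figures are reported.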
Sentiment Analysis
User Sentiment: Represents the emotional state of the caller as inferred from speech content, tone, and pitch. Sentiment can be positive, negative, or neutral.

- User Positive Sentiment Rate: Percentage of user interactions with positive sentiment
- User Negative Sentiment Rate: Percentage of user interactions with negative sentiment
- Negative Sentiment Rate: Overall rate of negative sentiment detected in the conversation
- Agent Positive Sentiment Rate: Percentage of agent responses with positive sentiment
- Agent Natural Tonality Rate: Measures how natural and human-like the agent’s tone sounds
Transcription Metrics
WER (Word Error Rate): Measures the accuracy of speech-to-text transcription by calculating the percentage of words that were incorrectly transcribed. Lower WER indicates better transcription accuracy.

Mistranscribed Entities: Count of specific entities (names, dates, numbers, etc.) that were incorrectly transcribed during the call.

WER is calculated as: (Substitutions + Insertions + Deletions) / Total Words in the Reference Transcript × 100%
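A minimal sketch of this calculation (not the production transcription pipeline), using a standard word-level edit distance over a hypothetical reference/hypothesis pair:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + insertions + deletions) / reference word count * 100."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via dynamic programming.
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution or match
    return dp[len(ref)][len(hyp)] / len(ref) * 100

# Hypothetical example: one substituted word in a five-word reference -> 20% WER.
print(word_error_rate("my account number is 42017", "my account number is 42070"))
```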
Call Quality Metrics
Interruptions: Count of times the user interrupted the agent during the conversation. Higher interruption counts may indicate the agent is speaking too long or not responding appropriately.

Avg. Interruptions: Average number of interruptions per call across the cohort.

Agent Naturalness: Measures how human-like the agent sounded, including pronunciation, intonation, pacing, turn-taking behavior, and the absence of robotic patterns. Higher values indicate more natural-sounding speech.

Natural Tonality Rate: Percentage of agent speech that sounds natural and human-like in tone and delivery.

AI Accuracy Metrics
LLM Hallucination Rate: Measures how often the Large Language Model (LLM) generated incorrect or fabricated information that wasn’t supported by the conversation context or knowledge base.

Agent Hallucination: Measures how often the agent hallucinated during conversations. This is a critical metric for ensuring factual accuracy.

Knowledge Base Metrics
KB Recall: Measures how effectively the agent retrieved and used relevant information from the knowledge base. Higher recall indicates better knowledge base utilization.

KB Recall is calculated as the percentage of relevant knowledge base entries that were successfully retrieved and used during the conversation.
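A minimal sketch of that percentage, assuming you can label which knowledge base entries were relevant to a call and which the agent actually retrieved and used (the entry IDs below are hypothetical):

```python
# Hypothetical entry IDs: what was relevant vs. what the agent actually used.
relevant_entries = {"kb-pricing", "kb-refund-policy", "kb-shipping"}
retrieved_and_used = {"kb-pricing", "kb-refund-policy"}

kb_recall = len(relevant_entries & retrieved_and_used) / len(relevant_entries) * 100
print(f"KB Recall: {kb_recall:.1f}%")  # 2 of 3 relevant entries used -> 66.7%
```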
Tool and Function Metrics
Tool Call Accuracy: Measures the rate at which the agent correctly invoked tools or functions. Higher accuracy means the agent is using the right tools at the right time.

Tool Call Inaccuracy: Measures the rate at which the agent invoked incorrect tools. This is the complement of Tool Call Accuracy (the two rates sum to 100%).

Custom Tool Success Rate: Percentage of custom tool calls that completed successfully.

Avg Custom Tool Latency: Average time taken for custom tools to execute and return results.

Conversation Flow Metrics
Transition Accuracy: Measures the accuracy of transitions between conversation nodes or states. Higher accuracy indicates the agent is following the intended conversation flow correctly.

Node Transition Inaccuracy: Measures incorrect node transitions in conversation flows. This metric helps identify when the agent moves to the wrong conversation state.

Call Resolution Metrics
Call Resolution Rate: Percentage of calls that were successfully resolved according to your defined resolution criteria.

Average Score: Overall quality score for calls in the cohort, calculated based on your resolution criteria and weighted scoring configuration.

Calls Analyzed: Total number of calls that have been analyzed in the cohort.
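As an illustration only, with hypothetical criteria names and weights (your actual scoring configuration defines these), the two aggregate figures could be computed like this:

```python
# Hypothetical cohort: one resolution flag per analyzed call.
resolved_flags = [True, True, False, True, True]
call_resolution_rate = sum(resolved_flags) / len(resolved_flags) * 100  # 80.0%

# Hypothetical per-criterion scores (0-100) for one call and their configured weights.
criterion_scores = {"issue_resolved": 100, "tone": 80, "compliance": 90}
criterion_weights = {"issue_resolved": 0.5, "tone": 0.2, "compliance": 0.3}
weighted_score = sum(criterion_scores[c] * w for c, w in criterion_weights.items())

print(f"Call Resolution Rate: {call_resolution_rate:.1f}%")
print(f"Weighted score for one call: {weighted_score:.1f}")  # 50 + 16 + 27 = 93.0
```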
Transfer Metrics

Transfer Success Rate: Percentage of calls that were successfully transferred to another agent or system.

Transfer Wait Time: Average time users wait before a transfer is completed.

Call-Level Data
Call Identification
Call ID: Unique identifier for each individual call in the system.

Call Start Time: Timestamp indicating when a call began.

Call Length: Total duration of the call, typically measured in seconds or minutes.

Evaluation Status
Eval: Evaluation status or score for individual calls, indicating whether the call met the defined resolution criteria.

Statistical Terms
Percentiles
P50 (50th Percentile): The median value, where half of all measurements fall below it and half above.

Percentiles help you understand the distribution of a metric. P50 shows typical performance, while higher percentiles such as P95 or P99 show tail (near worst-case) behavior.
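A minimal sketch of how these percentiles are derived from a set of measurements (the latency samples below are hypothetical):

```python
import statistics

# Hypothetical latency samples (seconds) from a cohort of calls.
samples = [0.7, 0.8, 0.9, 1.0, 1.1, 1.2, 1.4, 1.9, 2.6, 4.2]

# quantiles(..., n=100) returns the 1st through 99th percentile cut points.
cuts = statistics.quantiles(samples, n=100, method="inclusive")
p50, p95, p99 = cuts[49], cuts[94], cuts[98]

print(f"P50: {p50:.2f}s  P95: {p95:.2f}s  P99: {p99:.2f}s")
```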