

Evaluation Metrics enable supervisors to define and monitor performance indicators for assessing the quality of agent–customer interactions. These metrics use AI-driven or rule-based analysis to evaluate conversations across different dimensions.

Key Benefits

  • AI-powered intelligence: Uses GenAI for contextual evaluation with minimal training data.
  • Comprehensive coverage: Provides multiple measurement types to address diverse evaluation scenarios.
  • Automated QA: Reduces manual review workload through intelligent analysis.
  • Flexible configuration: Supports both static and dynamic evaluation options.

Access Evaluation Metrics

Navigate to Quality AI > Configure > Evaluation Forms > Evaluation Metrics. The dashboard shows:
  • Name: Metric name.
  • Metric Type: Measurement type.
  • Evaluation Forms: Associated evaluation forms (for example, By Speech, By Question).
  • Ellipsis icon: Edit and delete options.
  • Search: Quick search to find metrics.
  • New Evaluation Metrics: Option to create new metrics.

Create a New Evaluation Metric

  1. Select the Evaluation Metrics tab.
  2. Select + New Evaluation Metrics.
  3. Choose an Evaluation Metrics Measurement Type.
After selecting a measurement type, configure details such as the metric name, evaluation criteria, and adherence settings.

Measurement Types

By Question

Evaluates adherence to specific questions asked or answered during interactions. Key features:
  • Static Adherence: applies to all conversations
  • Dynamic Adherence: conditional evaluation triggered by specific events
  • GenAI Detection: contextual understanding with no training samples required
  • Deterministic Detection: semantic matching against predefined patterns
  • Flexible Thresholds: set different similarity scores per use case
Common use cases: Script adherence, greeting compliance, policy verification, response quality. For full configuration details, see By Question.
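The deterministic-detection approach described above boils down to semantic matching against predefined patterns with a configurable threshold. The sketch below is illustrative only: it substitutes a crude word-overlap score for the embedding-based similarity the platform computes, and the function names are invented for the example.

```python
import re

def similarity(a: str, b: str) -> float:
    """Crude stand-in for semantic similarity: Jaccard overlap of word sets.
    The product uses embedding-based matching; this is only illustrative."""
    wa = set(re.findall(r"[a-z']+", a.lower()))
    wb = set(re.findall(r"[a-z']+", b.lower()))
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def matches_pattern(utterance: str, patterns: list[str], threshold: float) -> bool:
    """Deterministic detection: the utterance adheres when any predefined
    pattern clears the per-metric similarity threshold."""
    return any(similarity(utterance, p) >= threshold for p in patterns)

# A greeting-compliance check with a hypothetical threshold of 0.6:
greeting_patterns = ["thank you for calling how may i help you"]
ok = matches_pattern("Thank you for calling, how may I help you today",
                     greeting_patterns, 0.6)
```

Raising or lowering the threshold per use case is what "Flexible Thresholds" refers to: a strict script check might demand a high score, while a looser policy check tolerates paraphrase.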

By Speech

Analyzes speech characteristics during voice interactions. Key features:
  • Crosstalk: detects overlapping speech with configurable thresholds
  • Dead Air: monitors silence periods (configurable duration)
  • Speaking Rate: tracks Words Per Minute (WPM)
Use cases: Voice quality, conversation flow analysis, speaking pace optimization. For full configuration details, see By Speech.
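As a rough illustration of the Dead Air and Speaking Rate checks, the following sketch works from per-word (start, end) timestamps. It is not the product's implementation; the threshold and function names are assumptions for the example.

```python
def dead_air_segments(word_times, max_silence=3.0):
    """Return (start, end) gaps between consecutive words that exceed
    the configured silence duration, in seconds."""
    gaps = []
    for (_, end1), (start2, _) in zip(word_times, word_times[1:]):
        if start2 - end1 > max_silence:
            gaps.append((end1, start2))
    return gaps

def words_per_minute(word_times):
    """Speaking rate: word count over elapsed time, scaled to a minute."""
    duration = word_times[-1][1] - word_times[0][0]
    return 60.0 * len(word_times) / duration

# Five words with a long pause after the second one:
words = [(0.0, 0.4), (0.5, 0.9), (4.5, 5.0), (5.1, 5.5), (5.6, 6.0)]
pauses = dead_air_segments(words)      # one gap from 0.9 s to 4.5 s
rate = words_per_minute(words)         # 5 words over 6 s = 50 WPM
```

In practice these figures would be computed per speaker channel so that crosstalk and dead air can be attributed correctly.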

By Value

Verifies customer-specific information shared by an agent against trusted data sources. Key features:
  • API integration: real-time verification with CRM and external systems
  • Business rules engine: five rule types (first/last value, negotiated, strict matching, custom)
  • Compliance tracking: detects deviations from expected values
  • Audit trails: logs validation results for supervisory review
Use cases: Pricing accuracy, interest rate verification, account balance confirmation, compliance validation. For full configuration details, see By Value.
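The first/last value and strict matching rules can be pictured with a small sketch. This is an assumed simplification (the actual rules engine offers five rule types and real CRM integration); the names and signatures here are hypothetical.

```python
def verify_quoted_value(quotes, expected, rule="last"):
    """Apply a business rule to the values an agent quoted during a call.

    quotes:   values in the order they were spoken
    expected: the trusted value (e.g., from a CRM lookup)
    rule:     'first'  - only the first quote must match
              'last'   - only the final quote must match
              'strict' - every quote must match
    """
    if not quotes:
        return False
    if rule == "first":
        return quotes[0] == expected
    if rule == "last":
        return quotes[-1] == expected
    if rule == "strict":
        return all(q == expected for q in quotes)
    raise ValueError(f"unknown rule: {rule}")

# An agent first misquotes an interest rate, then corrects it:
passed = verify_quoted_value([5.5, 4.9], expected=4.9, rule="last")
```

A "last value" rule would accept the corrected quote, while a "strict" rule would flag the same call for the initial misquote, which is the kind of deviation the audit trail records.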

By Dialog Task

Assesses completion and quality of specific tasks or workflows within a conversation. Key features:
  • Dialog agent selection: choose which dialog agent to evaluate
  • Evaluation scope: entire conversation or time-bound segment
  • Time parameters: configurable in seconds (voice) or message count (chat)
Use cases: Workflow adherence, task completion verification, dialog flow optimization. For full configuration details, see By Dialog Task.
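The evaluation-scope options above (entire conversation, a time window in seconds for voice, or a message count for chat) might be modeled as in this illustrative sketch; the function and parameter names are invented for the example.

```python
def evaluation_scope(messages, mode="entire", limit=None):
    """Select the slice of a conversation to evaluate.

    messages: list of (timestamp_seconds, text) tuples in order
    mode:     'entire' - whole conversation
              'time'   - messages within the first `limit` seconds (voice)
              'count'  - the first `limit` messages (chat)
    """
    if mode == "entire":
        return messages
    if mode == "time":
        return [m for m in messages if m[0] <= limit]
    if mode == "count":
        return messages[:limit]
    raise ValueError(f"unknown mode: {mode}")

convo = [(2.0, "hi"), (30.0, "checking your account"), (95.0, "all done")]
first_minute = evaluation_scope(convo, mode="time", limit=60)
```

Scoping the evaluation this way lets a metric target, say, only the opening of a call rather than the whole interaction.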

By Playbook Adherence

Measures how well interactions follow predefined playbooks or procedures. Key features:
  • Entire Playbook: assesses adherence across all playbook components
  • Specific Steps: targets evaluation at specific stages or steps
  • Percentage thresholds: define minimum adherence levels required
Use cases: Process compliance, procedure adherence, enforcement of standards. For full configuration details, see By Playbook Adherence.
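A percentage adherence threshold reduces to simple arithmetic, as in this hedged sketch (the function name and default threshold are assumptions, not product settings):

```python
def playbook_adherence(steps_followed, steps_total, threshold_pct=80.0):
    """Compute adherence as the percentage of playbook steps followed,
    and whether it clears the configured minimum threshold."""
    pct = 100.0 * steps_followed / steps_total
    return pct, pct >= threshold_pct

# An agent completed 4 of 5 steps against an 80% minimum:
pct, passed = playbook_adherence(4, 5, threshold_pct=80.0)
```

With an "Entire Playbook" scope, `steps_total` would cover every component; with "Specific Steps", only the targeted stages would be counted.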

By AI Agent

Uses AI agents for sophisticated, multistep evaluations with autonomous decision-making. Key features:
  • Complex analysis: Multi-step reasoning across conversation elements
  • Domain expertise: Supports specialized evaluation contexts (compliance, technical support)
  • Contextual understanding: Nuanced evaluation requiring full conversation context
  • Advanced decision-making: Goes beyond pattern matching for judgment calls
Use cases: Complex compliance assessments, technical troubleshooting evaluation, sophisticated quality analysis. For full configuration details, see By AI Agent.

By Manual Evaluation

Manual Evaluation metrics enable QA teams to assess agent performance through human-led reviews, especially in scenarios where automated detection is less reliable. QA managers configure these metrics on the evaluation form and assign weights in points only. Key features:
  • Human-Driven Assessment: metrics are evaluated exclusively by QA auditors without Auto QA involvement.
  • Points-Based Only: available only within points-based evaluation forms to ensure accurate scoring allocation.
  • No AI Dependency: independent of GenAI, deterministic detection, triggers, and adherence thresholds.
  • Clear Visual Identification: displays distinctly across Audit screens, Conversation Mining, Heatmaps, and Reports with the suffix (Manual Evaluation Metric).
Use cases: Manual Evaluation is ideal for assessing complex soft skills (such as tone, empathy, and negotiation), regulatory scenarios requiring human judgment, dispute handling quality, escalation decisions, and high-risk or edge-case interactions. For full configuration details, see By Manual Evaluation.

By Hold

Evaluates how effectively agents manage customer hold scenarios during voice interactions, ensuring proper communication, timing, and resumption behavior. Key features:
  • Static Adherence: applies consistently to all conversations with hold events
  • Event-driven Evaluation: triggers automatically when hold events occur via telephony integration
  • Multi-instance Detection: evaluates multiple hold events within a single interaction
  • GenAI Detection: contextual, flexible evaluation using LLM-based understanding
  • Deterministic Detection: embedding-based semantic matching against predefined utterances
  • Configurable Sub-criteria: assess hold notification, duration compliance, and call resumption
  • Flexible Thresholds: defines similarity scores, hold duration limits, and evaluation windows
  • Weighted Scoring: assigns percentage-based contributions to each sub-criterion
Use cases: Hold etiquette compliance, agent coaching, customer experience improvement, regulatory adherence, and interaction quality monitoring during hold scenarios. For full configuration details, see By Hold.
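The Weighted Scoring idea, where each sub-criterion contributes a percentage of the metric's score, can be sketched as follows. The sub-criterion names and weights here are illustrative, not product defaults.

```python
def hold_score(results, weights):
    """Combine pass/fail results for hold sub-criteria using
    percentage weights that must total 100."""
    if abs(sum(weights.values()) - 100.0) > 1e-9:
        raise ValueError("sub-criterion weights must total 100%")
    return sum(weights[name] for name, passed in results.items() if passed)

# The agent announced the hold and resumed properly, but exceeded the
# duration limit, so only two of three sub-criteria contribute:
score = hold_score(
    {"notification": True, "duration": False, "resumption": True},
    {"notification": 40.0, "duration": 30.0, "resumption": 30.0},
)
```

Here a missed duration check costs exactly its configured weight, which is what makes the per-sub-criterion percentages a coaching signal as well as a score.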

Edit or Delete Evaluation Metrics

  1. Search for the metric you want to update.
  2. Select the ellipsis (⋮) menu.
  3. Choose Edit to modify or Delete to remove the metric.
  4. Select Update to save changes.