ASR and TTS Troubleshooting Guide

This guide explains production-tested fixes for ASR and TTS issues on the Kore.ai Voice Gateway. Each section identifies the symptom, root cause, and the exact configuration or code change required.

All parameters are applied using:

userSessionUtils.setCallControlParam('paramName', value);
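Most fixes in this guide set several of these parameters in one Script node. A small wrapper keeps them together and lets you test the sequence outside the platform. The sketch below assumes only the `setCallControlParam(name, value)` signature shown above. `applyCallControlParams` and `createRecordingStub` are hypothetical helpers, not part of the Kore.ai API.

```javascript
// Hypothetical helper: passes every entry of `params` to
// utils.setCallControlParam(name, value), in insertion order.
// Inside a Script node, `utils` would be userSessionUtils.
function applyCallControlParams(utils, params) {
  var names = Object.keys(params);
  names.forEach(function (name) {
    utils.setCallControlParam(name, params[name]);
  });
  return names.length; // number of parameters applied
}

// Hypothetical stub that records calls, for testing outside the platform.
function createRecordingStub() {
  var calls = [];
  return {
    calls: calls,
    setCallControlParam: function (name, value) {
      calls.push([name, value]);
    }
  };
}
```

In a Script node, the Azure fix below could then be written as `applyCallControlParams(userSessionUtils, { AzureSegmentationSilenceTimeoutMs: 1200 })`.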

Microsoft Azure ASR

Utterance Splitting — Speech Split into Multiple Parts

Symptom

The ASR splits one continuous user utterance into multiple turns. For example, "Hello, I want to check my balance" arrives as two separate inputs: "Hello" followed by "I want to check my balance". This causes intent mismatches, loop behaviour, or the bot responding to partial input.

Root Cause

Azure Speech uses AzureSegmentationSilenceTimeoutMs to detect end-of-speech. Natural pauses of 300 ms or more between words can trigger a premature commit. The default value is too aggressive for conversational IVR use cases.

Fix

Add a Script node at the start of your Experience Flow:

// Standard conversational flows
userSessionUtils.setCallControlParam(
  'AzureSegmentationSilenceTimeoutMs', 1200
);

// OR, for longer conversational utterances (agentic app flows),
// use this value instead of 1200, not in addition to it:
userSessionUtils.setCallControlParam(
  'AzureSegmentationSilenceTimeoutMs', 2000
);

// Do NOT exceed 3000 — significantly increases latency.

Parameter Reference

Parameter | Value / Range | Purpose
AzureSegmentationSilenceTimeoutMs | 800–2000 ms | End-of-speech silence window. Lower = faster response. Higher = fewer splits.
sttMinConfidence | 0.4–0.6 | Lower to accept more inputs; raise to filter background noise.
botNoInputTimeoutMS | 5000–8000 ms | Increase for non-English speakers or callers who pause before speaking.

Note: Start with 1200 ms and increase in 200 ms increments if splitting persists. Test with real callers in your target language before going live.
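The tuning procedure in this note can be written as a small helper: start at 1200 ms, step up by 200 ms while splitting persists, and stay within the 800–2000 ms range from the parameter reference. This is a sketch, and `nextSegmentationTimeout` is a hypothetical name. Whether splitting persists still has to come from testing with real callers.

```javascript
// Starting point, step, and upper bound for AzureSegmentationSilenceTimeoutMs,
// taken from the note and parameter reference above.
var SEGMENTATION_START_MS = 1200;
var SEGMENTATION_STEP_MS = 200;
var SEGMENTATION_MAX_MS = 2000;

// Hypothetical helper: returns the value to try in the next test round.
// Keeps the current value once splitting stops; otherwise steps up,
// never exceeding the recommended maximum.
function nextSegmentationTimeout(currentMs, splittingPersists) {
  if (currentMs === undefined) return SEGMENTATION_START_MS;
  if (!splittingPersists) return currentMs;
  return Math.min(currentMs + SEGMENTATION_STEP_MS, SEGMENTATION_MAX_MS);
}

// Values tried over six rounds if splitting never goes away.
var tried = [];
var value;
for (var round = 0; round < 6; round++) {
  value = nextSegmentationTimeout(value, true);
  tried.push(value);
}
console.log(tried.join(', ')); // 1200, 1400, 1600, 1800, 2000, 2000
```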

Digit and Number Misrecognition

Symptom

Numbers are misinterpreted by the ASR engine. Common patterns:

  • Spoken Hindi numbers like ek (one) combined with 22,000 become 1,202,000
  • Card numbers with repeated digits are truncated or garbled
  • Confidence scores are above threshold (for example, 75.7%), but the transcription is still incorrect

Fix

// Enable continuous ASR at the digit-collection entity node
userSessionUtils.setCallControlParam('continuousASR', true);
userSessionUtils.setCallControlParam('continuousASRTimeoutInMS', 2000);

// DTMF fallback — strongly recommended for 16-digit card numbers
userSessionUtils.setCallControlParam('dtmfCollectMaxDigits', 16);
userSessionUtils.setCallControlParam('dtmfCollectInterDigitTimeoutMS', 4000);

Input Type Guidance

Input Type | Recommended Method | Configuration Notes | Error Rate
4–8 digit PIN | Speech ASR | Enable continuousASR. | Low
10-digit phone number | Speech or DTMF | Speech for English; DTMF for non-English. | Medium
16-digit card number | DTMF preferred | ASR struggles with repeated digits. Use DTMF as primary. | High with ASR
Currency amounts (Hindi) | Speech + lower threshold | sttMinConfidence=0.4 + AzureSegmentationSilenceTimeoutMs=1500 | Medium
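You could encode the input type guidance above as a decision helper and run it before an entity node to choose between speech and DTMF. This is a hypothetical sketch. The input-type names and `recommendInputMethod` are illustrative, and the rules only mirror the table. Limiting DTMF to 10 digits for phone numbers is an assumption based on the "10-digit phone number" row.

```javascript
// Hypothetical encoding of the Input Type Guidance table.
// inputType: 'pin' | 'phone' | 'card' | 'currency'
// isEnglish: whether the caller's language is English.
function recommendInputMethod(inputType, isEnglish) {
  switch (inputType) {
    case 'pin':
      return { method: 'speech', params: { continuousASR: true } };
    case 'phone':
      // Speech for English callers; DTMF for non-English callers.
      return isEnglish
        ? { method: 'speech', params: {} }
        : { method: 'dtmf', params: { dtmfCollectMaxDigits: 10 } };
    case 'card':
      // ASR struggles with repeated digits, so DTMF is the primary method.
      return {
        method: 'dtmf',
        params: { dtmfCollectMaxDigits: 16, dtmfCollectInterDigitTimeoutMS: 4000 }
      };
    case 'currency':
      return {
        method: 'speech',
        params: { sttMinConfidence: 0.4, AzureSegmentationSilenceTimeoutMs: 1500 }
      };
    default:
      throw new Error('Unknown input type: ' + inputType);
  }
}
```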

Widespread ASR Failure — Outage or Concurrency Breach

Symptom

All IVR calls fail simultaneously. The bot responds with the default error message, and no speech is recognised across any bots. Typical causes are an Azure region outage, a concurrency limit breach, or expired API credentials.

Immediate Recovery Steps

  1. Check Azure Service Health — Verify the Azure region status at the Azure Service Health dashboard. Confirm whether the outage is regional or global.
  2. Verify API Credentials — Confirm the Azure Speech API key and endpoint have not expired. Rotate credentials if necessary and update them in the Kore.ai platform.
  3. Switch to Fallback ASR — Activate the fallback ASR configuration (different region or secondary vendor) to restore service while the primary region recovers.
  4. Monitor Concurrency Limits — Review your Azure subscription concurrency quota. If the limit has been breached, request a quota increase or shed load to the fallback.

Prevention — Configure Before Go-Live

// Primary and Fallback MUST use different labels.
// Using the same label for both provides no fallback protection.

Primary ASR:  Label = your-primary-label    // primary region
Fallback ASR: Label = your-fallback-label   // different region, same vendor

Note: Always configure fallback ASR/TTS before going to production. Use the same vendor but a different region for optimal compatibility. For vendor-level fallback, configure a secondary vendor (for example, Deepgram as a fallback for Azure).
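A pre-go-live check can catch the most common fallback mistake: reusing the same label. The sketch below assumes a hypothetical `{ label, vendor, region }` shape for each side. This is not the platform's configuration schema, and `checkFallbackConfig` is illustrative only.

```javascript
// Hypothetical pre-go-live check for primary/fallback ASR or TTS settings.
// Each side is assumed to look like { label, vendor, region }.
// Returns a list of problems; an empty list means the pair looks usable.
function checkFallbackConfig(primary, fallback) {
  var problems = [];
  if (!fallback) {
    problems.push('No fallback configured.');
    return problems;
  }
  if (!primary.label || !fallback.label) {
    problems.push('Both primary and fallback need a label.');
  } else if (primary.label === fallback.label) {
    problems.push('Primary and fallback use the same label; this gives no fallback protection.');
  }
  // Same vendor is fine only if the region differs.
  if (primary.vendor === fallback.vendor && primary.region === fallback.region) {
    problems.push('Same vendor and same region; a regional outage affects both.');
  }
  return problems;
}
```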

Microsoft Azure TTS

Mispronounced Words — SSML Phoneme and Say-As Tags

Symptom

Azure Neural Voice mispronounces product names, currency amounts, acronyms, or non-English terms. Digits may be read as words — for example, 6987 spoken as "Che Nau Sath Aath" when using a mismatched voice.

Fix

Use SSML tags in your Message nodes:

<!-- Currency amounts -->
<speak>
  <say-as interpret-as="currency" language="hi-IN">
    ₹3,40,650.23
  </say-as>
</speak>

<!-- Digit-by-digit — prevents misread artifacts -->
<speak>
  Your OTP is <say-as interpret-as="digits">6987</say-as>
</speak>

<!-- Acronym spelled out -->
<speak>
  <say-as interpret-as="spell-out">EMI</say-as>
</speak>

<!-- Pause before key information -->
<speak>
  Your balance is <break time="300ms"/>
  <say-as interpret-as="currency">₹5000</say-as>
</speak>

SSML Tag Reference

Problem | SSML Fix | Effect
Currency read incorrectly | <say-as interpret-as="currency"> | Reads with correct language context
Digits read as words | <say-as interpret-as="digits"> | Forces digit-by-digit: six nine eight seven
Acronym mispronounced | <say-as interpret-as="spell-out"> | Spells out letter by letter: E-M-I
Brand name wrong | <sub alias="correct pronunciation"> | Replaces text with specified pronunciation
Need a pause | <break time="500ms"/> | Inserts a 0.5-second pause
Speak slower | <prosody rate="slow"> | Reduces speech rate for complex information
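When you build prompts from variables such as balances, OTPs, or LLM text, escape the values before placing them inside SSML. A stray `&` or `<` otherwise makes the SSML invalid. This is a minimal sketch: `escapeSsml`, `sayAs`, and `otpPrompt` are hypothetical helpers. Check their output against the voice you use.

```javascript
// Escapes the five XML special characters so a value can be embedded in SSML.
function escapeSsml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// Wraps an escaped value in a say-as element, e.g. interpret-as="digits".
function sayAs(value, interpretAs) {
  return '<say-as interpret-as="' + interpretAs + '">' + escapeSsml(value) + '</say-as>';
}

// Builds the OTP prompt from the example above for any OTP value.
function otpPrompt(otp) {
  return '<speak>Your OTP is ' + sayAs(otp, 'digits') + '</speak>';
}

console.log(otpPrompt('6987'));
// <speak>Your OTP is <say-as interpret-as="digits">6987</say-as></speak>
```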

TTS Reading Newlines from LLM or Agentic Responses

Symptom

When using LLM or agentic app responses in voice flows, the TTS engine reads literal newline characters aloud — saying "backslash n backslash n" instead of pausing. This happens because LLM outputs contain formatting characters (\n, markdown bold, bullet points) that TTS cannot interpret.

Fix

Add a Script node before the Message node to sanitise the LLM output:

// Replace 'YourLLMNode' with your actual node name
var raw = context.steps.YourLLMNode.response || '';

// Line-based rules (headers, bullets) run before newlines are collapsed,
// because ^ with the m flag only matches at the start of a line.
var clean = raw
  .replace(/\\n/g, '\n')                // literal \n strings from LLM -> real newlines
  .replace(/\r/g, ' ')                  // carriage returns
  .replace(/^[ \t]*#+[ \t]+/gm, '')     // markdown headers at line start
  .replace(/^[ \t]*[-•][ \t]*/gm, '')   // bullet characters at line start
  .replace(/\*\*/g, '')                 // markdown bold
  .replace(/\*/g, '')                   // markdown bullets / italic
  .replace(/\n+/g, ' ')                 // newlines -> spaces
  .replace(/\s{2,}/g, ' ')              // collapse multiple spaces
  .trim();

context.ttsOutput = clean;
// Reference context.ttsOutput in your Message node

Note: There is no call control parameter for this. Sanitisation must be applied in a Script node before the Message node. Adjust the variable path (context.steps.YourLLMNode.response) to match your specific flow design.

Hindi TTS Failure — Voice-Language Mismatch

Symptom

On language switch to Hindi mid-call, TTS throws a synthAudio requires language error. Logs show language: undefined and voice: undefined. Digits like 6987 may be spoken incorrectly when an English voice receives Devanagari numerals.

Root Cause

The voice name and language code do not match. For example, using language=hi-IN with voice=en-IN-AartiIndicNeural — which is an English-Indian voice, not a Hindi voice.

Fix

Add a Script node before the first Hindi message node:

userSessionUtils.setCallControlParam('ttsLanguage', 'hi-IN');
userSessionUtils.setCallControlParam('voiceName', 'hi-IN-SwaraNeural');
userSessionUtils.setCallControlParam('ttsProvider', 'microsoft');

// CRITICAL: voice name and language MUST match.
// WRONG:   language=hi-IN + voice=en-IN-AartiIndicNeural
// CORRECT: language=hi-IN + voice=hi-IN-SwaraNeural

Supported Hindi Voices

Voice Name | Gender | Language | Notes
hi-IN-SwaraNeural | Female | hi-IN | Recommended; natural Hindi
hi-IN-MadhurNeural | Male | hi-IN | Alternate male voice
en-IN-AartiIndicNeural | Female | en-IN | English-Indian; do not use for Hindi text

Note: The platform does not currently validate voice-language matching during configuration. Verify manually that the voiceName language prefix matches the ttsLanguage code. A platform-level validation is planned.
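Until the platform performs this validation, a Script node can check the pair before setting the parameters. This sketch assumes Azure-style voice names that begin with the locale (for example, `hi-IN-SwaraNeural`). `voiceMatchesLanguage` is a hypothetical helper.

```javascript
// Returns true when the voice name starts with the TTS language code,
// e.g. 'hi-IN-SwaraNeural' matches 'hi-IN'. Assumes Azure-style names
// of the form <language>-<REGION>-<VoiceName>.
function voiceMatchesLanguage(voiceName, ttsLanguage) {
  // Also catches the undefined values seen in the error logs.
  if (!voiceName || !ttsLanguage) return false;
  return voiceName.toLowerCase().indexOf(ttsLanguage.toLowerCase() + '-') === 0;
}

console.log(voiceMatchesLanguage('hi-IN-SwaraNeural', 'hi-IN'));      // true
console.log(voiceMatchesLanguage('en-IN-AartiIndicNeural', 'hi-IN')); // false
```

When the check fails, the flow could branch to a known-good voice and language pair instead of calling TTS with a mismatch.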

Configured Voice Not Applying at Runtime

Symptom

The voice configured in Voice Preferences (for example, hi-IN-SwaraNeural) is not applied during live calls. The bot uses the default English voice instead, with no error shown in the UI.

Resolution

Issue | Resolution
Root Cause | TTS initialises at the first message node. If automation or a script runs first, TTS uses defaults before the voice configuration loads.
Fix 1 (Recommended) | Add a short message node as the very first node in your Experience Flow, before any script or automation node.
Fix 2 | After saving the voice config in the UI, edit the flow description field and re-save to force re-registration.
Fix 3 (Script) | Set the voice using a script before the message node: userSessionUtils.setCallControlParam('voiceName', 'hi-IN-SwaraNeural');
Fix 4 (Google) | If you see 'No credentials for Google with labels: undefined', set ttsLabel using a script, not the UI. See "No Credentials for Google" below.

Deepgram ASR / TTS

Utterance Splitting — utteranceEndMs Tuning

Symptom

The user says a long sentence, but Deepgram commits mid-sentence due to natural pauses. Setting deepgramUtteranceEndMs to 3000 or 4000 still results in splits.

Root Cause

deepgramUtteranceEndMs alone is insufficient. It must be paired with deepgramEndpointing to handle micro-pauses within sentences. The endpointing parameter controls Voice Activity Detection (VAD) sensitivity for short gaps. utteranceEndMs controls the final silence window.

Fix

// Recommended for agentic / conversational flows
userSessionUtils.setCallControlParam('deepgramUtteranceEndMs', 1500);
userSessionUtils.setCallControlParam('deepgramEndpointing', 500);
userSessionUtils.setCallControlParam('sttMinConfidence', 0.4);
userSessionUtils.setCallControlParam('deepgramNumerals', true);
userSessionUtils.setCallControlParam('deepgramPunctuate', true);

// For Spanish / Latin American Spanish
userSessionUtils.setCallControlParam('sttLanguage', 'es-419');

Note: A smaller utteranceEndMs works better here because endpointing=500 handles micro-pauses within speech, while utteranceEndMs=1500 handles the longer gap at the end. Setting utteranceEndMs to 3000–4000 only delays the final commit; it does not fix mid-sentence splits.

Parameter Reference

Parameter | Value / Range | Purpose
deepgramUtteranceEndMs | 800–2000 ms | End-of-utterance silence window. Start at 1200; increase by 200 ms if splitting persists.
deepgramEndpointing | 300–600 ms | VAD sensitivity. 300 ms = responsive; 600 ms = more patient with pauses.
deepgramNumerals | true/false | Converts spoken numbers to digits (for example, five hundred → 500).
deepgramNer | true/false | Named Entity Recognition; improves proper noun and entity recognition.
deepgramPunctuate | true/false | Adds punctuation to transcripts; useful for NLU processing.
deepgramKeyterms | String array | Boosts recognition of specific listed words.

Digit and RX Number Recognition

Symptom

The bot collects multi-digit numbers (for example, 5–7 digit prescription or account numbers). Callers pause between digits. The bot splits the utterance or misses digits.

Production-Tested Configuration

var recognizerConfig = {
  vendor: 'deepgram',
  model: 'nova-3-medical',   // Use nova-3 for non-medical flows
  language: 'en-US',
  interim: true,
  deepgramOptions: {
    endpointing: 300,
    utteranceEndMs: 1000,
    keyterms: [
      'zero', 'one', 'two', 'three', 'four',
      'five', 'six', 'seven', 'eight', 'nine'
    ],
    numerals: true
  }
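};

// Defining recognizerConfig does not apply it on its own: pass it to your
// flow's recognizer configuration step (the method depends on your setup).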

userSessionUtils.setCallControlParam('dtmfBargein', false);
userSessionUtils.setCallControlParam('listenDuringPrompt', true);

Model Selection

Model | Best For | Use Case | Status
nova-3 | Best general accuracy | Default IVR flows | Recommended
nova-3-medical | Medical/clinical terms, RX numbers | Healthcare, pharmacy, insurance | Recommended
nova-2-phonecall | Optimised for telephony audio | Call centre deployments | Stable
enhanced | High accuracy, slightly higher latency | Accuracy-critical flows | Stable
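When debugging outside the gateway, it can help to see what the `recognizerConfig` above would look like as a direct Deepgram streaming request. The sketch below assumes that these options correspond to the `model`, `language`, `interim_results`, `endpointing`, `utterance_end_ms`, `numerals`, and `keyterm` query parameters of Deepgram's `/v1/listen` API. This guide does not document how the Voice Gateway actually forwards them. `toDeepgramListenUrl` is a hypothetical helper. It rejects `utteranceEndMs` without interim results, because Deepgram's utterance-end detection relies on interim results.

```javascript
// Sketch: express the recognizerConfig shape above as a Deepgram streaming
// URL for debugging. The field-to-query-parameter mapping is an assumption.
function toDeepgramListenUrl(config) {
  var opts = config.deepgramOptions || {};
  // Utterance-end detection depends on interim results being enabled.
  if (opts.utteranceEndMs !== undefined && !config.interim) {
    throw new Error('utteranceEndMs requires interim: true');
  }
  var params = new URLSearchParams();
  params.append('model', config.model);
  params.append('language', config.language);
  params.append('interim_results', String(Boolean(config.interim)));
  if (opts.endpointing !== undefined) params.append('endpointing', String(opts.endpointing));
  if (opts.utteranceEndMs !== undefined) params.append('utterance_end_ms', String(opts.utteranceEndMs));
  if (opts.numerals !== undefined) params.append('numerals', String(opts.numerals));
  // keyterm is repeated once per boosted term.
  (opts.keyterms || []).forEach(function (term) {
    params.append('keyterm', term);
  });
  return 'wss://api.deepgram.com/v1/listen?' + params.toString();
}

var url = toDeepgramListenUrl({
  model: 'nova-3-medical',
  language: 'en-US',
  interim: true,
  deepgramOptions: { endpointing: 300, utteranceEndMs: 1000, keyterms: ['zero', 'one'], numerals: true }
});
console.log(url);
```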

Google ASR and TTS

16-Digit Card Number — Repeated Digit Failure

Symptom

Google ASR fails on sequences with 8 or more consecutive identical digits (for example, 509099999999990002). The bot may barge in early. This is a known Google ASR limitation — DTMF is the reliable alternative for 16-digit card numbers.

Fix

// Recommended: DTMF input for card numbers
userSessionUtils.setCallControlParam('dtmfCollectMaxDigits', 16);
userSessionUtils.setCallControlParam('dtmfCollectInterDigitTimeoutMS', 4000);
userSessionUtils.setCallControlParam('dtmfCollectTermDigit', '#');

// If speech must be used — enable continuous ASR
userSessionUtils.setCallControlParam('continuousASR', true);
userSessionUtils.setCallControlParam('continuousASRTimeoutInMS', 2500);

// Note: AzureSegmentationSilenceTimeoutMs does NOT affect Google ASR.
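Because this failure is tied to long runs of the same digit, a flow could scan a speech transcript for such runs and re-prompt with DTMF. This is a sketch. The default threshold of 8 comes from the symptom above, and `hasRepeatedDigitRun` is a hypothetical helper.

```javascript
// Returns true if the digits in `transcript` contain a run of at least
// `minRun` identical consecutive digits (default 8).
function hasRepeatedDigitRun(transcript, minRun) {
  var n = minRun || 8;
  var digits = String(transcript).replace(/\D/g, ''); // ignore spaces and dashes
  // (\d)\1{n-1,} matches one digit followed by n-1 or more copies of itself.
  var pattern = new RegExp('(\\d)\\1{' + (n - 1) + ',}');
  return pattern.test(digits);
}

console.log(hasRepeatedDigitRun('9999 9999 1234 5678')); // true
console.log(hasRepeatedDigitRun('4111 1234 5678 9012')); // false
```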

Google ASR Model Reference

Model | Description | Use Case | Status
chirp_2 | Best for telephony + accents | General IVR | Recommended
chirp_telephony | Optimised for 8 kHz / 16 kHz audio | Telephony deployments | Recommended
telephony | Legacy; stable, widely tested | Default | Good default
telephony_short | Fast, for short utterances (< 5 s) | Menu / Y-N commands | Recommended
long | Batch file processing | Not for real-time IVR | Do not use for IVR

No Credentials for Google — Label: undefined Error

Symptom

TTS fails with: No text-to-speech service credentials for Google with labels: undefined. This occurs even after setting ttsLabel in the Voice Preferences UI.

Root Cause

The ttsLabel configured in the Voice Preferences UI does not propagate to the TTS engine at runtime in some configurations. The label must be set using a Script node.

Fix

Add a Script node at the start of your Experience Flow:

// TTS configuration
userSessionUtils.setCallControlParam('ttsProvider', 'google');
userSessionUtils.setCallControlParam('ttsLabel', 'your_google_label');
userSessionUtils.setCallControlParam('ttsLanguage', 'en-US');
userSessionUtils.setCallControlParam('voiceName', 'en-US-Chirp3-HD-Aoede');

// ASR configuration (same approach)
userSessionUtils.setCallControlParam('sttProvider', 'google');
userSessionUtils.setCallControlParam('sttLabel', 'your_google_label');

Note: The ttsLabel setting in the Voice Preferences UI does not propagate at runtime for Google TTS. Always set the label using a script, as shown above, until this is resolved in a future platform update.
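The error message reports `labels: undefined`, so a script can fail fast on an empty label rather than failing later at synthesis time. This sketch returns the parameters from the Fix above as name/value pairs. `googleVoiceParams` is a hypothetical helper, and its language and voice defaults are simply the values used in that example.

```javascript
// Builds the Google TTS/ASR call control parameters shown above.
// Throws early if the credential label is missing, instead of letting
// synthesis fail later with "labels: undefined".
function googleVoiceParams(label, ttsLanguage, voiceName) {
  if (typeof label !== 'string' || label.trim() === '') {
    throw new Error('Google credential label is missing');
  }
  return [
    ['ttsProvider', 'google'],
    ['ttsLabel', label],
    ['ttsLanguage', ttsLanguage || 'en-US'],
    ['voiceName', voiceName || 'en-US-Chirp3-HD-Aoede'],
    ['sttProvider', 'google'],
    ['sttLabel', label]
  ];
}

// In a Script node, each pair would be passed to
// userSessionUtils.setCallControlParam(name, value).
```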

Background Streaming — Prompts Not Played

Symptom

When using Google TTS with Chirp voices and Background Streaming enabled at the Experience Flow level, follow-up prompts (for example, "Is there anything else?") are logged as synthesised successfully, but no audio is delivered to the caller.

Fix Options

Option | Action
Option 1: Disable Background Streaming | Go to Experience Flow > Edit > Background Streaming > Disabled. Confirm audio plays correctly.
Option 2: Use a non-Chirp voice | Switch to a non-Chirp Google voice such as a WaveNet voice (for example, en-US-Wavenet-F), which is confirmed compatible with Background Streaming.
Option 3: Use Deepgram TTS | Deepgram TTS has native streaming support and is more stable with Background Streaming enabled.

Note: Google Chirp voices are not compatible with Background Streaming. This is a known platform issue being tracked for resolution.

Sarvam TTS

TTS Fails on Currency, Commas, and Special Characters

Symptom

Sarvam TTS fails on messages containing the ₹ symbol, commas in Indian number format (for example, 3,40,650.23), or mixed punctuation. The Sarvam API does not handle Unicode currency symbols or Indian number formatting natively.

Fix

Pre-process text in a Script node before passing it to Sarvam TTS:

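function sanitiseForSarvam(text) {
  return text
    // Spell out rupee amounts in lakh / hazaar units without rounding away
    // part of the value (₹3,40,650.23 -> 3 lakh 40 hazaar 650 rupaye 23 paise).
    .replace(/₹\s?([\d,]+(?:\.\d+)?)/g, function (m, n) {
      var num = parseFloat(n.replace(/,/g, ''));
      if (isNaN(num)) return m;
      var totalPaise = Math.round(num * 100); // avoids floating-point drift
      var rupees = Math.floor(totalPaise / 100);
      var paise = totalPaise % 100;
      var parts = [];
      var lakh = Math.floor(rupees / 100000);
      var hazaar = Math.floor((rupees % 100000) / 1000);
      var rest = rupees % 1000;
      if (lakh) parts.push(lakh + ' lakh');
      if (hazaar) parts.push(hazaar + ' hazaar');
      if (rest || parts.length === 0) parts.push(String(rest));
      var spoken = parts.join(' ') + ' rupaye';
      return paise ? spoken + ' ' + paise + ' paise' : spoken;
    })
    .replace(/[,;]/g, ' ')       // remove commas and semicolons
    .replace(/\.(?!\d)/g, ' ')   // periods not followed by a digit
    .replace(/[()\[\]{}]/g, '')  // brackets
    .replace(/\s{2,}/g, ' ')     // collapse multiple spaces
    .trim();
}

context.sarvamInput = sanitiseForSarvam(context.llmResponse || '');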

// Provider configuration
userSessionUtils.setCallControlParam('ttsProvider', 'custom:sarvamTTS');
userSessionUtils.setCallControlParam('voiceName', 'simran');

Note: Sarvam does not support SSML tags. Use plain text only. All text formatting must be handled in the sanitisation script before passing to Sarvam TTS.

Background Streaming — Audio Not Played

Symptom

When Background Streaming is enabled at the Experience Flow level with Sarvam TTS, audio synthesis is logged as successful in Interactions, but no audio plays to the caller.

Fix

Disable Background Streaming when using Sarvam TTS. Go to Experience Flow > Edit > Background Streaming > Disabled.

Note: Sarvam TTS is not compatible with Background Streaming. A platform fix is being tracked. Until resolved, Background Streaming must be disabled for any flow using Sarvam TTS.
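This guide reports two TTS setups as incompatible with Background Streaming: Google Chirp voices and Sarvam TTS. You can check for both before publishing a flow. The sketch assumes the provider strings and voice names used in this guide. `supportsBackgroundStreaming` is hypothetical, and its list should be updated as platform fixes land.

```javascript
// Returns false for TTS setups this guide reports as incompatible with
// Background Streaming: Google Chirp voices and Sarvam TTS.
function supportsBackgroundStreaming(ttsProvider, voiceName) {
  var provider = String(ttsProvider || '').toLowerCase();
  var voice = String(voiceName || '').toLowerCase();
  if (provider === 'custom:sarvamtts') return false;
  if (provider === 'google' && voice.indexOf('chirp') !== -1) return false;
  return true;
}
```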

Call Flow — Barge-in, Transfer, and Transcription

Aggressive Barge-in — Bot Interrupts User Mid-Speech

Symptom

The bot plays filler music in under 3 seconds, cutting off the user’s utterance. The ActionHookDelayProcessor fires too early, especially when there is a language mismatch between the TTS text and the voice configuration.

Fix

// Disable barge-in for specific nodes (legal, OTP, transfer messages)
userSessionUtils.setCallControlParam('node.bargein', false);

// Delay filler music — prevent premature firing
userSessionUtils.setCallControlParam('actionHookDelayInMs', 8000);

// Session-level barge-in sensitivity
userSessionUtils.setCallControlParam('session.bargein', true);
userSessionUtils.setCallControlParam('session.bargeInSensitivity', 'low');

Scenario Guide

Scenario | Recommended Fix
Bot interrupts greetings | Set actionHookDelayInMs=8000 or higher to give the caller time to state their full intent.
Callers barge in on legal or OTP prompts | Set node.bargein=false on those specific message nodes.
Background noise triggers barge-in | Set session.bargeInSensitivity=low. Ensure noise suppression is active in your telephony setup.
Transfer message cut short | Use queueCommand: true. See "Transfer Fires Before TTS Message Completes" below.

Transfer Fires Before TTS Message Completes

Symptom

Agent transfer fires before the farewell or transfer message finishes playing. The caller hears a partial message before being transferred.

Root Cause

A direct Agent Transfer Node placed after a Message Node creates a race condition — the transfer may execute before TTS playback completes.

Fix

Use queueCommand: true in a Script node to ensure TTS completes first:

var transferCmd = {
  type: 'command',
  command: 'redirect',
  queueCommand: true,  // Critical — waits for TTS to complete before transfer
  data: [{
    verb: 'transfer',
    destination: 'sip:agent@your-sip-domain.com'
  }]
};

// Execute using your flow's command execution method.

Note: The queueCommand: true parameter tells the Voice Gateway to wait until the current TTS playback is fully completed before executing the transfer. Always use this for any action that follows a voice prompt.
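If several flows transfer calls, building the command in one place makes it harder to forget `queueCommand`. The sketch mirrors the object shape above, and `buildQueuedTransfer` is a hypothetical helper. You still execute the command with your flow's command execution method. The helper accepts only SIP URIs because the example uses one. Relax that check if your transfers use other destination formats.

```javascript
// Builds the redirect/transfer command from the example above, always
// with queueCommand: true so the transfer waits for TTS playback.
function buildQueuedTransfer(destination) {
  if (typeof destination !== 'string' || destination.indexOf('sip:') !== 0) {
    throw new Error('Expected a SIP URI such as sip:agent@your-sip-domain.com');
  }
  return {
    type: 'command',
    command: 'redirect',
    queueCommand: true,
    data: [{ verb: 'transfer', destination: destination }]
  };
}
```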

Transcription Not Displaying — Twilio or ASR Timeout

Symptom

The Transcripts tab is empty after completed calls. This usually has one of three root causes: a Twilio webhook misconfiguration, an ASR timeout that is too short, or a ChannelOverrideTemplate that silently overrides the timeout.

Diagnosis and Fix

Root Cause | How to Diagnose | Fix
Twilio webhook misconfiguration | The Twilio Media Stream webhook is missing the Kore transcript endpoint. Transcription is enabled, but results are not sent to the platform. | Verify that your webhook URL in the Twilio Console includes the correct Kore transcript endpoint for your environment. Contact Kore support to confirm the correct URL.
ASR timeout too short | The ASR timeout is set too low (for example, 2000 ms), so calls end with NO_INPUT before transcription completes. | Increase botNoInputTimeoutMS to 8000 ms or higher. Verify that sttMinConfidence is not set too high.
ChannelOverrideTemplate override | A template-level timeout silently overrides the Experience Flow settings. | Set botNoInputTimeoutMS explicitly in the ChannelOverrideTemplate itself, not just in the Experience Flow.
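The ChannelOverrideTemplate case is easy to miss because the override happens silently. The sketch below models the precedence the table describes: a template value, when present, wins over the Experience Flow value. `effectiveNoInputTimeout` is hypothetical, and the 8000 ms floor is the table's recommendation, not a platform limit.

```javascript
// Models the precedence described above: a botNoInputTimeoutMS set in a
// ChannelOverrideTemplate silently replaces the Experience Flow value.
function effectiveNoInputTimeout(flowMs, templateMs) {
  var fromTemplate = templateMs !== undefined;
  var value = fromTemplate ? templateMs : flowMs;
  return {
    value: value,
    source: fromTemplate ? 'ChannelOverrideTemplate' : 'Experience Flow',
    // The table recommends 8000 ms or higher.
    tooShort: value === undefined || value < 8000
  };
}
```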

Quick Reference — All Parameters

All parameters are set using:

userSessionUtils.setCallControlParam('paramName', value);

Parameter | Vendor | Value Range | Purpose
AzureSegmentationSilenceTimeoutMs | Azure ASR | 800–2000 ms | End-of-speech silence. Increase to reduce utterance splits.
continuousASR | Azure, Google | true/false | Enable for digit or number collection.
continuousASRTimeoutInMS | Azure, Google | 1500–3000 ms | Wait after the last input in continuous mode.
sttMinConfidence | All vendors | 0.4–0.7 | Minimum confidence to accept an utterance. Lower = more permissive.
sttLanguage | All vendors | BCP-47 code | ASR language (for example, en-US, hi-IN, es-419, ja-JP).
sttProvider | All vendors | Provider string | ASR vendor selection (microsoft, google, deepgram).
botNoInputTimeoutMS | All vendors | 5000–15000 ms | Wait before the no-input event. Increase for non-English.
deepgramUtteranceEndMs | Deepgram | 800–2000 ms | End-of-utterance silence. Pair with deepgramEndpointing.
deepgramEndpointing | Deepgram | 300–600 ms | VAD sensitivity. 300 ms = responsive; 600 ms = patient.
deepgramNumerals | Deepgram | true/false | Spoken numbers → digits (five hundred → 500).
deepgramPunctuate | Deepgram | true/false | Adds punctuation to transcripts.
deepgramKeyterms | Deepgram | String array | Boost recognition of specific words.
ttsLanguage | All TTS | BCP-47 code | TTS language. Must match voiceName language.
ttsProvider | All TTS | Provider string | TTS vendor (microsoft, google, custom:sarvamTTS).
ttsLabel | Google TTS | Label string | Must be set using a script, not only in the UI.
voiceName | All TTS | Voice identifier | Must match ttsLanguage.
node.bargein | All vendors | true/false | Enable or disable barge-in per node.
session.bargeInSensitivity | All vendors | low/medium/high | Session-level barge-in sensitivity.
actionHookDelayInMs | All vendors | 5000–10000 ms | Delay before filler music fires.
dtmfCollectMaxDigits | All vendors | 1–20 | Maximum DTMF digits to collect (card numbers: 16).
dtmfCollectInterDigitTimeoutMS | All vendors | 3000–5000 ms | Wait between DTMF digits.
dtmfCollectTermDigit | All vendors | # or * | Termination digit for DTMF collection.
listenDuringPrompt | All vendors | true/false | Allow ASR to listen while TTS is playing.
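The numeric ranges in this table can also serve as a sanity check before go-live. The sketch covers a subset of the parameters above, and `checkCallControlParam` is hypothetical. Because the ranges are recommendations, an out-of-range value is flagged for review rather than rejected.

```javascript
// Recommended ranges copied from the quick reference table above.
var RECOMMENDED_RANGES = {
  AzureSegmentationSilenceTimeoutMs: [800, 2000],
  continuousASRTimeoutInMS: [1500, 3000],
  sttMinConfidence: [0.4, 0.7],
  botNoInputTimeoutMS: [5000, 15000],
  deepgramUtteranceEndMs: [800, 2000],
  deepgramEndpointing: [300, 600],
  actionHookDelayInMs: [5000, 10000],
  dtmfCollectMaxDigits: [1, 20],
  dtmfCollectInterDigitTimeoutMS: [3000, 5000]
};

// Returns a warning string for a questionable value, or null if the value
// is in range or the parameter is not a numeric one listed above.
function checkCallControlParam(name, value) {
  var range = RECOMMENDED_RANGES[name];
  if (!range) return null;
  if (typeof value !== 'number' || isNaN(value)) {
    return name + ' should be a number';
  }
  if (value < range[0] || value > range[1]) {
    return name + '=' + value + ' is outside the recommended range ' +
      range[0] + ' to ' + range[1];
  }
  return null;
}
```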

ASR and TTS Troubleshooting Guide

This guide explains production-tested fixes for ASR and TTS issues on the Kore.ai Voice Gateway. Each section identifies the symptom, root cause, and the exact configuration or code change required.

All parameters are applied using:

userSessionUtils.setCallControlParam('paramName', value);

Microsoft Azure ASR

Utterance Splitting — Speech Split into Multiple Parts

Symptom

The ASR splits one continuous user utterance into multiple turns. For example, "Hello, I want to check my balance" arrives as two separate inputs: "Hello" followed by "I want to check my balance". This causes intent mismatches, loop behaviour, or the bot responding to partial input.

Root Cause

Azure Speech uses AzureSegmentationSilenceTimeoutMs to detect end-of-speech. Natural pauses of 300 ms or more between words can trigger a premature commit. The default value is too aggressive for conversational IVR use cases.

Fix

Add a Script node at the start of your Experience Flow:

// Standard conversational flows
userSessionUtils.setCallControlParam(
  'AzureSegmentationSilenceTimeoutMs', 1200
);

// Longer conversational utterances (agentic app flows)
userSessionUtils.setCallControlParam(
  'AzureSegmentationSilenceTimeoutMs', 2000
);

// Do NOT exceed 3000 — significantly increases latency.

Parameter Reference

Parameter Value / Range Purpose
AzureSegmentationSilenceTimeoutMs 800–2000 ms End-of-speech silence window. Lower = faster response. Higher = fewer splits.
sttMinConfidence 0.4–0.6 Lower to accept more inputs; raise to filter background noise.
botNoInputTimeoutMS 5000–8000 ms Increase for non-English speakers or callers who pause before speaking.

Note: Start with 1200ms and an increase in 200 ms increments if splitting persists. Test with real callers in your target language before going live.

Digit and Number Misrecognition

Symptom

Numbers are misinterpreted by the ASR engine. Common patterns:

  • Spoken Hindi numbers like ek (one) combined with 22,000 become 1,202,000
  • Card numbers with repeated digits are truncated or garbled
  • Confidence scores are above threshold (for example, 75.7%), but the transcription is still incorrect

Fix

// Enable continuous ASR at the digit-collection entity node
userSessionUtils.setCallControlParam('continuousASR', true);
userSessionUtils.setCallControlParam('continuousASRTimeoutInMS', 2000);

// DTMF fallback — strongly recommended for 16-digit card numbers
userSessionUtils.setCallControlParam('dtmfCollectMaxDigits', 16);
userSessionUtils.setCallControlParam('dtmfCollectInterDigitTimeoutMS', 4000);

Input Type Guidance

Input Type Recommended Method Configuration Notes Error Rate
4–8 digit PIN Speech ASR Enable continuousASR. Low
10-digit phone number Speech or DTMF Speech for English; DTMF for non-English. Medium
16-digit card number DTMF preferred ASR struggles with repeated digits. Use DTMF as primary. High with ASR
Currency amounts (Hindi) Speech + lower threshold sttMinConfidence=0.4 + AzureSegmentationSilenceTimeoutMs=1500 Medium

Widespread ASR Failure — Outage or Concurrency Breach

Symptom

All IVR calls fail simultaneously. The bot responds with the default error message, and no speech is recognised across any bots. An Azure region outage typically causes either a concurrency limit breach or expired API credentials.

Immediate Recovery Steps

  1. Check Azure Service Health — Verify the Azure region status at the Azure Service Health dashboard. Confirm whether the outage is regional or global.
  2. Verify API Credentials — Confirm the Azure Speech API key and endpoint have not expired. Rotate credentials if necessary and update them in the Kore.ai platform.
  3. Switch to Fallback ASR — Activate the fallback ASR configuration (different region or secondary vendor) to restore service while the primary region recovers.
  4. Monitor Concurrency Limits — Review your Azure subscription concurrency quota. If the limit has been breached, request a quota increase or shed load to the fallback.

Prevention — Configure Before Go-Live

// Primary and Fallback MUST use different labels.
// Using the same label for both provides no fallback protection.

Primary ASR:  Label = your-primary-label    // primary region
Fallback ASR: Label = your-fallback-label   // different region, same vendor

Note: Always configure fallback ASR/TTS before going to production. Use the same vendor but a different region for optimal compatibility. For vendor-level fallback, configure a secondary vendor (for example, Deepgram as a fallback for Azure).

Microsoft Azure TTS

Mispronounced Words — SSML Phoneme and Say-As Tags

Symptom

Azure Neural Voice mispronounces product names, currency amounts, acronyms, or non-English terms. Digits may be read as words — for example, 6987 spoken as "Che Nau Sath Aath" when using a mismatched voice.

Fix

Use SSML tags in your Message nodes:

<!-- Currency amounts -->
<speak>
  <say-as interpret-as="currency" language="hi-IN">
    ₹3,40,650.23
  </say-as>
</speak>

<!-- Digit-by-digit — prevents misread artifacts -->
<speak>
  Your OTP is <say-as interpret-as="digits">6987</say-as>
</speak>

<!-- Acronym spelled out -->
<speak>
  <say-as interpret-as="spell-out">EMI</say-as>
</speak>

<!-- Pause before key information -->
<speak>
  Your balance is <break time="300ms"/>
  <say-as interpret-as="currency">₹5000</say-as>
</speak>

SSML Tag Reference

Problem SSML Fix Effect
Currency read incorrectly <say-as interpret-as="currency"> Reads with correct language context
Digits read as words <say-as interpret-as="digits"> Forces digit-by-digit: six nine eight seven
Acronym mispronounced <say-as interpret-as="spell-out"> Spells out letter by letter: E-M-I
Brand name wrong <sub alias="correct pronunciation"> Replaces text with specified pronunciation
Need a pause <break time="500ms"/> Inserts a 0.5-second pause
Speak slower <prosody rate="slow"> Reduces speech rate for complex information

TTS Reading Newlines from LLM or Agentic Responses

Symptom

When using LLM or agentic app responses in voice flows, the TTS engine reads literal newline characters aloud — saying "backslash n backslash n" instead of pausing. This happens because LLM outputs contain formatting characters (\n, markdown bold, bullet points) that TTS cannot interpret.

Fix

Add a Script node before the Message node to sanitise the LLM output:

// Replace 'YourLLMNode' with your actual node name
var raw = context.steps.YourLLMNode.response || '';

var clean = raw
  .replace(/\\n/g, '\n')          // literal "\n" strings from LLM become real newlines
  .replace(/\r/g, ' ')            // carriage returns
  .replace(/^\s*#+\s*/gm, '')     // markdown headers (at line start only)
  .replace(/^\s*[-•*]\s+/gm, '')  // bullet characters (at line start only)
  .replace(/\*\*/g, '')           // markdown bold
  .replace(/\*/g, '')             // remaining markdown italic markers
  .replace(/\n/g, ' ')            // newlines become spaces, after the line-based rules
  .replace(/\s{2,}/g, ' ')        // collapse multiple spaces
  .trim();

context.ttsOutput = clean;
// Reference context.ttsOutput in your Message node

Note: There is no call control parameter for this. Sanitisation must be applied in a Script node before the Message node. Adjust the variable path (context.steps.YourLLMNode.response) to match your specific flow design.

Hindi TTS Failure — Voice-Language Mismatch

Symptom

On language switch to Hindi mid-call, TTS throws a synthAudio requires language error. Logs show language: undefined and voice: undefined. Digits like 6987 may be spoken incorrectly when an English voice receives Devanagari numerals.

Root Cause

The voice name and language code do not match. For example, using language=hi-IN with voice=en-IN-AartiIndicNeural — which is an English-Indian voice, not a Hindi voice.

Fix

Add a Script node before the first Hindi message node:

userSessionUtils.setCallControlParam('ttsLanguage', 'hi-IN');
userSessionUtils.setCallControlParam('voiceName', 'hi-IN-SwaraNeural');
userSessionUtils.setCallControlParam('ttsProvider', 'microsoft');

// CRITICAL: voice name and language MUST match.
// WRONG:   language=hi-IN + voice=en-IN-AartiIndicNeural
// CORRECT: language=hi-IN + voice=hi-IN-SwaraNeural

Supported Hindi Voices

| Voice Name | Gender | Language | Notes |
|---|---|---|---|
| hi-IN-SwaraNeural | Female | hi-IN | Recommended; natural Hindi |
| hi-IN-MadhurNeural | Male | hi-IN | Alternate male voice |
| en-IN-AartiIndicNeural | Female | en-IN | English-Indian; do not use for Hindi text |

Note: The platform does not currently validate voice-language matching during configuration. Verify manually that the voiceName language prefix matches the ttsLanguage code. A platform-level validation is planned.
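Until platform validation exists, the manual check described in the note above can be scripted. This sketch assumes Azure/Google-style voice names of the form "<lang>-<REGION>-<VoiceName>" (for example, hi-IN-SwaraNeural); names without a locale prefix, such as the Sarvam voice simran, always return false. `voiceMatchesLanguage` is a hypothetical helper.

```javascript
// Returns true when the locale prefix of voiceName (e.g. "hi-IN" in
// "hi-IN-SwaraNeural") matches ttsLanguage. Assumes "<lang>-<REGION>-<Name>" names.
function voiceMatchesLanguage(voiceName, ttsLanguage) {
  if (!voiceName || !ttsLanguage) return false;
  var parts = String(voiceName).split('-');
  if (parts.length < 3) return false;           // no locale prefix to compare
  var voiceLocale = (parts[0] + '-' + parts[1]).toLowerCase();
  return voiceLocale === String(ttsLanguage).toLowerCase();
}

console.log(voiceMatchesLanguage('hi-IN-SwaraNeural', 'hi-IN'));      // true
console.log(voiceMatchesLanguage('en-IN-AartiIndicNeural', 'hi-IN')); // false
```

In a Script node, you would call this before the userSessionUtils.setCallControlParam calls for ttsLanguage and voiceName, and skip or log the language switch when it returns false.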

Configured Voice Not Applying at Runtime

Symptom

The voice configured in Voice Preferences (for example, hi-IN-SwaraNeural) is not applied during live calls. The bot uses the default English voice instead, with no error shown in the UI.

Resolution

| Issue | Resolution |
|---|---|
| Root Cause | TTS initialises at the first message node. If automation or a script runs first, TTS uses defaults before the voice configuration loads. |
| Fix 1 (Recommended) | Add a short message node as the very first node in your Experience Flow, before any script or automation node. |
| Fix 2 | After saving the voice config in the UI, edit the flow description field and re-save to force re-registration. |
| Fix 3 (Script) | Set the voice via script before the message node: userSessionUtils.setCallControlParam('voiceName', 'hi-IN-SwaraNeural'); |
| Fix 4 (Google) | If you see 'No credentials for Google with labels: undefined', set ttsLabel using a script, not the UI. See the No Credentials for Google section. |

Deepgram ASR / TTS

Utterance Splitting — utteranceEndMs Tuning

Symptom

The user says a long sentence, but Deepgram commits mid-sentence due to natural pauses. Setting deepgramUtteranceEndMs to 3000 or 4000 still results in splits.

Root Cause

deepgramUtteranceEndMs alone is insufficient. It must be paired with deepgramEndpointing to handle micro-pauses within sentences. The endpointing parameter controls Voice Activity Detection (VAD) sensitivity for short gaps. utteranceEndMs controls the final silence window.

Fix

// Recommended for agentic / conversational flows
userSessionUtils.setCallControlParam('deepgramUtteranceEndMs', 1500);
userSessionUtils.setCallControlParam('deepgramEndpointing', 500);
userSessionUtils.setCallControlParam('sttMinConfidence', 0.4);
userSessionUtils.setCallControlParam('deepgramNumerals', true);
userSessionUtils.setCallControlParam('deepgramPunctuate', true);

// For Spanish / Latin American Spanish
userSessionUtils.setCallControlParam('sttLanguage', 'es-419');

Note: A smaller utteranceEndMs works better because the two parameters cover different gaps. endpointing=500 handles micro-pauses within speech, while utteranceEndMs=1500 handles the longer gap at the end. Setting utteranceEndMs to 3000–4000 only delays the final commit; it does not fix mid-sentence splits.

Parameter Reference

| Parameter | Value / Range | Purpose |
|---|---|---|
| deepgramUtteranceEndMs | 800–2000 ms | End-of-utterance silence window. Start at 1200; increase by 200 ms if splitting persists. |
| deepgramEndpointing | 300–600 ms | VAD sensitivity. 300 ms = responsive; 600 ms = more patient with pauses. |
| deepgramNumerals | true/false | Converts spoken numbers to digits (for example, five hundred → 500). |
| deepgramNer | true/false | Named Entity Recognition: improves proper noun and entity recognition. |
| deepgramPunctuate | true/false | Adds punctuation to transcripts; useful for NLU processing. |
| deepgramKeyterms | String array | Boosts recognition of specific listed words. |
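The tuning guidance in the parameter table above (start utteranceEndMs at 1200 ms, raise it by 200 ms while splitting persists, stay within the listed ranges) can be expressed as a small helper. This is a sketch; `nextUtteranceEndMs` and `deepgramTimingParams` are hypothetical names, and deciding whether a split was observed is left to you.

```javascript
// Ranges and step sizes taken from the parameter table.
var UTTERANCE_END_RANGE = [800, 2000];
var ENDPOINTING_RANGE = [300, 600];
var UTTERANCE_END_START = 1200;
var UTTERANCE_END_STEP = 200;

function clamp(value, range) {
  return Math.min(range[1], Math.max(range[0], value));
}

// Next deepgramUtteranceEndMs to try: 1200 ms on the first attempt, then
// +200 ms each time a mid-sentence split is still observed, capped at 2000 ms.
function nextUtteranceEndMs(currentMs, splitObserved) {
  if (typeof currentMs !== 'number') return UTTERANCE_END_START;
  var next = splitObserved ? currentMs + UTTERANCE_END_STEP : currentMs;
  return clamp(next, UTTERANCE_END_RANGE);
}

// Pairs the two timing parameters, keeping each inside its documented range.
function deepgramTimingParams(utteranceEndMs, endpointingMs) {
  return {
    deepgramUtteranceEndMs: clamp(utteranceEndMs, UTTERANCE_END_RANGE),
    deepgramEndpointing: clamp(endpointingMs, ENDPOINTING_RANGE)
  };
}

var params = deepgramTimingParams(nextUtteranceEndMs(1200, true), 500);
console.log(params.deepgramUtteranceEndMs, params.deepgramEndpointing); // 1400 500
```

In a Script node, each key/value pair of the returned object would be passed to userSessionUtils.setCallControlParam, so the two parameters are always tuned together.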

Digit and RX Number Recognition

Symptom

The bot collects multi-digit numbers (for example, 5–7 digit prescription or account numbers). Callers pause between digits. The bot splits the utterance or misses digits.

Production-Tested Configuration

var recognizerConfig = {
  vendor: 'deepgram',
  model: 'nova-3-medical',   // Use nova-3 for non-medical flows
  language: 'en-US',
  interim: true,
  deepgramOptions: {
    endpointing: 300,
    utteranceEndMs: 1000,
    keyterms: [
      'zero', 'one', 'two', 'three', 'four',
      'five', 'six', 'seven', 'eight', 'nine'
    ],
    numerals: true
  }
};

userSessionUtils.setCallControlParam('dtmfBargein', false);
userSessionUtils.setCallControlParam('listenDuringPrompt', true);

Model Selection

| Model | Best For | Use Case | Status |
|---|---|---|---|
| nova-3 | Best general accuracy | Default IVR flows | Recommended |
| nova-3-medical | Medical/clinical terms, RX numbers | Healthcare, pharmacy, insurance | Recommended |
| nova-2-phonecall | Optimised for telephony audio | Call centre deployments | Stable |
| enhanced | High accuracy, slightly higher latency | Accuracy-critical flows | Stable |

Google ASR and TTS

16-Digit Card Number — Repeated Digit Failure

Symptom

Google ASR fails on sequences with 8 or more consecutive identical digits (for example, 509099999999990002). The bot may barge in early. This is a known Google ASR limitation — DTMF is the reliable alternative for 16-digit card numbers.

Fix

// Recommended: DTMF input for card numbers
userSessionUtils.setCallControlParam('dtmfCollectMaxDigits', 16);
userSessionUtils.setCallControlParam('dtmfCollectInterDigitTimeoutMS', 4000);
userSessionUtils.setCallControlParam('dtmfCollectTermDigit', '#');

// If speech must be used — enable continuous ASR
userSessionUtils.setCallControlParam('continuousASR', true);
userSessionUtils.setCallControlParam('continuousASRTimeoutInMS', 2500);

// Note: AzureSegmentationSilenceTimeoutMs does NOT affect Google ASR.

Google ASR Model Reference

| Model | Description | Use Case | Status |
|---|---|---|---|
| chirp_2 | Best for telephony + accents | General IVR | Recommended |
| chirp_telephony | Optimised for 8 kHz / 16 kHz audio | Telephony deployments | Recommended |
| telephony | Legacy; stable, widely tested | Default | Good default |
| telephony_short | Fast, for short utterances (< 5 s) | Menu / Y-N commands | Recommended |
| long | Batch file processing | Not for real-time IVR | Do not use for IVR |

No Credentials for Google — Label: undefined Error

Symptom

TTS fails with: No text-to-speech service credentials for Google with labels: undefined. This occurs even after setting ttsLabel in the Voice Preferences UI.

Root Cause

The ttsLabel configured in the Voice Preferences UI does not propagate to the TTS engine at runtime in some configurations. The label must be set using a Script node.

Fix

Add a Script node at the start of your Experience Flow:

// TTS configuration
userSessionUtils.setCallControlParam('ttsProvider', 'google');
userSessionUtils.setCallControlParam('ttsLabel', 'your_google_label');
userSessionUtils.setCallControlParam('ttsLanguage', 'en-US');
userSessionUtils.setCallControlParam('voiceName', 'en-US-Chirp3-HD-Aoede');

// ASR configuration (same approach)
userSessionUtils.setCallControlParam('sttProvider', 'google');
userSessionUtils.setCallControlParam('sttLabel', 'your_google_label');

Note: The ttsLabel setting in the Voice Preferences UI does not propagate at runtime for Google TTS. Always set the label using a script as shown above until this is resolved in a future platform update.

Background Streaming — Prompts Not Played

Symptom

When using Google TTS with Chirp voices and Background Streaming enabled at the Experience Flow level, follow-up prompts (for example, "Is there anything else?") are logged as synthesised successfully, but no audio is delivered to the caller.

Fix Options

| Option | Action |
|---|---|
| Option 1: Disable Background Streaming | Go to Experience Flow > Edit > Background Streaming > Disabled. Confirm audio plays correctly. |
| Option 2: Use a non-Chirp voice | Switch to a non-Chirp Google voice such as a WaveNet voice (for example, en-US-Wavenet-F). Confirmed compatible with Background Streaming. |
| Option 3: Use Deepgram TTS | Deepgram TTS has native streaming support and is more stable with Background Streaming enabled. |

Note: Google Chirp voices are not compatible with Background Streaming. This is a known platform issue being tracked for resolution.

Sarvam TTS

TTS Fails on Currency, Commas, and Special Characters

Symptom

Sarvam TTS fails on messages containing the ₹ symbol, commas in Indian number format (for example, 3,40,650.23), or mixed punctuation. The Sarvam API does not natively handle Unicode currency symbols or Indian number formatting.

Fix

Pre-process text in a Script node before passing it to Sarvam TTS:

function sanitiseForSarvam(text) {
  return text
    .replace(/₹([\d,]+\.?\d*)/g, function(m, n) {
      var num = parseFloat(n.replace(/,/g, ''));
      // Approximation: rounds to the nearest whole lakh or thousand
      if (num >= 100000) return (num / 100000).toFixed(0) + ' lakh rupaye';
      if (num >= 1000)   return (num / 1000).toFixed(0)   + ' hazaar rupaye';
      return num + ' rupaye';
    })
    .replace(/[,;]/g, ' ')       // remove commas and semicolons
    .replace(/\.(?!\d)/g, ' ')   // periods not followed by a digit
    .replace(/[()\[\]{}]/g, '')  // brackets
    .replace(/\s{2,}/g, ' ')     // collapse multiple spaces
    .trim();
}

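// Adjust context.llmResponse to the variable that holds your text
context.sarvamInput = sanitiseForSarvam(context.llmResponse || '');
// Reference context.sarvamInput in your Message node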

// Provider configuration
userSessionUtils.setCallControlParam('ttsProvider', 'custom:sarvamTTS');
userSessionUtils.setCallControlParam('voiceName', 'simran');

Note: Sarvam does not support SSML tags. Use plain text only. All text formatting must be handled in the sanitisation script before passing to Sarvam TTS.
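The currency branch of sanitiseForSarvam rounds to the nearest whole lakh or thousand, so ₹3,40,650.23 is spoken as "3 lakh rupaye". Where the exact amount matters (balances, payments), a variant can spell out every group. This is a hypothetical sketch (`rupeesToWords` is an illustrative name) that uses the standard Indian grouping of crore, lakh, and hazaar, and keeps the numeric words in digits as the original function does.

```javascript
// Spells an amount exactly in Indian grouping, e.g.
// 340650.23 -> "3 lakh 40 hazaar 650 rupaye 23 paise".
function rupeesToWords(amount) {
  var paiseTotal = Math.round(amount * 100);   // integer paise avoids float error
  var rupees = Math.floor(paiseTotal / 100);
  var paise = paiseTotal % 100;
  var crore = Math.floor(rupees / 10000000);
  var lakh = Math.floor((rupees % 10000000) / 100000);
  var hazaar = Math.floor((rupees % 100000) / 1000);
  var rest = rupees % 1000;
  var parts = [];
  if (crore) parts.push(crore + ' crore');
  if (lakh) parts.push(lakh + ' lakh');
  if (hazaar) parts.push(hazaar + ' hazaar');
  if (rest || parts.length === 0) parts.push(String(rest));
  var text = parts.join(' ') + ' rupaye';
  if (paise) text += ' ' + paise + ' paise';
  return text;
}

console.log(rupeesToWords(340650.23)); // 3 lakh 40 hazaar 650 rupaye 23 paise
```

To use it, the replace callback in sanitiseForSarvam would return rupeesToWords(parseFloat(n.replace(/,/g, ''))) instead of the rounded strings.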

Background Streaming — Audio Not Played

Symptom

When Background Streaming is enabled at the Experience Flow level with Sarvam TTS, audio synthesis is logged as successful in Interactions, but no audio plays to the caller.

Fix

Disable Background Streaming when using Sarvam TTS. Go to Experience Flow > Edit > Background Streaming > Disabled.

Note: Sarvam TTS is not compatible with Background Streaming. A platform fix is being tracked. Until resolved, Background Streaming must be disabled for any flow using Sarvam TTS.

Call Flow — Barge-in, Transfer, and Transcription

Aggressive Barge-in — Bot Interrupts User Mid-Speech

Symptom

The bot plays filler music in under 3 seconds, cutting off the user’s utterance. The ActionHookDelayProcessor fires too early, especially when there is a language mismatch between the TTS text and the voice configuration.

Fix

// Disable barge-in for specific nodes (legal, OTP, transfer messages)
userSessionUtils.setCallControlParam('node.bargein', false);

// Delay filler music — prevent premature firing
userSessionUtils.setCallControlParam('actionHookDelayInMs', 8000);

// Session-level barge-in sensitivity
userSessionUtils.setCallControlParam('session.bargein', true);
userSessionUtils.setCallControlParam('session.bargeInSensitivity', 'low');

Scenario Guide

| Scenario | Recommended Fix |
|---|---|
| Bot interrupts greetings | Set actionHookDelayInMs=8000 or higher to give the caller time to state their full intent. |
| Barge-in on legal or OTP prompts | Set node.bargein=false on those specific message nodes. |
| Background noise triggers barge-in | Set session.bargeInSensitivity=low. Ensure noise suppression is active in your telephony setup. |
| Transfer message cut short | Use queueCommand: true. See Transfer Fires Before TTS Message Completes. |

Transfer Fires Before TTS Message Completes

Symptom

Agent transfer fires before the farewell or transfer message finishes playing. The caller hears a partial message before being transferred.

Root Cause

A direct Agent Transfer Node placed after a Message Node creates a race condition — the transfer may execute before TTS playback completes.

Fix

Use queueCommand: true in a Script node to ensure TTS completes first:

var transferCmd = {
  type: 'command',
  command: 'redirect',
  queueCommand: true,  // Critical — waits for TTS to complete before transfer
  data: [{
    verb: 'transfer',
    destination: 'sip:agent@your-sip-domain.com'
  }]
};

// Execute transferCmd using your flow's command execution method.

Note: The queueCommand: true parameter tells the Voice Gateway to wait until the current TTS playback is fully completed before executing the transfer. Always use this for any action that follows a voice prompt.

Transcription Not Displaying — Twilio or ASR Timeout

Symptom

The Transcripts tab is empty after completed calls. This has three common root causes: a Twilio webhook misconfiguration, an ASR timeout that is too short, or a ChannelOverrideTemplate that silently overrides the Experience Flow timeout.

Diagnosis and Fix

| Root Cause | How to Diagnose | Fix |
|---|---|---|
| Twilio webhook misconfiguration | The Twilio Media Stream webhook is missing the Kore transcript endpoint. Transcription is enabled, but results are not sent to the platform. | Verify your webhook URL in the Twilio Console includes the correct Kore transcript endpoint for your environment. Contact Kore support to confirm the correct URL. |
| ASR timeout too short | ASR timeout is set too low (for example, 2000 ms). Calls end with NO_INPUT before transcription completes. | Increase botNoInputTimeoutMS to 8000 ms or higher. Verify sttMinConfidence is not set too high. |
| ChannelOverrideTemplate override | Template-level timeout overrides Experience Flow settings silently. | Set botNoInputTimeoutMS explicitly in the ChannelOverrideTemplate itself, not just in the Experience Flow. |

Quick Reference — All Parameters

All parameters are set using:

userSessionUtils.setCallControlParam('paramName', value);
| Parameter | Vendor | Value Range | Purpose |
|---|---|---|---|
| AzureSegmentationSilenceTimeoutMs | Azure ASR | 800–2000 ms | End-of-speech silence. Increase to reduce utterance splits. |
| continuousASR | Azure, Google | true/false | Enable for digit or number collection. |
| continuousASRTimeoutInMS | Azure, Google | 1500–3000 ms | Wait after the last input in continuous mode. |
| sttMinConfidence | All vendors | 0.4–0.7 | Minimum confidence to accept an utterance. Lower = more permissive. |
| sttLanguage | All vendors | BCP-47 code | ASR language (for example, en-US, hi-IN, es-419, ja-JP). |
| sttProvider | All vendors | Provider string | ASR vendor selection (microsoft, google, deepgram). |
| botNoInputTimeoutMS | All vendors | 5000–15000 ms | Wait before the no-input event. Increase for non-English. |
| deepgramUtteranceEndMs | Deepgram | 800–2000 ms | End-of-utterance silence. Pair with deepgramEndpointing. |
| deepgramEndpointing | Deepgram | 300–600 ms | VAD sensitivity. 300 ms = responsive; 600 ms = patient. |
| deepgramNumerals | Deepgram | true/false | Spoken numbers → digits (five hundred → 500). |
| deepgramPunctuate | Deepgram | true/false | Adds punctuation to transcripts. |
| deepgramKeyterms | Deepgram | String array | Boost recognition of specific words. |
| ttsLanguage | All TTS | BCP-47 code | TTS language. Must match voiceName language. |
| ttsProvider | All TTS | Provider string | TTS vendor (microsoft, google, custom:sarvamTTS). |
| ttsLabel | Google TTS | Label string | Must be set using a script, not only in the UI. |
| voiceName | All TTS | Voice identifier | Must match ttsLanguage. |
| node.bargein | All vendors | true/false | Enable or disable barge-in per node. |
| session.bargeInSensitivity | All vendors | low/medium/high | Session-level barge-in sensitivity. |
| actionHookDelayInMs | All vendors | 5000–10000 ms | Delay before filler music fires. |
| dtmfCollectMaxDigits | All vendors | 1–20 | Maximum DTMF digits to collect (card numbers: 16). |
| dtmfCollectInterDigitTimeoutMS | All vendors | 3000–5000 ms | Wait between DTMF digits. |
| dtmfCollectTermDigit | All vendors | # or * | Termination digit for DTMF collection. |
| listenDuringPrompt | All vendors | true/false | Allow ASR to listen while TTS is playing. |
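The numeric ranges in the table above can double as a pre-flight check: run the parameter object you are about to apply through a validator and log anything outside the recommended range. This is a sketch; `PARAM_RANGES` and `checkParams` are hypothetical names, and the ranges are the recommendations from the table, not limits enforced by the platform, so out-of-range values are reported rather than changed.

```javascript
// Recommended numeric ranges copied from the Quick Reference table.
var PARAM_RANGES = {
  AzureSegmentationSilenceTimeoutMs: [800, 2000],
  continuousASRTimeoutInMS: [1500, 3000],
  sttMinConfidence: [0.4, 0.7],
  botNoInputTimeoutMS: [5000, 15000],
  deepgramUtteranceEndMs: [800, 2000],
  deepgramEndpointing: [300, 600],
  actionHookDelayInMs: [5000, 10000],
  dtmfCollectMaxDigits: [1, 20],
  dtmfCollectInterDigitTimeoutMS: [3000, 5000]
};

// Returns one warning per numeric parameter outside its recommended range.
function checkParams(params) {
  var warnings = [];
  Object.keys(params).forEach(function (name) {
    var range = PARAM_RANGES[name];
    if (!range) return;                  // not a numeric parameter in the table
    var value = params[name];
    if (typeof value !== 'number' || value < range[0] || value > range[1]) {
      warnings.push(name + '=' + value + ' is outside ' + range[0] + '-' + range[1]);
    }
  });
  return warnings;
}

console.log(checkParams({ deepgramUtteranceEndMs: 4000, deepgramEndpointing: 500 }).join('\n'));
// deepgramUtteranceEndMs=4000 is outside 800-2000
```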