Automatic Speech Recognition (ASR)
Kore.ai supports the following third-party service providers for ASR services:
ASR | On-Prem / Cloud | Languages | Regions | Word Error Rate (WER) | Comments |
Google | Cloud | https://cloud.google.com/speech-to-text/docs/speech-to-text-supported-languages | 1. https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages 2. https://cloud.google.com/speech-to-text/v2/docs/locations | 4-9% | 1. Good for shorter utterances like ‘yes’ and ‘no’. 2. Good for numeric and alphanumeric inputs (for example, IDs and SSNs). 3. Supports class tokens, so the output can be formatted to some extent. 4. Hints, Hint-hosts are supported. 5. Extensive language support. |
Deepgram | Cloud and On-Prem | https://developers.deepgram.com/docs/models-languages-overview | Supports all regions across the globe | 3.44% | 1. Hints are supported. 2. Custom language models can be prepared with the help of the Deepgram technical team. 3. The smart formatting feature detects the type of input (such as numbers and dates) and returns the transcription in the corresponding format. |
Azure | Cloud and On-Prem | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions?tabs=geographies | 5-10% | 1. Preferred ASR provider. It should be the default for new accounts. 2. Low WER and considerable flexibility with custom models. 3. Hints are supported. 4. Extensive language support. 5. Custom language models can be created and deployed on a self-service basis through the Azure portal. |
Nvidia Riva (Nvidia) | On-Prem | https://docs.nvidia.com/deeplearning/riva/user-guide/docs/asr/asr-overview.html | – | 6.67% | |
AmiVoice ASR (Advanced Media Inc.) | Cloud | https://docs.amivoice.com/en/amivoice-api/manual/supported-languages/ | The API is accessible globally, but processing and data storage for the cloud platform are primarily based in Japan. | N/A | |
Amazon Transcribe | Cloud | https://docs.aws.amazon.com/transcribe/latest/dg/supported-languages.html | https://docs.aws.amazon.com/general/latest/gr/transcribe.html | 2.60% | |
Gnani.AI | Cloud and On-Prem | https://gnani-ai.github.io/API-service/ | With a private cloud or on-premises deployment, Gnani.ai ASR can be deployed in a region of the customer's choice. | 2% | |
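To help interpret the WER column above, the following is a minimal sketch of how word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the ASR output, divided by the number of words in the reference. The function name `word_error_rate` and the lowercase, whitespace-based tokenization are assumptions made for illustration; the table does not state how each vendor's figure was measured, and published figures depend heavily on the test set and text normalization used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: simple lowercase + whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of seven reference words.
wer = word_error_rate(
    "my account number is one two three",
    "my account number is one two tree",
)
print(f"{wer:.2%}")
```

Because the denominator is the reference length, a single misrecognized word in a one-word utterance such as "yes" yields a WER of 100%, which is why accuracy on short utterances is called out separately in the Comments column.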
Text to Speech (TTS)
Kore.ai supports the following third-party service providers for TTS services:
TTS | On-Prem / Cloud | Languages | Regions | Comments |
Google | Cloud | https://cloud.google.com/text-to-speech/docs/list-voices-and-types | Google Cloud TTS is a cloud-based service that operates within Google Cloud’s global infrastructure (https://cloud.google.com/about/locations). | |
Azure | Cloud and On-Prem | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/language-support?tabs=stt | https://learn.microsoft.com/en-us/azure/ai-services/speech-service/regions?tabs=geographies | 1. Extensive language support. 2. An extensive number of voices. 3. Custom voice preparation can be done through the portal. 4. SSML support (limited to MS Azure-supported tags). |
Open AI TTS | Cloud | https://platform.openai.com/docs/guides/text-to-speech/supported-languages | | 1. Human-like voices. 2. Limited number of voices. |
Eleven Labs | Cloud | https://elevenlabs.io/docs/capabilities/text-to-speech | | 1. Human-like voices. 2. Speed, temperature, and stability can be controlled through call control parameters. 3. Voice cloning is possible with voice samples ranging from 30 seconds to 1 minute. |
AWS | Cloud | https://docs.aws.amazon.com/polly/latest/dg/supported-languages.html | https://docs.aws.amazon.com/general/latest/gr/pol.html | |
Gnani.AI | Cloud and On-Prem | https://gnani-ai.github.io/API-service/ | | |
playHT | Cloud and On-Prem | https://play.ht/faq/ | | |
Deepgram | Cloud and On-Prem | https://developers.deepgram.com/docs/language | | 1. Limited number of languages. 2. Human-like voices. |
Nvidia Riva TTS | On-Prem | | | |
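Since the Azure row above notes SSML support limited to Azure-supported tags, the following is a minimal sketch of what an SSML payload for Azure TTS might look like, built with Python's standard library. The helper name `build_ssml` is hypothetical, and the voice name `en-US-JennyNeural` and the `rate` value are example values chosen for illustration; how the resulting string is passed to the platform is not covered by this document and is not shown here.

```python
import xml.etree.ElementTree as ET


def build_ssml(text: str, voice: str = "en-US-JennyNeural",
               lang: str = "en-US", rate: str = "medium") -> str:
    """Build a small SSML document using the speak, voice, and prosody tags."""
    # Root element with the standard SSML namespace and language attributes.
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    # Assumption: the voice name is only an example; use one available in your region.
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    # prosody controls speaking rate; ElementTree escapes special characters in text.
    prosody = ET.SubElement(voice_el, "prosody", {"rate": rate})
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


print(build_ssml("Your balance is 25 dollars & 10 cents.", rate="slow"))
```

Building the markup with an XML library rather than string concatenation keeps characters such as `&` and `<` in the prompt text correctly escaped.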
Voice Biometrics
Kore.ai supports the following third-party service providers for voice biometrics:
Voice Biometric Vendor | Voice Biometric Engine | On-Prem / Cloud | Comments |
ID R&D | ID Voice | – | – |