This document describes the latest feature updates and enhancements introduced in the Voice Gateway of AI for Service (XO) v11.x releases. For earlier updates, see the release notes for the 2025 releases.
IST Generative TTS: Text Streaming Support
IST Generative TTS now supports text streaming, configurable via call control parameters. Enable streaming at the agent node level to reduce latency and improve response delivery. SSML is supported.
Grok Voice Models for Real-Time Voice
The platform now supports the Grok Voice (xAI) model for real-time voice interactions alongside the existing providers. Once configured, it enables real-time audio streaming, response handling, and logging, and it remains backward compatible with the existing providers.
Google Cloud TTS Streaming Support
Google Cloud Text-to-Speech now supports both audio and text streaming. Audio streaming is enabled by default, while text streaming requires the TTS Streaming and Model Response Streaming flags. Streaming works across standard flows, Agent Node, and Agentic App use cases, with seamless playback, defined latency thresholds, fallback handling, and full backward compatibility. Support applies only to HD (Chirp) voices.
Azure TTS Text Streaming Support in Voice Gateway
Voice Gateway now supports Azure TTS text streaming, enabling progressive speech synthesis for real-time LLM responses. Streaming operates over WebSocket with seamless audio continuity, validation controls, and graceful fallback to non-streaming mode when needed.
Deepgram ASR: Flux Model Integration
Voice Gateway now supports the Deepgram Flux ASR model, delivering improved turn detection, lower latency, and better transcription quality. The ASR selection dropdown includes Deepgram ASR-Flux, supporting English across all accents.
Remove Tool-Call Audio in Agentic App
The system no longer plays default music when the Agentic App makes a tool call using real-time voice APIs. The platform remains silent during tool execution until the bot responds, which prevents audio overlap even if the response is delayed.
SIP Trunk Configuration: Same DID Handling
If a SIP trunk uses the same DID with a different IP/FQDN, the system allows it across different accounts or apps, but prompts for confirmation within the same account and app. If both the DID and IP/FQDN match an existing entry, the system blocks creation and displays an error message.
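The same-DID rules above can be read as a small decision table. The following Python sketch is illustrative only and is not the platform's implementation: the `Trunk` record, the `Decision` values, and the `check_new_trunk` function are hypothetical names. It also assumes that an exact DID and IP/FQDN match blocks creation regardless of account or app, which the note does not state explicitly.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"      # create the trunk without any prompt
    CONFIRM = "confirm"  # ask the user to confirm before creating
    BLOCK = "block"      # reject creation and show an error


@dataclass(frozen=True)
class Trunk:
    """Hypothetical record for a SIP trunk entry."""
    did: str
    host: str     # IP address or FQDN
    account: str
    app: str


def check_new_trunk(new: Trunk, existing: list[Trunk]) -> Decision:
    """Apply the same-DID rules to a trunk that is about to be created."""
    same_did = [t for t in existing if t.did == new.did]

    # Assumption: a DID + IP/FQDN match against any existing entry blocks creation.
    if any(t.host == new.host for t in same_did):
        return Decision.BLOCK

    # Same DID, different IP/FQDN, same account and app: confirmation required.
    if any(t.account == new.account and t.app == new.app for t in same_did):
        return Decision.CONFIRM

    # Same DID in a different account or app, or a DID not seen before: allowed.
    return Decision.ALLOW
```

For example, with one existing entry for DID `+15550100` on `sip.example.com` in account `acct1` and app `appA`, adding the same DID on `10.0.0.5` in the same account and app would yield `Decision.CONFIRM`, while adding it in another account would yield `Decision.ALLOW`.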
TTS Streaming at Start Flow Level
TTS Streaming can be configured at the start flow level to reduce voice response latency. The platform maintains a persistent streaming connection throughout the call and adapts playback based on the model’s response streaming behavior.
Continuous Gather at Start Flow Level
A Continuous Gather option is now available at the start flow level when TTS streaming is enabled. Caller input is captured continuously to reduce latency and support agentic voice interactions, without altering default behavior unless configured.
Model Selection for ASR and TTS Providers
The platform now permits model selection for ASR and TTS providers directly from the app-level and Start Flow-level UIs. The selected model is consistently applied across design-time and runtime voice scenarios, including interactions, transcriptions, and monitoring, without requiring call-control parameters.
TTS Providers: Required Language Support
The platform ensures that all natively supported TTS providers offer support for required languages wherever the vendor supports them. This applies to AWS Amazon Polly, Google, Microsoft Azure, ElevenLabs, OpenAI TTS, Deepgram, and similar integrations. The required languages include English, Japanese, Spanish, German, Arabic, French, Hindi, and Filipino.