Transcription converts user speech into text during real-time voice interactions. The text is used for logging, analytics, and agent processing within the platform. Configuring transcription helps:
  • Improve accuracy
  • Reduce latency
  • Handle domain-specific vocabulary
  • Support multilingual deployments
Note:
  • Transcription settings are supported for real-time models from the OpenAI and Azure OpenAI providers.
  • The settings apply across the app.

Enable Transcription Settings

  1. Go to Agent Platform → Orchestrator.
  2. Enable Voice-to-Voice interactions.
  3. Open Voice AI Model Settings.
  4. Scroll to Input Audio Transcription.
  5. Configure the following (by default, the transcription language is set to auto-detect and the prompt is empty):
    • Transcription Language
    • Transcription Prompt
  6. Click Save.
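For real-time OpenAI models, these two fields correspond to the `input_audio_transcription` object in the Realtime API's session configuration. A minimal sketch of that payload is shown below; the exact way the platform maps its UI fields onto this object is an assumption, and the `whisper-1` model name is only an illustrative default:

```python
# Sketch of a Realtime API session update carrying the two UI settings.
# How the platform constructs this payload internally is assumed, not documented.
session_update = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {
            "model": "whisper-1",   # illustrative transcription model
            "language": "en",       # ISO-639-1 code; omit for auto-detect
            "prompt": "",           # empty by default
        }
    },
}
```

Leaving `language` out of the object corresponds to the auto-detect default described above.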

Transcription Configurations

Transcription Language

Specifies the language of user speech.
  • Default: Auto-detect
  • Format: ISO-639-1 (for example, en, hi, ta)
Best Practices
  • Use Auto-detect for global or multi-language apps.
  • Set a specific language when the user base is known; this improves accuracy and lowers latency.
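To illustrate the expected format, the hypothetical helper below (not a platform API) checks that a value looks like a two-letter ISO-639-1 code before it is saved:

```python
def is_iso_639_1(code: str) -> bool:
    """Return True for a two-letter lowercase ISO-639-1-style code (e.g. 'en', 'hi', 'ta')."""
    return len(code) == 2 and code.isalpha() and code.islower()

# Auto-detect is represented by leaving the language unset, not by a code.
print(is_iso_639_1("en"))   # True
print(is_iso_639_1("eng"))  # False: three-letter ISO-639-2 codes are not accepted
```

Note that this checks only the shape of the code, not whether the two letters name a real language.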

Transcription Prompt

Provides context to the ASR (automatic speech recognition) model, which transcribes audio asynchronously. For example, the prompt helps the model recognize:
  • Product names
  • Acronyms
  • Industry-specific terms
This prompt only affects transcription, not conversation responses.
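As an example, a prompt that biases the transcriber toward domain vocabulary might simply list the expected terms. The product names and acronyms below are hypothetical placeholders:

```python
# Hypothetical vocabulary list: the prompt steers transcription only;
# it has no effect on how the agent responds in conversation.
transcription_prompt = (
    "Expect domain terms such as: AcmePay, KYC, IVR, SLA, AutoPay mandate."
)
print(transcription_prompt)
```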