Testing, deployment & operations

Studio provides a complete set of pages for evaluating agent quality, deploying agents to production environments, and monitoring live conversations. These capabilities span the Evaluate and Operate groups in the project sidebar.

Evaluations

Evaluations provide a systematic framework for testing and measuring agent quality. Studio’s evaluations system lets you define test personas, create realistic scenarios, configure automated evaluators, bundle them into evaluation sets, and run evaluations against your agents.

Evaluations page

Navigate to Evaluate > Evaluations from the project sidebar. The evaluations page is organized into five tabs:

Tab	Purpose
Personas	Define synthetic user profiles that simulate real customers
Scenarios	Create test conversation scripts and situations
Evaluators	Configure automated judges that score agent responses
Eval Sets	Bundle personas, scenarios, and evaluators into reusable test suites
Runs	Execute evaluations and review results

A Quick Eval button in the header provides a shortcut for running a fast, ad-hoc evaluation without setting up a full eval set.

Personas

Personas represent the types of users who interact with your agents. Each persona defines a user profile that shapes how test conversations unfold. To create a persona, select the Personas tab, click Create Persona, and fill in:

Name — a descriptive label (e.g., “Frustrated Customer,” “Technical Expert,” “New User”).
Description — background information about this user type.
Traits — behavioral characteristics that influence conversation style (e.g., impatient, detail-oriented, non-technical).
Goals — what this persona is trying to achieve when talking to the agent.
Context — additional background information (account type, history, preferences).

Tip: Create personas that represent your actual user segments. Include edge-case personas (e.g., users with accessibility needs, users who speak in short phrases, users who provide irrelevant information) to test agent robustness.

Scenarios

Scenarios define specific situations or conversation flows to test. To create a scenario, select the Scenarios tab, click Create Scenario, and configure:

Name — a descriptive title (e.g., “Password Reset Request,” “Product Return with Missing Receipt”).
Description — the situation being tested.
Initial message — the opening message that starts the test conversation.
Expected flow — key steps or outcomes the conversation should reach.
Success criteria — what constitutes a successful resolution.
Variables — dynamic data used in the scenario (order numbers, account IDs, etc.).

Evaluators

Evaluators are automated judges that score agent performance on specific dimensions. To create an evaluator, select the Evaluators tab, click Create Evaluator, and configure:

Name — the quality dimension being measured (e.g., “Helpfulness,” “Accuracy,” “Tone”).
Description — what this evaluator assesses.
Scoring rubric — criteria for each score level (e.g., 1-5 scale with descriptions for each level).
Evaluation prompt — the instructions given to the LLM judge for scoring.

Studio provides templates for common evaluation dimensions: Helpfulness, Accuracy, Tone, Completeness, and Efficiency.

Eval sets

Eval sets bundle personas, scenarios, and evaluators into reusable test suites. To create an eval set, select the Eval Sets tab, click Create Eval Set, and configure a name, description, and selections for personas, scenarios, and evaluators. The evaluation engine runs every combination of persona and scenario, scoring each conversation with all selected evaluators.

Tip: Start with a small eval set (2-3 personas, 3-5 scenarios, 2-3 evaluators) to validate your setup before scaling to larger test suites.
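
To get a feel for how quickly an eval set grows, the sketch below enumerates persona-scenario combinations with `itertools.product` and counts the scores that result. The names are placeholders borrowed from the examples on this page, and `plan_run` is a hypothetical helper for sizing a run, not a Studio API.

```python
from itertools import product


def plan_run(personas, scenarios, evaluators):
    """Enumerate the conversations and count the scores an eval set implies.

    Hypothetical sizing helper; not part of Studio.
    """
    # One conversation per persona-scenario combination.
    conversations = list(product(personas, scenarios))
    # Every conversation is scored by every selected evaluator.
    score_count = len(conversations) * len(evaluators)
    return conversations, score_count


personas = ["Frustrated Customer", "Technical Expert", "New User"]
scenarios = ["Password Reset Request", "Product Return with Missing Receipt"]
evaluators = ["Helpfulness", "Accuracy", "Tone"]

conversations, score_count = plan_run(personas, scenarios, evaluators)
print(len(conversations), "conversations,", score_count, "scores")
```

With 3 personas, 2 scenarios, and 3 evaluators this yields 6 conversations and 18 scores, which is why starting small keeps early runs quick.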

Running evaluations

To start a run, select the Runs tab, click Run Evaluation (or use Quick Eval for an ad-hoc run), select the eval set, choose the target agent and environment, and click Start. Each evaluation run follows this pipeline:

Test data generation — creates conversations from persona-scenario combinations.
Conversation execution — simulates conversations with the agent using each persona-scenario pair.
LLM judge evaluation — runs each evaluator against completed conversations to generate scores.
Recommendation generation — analyzes results and produces improvement recommendations.

Active runs display progress indicators showing total conversations to execute, completed vs. remaining, and current pipeline step.
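
As an informal model of what a progress indicator tracks, the sketch below represents the four pipeline steps as an enum and derives the remaining conversation count. The class and field names are assumptions made for illustration; Studio does not document this data shape here.

```python
from dataclasses import dataclass
from enum import Enum


class PipelineStep(Enum):
    # Step order follows the pipeline listed above.
    TEST_DATA_GENERATION = 1
    CONVERSATION_EXECUTION = 2
    LLM_JUDGE_EVALUATION = 3
    RECOMMENDATION_GENERATION = 4


@dataclass
class RunProgress:
    """Hypothetical progress record; field names are assumptions."""
    total_conversations: int
    completed: int
    step: PipelineStep

    @property
    def remaining(self) -> int:
        return self.total_conversations - self.completed

    def summary(self) -> str:
        label = self.step.name.replace("_", " ").lower()
        return (f"step {self.step.value}/{len(PipelineStep)} ({label}): "
                f"{self.completed} done, {self.remaining} remaining")


progress = RunProgress(total_conversations=15, completed=6,
                       step=PipelineStep.CONVERSATION_EXECUTION)
print(progress.summary())
```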

Reviewing results

Completed runs show an overview with overall score averages, score distribution (high, medium, low), and pass/fail rates based on configured thresholds. Drill into individual conversations to see full transcripts, per-evaluator scores with justifications, and highlighted issues.

Comparison view — compare results across multiple runs to track improvement over time or A/B test different agent configurations.
Heatmap view — shows scores across all persona-scenario combinations, making it easy to spot patterns such as a persona that consistently scores low or a scenario causing failures.

Quick Eval provides a streamlined path for fast evaluations: select an agent, choose a few scenarios (or let the system auto-generate them), and run. Quick Eval skips the full eval set configuration and produces results faster, making it useful for iterative development.
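
The overview figures and heatmap described in this section can be approximated from raw scores, as in the sketch below. The record shape, the 1-5 scale, and the 3.5 pass mark are all assumptions chosen for illustration; Studio's thresholds are configurable and its internal aggregation may differ.

```python
from collections import defaultdict
from statistics import mean

# Assumed record shape: (persona, scenario, evaluator, score on a 1-5 scale).
results = [
    ("New User", "Password Reset Request", "Helpfulness", 5),
    ("New User", "Password Reset Request", "Accuracy", 4),
    ("Frustrated Customer", "Password Reset Request", "Helpfulness", 2),
    ("Frustrated Customer", "Password Reset Request", "Accuracy", 3),
]

PASS_THRESHOLD = 3.5  # assumed pass mark, not a Studio default


def heatmap(records):
    """Average score for each persona-scenario cell."""
    cells = defaultdict(list)
    for persona, scenario, _evaluator, score in records:
        cells[(persona, scenario)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


def pass_rate(cell_averages, threshold):
    """Fraction of persona-scenario cells at or above the threshold."""
    passed = sum(1 for avg in cell_averages.values() if avg >= threshold)
    return passed / len(cell_averages)


cells = heatmap(results)
overall = mean(score for *_rest, score in results)
rate = pass_rate(cells, PASS_THRESHOLD)
print(cells, overall, rate)
```

In this sample the "Frustrated Customer" cell averages 2.5 while "New User" averages 4.5, the kind of persona-level pattern the heatmap is meant to surface.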

Deployment & channels

Deployment moves your agents from development into environments where real users can interact with them. Studio manages the deployment pipeline, environment configuration, channel setup, and API key management.

Deployments page

Navigate to Operate > Deployments from the project sidebar. The page is organized into three tabs:

Tab	Purpose
Environments	Manage deployment environments and active deployments
Channels	Configure communication channels (web, voice, API)
API Keys	Create and manage API keys for programmatic access

Environments

Environments represent deployment targets where agents run. Common environments include development, staging, and production. Each environment card displays the environment name, status badge (active, inactive, or deploying), entry agent, and created timestamp.

Creating a deployment:

Select the Environments tab.
Click Deploy Agent.
Configure the environment, entry agent (the initial conversation entry point), and version (defaults to the current working copy).
Click Deploy.

Promoting deployments — to promote a deployment from one environment to another (e.g., staging to production), click the promote action on the deployment card, select the target environment, optionally copy environment-specific variables, and confirm.

Environment variables — each environment can have its own set of configuration variables that override project-level defaults. Use environment variables for API endpoints that differ between staging and production, feature flags, and environment-scoped credentials.
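
The override behavior can be pictured as a layered lookup: an environment's value wins, and anything it does not set falls back to the project default. The sketch below models that with `collections.ChainMap`. The variable names, URLs, and the `resolve` helper are hypothetical and only illustrate the precedence rule.

```python
from collections import ChainMap

# Hypothetical variable names and values, for illustration only.
project_defaults = {
    "ORDERS_API_URL": "https://api.example.com/orders",
    "ENABLE_BETA_TOOLS": "false",
    "REQUEST_TIMEOUT_SECONDS": "30",
}
staging_overrides = {
    "ORDERS_API_URL": "https://staging.example.com/orders",
    "ENABLE_BETA_TOOLS": "true",
}


def resolve(environment_vars, defaults):
    """Environment values win; anything not overridden uses the default."""
    return dict(ChainMap(environment_vars, defaults))


staging = resolve(staging_overrides, project_defaults)
print(staging)
```

Here staging gets its own endpoint and feature flag but inherits the project-level timeout.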

Channels

Channels define how users connect to your deployed agents. Studio supports multiple channel types:

Web channel — deploy a web chat widget that can be embedded in your website. Configure the widget appearance (chat bubble position and color, welcome message, branding options) and copy the embed code snippet.
Voice channel — connect agents to voice interfaces. Configure speech-to-text provider, text-to-speech voice selection, latency targets, and voice interaction parameters.
API channel — expose agents through a REST API. Configure authentication method, rate limits, and response format preferences. Use the provided endpoint and keys for programmatic access.

Each channel card displays the channel type and name, active/inactive status, and configuration summary. Click a card to view or edit its detailed configuration.

API keys

The API Keys tab manages keys used for programmatic access to deployed agents. To create an API key, click Create API Key, then configure a name, permissions scope, and optional expiration date. Copy the generated key immediately, because it is shown only once. From the keys list you can view active keys with usage metadata and revoke keys. To rotate a key, create a new key and then revoke the old one.

Warning: Store API keys securely. Do not embed them in client-side code or commit them to version control.
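
One common way to follow this warning in server-side code is to read the key from an environment variable at run time. The sketch below builds, but does not send, an authenticated request using only the standard library. The variable name `STUDIO_API_KEY`, the `Bearer` scheme, the JSON content type, and any URL you pass in are assumptions; use the endpoint and authentication method shown in your API channel's configuration.

```python
import os
import urllib.request


def build_request(url: str, body: bytes) -> urllib.request.Request:
    """Build (but do not send) an authenticated POST request.

    STUDIO_API_KEY and the Bearer scheme are assumptions, not documented
    Studio behavior.
    """
    api_key = os.environ.get("STUDIO_API_KEY")
    if not api_key:
        raise RuntimeError("Set STUDIO_API_KEY in the environment first")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the key never appears in the source file, the code can be committed safely, and rotating a key only requires updating the environment.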

Deployment pipeline

The deployment pipeline supports a structured flow from development to production:

Development — agents run in a development environment for initial testing.
Staging — pre-production environment for integration testing and validation.
Production — live environment serving real users.

Agent versions can be promoted through these stages, with each promotion creating an auditable record.

Operations

Operations pages provide tools for monitoring, troubleshooting, and intervening in live agent conversations.

Session browser

Navigate to Operate > Sessions from the project sidebar. The session browser shows all conversations between users and agents in your project.

Sessions list — conversations are displayed in a sortable, filterable table with columns for Session ID (click to copy), Agent Name, Created At, Message Count, and Trace Event Count. Use the date range filter (Last 24h, Last 48h, This Week, Last 7 Days, This Month, Last 30 Days, All), column sorting, and pagination (20 per page) to find specific sessions.

The sessions page provides two tabs: Conversations (the session table view) and Traces (a dedicated trace viewer for exploring execution traces across all sessions).

Session detail view — click any session row to open the session detail page:

Conversation tab — full conversation transcript, agent conversation tree visualization showing branching across agents in multi-agent projects, and session summary panel with metadata.
Trace tab — execution trace timeline showing every action the agent took, including LLM calls, tool invocations, handoffs, state changes, and errors. Each event shows timing information and expandable request/response payloads.

Tip: Use the trace tab to diagnose why an agent behaved unexpectedly. Trace events show the complete decision chain, including which tools were called, what the LLM reasoned, and where handoffs occurred.
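
If you export session data and want to reproduce the list view's date filter and paging outside Studio, the logic might look like the sketch below. The record fields and the `recent` and `page` helpers are hypothetical; only the page size of 20 and the "Last 24h" window come from this page.

```python
from datetime import datetime, timedelta, timezone

PAGE_SIZE = 20  # page size stated for the sessions list


def recent(sessions, now, window):
    """Sessions created within `window` before `now`, newest first."""
    cutoff = now - window
    kept = [s for s in sessions if s["created_at"] >= cutoff]
    return sorted(kept, key=lambda s: s["created_at"], reverse=True)


def page(items, number, size=PAGE_SIZE):
    """Return the 1-indexed page `number` of `items`."""
    start = (number - 1) * size
    return items[start:start + size]


now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
# 50 synthetic sessions, one per hour going back from `now`.
sessions = [{"session_id": f"s-{i}", "created_at": now - timedelta(hours=i)}
            for i in range(50)]

last_24h = recent(sessions, now, timedelta(hours=24))
print(len(last_24h), len(page(last_24h, 1)), len(page(last_24h, 2)))
```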

Human-in-the-loop inbox

Navigate to Operate > Inbox from the project sidebar. The inbox consolidates all tasks that require human attention.

Task types:

Type	Description
Approval	A workflow step or agent action requires explicit approval before proceeding
Data Entry	The agent needs information that must be provided by a human operator
Review	Agent output or a decision requires human review before finalization
Decision	A choice point where a human must select the next course of action
Escalation	An agent has escalated an issue that it cannot resolve autonomously

Filter tabs at the top of the inbox let you view all tasks or filter to a specific type. Each tab shows a count badge. Task cards display the title, priority indicator, SLA countdown, task type badge, and timestamp.

Click a task card to expand the action panel: approve/reject approvals, fill in data entry forms, mark reviews as reviewed, select from decision options, or resolve escalations. After responding, the task is removed from the inbox and the associated workflow or agent conversation resumes.

The inbox polls for new tasks every 5 seconds.

Transfer sessions

Navigate to Operate > Transfer Sessions from the project sidebar. This page monitors active agent transfer sessions — conversations being handed off between agents or between an agent and a human representative. The transfer session table displays Session ID, Status, Provider (e.g., SmartAssist, Genesys, NICE, Five9), Channel (Chat, Voice, Email, Messaging), and timestamps.

Status values:

Status	Description
Pending	Transfer initiated, waiting to be picked up
Queued	Transfer is in the queue for the target agent or human
Active	Transfer is in progress
Post-Agent	Transfer completed, in post-processing
Ended	Transfer is complete

Use filter dropdowns (provider, status, channel) to narrow the view. Click a transfer session row to open a detail modal with the full transfer timeline, context and metadata, and management actions.
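
The combined effect of the three filter dropdowns can be sketched as a conjunction of optional conditions. The status names below come from the table above; the session record shape and the `filter_sessions` helper are assumptions made for illustration.

```python
from enum import Enum


class TransferStatus(Enum):
    PENDING = "Pending"
    QUEUED = "Queued"
    ACTIVE = "Active"
    POST_AGENT = "Post-Agent"
    ENDED = "Ended"


def filter_sessions(sessions, provider=None, status=None, channel=None):
    """Keep sessions matching every filter that is set (None means 'any')."""
    def matches(s):
        return ((provider is None or s["provider"] == provider)
                and (status is None or s["status"] == status)
                and (channel is None or s["channel"] == channel))
    return [s for s in sessions if matches(s)]


# Hypothetical records; field names are not taken from Studio.
sessions = [
    {"id": "s-1", "provider": "Genesys", "status": TransferStatus.QUEUED, "channel": "Chat"},
    {"id": "s-2", "provider": "Five9", "status": TransferStatus.ACTIVE, "channel": "Voice"},
    {"id": "s-3", "provider": "Genesys", "status": TransferStatus.ENDED, "channel": "Chat"},
]

queued_genesys = filter_sessions(sessions, provider="Genesys",
                                 status=TransferStatus.QUEUED)
print([s["id"] for s in queued_genesys])
```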

Alerts

Alerts keep you informed about events that require attention across your project. Navigate to Operate > Alerts from the project sidebar.

Approval alerts

The Approvals tab displays workflow steps that are waiting for human approval. This is a focused view of the approval tasks that also appear in the Inbox. Each approval card shows the workflow name, step name, who requested it, when it was requested, and relevant context. Click a card to approve or reject with an optional comment.

Alert rules

The Alert Rules tab lets you configure automated notifications for important events:

Agent errors exceed a threshold — get alerted when an agent’s error rate spikes above a configured percentage.
Session volume changes — notifications for unusual increases or decreases in conversation volume.
SLA breaches — alerts when human-in-the-loop tasks are not responded to within the configured time window.
Evaluation score drops — notifications when evaluation scores fall below a minimum threshold.
Deployment events — alerts when agents are deployed, promoted, or rolled back.

Note: Alert rules are being actively developed. The page provides visibility into the planned notification capabilities.
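
Since alert rules are still under development, the following is only a conceptual sketch of how an error-rate threshold rule could be evaluated. The function names and the choice of a strict "greater than" comparison are assumptions, not Studio behavior.

```python
def error_rate(errors: int, total: int) -> float:
    """Error rate as a percentage; 0.0 when there is no traffic."""
    return 100.0 * errors / total if total else 0.0


def should_alert(errors: int, total: int, threshold_percent: float) -> bool:
    """True when the error rate is strictly above the configured percentage."""
    return error_rate(errors, total) > threshold_percent


print(should_alert(6, 100, 5.0))  # 6% is above a 5% threshold
print(should_alert(5, 100, 5.0))  # exactly 5% does not trigger
```

Guarding against zero traffic avoids a division error and keeps quiet periods from raising spurious alerts.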

Notification channels

When alert rules are configured, notifications can be delivered through:

In-app notifications — displayed within Studio.
Email — sent to configured recipients.
Webhook — sent to external systems for integration with tools like Slack, PagerDuty, or custom dashboards.

Related pages

Agent development — building and testing agents
Tools, knowledge & connections — tools and workflows that generate sessions
Insights & settings — analytics derived from sessions, and project configuration
Studio overview — navigation and project management
