Insights provides analytics and monitoring for your AI agent program. It brings together executive dashboards, evaluation pages, voice channel diagnostics, and a configurable pipeline framework in a single section, giving teams visibility into agent quality, customer sentiment, operational efficiency, and cost performance.
Insight | Purpose
Dashboard | Executive overview with KPIs, trend charts, outcome distribution, ROI metrics, and a filterable conversation list
Analytics | Event volume, LLM performance, cost tracking, and session/trace exploration with granular time ranges
Billing and Usage | Billing-unit consumption reporting with time-range controls for cost governance
Agent Performance | Per-agent quality scorecards with side-by-side comparison, status flagging, and quality trends
Quality Monitor | Aggregated quality health across five evaluation dimensions with trend analysis and issue flagging
Customer Insights | Intent distribution, sentiment trajectory, frustration detection, and resolution tracking
Feedback | End-user ratings and verbatim comments captured from chat sessions, with multi-filter controls
Voice Analytics | Call quality (MOS), ASR accuracy, end-to-end latency, barge-in, and DTMF fallback metrics
Agent Transfer | Efficiency, queue performance, and human-agent metrics for escalated conversations
Pipelines | Built-in and custom analytics pipelines with a visual node-based editor for custom evaluation logic

Before You Begin

Confirm the following before working with Insights pages:
  • You must have at least Viewer-level access to the project.
  • Agent Performance, Quality Monitor, and Customer Insights require analytics pipelines enabled in Settings. Without pipelines, these pages display a placeholder prompting you to enable them.
  • Voice Analytics requires at least one voice channel deployment to generate data. Without a voice deployment, the page is empty.
  • Dashboard KPI cards, such as Quality Score and Avg Sentiment, display a dash (–) until pipelines evaluate sufficient conversation data.

Accessing Insights

Navigation: Project > Sidebar > Insights

The Insights sidebar lists all available pages. Click any page name to navigate directly.

Dashboard

The Dashboard page provides a pre-built executive overview of your AI agent program. It aggregates key performance indicators, trend visualizations, outcome breakdowns, and a conversation-level drill-down into a single view, giving stakeholders immediate visibility into agent performance without any configuration. Use it as the starting point for daily operational checks or to prepare data for leadership reviews.

Navigation: Project > Insights > Dashboard

Date range selector: A toggle in the top-right corner lets you select 7d, 30d (default), or 90d. Changing the range refreshes all KPI cards, charts, and conversation data on the page.

KPI Metric Cards

Six metric cards appear at the top of the page. Each card displays a primary value and, where applicable, a sub-label with supporting context. Warning icons appear on cards where the metric falls below expected thresholds.
Metric | Description
Conversations | Total conversation count in the selected period. The sub-label shows how many conversations pipelines evaluated.
Containment Rate | Percentage of sessions the agent resolved without human escalation. A warning icon appears when the rate drops below the platform-configured threshold. The sub-label shows the resolved count versus the total evaluated.
Quality Score | Aggregated quality score across all evaluated conversations, derived from pipeline evaluations. Displays a dash (–) if no quality pipeline has processed data yet.
Avg Sentiment | Average sentiment score across all conversations in the period. Displays a dash (–) if the sentiment pipeline has not yet run or if the platform lacks sufficient data.
Cost Savings | Estimated cost savings compared to human-handled conversations. A positive value indicates cost-efficiency; a negative value indicates the program has not yet reached cost parity with human support.
Escalation Rate | Percentage of sessions escalated to a human agent. The sub-label shows the escalated count versus the total evaluated.
Tabs

Below the KPI cards, four tabs organize the dashboard’s detailed views:

Tab | What it shows
Overview | A Conversation Volume & Containment Rate trend chart plotting daily conversation count and containment percentage over the selected period. Below the chart, an Outcome Distribution horizontal bar breaks down conversations into three categories: Resolved (green), Contained-unresolved (amber: the AI handled the conversation but resolution remains uncertain), and Escalated (red). Each segment shows its count and percentage.
Trends | Longitudinal trend lines for all core KPIs (conversations, containment, quality, sentiment, cost savings, and escalation) over the selected period. Use this tab to identify sustained improvements or regressions across multiple metrics simultaneously.
ROI | Return-on-investment metrics comparing agent costs to human-handled baselines. Includes cost-per-conversation, total savings, and efficiency ratios.
Conversations | A filterable, sortable list of individual conversations with columns for status, outcome, agent name, duration, and key metrics. Click any row to open the full conversation detail.
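
To make the relationship between the Outcome Distribution segments and the rate cards concrete, here is a minimal Python sketch. It is illustrative only: the label strings and the `outcome_summary` helper are hypothetical, and it assumes that Containment Rate is the resolved count divided by the total evaluated (as the card's sub-label suggests). The platform's actual formula may differ.

```python
from collections import Counter

# Labels mirror the three Outcome Distribution categories (names are hypothetical).
OUTCOMES = ("resolved", "contained_unresolved", "escalated")


def outcome_summary(outcomes):
    """Summarize a list of per-conversation outcome labels."""
    total = len(outcomes)
    if total == 0:
        return None  # nothing evaluated yet, so there is no rate to report
    counts = Counter(outcomes)

    def pct(n):
        return round(100.0 * n / total, 1)

    return {
        "total": total,
        # (count, percentage) per segment, as each bar segment displays
        "distribution": {label: (counts[label], pct(counts[label])) for label in OUTCOMES},
        # Assumption: containment = resolved / total evaluated.
        "containment_rate": pct(counts["resolved"]),
        "escalation_rate": pct(counts["escalated"]),
    }


sample = ["resolved"] * 6 + ["contained_unresolved"] * 3 + ["escalated"]
summary = outcome_summary(sample)
print(summary["containment_rate"], summary["escalation_rate"])
```

With six resolved, three contained-unresolved, and one escalated conversation, the sketch reports a 60.0% containment rate and a 10.0% escalation rate.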

Analytics

The Analytics page monitors event volume, LLM performance, token consumption, and cost in near real time. Unlike the Dashboard, it offers granular time controls down to 30-minute windows, which is useful for investigating production incidents, tracking the impact of model changes, or auditing LLM spend during peak traffic.

Navigation: Project > Insights > Analytics

Time range controls

Analytics supports the most granular time ranges in the Insights section: 30m, 1h, 3h, 6h, 12h, 24h, 2d, 7d, 30d, or a Custom range where you specify exact start and end timestamps. This granularity is especially useful for correlating agent errors or latency spikes with specific deployment events.

Overview Tab

Six metric cards summarize the selected period at a glance:
Metric | Description
Sessions | Total sessions in the selected period. A session represents a single end-to-end interaction between a user and the agent system.
Messages | Total messages across all sessions, including both user messages and agent responses.
LLM Calls | Total LLM API calls agents made during the period. Includes calls to all configured models (for example, routing, generation, evaluation).
Errors | Total errors during agent execution, including LLM failures, timeout errors, and tool invocation errors.
Tokens | Total LLM tokens (input + output) across all calls.
Cost | Estimated cost based on token usage and per-model pricing. Use this to track spend against budgets or compare cost-efficiency across model configurations.
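
The Cost card combines token usage with per-model pricing. The following Python sketch shows one way such an estimate could be computed. The model names, prices, record fields, and the `estimate_cost` helper are all hypothetical placeholders, not the platform's actual pricing table or formula.

```python
# Hypothetical prices in USD per 1,000 tokens: (input price, output price).
PRICING = {
    "model-a": (0.50, 1.50),
    "model-b": (0.10, 0.30),
}


def estimate_cost(calls, pricing=PRICING):
    """Sum the estimated cost of LLM calls from their token counts."""
    total = 0.0
    for call in calls:
        in_price, out_price = pricing[call["model"]]
        total += call["tokens_in"] / 1000 * in_price
        total += call["tokens_out"] / 1000 * out_price
    return round(total, 6)


# Two illustrative call records (field names are assumptions).
calls = [
    {"model": "model-a", "tokens_in": 2000, "tokens_out": 1000},
    {"model": "model-b", "tokens_in": 10000, "tokens_out": 5000},
]
print(estimate_cost(calls))
```

Pricing input and output tokens separately matters because many models charge different rates for each, so two sessions with the same total token count can differ in cost.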
Additional Tabs

Tab | Purpose
LLM Performance | Model-level metrics including per-call latency distributions, average tokens per call, error rates by model, and throughput. Use this tab to benchmark model performance and identify candidates for optimization or replacement.
Sessions Explorer | Browse and filter individual sessions with full conversation details, trace counts, token usage, and duration. Each session row expands into a detailed view with turn-by-turn replay, token breakdown, and model information.
Traces Explorer | Search and inspect individual trace events across sessions. Filter by event type, agent name, or error status to isolate specific execution paths for debugging.
Query | Run custom analytics queries against project event data using a query interface. Useful for ad-hoc investigations that don’t fit pre-built views.
Session Detail View

When you open a session from the Sessions Explorer (or from the Trace Viewer under Evaluate), the platform displays a comprehensive session detail page with the following sections:
Section | Details
Session header | Session ID, agent name, trace status badge (for example, “history / partial”), total traces, total tokens, and session cost in a summary bar at the top.
Conversation pane | Full turn-by-turn dialog replay showing user messages and agent responses. Each agent’s turn displays the responding agent’s name and response latency. Multi-agent sessions show hand-offs between agents.
Session Overview tab | Agent name, session ID, message count, trace event count, connection state (Connected / Disconnected), and timestamps (Started, Finished).
Token Breakdown | Tokens In, Tokens Out, Total Tokens, LLM Calls count, and total Cost in a grid of metric cards.
Models Used | Lists each LLM model the session invoked, with its model identifier and version string.
Trace tabs | Tabbed navigation across: Overview, Traces, Errors, Data, Conversation, Performance, IR (Intermediate Representation), and a Traces download option.
Timeout Diagnostics | Browser idle timeout and Access Token TTL values for the session, useful for diagnosing disconnections.

Billing and Usage

The Billing and Usage page provides a consolidated view of your project’s billing-unit consumption, enabling finance and operations teams to monitor spend, forecast costs, and ensure the project stays within allocated budgets. The platform calculates billing data from materialized processing batches, so there may be a short delay before the most recent usage appears.

Navigation: Project > Insights > Billing and Usage

Use the time range selector to view usage for the last 7 days, 30 days, or 90 days. The page displays aggregated billing-unit counts, breakdowns by resource type (LLM calls, token consumption, pipeline executions), and trend lines showing usage over the selected period.

Agent Performance

The Agent Performance page lets you monitor and compare the quality of every agent in your project across all evaluation dimensions. It surfaces which agents are performing well, which need attention, and how quality trends over time. This is especially useful for multi-agent architectures where different agents handle different conversation types.

Navigation: Project > Insights > Agent Performance

Date range selector: Use the toggle in the top-right corner to select 7d, 30d, or 90d. A Compare button next to the date selector opens a side-by-side agent comparison view.

Agent Health Summary

A banner at the top of the page displays the total number of agents, total conversations evaluated, and a status breakdown showing how many agents the system flags as Critical (red) versus Healthy (green). This gives you an instant read on overall agent health before diving into individual scores.

KPI Metric Cards

Five metric cards show aggregated scores across all agents:
Metric | Description | Scale
Quality | Aggregated quality score across all evaluated conversations. A warning triangle appears if the score falls below the threshold. | 0–5 (avg score)
Hallucination Rate | Percentage of agent responses the system flags for unsupported claims, self-contradictions, or factual inaccuracies. | 0–100% (lower is better)
Knowledge Gaps | Count of conversations where the agent lacked sufficient knowledge base coverage to answer the query. | Count (lower is better)
Safety Score | Guardrail pass rate: the percentage of responses passing all configured safety guardrails. | 0–100% (higher is better)
Context Score | Average score for how well agents preserved relevant conversational context across multi-turn interactions. | 0–5 (avg score)
Agent Table

Below the KPI cards, a searchable, sortable table lists every agent with the following columns:

Column | Description
Agent | Agent name.
Status | Health status: Critical (red badge) or Healthy (green badge), based on aggregate scores.
Conversations | Number of conversations the agent handled in the selected period.
Quality | Agent’s individual quality score (0–5).
Hallucination | Agent’s hallucination rate (%).
Knowledge Gaps | Count of knowledge gap detections for this agent.
Safety | Agent’s guardrail pass rate (%).
Context | Agent’s context preservation score (0–5).
Use the search bar to filter by agent name. Toggle between Critical and All using the filter pills to focus on agents needing immediate attention.

Quality Trend Chart

A time-series chart at the bottom of the page plots two lines, Avg Quality and Flagged, over the selected period. The shaded area between the lines highlights the quality gap, making regressions visually obvious. Hover over any point to see exact values and dates.
This page requires analytics pipelines. Enable pipelines in Settings to start tracking agent quality, hallucination rates, knowledge gaps, and more. Without active pipelines, the page displays a placeholder.

Quality Monitor

The Quality Monitor page provides a centralized health check across all evaluation dimensions. Use it to assess how quality is trending and which dimensions need attention. It aggregates outputs from multiple pipelines into a unified scoring view with trend analysis, dimension-level drill-downs, and issue flagging.

Navigation: Project > Insights > Quality Monitor

Date range selector: Use the toggle to select 7d, 30d, or 90d.

Quality Health Summary

A banner at the top displays the total number of evaluated conversations, the aggregated quality score, and color-coded counts of dimension statuses: Critical (red), Warning (amber), and Healthy (green).

Evaluation Dimension Cards

Five dimension cards appear below the summary banner. Each card shows the dimension name, its current score or percentage, a mini sparkline showing the trend over the selected period, a count of flagged items, and a status icon (warning triangle for dimensions below threshold).
Dimension | Description | Scale | Target
Overall Quality | Aggregated quality score across all evaluated dimensions. | 0–100% | Higher is better
Faithfulness Score | Percentage of responses the system verifies as factually grounded and free of hallucinated content. Flags responses containing unsupported claims, self-contradictions, or fabricated information. | 0–100% | Higher is better
Knowledge Coverage | Percentage of queries where the knowledge base provides sufficient coverage to support the agent’s response. Gaps indicate topics that need additional knowledge base content. | 0–100% | Higher is better
Safety Score | Percentage of responses passing all configured guardrail safety checks. The system flags violations for review. | 0–100% | Higher is better
Context Preservation | Percentage of responses correctly maintaining conversational context across multi-turn sessions. Flagged items indicate where the agent lost or incorrectly applied context. | 0–100% | Higher is better
Quality Trend Chart

A time-series chart plots all five dimensions as separate colored lines (Context, Guardrails, Hallucination, Knowledge Gap, Quality) over the selected period. Use this chart to correlate quality changes across dimensions. For example, a drop in Knowledge Coverage may coincide with a new intent category that the knowledge base doesn’t cover yet.

Dimension Details

Below the trend chart, a Dimension Details section lists individual evaluation results. Each row shows the evaluation name (for example, “Quality Evaluation”), its score, the number of flagged conversations, and a status badge (Warning, Critical, Healthy). Click a row to drill into the specific conversations that contributed to that score.

Customer Insights

The Customer Insights page helps you understand what customers are asking about and how they feel about the experience. It combines intent classification, sentiment scoring, frustration detection, and resolution tracking into a single view to help you identify emerging topics, detect dissatisfaction early, and measure whether the agent resolves the intents it encounters.

Navigation: Project > Insights > Customer Insights

Date range selector: Use the toggle to select 7d, 30d, or 90d.

Customer Sentiment Summary

A banner at the top displays a contextual summary of the analyzed data, for example: “322 analyzed conversations · 144 intent-classified · 322 sentiment-scored · 13 intents detected · 26.7% frustration.” A sentiment indicator (for example, “Mixed Sentiment”) provides a quick qualitative read.

KPI Metric Cards
Metric | Description
Analyzed Conversations | Total conversations that pipelines analyzed in the selected period. The sub-label shows the breakdown.
Unique Intents | Number of distinct intents the system identified across all analyzed conversations.
Avg Sentiment | Average sentiment score across all conversations. A score near 0 indicates neutral sentiment; positive values indicate positive sentiment.
Frustration Rate | Percentage of conversations where the system detected user frustration signals (repeated questions, negative language, escalation requests). A warning triangle appears when the rate exceeds the configured threshold.
Resolution Rate | Percentage of conversations reaching successful resolution. A warning triangle appears when the rate drops below the threshold. The sub-label shows the count of resolved conversations the system evaluated.
Intent Distribution

A horizontal bar chart ranks detected intents by volume. Each bar shows the intent name, count, and percentage of total conversations. The chart groups low-volume intents (typically those below a threshold count) into an “Other” category. The footer indicates the total classified intent assignments and how many low-volume intents the chart grouped.

Sentiment Trajectory

A side-by-side panel displays conversation counts across three sentiment directions:
Direction | Description
Improving | Conversations where sentiment trended positively over the course of the interaction.
Stable | Conversations where sentiment remained consistent throughout.
Declining | Conversations where sentiment deteriorated; these are candidates for investigation.
The footer indicates the total conversations with sentiment data contributing to the trajectory analysis.

Trends Over Time

A section below the distribution charts plots intent volumes and sentiment scores as time-series data, enabling you to track whether specific intents are growing or shrinking and whether sentiment is trending in the right direction.
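
As an illustration of how a conversation might be assigned to one of the three Sentiment Trajectory directions, the Python sketch below compares the average sentiment of the first and second halves of a conversation. This is an assumed heuristic for explanation only. The `classify_trajectory` helper and its `tolerance` dead band are hypothetical, and the platform's sentiment pipeline may use a different method.

```python
from statistics import mean


def classify_trajectory(scores, tolerance=0.1):
    """Label a conversation as Improving, Stable, or Declining.

    scores: per-message sentiment values in chronological order.
    tolerance: hypothetical dead band; smaller shifts count as Stable.
    """
    if len(scores) < 2:
        return "Stable"  # a single message has no direction
    mid = len(scores) // 2
    # Compare the later half of the conversation against the earlier half.
    delta = mean(scores[mid:]) - mean(scores[:mid])
    if delta > tolerance:
        return "Improving"
    if delta < -tolerance:
        return "Declining"
    return "Stable"


print(classify_trajectory([-0.6, -0.2, 0.3, 0.5]))
print(classify_trajectory([0.5, 0.4, -0.3, -0.6]))
```

The first call describes a conversation that starts negative and ends positive, so it is labeled Improving; the second runs the other way and is labeled Declining.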

Feedback

The Feedback page surfaces end-user feedback captured directly from chat sessions. It gives you access to raw ratings and verbatim comments so you can identify satisfaction patterns, detect underperforming agents or channels, and prioritize improvements based on what users say.

Navigation: Project > Insights > Feedback

Date range selector: Use the toggle to select Today, 7d, or 30d.

Filters

Four filter controls let you narrow feedback results:
Filter | Description
All ratings | Filter by feedback rating. Use the dropdown to select a specific star rating or view all ratings.
Comment: any | Filter by comment presence. Choose whether to show all feedback, only entries with comments, or only entries without comments. Entries with comments are often the most actionable for qualitative analysis.
Agent name | Filter by the agent that handled the conversation. Type an agent name to narrow results, useful for isolating feedback for a specific agent after a deployment.
Channel | Filter by the channel through which the conversation occurred. Type a channel name to narrow results.
Click Refresh to reload feedback data with the current filter selections.

Voice Analytics

The Voice Analytics page provides a dedicated dashboard for monitoring call quality, speech recognition accuracy, and end-to-end latency across the voice processing pipeline. For voice-enabled agents, use this page to ensure audio quality, recognition accuracy, and response latency meet caller expectations, and to diagnose degradations before they affect customer satisfaction at scale.

Navigation: Project > Insights > Voice Analytics

Date range selector: Use the toggle to select Today, 7d, or 30d.

KPI Metric Cards
Metric | Description
Total Calls | Number of voice calls in the selected period.
Avg MOS | Average Mean Opinion Score for call quality on a scale of 1–5. Scores below 3.5 typically indicate noticeable quality issues.
ASR Quality | Automatic Speech Recognition quality score (0–100, higher is better). Measures how accurately the ASR engine transcribes the caller’s speech.
E2E Latency | End-to-end latency in milliseconds for the voice processing pipeline. Covers the full round-trip from user speech input through ASR transcription, LLM processing, and TTS output back to the caller.
Barge-In Rate | Percentage of calls where the caller interrupted the agent mid-response. A rising barge-in rate may indicate that responses are too long or latency is too high, prompting callers to cut in.
DTMF Fallback | Percentage of calls falling back to touch-tone (keypad) input, typically when ASR fails to understand the caller. A rising rate may indicate ASR quality issues or unsupported accents/languages.
Trend Charts

Chart | Description
Network Quality and Call Volume | Dual-axis chart plotting MOS scores and call count trends over the selected period. Use this to correlate call quality dips with volume spikes; quality often degrades during peak traffic if capacity-constrained infrastructure cannot keep up.
Speech Recognition Quality (ASR) | ASR quality scores plotted over time. Monitor for sustained degradation, which may indicate noisy caller environments, model drift, or the introduction of new vocabulary that the ASR model does not yet recognize.
Track E2E Latency trends after model or pipeline changes. Even small latency increases (50–100ms) can affect caller experience and drive up barge-in rates.

Agent Transfer

The Agent Transfer page provides efficiency and performance metrics for conversations that AI agents escalated to human operators. It helps operations teams monitor transfer volumes, queue wait times, and human-agent performance to ensure that escalated conversations receive timely, high-quality support.

Navigation: Project > Insights > Agent Transfer

Date range selector: Use the toggle to select Today, 7d, or 30d.

The page organizes data into three sections:
Section | What it shows
Efficiency | Transfer efficiency metrics split by channel: Voice, Chat, and overall Transfers. Includes transfer counts, average handling time, and resolution rates for escalated conversations.
Queue Performance | Queue-level metrics including average wait time, longest wait time, queue abandonment rate, and handling rates per queue. Use this to identify queues that are understaffed or experiencing unusual demand.
Agent Performance | Human agent performance metrics for transferred conversations, including conversations handled, average handle time, resolution rate, and customer satisfaction scores per agent.

Pipelines

The Pipelines page is where Insights shifts from pre-built dashboards to user-defined analytics. The platform ships with built-in pipelines covering common evaluation needs, from sentiment analysis to anomaly detection, and provides a visual node-based editor for creating custom pipelines that encode your organization’s specific quality criteria. Pipelines are the engine behind Agent Performance, Quality Monitor, and Customer Insights: without active pipelines, those pages have no data to display.

Navigation: Project > Insights > Pipelines

Each pipeline card displays its name, description, enabled/disabled status (green “Enabled” badge or gray “Disabled” badge), trigger count, and last processed timestamp (or “Never processed” if the pipeline hasn’t run yet). Use the search bar at the top to find pipelines by name.

Tabs
Tab | Purpose
Built-in | Pre-configured pipelines that ship with the platform, ready to enable with a single toggle.
Custom | User-defined processing workflows created using the visual pipeline editor.
Recent Runs | Pipeline execution history with timestamps, durations, status (success/failure), and links to output data.
Data | Pipeline output data available for dashboard integration, export, and downstream consumption.

Built-in Pipelines

The platform ships with eleven pre-built pipelines covering evaluation, classification, detection, and monitoring needs. Enable each pipeline with a single toggle; the defaults require no configuration.
Pipeline | Description
Sentiment Analysis | Per-message sentiment scoring with conversation-level trajectory analysis. Feeds the Sentiment Trajectory chart in Customer Insights.
Intent Classification | Classifies conversation intent using LLM analysis with customer-defined taxonomy or auto-discovery. Feeds the Intent Distribution chart in Customer Insights.
Quality Evaluation | LLM-as-judge quality evaluation with configurable rubric dimensions. Feeds the Quality Score in Agent Performance and Quality Monitor.
Hallucination Detection | Detects unsupported claims, self-contradictions, and factual accuracy issues in agent responses.
Knowledge Gap Analysis | Identifies gaps in knowledge base coverage by analyzing retrieval precision and uncovered topics.
Guardrail Analysis | Evaluates guardrail effectiveness: detects false positives, false negatives, and bypass attempts.
Context Preservation | Evaluates whether agents preserve relevant user and workflow context through the conversation.
Friction Detection | Detects user frustration signals: rephrased questions, message escalation, caps, and exclamation patterns.
Anomaly Detection | Monitors analytics metrics for statistical anomalies using z-score and SPC (Statistical Process Control) charts.
Drift Detection | Monitors analytics metrics for gradual performance drift by comparing baseline and current windows.
Evaluation Run | Executes persona × scenario × evaluator matrix evaluation with bias mitigation and trajectory scoring.
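
For readers unfamiliar with the statistics behind Anomaly Detection and Drift Detection, the Python sketch below shows the two basic ideas: flagging points by z-score, and comparing a baseline window with a current window. It is a generic illustration, not the platform's implementation. The function names, the threshold of 3.0, and the sample data are assumptions, and the SPC charts mentioned above are not covered.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=3.0):
    """Return indices of values whose |z-score| exceeds the threshold."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


def relative_drift(baseline, current):
    """Relative change of the current window's mean versus the baseline's."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean is zero; relative drift is undefined")
    return (mean(current) - base) / abs(base)


# Hypothetical daily error counts with one obvious spike on the last day.
daily_errors = [10, 12, 11, 9, 10, 11, 10, 12, 9, 11, 10, 60]
print(zscore_anomalies(daily_errors))

# Hypothetical quality scores: an earlier baseline week versus the latest week.
print(relative_drift([4.0, 4.2, 4.1], [3.6, 3.7, 3.5]))
```

Z-scores catch sudden spikes relative to recent variation, while the window comparison catches slow shifts that never produce a single extreme point. That difference is why the two checks are listed as separate pipelines.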

Custom Pipelines

Custom pipelines let you define your own analytics logic using a visual, drag-and-drop node-based editor. Use custom pipelines to build organization-specific evaluation criteria that go beyond the built-in set; for example, regulatory compliance checks, brand voice adherence, or domain-specific accuracy scoring.

How the editor works

The editor presents a visual canvas where you construct a pipeline as a directed graph of connected nodes. A Node Palette on the left side provides draggable node types organized into categories:
Category | Node types
Data | Read Conversation, Read Message Window, Aggregate, Database Query, Filter, Transform Data, Inspect Output
Logic | Sub-Pipeline (reuse an existing pipeline as a step), Wait for Event (pause execution until a named event fires)
The core pattern: define a trigger, connect one or more processing nodes, then wire the output to a storage or metrics node.

Pipeline structure
Component | Description
Trigger | Defines when the pipeline runs. Trigger types include Kafka events, scheduled intervals, in-platform events (for example, conversation completed), or filtered subsets of sessions.
Processing nodes | The evaluation and transformation logic applied to each triggered item. Nodes can read conversation data, query databases, aggregate metrics, filter records, transform data shapes, or call LLMs for evaluation.
Metrics output | The results the pipeline produces. Metrics take the form of named, typed values (counts, percentages, scores) that appear in the Data tab and that you can wire into dashboards.
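
The trigger, processing nodes, and metrics output form a directed graph. The Python sketch below models such a graph as a plain dictionary and derives a valid execution order with the standard library's `graphlib` module (Python 3.9+). The node names and the dictionary layout are hypothetical and do not reflect the editor's actual storage format; the sketch only shows why a pipeline graph must be free of cycles.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each node maps to the set of nodes it depends on.
pipeline = {
    "trigger:conversation_completed": set(),
    "read_conversation": {"trigger:conversation_completed"},
    "filter": {"read_conversation"},
    "llm_evaluate": {"filter"},
    "metrics_output": {"llm_evaluate"},
}


def execution_order(graph):
    """Return nodes in dependency order, or raise ValueError on a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        raise ValueError(f"pipeline graph has a cycle: {exc.args[1]}") from exc


order = execution_order(pipeline)
print(" -> ".join(order))
```

A check like this is one kind of configuration error that a validation step could plausibly catch: if two nodes depended on each other, no execution order would exist.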
Pipeline lifecycle

Pipelines follow a Draft → Validate → Test → Activate lifecycle. The editor toolbar shows the current state (Draft badge, “Validation passed” indicator) and provides buttons for Test, Validate, Save, and Activate. You can iterate on draft pipelines without disrupting live ones, save drafts as you go, validate to catch configuration errors before deployment, and run test executions to verify output before activating.

Sub-pipelines

You can reference one pipeline as a step inside another, enabling reuse and composability. For example, a “Compliance Evaluation” pipeline could call both the built-in “Guardrail Analysis” and a custom “Regulatory Wording Check” as sub-steps.

Attaching to dashboards

Once a custom pipeline produces data, its metrics appear alongside built-in metrics in Agent Performance, Quality Monitor, and Customer Insights, giving teams a single pane of glass across both standard and organization-specific evaluation.
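
The Draft → Validate → Test → Activate lifecycle can be pictured as a small state machine. The Python sketch below is a conceptual model only. The state names, the allowed transitions (including returning to Draft after an edit), and the `advance` helper are assumptions made for illustration, not the platform's API.

```python
from enum import Enum


class State(Enum):
    DRAFT = "draft"
    VALIDATED = "validated"
    TESTED = "tested"
    ACTIVE = "active"


# Assumed transitions: move forward one stage at a time, or fall back to
# DRAFT when the pipeline is edited again before activation.
ALLOWED = {
    State.DRAFT: {State.VALIDATED},
    State.VALIDATED: {State.TESTED, State.DRAFT},
    State.TESTED: {State.ACTIVE, State.DRAFT},
    State.ACTIVE: set(),
}


def advance(current, target):
    """Return the target state if the transition is allowed, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


state = State.DRAFT
for step in (State.VALIDATED, State.TESTED, State.ACTIVE):
    state = advance(state, step)
print(state.value)
```

In this model a draft cannot jump straight to active, which mirrors the validate-before-activate workflow described above.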
Start with the built-in pipelines to establish baselines, then create custom pipelines for organization-specific quality dimensions. The validate-before-activate workflow catches configuration errors before pipelines go live.