The Runtime is the execution engine of Agent Platform. It receives messages from users and systems, executes agent logic, invokes tools, manages conversation state, and returns responses. Every agent interaction — regardless of channel or deployment environment — passes through the Runtime.

How a Message Executes

When a user sends a message, the Runtime processes it through a structured pipeline before returning a response. Within that pipeline, the reasoning loop (the LLM requests a tool call, the Runtime executes it and feeds the result back) repeats until the agent produces a final text response or reaches the iteration limit. The default limit is 10 tool-call iterations per turn, configurable in the agent’s execution settings.
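The loop and its iteration limit can be sketched in Python as follows. This is an illustration, not the Runtime's implementation: `run_turn`, the shape of the LLM's return value, and the message returned when the limit is hit are all assumptions made for the example.

```python
from typing import Callable, Dict, List, Union

MAX_TOOL_ITERATIONS = 10  # default per-turn limit stated above

# Hypothetical shape: the "LLM" returns either final text (str)
# or a tool request (dict with "tool" and "args" keys).
LLMStep = Union[str, Dict[str, object]]


def run_turn(
    llm: Callable[[List[dict]], LLMStep],
    tools: Dict[str, Callable[..., object]],
    history: List[dict],
    max_iterations: int = MAX_TOOL_ITERATIONS,
) -> str:
    """Loop until the model returns final text or the tool-call limit is hit."""
    for _ in range(max_iterations):
        step = llm(history)
        if isinstance(step, str):
            return step  # a final text response ends the turn
        # Otherwise run the requested tool and feed the result back.
        result = tools[step["tool"]](**step["args"])
        history.append({"role": "tool", "name": step["tool"], "content": result})
    # Assumption: what the Runtime actually returns at the limit is not documented here.
    return "[iteration limit reached]"
```

A stub model that requests a tool twice and then answers would append two tool results to `history` before the function returns the final text.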

Channels

The Runtime accepts messages from the following channels.

Realtime (WebSocket or persistent connection):

| Channel | Description |
| --- | --- |
| Web Chat | Browser-based chat widget |
| SDK | JavaScript/mobile SDK over WebSocket |
| API | Direct REST API call |
| Voice | Generic voice, Twilio, LiveKit, and real-time voice pipeline |
| A2A | Agent-to-Agent protocol for cross-service agent calls |

Async / webhook (inbound event, outbound REST):

| Channel | Description |
| --- | --- |
| WhatsApp | Meta Cloud, Gupshup, Infobip, Netcore |
| Slack | Slack app |
| Microsoft Teams | MS Teams bot |
| Messenger | Meta Messenger |
| Instagram | Instagram Direct |
| Telegram | Telegram bot |
| SMS | Twilio SMS |
| Email | Inbound email |
| Zendesk | Zendesk ticket and chat |
| Genesys | Genesys Cloud contact center |
| HTTP Async | Generic webhook-based integration |

Channel connections are configured per project in Studio under Settings → Connections. Each connection maps a channel to its authentication credentials, routing rules, and channel-specific options.

Tool Execution

Tools extend agent capabilities. When the LLM requests a tool call, the Runtime dispatches it to the appropriate executor and returns the result to the reasoning loop.

| Tool type | What it does |
| --- | --- |
| HTTP | Calls an external REST or GraphQL API with optional auth injection |
| MCP | Connects to a Model Context Protocol server; tools are discovered from the server’s capability manifest |
| Code | Executes JavaScript or Python in an isolated sandbox |
| Connector | Uses a named integration (Salesforce, Jira, and others) with credential injection |
| Workflow | Invokes a registered workflow; supports both synchronous and long-running async execution |
| Knowledge Base | Queries a SearchAI knowledge base and returns ranked results |
| Async Webhook | Sends a request to an external system and suspends the session until a callback is received |

Execution Pipeline

When a tool executes, the Runtime processes it in sequence:
  1. Resolves the tool binding from the deployment configuration.
  2. Validates inputs against declared parameter types before the call.
  3. Makes the external call with the appropriate authentication.
  4. Processes the result — available in conversation context for reasoning agents, or as session variables for agents with steps.
  5. Handles errors using ON_ERROR handlers: retry logic, fallback responses, or escalation triggers.

Tool execution also enforces timeouts, categorizes errors, and retries automatically on transient failures. For setup and configuration, see Tools Overview.
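The validate, call, retry, and error-handler steps above might look roughly like this in Python. It is a sketch under stated assumptions: `TransientError`, `validate`, `execute_tool`, the retry count, and the linear backoff are hypothetical names and choices made for illustration, and the `on_error` callback only stands in for an ON_ERROR handler.

```python
import time
from typing import Callable, Dict


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (for example, a timeout)."""


def validate(args: Dict[str, object], schema: Dict[str, type]) -> None:
    """Check that each declared parameter is present and has the declared type."""
    for name, expected in schema.items():
        if name not in args:
            raise ValueError(f"missing parameter: {name}")
        if not isinstance(args[name], expected):
            raise TypeError(f"{name} must be {expected.__name__}")


def execute_tool(
    call: Callable[..., object],
    args: Dict[str, object],
    schema: Dict[str, type],
    on_error: Callable[[Exception], object],
    max_attempts: int = 3,      # assumption: the real retry count is not documented here
    backoff_s: float = 0.0,     # assumption: simple linear backoff
) -> object:
    validate(args, schema)  # inputs are checked before any external call
    for attempt in range(1, max_attempts + 1):
        try:
            return call(**args)
        except TransientError:
            if attempt < max_attempts:
                time.sleep(backoff_s * attempt)  # wait, then retry
        except Exception as exc:
            return on_error(exc)  # non-transient failure goes straight to the handler
    return on_error(TransientError("retries exhausted"))
```

Validation errors are raised rather than routed to `on_error` in this sketch, so a malformed request never reaches the external system.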

Session Management

Each conversation is represented as a session. A session stores the conversation history, variables, agent state, and execution metadata.

Session Lifecycle

| State | Description |
| --- | --- |
| Active | Processing a request |
| Idle | Waiting for the next user message |
| Completed | Conversation ended normally |
| Failed | Terminated due to a runtime error |
| Abandoned | Ended due to user inactivity |
| Escalated | Transferred to a human agent |
| Archived | Retained for historical reference |

Sessions time out after 24 hours of inactivity. Conversation history is retained for 90 days.
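For client-side bookkeeping, the states and the inactivity timeout can be modeled as below. This is only a sketch: `SessionState` and `has_timed_out` are hypothetical names, the state values are the labels from the table, and which state a timed-out session moves to is not specified here.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum

SESSION_TTL = timedelta(hours=24)  # inactivity timeout stated above


class SessionState(Enum):
    ACTIVE = "Active"
    IDLE = "Idle"
    COMPLETED = "Completed"
    FAILED = "Failed"
    ABANDONED = "Abandoned"
    ESCALATED = "Escalated"
    ARCHIVED = "Archived"


def has_timed_out(last_activity: datetime, now: datetime) -> bool:
    """True once at least 24 hours have passed since the last activity.

    Assumption: the boundary is treated as inclusive; the Runtime may differ.
    """
    return now - last_activity >= SESSION_TTL


# Example: a session last active exactly one day ago is considered timed out.
last_seen = datetime(2024, 1, 1, tzinfo=timezone.utc)
expired = has_timed_out(last_seen, last_seen + timedelta(hours=24))
```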

Conversation Window and Compaction

The Runtime maintains a sliding window over the conversation history to control how much context is sent to the LLM on each turn. The default window is 40 messages. When the window fills, the Runtime can compact older turns into a summary rather than discarding them. This preserves context from earlier in long conversations without increasing token usage. Compaction is disabled by default and can be enabled in Runtime Config in Studio.

| Setting | Default | Description |
| --- | --- | --- |
| Conversation window | 40 messages | Maximum messages sent to the LLM per turn |
| Compaction | Disabled | When enabled, summarizes older turns as the window fills |
| Compaction threshold | 80% | How full the window must be before compaction triggers |
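With the defaults, compaction would trigger at 80% of 40 messages, that is, at 32 messages. The sketch below illustrates the difference between a plain sliding window and a compacted one. `build_context`, the `summarize` callback, and the choice to keep half of the trigger count verbatim are assumptions for the example; the Runtime's actual compaction policy is not described here, and this version recomputes the summary on every call rather than storing it.

```python
from typing import Callable, List, Optional

WINDOW_SIZE = 40             # default conversation window
COMPACTION_THRESHOLD = 0.80  # default compaction threshold


def build_context(
    history: List[str],
    summarize: Optional[Callable[[List[str]], str]] = None,
    window: int = WINDOW_SIZE,
    threshold: float = COMPACTION_THRESHOLD,
) -> List[str]:
    """Return the messages to send to the LLM for this turn."""
    if summarize is None:
        # Compaction disabled: messages older than the window are dropped.
        return history[-window:]
    trigger = round(window * threshold)  # 32 with the defaults
    if len(history) < trigger:
        return list(history)
    # Assumption: keep the most recent half of the trigger count verbatim
    # and replace everything older with a single summary message.
    keep = trigger // 2
    older, recent = history[:-keep], history[-keep:]
    return [summarize(older)] + recent
```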

Concurrency

The Runtime uses a configurable strategy to handle multiple messages arriving within the same session:

| Strategy | Behavior | When to use |
| --- | --- | --- |
| Serial (default) | Messages are queued and processed one at a time, in order. | Most conversational agents; ensures each message has full context from the previous one. |
| Preemptive | A new message cancels in-progress execution and starts fresh. | Real-time interfaces where the user may correct themselves mid-response. |
| Parallel | Multiple messages process simultaneously. | Batch operations where messages are independent. |
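The behavioral difference between the three strategies can be shown with a small asyncio simulation. This is a sketch, not how the Runtime schedules work: `handle` is a hypothetical stand-in for one agent turn, and the messages are assumed to arrive back to back.

```python
import asyncio
from typing import List


async def handle(message: str, log: List[str]) -> None:
    """Stand-in for one agent turn; the sleep simulates LLM and tool latency."""
    await asyncio.sleep(0.01)
    log.append(message)


async def run_serial(messages: List[str], log: List[str]) -> None:
    for message in messages:
        await handle(message, log)  # each turn finishes before the next starts


async def run_preemptive(messages: List[str], log: List[str]) -> None:
    current = None
    for message in messages:
        if current is not None and not current.done():
            current.cancel()  # a newer message cancels in-progress work
        current = asyncio.create_task(handle(message, log))
        await asyncio.sleep(0)  # let the task start before the next arrival
    if current is not None:
        await current  # only the latest message runs to completion


async def run_parallel(messages: List[str], log: List[str]) -> None:
    await asyncio.gather(*(handle(m, log) for m in messages))


serial_log: List[str] = []
asyncio.run(run_serial(["a", "b", "c"], serial_log))

preemptive_log: List[str] = []
asyncio.run(run_preemptive(["a", "b", "c"], preemptive_log))
```

In this simulation the serial run records all three messages in order, while the preemptive run records only the last one because the earlier turns are cancelled before they finish.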

Multi-Agent Orchestration

The Runtime executes multi-agent topologies defined in ABL. When routing rules match, the Runtime transitions the active thread to the target agent, forwards context, and manages the return path.

| Pattern | What happens at runtime |
| --- | --- |
| Supervisor | Receives every message; evaluates HANDOFF rules top-to-bottom; routes to first match |
| Handoff | Transfers conversation to the target agent; optionally returns control when RETURN: true |
| Delegate | Sends a task to a sub-agent; blocks the parent until the sub-agent completes or times out |
| Fan-out | Dispatches multiple agents in parallel; merges results when all complete |
| Escalation | Transfers the conversation to a human agent via a connected agent desktop |
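The Supervisor's top-to-bottom, first-match evaluation is easy to get wrong when rules overlap, so a small Python model may help. This is not ABL syntax: `route`, the rule tuples, and the agent names are hypothetical and exist only to show how ordering changes the outcome.

```python
from typing import Callable, List, Optional, Tuple

# A rule pairs a condition with the name of the target agent (both illustrative).
Rule = Tuple[Callable[[str], bool], str]


def route(message: str, rules: List[Rule]) -> Optional[str]:
    """Return the target of the first rule whose condition matches, else None."""
    for condition, target in rules:
        if condition(message):
            return target
    return None


rules: List[Rule] = [
    (lambda m: "refund" in m.lower(), "billing_specialist"),
    (lambda m: "order" in m.lower(), "orders_specialist"),
]

# "Refund my order" matches both rules; the one listed first wins.
first_choice = route("Refund my order", rules)
# Reversing the list sends the same message to the other agent.
reversed_choice = route("Refund my order", list(reversed(rules)))
```

Placing the narrower condition above the broader one is what keeps a catch-all rule from capturing requests meant for a specialist.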

Thread Hierarchy

When a supervisor hands off to a specialist, the Runtime creates a thread within the existing session — not a new session. Threads form a stack: handoffs push new threads, completions pop back to the parent. The user experiences one continuous conversation regardless of how many agents participate. Each thread maintains its own conversation history and gathered variables, but can read data from parent threads. For configuration, syntax reference, and examples, see Multi-Agent Orchestration.
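The thread stack described above can be modeled with a few lines of Python. The class and method names (`Thread`, `Session`, `handoff`, `complete`, `read`) are hypothetical, and the lookup order for parent data (nearest thread first) is an assumption; the sketch only illustrates push on handoff, pop on completion, and read access to parent threads.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Thread:
    agent: str
    history: List[str] = field(default_factory=list)         # per-thread history
    variables: Dict[str, Any] = field(default_factory=dict)  # per-thread variables


class Session:
    def __init__(self, root_agent: str) -> None:
        self.stack: List[Thread] = [Thread(root_agent)]

    @property
    def active(self) -> Thread:
        return self.stack[-1]

    def handoff(self, agent: str) -> Thread:
        """Push a new thread for the target agent; the session stays the same."""
        self.stack.append(Thread(agent))
        return self.active

    def complete(self) -> Thread:
        """Pop back to the parent thread (the root thread is never popped)."""
        if len(self.stack) > 1:
            self.stack.pop()
        return self.active

    def read(self, name: str) -> Any:
        """Look up a variable in the active thread, then in its parents."""
        for thread in reversed(self.stack):
            if name in thread.variables:
                return thread.variables[name]
        raise KeyError(name)
```

For example, a variable gathered by the supervisor thread stays readable after `handoff("billing")`, while the billing thread starts with its own empty history.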

Observability

Every execution path emits structured trace events. Traces are accessible from the Sessions page in Studio.

| Event type | What it captures |
| --- | --- |
| llm_call | LLM invocation: model used, token count, latency |
| tool_call | Tool invocation request with input parameters |
| tool_result | Tool execution result or error |
| decision | Routing or flow decision outcome |
| handoff | Agent-to-agent transfer |
| state_change | Session variable update |
| guardrail_eval | Guardrail policy evaluation result |
| error | Runtime or execution error |

Each session also captures summary metrics: total messages, runtime cost, token consumption, response latency, tool latency, and model usage breakdown. For details on navigating sessions and traces in Studio, see Sessions and Runtime Diagnostics.
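If trace events are available outside Studio as a list of dictionaries, a filter for failures could look like the sketch below. The event shape is an assumption: the `type` values come from the table above, but the `status` field and the export itself are hypothetical and not described in this document.

```python
from typing import Any, Dict, List

Event = Dict[str, Any]


def find_failures(events: List[Event]) -> List[Event]:
    """Return error events and tool results whose status is not 200."""
    failures = []
    for event in events:
        if event.get("type") == "error":
            failures.append(event)
        elif event.get("type") == "tool_result" and event.get("status", 200) != 200:
            # Assumption: tool results carry an HTTP-style "status" field.
            failures.append(event)
    return failures


sample = [
    {"type": "llm_call", "model": "example-model"},
    {"type": "tool_result", "status": 200},
    {"type": "tool_result", "status": 503},
    {"type": "error", "message": "tool timeout"},
]
failed = find_failures(sample)
```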

Rate Limits

The Runtime enforces per-tenant rate limits on a rolling 1-minute window.

| Dimension | Default limit |
| --- | --- |
| Requests | 100 / min |
| LLM tokens | 100,000 / min |
| Concurrent sessions | 50 |
| Tool calls | 200 / min |
| Messages per session | 30 / min |

When a limit is exceeded, the Runtime returns HTTP 429. The response includes a Retry-After header (seconds to wait) and X-RateLimit-Remaining. Contact your account team to adjust limits for your plan.
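A client can honor the Retry-After header with a small wrapper like the one below. It is a sketch under assumptions: `send_with_retry` and the `send` callable (which returns a status code and a header mapping) are hypothetical, the attempt cap and the 1-second fallback are arbitrary, and header lookup is assumed to use the exact name `Retry-After`.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def send_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,                       # assumption: arbitrary cap
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(), waiting for Retry-After seconds whenever it returns 429."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        # Retry-After is given in seconds; fall back to 1 second if it is absent.
        sleep(float(headers.get("Retry-After", "1")))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays; in production code the default `time.sleep` applies.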

Limits Reference

| Limit | Value |
| --- | --- |
| Request body size | 1 MB |
| WebSocket message size | 512 KB |
| Tool iterations per turn | 10 (default) |
| Conversation window | 40 messages |
| Session TTL (inactivity) | 24 hours |
| Conversation history retention | 90 days |

Troubleshooting

Agent stops responding mid-conversation

Open the session in Studio and check the Traces tab. Look for an error event or a tool_result with a non-200 status. Tool failures that aren’t handled by an ON_FAILURE block can cause the reasoning loop to stall.

Earlier context is missing in long conversations

Once the conversation window fills (40 messages by default), older messages are no longer sent to the LLM. Enable compaction in Runtime Config to preserve earlier context as a summary rather than dropping it.

Tool calls return unexpected results

Check the tool_call and tool_result trace events in Studio for the exact request sent and response received. Verify that the tool’s auth profile and endpoint are correct in Settings → Connections.

HTTP 429 (Too Many Requests)

The Retry-After header tells you how many seconds to wait before retrying. If you hit limits consistently during evaluation, check whether multiple concurrent test sessions are sharing the same tenant quota.

Agents route to the wrong specialist

HANDOFF rules evaluate top-to-bottom and the first match wins. Check rule ordering in the Supervisor: overly broad conditions placed above specific ones will capture requests before the intended rule is reached. See Orchestration Troubleshooting.
