Audience: Enterprise CTOs/VP Engineering, AI/ML leads evaluating agent platforms
Objection addressed: “Every platform claims the same features. Why is Kore’s depth different?”
Executive Summary
Every agent platform can demo the same features: tool calling, guardrails, multi-agent orchestration, observability. The demo is the surface. Beneath it, each capability has 3-4 levels of depth that only emerge in production — when your guardrails need to cascade across cost tiers, when your tool needs OAuth token rotation, when your agent hands off mid-conversation and the session state needs to survive, when 10 teams deploy 50 agents with different compliance policies. Most platforms stop at depth level 1 or 2. Production lives at level 4.
Kore’s agent platform goes to level 4 across every dimension: a compilable YAML DSL that spans flow and reasoning paradigms (BUILD), 20+ runtime primitives for orchestration, memory, knowledge, and execution intelligence (ORCHESTRATE), multi-protocol tool management with credential lifecycle (DEPLOY), a 3-tier guardrail cascade with 5 action types and per-agent policy scoping (GOVERN), decision-level traces with span trees and session replay (OBSERVE), and project-isolated multi-team deployment with versioning (SCALE). Open-source frameworks give you depth in orchestration but leave you to build governance, observability, and compliance from scratch. Enterprise incumbents give you ecosystem depth but bolt agents onto platforms designed for CRM and productivity. AI-native startups give you managed depth you can’t see, configure, or own.
And the platform is AI-programmable by design. Arch, Kore’s built-in AI architect, and open MCP/CLI/SDK interfaces let any AI assistant — Claude Code, Cursor, Copilot — design, build, test, deploy, and debug agents through the platform’s abstractions, not around them. AI programming 15 lines of ABL with compiler validation is fundamentally more reliable than AI generating 400 lines of framework glue code with no safety net. The question isn’t whether AI can build agents — it can. The question is whether AI should program purpose-built abstractions or reinvent infrastructure. Kore makes AI better at building agents by giving it the right level of abstraction to program.
TL;DR (30 seconds)
- Depth beats features: each “capability” has 3–4 depth levels; production needs Level 4.
- Most platforms are deep in 1–2 dimensions; Kore is deep in all 7 (see the radar + depth map).
- AI is more reliable on abstractions than glue code: ~15 lines of compiler-validated ABL vs. ~200–400 lines of framework code.
The Three Visuals
Visual 1: The Depth Iceberg
Slide headline: “Every platform demos the waterline. Production lives beneath it.”
Design guidance for slides:
- Above waterline: Light, airy — white/light gray background. The 4 feature claims in a calm, generic font. This is what every competitor shows. Make it feel small.
- Waterline: Animated wave or gradient transition. Blue. The visual break should feel like crossing a boundary.
- Below waterline: Deep navy/dark blue gradient getting darker as it goes down. The grouped capability boxes use the dimension colors from the radar (see Visual 2). Each box is a cluster, not a list — the mass of boxes should feel overwhelming compared to the 4 lines above.
- Key data point on slide: “10% / 90%” labels on left side. The ratio is the entire argument in two numbers.
- Animation: On click, the waterline appears and the bottom unfolds. The audience sees the 4 claims first, feels comfortable (“we have this too”), then the waterline drops and the depth hits.
Visual 2: The 7-Dimension Depth Radar
Slide headline: “Depth in 1-2 dimensions doesn’t make a platform. Depth in all 7 does.”
Design guidance for slides:
- Color palette: Kore = vibrant teal (#00BFA6), filling the full heptagon with a semi-transparent fill. Each competitor = a distinct color with lower opacity, layered underneath so Kore’s coverage dominates.
- Animation sequence (5 clicks):
  1. Show empty radar with 7 axis labels: the audience sees the dimensions
  2. Plot “Build it myself” (red dot near center): “This is where AI coding tools get you”
  3. Plot Open-source (gray, lopsided): “This is what frameworks add”
  4. Plot Enterprise + AI-native (blue + orange, larger but uneven): “This is what platforms add”
  5. Plot Kore (teal, fills everything): “This is what depth across all 7 looks like”
- Key visual effect: The Kore polygon should be the only one that’s symmetrical and fills the chart. Every competitor polygon should look lopsided or small. The shape asymmetry is the argument — you can’t see it in a feature checklist, but you can see it in a radar.
- Data callouts: At each axis tip for Kore, show the key number: “20+ primitives” (ORCHESTRATE), “3 tiers, 5 actions” (GOVERN), “8-15 events/turn” (OBSERVE), etc.
Visual 3: The AI Abstraction Ladder
Slide headline: “AI is only as good as what it’s programming.”
Design guidance for slides:
- Layout: Three horizontal tiers, top to bottom. Each tier has a left card (“what AI writes”), a right card (“what you get”), and a right-margin data column (“the math”).
- Color progression: Top tier = red/warning tones (fragile). Middle tier = amber/caution (better but incomplete). Bottom tier = green/teal (production-ready). The visual should feel like descending from danger to safety.
- Key visual element: The bottom tier should be physically larger than the other two — it contains more value despite being fewer lines. Use a glow, bold border, or spotlight effect to make it the focal point.
- The bottom bar: A horizontal scale at the bottom showing the three data points (lines, checks, primitives) creates a scannable summary. The 400 → 200 → 15 reduction should be the single most memorable number.
- Animation: Reveal top-down. Audience sees the raw code problem first, feels the pain, then sees the framework improvement, then sees the platform solution. The “15 lines” reveal should land like a punchline.
- Arch callout: A small badge or icon on the bottom tier: “Arch programs this layer natively. So can Claude Code, Cursor, or any AI assistant via MCP/CLI.”
Presentation Flow: Using All Three
| Slide | Visual | Time | Message |
|---|---|---|---|
| 1 | Iceberg | 60s | “Here’s what every platform shows you. Here’s what production actually needs. The gap is 90% of the problem.” |
| 2 | Radar | 90s | “Map that depth across 7 dimensions. Watch what happens to each competitor.” (Animate competitor polygons, then Kore fills the chart.) |
| 3 | Ladder | 60s | “And when you try to build it yourself with AI? The abstraction level determines the outcome. 400 lines of fragile code, or 15 lines of compiler-validated platform abstractions.” |
Total: 3 slides, ~3.5 minutes. The entire competitive argument in visual form.
The narrative arc: Problem (iceberg) → Evidence (radar) → Solution (ladder). The audience moves from “I didn’t realize the problem was this deep” to “nobody covers all of this” to “and AI works better on this platform than on raw code.”
The Competitive Landscape
The Four Competitors
| Category | Players | Strength | Structural Ceiling |
|---|---|---|---|
| “Build it myself” | Claude Code + SDKs, Cursor + frameworks | Maximum flexibility, AI-assisted velocity | AI generates implementations, not infrastructure. Every agent is a green-field project. No governance, observability, or compliance unless you build it. |
| Enterprise Incumbents | Copilot Studio, Salesforce Agentforce, Glean | Ecosystem integration, enterprise trust | Agent capability is bolted onto an existing platform — shallow by construction |
| AI-Native Startups | Sierra, Decagon, OpenAI Agents | LLM-native UX, fast innovation | Managed/opaque — you rent capability, you don’t own the deployment model |
| Open-Source Frameworks | LangGraph, CrewAI, AutoGen, n8n | Flexibility, no vendor lock-in | Everything beyond “make it run” is your problem — governance, observability, compliance, orchestration |
Players listed are representative examples, not exhaustive.
The Structural Problem with “I’ll Build It”
Before we map each dimension, we need to address the competitor that doesn’t appear in any analyst quadrant: the belief that AI coding tools make platforms unnecessary.
Claude Code, Codex, Cursor, Copilot — these tools are extraordinary. An engineer with an AI coding assistant can build an agent in a day. They can generate guardrails, tool calling, even a basic orchestration layer. This creates a powerful intuition: “I don’t need a platform. I can build exactly what I need.”
The intuition is correct for agents. It’s wrong for agent infrastructure.
AI coding tools generate implementations — specific code that does a specific thing. What production agents need are abstractions — composable primitives that handle the 1,000 runtime concerns beneath the surface:
- AI writing an implementation: “Call the OpenAI API, parse the response, check for PII, retry if blocked.” It works. It’s frozen. When you need streaming guardrails, you regenerate. When you need multi-agent handoff, you regenerate. Each regeneration is a new codebase with no relationship to the last.
- AI writing to an abstraction: “Define an agent with a goal, tools, guardrails, and a handoff policy.” The platform compiles it, runs it, governs it. Switch LLM providers — same definition. Add streaming guardrails — same definition. Deploy to a new region — same definition.
The right question isn’t “can AI build it?” It’s “what should AI be building?”
The Seven Dimensions of Depth
Each dimension follows the same structure: the problem you’ll hit, the depth levels, where competitors plateau, and where Kore continues. The depth iceberg (Visual 1) shows all of these beneath the waterline. The radar (Visual 2) shows how competitors cover 1-2 dimensions while Kore covers all 7.
How to Use This in a Vendor Evaluation (10 minutes)
Pick one realistic scenario and force the demo below the waterline. These seven prompts are designed to expose whether you’re seeing a Level 1–2 feature demo or a Level 4 production system.
| Dimension | Ask to demo (one stress test) | Shallow looks like |
|---|---|---|
| BUILD | Show the agent definition as diffable files; introduce a breaking change and show compile-time validation catching it. | “It’s code” / “it’s a canvas” with no compiler, IR, or portable representation. |
| ORCHESTRATE | Run a multi-agent handoff mid-conversation with state continuity, cycle protection, and a failure path. | “We can call another agent/tool” but the edge cases are custom code. |
| DEPLOY | Add a tool with OAuth refresh + rotation; set per-tool retry/idempotency; switch protocol (REST → MCP) without rewriting the agent. | “Paste an API key” + function calls, with auth/protocol semantics hidden in bespoke glue. |
| GOVERN | Demonstrate a 3-tier cascade with different actions (redact vs. reask vs. escalate) and per-agent scoping — in streaming. | A single moderation check on the final output with one global action. |
| OBSERVE | Replay a wrong decision and show the span tree: tool inputs/outputs, guardrail tiers, policy scoping, expression evaluations. | Logs + token/cost metrics without decision-level replay. |
| SCALE | Show project isolation + versioned deployments; change a shared tool/guardrail safely without breaking other teams’ agents. | “We can deploy many agents” without isolation, dependency control, or safe sharing. |
| AI-PROGRAMMABLE | Use an AI assistant (Arch or external via CLI/MCP) to design → build → test → deploy, and show the compiler catching an invalid change. | AI writes templates/code, but can’t program the platform’s abstractions end-to-end. |
Dimension 1: BUILD — How You Represent an Agent
The problem you’ll hit
You start building agents. The first 5 work great. By agent 15, three teams are building agents differently. By agent 50, you have 50 snowflakes — each with its own error handling, state management, and orchestration patterns. Onboarding a new developer to an existing agent takes 2 weeks because there’s no shared structure.
This is the Kubernetes-before-Kubernetes problem. In 2010, every team deployed applications differently. Dockerfiles and Kubernetes YAML gave the industry a shared representation. The agent ecosystem is in its 2010 moment — everyone building, nobody agreeing on how to describe what they’ve built.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. Code | Agent is a Python script with LLM calls | OpenAI Agents, raw LangGraph, AutoGen, “build it myself” |
| 2. Config | Agent is a UI-configured flow or form | Copilot Studio, Salesforce Agentforce, n8n |
| 3. DSL | Agent is a declarative definition — compilable, versionable, lintable | — |
| 4. Compilable IR | Agent definition compiles to an intermediate representation that multiple runtimes can execute | Kore (ABL) |
Where competitors plateau
“Build it myself” and open-source frameworks live at Level 1. LangGraph gives you a graph abstraction, but your agent is still a Python program. It can’t be compiled, statically analyzed, or deployed without shipping the code. CrewAI and AutoGen are the same — code that calls LLMs, with a framework’s conventions but no formal representation. AI coding tools generate more code faster, but it’s still code — unvalidated, ungoverned, unversioned as a unit.
Enterprise incumbents live at Level 2. Copilot Studio and Salesforce Agentforce give you a visual builder. The agent exists as platform configuration — not portable, not composable, not programmable beyond what the UI exposes. When the UI can’t express your use case, you’re stuck.
AI-native startups live somewhere between 1 and 2. Sierra and Decagon have internal representations, but they’re opaque. You don’t see them, you don’t own them, you can’t export them.
Where Kore continues
ABL (Agent Blueprint Language) is a YAML-based, human-readable DSL that compiles to an intermediate representation.
What this means in practice:
- Readable: It’s YAML. Your engineers can read it, diff it, review it in a PR. No proprietary binary format, no opaque platform state.
- Compilable: The compiler catches errors before deployment — structural validation, dependency resolution, constraint checking. Your agents don’t fail at runtime because of a misconfigured handoff target.
- Versionable: It lives in git. You get branching, history, rollback, and CI/CD — the same workflow your engineering team already uses for application code.
- Exportable: You can take your agent definitions with you. They’re files, not platform state.
- SDK-friendly: Kore provides SDKs that read, write, and manipulate ABL — so tooling, IDE extensions, and AI assistants can work with the representation natively.
- Portable: Because the YAML is a structured, well-defined format, it can be transpiled to other frameworks. ABL doesn’t lock you into Kore’s runtime — it locks you into a good representation that happens to run best on Kore.
- Spectrum-spanning: A single ABL definition can express pure flows, pure reasoning loops, and hybrid patterns (flow with reasoning within steps). You don’t choose a paradigm at framework-selection time — you choose it per agent, per step.
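To make “compilable” concrete, here is a minimal sketch of the kind of static checks a compiler pass can run before anything is deployed. It does not use ABL’s real schema or compiler: the definition is a plain Python dict standing in for parsed YAML, and the keys (`agents`, `goal`, `tools`, `handoffs`) and the `validate` function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CompileError:
    agent: str
    message: str


# Hypothetical stand-in for a parsed agent definition file.
# The keys are illustrative only; they are not ABL's actual schema.
definitions = {
    "tools": ["lookup_order", "issue_refund"],
    "agents": {
        "triage": {"goal": "Route the customer", "tools": ["lookup_order"],
                   "handoffs": ["billing"]},
        "billing": {"goal": "Resolve billing issues", "tools": ["issue_refund"],
                    "handoffs": ["biling"]},  # typo: unknown handoff target
    },
}


def validate(spec: dict) -> list:
    """Run structural and reference checks without executing any agent."""
    errors = []
    known_tools = set(spec.get("tools", []))
    agents = spec.get("agents", {})
    for name, agent in agents.items():
        # Structural validation: required fields must be present.
        for key in ("goal", "tools", "handoffs"):
            if key not in agent:
                errors.append(CompileError(name, f"missing required field '{key}'"))
        # Dependency resolution: every referenced tool must be declared.
        for tool in agent.get("tools", []):
            if tool not in known_tools:
                errors.append(CompileError(name, f"unknown tool '{tool}'"))
        # Constraint checking: handoff targets must exist and differ from the agent.
        for target in agent.get("handoffs", []):
            if target == name:
                errors.append(CompileError(name, "agent cannot hand off to itself"))
            elif target not in agents:
                errors.append(CompileError(name, f"unknown handoff target '{target}'"))
    return errors


for err in validate(definitions):
    print(f"{err.agent}: {err.message}")
```

Because the checks run over the definition rather than a live conversation, the misspelled handoff target is reported before deployment instead of surfacing mid-conversation.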
AI-native authoring with Arch: Kore’s built-in AI architect, Arch, understands ABL natively. It doesn’t generate code — it programs the platform’s abstractions. Ask Arch to “add a guardrail that redacts PII before tool calls” and it modifies the ABL definition, not a Python file. Ask it to “add a handoff to the billing agent with escalation” and it composes the orchestration primitives. Arch works through the same MCP tools, CLI, and SDK interfaces that any AI assistant can use — meaning your team can also use Claude Code, Copilot, or any AI tool to program ABL through the platform’s API.
The analogy: ABL is to agents what Terraform is to infrastructure. Terraform didn’t win because HashiCorp’s cloud was better — it won because the representation (HCL) was better than clicking through consoles or writing bash scripts. The representation is the platform.
“But what about visual development?”
Nothing is wrong with visual development — for the right problems. The issue is when the visual builder is the source of truth rather than a view.
In Copilot Studio and n8n, the canvas is the agent. When the canvas can’t express something, you can’t build it. In Kore, the visual is a projection of the DSL — you can build visually, drop to YAML when you need to, and have Arch or any AI assistant author either layer. Three authoring modes, one representation: visual for exploration, AI for velocity, YAML for precision. The visual never becomes a ceiling because it’s not the representation.
Quantified: Copilot Studio’s visual builder exposes ~30 node types. ABL’s DSL supports 15+ execution constructs, each with 5-20 configurable parameters, composable in arbitrary combinations. The visual surface covers the common cases; the DSL covers all of them.
Dimension 2: ORCHESTRATE — The Runtime Engine
The problem you’ll hit
Your agent works in a single-turn conversation. Then a customer asks a multi-step question that requires gathering information, calling two tools, checking a policy, and handing off to a specialist agent. Your agent either falls apart or you spend 3 months building a state machine that handles the 47 edge cases of “what happens when step 3 fails halfway through.”
This is the “game engine” problem. Every indie studio can build a game from scratch. But Unreal Engine exists because the physics, rendering, collision detection, audio, networking, and asset pipeline are 10,000 decisions that compound — and getting any one of them wrong ruins the player experience. Agent orchestration is the same: the runtime is an engine with dozens of interacting subsystems, and the quality of each subsystem determines whether your agents feel intelligent or fragile.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. LLM wrapper | Send prompt, get response | OpenAI Agents, basic LangGraph, “build it myself” |
| 2. Chain/graph | Sequence LLM calls with routing | LangGraph, CrewAI, n8n |
| 3. Flow engine | Structured execution with conditions and branching | Copilot Studio, Salesforce Agentforce |
| 4. Full orchestration engine | Multi-agent, multi-pattern, with runtime primitives for every execution concern | Kore |
Where competitors plateau
“Build it myself” and OpenAI Agents give you an LLM with tool calling and a handoff primitive. That’s it. Session management, memory, constraint evaluation, gather logic, response processing — all your problem. AI can generate each piece, but each piece is isolated — no shared execution model, no compiler validation, no guaranteed interop between the pieces.
LangGraph gets to Level 2 — you can build complex graphs with state. But every orchestration concern (handoff validation, cycle detection, session draining, memory windowing) is custom code you write inside graph nodes. The framework gives you a graph; the runtime intelligence is on you.
Copilot Studio and Salesforce reach Level 3 for simple flows. But their flow engines are designed for business process automation, not agent reasoning. When you need an agent to reason within a flow step, or dynamically decide to skip steps, or delegate to another agent mid-flow, you’ve exceeded what the flow engine was built for.
Sierra and Decagon likely have deep runtimes internally — but you can’t see them, configure them, or extend them. You get the behavior they chose to expose.
Where Kore continues — 20+ runtime primitives
This is where the depth gap is widest. Kore’s runtime isn’t a wrapper around LLM calls — it’s a full execution engine with purpose-built primitives for every concern an agent encounters at runtime:
Multi-Agent Orchestration (5 primitives)
- Handoff: Transfer conversation to another agent with allow-lists, self-handoff detection, cycle detection, and target validation — not just “call another agent” but a safe, auditable transfer protocol
- Delegation: Stack-based delegation where a parent agent dispatches to a child and resumes when the child completes — with state preservation across the delegation boundary
- Supervision: Supervisor agents that monitor, intervene, and override child agent behavior based on policy
- Agent Networks: Topology-aware multi-agent configurations where agents discover each other’s capabilities
- Escalation: Structured paths to human-in-the-loop, supervisor agents, or external workflow systems — not just “give up,” but graceful degradation with context transfer
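The handoff and delegation bullets above describe a set of safety checks and a stack discipline. The sketch below illustrates both ideas in plain Python. `Orchestrator`, `handoff`, `delegate`, and `complete` are hypothetical names, not Kore’s API; the cycle rule (an agent may not be re-entered by handoff within one conversation) is a deliberately conservative assumption, and the state preservation a real runtime performs across these transitions is omitted.

```python
class HandoffError(Exception):
    """Raised when a handoff or delegation would be unsafe."""


class Orchestrator:
    """Hypothetical tracker for which agent owns a conversation."""

    def __init__(self, agents: set, allow: dict, start: str) -> None:
        self.agents = agents          # registered agents, for target validation
        self.allow = allow            # agent -> set of agents it may hand off to
        self.active = start
        self.chain = [start]          # agents that have owned this conversation
        self.stack: list = []         # parents waiting for a delegated child

    def handoff(self, target: str) -> None:
        """Permanently transfer the conversation to `target`."""
        if target not in self.agents:
            raise HandoffError(f"unknown target '{target}'")
        if target == self.active:
            raise HandoffError("self-handoff")
        if target not in self.allow.get(self.active, set()):
            raise HandoffError(f"'{self.active}' may not hand off to '{target}'")
        if target in self.chain:      # conservative rule: no re-entry via handoff
            raise HandoffError("cycle: " + " -> ".join(self.chain + [target]))
        self.chain.append(target)
        self.active = target

    def delegate(self, child: str) -> None:
        """Suspend the current agent and let `child` run until it completes."""
        if child not in self.agents:
            raise HandoffError(f"unknown target '{child}'")
        self.stack.append(self.active)
        self.active = child

    def complete(self) -> str:
        """The delegated child is done: resume the agent that delegated to it."""
        if not self.stack:
            raise HandoffError("nothing to resume")
        self.active = self.stack.pop()
        return self.active


orch = Orchestrator(
    agents={"triage", "billing", "refunds", "research"},
    allow={"triage": {"billing"}, "billing": {"refunds", "triage"}},
    start="triage",
)
orch.handoff("billing")
orch.delegate("research")
print(orch.complete())            # back to billing once the child finishes
try:
    orch.handoff("triage")        # on the allow-list, but triage already owned the chat
except HandoffError as exc:
    print(exc)
```

The point of the design is that every check runs on every transfer, so no individual agent has to remember to validate its own handoffs.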
Execution Intelligence (6 primitives)
- Expression Language (CEL): Runtime evaluation of conditions, constraints, and dynamic values — not hardcoded if/else but a proper expression engine
- Constraints: Declarative rules that bound agent behavior (token limits, time limits, topic boundaries) — evaluated continuously, not just at the start
- Completion Detection: Automatic detection of when an agent’s goal is satisfied, with configurable criteria — not “the LLM said it’s done” but structured evaluation
- Intelligent Gather: Information collection with validation, contextual extraction, re-prompting, and type coercion — 15+ field types, conditional requirements, disambiguation. Not “ask a question and parse the answer” but a full form engine that works in natural language.
- Response Processing Pipeline: Multi-stage processing of agent output before delivery — not just “return what the LLM said” but a pipeline with transformation, enrichment, filtering, and formatting stages
- Automatic Policy Enforcement: Policies evaluated at execution boundaries (before tool calls, before responses, at handoff points) — not bolted on after the fact
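Intelligent Gather is easiest to picture as a loop: coerce each answer to its field type, validate it, and re-prompt only for what is still missing or invalid. The sketch below shows that loop with three hypothetical field types. It assumes the raw answers have already been extracted from the user’s message (the step a platform would perform with an LLM) and leaves out conditional requirements and disambiguation.

```python
from datetime import date


def to_int(raw: str) -> int:
    return int(raw.strip().replace(",", ""))


def to_date(raw: str) -> date:
    return date.fromisoformat(raw.strip())  # expects YYYY-MM-DD


def to_email(raw: str) -> str:
    value = raw.strip().lower()
    if "@" not in value or "." not in value.split("@")[-1]:
        raise ValueError("not an email address")
    return value


# Hypothetical field spec: name -> (re-prompt text, coercion/validation function).
FIELDS = {
    "order_id": ("What is your order number?", to_int),
    "purchase_date": ("When did you buy it (YYYY-MM-DD)?", to_date),
    "email": ("Which email is on the account?", to_email),
}


def gather_step(extracted: dict, collected: dict) -> list:
    """Coerce newly extracted answers; return prompts for fields still open."""
    for name, raw in extracted.items():
        if name in FIELDS and name not in collected:
            _, coerce = FIELDS[name]
            try:
                collected[name] = coerce(raw)
            except ValueError:
                pass  # invalid value: leave the field open so it is asked again
    return [prompt for name, (prompt, _) in FIELDS.items() if name not in collected]


state: dict = {}
# Turn 1: a valid order number and an unparseable date.
print(gather_step({"order_id": "1,042", "purchase_date": "last Tuesday"}, state))
# Turn 2: only the still-open fields are asked for and filled.
print(gather_step({"purchase_date": "2024-03-05", "email": "Ana@Example.com"}, state))
print(state)
```

Once a field holds a valid value it is not overwritten, so later turns cannot silently undo an answer the user already confirmed.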
Session & Memory (4 primitives)
- Session Management: Full lifecycle — creation, persistence, resumption, draining, timeout. State preserved across handoffs and delegations.
- Threaded Conversations: Context isolation within a session — parallel conversation threads that don’t bleed into each other
- Memory Management: Sliding windows, summarization, scoped recall. Not “dump everything into context” but intelligent context budgeting.
- Decision Traces: Every branch point, every condition evaluation, every policy check recorded — not logging, but a structured decision history that can be replayed
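“Intelligent context budgeting” can be illustrated with a sliding window: keep the newest turns that fit a token budget and fold everything older into a summary. In this sketch, token counts are approximated by whitespace-separated words, and `summarize` is a hypothetical placeholder that merely truncates; a real system would use the model’s tokenizer and an LLM summarizer.

```python
def count_tokens(text: str) -> int:
    # Crude approximation: one token per whitespace-separated word.
    return len(text.split())


def summarize(turns: list, max_words: int = 12) -> str:
    # Hypothetical stand-in for an LLM summary: keeps only the first few words.
    words = " ".join(turns).split()
    return "Summary of earlier turns: " + " ".join(words[:max_words])


def build_context(history: list, budget: int) -> list:
    """Keep the newest turns that fit `budget`; fold older turns into a summary.

    For simplicity, the summary itself is not charged against the budget.
    """
    kept, used = [], 0
    for turn in reversed(history):              # newest first
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()                              # back to chronological order
    evicted = history[: len(history) - len(kept)]
    return ([summarize(evicted)] if evicted else []) + kept


history = [
    "user: my order 1042 arrived damaged",
    "agent: sorry to hear that, can you describe the damage",
    "user: the screen is cracked",
    "agent: I can offer a replacement or a refund",
    "user: refund please",
]
for line in build_context(history, budget=15):
    print(line)
```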
Knowledge & Retrieval (3 primitives)
- Context-Driven RAG: Knowledge base retrieval that understands the agent’s current context, goal, and conversation state — not just “embed the query and search”
- Knowledge Graph Topologies: Domain-aware mappings that understand entity relationships — so retrieval follows the structure of the domain, not just vector similarity
- Connections: Managed integrations with auth lifecycle, protocol abstraction (REST, GraphQL, MCP, gRPC), and credential rotation — not just “call this URL”
Arch in orchestration: When you need to compose these primitives, Arch understands how they interact. “Add a delegation to the research agent, but with a 30-second timeout and escalation to a human if it fails” — Arch knows this requires a delegation primitive, a constraint, and an escalation path, and it composes them correctly in ABL. With raw code, your AI assistant generates plausible-looking orchestration that breaks at edge cases it’s never seen. With ABL, the compiler catches errors before they reach production.
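The Arch example above composes three primitives: a delegation, a time constraint, and an escalation path. The sketch below expresses that composition in plain `asyncio`, only to show how the pieces interlock. `research_agent` and `escalate_to_human` are hypothetical coroutines, and the timeouts are shortened so the example runs quickly (the example request uses 30 seconds).

```python
import asyncio


async def research_agent(query: str, delay: float) -> str:
    # Hypothetical child agent; `delay` simulates how long its work takes.
    await asyncio.sleep(delay)
    return f"findings for {query!r}"


async def escalate_to_human(query: str, reason: str) -> str:
    # Hypothetical escalation path: pass the query and reason to a human queue.
    return f"escalated {query!r} to a human ({reason})"


async def delegate_with_timeout(query: str, delay: float, timeout: float) -> str:
    """Delegate to the child agent; escalate if it times out or raises."""
    try:
        # wait_for cancels the child if it does not finish within `timeout`.
        return await asyncio.wait_for(research_agent(query, delay), timeout=timeout)
    except asyncio.TimeoutError:
        return await escalate_to_human(query, f"no answer within {timeout}s")
    except Exception as exc:  # any other child failure also escalates
        return await escalate_to_human(query, f"child failed: {exc}")


async def main() -> None:
    print(await delegate_with_timeout("Q3 churn drivers", delay=0.01, timeout=0.5))
    print(await delegate_with_timeout("Q3 churn drivers", delay=1.0, timeout=0.05))


if __name__ == "__main__":
    asyncio.run(main())
```

The timeout and escalation wrap the delegation rather than living inside the child, which is what lets them be declared as a constraint instead of being rewritten into every child agent.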
Quantified: Kore’s runtime engine provides 20+ purpose-built execution primitives. LangGraph provides 3 (nodes, edges, state). Copilot Studio provides ~10 (mostly flow control). OpenAI Agents provides 4 (completion, tool use, handoff, guardrails). The gap isn’t incremental — it’s categorical.
The analogy: Competitors give you a scripting language and say “build your game.” Kore gives you a game engine with physics, collision, AI pathfinding, audio, networking, and an asset pipeline already built, tested, and integrated. You write the game logic; the engine handles the 10,000 runtime concerns beneath it.
Dimension 3: DEPLOY
The problem you’ll hit
Your agent calls 3 tools and it works great. Then you need to add a tool behind your customer’s SSO. Then a tool that uses OAuth2 with token refresh. Then a tool that speaks MCP instead of REST. Then a tool that needs exactly-once execution semantics because it processes payments.
By tool 15, you’ve built a tool management platform. You didn’t plan to. You didn’t budget for it. But every tool has its own auth, its own protocol, its own failure modes, and its own execution requirements.
This is the “API gateway” problem of the agent era. In the microservices wave, every team started by calling services directly. Then they needed auth, rate limiting, circuit breaking, protocol translation, and service discovery — and API gateways (Kong, Apigee) emerged because managing the mesh was harder than making the calls.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. Function calling | LLM emits a function call, you execute it | OpenAI, LangGraph, AutoGen, CrewAI, “build it myself” |
| 2. Tool registry | Tools are registered with schemas and descriptions | Copilot Studio, n8n |
| 3. Managed tools | Platform handles auth, versioning, permissions per tool | Salesforce Agentforce, Glean |
| 4. Tool ecosystem | Multi-protocol, multi-auth, dynamic discovery, execution policies, credential lifecycle | Kore |
Where competitors plateau
“Build it myself” and open-source live at Level 1. You write a Python function, register it with the framework, and handle everything yourself — auth, retries, timeouts, credential rotation, error mapping. AI can generate each integration, but each is a standalone piece of code with its own auth handling, its own error paths, and no shared credential management.
Copilot Studio and n8n reach Level 2. Tools are registered centrally, but auth management is basic (API keys, simple OAuth) and protocol support is whatever connectors exist.
Salesforce reaches Level 3 — deep integration within the Salesforce ecosystem. MuleSoft connectors, Salesforce auth. But step outside that ecosystem and you’re back to Level 1.
Where Kore continues
- Protocol abstraction: REST, GraphQL, MCP, gRPC — the agent doesn’t know or care which protocol the tool speaks. The connection layer handles translation.
- Auth diversity: OAuth2 with refresh, API key rotation, mutual TLS, token exchange — per-connection credential lifecycle management, not “paste your API key here.”
- Dynamic discovery: Tools are registered, versioned, and discoverable at runtime. Agents can discover new tools without redeployment.
- Execution policies: Per-tool configuration for retry strategy, timeout, circuit breaking, idempotency requirements. A payment tool gets exactly-once semantics; a weather tool gets retry-on-failure.
- Credential lifecycle: Tokens rotate, keys expire, OAuth grants are revoked. The platform manages this continuously, not at configuration time.
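Per-tool execution policies can be sketched as a small wrapper: a retry budget with exponential backoff for tools that are safe to repeat, and an idempotency key for tools that must not run twice. `ToolPolicy` and `PolicyExecutor` are hypothetical names, and treating only `ConnectionError` as retryable is an assumption for the example. This sketch only deduplicates on the caller’s side; exactly-once semantics for real payments also require the downstream service to honor the idempotency key.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ToolPolicy:
    max_attempts: int = 1             # total tries, including the first one
    base_delay: float = 0.1           # seconds; doubled after every failed attempt
    require_idempotency_key: bool = False


class PolicyExecutor:
    """Hypothetical wrapper that applies a ToolPolicy to a tool call."""

    def __init__(self) -> None:
        self._results: dict = {}      # idempotency key -> result of the first call

    def run(self, tool: Callable[[], str], policy: ToolPolicy,
            key: Optional[str] = None) -> str:
        if policy.require_idempotency_key:
            if key is None:
                raise ValueError("this tool requires an idempotency key")
            if key in self._results:
                return self._results[key]   # already executed: replay, don't repeat
        attempt = 0
        while True:
            try:
                result = tool()
                break
            except ConnectionError:         # only transient errors are retried
                attempt += 1
                if attempt >= policy.max_attempts:
                    raise
                time.sleep(policy.base_delay * 2 ** (attempt - 1))
        if policy.require_idempotency_key:
            self._results[key] = result
        return result


calls = {"weather": 0, "payment": 0}


def flaky_weather() -> str:
    calls["weather"] += 1
    if calls["weather"] < 3:
        raise ConnectionError("upstream timeout")
    return "14C, cloudy"


def charge_card() -> str:
    calls["payment"] += 1
    return f"charge #{calls['payment']}"


executor = PolicyExecutor()
weather = ToolPolicy(max_attempts=3, base_delay=0.01)
payment = ToolPolicy(require_idempotency_key=True)

print(executor.run(flaky_weather, weather))                  # succeeds on the third try
print(executor.run(charge_card, payment, key="order-1042"))
print(executor.run(charge_card, payment, key="order-1042"))  # replayed, not re-charged
print(calls)
```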
Quantified: A typical enterprise agent ecosystem involves 15-30 tool integrations across 3-5 auth patterns and 2-3 protocols. Managing this with Level 1 primitives means your team writes and maintains 15-30 custom integration layers. At Level 4, they configure 15-30 connections.
Dimension 4: GOVERN — Guardrails, Compliance, and Policy
The problem you’ll hit
Your agent goes to production. Week 1, it leaks a customer’s email address in a response to a different customer. Week 2, it generates a response that violates your industry’s regulatory language. Week 3, a prompt injection bypasses your content filter because the filter only runs on the final response, not on intermediate reasoning steps.
You realize you need guardrails. So you add a PII check. Then a content filter. Then a topic boundary. Then your compliance team asks: “Can different agents have different policies? Can we audit which guardrail fired and why? Can we add our own custom check without modifying the platform?”
This is the “firewall” evolution. First-generation firewalls were packet filters — simple rules on IP and port. Then came stateful inspection, deep packet inspection, application-layer firewalls, WAFs, and zero-trust architectures. Each generation emerged because the previous one couldn’t handle the next class of threats. Agent governance is on the same trajectory.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. Single check | Call a moderation API on the output | OpenAI (moderation endpoint), most open-source, “build it myself” |
| 2. Multiple checks | Run several guardrails (PII, content, topic) | Guardrails AI, NeMo Guardrails, basic platform implementations |
| 3. Policy engine | Checks are scoped per agent/project, with configurable actions | Salesforce Einstein Trust Layer |
| 4. Governance framework | Multi-tier cascade, pluggable providers, streaming-aware, scoped policies, multiple action types, extensible | Kore |
Where competitors plateau
“Build it myself” and OpenAI provide a moderation endpoint or a single check. One check, one action (block), no scoping. AI can generate additional checks, but each is standalone — no cascade ordering, no cost optimization, no policy scoping.
Open-source guardrail libraries (Guardrails AI, NeMo) give you Level 2 — multiple checks in a pipeline. But scoping, policy management, action variety, and streaming support are your responsibility.
Salesforce Einstein Trust Layer reaches Level 3 for Salesforce-hosted agents. But it’s tied to the Salesforce ecosystem and the policy model is coarse-grained.
Sierra and Decagon have internal governance — but it’s opaque. You trust their policies, you don’t define your own.
Where Kore continues
- Multi-tier cascade: Tier 1 (regex/CEL: microseconds, zero cost) catches obvious violations before Tier 2 (classification models: milliseconds, low cost), which in turn runs before Tier 3 (LLM judges: seconds, higher cost). You’re not burning $0.03 per message on an LLM content check when a regex catches `SSN: \d{3}-\d{2}-\d{4}` in microseconds.
- 5 action types: Block, redact, reask, escalate, fix — per rule, not globally. A PII violation redacts; a topic violation reasks; a safety violation escalates. Different responses for different severities.
- Policy scoping: Guardrail policies scoped at project level, agent level, or conversation level. A customer-facing agent has strict PII rules; an internal analytics agent has relaxed ones. Same platform, different policies.
- Pluggable providers: Add Lakera, a custom ML model, a hosted moderation API — as providers within the governance framework. The cascade, scoping, and action logic remain the same regardless of which provider runs the check.
- Streaming-aware: Guardrails evaluate during token emission, not after the complete response. Violations are caught mid-stream, not after the customer has already read the problematic content.
- Inline enforcement: Guardrails fire at execution boundaries — before tool calls, before responses, at handoff points — not as a post-processing step bolted onto the output.
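The cascade and per-rule actions above can be sketched as an ordered list of checks, cheapest tier first, where each rule carries its own action. This is an illustration under stated assumptions, not Kore’s governance engine: the `Rule` structure is hypothetical, Tier 1 is a real regex, and the Tier 2 and Tier 3 checks are stubs standing in for a classifier and an LLM judge. Only three of the five action types are exercised.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


@dataclass
class Rule:
    name: str
    tier: int                          # 1 = regex/CEL, 2 = classifier, 3 = LLM judge
    detect: Callable[[str], bool]      # True means the rule is violated
    action: str                        # "block" | "redact" | "reask" | "escalate" | "fix"
    redactor: Optional[Callable[[str], str]] = None


def off_topic_classifier(text: str) -> bool:
    # Stub standing in for a Tier 2 classification model.
    return "stock tips" in text.lower()


def safety_judge(text: str) -> bool:
    # Stub standing in for a Tier 3 LLM judge.
    return "hurt myself" in text.lower()


RULES = [
    Rule("pii-ssn", 1, lambda t: bool(SSN.search(t)), "redact",
         redactor=lambda t: SSN.sub("[REDACTED]", t)),
    Rule("off-topic", 2, off_topic_classifier, "reask"),
    Rule("safety", 3, safety_judge, "escalate"),
]


def evaluate(text: str, rules: list) -> tuple:
    """Run rules cheapest tier first; return (decision, text, names of rules run)."""
    decision, ran = "allow", []
    for rule in sorted(rules, key=lambda r: r.tier):
        ran.append(rule.name)
        if not rule.detect(text):
            continue
        if rule.action == "redact" and rule.redactor is not None:
            text, decision = rule.redactor(text), "redact"  # fix content, keep going
            continue
        # Any other action is terminal in this sketch: costlier tiers never run.
        return rule.action, text, ran
    return decision, text, ran


print(evaluate("My SSN is 123-45-6789, can you check my claim?", RULES))
print(evaluate("Give me some stock tips", RULES))
```

In the second call the Tier 3 judge never runs, which is where the cascade saves cost. Per-agent policy scoping would, in this model, amount to giving each agent its own rules list.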
Quantified: Kore’s governance engine supports 3 evaluation tiers, 5 action types, 4+ scoping levels, and pluggable providers — yielding 60+ unique policy configurations per guardrail rule. A single-check moderation API gives you 1.
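Reading the 60+ figure as the product of the counts in the sentence above gives the baseline below. This is an interpretation, since the calculation is not spelled out; note also that the Policy scoping bullet names three levels (project, agent, conversation) where this sentence counts 4+, and pluggable providers would add further combinations on top.

$$
3\ \text{tiers} \times 5\ \text{action types} \times 4\ \text{scoping levels} = 60\ \text{configurations per rule}
$$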
Arch in governance: Tell Arch “this agent handles healthcare data — add HIPAA-appropriate guardrails” and it configures the right combination of PII detection (Tier 1 regex for SSN/MRN patterns), content classification (Tier 2 model for PHI categories), and LLM evaluation (Tier 3 for contextual compliance) — with redact actions on PII, block on PHI disclosure, and escalate on ambiguous cases. Configuring this manually requires understanding 3 tiers, 5 action types, and provider-specific parameters. Arch composes them because it understands the governance framework natively.
“But why can’t I just use a hosted guardrail service?”
You can. And you should — as a provider within a governance framework.
Lakera gives you a toxicity check. Guardrails AI gives you output validation. These are valuable. But they’re individual checks, not a governance system. You still need:
- Orchestration: Which checks run, in what order, and when do you short-circuit?
- Policy scoping: Which checks apply to which agents?
- Action framework: What happens when a check fires? Block? Redact? Reask?
- Cascade economics: Why run an LLM judge ($0.03) when a regex ($0.00) catches the same violation?
- Streaming integration: How do you enforce checks during token emission, not after?
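The cascade-economics question can be made concrete with a little arithmetic. If tier i costs c_i per evaluation and forwards a fraction f_i of the messages it sees (those it neither flags nor confidently clears), the expected cost per message is c_1 + f_1*c_2 + f_1*f_2*c_3. The sketch below computes that. The $0.03 judge and $0.00 regex prices come from the comparison above; the classifier price and every forwarding rate are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Tier:
    name: str
    cost_usd: float      # cost per message evaluated by this tier
    forward_rate: float  # fraction of evaluated messages passed to the next tier


def expected_cost(tiers: List[Tier]) -> float:
    """Expected guardrail cost per message for a short-circuiting cascade."""
    total, reach = 0.0, 1.0  # reach = fraction of messages that get this far
    for tier in tiers:
        total += reach * tier.cost_usd
        reach *= tier.forward_rate
    return total


cascade = [
    Tier("regex", 0.00, 0.98),         # hypothetical: 2% resolved by regex
    Tier("classifier", 0.0005, 0.10),  # hypothetical price and forward rate
    Tier("llm_judge", 0.03, 0.0),
]
llm_only = [Tier("llm_judge", 0.03, 0.0)]

print(f"cascade:  ${expected_cost(cascade):.5f} per message")
print(f"LLM only: ${expected_cost(llm_only):.5f} per message")
```

With these assumed numbers the cascade averages about $0.0034 per message against $0.03 for running the judge on everything. The arithmetic also shows the caveat: the savings come from messages that stop at a cheaper tier, so if the classifier forwarded 90% of traffic instead of 10%, the cascade would cost about $0.027 and most of the gap would disappear.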
A hosted guardrail service is a provider. Kore’s governance engine is the framework those providers plug into. Using Lakera with Kore is a feature, not a competing choice.
Dimension 5: OBSERVE — Tracing, Decisions, and Replay
The problem you’ll hit
It’s 2 AM. An agent told a customer their refund was processed when it wasn’t. Your on-call engineer opens the logs and sees: “200 OK” from the LLM, “200 OK” from the tool call. Everything looks fine. But the agent made a wrong decision — it interpreted “refund initiated” as “refund completed” and told the customer accordingly.
Your logs can’t help because the problem isn’t an error — it’s a decision. The agent reasoned, chose, and acted. The reasoning was wrong. To debug it, you need to see every decision the agent made, what context it had when it made each one, which guardrails evaluated, and what the alternatives were.
This is the “distributed tracing” evolution. Microservices had the same problem — a request fails, but the failure is 7 services deep and the root cause is a timeout in service 4 that propagated as a wrong default in service 6. Jaeger and Zipkin solved this with distributed traces. Agent observability needs the same thing — but for reasoning chains, not HTTP calls.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. Logs | Print statements, basic request/response logging | Open-source frameworks, “build it myself” |
| 2. LLM metrics | Token counts, latency, cost tracking per call | OpenAI dashboard, LangSmith, Arize |
| 3. Chain traces | Trace the sequence of LLM calls and tool invocations | LangSmith, LangFuse |
| 4. Agent execution traces | Full decision tree with guardrail evaluations, policy checks, handoff decisions, expression evaluations, and session-level replay | Kore |
Where competitors plateau
“Build it myself” and open-source give you stdout. AI can generate logging code, but it doesn’t know what’s worth tracing in an agent execution — because agent-level tracing (guardrail cascades, handoff validation, policy evaluations) is a domain concept that doesn’t exist in general-purpose logging.
LangSmith and Arize reach Level 3 — excellent for tracing LLM call chains. But they trace what the LLM did, not what the agent decided. They don’t capture guardrail evaluations, policy scoping decisions, handoff validation, or expression language evaluations — because those concepts don’t exist in their model.
Salesforce and Copilot Studio provide platform-level metrics (conversation counts, resolution rates) but not agent-level execution traces. You see the outcome, not the reasoning path.
Sierra and Decagon have internal observability — but you’re trusting their debugging, not doing your own. When something goes wrong, you file a ticket.
Where Kore continues
- Structured trace events: Every execution step emits a typed trace event — not a log line, but a structured record with step type, inputs, outputs, duration, and decision context.
- Span trees: Traces are hierarchical. A conversation contains turns, turns contain agent steps, steps contain tool calls and guardrail evaluations. You can zoom from “this conversation went wrong” to “this specific guardrail evaluation at minute 3 returned an unexpected result.”
- Decision replay: Take any trace and replay the decision — see exactly what context the agent had, what options it considered, and why it chose what it chose.
- Real-time session subscription: Watch an agent’s execution in real time — not after the fact, but as it happens. Critical for supervised deployments and live debugging.
- Guardrail-aware traces: Traces capture not just “guardrail fired” but which tier, which provider, what the input was, what action was taken, and what the alternative response would have been.
- Export-friendly: Kore’s traces can export to Datadog (for infrastructure), LangSmith (for LLM-level detail), or any OpenTelemetry-compatible system. Agent-level observability augments, not replaces, your existing observability stack.
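The span-tree idea can be sketched with a small recursive data structure. This is a minimal Python illustration, not Kore’s trace schema: the `Span` fields, the kind names, and the attribute keys are hypothetical. It shows two properties from the list above: events are typed records rather than log lines, and nesting lets you zoom from a conversation down to a single guardrail evaluation with a query.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Tuple


@dataclass
class Span:
    kind: str                  # e.g. "conversation", "turn", "agent_step", "guardrail"
    name: str
    duration_ms: float
    attrs: Dict[str, Any] = field(default_factory=dict)
    children: List[Span] = field(default_factory=list)

    def walk(self, path: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Span]]:
        """Depth-first traversal yielding (path from the root, span)."""
        path = path + (self.name,)
        yield path, self
        for child in self.children:
            yield from child.walk(path)


def find(root: Span, kind: str, **attrs: Any) -> List[Tuple[Tuple[str, ...], Span]]:
    """Return every span of the given kind whose attributes all match."""
    return [
        (path, span)
        for path, span in root.walk()
        if span.kind == kind and all(span.attrs.get(k) == v for k, v in attrs.items())
    ]


trace = Span("conversation", "conv-42", 184_000.0, children=[
    Span("turn", "turn-3", 2_300.0, children=[
        Span("agent_step", "refund_agent", 2_100.0, children=[
            Span("tool_call", "refund_status", 410.0, {"status": "initiated"}),
            Span("guardrail", "pii_regex", 0.02, {"tier": 1, "action": "pass"}),
            Span("guardrail", "phi_classifier", 35.0, {"tier": 2, "action": "escalate"}),
        ]),
    ]),
])

for path, span in find(trace, "guardrail", action="escalate"):
    print(" > ".join(path), span.attrs)
```

Keeping the path alongside each match is what turns “this conversation went wrong” into “this guardrail in this step of this turn”; an exporter to an OpenTelemetry-style backend would map the same parent/child structure onto spans.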
Quantified: A single agent turn in Kore generates 8-15 trace events (LLM call, tool resolution, guardrail tiers, policy evaluation, expression checks, response processing, completion detection). LangSmith captures 1-3 (LLM call, tool call, chain step). The difference is 5-10x more observability surface per interaction.
Arch in observability: When an agent misbehaves, Arch can analyze the execution trace — not just the logs. “Why did the refund agent tell the customer their refund was processed?” Arch walks the span tree: the tool call returned “initiated,” the response processing pipeline didn’t have a status-mapping stage, and the LLM inferred “completed” from “initiated.” Arch can then propose the fix: add a response processing rule or a constraint that validates tool output status codes before generating customer-facing language. The debugging loop — from trace to diagnosis to fix — stays within the platform.
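The fix proposed in this example, a rule that checks the tool’s status before customer-facing language goes out, might look roughly like the following Python sketch. The status values, the message table, and the completion-word pattern are hypothetical, and in the platform such a rule would presumably be declared in the response-processing configuration rather than hand-written. The point is that the check is deterministic and keyed on the tool output, so the LLM can no longer turn “initiated” into “completed.”

```python
import re
from typing import Any, Dict

# Hypothetical canonical wording for each refund status the tool can return.
STATUS_MESSAGES: Dict[str, str] = {
    "initiated": "Your refund has been initiated and is still pending.",
    "completed": "Your refund has been completed.",
    "failed": "Your refund could not be processed. Let me connect you with a specialist.",
    "unknown": "I couldn't confirm your refund status. Let me connect you with a specialist.",
}

# Wording that asserts the refund is finished.
CLAIMS_COMPLETION = re.compile(r"\b(processed|completed|refunded)\b", re.IGNORECASE)


def enforce_refund_status(tool_output: Dict[str, Any], draft: str) -> str:
    """Replace a drafted reply that claims more than the tool status supports."""
    status = tool_output.get("status")
    if status not in STATUS_MESSAGES:
        return STATUS_MESSAGES["unknown"]
    if status != "completed" and CLAIMS_COMPLETION.search(draft):
        return STATUS_MESSAGES[status]
    return draft


print(enforce_refund_status({"status": "initiated"}, "Good news, your refund was processed!"))
```

Drafts that stay within what the status supports pass through unchanged, so the rule constrains the claim without rewriting every response.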
“But why can’t I use hosted observability (Datadog, LangSmith)?”
You should — for the layers they’re designed for. But agent observability is different from LLM observability.
LangSmith traces a chain of LLM calls. It doesn’t know about:
- Guardrail decisions and their cascade logic
- Tool execution with auth context and connection metadata
- Handoff validation between agents (cycle detection, allow-lists)
- Flow-step transitions and expression evaluations
- Policy scoping decisions (which rules applied and why)
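Handoff validation with allow-lists and cycle detection is a small graph problem. The Python sketch below, using a hypothetical allow-list, shows two complementary checks: a static one over the allow-list graph, of the kind a compiler could surface before deployment, and a runtime one that rejects a handoff that is not allowed or that would return to an agent already in the chain. Neither is Kore’s actual algorithm. A cycle in the allow-list is not necessarily an error (support escalating back to triage may be intentional), which is why the static check only reports it while the runtime check stops an actual loop.

```python
from typing import Dict, List, Optional, Set

# Hypothetical handoff allow-list: agent -> agents it may hand off to.
ALLOWED: Dict[str, Set[str]] = {
    "triage": {"billing", "support"},
    "billing": {"support"},
    "support": {"triage"},
}


def validate_handoff(chain: List[str], target: str, allowed: Dict[str, Set[str]]) -> str:
    """Runtime check for one handoff, given the agents already visited, in order."""
    current = chain[-1]
    if target not in allowed.get(current, set()):
        return f"denied: {current} -> {target} is not in the allow-list"
    if target in chain:
        return "denied: cycle " + " -> ".join(chain + [target])
    return "ok"


def find_cycle(allowed: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Static check: return one handoff cycle in the allow-list graph, if any."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in sorted(allowed.get(node, ())):  # sorted for deterministic output
            state = color.get(nxt, WHITE)
            if state == GRAY:
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in sorted(allowed):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


print(find_cycle(ALLOWED))
print(validate_handoff(["triage", "support"], "triage", ALLOWED))
```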
These concepts don’t exist in LangSmith’s model because LangSmith was built for LLM chains, not agent execution engines. Kore’s tracing understands the agent execution model natively — and can export the LLM-level data to LangSmith for the prompt debugging use case.
The analogy: Using LangSmith for agent observability is like using application performance monitoring (APM) for debugging business logic. APM tells you the HTTP call took 200ms. It doesn’t tell you why the business rule made the wrong decision. You need both — but the agent-level layer has to understand the agent.
Dimension 6: SCALE — Organizational, Not Just Computational
The problem you’ll hit
Your first 3 agents are built by one team. They share tools, share guardrail policies, share deployment configs. It works. Then a second team builds agents. Then a third. By team 5, you discover that team 2’s guardrail change broke team 4’s agent. Team 3 deployed a tool update that team 1’s agent depends on, and the schema change wasn’t backwards-compatible.
This is the “monolith to microservices” problem all over again — but for agents. The solution isn’t just “deploy more instances.” It’s isolation, versioning, dependency management, and controlled sharing across organizational boundaries.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. Single agent | One agent, one deployment | Open-source frameworks, “build it myself” |
| 2. Multi-agent, single team | Multiple agents, shared config, one team manages | OpenAI Agents, n8n |
| 3. Platform-managed | Platform handles deployment, but limited isolation | Salesforce Agentforce, Copilot Studio |
| 4. Organizational scale | Project-level isolation, versioned deployments, dependency management, multi-team governance | Kore |
Where competitors plateau
“Build it myself” and open-source are structurally single-team. LangGraph can orchestrate multiple agents in a single graph, but there’s no concept of organizational boundaries, deployment isolation, or shared governance. AI coding tools can generate more agents, but each is an independent project — no shared tool registry, no shared governance, no isolation guarantees.
OpenAI Agents scales computationally (their API handles the load) but not organizationally. There’s no project isolation, no deployment versioning, no multi-team guardrail management.
Salesforce and Copilot Studio have organizational concepts (Salesforce orgs, Azure tenants) but their agent management is coarse-grained. Deploying 50 agents across 10 teams with per-team policies and shared tool registries isn’t something the platform was architected for — because it was architected for CRM and productivity, with agents added on top.
Where Kore continues
- Project-level isolation: Agents, tools, guardrails, and deployments are scoped to projects. Team A’s guardrail change cannot affect Team B’s agents.
- Versioned deployments: Agents are compiled, versioned, and deployed as units. Roll back to yesterday’s version while preserving active sessions.
- Multi-agent orchestration at scale: Handoff, delegation, and supervision work across organizational boundaries with explicit permission models — agent A can delegate to agent B only if the project configuration allows it.
- Shared resource governance: Tools and knowledge bases can be shared across projects with permission controls — reusable without uncontrolled coupling.
- Compilation as a scale primitive: Because agents compile to IR, the platform can validate dependencies, detect breaking changes, and enforce compatibility before deployment — not after a production incident.
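As an illustration of the pre-deployment compatibility check described in the last bullet, here is a minimal Python sketch that compares two versions of a tool’s input schema and lists the changes that would break existing callers, such as team 1’s agent in the scenario above. The schema shape and the rules are assumptions for illustration. It conservatively treats a removed parameter as breaking, and it covers inputs only; output schemas would need the mirror-image rules (a removed or retyped field breaks consumers).

```python
from typing import Dict, List

# Hypothetical schema shape: {parameter: {"type": str, "required": bool}}
Schema = Dict[str, Dict[str, object]]


def breaking_changes(old: Schema, new: Schema) -> List[str]:
    """List input-schema changes that could break agents calling the old version."""
    problems: List[str] = []
    for name, spec in old.items():
        if name not in new:
            problems.append(f"removed parameter '{name}'")
        elif new[name]["type"] != spec["type"]:
            problems.append(f"'{name}' changed type {spec['type']} -> {new[name]['type']}")
        elif new[name]["required"] and not spec["required"]:
            problems.append(f"'{name}' became required")
    for name, spec in new.items():
        if name not in old and spec["required"]:
            problems.append(f"new required parameter '{name}'")
    return problems  # adding optional parameters is treated as compatible


OLD_SCHEMA: Schema = {
    "order_id": {"type": "string", "required": True},
    "reason": {"type": "string", "required": False},
}
NEW_SCHEMA: Schema = {
    "order_id": {"type": "integer", "required": True},
    "reason": {"type": "string", "required": True},
    "locale": {"type": "string", "required": False},
    "channel": {"type": "string", "required": True},
}

for problem in breaking_changes(OLD_SCHEMA, NEW_SCHEMA):
    print("BREAKING:", problem)
```

Run before deployment, a non-empty result is the signal to block the release or publish the change as a new tool version instead.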
Quantified: Kore’s architecture supports the organizational complexity of 50+ agents across 10+ teams with per-project isolation, shared tooling, and independent deployment lifecycles. Open-source frameworks support 1 team. Enterprise incumbents support organization-wide policies but not project-level granularity.
Dimension 7: AI-PROGRAMMABLE (Full Lifecycle)
The problem you’ll hit
Your engineers use AI coding tools to build agents. It works — for the first agent. Then they need a second agent with different guardrails. The AI generates a new codebase. Then a third agent that hands off to the first. The AI generates the handoff logic from scratch, with no knowledge of the first agent’s session format, state model, or escalation paths.
By agent 10, you have 10 independently generated codebases. Each works in isolation. None work together. Your AI coding tool is incredibly productive at generating code — but it has no understanding of the system these agents need to operate within. Every agent is a green-field project to the AI, even though it’s part of a connected platform.
This is the “IDE vs. platform” problem. An IDE makes you productive at writing code. A platform makes the code productive in production. AI coding tools are extraordinary IDEs. But they’re not platforms — and they can’t be, because they don’t understand your agent infrastructure, governance policies, or orchestration patterns.
Depth levels
| Level | What it means | Who stops here |
|---|---|---|
| 1. AI writes code | AI generates Python/JS that calls LLM APIs | Claude Code + OpenAI SDK, Cursor + LangGraph |
| 2. AI uses templates | AI fills in templates or scaffolds from examples | Copilot Studio wizards, Salesforce agent builder |
| 3. AI calls platform APIs | AI interacts with the platform through APIs/CLI | Salesforce CLI, basic API integrations |
| 4. AI programs platform abstractions | AI understands the domain model, composes primitives, validates through the compiler, and operates across the full lifecycle (design, build, test, deploy, debug) | Kore (ABL + Arch + MCP + CLI) |
Where competitors plateau
“Build it myself” with AI coding tools lives at Level 1. Claude Code can generate a LangGraph agent, but it doesn’t understand your guardrail policies, your tool registry, your handoff topology, or your compliance requirements. Every generation starts from zero context. The AI is brilliant at code; it’s blind to the system.
Enterprise incumbents reach Level 2-3. Copilot Studio has wizards that guide you through configuration. Salesforce has CLI tools. But the AI operates within narrow rails — it can fill forms and call APIs, not compose orchestration primitives or reason about multi-agent topologies.
AI-native startups don’t expose this layer at all. Sierra and Decagon manage everything internally. OpenAI’s API is programmable but has no lifecycle abstractions — there’s no deployment, no governance, no orchestration to program against.
Where Kore continues — AI across the full lifecycle
Kore isn’t just AI-compatible — it’s AI-programmable by design. Every layer of the platform exposes interfaces that AI assistants can reason about, compose, and validate:
Design phase — AI as architect
- Arch understands the domain: “I need an agent network for customer support with tier-1 triage, tier-2 specialist, and human escalation.” Arch designs the topology, defines the handoff policies, and configures the guardrails — as ABL definitions, not as code.
- Any AI assistant (Claude Code, Copilot, Cursor) can interact with ABL through MCP tools and the CLI — Arch is the built-in option, but the interfaces are open.
Build phase — AI as programmer
- AI programs abstractions, not implementations. “Add a gather step that collects the customer’s order number, validates it against the orders API, and re-prompts if invalid” — this is an ABL construct with 3 parameters, not 50 lines of code with error handling, retry logic, and state management.
- The compiler validates what AI produces. If AI generates an invalid handoff target or an impossible constraint combination, the compiler catches it — before deployment, not in production.
- Quantified: A typical agent feature (add a tool with auth, a guardrail, and a handoff path) requires ~15 lines of ABL vs. ~200-400 lines of framework code. AI makes fewer mistakes in 15 lines. The compiler catches the ones it does make.
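To see what the declarative gather step stands in for, here is a hand-written Python sketch of the same behavior: prompt, check the format, look the order up, re-prompt on failure, cap the attempts, and escalate if the orders API times out. The order-number format, the prompts, `LookupTimeout`, and the `ask`/`lookup` callables are hypothetical. This is not ABL or Kore’s runtime; it is the logic a developer or an AI assistant would otherwise have to write and keep correct, before adding session state and persistence.

```python
import re
from typing import Callable, Optional, Tuple

ORDER_ID = re.compile(r"[A-Z]{2}\d{6}")  # hypothetical format, e.g. AB123456


class LookupTimeout(Exception):
    """Raised by the (hypothetical) orders API client on timeout."""


def gather_order_number(
    ask: Callable[[str], str],
    lookup: Callable[[str], bool],
    max_attempts: int = 3,
) -> Tuple[Optional[str], str]:
    """Return (order_id, outcome) with outcome in {'ok', 'escalate', 'exhausted'}."""
    prompt = "What's your order number?"
    for _ in range(max_attempts):
        answer = ask(prompt).strip().upper()
        if not ORDER_ID.fullmatch(answer):
            prompt = "That doesn't look like an order number (for example AB123456). Could you check it?"
            continue
        try:
            found = lookup(answer)
        except LookupTimeout:
            return None, "escalate"  # API unavailable: hand off instead of looping
        if found:
            return answer, "ok"
        prompt = f"I couldn't find order {answer}. Could you double-check it?"
    return None, "exhausted"
```

Even this version leaves out persisting the attempt count across turns and resuming after a handoff, which is the state management the declarative step is meant to absorb.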
Test phase — AI as tester
- Arch can generate test scenarios from the agent definition: “Given this gather step, what are the validation edge cases? What happens if the orders API times out? What if the customer provides a wrong format?”
- The platform’s test harness runs these against the compiled IR — not against a running service, but against the execution model itself.
Deploy phase — AI as operator
- CLI and MCP tools for deployment, versioning, and rollback. “Deploy this agent to staging, run 100 synthetic conversations, compare guardrail trigger rates to production, promote if within tolerance” — automatable through the same interfaces AI uses.
- Arch can monitor deployment health and recommend rollback based on trace analysis.
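The “promote if within tolerance” step in the deployment example reduces to a small comparison. This Python sketch assumes you already have per-rule guardrail outcomes from the synthetic staging run and from production; the event format, rule names, and the 2-percentage-point tolerance are hypothetical, and nothing here is a Kore CLI or MCP command.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def trigger_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-rule fraction of evaluations in which the guardrail fired."""
    total: Counter = Counter()
    fired: Counter = Counter()
    for rule, hit in events:
        total[rule] += 1
        fired[rule] += int(hit)
    return {rule: fired[rule] / total[rule] for rule in total}


def promotion_gate(staging: Dict[str, float], production: Dict[str, float],
                   tolerance: float = 0.02) -> Tuple[bool, List[str]]:
    """Promote only if every rule's trigger rate is within `tolerance` of production."""
    drifted = sorted(
        rule
        for rule in staging.keys() | production.keys()
        if abs(staging.get(rule, 0.0) - production.get(rule, 0.0)) > tolerance
    )
    return not drifted, drifted


ok, drifted = promotion_gate({"pii": 0.05, "topic": 0.10}, {"pii": 0.04, "topic": 0.20})
print("promote" if ok else f"hold: drift on {drifted}")
```

An absolute tolerance is the simplest design choice. With only 100 synthetic conversations, a rule that fires rarely has a wide sampling error, so a production gate might also require a minimum sample size per rule or use a two-proportion significance test.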
Debug phase — AI as diagnostician
- When agents misbehave, Arch reads execution traces (not logs) and diagnoses issues at the decision level. “The agent handed off to billing instead of support because the CEL expression evaluated customer.tier as ‘premium’ when it should have been ‘enterprise’” — this diagnosis requires understanding the execution model, not just reading output.
- Arch proposes fixes as ABL modifications — closing the loop from diagnosis to solution within the platform.
The flywheel: AI gets better because the abstractions are better
Here’s what most people miss: AI coding tools are only as good as what they’re programming.
When AI writes raw Python against LLM APIs, it generates plausible code that works in the common case and breaks in edge cases — because there are thousands of edge cases in agent orchestration that don’t appear in training data. The AI doesn’t know about your session draining logic, your guardrail cascade ordering, or your credential rotation schedule.
When AI writes ABL, it’s composing well-defined primitives with known semantics. The compiler enforces constraints. The runtime handles edge cases. The AI doesn’t need to know about session draining — the platform handles it. The AI doesn’t need to generate guardrail cascade logic — it declares which guardrails apply and the governance engine executes the cascade.
Better abstractions make AI more productive and more reliable. This is the compounding advantage: as the platform’s primitives get richer, AI can do more with less — and the gap between “AI + platform” and “AI + raw code” widens.
The analogy: AI writing agents in raw code is like AI writing assembly language — technically possible, impressively capable, and completely the wrong level of abstraction. AI writing ABL is like AI writing in a high-level language with a type system, a compiler, and a standard library. The output is more reliable, more composable, and more maintainable — because the abstraction is doing the heavy lifting.
The Lock-in Inversion
Every platform choice is a lock-in. The question isn’t whether you’re locked in — it’s what you’re locked into and what you’re locked out of.
| Choice | Locked into | Locked out of |
|---|---|---|
| “Build it myself” (AI coding tools) | Maintaining generated code forever | Governance, observability, orchestration, compliance, organizational scale: you build everything from scratch, every time |
| Open-source (LangGraph, CrewAI, AutoGen) | DIY for everything beyond orchestration | Governance, observability, compliance, organizational scale — unless you build it yourself |
| Enterprise incumbents (Copilot Studio, Salesforce) | Their ecosystem (Azure, Salesforce) | Deep agent capabilities, runtime primitives, flexible representation |
| AI-native startups (Sierra, Decagon) | Their managed service | Deployment control, runtime configuration, extensibility |
| Kore (ABL) | A YAML-based DSL and runtime | Nothing — governance, observability, orchestration, and compliance are included by default |
The ABL lock-in question — answered directly
“ABL is proprietary. Am I locked in?”
- It’s YAML. Not a binary format, not platform state, not a UI configuration. Human-readable, diffable, reviewable in a PR.
- It’s exportable. Your agent definitions are files you own. Export them, store them, version them in your own git repository.
- It’s SDK-friendly. Kore provides SDKs that read, write, and manipulate ABL — enabling third-party tooling, custom pipelines, and integration with your existing development workflows.
- It’s portable. The structured YAML can be transpiled to other frameworks. Moving off Kore doesn’t mean rewriting from scratch — it means transforming a well-defined representation.
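As a rough illustration of “transforming a well-defined representation,” the Python sketch below takes a parsed agent definition and converts it into a framework-neutral node/edge structure that a graph-based framework could consume. The definition’s field names (`agents`, `handoffs`, `entry`) are hypothetical and do not reflect ABL’s actual schema; the definition is shown as JSON only so the sketch needs nothing beyond the standard library.

```python
import json
from typing import Any, Dict

# Hypothetical agent definition, as it might look after parsing.
DEFINITION: Dict[str, Any] = json.loads("""
{
  "entry": "triage",
  "agents": [
    {"name": "triage", "instructions": "Route the customer.", "handoffs": ["billing", "support"]},
    {"name": "billing", "instructions": "Resolve billing questions.", "handoffs": []},
    {"name": "support", "instructions": "Resolve technical issues.", "handoffs": ["triage"]}
  ]
}
""")


def to_graph(definition: Dict[str, Any]) -> Dict[str, Any]:
    """Convert agents and handoffs into nodes and directed edges."""
    nodes = {a["name"]: a["instructions"] for a in definition["agents"]}
    edges = [(a["name"], target) for a in definition["agents"] for target in a["handoffs"]]
    unknown = sorted({t for _, t in edges if t not in nodes})
    if unknown:
        raise ValueError(f"unknown handoff targets: {unknown}")
    if definition["entry"] not in nodes:
        raise ValueError(f"unknown entry agent: {definition['entry']}")
    return {"entry": definition["entry"], "nodes": nodes, "edges": edges}


graph = to_graph(DEFINITION)
print(graph["entry"], graph["edges"])
```

Because the source is structured data rather than free-form code, the transformation is mechanical, and validation errors (such as a handoff to an agent that does not exist) surface during conversion rather than at runtime in the target framework.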
The deeper question: What’s the alternative?
- LangGraph locks you into a graph paradigm — if your use case doesn’t fit a graph, you fight the framework.
- CrewAI locks you into a crew/role paradigm — if your use case doesn’t fit crew collaboration, you contort the model.
- AutoGen locks you into a conversation loop paradigm — if your use case needs deterministic flows, you build escape hatches.
- Copilot Studio locks you into Azure and a visual builder — if you need something the builder can’t express, you stop.
- Salesforce locks you into the Salesforce ecosystem — deep if you’re all-in, dead-end if you’re not.
ABL locks you into a representation that spans all paradigms — flows, reasoning, hybrid — with a compiler that validates and a runtime that executes. The lock-in is real. But it’s a lock-in that includes governance, observability, orchestration, and compliance from day one.
The analogy: Nobody calls SQL “lock-in,” even though choosing a database commits you to its SQL dialect. The abstraction is worth it: it’s readable, portable, tooling-rich, and better than the alternative (hand-written file I/O). ABL makes the same argument for agents.
Summary: The Depth Map
| Dimension | “Build It Myself” | Open-Source | Enterprise Incumbents | AI-Native Startups | Kore |
|---|---|---|---|---|---|
| BUILD (representation) | L1: Generated code | L1: Framework code | L2: Config/Visual | L1-2: Opaque | L4: Compilable YAML DSL with IR |
| ORCHESTRATE (runtime) | L1: Custom code | L1-2: Wrapper/Graph | L3: Flow engine | L1-2: Managed wrapper | L4: 20+ runtime primitives |
| DEPLOY (tools/auth) | L1: DIY per tool | L1: DIY | L2-3: Ecosystem-bound | L2: Managed (narrow) | L4: Multi-protocol, multi-auth, dynamic |
| GOVERN (guardrails) | L1: Single check | L1: Single check | L2-3: Multiple / Policy | L2: Internal | L4: 3-tier cascade, 5 actions, scoped |
| OBSERVE (tracing) | L1: Logs | L1: Logs | L2: Metrics | L2: Internal | L4: Decision traces, span trees, replay |
| SCALE (organizational) | L1: Single team | L1: Single team | L3: Platform-managed | L2: Managed | L4: Project isolation, versioned deployment |
| AI-PROGRAMMABLE (lifecycle) | L1: AI writes code | L1: AI writes code | L2-3: AI fills forms/APIs | L1: Not exposed | L4: AI programs abstractions (Arch + MCP + CLI) |
Every competitor has depth in 1-2 dimensions. Kore has depth in all 7.
Salesforce is deep in ecosystem integration but shallow in agent runtime. LangGraph is deep in orchestration flexibility but shallow in governance and observability. Sierra is deep in LLM-native UX but shallow in customer control. OpenAI is deep in model capability but shallow in everything around the model. AI coding tools are deep in code generation but shallow in every production concern.
And Kore doesn’t compete with AI coding tools — it makes them better. Arch is the built-in AI architect that understands the platform natively. But the same MCP tools, CLI, and SDK interfaces are open to any AI assistant. Use Claude Code to author ABL. Use Cursor to explore agent topologies. Use Copilot to generate test scenarios. The platform’s abstractions are the AI’s workspace — and the better the abstractions, the better the AI performs.
The platform that wins is the one that’s deep everywhere — because production agents need all seven dimensions working together, not one dimension working brilliantly and six held together with glue code.