Guardrails - Kore.ai Docs

Guardrails are safety checks that evaluate agent inputs and outputs to detect harmful, non-compliant, or malformed content. Unlike constraints (which enforce business rules), guardrails protect against safety and quality violations at the content level. The GUARDRAILS: block defines named guardrail rules.

Overview

ABL guardrails use a three-tier evaluation model:

CEL-based (Tier 1) — fast, deterministic expression checks.
Model-based (Tier 2) — pre-trained safety classification models (for example, OpenAI moderation).
LLM-based (Tier 3) — natural language checks evaluated by an LLM.

Each guardrail specifies an application point (when to check), a check (a CEL expression, a safety model provider, or an LLM prompt), and an action to take when the check fails.

GUARDRAILS:
  profanity_filter:
    kind: input
    check: not_contains_blocked_words(input)
    action: block
    message: "Your message was blocked. Please keep the conversation respectful."
    priority: 1

  pii_output_prevention:
    kind: output
    check: not_contains_ssn(response)
    action: redact
    message: "Sensitive information has been redacted."
    priority: 0

Application points

The kind property determines when the guardrail is evaluated during the agent’s processing pipeline.

Kind	Evaluation point
`input`	Before the user’s message reaches the LLM.
`output`	After the LLM generates a response, before it is sent to the user.
`both`	Evaluated on both input and output.
`tool_input`	Before parameters are sent to a tool call.
`tool_output`	After a tool returns its result, before the result enters the LLM context.
`handoff`	Before context is passed to another agent during a handoff.
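As a minimal sketch, assuming nothing beyond the table above, the following Python models which guardrails run at a given pipeline stage. ABL does not expose a Python API; the stage names and the helpers `applies_at` and `guardrails_for_stage` are hypothetical. The only documented behavior encoded here is that `both` covers input and output.

```python
# Hypothetical model of the application points table; not part of ABL.
STAGES_BY_KIND = {
    "input": {"input"},
    "output": {"output"},
    "both": {"input", "output"},  # evaluated on both input and output
    "tool_input": {"tool_input"},
    "tool_output": {"tool_output"},
    "handoff": {"handoff"},
}


def applies_at(kind: str, stage: str) -> bool:
    """Return True if a guardrail of this kind runs at the given stage."""
    if kind not in STAGES_BY_KIND:
        raise ValueError(f"unknown guardrail kind: {kind!r}")
    return stage in STAGES_BY_KIND[kind]


def guardrails_for_stage(guardrails: dict, stage: str) -> list:
    """Names (YAML keys) of guardrails that apply at a stage, in declaration order."""
    return [name for name, spec in guardrails.items() if applies_at(spec["kind"], stage)]


rules = {
    "profanity_filter": {"kind": "input"},
    "pii_output_prevention": {"kind": "output"},
    "tone_check": {"kind": "both"},  # hypothetical extra rule
}
print(guardrails_for_stage(rules, "output"))  # ['pii_output_prevention', 'tone_check']
```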

Guardrail properties

Property	Type	Required	Default	Description
`name`	`string`	Yes	—	Unique identifier for the guardrail (the YAML key).
`kind`	`string`	Yes	—	Application point. See Application points.
`check`	`string`	No	—	CEL expression to evaluate (Tier 1). Omit for model-based (Tier 2) or LLM-based (Tier 3) checks.
`action`	`string`	Yes	—	Action when the check fails. See Actions.
`message`	`string`	No	—	Human-readable message displayed or logged when the guardrail triggers.
`priority`	`number`	No	`100`	Evaluation priority. Lower values are evaluated first.
`provider`	`string`	No	—	Model provider name for Tier 2 checks (for example, `openai_moderation`).
`category`	`string`	No	—	Safety taxonomy category for Tier 2 checks (for example, `hate` or `violence`).
`threshold`	`number`	No	—	Score threshold (0.0 to 1.0) for model-based checks.
`llm_check`	`string`	No	—	Natural language prompt for Tier 3 LLM-based checks.
`severity_actions`	`object`	No	—	Per-severity action overrides. See Graduated actions.
`fix_strategy`	`string`	No	—	Fix strategy when `action: fix`. See Fix strategies.
`fix_expression`	`string`	No	—	CEL expression for the `custom` fix strategy.
`max_reasks`	`number`	No	`2`	Maximum reask attempts when `action: reask`.
`filter_min_length`	`number`	No	—	Minimum content length after filtering. If the filtered content is shorter than this, it is blocked instead.
`streaming`	`boolean`	No	`false`	Enable mid-stream evaluation for streaming responses.
`streaming_interval`	`string`	No	—	Streaming evaluation granularity. See Streaming evaluation.
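The sketch below shows one way a loader could validate a GUARDRAILS entry and fill in the defaults listed in the table (`priority` 100, `max_reasks` 2, `streaming` false). The function name, the error messages, and the rule that the tier is inferred from whichever of `check`, `provider`, or `llm_check` is present are assumptions for illustration, not documented runtime behavior.

```python
from typing import Any, Dict

# Defaults taken from the guardrail properties table.
DEFAULTS = {"priority": 100, "max_reasks": 2, "streaming": False}
VALID_KINDS = {"input", "output", "both", "tool_input", "tool_output", "handoff"}
VALID_ACTIONS = {"block", "warn", "redact", "escalate", "fix", "reask", "filter"}


def normalize_guardrail(name: str, spec: Dict[str, Any]) -> Dict[str, Any]:
    """Validate one GUARDRAILS entry, apply defaults, and infer its tier (hypothetical)."""
    for required in ("kind", "action"):
        if required not in spec:
            raise ValueError(f"{name}: missing required property {required!r}")
    if spec["kind"] not in VALID_KINDS:
        raise ValueError(f"{name}: unknown kind {spec['kind']!r}")
    if spec["action"] not in VALID_ACTIONS:
        raise ValueError(f"{name}: unknown action {spec['action']!r}")
    threshold = spec.get("threshold")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError(f"{name}: threshold must be between 0.0 and 1.0")

    # Assumed rule: the property that is present decides the tier.
    if "check" in spec:
        tier = 1
    elif "provider" in spec:
        tier = 2
    elif "llm_check" in spec:
        tier = 3
    else:
        raise ValueError(f"{name}: needs one of check, provider, or llm_check")

    # Explicit values in the spec override the defaults.
    return {"name": name, **DEFAULTS, **spec, "tier": tier}


spec = normalize_guardrail(
    "length_limit",
    {"kind": "output", "check": "length(response) < 10000", "action": "warn"},
)
print(spec["tier"], spec["priority"], spec["max_reasks"])  # 1 100 2
```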

Actions

The action property determines the runtime behavior when a guardrail check fails.

Action	Behavior
`block`	Reject the content entirely. For input, the user message is discarded. For output, the response is withheld.
`warn`	Allow the content through but emit a warning event. The `message` is logged, not sent to the user.
`redact`	Replace the offending content with a redaction marker and continue. The sanitized content is passed through.
`escalate`	Trigger human escalation for review. The content is held pending a human decision.
`fix`	Automatically repair the content using a fix strategy. See Fix strategies.
`reask`	Reject the LLM output and re-prompt with the guardrail’s message appended as additional guidance.
`filter`	Remove the offending portions while preserving the rest of the content.
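A minimal Python sketch of how the non-repairing actions could transform content once a check has failed. It assumes the offending content can be located with a regular expression and uses a hypothetical `[REDACTED]` marker. `fix` and `reask` are covered in their own sections, and `escalate` is modeled only as holding the content back. The sketch also applies the `filter_min_length` rule from the properties table.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

REDACTION_MARKER = "[REDACTED]"  # hypothetical marker text


@dataclass
class ActionResult:
    content: Optional[str]  # None means nothing is passed on
    events: List[str] = field(default_factory=list)


def apply_action(action: str, content: str, offending: re.Pattern, message: str,
                 filter_min_length: Optional[int] = None) -> ActionResult:
    """Apply a failed guardrail's action to content (simplified model)."""
    if action == "block":
        return ActionResult(None, [f"blocked: {message}"])
    if action == "warn":
        # Content passes unchanged; the message is only logged.
        return ActionResult(content, [f"warning: {message}"])
    if action == "redact":
        return ActionResult(offending.sub(REDACTION_MARKER, content), [f"redacted: {message}"])
    if action == "filter":
        filtered = offending.sub("", content)
        if filter_min_length is not None and len(filtered) < filter_min_length:
            # Too little content survived filtering, so block instead.
            return ActionResult(None, [f"blocked: {message}"])
        return ActionResult(filtered, [f"filtered: {message}"])
    if action == "escalate":
        # Held pending a human decision; nothing is passed on yet.
        return ActionResult(None, [f"escalated: {message}"])
    raise NotImplementedError(f"{action!r} is not modeled in this sketch")


ssn = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
print(apply_action("redact", "My SSN is 123-45-6789.", ssn, "SSN redacted.").content)
# My SSN is [REDACTED].
```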

Three-tier implementation

Tier 1: CEL-based checks

CEL (Common Expression Language) checks are fast, deterministic rules evaluated without calling an external model. Use the check property with a CEL expression.

GUARDRAILS:
  length_limit:
    kind: output
    check: length(response) < 10000
    action: warn
    message: "Response exceeds recommended length."

  ssn_detection:
    kind: input
    check: not_matches_pattern(input, "\\b\\d{3}-\\d{2}-\\d{4}\\b")
    action: redact
    message: "SSN detected and redacted."
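To try these patterns outside ABL, the two checks above translate to Python roughly as follows. `not_matches_pattern` here is a hypothetical stand-in for the ABL function of the same name, assumed to return true (check passes) when the pattern matches nowhere in the text. In a YAML double-quoted string, `\\` is an escape for a single backslash, so the YAML pattern yields the same regex as the Python raw string below.

```python
import re

# Same regex as the YAML string in ssn_detection once YAML unescapes it.
SSN_REGEX = r"\b\d{3}-\d{2}-\d{4}\b"


def length_limit_passes(response: str, limit: int = 10000) -> bool:
    """Mirror of `length(response) < 10000`, assuming length counts characters."""
    return len(response) < limit


def not_matches_pattern(text: str, pattern: str) -> bool:
    """Hypothetical stand-in: True (check passes) when the regex matches nowhere."""
    return re.search(pattern, text) is None


print(not_matches_pattern("My SSN is 123-45-6789", SSN_REGEX))  # False, so ssn_detection triggers
```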

Tier 2: Model-based checks

Model-based checks use a pre-trained classification model to score content. You specify a provider, an optional category, and a threshold.

GUARDRAILS:
  toxicity_detection:
    kind: input
    provider: openai_moderation
    category: hate
    threshold: 0.7
    action: block
    message: "Content flagged for hateful language."
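A sketch of how the threshold comparison could work, with a stub in place of the real provider call. The `Scorer` signature, the `fake_scorer` values, the "at or above the threshold" comparison, and the behavior when `category` is omitted are all assumptions; this does not reproduce any provider's actual response format.

```python
from typing import Callable, Mapping, Optional

# A scorer maps text to per-category scores in [0.0, 1.0]. A real deployment
# would call the configured provider; this signature is an assumption.
Scorer = Callable[[str], Mapping[str, float]]


def model_check_fails(text: str, scorer: Scorer, threshold: float,
                      category: Optional[str] = None) -> bool:
    """True when the guardrail should trigger.

    Assumptions: a score at or above the threshold is a violation, and
    omitting `category` lets any category trigger.
    """
    scores = scorer(text)
    if category is not None:
        return scores.get(category, 0.0) >= threshold
    return any(score >= threshold for score in scores.values())


def fake_scorer(text: str) -> Mapping[str, float]:
    """Stub classifier used only for this example."""
    return {"hate": 0.82 if "slur" in text else 0.05, "violence": 0.10}


print(model_check_fails("contains a slur", fake_scorer, threshold=0.7, category="hate"))  # True
```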

Tier 3: LLM-based checks

LLM-based checks use a natural language prompt evaluated by an LLM. Use the llm_check property with a descriptive prompt.

GUARDRAILS:
  medical_advice_check:
    kind: output
    llm_check: "Does this response provide specific medical diagnoses or prescribe medication? Answer YES or NO."
    action: block
    message: "I'm not able to provide medical diagnoses. Please consult a healthcare professional."
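The page does not say which answer counts as a failure. The sketch below assumes a reply starting with YES means the content violates the rule, which matches how the medical_advice_check prompt is phrased. The prompt layout, `llm_check_fails`, and the `fake_llm` stub are hypothetical; `llm` can be any callable that takes a prompt string and returns the model's text.

```python
from typing import Callable

MEDICAL_CHECK = ("Does this response provide specific medical diagnoses or prescribe "
                 "medication? Answer YES or NO.")


def llm_check_fails(llm: Callable[[str], str], llm_check: str, content: str) -> bool:
    """Ask an LLM the guardrail question about `content`.

    Assumptions: the prompt layout below, and that a reply starting with YES
    means the content violates the rule.
    """
    prompt = f"{llm_check}\n\nContent to evaluate:\n{content}"
    return llm(prompt).strip().upper().startswith("YES")


def fake_llm(prompt: str) -> str:
    """Stub model for the example: flags any mention of a dosage in mg."""
    return "YES" if " mg " in prompt else "NO"


print(llm_check_fails(fake_llm, MEDICAL_CHECK, "Take 400 mg of ibuprofen."))  # True
```

Parsing only the leading word keeps the check tolerant of replies such as "Yes, it does".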

Fix strategies

When action is set to fix, the fix_strategy property determines how the content is repaired.

Strategy	Behavior
`truncate`	Truncate content to the maximum allowed length.
`strip_html`	Remove HTML tags from the content.
`redact_pii`	Detect and replace PII patterns with redaction markers.
`normalize`	Normalize whitespace, encoding, and special characters.
`custom`	Apply a custom CEL expression defined in `fix_expression`.
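A simplified Python sketch of the five strategies. The PII patterns are illustrative assumptions, tag stripping uses a regex rather than a real HTML parser, and the `custom` branch takes a Python function as a stand-in for evaluating `fix_expression`. Because this page does not say where `truncate` reads its maximum length from, the sketch takes it as an explicit parameter.

```python
import re
import unicodedata
from typing import Callable, Optional

# Illustrative PII patterns only; real redaction needs a much broader set.
PII_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),         # US SSN
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),  # email address
]


def apply_fix(strategy: str, text: str, max_length: Optional[int] = None,
              custom: Optional[Callable[[str], str]] = None) -> str:
    """Repair text with one of the documented fix strategies (simplified)."""
    if strategy == "truncate":
        if max_length is None:
            raise ValueError("truncate needs a maximum length")
        return text[:max_length]
    if strategy == "strip_html":
        # Regex tag removal is a simplification; an HTML parser is more robust.
        return re.sub(r"<[^>]*>", "", text)
    if strategy == "redact_pii":
        for pattern in PII_PATTERNS:
            text = pattern.sub("[REDACTED]", text)
        return text
    if strategy == "normalize":
        # NFKC folds compatibility characters; split/join collapses whitespace.
        return " ".join(unicodedata.normalize("NFKC", text).split())
    if strategy == "custom":
        if custom is None:
            raise ValueError("custom needs a function")
        # Stands in for evaluating the guardrail's fix_expression.
        return custom(text)
    raise ValueError(f"unknown fix strategy: {strategy!r}")


print(apply_fix("strip_html", "<p>Hello <b>world</b></p>"))  # Hello world
```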

Example: fix with truncation

GUARDRAILS:
  response_length:
    kind: output
    check: length(response) <= 5000
    action: fix
    fix_strategy: truncate
    message: "Response was trimmed to fit the maximum length."

Example: custom fix expression

GUARDRAILS:
  normalize_whitespace:
    kind: output
    check: not_contains_excessive_whitespace(response)
    action: fix
    fix_strategy: custom
    fix_expression: "collapse_whitespace(response)"

Graduated actions

Use severity_actions to apply different actions based on the severity of the violation. The keys are severity labels and the values are action names.

GUARDRAILS:
  content_safety:
    kind: output
    provider: openai_moderation
    threshold: 0.5
    action: warn
    severity_actions:
      low: warn
      medium: reask
      high: block
    message: "Content flagged by safety model."
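This page does not define how a model score maps to a severity label, so the cutoffs in this sketch are purely hypothetical. It also assumes that a severity without an entry in severity_actions falls back to the guardrail's `action`. With the content_safety settings above (threshold 0.5), scores below 0.5 do not trigger the guardrail at all.

```python
from typing import Dict, Optional

# Hypothetical severity bands: the page does not define how a model score
# maps to low/medium/high, so these cutoffs are illustrative only.
SEVERITY_BANDS = [(0.9, "high"), (0.7, "medium"), (0.0, "low")]


def severity_for(score: float) -> str:
    for cutoff, label in SEVERITY_BANDS:
        if score >= cutoff:
            return label
    return "low"


def resolve_action(score: float, threshold: float, action: str,
                   severity_actions: Dict[str, str]) -> Optional[str]:
    """Action to run for a model score, or None when the guardrail does not trigger."""
    if score < threshold:
        return None
    # Assumption: fall back to `action` when a severity has no override.
    return severity_actions.get(severity_for(score), action)


content_safety = {"low": "warn", "medium": "reask", "high": "block"}
for score in (0.3, 0.55, 0.75, 0.95):
    print(score, resolve_action(score, 0.5, "warn", content_safety))
```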

Streaming evaluation

For streaming responses, guardrails can evaluate content as it is generated rather than waiting for the complete response.

Property	Values	Description
`streaming`	`true`, `false`	Enable mid-stream evaluation.
`streaming_interval`	`token`, `sentence`, `chunk_size`	Granularity of streaming evaluation.

GUARDRAILS:
  realtime_safety:
    kind: output
    provider: openai_moderation
    threshold: 0.8
    action: block
    streaming: true
    streaming_interval: sentence
    message: "Response generation halted due to safety concern."

When a streaming guardrail triggers, response generation halts at the current point and the guardrail’s message is sent to the user.
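A sketch of sentence-level streaming evaluation (`streaming_interval: sentence`) with `action: block`. The naive sentence splitter (terminal punctuation followed by whitespace) and the `is_safe` callback are assumptions standing in for the configured check. Each completed sentence is checked before it is emitted, and the generator stops consuming the stream at the first failure.

```python
import re
from typing import Callable, Iterable, Iterator

# Naive sentence boundary: ., ! or ? followed by whitespace (an assumption).
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def stream_with_guardrail(chunks: Iterable[str], is_safe: Callable[[str], bool],
                          halt_message: str) -> Iterator[str]:
    """Emit text sentence by sentence; stop at the first sentence that fails."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        *complete, buffer = SENTENCE_BREAK.split(buffer)  # last part may be unfinished
        for sentence in complete:
            if not is_safe(sentence):
                yield halt_message  # halt generation and send the message
                return
            yield sentence
    if buffer:  # evaluate whatever remains when the stream ends
        yield buffer if is_safe(buffer) else halt_message


chunks = ["Hello the", "re. This is fi", "ne. Now some BAD", " words. Never sent."]
print(list(stream_with_guardrail(chunks, lambda s: "BAD" not in s, "Response halted.")))
# ['Hello there.', 'This is fine.', 'Response halted.']
```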

Reask behavior

When action is set to reask, the runtime rejects the LLM output, appends the guardrail’s message as additional guidance, and re-prompts. The max_reasks property controls how many reasks are allowed before the runtime falls back to a block.

GUARDRAILS:
  factual_grounding:
    kind: output
    llm_check: "Does this response make claims not supported by the provided context?"
    action: reask
    max_reasks: 3
    message: "Stick to information from the provided context. Do not make unsupported claims."
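The reask loop can be sketched in Python as follows, assuming a hypothetical `generate` callable that receives optional extra guidance (None on the first attempt, the guardrail message on every reask) and a `violates` callable that stands in for the guardrail check. With `max_reasks: 3`, at most four responses are generated before the guardrail falls back to a block.

```python
from typing import Callable, Optional


def generate_with_reask(generate: Callable[[Optional[str]], str],
                        violates: Callable[[str], bool],
                        message: str, max_reasks: int = 2) -> Optional[str]:
    """Return an accepted response, or None when it is blocked after max_reasks."""
    response = generate(None)  # first attempt, no extra guidance
    reasks = 0
    while violates(response):
        if reasks == max_reasks:
            return None  # out of reasks: fall back to block
        reasks += 1
        response = generate(message)  # re-prompt with the guardrail message
    return response


attempts = []


def fake_generate(guidance: Optional[str]) -> str:
    """Stub model: produces a grounded answer on its third attempt."""
    attempts.append(guidance)
    return "answer from context" if len(attempts) >= 3 else "unsupported claim"


result = generate_with_reask(fake_generate, lambda r: "unsupported" in r,
                             "Stick to information from the provided context.", max_reasks=3)
print(result, len(attempts))  # answer from context 3
```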

Priority and evaluation order

Guardrails are evaluated in order of priority (lower values first; the default is 100). When multiple guardrails have the same priority, they are evaluated in declaration order. A block action from any guardrail stops further evaluation. A warn action does not stop evaluation; all subsequent guardrails continue to run.
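The ordering rules above map naturally onto a stable sort. In this minimal sketch, the `Guardrail` dataclass and `evaluate` function are hypothetical, `passes` stands in for whatever check, provider, or llm_check the guardrail uses, and only the block and warn actions are modeled. Because Python's `sorted()` is stable, guardrails with equal priority keep their declaration order.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Guardrail:
    name: str
    passes: Callable[[str], bool]  # stands in for check, provider, or llm_check
    action: str                    # only "block" and "warn" are modeled
    priority: int = 100            # documented default


def evaluate(guardrails: List[Guardrail], content: str) -> Tuple[bool, List[str]]:
    """Run guardrails in priority order; return (allowed, names that triggered)."""
    triggered: List[str] = []
    # sorted() is stable, so equal priorities keep declaration order.
    for g in sorted(guardrails, key=lambda g: g.priority):
        if g.passes(content):
            continue
        triggered.append(g.name)
        if g.action == "block":
            return False, triggered  # a block stops further evaluation
    return True, triggered  # warn results never stop evaluation


rules = [
    Guardrail("late_warn", lambda c: False, "warn"),  # priority 100
    Guardrail("first_warn", lambda c: False, "warn", priority=0),
    Guardrail("blocker", lambda c: "bad" not in c, "block", priority=1),
    Guardrail("same_priority_after", lambda c: False, "warn", priority=1),
]
print(evaluate(rules, "bad"))  # (False, ['first_warn', 'blocker'])
```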

Built-in guardrail templates

ABL ships a set of built-in, CEL-based (Tier 1) guardrail templates focused on prompt-injection and secret-leak protection:

Template	Kind	Detects	Action
`detect_instruction_override`	input	Attempts to override or ignore system instructions	`warn`
`detect_role_manipulation`	input	Attempts to manipulate the AI’s role or persona	`warn`
`detect_system_prompt_extraction`	input	Attempts to extract the system prompt	`warn`
`detect_encoding_tricks`	input	Encoding-based obfuscation (base64, rot13, hex)	`warn`
`detect_credential_leak`	output	Leaked credentials, API keys, or tokens in output	`redact`

You can also author your own guardrails for domain-specific concerns (account-number masking, SSN redaction, profanity, etc.) using the tiers described above.
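The following Python sketch shows the general shape of an output-side credential redaction like the one detect_credential_leak performs. The two regular expressions are illustrative assumptions, not the template's actual rules, which this page does not publish. The first pattern uses the AKIA prefix of long-term AWS access key IDs; the second catches simple key=value secrets.

```python
import re

# Illustrative patterns only; NOT the rules used by the built-in
# detect_credential_leak template.
CREDENTIAL_PATTERNS = [
    re.compile(r"\bAKIA[0-9A-Z]{16}\b"),                                # AWS access key ID format
    re.compile(r"(?i)\b(?:api[_-]?key|token|password)\s*[:=]\s*\S+"),  # key=value secrets
]


def redact_credentials(output: str, marker: str = "[REDACTED]") -> str:
    """Output-side redaction in the spirit of detect_credential_leak (hypothetical)."""
    for pattern in CREDENTIAL_PATTERNS:
        output = pattern.sub(marker, output)
    return output


print(redact_credentials("Use password=hunter2 to log in."))  # Use [REDACTED] to log in.
```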

Complete example

GUARDRAILS:
  account_number_masking:
    kind: output
    check: not_contains_full_account_number(response)
    action: redact
    message: "Account numbers are masked. Only the last 4 digits are displayed."
    priority: 0

  credential_input:
    kind: input
    check: not_contains_credentials(input)
    action: redact
    message: "Please never share passwords or PINs in this chat."
    priority: 0

  credit_card_detection:
    kind: input
    check: not_matches_pattern(input, "\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b")
    action: redact
    message: "Credit card number redacted for your security."

  toxicity_check:
    kind: output
    check: toxicity_score(response) < 0.5
    action: block
    message: "Response blocked due to potential harmful content."
    priority: 1
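Before deploying the complete example, its two PII rules can be prototyped in Python. The credit card regex is copied from credit_card_detection. The account-number pattern (any standalone run of 10 to 12 digits) and the masking helper are hypothetical, since not_contains_full_account_number is not defined on this page.

```python
import re

# Same regex as credit_card_detection, written as a Python raw string.
CARD_PATTERN = re.compile(r"\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b")
# Hypothetical: treat any standalone 10-12 digit run as an account number.
ACCOUNT_PATTERN = re.compile(r"\b\d{10,12}\b")


def redact_cards(text: str) -> str:
    return CARD_PATTERN.sub("[REDACTED]", text)


def mask_account(match: re.Match) -> str:
    digits = match.group()
    # Keep only the last 4 digits visible, as account_number_masking promises.
    return "*" * (len(digits) - 4) + digits[-4:]


def mask_accounts(text: str) -> str:
    return ACCOUNT_PATTERN.sub(mask_account, text)


print(redact_cards("Card 4111 1111 1111 1111 is on file."))  # Card [REDACTED] is on file.
print(mask_accounts("Account 1234567890 is active."))        # Account ******7890 is active.
```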

Related pages

Memory & Constraints — business rule enforcement (distinct from content safety)
Expressions & functions — CEL expression syntax for check properties
Multi-Agent & Supervisor — ESCALATE action for human review
