The GUARDRAILS: block defines named guardrail rules.
Overview
ABL guardrails use a three-tier evaluation model:
- CEL-based (Tier 1): fast, deterministic expression checks.
- Model-based (Tier 2): pre-trained safety classification models (e.g., OpenAI moderation).
- LLM-based (Tier 3): natural language checks evaluated by an LLM.
Application points
The kind property determines when the guardrail is evaluated during the agent’s processing pipeline.
| Kind | Evaluation point |
|---|---|
| input | Before the user’s message reaches the LLM. |
| output | After the LLM generates a response, before it is sent to the user. |
| both | Evaluated on both input and output. |
| tool_input | Before parameters are sent to a tool call. |
| tool_output | After a tool returns its result, before the result enters the LLM context. |
| handoff | Before context is passed to another agent during a handoff. |
Guardrail properties
| Property | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | - | Unique identifier for the guardrail (the YAML key). |
| kind | string | Yes | - | Application point. See Application points. |
| check | string | No | - | CEL expression to evaluate (Tier 1). Omit for model-based or LLM-based checks. |
| action | string | Yes | - | Action when the check fails. See Actions. |
| message | string | No | - | Human-readable message displayed or logged when the guardrail triggers. |
| priority | number | No | 100 | Evaluation priority. Lower values are evaluated first. |
| provider | string | No | - | Model provider name for Tier 2 checks (e.g., openai_moderation). |
| category | string | No | - | Safety taxonomy category for Tier 2 (e.g., hate, violence). |
| threshold | number | No | - | Score threshold (0.0 to 1.0) for model-based checks. |
| llm_check | string | No | - | Natural language prompt for Tier 3 LLM-based checks. |
| severity_actions | object | No | - | Per-severity action overrides. See Graduated actions. |
| fix_strategy | string | No | - | Fix strategy when action: fix. See Fix strategies. |
| fix_expression | string | No | - | CEL expression for the custom fix strategy. |
| max_reasks | number | No | 2 | Maximum reask attempts when action: reask. |
| filter_min_length | number | No | - | Minimum content length after filtering. If the filtered content is shorter than this, the guardrail blocks instead. |
| streaming | boolean | No | false | Enable mid-stream evaluation for streaming responses. |
| streaming_interval | string | No | - | Streaming evaluation granularity. See Streaming evaluation. |
Actions
The action property determines the runtime behavior when a guardrail check fails.
| Action | Behavior |
|---|---|
| block | Reject the content entirely. For input, the user message is discarded. For output, the response is withheld. |
| warn | Allow the content through but emit a warning event. The message is logged, not sent to the user. |
| redact | Replace the offending content with a redaction marker and continue. The sanitized content is passed through. |
| escalate | Trigger human escalation for review. The content is held pending human decision. |
| fix | Automatically repair the content using a fix strategy. See Fix strategies. |
| reask | Reject the LLM output and re-prompt with the guardrail’s message appended as additional guidance. |
| filter | Remove the offending portions while preserving the rest of the content. |
Three-tier implementation
Tier 1: CEL-based checks
CEL (Common Expression Language) checks are fast, deterministic rules evaluated without calling an external model. Use the check property with a CEL expression.
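
A Tier 1 guardrail might look like the sketch below. The guardrail name, the CEL variable name `input`, and the length limit are illustrative assumptions, not values confirmed by the ABL reference. The sketch also assumes that the check expresses the condition that must hold, so the action fires when the expression evaluates to false.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name; the YAML key serves as the `name` property.
  input_length_limit:
    kind: input
    # Assumption: the user message is exposed to CEL as a variable named `input`.
    check: "size(input) <= 4000"
    action: block
    message: "Your message is too long. Please shorten it and try again."
    priority: 10
```

Because the expression is evaluated locally, a rule like this is a reasonable first line of defense to run before the slower model-based or LLM-based tiers.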
Tier 2: Model-based checks
Model-based checks use a pre-trained classification model to score content. You specify a provider, an optional category, and a threshold.
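
As a minimal sketch, a Tier 2 guardrail combines the three properties described above. The guardrail name and the threshold value are hypothetical, and the sketch assumes that a score at or above the threshold counts as a failed check, which the property table does not state explicitly.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  hate_speech_screen:
    kind: both
    provider: openai_moderation
    category: hate
    # Assumption: a classifier score at or above this value triggers the action.
    threshold: 0.8
    action: block
    message: "This content was blocked by a safety classifier."
```

No check property is set here, since check is reserved for Tier 1 CEL expressions.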
Tier 3: LLM-based checks
LLM-based checks use a natural language prompt evaluated by an LLM. Use the llm_check property with a descriptive prompt.
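
A Tier 3 guardrail could be sketched as follows. The guardrail name and the prompt wording are illustrative; in particular, whether the prompt should be phrased as a statement that must hold or as a question is an assumption to verify against the runtime's behavior.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  no_investment_advice:
    kind: output
    # Assumption: the prompt describes the condition the content must satisfy.
    llm_check: "The response does not give personalized investment advice."
    action: reask
    message: "Do not give personalized investment advice; offer general information only."
```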
Fix strategies
When action: fix is set, the fix_strategy property determines how content is repaired.
| Strategy | Behavior |
|---|---|
| truncate | Truncate content to the maximum allowed length. |
| strip_html | Remove HTML tags from the content. |
| redact_pii | Detect and replace PII patterns with redaction markers. |
| normalize | Normalize whitespace, encoding, and special characters. |
| custom | Apply a custom CEL expression defined in fix_expression. |
Example: fix with truncation
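
The following is a sketch rather than the reference example. The guardrail name, the CEL variable `output`, and the 2000-character limit are assumptions, and the source does not say how the truncate strategy determines the maximum allowed length.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  response_length_cap:
    kind: output
    # Assumption: the LLM response is exposed to CEL as `output`.
    check: "size(output) <= 2000"
    action: fix
    fix_strategy: truncate
    message: "Response exceeded the length limit and was truncated."
```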
Example: custom fix expression
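
The sketch below pairs fix_strategy: custom with a fix_expression. It assumes the response is bound to a CEL variable named `output` and that the runtime's CEL environment provides a string replace function (this comes from CEL's optional string extensions, not the core language). The guardrail name and the marker text are hypothetical.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  strip_internal_marker:
    kind: output
    # Passes only when the marker is absent from the response.
    check: "!output.contains('[INTERNAL]')"
    action: fix
    fix_strategy: custom
    # Assumption: the expression's result replaces the original content.
    fix_expression: "output.replace('[INTERNAL]', '')"
```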
Graduated actions
Use severity_actions to apply different actions based on the severity of the violation. The keys are severity labels and the values are action names.
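
As an illustration, a graduated guardrail might map severities to actions like this. The severity labels low, medium, and high are assumptions, since the set of valid labels is not listed here; the guardrail name and threshold are also hypothetical.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  violence_graduated:
    kind: output
    provider: openai_moderation
    category: violence
    threshold: 0.5
    # `action` is required, so it is still set alongside the overrides.
    action: warn
    severity_actions:
      # Assumed severity labels; values are action names from the Actions table.
      low: warn
      medium: redact
      high: block
```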
Streaming evaluation
For streaming responses, guardrails can evaluate content as it is generated rather than waiting for the complete response.
| Property | Values | Description |
|---|---|---|
| streaming | true, false | Enable mid-stream evaluation. |
| streaming_interval | token, sentence, chunk_size | Granularity of streaming evaluation. |
message is sent to the user.
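
A streaming guardrail could be configured along these lines. The guardrail name, the CEL variable `output`, and the blocked phrase are hypothetical; the sketch also assumes that, with streaming enabled, the check runs against the content generated so far at each interval.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  streamed_disclaimer_check:
    kind: output
    # Assumption: `output` holds the text generated so far during streaming.
    check: "!output.contains('guaranteed returns')"
    action: block
    message: "The response was stopped because it made a prohibited claim."
    streaming: true
    # One of the documented values: token, sentence, chunk_size.
    streaming_interval: sentence
```

Evaluating per sentence rather than per token is a common trade-off between catching violations early and limiting how often the check runs.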
Reask behavior
When action: reask is set, the runtime rejects the LLM output, appends the guardrail’s message as additional guidance, and re-prompts. The max_reasks property controls how many times this can happen before the guardrail falls back to a block.
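
The reask behavior described above can be sketched as follows. The guardrail name and the CEL variable `output` are assumptions, and the check is deliberately simple.

```yaml
GUARDRAILS:
  # Hypothetical guardrail name.
  json_only_output:
    kind: output
    # Assumption: the LLM response is exposed to CEL as `output`.
    check: "output.startsWith('{')"
    action: reask
    # Appended to the prompt as guidance on each retry.
    message: "Respond with a single JSON object and no surrounding text."
    # Overrides the default of 2; after 3 failed retries the guardrail blocks.
    max_reasks: 3
```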
Priority and evaluation order
Guardrails are evaluated in order of priority (lower values first). When multiple guardrails have the same priority, they are evaluated in declaration order.
A block action from any guardrail stops further evaluation. A warn action does not stop evaluation; all subsequent guardrails continue to run.
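
To make the ordering concrete, the sketch below declares three input guardrails. The names, the CEL variable `input`, and the expressions are illustrative assumptions.

```yaml
GUARDRAILS:
  # Declared first, but evaluated last: no priority is set, so it defaults to 100.
  polite_tone_warning:
    kind: input
    check: "!input.contains('stupid')"
    action: warn
    message: "User message contained impolite language."

  # Evaluated first (priority 10). If it blocks, nothing below runs.
  empty_input_block:
    kind: input
    check: "size(input) > 0"
    action: block
    message: "Please enter a message."
    priority: 10

  # Evaluated second (priority 50). A warn lets evaluation continue.
  long_input_warning:
    kind: input
    check: "size(input) <= 2000"
    action: warn
    message: "User message is unusually long."
    priority: 50
```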
Built-in guardrail templates
ABL provides five built-in guardrail templates that you can reference by convention:
| Template | Kind | Check | Action |
|---|---|---|---|
| account_number_masking | output | Full account numbers in response | redact |
| credential_input | input | Passwords, PINs, security codes | redact |
| ssn_protection | input | SSN patterns | redact |
| profanity_filter | input | Blocked words list | block |
| harmful_content_detection | both | Harmful instruction patterns | escalate |
Complete example
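
The following sketch combines all three tiers in one GUARDRAILS: block. It is an illustration assembled from the properties documented on this page, not the reference example. The guardrail names, the CEL variables `input` and `output`, the thresholds, and the severity labels are all assumptions.

```yaml
GUARDRAILS:
  # Tier 1: deterministic CEL check on user input.
  ssn_input_redaction:
    kind: input
    # Assumption: `input` holds the user message; matches() does a partial regex match.
    check: '!input.matches("[0-9]{3}-[0-9]{2}-[0-9]{4}")'
    action: redact
    message: "Social Security numbers are removed before processing."
    priority: 10

  # Tier 2: classifier score on both input and output.
  violence_screen:
    kind: both
    provider: openai_moderation
    category: violence
    threshold: 0.7
    action: block
    # Assumed severity labels.
    severity_actions:
      low: warn
      high: escalate
    message: "This content was flagged by a safety classifier."
    priority: 20

  # Tier 3: natural language check with retries.
  no_legal_advice:
    kind: output
    llm_check: "The response does not present itself as legal advice."
    action: reask
    max_reasks: 2
    message: "Do not give legal advice; suggest consulting a qualified professional."
    priority: 30

  # Tier 1 with automatic repair, evaluated mid-stream.
  response_length_cap:
    kind: output
    check: "size(output) <= 4000"
    action: fix
    fix_strategy: truncate
    streaming: true
    streaming_interval: chunk_size
    priority: 40
```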
Related pages
- Memory & Constraints: business rule enforcement (distinct from content safety)
- Expressions & functions: CEL expression syntax for check properties
- Multi-Agent & Supervisor: ESCALATE action for human review