How We Designed SafeClaw's Policy Engine

Authensor Team · 2026-02-13

The policy engine is where SafeClaw's safety guarantees are defined. It takes an action -- a type like filesystem.write and a resource like /home/user/.env -- and returns a decision: allow, deny, or require_approval. This post covers how we designed it, the trade-offs we made, and the problems we solved along the way.

First-Match-Wins

SafeClaw evaluates policy rules in order. The first rule whose condition matches the action determines the effect. If no rule matches, the defaultEffect applies -- and that is deny.

We chose first-match-wins over other evaluation models (most-specific-wins, priority-based, accumulative) for three reasons:

Predictability. Given a policy and an action, you can trace the evaluation by reading rules top to bottom. There is no ambiguity about which rule "wins" when multiple rules could match.

Fail-safe ordering. Specific rules sit at the top of the list and carve out exceptions. Broad deny rules below them catch everything else. If a new action type appears that nobody anticipated, it falls through to a broad deny rule (or to the default effect) and is blocked. This is the natural ordering for deny-by-default systems.

Debuggability. When a user asks "why was this action denied?", the answer is always "it matched rule N" or "no rule matched, so the default deny applied." The simulation feature uses this same logic to show exactly which rule would match for any hypothetical action.
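
To make the evaluation order concrete, here is a minimal sketch of a first-match-wins loop. It is not SafeClaw's source: the `rules` array, the `matches` predicate on each rule (a stand-in for the real condition language), and the shape of the returned object are all assumptions made for illustration. Only `defaultEffect` and the three effects come from the description above.

```javascript
// Sketch of first-match-wins evaluation. The rule shape here is hypothetical:
// `matches` is a plain predicate standing in for the real condition language.
function evaluate(policy, action) {
  for (let i = 0; i < policy.rules.length; i++) {
    const rule = policy.rules[i];
    if (rule.matches(action)) {
      // The first matching rule decides; later rules are never consulted.
      return { effect: rule.effect, ruleId: rule.id, ruleIndex: i };
    }
  }
  // No rule matched: fall back to the default effect, which is deny.
  return { effect: policy.defaultEffect ?? "deny", ruleId: null, ruleIndex: -1 };
}

const examplePolicy = {
  defaultEffect: "deny",
  rules: [
    { id: "deny-env-files", effect: "deny",
      matches: (a) => a.resource.endsWith(".env") },
    { id: "approve-filesystem", effect: "require_approval",
      matches: (a) => a.type.startsWith("filesystem.") },
  ],
};

// Both rules match this action, but the first one in the list wins.
const envWrite = evaluate(examplePolicy, {
  type: "filesystem.write",
  resource: "/home/user/.env",
});
console.log(envWrite.effect, envWrite.ruleId); // deny deny-env-files

// An unanticipated action type matches nothing and hits the default.
const unknown = evaluate(examplePolicy, { type: "robot.arm.move", resource: "arm-1" });
console.log(unknown.effect, unknown.ruleId); // deny null
```

Returning the index of the matching rule alongside the effect is what makes the "it matched rule N" answer cheap to produce.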

Condition Language

Each rule has a condition that defines when it matches. Conditions support two boolean combinators:

any -- matches if any sub-predicate is true (logical OR)
all -- matches if every sub-predicate is true (logical AND)

Each predicate operates on a field (either action.type or action.resource) with an operator and a value:

| Operator | Behavior |
|----------|----------|
| eq | Exact string match |
| startsWith | Prefix match |
| contains | Substring match |
| matches | Regular expression match (with ReDoS protection) |
| in | Value is one of a list |

This gives teams enough expressiveness to write rules like "require approval for any filesystem action on paths containing .env" or "allow code execution only for commands starting with npm test" without needing a full programming language.
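
As a rough illustration of how such conditions could be evaluated, the sketch below walks a condition tree recursively. The function and table names (`evaluateCondition`, `OPERATORS`, `FIELDS`) are hypothetical, nesting of any/all inside each other is an assumption, and the matches operator is deliberately left out because it needs the ReDoS guard described later in this post.

```javascript
// Hypothetical operator table; `matches` is omitted on purpose (see lead-in).
const OPERATORS = new Map([
  ["eq", (actual, expected) => actual === expected],
  ["startsWith", (actual, expected) => actual.startsWith(String(expected))],
  ["contains", (actual, expected) => actual.includes(String(expected))],
  ["in", (actual, expected) => Array.isArray(expected) && expected.includes(actual)],
]);

// The two fields a predicate may read from an action.
const FIELDS = new Map([
  ["action.type", (action) => action.type],
  ["action.resource", (action) => action.resource],
]);

function evaluateCondition(condition, action) {
  // Combinators: `any` is logical OR, `all` is logical AND.
  if (Array.isArray(condition.any)) {
    return condition.any.some((sub) => evaluateCondition(sub, action));
  }
  if (Array.isArray(condition.all)) {
    return condition.all.every((sub) => evaluateCondition(sub, action));
  }
  // Leaf predicate: unknown operators or fields never match.
  const operator = OPERATORS.get(condition.operator);
  const actual = FIELDS.get(condition.field)?.(action);
  if (!operator || typeof actual !== "string") return false;
  return operator(actual, condition.value);
}

// "Any filesystem action on paths containing .env"
const envCondition = {
  all: [
    { field: "action.type", operator: "startsWith", value: "filesystem." },
    { field: "action.resource", operator: "contains", value: ".env" },
  ],
};

console.log(evaluateCondition(envCondition, {
  type: "filesystem.write",
  resource: "/home/user/.env",
})); // true
console.log(evaluateCondition(envCondition, {
  type: "network.request",
  resource: "https://example.com/.env",
})); // false
```

Treating an unknown operator or field as "no match" keeps a malformed rule from throwing during evaluation; whether the real engine does the same is not stated here.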

Here is an example rule that blocks write access to credential files:

```json
{
  "id": "deny-credential-writes",
  "effect": "deny",
  "description": "Block writes to credential files",
  "condition": {
    "all": [
      { "field": "action.type", "operator": "startsWith", "value": "filesystem." },
      { "field": "action.resource", "operator": "matches", "value": "\\.(env|pem|key|credentials)$" }
    ]
  }
}
```

ReDoS Protection

The matches operator compiles user-supplied strings into regular expressions. This is a known attack surface: a carefully crafted regex pattern with nested quantifiers can cause catastrophic backtracking, consuming CPU for minutes or hours on certain inputs.

We solved this with static analysis in src/validate.js. Before any user-supplied pattern is compiled into a RegExp, the safeRegex function checks it against known ReDoS patterns:

(a+)+ -- nested quantifiers on the same group

(a*)+ -- star inside a repeated group

(?:a+)+ -- non-capturing group variant

(a|b+)+ -- alternation with quantifier inside a repeated group

Quantifier-brace combinations like (a+){2,}

If the pattern matches any of these signatures, compilation is rejected and the rule evaluation returns false (no match). The regex never executes. This is a conservative approach -- it might reject some safe patterns that look structurally similar to ReDoS patterns. We accept that trade-off: a rule that fails to match when it could have is preferable to a regex that can hang the policy engine.
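
The following is a simplified sketch of this kind of static check, not the actual safeRegex from src/validate.js. It uses a single heuristic signature (a parenthesized group that contains + or * and is itself followed by +, *, or an open-ended {n,} brace), which happens to cover the five shapes listed above. The helper names `looksLikeReDoS`, `compileSafely`, and `matchesOperator` are hypothetical.

```javascript
// Heuristic signature: a group containing `+` or `*`, followed by another
// quantifier (`+`, `*`, or `{n,}` / `{n,m}`). Deliberately conservative.
const NESTED_QUANTIFIER = /\((?:\?:)?[^()]*[+*][^()]*\)(?:[+*]|\{\d+,\d*\})/;

function looksLikeReDoS(pattern) {
  return NESTED_QUANTIFIER.test(pattern);
}

// Returns a RegExp, or null if the pattern is rejected or invalid.
function compileSafely(pattern) {
  if (looksLikeReDoS(pattern)) return null;
  try {
    return new RegExp(pattern);
  } catch {
    return null; // a syntactically invalid pattern is treated as "no match"
  }
}

// A rejected pattern never executes; the predicate simply does not match.
function matchesOperator(value, pattern) {
  const regex = compileSafely(pattern);
  return regex !== null && regex.test(value);
}

console.log(looksLikeReDoS("(a+)+"));     // true
console.log(looksLikeReDoS("(a|b+)+"));   // true
console.log(looksLikeReDoS("(a+){2,}"));  // true
console.log(matchesOperator("/home/user/.env", "\\.(env|pem|key|credentials)$")); // true
console.log(matchesOperator("aaaaaaaaaaaaaaaaaaaaaaaa!", "(a+)+$")); // false (rejected)
```

Note that the check also flags harmless patterns such as `(a\+)+`, where the inner plus is escaped. That is the same kind of over-rejection the post describes.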



Time-Based Rules

Some policies need to vary by time. A team might want to allow broader agent access during business hours when humans are available to respond to issues, and restrict access during off-hours. Or a temporary exception might be needed for a specific project that should auto-expire.

SafeClaw's rules support two time constraints:

Schedule. The schedule field accepts hoursUtc (a two-element array defining a UTC hour range) and daysOfWeek (an array of day numbers, 0=Sunday through 6=Saturday). A rule with "hoursUtc": [9, 17], "daysOfWeek": [1,2,3,4,5] only matches during weekday business hours UTC. The hour range supports wrapping past midnight.

Expiry. The expiresAt field accepts an ISO 8601 timestamp. After that timestamp, the rule is skipped during evaluation. This is useful for temporary exceptions: "allow this agent to access the production database until Friday at 5pm."

The filterActiveRules function evaluates both constraints before the rule is tested against the action. Expired and out-of-schedule rules are excluded from the active rule set, not from the policy file. The rule still exists in the policy for auditability; it just does not participate in evaluation.
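
A sketch of how this filtering step could work is shown below. It is an illustration under stated assumptions rather than the real filterActiveRules: the hour range is treated as start-inclusive and end-exclusive, the day check uses the current UTC day even when the hour range wraps past midnight, and an unparsable expiresAt is treated as already expired. The helper names are hypothetical.

```javascript
// Assumption: [start, end) in UTC hours. When start >= end the range wraps
// past midnight, e.g. [22, 6] covers 22:00-23:59 and 00:00-05:59.
function inHourRange(hour, [start, end]) {
  return start < end
    ? hour >= start && hour < end
    : hour >= start || hour < end;
}

function isRuleActive(rule, now = new Date()) {
  if (rule.expiresAt !== undefined) {
    const expiry = Date.parse(rule.expiresAt);
    // Assumption: a malformed timestamp fails closed (rule is skipped).
    if (Number.isNaN(expiry) || now.getTime() >= expiry) return false;
  }
  const schedule = rule.schedule;
  if (!schedule) return true;
  if (schedule.daysOfWeek && !schedule.daysOfWeek.includes(now.getUTCDay())) {
    return false;
  }
  if (schedule.hoursUtc && !inHourRange(now.getUTCHours(), schedule.hoursUtc)) {
    return false;
  }
  return true;
}

// Returns a new array; the policy's own rule list is left untouched.
function filterActiveRulesSketch(rules, now = new Date()) {
  return rules.filter((rule) => isRuleActive(rule, now));
}

const sampleRules = [
  { id: "business-hours", schedule: { hoursUtc: [9, 17], daysOfWeek: [1, 2, 3, 4, 5] } },
  { id: "overnight", schedule: { hoursUtc: [22, 6] } },
  { id: "temporary", expiresAt: "2026-02-13T17:00:00Z" },
  { id: "always" },
];

// Friday 2026-02-13 at 10:00 UTC
const active = filterActiveRulesSketch(sampleRules, new Date("2026-02-13T10:00:00Z"));
console.log(active.map((r) => r.id)); // [ 'business-hours', 'temporary', 'always' ]
```

Because the filter returns a new array, expired and out-of-schedule rules remain in the policy object itself, which matches the auditability point above.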



Versioning and Rollback

Every time a policy is saved, SafeClaw auto-versions it. The current policy file is backed up with a version suffix (e.g., policy.json.v3), and the new policy gets the next version number. You can:



List all versions with their timestamps and rule counts
Inspect any previous version
Roll back to a previous version (which creates a new version, preserving the rollback in the history)

This means policy changes are never destructive. If a rule change causes unexpected denials, you can revert in seconds. The version history also serves as an audit trail of policy evolution over time.
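
The rollback semantics are easiest to see in a small in-memory model. The sketch below is not how SafeClaw stores versions (the post describes suffixed backup files on disk); the `PolicyHistory` class and its method names are hypothetical and exist only to show that a rollback appends a new version instead of rewriting history.

```javascript
// In-memory model of an append-only policy history (illustrative only).
class PolicyHistory {
  constructor() {
    this.versions = [];
  }

  get current() {
    return this.versions[this.versions.length - 1] ?? null;
  }

  // Every save appends a new version; nothing is overwritten.
  save(policy, savedAt = new Date()) {
    const entry = {
      version: this.versions.length + 1,
      savedAt: savedAt.toISOString(),
      // Policies are plain JSON, so a JSON round-trip is a sufficient deep copy.
      policy: JSON.parse(JSON.stringify(policy)),
    };
    this.versions.push(entry);
    return entry.version;
  }

  // Version numbers with timestamps and rule counts.
  list() {
    return this.versions.map(({ version, savedAt, policy }) => ({
      version,
      savedAt,
      ruleCount: policy.rules.length,
    }));
  }

  // Rolling back re-saves the old policy as the newest version.
  rollback(version, savedAt = new Date()) {
    const target = this.versions.find((v) => v.version === version);
    if (!target) throw new Error(`Unknown policy version: ${version}`);
    return this.save(target.policy, savedAt);
  }
}

const history = new PolicyHistory();
history.save({ defaultEffect: "deny", rules: [{ id: "a" }] });
history.save({ defaultEffect: "deny", rules: [{ id: "a" }, { id: "b" }] });
const restored = history.rollback(1);
console.log(restored);                             // 3
console.log(history.list().map((v) => v.ruleCount)); // [ 1, 2, 1 ]
```

Version 2 is still in the list after the rollback, so the mistaken change and its reversal are both visible in the audit trail.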

Simulation Mode

Before applying a policy change, you can simulate how it would handle specific actions. The simulatePolicy function takes a policy object, an action type, and a resource, and returns which rule would match and what the effect would be -- without executing the action or touching the live system.



From the CLI:

```bash
safeclaw run --dry-run "deploy the application"
```



This shows the task configuration, simulates the policy against likely action types, and reports what the policy would decide -- all without starting the agent.

From the dashboard, the Policy Editor tab includes a simulation panel where you can test arbitrary actions against your current rules.

The Default Policy

SafeClaw ships with a default policy that we believe is the right starting point for most teams. It has four rules:

Allow safe reads (safe.read.*)

Require approval for file operations (filesystem.*, code.*)

Require approval for network requests (network.*)


Require approval for secrets, payments, and MCP tools

The default effect is deny. This means an agent using an unclassified tool or triggering an unknown action type is blocked by default. Teams then add rules to permit the specific actions their workflows require.

For detailed policy syntax and examples, see the policy rule syntax documentation.