How We Designed SafeClaw's Policy Engine
The policy engine is where SafeClaw's safety guarantees are defined. It takes an action -- a type like filesystem.write and a resource like /home/user/.env -- and returns a decision: allow, deny, or require_approval. This post covers how we designed it, the trade-offs we made, and the problems we solved along the way.
First-Match-Wins
SafeClaw evaluates policy rules in order. The first rule whose condition matches the action determines the effect. If no rule matches, the defaultEffect applies -- and that is deny.
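A minimal sketch of the evaluation loop shows why rule order is part of a policy's meaning. It is written in JavaScript but is not SafeClaw's code: the `evaluate` function, the rule ids, and the `test` callback (standing in for the declarative condition language) are all hypothetical.

```javascript
// Minimal sketch of first-match-wins evaluation.
// Assumption: each rule carries a test(action) function that stands in for
// SafeClaw's declarative condition language; rule ids are hypothetical.
function evaluate(policy, action) {
  for (const rule of policy.rules) {
    // Rules are checked in order; the first match decides and the rest are skipped.
    if (rule.test(action)) {
      return { effect: rule.effect, ruleId: rule.id };
    }
  }
  // Nothing matched, so the policy-wide default (deny) applies.
  return { effect: policy.defaultEffect, ruleId: null };
}

const policy = {
  defaultEffect: "deny",
  rules: [
    { id: "deny-env-files", effect: "deny", test: (a) => a.resource.endsWith(".env") },
    { id: "allow-filesystem", effect: "allow", test: (a) => a.type.startsWith("filesystem.") },
  ],
};

console.log(evaluate(policy, { type: "filesystem.write", resource: "/home/user/.env" }));
// { effect: 'deny', ruleId: 'deny-env-files' }
console.log(evaluate(policy, { type: "filesystem.read", resource: "/tmp/notes.txt" }));
// { effect: 'allow', ruleId: 'allow-filesystem' }
console.log(evaluate(policy, { type: "network.fetch", resource: "https://example.com" }));
// { effect: 'deny', ruleId: null }
```

Both rules match the `.env` write; it is denied only because `deny-env-files` comes first. Swapping the two rules would let the write through.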
We chose first-match-wins over other evaluation models (most-specific-wins, priority-based, accumulative) for three reasons:
Condition Language
Each rule has a condition that defines when it matches. Conditions support two boolean combinators:
- `any`: matches if any sub-predicate is true (logical OR)
- `all`: matches if every sub-predicate is true (logical AND)
Each predicate operates on a field (either action.type or action.resource) with an operator and a value:
| Operator | Behavior |
|----------|----------|
| eq | Exact string match |
| startsWith | Prefix match |
| contains | Substring match |
| matches | Regular expression match (with ReDoS protection) |
| in | Value is one of a list |
This gives teams enough expressiveness to write rules like "require approval for any filesystem action on paths containing .env" or "allow code execution only for commands starting with npm test" without needing a full programming language.
Here is an example rule that blocks write access to credential files:
```json
{
  "id": "deny-credential-writes",
  "effect": "deny",
  "description": "Block writes to credential files",
  "condition": {
    "all": [
      { "field": "action.type", "operator": "startsWith", "value": "filesystem.write" },
      { "field": "action.resource", "operator": "matches", "value": "\\.(env|pem|key|credentials)$" }
    ]
  }
}
```
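To make the operator table and the `any`/`all` combinators concrete, here is a sketch of how a condition might be evaluated against an action. It assumes a condition is either `{ any: [...] }`, `{ all: [...] }`, or a single predicate, and that combinators may nest; the function names are hypothetical, and the ReDoS screening SafeClaw applies before compiling `matches` patterns is deliberately left out.

```javascript
// Sketch of condition evaluation for the documented operators and combinators.
// Assumptions: conditions are { any: [...] }, { all: [...] } or a single
// predicate { field, operator, value }; combinators may nest; ReDoS screening
// of `matches` patterns is omitted here.
function readField(action, field) {
  if (field === "action.type") return action.type;
  if (field === "action.resource") return action.resource;
  return undefined; // unknown fields never match
}

function testPredicate(pred, action) {
  const actual = readField(action, pred.field);
  if (typeof actual !== "string") return false;
  switch (pred.operator) {
    case "eq":
      return actual === pred.value;
    case "startsWith":
      return actual.startsWith(pred.value);
    case "contains":
      return actual.includes(pred.value);
    case "matches":
      try {
        return new RegExp(pred.value).test(actual);
      } catch {
        return false; // an invalid pattern is treated as "no match"
      }
    case "in":
      return Array.isArray(pred.value) && pred.value.includes(actual);
    default:
      return false; // unknown operators never match
  }
}

function matchesCondition(cond, action) {
  if (Array.isArray(cond.any)) return cond.any.some((c) => matchesCondition(c, action));
  if (Array.isArray(cond.all)) return cond.all.every((c) => matchesCondition(c, action));
  return testPredicate(cond, action);
}

// Modeled on the credential-file rule above; only write actions are targeted.
const denyCredentialWrites = {
  id: "deny-credential-writes",
  effect: "deny",
  condition: {
    all: [
      { field: "action.type", operator: "startsWith", value: "filesystem.write" },
      { field: "action.resource", operator: "matches", value: "\\.(env|pem|key|credentials)$" },
    ],
  },
};

const cond = denyCredentialWrites.condition;
console.log(matchesCondition(cond, { type: "filesystem.write", resource: "/home/user/.env" })); // true
console.log(matchesCondition(cond, { type: "filesystem.read", resource: "/home/user/.env" })); // false
console.log(matchesCondition(cond, { type: "filesystem.write", resource: "/home/user/notes.txt" })); // false
```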
ReDoS Protection
The `matches` operator compiles user-supplied strings into regular expressions. This is a known attack surface: a carefully crafted pattern with nested quantifiers can cause catastrophic backtracking, consuming CPU for minutes or hours on certain inputs.
We solved this with static analysis in `src/validate.js`. Before any user-supplied pattern is compiled into a `RegExp`, the `safeRegex` function checks it against known ReDoS patterns:
- `(a+)+`: nested quantifiers on the same group
- `(a*)+`: a star inside a repeated group
- `(?:a+)+`: the non-capturing group variant
- `(a|b+)+`: alternation with a quantifier inside a repeated group
- Quantifier-brace combinations like `(a+){2,}`
If the pattern matches any of these signatures, compilation is rejected and the rule evaluation returns `false` (no match), so the regex never executes. This is a conservative approach: it might reject some safe patterns that look structurally similar to ReDoS patterns. We prefer false negatives (a rule that does not match when it could) over allowing a regex that can hang the policy engine.
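A static screen of this kind can be sketched with a single detector regex. This is not the code in `src/validate.js`; the heuristic and the names `looksLikeReDoS`, `compileIfSafe`, and `matchesSafely` are hypothetical. It flags any flat group whose body contains `+` or `*` and which is itself followed by `+`, `*`, or a brace quantifier, which covers the five signatures listed above.

```javascript
// Sketch of screening user-supplied patterns before compiling them.
// NOT SafeClaw's src/validate.js; the heuristic and names are hypothetical.
//
// Flags a flat group whose body contains + or * and which is followed by
// +, * or a {n}, {n,} or {n,m} quantifier. Nested groups such as ((a+))+
// are not inspected, and escaped characters like \+ are flagged anyway
// (a conservative false positive).
const NESTED_QUANTIFIER = /\([^()+*]*[+*][^()]*\)(?:[+*]|\{\d+(?:,\d*)?\})/;

function looksLikeReDoS(pattern) {
  return NESTED_QUANTIFIER.test(pattern);
}

function compileIfSafe(pattern) {
  if (looksLikeReDoS(pattern)) return null; // rejected before a RegExp is built
  try {
    return new RegExp(pattern);
  } catch {
    return null; // syntactically invalid patterns are rejected too
  }
}

// A rejected pattern evaluates to "no match"; the regex never runs.
function matchesSafely(pattern, input) {
  const re = compileIfSafe(pattern);
  return re !== null && re.test(input);
}

for (const p of ["(a+)+", "(a*)+", "(?:a+)+", "(a|b+)+", "(a+){2,}", "\\.(env|pem|key)$"]) {
  console.log(p, looksLikeReDoS(p) ? "rejected" : "allowed");
}
```

The detector's first character class excludes `+` and `*`, so the screening regex itself does not backtrack ambiguously on long inputs.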
Time-Based Rules
Some policies need to vary by time. A team might want to allow broader agent access during business hours when humans are available to respond to issues, and restrict access during off-hours. Or a temporary exception might be needed for a specific project that should auto-expire.
SafeClaw's rules support two time constraints:
Schedule. The schedule field accepts hoursUtc (a two-element array defining a UTC hour range) and daysOfWeek (an array of day numbers, 0=Sunday through 6=Saturday). A rule with "hoursUtc": [9, 17], "daysOfWeek": [1,2,3,4,5] only matches during weekday business hours UTC. The hour range supports wrapping past midnight.
Expiry. The expiresAt field accepts an ISO 8601 timestamp. After that timestamp, the rule is skipped during evaluation. This is useful for temporary exceptions: "allow this agent to access the production database until Friday at 5pm."
The `filterActiveRules` function evaluates both constraints before the rule is tested against the action. Expired and out-of-schedule rules are excluded from the active rule set, not from the policy file. The rule still exists in the policy for auditability; it just does not participate in evaluation.
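The schedule and expiry checks can be sketched as a pre-filter that runs before first-match evaluation. This only loosely mirrors what the post says `filterActiveRules` does; the helper names are hypothetical, and so are three semantic choices: the hour range is start-inclusive and end-exclusive, a range whose start exceeds its end wraps past midnight, and `daysOfWeek` is compared against the UTC day.

```javascript
// Sketch of time-based rule filtering. Assumptions: hoursUtc is
// start-inclusive/end-exclusive, daysOfWeek uses the UTC day (0 = Sunday),
// and the helper names are hypothetical.
function inHourRange(hour, [start, end]) {
  if (start <= end) return hour >= start && hour < end;
  // Wrapped range such as [22, 6]: 22:00-23:59 or 00:00-05:59 UTC.
  return hour >= start || hour < end;
}

function isRuleActive(rule, now) {
  // Expired rules stay in the policy file but are skipped here.
  if (rule.expiresAt && now.getTime() >= Date.parse(rule.expiresAt)) return false;
  const schedule = rule.schedule;
  if (schedule) {
    if (schedule.hoursUtc && !inHourRange(now.getUTCHours(), schedule.hoursUtc)) return false;
    if (schedule.daysOfWeek && !schedule.daysOfWeek.includes(now.getUTCDay())) return false;
  }
  return true;
}

function activeRulesAt(rules, now) {
  return rules.filter((rule) => isRuleActive(rule, now));
}

const rules = [
  { id: "business-hours", schedule: { hoursUtc: [9, 17], daysOfWeek: [1, 2, 3, 4, 5] } },
  { id: "overnight", schedule: { hoursUtc: [22, 6] } },
  { id: "temp-prod-db", expiresAt: "2024-01-05T17:00:00Z" },
];

const ids = (now) => activeRulesAt(rules, now).map((r) => r.id);
console.log(ids(new Date("2024-01-03T10:00:00Z"))); // Wednesday: [ 'business-hours', 'temp-prod-db' ]
console.log(ids(new Date("2024-01-06T23:00:00Z"))); // Saturday, after expiry: [ 'overnight' ]
```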
Versioning and Rollback
Every time a policy is saved, SafeClaw auto-versions it. The current policy file is backed up with a version suffix (e.g., `policy.json.v3`), and the new policy gets the next version number. You can:
- List all versions with their timestamps and rule counts
- Inspect any previous version
- Rollback to a previous version (which creates a new version, preserving the rollback in the history)
This means policy changes are never destructive. If a rule change causes unexpected denials, you can revert in seconds. The version history also serves as an audit trail of policy evolution over time.
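The key property, that rollback appends rather than overwrites, fits in a small in-memory model. SafeClaw keeps its backups on disk as suffixed files; the `PolicyHistory` class below is a hypothetical sketch of only the numbering and rollback semantics, not of SafeClaw's storage.

```javascript
// In-memory sketch of auto-versioning with non-destructive rollback.
// Hypothetical: SafeClaw writes backups such as policy.json.v3 to disk;
// this class only models the version numbering and rollback behavior.
class PolicyHistory {
  constructor() {
    this.versions = [];
  }

  // Every save appends a new version. Policies are plain JSON, so a JSON
  // round-trip gives an independent copy the caller cannot mutate later.
  save(policy) {
    const version = this.versions.length + 1;
    this.versions.push({
      version,
      savedAt: new Date().toISOString(),
      policy: JSON.parse(JSON.stringify(policy)),
    });
    return version;
  }

  list() {
    return this.versions.map(({ version, savedAt, policy }) => ({
      version,
      savedAt,
      ruleCount: policy.rules.length,
    }));
  }

  get(version) {
    const entry = this.versions.find((v) => v.version === version);
    return entry ? JSON.parse(JSON.stringify(entry.policy)) : undefined;
  }

  current() {
    return this.get(this.versions.length);
  }

  // Rolling back re-saves an old policy as the newest version, so the
  // rollback itself is recorded and no earlier version is lost.
  rollback(version) {
    const old = this.get(version);
    if (old === undefined) throw new Error(`unknown version ${version}`);
    return this.save(old);
  }
}

const history = new PolicyHistory();
history.save({ defaultEffect: "deny", rules: [] });
history.save({ defaultEffect: "deny", rules: [{ id: "allow-all", effect: "allow" }] });
const restored = history.rollback(1); // creates version 3 with the v1 contents
console.log(restored, history.list().map((v) => v.ruleCount)); // 3 [ 0, 1, 0 ]
```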
Simulation Mode
Before applying a policy change, you can simulate how it would handle specific actions. The `simulatePolicy` function takes a policy object, an action type, and a resource, and returns which rule would match and what the effect would be, without executing the action or touching the live system.
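One way to use simulation is to diff a proposed policy against the current one over a batch of sample actions before saving it. In the sketch below, `simulate` is a hypothetical stand-in that follows the documented contract of `simulatePolicy` (policy, action type, resource in; matching rule and effect out) but only understands `all` conditions built from `startsWith` predicates; `diffDecisions` and the sample rules are likewise illustrative.

```javascript
// Sketch of "simulate before you save". simulate() is a hypothetical
// stand-in for simulatePolicy that only supports { all: [...] } conditions
// made of startsWith predicates.
function simulate(policy, actionType, resource) {
  const fields = { "action.type": actionType, "action.resource": resource };
  for (const rule of policy.rules) {
    const hit = rule.condition.all.every((p) => {
      const value = fields[p.field];
      return typeof value === "string" && value.startsWith(p.value);
    });
    if (hit) return { ruleId: rule.id, effect: rule.effect }; // first match wins
  }
  return { ruleId: null, effect: policy.defaultEffect };
}

// Report every sample action whose decision would change.
function diffDecisions(current, proposed, samples) {
  const changes = [];
  for (const [actionType, resource] of samples) {
    const before = simulate(current, actionType, resource).effect;
    const after = simulate(proposed, actionType, resource).effect;
    if (before !== after) changes.push({ actionType, resource, before, after });
  }
  return changes;
}

const prefix = (field, value) => ({ field, operator: "startsWith", value });

const current = {
  defaultEffect: "deny",
  rules: [{ id: "approve-fs", effect: "require_approval", condition: { all: [prefix("action.type", "filesystem.")] } }],
};

// Proposed change: let the agent write under /tmp/ without approval.
const proposed = {
  ...current,
  rules: [
    {
      id: "allow-tmp-writes",
      effect: "allow",
      condition: { all: [prefix("action.type", "filesystem.write"), prefix("action.resource", "/tmp/")] },
    },
    ...current.rules,
  ],
};

const samples = [
  ["filesystem.write", "/tmp/build.log"],
  ["filesystem.write", "/home/user/.env"],
  ["network.fetch", "https://example.com"],
];
console.log(diffDecisions(current, proposed, samples));
```

Only the `/tmp/build.log` write changes (from `require_approval` to `allow`); the `.env` write and the network request keep their current decisions.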
From the CLI:
```bash
safeclaw run --dry-run "deploy the application"
```
This shows the task configuration, simulates the policy against likely action types, and reports what the policy would decide -- all without starting the agent.
From the dashboard, the Policy Editor tab includes a simulation panel where you can test arbitrary actions against your current rules.
The Default Policy
SafeClaw ships with a default policy that we believe is the right starting point for most teams. It has four rules:
- Allow safe reads (`safe.read.*`)
- Require approval for file operations and code execution (`filesystem.*`, `code.*`)
- Require approval for network requests (`network.*`)
- Require approval for secrets, payments, and MCP tools
The default effect is `deny`. This means an agent using an unclassified tool or triggering an unknown action type is blocked by default. Teams then add rules to permit the specific actions their workflows require.
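For orientation, the four default rules can be approximated in the same rule schema as the credential-file example. This is a hedged reconstruction, not the shipped file: the rule ids are invented, and the action-type prefixes used for secrets, payments, and MCP tools (`secrets.`, `payments.`, `mcp.`) are hypothetical placeholders.

```javascript
// Approximation of the default policy in the rule schema shown earlier.
// Rule ids and the secrets/payments/MCP prefixes are hypothetical.
const anyTypePrefix = (...prefixes) => ({
  any: prefixes.map((value) => ({ field: "action.type", operator: "startsWith", value })),
});

const defaultPolicy = {
  defaultEffect: "deny",
  rules: [
    { id: "allow-safe-reads", effect: "allow", condition: anyTypePrefix("safe.read.") },
    { id: "approve-files-and-code", effect: "require_approval", condition: anyTypePrefix("filesystem.", "code.") },
    { id: "approve-network", effect: "require_approval", condition: anyTypePrefix("network.") },
    { id: "approve-sensitive", effect: "require_approval", condition: anyTypePrefix("secrets.", "payments.", "mcp.") },
  ],
};

// First-match lookup that only understands this policy's any/startsWith shape.
function decide(policy, actionType) {
  const rule = policy.rules.find((r) => r.condition.any.some((p) => actionType.startsWith(p.value)));
  return rule ? rule.effect : policy.defaultEffect;
}

console.log(decide(defaultPolicy, "safe.read.file")); // allow
console.log(decide(defaultPolicy, "network.fetch")); // require_approval
console.log(decide(defaultPolicy, "custom.unknownTool")); // deny
```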
For detailed policy syntax and examples, see the policy rule syntax documentation.