SafeClaw Architecture: How Action Gating Works Under the Hood
SafeClaw gates every action an AI agent attempts to take. This post walks through the four core components that make that possible: the gateway, the classifier, the policy engine, and the audit chain. If you want to understand exactly what happens between your agent deciding to act and that action executing, this is the post.
The Request Lifecycle
When an AI agent running through SafeClaw wants to execute a tool call, the following sequence occurs in milliseconds:
```
Agent requests tool call (e.g., Bash({ command: "npm install express" }))
-> Gateway intercepts via PreToolUse hook
-> Classifier maps tool name + input to action type + resource
-> Local pre-filter: safe reads pass immediately (no network call)
-> Policy evaluation: Authensor control plane checks action against policy
-> Decision: allow / deny / require_approval
-> Audit entry appended with SHA-256 hash chain
-> Action executes (or doesn't)
```
Every component in this chain is a pure function or a stateless hook. There is no hidden state, no ambient authority, no "the agent earned trust" escalation path.
Component 1: The Gateway
The gateway (`src/gateway.js`) is a PreToolUse hook that the Claude Agent SDK calls before every tool execution. It is the single enforcement point. If the gateway denies an action, the tool call never runs.
The gateway's responsibilities are narrow by design:
Receive the tool name and input from the SDK
Pass them to the classifier
Check workspace path enforcement (if workspace scoping is enabled)
Short-circuit safe reads via the local pre-filter
Build an action envelope and send it to the Authensor control plane
Handle the three decision outcomes: allow, deny, require_approval
For require_approval: poll the control plane, send notifications (SMS, webhooks, SSE), and wait for human resolution
Append an audit entry for every decision, regardless of outcome
The gateway fails closed. If the Authensor control plane is unreachable and there is no cached allow decision, the action is denied. This is not configurable. We chose fail-closed because the alternative -- fail-open -- means an agent with a network blip gets unrestricted access to your file system.
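To make the fail-closed path concrete, here is a rough sketch of the gateway's decision flow. The helper names (`localPreFilter`, `evaluateRemotely`, `cachedAllow`, `waitForHumanResolution`, `appendAudit`) are illustrative stand-ins, not the actual internals of `src/gateway.js`:

```js
// Sketch of the gateway's decision path. Assumes helpers exist elsewhere;
// this is not the shipped implementation.
async function gate(toolName, toolInput) {
  const action = classify(toolName, toolInput); // pure function, see Component 2

  // Safe reads short-circuit locally and never leave the machine.
  if (localPreFilter(action) === 'allow') {
    appendAudit({ ...action, outcome: 'allow', source: 'pre-filter' });
    return { allow: true };
  }

  let decision;
  try {
    decision = await evaluateRemotely(action); // Authensor control plane
  } catch (err) {
    // Control plane unreachable: allow only if a cached allow exists,
    // otherwise deny. This is the fail-closed path.
    decision = cachedAllow(action)
      ? { effect: 'allow', source: 'cache' }
      : { effect: 'deny', source: 'fail-closed' };
  }

  if (decision.effect === 'require_approval') {
    // Notify humans (SMS, webhooks, SSE) and poll until resolved.
    decision = await waitForHumanResolution(action);
  }

  appendAudit({ ...action, outcome: decision.effect, source: decision.source });
  return { allow: decision.effect === 'allow' };
}
```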
Component 2: The Classifier
The classifier (`src/classifier.js`) is a pure function that maps SDK tool names and their inputs to Authensor action types. For example:
| SDK Tool | Action Type | Resource |
|----------|-------------|----------|
| `Bash` | `code.exec` | The command string (sanitized) |
| `Write` | `filesystem.write` | The file path |
| `WebFetch` | `network.http` | The URL |
| `mcp__github__create_issue` | `mcp.github.create_issue` | JSON summary of input |
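In code, that mapping is roughly a chain of pattern checks. This is a simplified sketch, not the shipped classifier; the real `src/classifier.js` covers more tools and sanitizes every resource string before it goes anywhere:

```js
// Illustrative classifier: tool name + input -> { actionType, resource }.
function classify(toolName, toolInput) {
  if (toolName === 'Bash') {
    // The real classifier sanitizes and truncates the command string.
    return { actionType: 'code.exec', resource: toolInput.command };
  }
  if (toolName === 'Write') {
    return { actionType: 'filesystem.write', resource: toolInput.file_path };
  }
  if (toolName === 'WebFetch') {
    return { actionType: 'network.http', resource: toolInput.url };
  }
  // MCP tools follow the mcp__<server>__<tool> naming convention.
  if (toolName.startsWith('mcp__')) {
    const [, server, tool] = toolName.split('__');
    return {
      actionType: `mcp.${server}.${tool}`,
      resource: JSON.stringify(toolInput).slice(0, 200), // JSON summary of input
    };
  }
  return { actionType: 'unknown', resource: toolName };
}
```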
The classifier also runs risk signal detection. This is a set of pattern matchers that tag suspicious commands with advisory signals like `obfuscated_execution` (base64-decode piped to bash), `credential_adjacent` (touching .ssh/id_rsa), `broad_destructive` (rm -rf against system paths), `pipe_to_external` (piping data to curl/nc), and `persistence_mechanism` (modifying crontab or shell rc files).
Risk signals never change policy decisions. They are metadata for the human reviewer. When you see an approval request in the dashboard tagged with `credential_adjacent`, you know to look more carefully.
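Here is a sketch of how that advisory tagging can work. The patterns are deliberately simplified examples, not SafeClaw's actual matchers:

```js
// Each matcher only adds a tag; none of them block anything.
const RISK_PATTERNS = [
  { signal: 'obfuscated_execution',  pattern: /base64\s+(-d|--decode)[^|]*\|\s*(ba|z)?sh/ },
  { signal: 'credential_adjacent',   pattern: /\.ssh\/id_[a-z0-9]+|\.aws\/credentials/ },
  { signal: 'broad_destructive',     pattern: /rm\s+-rf\s+\/(etc|usr|var|bin)?(\s|$)/ },
  { signal: 'pipe_to_external',      pattern: /\|\s*(curl|nc)\b/ },
  { signal: 'persistence_mechanism', pattern: /crontab|\.bashrc|\.zshrc|\.profile/ },
];

function detectRiskSignals(command) {
  return RISK_PATTERNS
    .filter(({ pattern }) => pattern.test(command))
    .map(({ signal }) => signal);
}
```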
Critically, the classifier sanitizes everything before it leaves your machine. API keys, tokens, and credentials are redacted using pattern matching against known secret formats (Anthropic keys, OpenAI keys, GitHub PATs, Bearer tokens, and generic `KEY=value` pairs). Only action metadata -- the action type and a truncated, sanitized resource string -- is sent to the control plane.
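A sketch of what that redaction looks like, assuming straightforward pattern replacement (the patterns below are representative examples, not the full list SafeClaw ships):

```js
// Replace known secret formats before any metadata leaves the machine.
const SECRET_PATTERNS = [
  /sk-ant-[A-Za-z0-9_-]{20,}/g,                      // Anthropic API keys
  /sk-[A-Za-z0-9]{20,}/g,                            // OpenAI-style keys
  /gh[pousr]_[A-Za-z0-9]{36,}/g,                     // GitHub personal access tokens
  /Bearer\s+[A-Za-z0-9._~+/-]+=*/g,                  // Bearer tokens
  /\b[A-Za-z0-9_]*(KEY|TOKEN|SECRET|PASSWORD)[A-Za-z0-9_]*\s*=\s*\S+/gi, // KEY=value
];

function sanitize(text, maxLength = 200) {
  let out = String(text);
  for (const pattern of SECRET_PATTERNS) {
    out = out.replace(pattern, '[REDACTED]');
  }
  return out.slice(0, maxLength); // only a truncated, sanitized string goes upstream
}
```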
Component 3: The Policy Engine
The policy engine (`src/policy.js`) handles rule evaluation, versioning, time-based rules, and simulation. For full documentation, see our policy engine architecture page.
Policy evaluation follows first-match-wins semantics. Rules are evaluated in order. The first rule whose condition matches the action determines the effect. If no rule matches, the `defaultEffect` applies -- and that default is deny.
Conditions support boolean combinators (`any`, `all`) and five operators: `eq`, `startsWith`, `contains`, `matches` (regex), and `in`. Regex patterns are validated for ReDoS safety before execution using static analysis that rejects nested quantifiers.
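Condition matching plus first-match-wins is small enough to sketch end to end. The function names and rule shape below are illustrative, not the documented schema (and the real engine rejects unsafe regexes before they ever reach `new RegExp`):

```js
// Evaluate a condition tree: `any` / `all` combinators over five operators.
function matches(cond, action) {
  if (cond.all) return cond.all.every((c) => matches(c, action));
  if (cond.any) return cond.any.some((c) => matches(c, action));
  const value = String(action[cond.field] ?? '');
  switch (cond.op) {
    case 'eq':         return value === cond.value;
    case 'startsWith': return value.startsWith(cond.value);
    case 'contains':   return value.includes(cond.value);
    case 'matches':    return new RegExp(cond.value).test(value);
    case 'in':         return cond.value.includes(value);
    default:           return false;
  }
}

// First-match-wins: rules are checked in order, and the default is deny.
function evaluatePolicy(policy, action) {
  for (const rule of policy.rules) {
    if (matches(rule.condition, action)) {
      return rule.effect; // 'allow' | 'deny' | 'require_approval'
    }
  }
  return policy.defaultEffect ?? 'deny';
}
```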
Rules can have time constraints. The `schedule` field supports UTC hour ranges and day-of-week filters. The `expiresAt` field auto-disables rules after a timestamp. This lets teams create temporary exceptions ("allow network access during business hours this week") without forgetting to revoke them.
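Put together, a temporary rule might look roughly like this. The exact field names are a hypothetical shape based on the behavior described above, not a documented schema:

```js
// Hypothetical rule: allow HTTP fetches to docs sites during UTC business
// hours on weekdays, and stop honoring the rule after a fixed expiry.
const rule = {
  effect: 'allow',
  condition: {
    all: [
      { field: 'actionType', op: 'eq', value: 'network.http' },
      { any: [
        { field: 'resource', op: 'startsWith', value: 'https://docs.' },
        { field: 'resource', op: 'contains', value: '.readthedocs.io/' },
      ] },
    ],
  },
  schedule: { utcHours: [9, 17], days: ['mon', 'tue', 'wed', 'thu', 'fri'] },
  expiresAt: '2025-06-20T17:00:00Z',
};
```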
Every policy save is auto-versioned with a backup. You can list versions, inspect any previous version, and rollback with a single command.
Component 4: The Audit Chain
The audit ledger (`src/audit.js`) is an append-only JSONL file where every gateway decision is recorded. Each entry includes the timestamp, tool name, action type, resource, outcome, receipt ID, task ID, profile name, decision source, and risk signals.
What makes it tamper-proof is the hash chain. Every entry includes a `prevHash` field containing the SHA-256 hash of the previous line's raw JSON. The first entry chains from a GENESIS sentinel. To verify the chain, `safeclaw audit verify` re-hashes every line and confirms each `prevHash` matches. If a single entry is modified, inserted, or deleted, the chain breaks.
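The verification pass is simple enough to sketch with Node's built-in crypto module. This mirrors what `safeclaw audit verify` does conceptually; treat the code (including the assumption that the first entry's `prevHash` is the literal string `GENESIS`) as a sketch, not the shipped implementation:

```js
import crypto from 'node:crypto';
import fs from 'node:fs';

// Re-hash every JSONL line and confirm each entry's prevHash points at the
// previous line's raw JSON. Any edit, insertion, or deletion breaks the chain.
function verifyAuditChain(ledgerPath) {
  const lines = fs.readFileSync(ledgerPath, 'utf8').split('\n').filter(Boolean);
  let expectedPrevHash = 'GENESIS'; // sentinel for the first entry (assumed literal)
  for (const [index, line] of lines.entries()) {
    const entry = JSON.parse(line);
    if (entry.prevHash !== expectedPrevHash) {
      return { ok: false, brokenAt: index };
    }
    expectedPrevHash = crypto.createHash('sha256').update(line).digest('hex');
  }
  return { ok: true, entries: lines.length };
}
```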
This design means you get cryptographic proof of your agent's complete action history. For compliance scenarios -- SOC 2, HIPAA, EU AI Act -- you can export this ledger as evidence that every agent action was evaluated and recorded.
How It All Fits Together
The architecture is deliberately minimal. The gateway is a hook. The classifier is a pure function. The policy engine is stateless rule evaluation. The audit chain is append-only writes. There is no ORM, no database, no message queue, no microservice mesh. It is 25 source files, zero third-party runtime dependencies, and 446 tests.
We built it this way because security tooling that is too complex to audit is not security tooling. Every line of SafeClaw's client is open source and readable in an afternoon.
```bash
npx @authensor/safeclaw
```
That is the entire install. The architecture does the rest.