Why We Built SafeClaw: The Problem Nobody Was Solving
In late January 2026, we sat down and listed every tool an AI agent could call on a developer's machine. The list was alarming: file writes, file deletions, arbitrary shell commands, network requests, package installations, git operations, SSH access, cron job creation. All of it ungated. All of it executing the instant the model decided to act.
The industry had invested enormously in prompt safety, output filtering, and content moderation. But nobody had built the obvious thing: a gate between the agent's intent and its execution. A checkpoint that asks "should this actually happen?" before `rm -rf` runs against your project root.
That gap is why Authensor exists, and it is why we built SafeClaw.
The Moment It Clicked
We were running an early Claude agent on a codebase cleanup task. The agent decided to "organize" a directory by moving files into subdirectories. Reasonable intent. But it moved configuration files that broke the build, deleted a .env file it classified as "temporary," and ran npm install with a package name it hallucinated. Three destructive actions, executed in seconds, with zero human confirmation.
Nothing malicious happened. The model was trying to help. But "trying to help" with unrestricted file system and shell access is the core problem. The risk is not adversarial AI. The risk is competent AI making confident mistakes at machine speed.
What Existed Before
We looked at every option:
- Docker/sandboxing isolates the environment but does not distinguish between safe and dangerous actions inside the container. The agent can still `rm -rf /workspace` from inside the sandbox.
- Prompt engineering ("never delete files") is a suggestion, not enforcement. Models do not reliably follow system prompts under all conditions.
- IAM/RBAC systems were built for human users authenticating to cloud services, not for intercepting tool calls from an SDK running on localhost.
- Guardrails libraries focus on LLM input/output filtering, not on the actions an agent takes between those boundaries.
None of these solve the fundamental problem: every tool call an AI agent makes should be classified, evaluated against a policy, and either allowed, denied, or held for human approval before it executes. That is action-level gating, and it did not exist.
Our Design Principles
We established three rules before writing a single line of code:
1. Deny by default. If a policy does not explicitly allow an action, it does not happen. This is the only safe default when the principal is a non-deterministic language model. We wrote about this in depth in our deny-by-default philosophy.
2. Gate at the action layer, not the prompt layer. We intercept the actual tool call -- `Bash({ command: "rm -rf /tmp" })` -- not the text the model generates. This means gating works regardless of prompt injection, jailbreaks, or model behavior changes. The action is the truth; the prompt is a suggestion.
3. Zero trust for agent principals. The agent is not a trusted user. It is an untrusted principal that happens to be very capable. Every action it requests goes through the same evaluation pipeline: classify the action type, extract the resource, check the policy, and enforce the decision. No exceptions, no "trust after N successful actions," no implicit escalation.
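To make that pipeline concrete, here is a minimal sketch of an action-level gate built on these principles. Every name in it (`ToolCall`, `classifyAction`, `askHuman`, and so on) is an illustrative assumption, not SafeClaw's actual API:

```typescript
// Illustrative only: the types and function names here are assumptions,
// not SafeClaw's real interfaces.
type ToolCall = { tool: string; input: Record<string, unknown> };
type ActionType = "filesystem.read" | "filesystem.write" | "code.exec" | "network.request";
type Decision = "allow" | "deny" | "require_approval";
type Policy = Partial<Record<ActionType, Decision>>;

// 1. Classify: map the raw tool call to an action type.
function classifyAction(call: ToolCall): ActionType {
  switch (call.tool) {
    case "Bash": return "code.exec";
    case "Write": return "filesystem.write";
    case "Fetch": return "network.request";
    default: return "filesystem.read";
  }
}

// 2-3. Check the policy and enforce the decision before anything executes.
// Anything the policy does not mention falls back to deny (deny by default).
async function gate(
  call: ToolCall,
  policy: Policy,
  execute: () => Promise<unknown>,
  askHuman: (call: ToolCall) => Promise<boolean>,
): Promise<unknown> {
  const decision = policy[classifyAction(call)] ?? "deny";
  if (decision === "deny") throw new Error(`Denied: ${call.tool}`);
  if (decision === "require_approval" && !(await askHuman(call))) {
    throw new Error(`Not approved: ${call.tool}`);
  }
  return execute(); // only reached on "allow" or an explicit approval
}
```

The point of the sketch is the shape of the decision, not the implementation details: the agent never touches the file system or the shell until the gate has returned.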
What SafeClaw Does
SafeClaw is an open-source client that sits between your AI agent and the actions it wants to take. When the agent calls a tool -- write a file, execute a shell command, make a network request -- SafeClaw's gateway intercepts it, classifies it into an action type like `filesystem.write` or `code.exec`, and checks it against your policy.
The policy has three possible outcomes: `allow` (the action proceeds immediately), `deny` (the action is blocked and the agent is told why), or `require_approval` (the action is held while you review it in the dashboard, on your phone, or via CLI).
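As a rough illustration (SafeClaw's actual policy format may look different), a policy expressing those three outcomes could be written like this and passed to a gate such as the sketch above:

```typescript
// Hypothetical policy mirroring the defaults described later in this post:
// reads pass through, writes / shell / network wait for approval,
// and anything unlisted falls back to deny.
const policy: Policy = {
  "filesystem.read": "allow",
  "filesystem.write": "require_approval",
  "code.exec": "require_approval",
  "network.request": "require_approval",
};
```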
Every decision is logged to a tamper-proof audit ledger with SHA-256 hash chaining. You can verify the integrity of the entire chain with a single command: `safeclaw audit verify`.
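Hash chaining means each entry commits to the hash of the entry before it, so editing any past record changes its hash and breaks every later link. A small sketch of that idea follows; the entry fields and genesis value are assumptions, not SafeClaw's actual ledger format:

```typescript
import { createHash } from "node:crypto";

// Illustrative audit entry shape; SafeClaw's real ledger schema may differ.
interface AuditEntry {
  timestamp: string;
  action: string;    // e.g. "code.exec"
  decision: string;  // "allow" | "deny" | "require_approval"
  prevHash: string;  // hash of the previous entry
  hash: string;      // SHA-256 over this entry's fields plus prevHash
}

function entryHash(e: Omit<AuditEntry, "hash">): string {
  return createHash("sha256")
    .update(`${e.timestamp}|${e.action}|${e.decision}|${e.prevHash}`)
    .digest("hex");
}

// Walk the chain: every entry must hash correctly and point at its predecessor.
function verifyLedger(entries: AuditEntry[]): boolean {
  let prev = "0".repeat(64); // assumed genesis value
  for (const e of entries) {
    if (e.prevHash !== prev || entryHash(e) !== e.hash) return false;
    prev = e.hash;
  }
  return true;
}
```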
You can install it right now:
```bash
npx @authensor/safeclaw
```

A browser wizard walks you through provider setup in under sixty seconds. The default policy requires approval for file writes, shell commands, and network requests. Safe read operations pass through locally without any network call.
Why This Matters Now
AI agents are getting more capable every month. They are writing code, managing infrastructure, handling customer data, and operating across multi-agent systems. The capability curve is steep. The safety tooling curve has been flat.
We believe every team running AI agents needs action-level gating. Not eventually. Now. SafeClaw is our contribution to that effort: fully open source, MIT licensed, zero third-party dependencies, and designed to be the default safety layer for any agent framework.
The problem nobody was solving is now solved. The question is whether teams will adopt gating before or after their first incident.
We would rather it be before.