Authensor

Designing a Rate Limiter for AI Agent Safety

Authensor Team · 2026-02-13

A runaway AI agent can execute hundreds of file operations per second. Without constraints, a single bad prompt can trigger a cascade of deletions, rewrites, or API calls that overwhelm your system before you even notice. Rate limiting is one of the simplest and most effective safety mechanisms in SafeClaw, and we put significant thought into getting it right.

Why Rate Limiting Matters for Agents

Rate limiting is a well-understood concept in web services — prevent abuse by capping request throughput. But AI agents present a different challenge. The "user" is an autonomous program that doesn't slow down when things go wrong. In fact, agents often speed up when encountering errors, retrying operations in tight loops.

We've seen agents attempt to install the same npm package 47 times in a row because each attempt failed due to a network error the agent couldn't detect. Without a rate limiter, those 47 attempts happened in under 3 seconds.

SafeClaw's rate limiter exists to create breathing room — to ensure that there's always time for a human to notice and intervene when an agent goes off the rails.

Our Design: Token Bucket with Action Categories

We chose a token bucket algorithm because it naturally handles burst patterns. AI agents don't execute actions at a constant rate — they tend to work in bursts (write several files quickly, then pause to think). A strict fixed-window limiter would either be too restrictive during bursts or too permissive during sustained activity.

Our implementation adds a twist: separate buckets per action category. File operations, shell commands, network requests, and package installations each have independent rate limits. This means an agent can write files at full speed without consuming its budget for network requests.

The default configuration looks like this:

```
file_operations:  30/minute, burst: 10
shell_commands:   15/minute, burst: 5
network_requests: 10/minute, burst: 3
package_installs:  5/minute, burst: 2
```

These defaults are deliberately conservative. We'd rather have users relax limits for their specific workflow than discover they were too loose after an incident.
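A per-category token bucket can be sketched in a few dozen lines. This is an illustrative model of the design described above, not SafeClaw's actual implementation; the class and method names are placeholders.

```python
import time

class CategoryRateLimiter:
    """Token buckets with independent budgets per action category (sketch)."""

    def __init__(self, limits):
        # limits: {category: (tokens_per_minute, burst_capacity)}
        now = time.monotonic()
        self.buckets = {
            cat: {"rate": per_min / 60.0, "capacity": burst,
                  "tokens": float(burst), "updated": now}
            for cat, (per_min, burst) in limits.items()
        }

    def _refill(self, bucket, now):
        # Tokens accrue continuously at the configured rate, capped at burst.
        elapsed = now - bucket["updated"]
        bucket["tokens"] = min(bucket["capacity"],
                               bucket["tokens"] + elapsed * bucket["rate"])
        bucket["updated"] = now

    def try_acquire(self, category):
        """Return True if the action may proceed, False if rate-limited."""
        bucket = self.buckets[category]
        self._refill(bucket, time.monotonic())
        if bucket["tokens"] >= 1.0:
            bucket["tokens"] -= 1.0
            return True
        return False

# The default configuration above, expressed as limiter arguments:
limiter = CategoryRateLimiter({
    "file_operations": (30, 10),
    "shell_commands": (15, 5),
    "network_requests": (10, 3),
    "package_installs": (5, 2),
})
```

Because each category owns its bucket, exhausting the shell-command budget leaves the file-operation budget untouched.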

Adaptive Rate Limiting

Static rate limits work well for steady-state operations, but what about escalating risk? We added an adaptive component: when SafeClaw's risk signal detection system reports elevated risk scores, the rate limiter automatically tightens.

If the session risk score crosses a warning threshold, rate limits are halved. If it crosses a critical threshold, limits are reduced to one action per category per minute — effectively forcing human approval for every action.
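The scaling rule reduces to a small function. The threshold values below are illustrative placeholders, not SafeClaw's actual configuration:

```python
def effective_limit(base_per_minute, risk_score,
                    warning_threshold=0.5, critical_threshold=0.8):
    """Scale a per-minute rate limit by session risk score (sketch).

    Threshold values are hypothetical; SafeClaw's real thresholds may differ.
    """
    if risk_score >= critical_threshold:
        return 1                              # critical: one action per minute
    if risk_score >= warning_threshold:
        return max(1, base_per_minute // 2)   # warning: limits halved
    return base_per_minute                    # normal operation

# With file_operations defaulting to 30/minute:
effective_limit(30, 0.2)   # normal
effective_limit(30, 0.6)   # warning: halved
effective_limit(30, 0.9)   # critical: one per minute
```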

This adaptive behavior means the rate limiter isn't just preventing volume-based abuse. It's actively responding to the quality of the agent's behavior.

What Happens When the Limit Is Hit

When an agent exceeds its rate limit, SafeClaw queues the action and returns a structured wait signal to the agent framework. Well-behaved agents (Claude Code, Cursor, Copilot) interpret this as a backpressure signal and slow down naturally.

For agents that don't respect backpressure, SafeClaw holds the action in a pending state until the bucket refills. The agent's tool call simply takes longer to return — from the agent's perspective, the operation was just slow.
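The hold time falls out of the bucket's state: with a known refill rate, the delay until one token is available is a simple division. A minimal sketch, with illustrative parameter names:

```python
def wait_time_for_token(tokens, refill_rate_per_sec):
    """Seconds a pending action must wait before one token is available.

    Sketch of the hold-until-refill behavior; not SafeClaw's actual queueing.
    """
    if tokens >= 1.0:
        return 0.0  # a token is already available; no hold needed
    return (1.0 - tokens) / refill_rate_per_sec

# shell_commands at 15/minute refills at 0.25 tokens/sec, so an agent
# hitting an empty bucket waits 4 seconds before its tool call returns:
wait_time_for_token(0.0, 0.25)
```

In practice the pending action sleeps for this interval (or waits on a timer) and then proceeds, so the agent sees nothing but added latency.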

We chose this approach over hard rejection because it's more compatible with the variety of agent frameworks in the ecosystem. Some handle errors gracefully; all of them handle latency.

Configuration

Rate limits are fully configurable per policy profile. You can set different limits for different projects, disable rate limiting for trusted operations, or create custom action categories with their own budgets. See the full configuration reference in our documentation.

The implementation is open source on GitHub. We welcome feedback from the community on defaults and edge cases.

Rate limiting is not glamorous, but it's one of those features that prevents catastrophic outcomes. A few milliseconds of delay is a small price for the guarantee that no agent can run away unchecked.