17 Apr 2026

Blast Radius by Design: Three Kill Switches and a Sandbox

#safety #autonomy #governance #architecture

If an AI system can take actions that affect the world, the most important question isn't "how smart is it?" It's "how do you stop it?"

Most agent frameworks treat stopping as an afterthought. You can kill the process, but what happens to in-flight tool calls? You can cancel a run, but does the agent respect the cancel signal? You can disable a feature, but can you disable just one bad execution path without taking down the whole system?

Emily's Project Helios answers these questions structurally. There are three kill switch levels, a command sandbox, and an authorized-execution path. Blast radius is a design input, not a retrofit.

Kill Switch Level 1: Global

AUTONOMOUS_PULSE_ENABLED=false

This is the big red button. One environment variable. When it's off, the autonomous pulse scheduler stops firing and no new steps are claimed. In-flight work completes or times out via lease expiration; no new work starts.

When to use it: You've noticed something is wrong and you don't know yet what. You want all autonomy paused while you investigate. One variable flip, system-wide quiet.

What it doesn't do: It doesn't kill the chat endpoint, memory operations, or anything else user-facing. Synchronous work continues normally. Only the autonomous loop stops.
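As a rough illustration of how a scheduler might honor this switch, here is a minimal sketch. Only the variable name AUTONOMOUS_PULSE_ENABLED comes from Helios; the function names, the set of accepted truthy values, and the fail-closed default are assumptions made for the example.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}  # assumed set of accepted "on" values


def pulse_enabled(env=os.environ):
    """Return True only when AUTONOMOUS_PULSE_ENABLED is explicitly truthy."""
    # Assumption: a missing or unrecognised value counts as "off" (fail closed).
    value = env.get("AUTONOMOUS_PULSE_ENABLED", "false")
    return value.strip().lower() in TRUTHY


def pulse_tick(claim_next_step, env=os.environ):
    """One scheduler tick: claim work only if the global switch is on."""
    if not pulse_enabled(env):
        return None  # switch is off: no new step is claimed on this tick
    return claim_next_step()
```

Reading the variable on every tick, rather than once at startup, is what would let a flip take effect without restarting the process, provided the deployment can update the process environment or the value is sourced from somewhere reloadable.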

Kill Switch Level 2: Task-Level

POST /helios/tasks/{task_id}/pause

A single task can be paused without affecting any other task. When paused, the worker won't claim new steps from that task. Steps already in flight complete or expire.

When to use it: You've identified a specific task that's misbehaving — maybe it's looping, maybe its verifications are failing in a confusing way, maybe the work is no longer needed. You can stop just that task while the rest of the autonomous system keeps working.

Why this matters: The global switch is heavy. Most problems are localized. Being able to stop one task surgically means you use the heavy switch rarely.
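The surgical part can be sketched as a claim path that simply skips paused tasks. This is an in-memory illustration, not the Helios implementation: the Task class, the "active" and "paused" status names, and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    task_id: str
    status: str = "active"  # hypothetical states: "active", "paused"
    pending_steps: list = field(default_factory=list)


def pause(tasks, task_id):
    """What a pause handler might do: flip one task's status, touch nothing else."""
    tasks[task_id].status = "paused"


def claim_next_step(tasks):
    """Claim the next pending step, skipping any task that is not active."""
    for task in tasks.values():
        if task.status != "active":
            continue  # a paused task is invisible to the claim path
        if task.pending_steps:
            return task.task_id, task.pending_steps.pop(0)
    return None
```

Pausing task "a" in this sketch leaves its pending steps untouched and lets task "b" keep being claimed, which is the property the section describes.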

Kill Switch Level 3: Emergency Database Update

UPDATE tasks SET status='killed', kill_switch_reason='...' WHERE id='...';

The last-resort switch. Direct database write. Bypasses all API paths. Any code reading the task's status sees the killed state.

When to use it: You've lost faith in the API paths. Maybe the worker is stuck, the pause endpoint isn't responding, or you're seeing behavior the system shouldn't be able to produce. When in doubt, write to the database.

Why have this at all: Because kill switches you can't reach under adversarial conditions aren't kill switches. The task row in the database is the authoritative source of truth: if the row says killed, no layer can override it.
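The guarantee only holds if the code re-reads the row rather than trusting a cached status. A minimal sketch of that check, using an in-memory SQLite database as a stand-in: the column names id, status, and kill_switch_reason come from the statement above, while the rest of the schema, the 'running' status, and the function name are assumptions.

```python
import sqlite3


def task_is_killed(conn, task_id):
    """Re-read status from the database on every check; never use a cached copy."""
    row = conn.execute(
        "SELECT status FROM tasks WHERE id = ?", (task_id,)
    ).fetchone()
    # Assumption: a missing row is treated like a killed task (fail closed).
    return row is None or row[0] == "killed"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT, kill_switch_reason TEXT)"
)
conn.execute("INSERT INTO tasks (id, status) VALUES ('t1', 'running')")

before = task_is_killed(conn, "t1")
# The emergency switch: a direct write, with no API path involved.
conn.execute(
    "UPDATE tasks SET status='killed', kill_switch_reason='operator stop' WHERE id='t1'"
)
after = task_is_killed(conn, "t1")
```

A worker that calls a check like this before each step would see the killed state on its next read, regardless of which API paths are healthy.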

Why three levels, not one

Three levels exist because failure modes differ:

Global: "something is wrong somewhere, stop everything"
Task: "I know what's wrong, stop just this"
Emergency: "I don't trust the normal path, go directly to the source"

A single switch would force you to trade between blast radius and surgical precision. Three switches let you match the kill mechanism to the problem.

The command sandbox

Kill switches stop new work. But what about the work that's already happening? If an autonomous step can run arbitrary shell commands, the damage can be done before you have a chance to kill it.

Emily's command_validator.py gates command execution with an allowlist. Commands that aren't on the allowlist don't run — they're rejected at the sandbox boundary before any effect is possible.

The allowlist lives at an authorized execution path: /www/NO/emily_v2/bin/authorized/. Commands invoked from autonomous steps must resolve to binaries in that directory. If they don't, the sandbox rejects them.

This is not perfect. Any sandbox has failure modes. But it means "curl to a pastebin and execute" or "rm -rf the world" aren't in the reachable set without explicit authorization.
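To make the boundary concrete, here is one way such a check could look. It is a sketch under assumptions, not the contents of command_validator.py: the function name, the exception class, and the choice to reject symlinks that point outside the directory are all hypothetical. The demo uses a temporary directory in place of the real authorized path.

```python
import os
import shlex
import tempfile
from pathlib import Path


class CommandRejected(Exception):
    """Raised when a command does not resolve to an authorized binary."""


def validate_command(command_line, authorized_dir):
    """Return argv with an absolute, authorized binary, or raise CommandRejected."""
    argv = shlex.split(command_line)
    if not argv:
        raise CommandRejected("empty command")
    root = Path(authorized_dir).resolve()
    # Resolve symlinks and ".." so the check applies to the real target.
    # Assumption: a symlink pointing outside the directory is rejected.
    candidate = (root / argv[0]).resolve()
    if candidate.parent != root:
        raise CommandRejected(f"{argv[0]!r} resolves outside the authorized directory")
    if not (candidate.is_file() and os.access(candidate, os.X_OK)):
        raise CommandRejected(f"{argv[0]!r} is not an authorized executable")
    return [str(candidate), *argv[1:]]


# Demo: one authorized script, one command that is not on the allowlist.
with tempfile.TemporaryDirectory() as tmp:
    tool = Path(tmp) / "report"
    tool.write_text("#!/bin/sh\necho ok\n")
    tool.chmod(0o755)
    allowed = validate_command("report --daily", tmp)
    try:
        validate_command("rm -rf /", tmp)
        rejected = False
    except CommandRejected:
        rejected = True
```

The returned argv is meant to be executed without a shell (for example, subprocess.run(argv) with the default shell=False), so characters like | or ; arrive as plain arguments instead of being interpreted.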

Lease-based timeouts

Beyond explicit kill switches, Helios uses leases as a passive safety mechanism. Every claimed step has a lease that expires after 5 minutes. The TaskReaper scans for expired leases every 5 minutes and releases them, so a step claimed by a worker that has since crashed goes back into the pool.

This means no step can be owned indefinitely by a dead worker. Stalled work clears itself. Crash recovery is automatic.

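The implication for blast radius: a single malfunctioning worker can hold at most a few steps, and only until its leases expire and the next TaskReaper scan runs. With a 5-minute lease and a 5-minute scan interval, that is roughly 10 minutes in the worst case. After that, the steps are released and the system can choose whether to retry or abandon them.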
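The lease mechanism can be sketched in a few lines. The 5-minute lease comes from the text; the Step class, the field names, and the two functions are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 5 * 60  # lease length stated in the text


@dataclass
class Step:
    step_id: str
    owner: Optional[str] = None
    lease_expires_at: Optional[float] = None


def claim(step, worker_id, now):
    """Claim an unowned step and start its lease; refuse if already owned."""
    if step.owner is not None:
        return False
    step.owner = worker_id
    step.lease_expires_at = now + LEASE_SECONDS
    return True


def reap_expired(steps, now):
    """One reaper scan: release every step whose lease has expired."""
    released = []
    for step in steps:
        if step.owner is not None and step.lease_expires_at <= now:
            step.owner = None
            step.lease_expires_at = None
            released.append(step.step_id)
    return released
```

In this sketch a step claimed at t=0 is still held at t=299 and is released by the first scan at or after t=300. Because release happens only during a scan, the time a dead worker's step stays held depends on the scan interval as well as the lease length.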

The defense-in-depth stack

Put together, the safety stack is:

Authorization — who can create autonomous tasks (clone_safety.py gates, JWT)
Validation at creation — invalid task shapes rejected before they exist
Command sandbox — only allowlisted commands can execute
Verification — post-conditions verified deterministically; failures fail loud
Leases — no step held indefinitely
Kill switches — three levels of stop
Audit trail — task_events logs what happened, always

You have to fail through all of these to get unbounded damage. And even then, the per-user database isolation means damage is contained to a single user's scope.

The philosophical move

There's a pattern here. Safety isn't implemented by asking the AI to be safe. It's implemented by making unsafe actions structurally impossible or structurally recoverable.

"Ask the AI to be safe" is the LLM-driven agent framework pattern. It works until it doesn't.

"Make unsafe actions impossible at the boundary" is the Emily pattern. It works because it doesn't depend on the AI's judgment — the boundary enforces the rule regardless of what the AI tries.

The commercial frame

For anyone evaluating Emily for regulated or high-stakes environments: the safety surface isn't a policy document. It's code. You can read it. You can audit it. You can reason about what can and cannot happen.

Policy documents fail audits. Structural guarantees pass them.

That's blast-radius-by-design: engineering safety into the substrate, not hoping it emerges from the behavior.

Part of the Emily OS architecture philosophy series.