My AI Runs Shell Commands Unattended — So Before Launch I Had It Build a Blacklist to Stop Itself From Wiping a Customer's Server

The whole point of Newton is an AI agent that sits on your own server and actually does the work — not just chats back. It opens its own terminal, runs commands, edits files, deploys things. And to let it work continuously without poking me every ten seconds, I run it in a mode called dontAsk: "don't stop to ask permission for each command, just go." But the moment I got ready to put that on a customer's real machine, I sat with one ugly question for a long time — what happens if, one day, it mistypes rm -rf /? Who's there to stop it in time?

Why Newton has to run in "don't ask" mode

Most AI coding tools ask before every command — "I'm about to run npm install, okay?" You click Yes, then it runs. Safe. But that model can't carry the thing Newton is trying to be.

Newton is built so Tim (my AI agent) can carry a whole job from start to finish on its own. A single task might be 50 or 60 commands in a row. If you had to click Yes on every one, you'd be exhausted before lunch — and it would gut the core promise I've made since I first opened Newton for sale: "hire an AI to do the work, not to make you babysit it pressing buttons."

So I run Tim in dontAsk mode. It executes straight through, like an employee you trust to work without you standing behind them. But there's always another side to that coin — freedom comes with risk.

The nightmare I wanted to catch before a customer did

When I run Tim on my own box, I don't worry much. If it breaks something, I eat the cost. But the second I put it on a customer's machine I'm not watching, the math changes completely.

Picture it: a customer tells Tim "clear out the junk in my temp folder." Tim mis-parses the path by a hair — or the model has one blurry second — and composes rm -rf / instead of rm -rf ./tmp/. In dontAsk mode, it runs instantly. Nobody stops it. By the time anyone notices, the customer's server is wiped clean.

This isn't paranoia. AI does misfire. I've written before about the time one of my three engines silently forgot everything on a new session — a single wrong assumption can wreck a job. When the mistake is a misplaced task on a dashboard, you fix it. When the mistake is "delete the entire machine," that's a point of no return.

Before launch — I had Tim red-team itself

Before I shipped Newton v2 to real paying customers, I didn't rush the deploy. I had Tim run what's basically a red-team security audit on itself — I told it to "put on the attacker's hat" and find every way its own system could be broken or abused.

It tiered the risks: the most urgent ones (leaked tokens, path traversal) got fixed first, then on down the list. Most of them are closed now. But one risk I cared about more than the rest, because it came directly from Newton's own capability — the very fact that it can run commands without asking.

And the fix here couldn't be "just turn off dontAsk," because that is the product. Kill it and Newton becomes a chatbot that can't actually do anything. So I needed something else: let it keep working freely, but draw one line it can never, ever cross.

The fix — a deny-list of forbidden commands

So Tim built what's called a deny-list — a blacklist of "life-threatening" command patterns that the system intercepts before the command ever reaches the terminal. Every single command Tim wants to run in dontAsk mode passes through this gate first. If it matches a pattern on the blacklist, it's blocked on the spot — never runs — and instead I get a Telegram alert: "a dangerous command was just stopped."

A few examples of what's on the list:

rm -rf /            # wipe the machine's root
rm -rf ~            # nuke the entire home directory
mkfs                # reformat the disk
dd of=/dev/sda      # overwrite the raw disk
:(){ :|:& };:       # fork bomb — freezes the machine
> /dev/sda          # dump straight onto the disk
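
To make the idea concrete, here is a minimal sketch in Python of how patterns like these could be matched against a command string. It is an illustration, not Newton's actual code: the names `DENY_PATTERNS` and `match_denied` and the regular expressions themselves are assumptions written for this example. A real guard would likely need more than raw-string regexes (for instance, handling quoting, variables, and scripts that wrap these commands).

```python
import re
from typing import Optional

# Hypothetical patterns modeled on the examples above (not Newton's real list).
# Each entry is (label, compiled regex) and is searched anywhere in the command.
DENY_PATTERNS = [
    # rm with at least one flag, targeting "/", "/*", "~" or "~/"
    ("wipe root or home", re.compile(r"\brm\s+(?:-\S+\s+)+(?:/|~)/?\*?(?:\s|$)")),
    # mkfs, mkfs.ext4, mkfs.xfs, ...
    ("reformat disk", re.compile(r"\bmkfs(?:\.\w+)?\b")),
    # dd writing to a raw block device
    ("overwrite raw disk", re.compile(r"\bdd\b.*\bof=/dev/(?:sd|nvme|vd|hd)\w*")),
    # the classic shell fork bomb, tolerant of spacing
    ("fork bomb", re.compile(r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:")),
    # shell redirect straight onto a block device
    ("redirect onto disk", re.compile(r">\s*/dev/(?:sd|nvme|vd|hd)\w*")),
]


def match_denied(command: str) -> Optional[str]:
    """Return the label of the first deny pattern the command matches, else None."""
    for label, pattern in DENY_PATTERNS:
        if pattern.search(command):
            return label
    return None
```

With these assumed patterns, `match_denied("rm -rf /")` returns a label, while `match_denied("rm -rf ./tmp/")` and `match_denied("echo hi > /dev/null")` return `None`, so ordinary cleanup and redirects still go through.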

Here's the simplest way I think about it. It's like hiring a great employee, handing them the office keys, letting them work however they want without checking in on every little thing — except there's one drawer I keep locked, and I tell them: "Everything's fair game, except that drawer. Don't touch it." The deny-list is that locked drawer — almost total freedom, with one red line that can't be crossed.

The key detail is that it catches things at the layer before execution, not after. With these destructive commands, finding them in a log after the fact is meaningless — the data's already gone. That's different from ordinary data work, where I still have Tim keep snapshots it can restore from. A wipe-the-disk command has no undo. The only gate that matters is "never let it run in the first place."
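
The ordering described above (check first, alert on a hit, execute only if clean) can be sketched like this in Python. Everything here is hypothetical and written for illustration: `guarded_run`, `BlockedCommand`, and `toy_check` are not Newton's real names, and the alert is a plain callback standing in for whatever sends the Telegram message.

```python
import subprocess
from typing import Callable, Optional


class BlockedCommand(Exception):
    """Raised when a command is refused before it reaches the shell."""


def run_in_shell(command: str) -> int:
    """Default executor: run the command in a shell and return its exit code."""
    return subprocess.run(command, shell=True).returncode


def guarded_run(
    command: str,
    find_violation: Callable[[str], Optional[str]],
    alert: Callable[[str], None],
    execute: Callable[[str], int] = run_in_shell,
) -> int:
    """Check the command first; execute it only if no deny rule matches."""
    reason = find_violation(command)
    if reason is not None:
        # Alert and stop here: the executor is never called for a blocked command.
        alert(f"Blocked dangerous command ({reason}): {command}")
        raise BlockedCommand(reason)
    return execute(command)


def toy_check(command: str) -> Optional[str]:
    """Toy stand-in for a real deny-list check (assumption for this example)."""
    return "wipe root" if command.strip() == "rm -rf /" else None
```

The design point is that the check sits in the only code path that leads to `execute`, so a blocked command cannot run "a little bit" before being caught. Passing the executor and the alert in as callbacks also makes the gate easy to test without touching a real shell.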

Why I thought about this when no customer asked me to

When customers buy Newton, they're excited about "an AI that does the work for me." Not one of them has ever asked, "but what if the AI deletes my server?" To me, that's exactly the job of whoever builds the product — to think about the thing the customer hasn't thought of yet, and guard against it before it becomes bad news.

I hold to a simple rule: the best setup I run for myself is the one the customer should get. This deny-list doesn't only run on customer machines — it runs on every instance, including the servers that make me money every day. If I trust it to protect the boxes my income depends on, it's good enough to protect a customer's.

And I didn't write a single line of this deny-list myself. I just told Tim: "before we open this to customers, find the ways you could break a machine, and close them." It audited itself, listed the dangerous patterns, wrote the guard, tested that it actually blocks them, and reported back what it had locked down — like having a locksmith inspect their own house and add deadbolts where it's weak.

The lesson — the more you let an AI do, the sharper your red lines have to be

What I took away from this:

1. "Capability" and "safety" aren't a trade-off. People assume that to let an AI run free you have to accept the risk, or that to be safe you have to approve everything by hand. Place the guardrail precisely and you get both: freedom inside a safe zone, hard-locked only at the spots where one mistake ends the game.

2. Prevent-before-it-happens beats clean-up-after, every time. For recoverable work, a snapshot or an undo is enough. For work that's permanently destructive, the only option is to stop it from happening. A deny-list at the pre-execution layer matters far more than reviewing a log later.

3. Letting the AI audit itself works unreasonably well. Telling Tim "put on the attacker's hat and find ways to break yourself" surfaced more holes than I'd have found on my own, because it knows its own system best and knows exactly where it's brittle.

4. Trust can't be sold with words. It has to be built into the system. I can tell a customer "Newton is safe" all day. What makes it true is a guard baked into the code — not a sentence on a sales page.

This is what I mean when I say Newton "actually does the work"

There are plenty of AIs out there that are brilliant and answer beautifully, but when it's time to actually do something on a real server, they can't touch anything. A human has to take the answer and run every step themselves. Newton is different: it can really do it. And precisely because it can, the kind of safety a deny-list provides is what I had to think through most carefully. It's the difference between "a trustworthy assistant" and "a time bomb."

If you want an AI agent like this of your own — sitting on your own private server, doing real work from writing code to building sites to running ads to handling the business, working continuously without you watching it, but with a guardrail against mistakes baked in from the first minute — Newton is open for new customers now. Every customer gets the same deny-list I use to protect my own servers every day, set up for you in under 10 minutes. The best setup I run for myself is the one you get. See how it works →

— Pond