Newton Lets You Run Your AI on 3 Engines — But Only One Forgot Everything Every Time I Opened a New Chat

The other day I switched Tim — my AI agent — to run on a different engine, opened a fresh chat, and typed one short question: "What are we in the middle of right now?" It answered like we'd never met. Total blank. Meanwhile, the day before, we'd filled an entire screen talking about that exact thing.

It's a tiny thing. But the moment it happened it felt deeply wrong — like talking to an assistant who suddenly doesn't recognize you. So today I want to walk through a feature so small it almost isn't a feature, that took barely any time to fix, but is the whole reason my AI can feel like the same person no matter which brain I'm running it on.

The setup — three brains, but each one remembers differently

I've written before about how Newton added a third CLI. Now my customers — and I — can choose which engine the AI agent runs on: Claude, Codex, or Antigravity. Each is strong in a different way, each is priced differently, and you can switch between them whenever you want.

What I didn't think about up front: these three brains don't "remember" the past the same way at all.

Tim has one central memory file called memory.md. It records everything — what we did in past sessions, which projects are still open, what we've already decided. It's the assistant's notebook, the one you flip open before you start work each morning. (I told the longer story of the memory system that makes my AI smarter every day already.)

When I tested them one at a time, here's what I saw:

Claude — reads memory.md on its own the moment a session starts (it already has an auto-memory mechanism plus reads the local instruction files). Remembers instantly.
Antigravity — also reads memory.md on its own. Remembers.
Codex — boots up empty and reads nothing for you. Forgets every single time.

The one that answered me like a stranger at the top of this story? That was Codex.

The real bug — the one that's "blind" isn't at fault

When I actually dug in, I understood why Codex was the only one that forgot.

The way Tim runs Codex — in batch mode, fired one task at a time rather than held open on a live screen — there's no built-in step that pulls old memory in automatically. There is a central rules file on the machine (AGENTS.md, which says "before you start work, read memory.md first"). But that one rule sits buried in the middle of a long file, and Codex just skipped right over it.

Here's the plain-language version. I've hired three assistants. The first two are diligent — they walk into the office, grab yesterday's notebook off the shelf, and read it before doing anything. The third one isn't lazy. Nobody ever told him where that notebook lives. So he walks in and starts fresh every morning, like it's his first day on the job.

So my problem was never "why is Codex dumb." It was "how do I get Codex to read the same notebook as the other two — without touching the two that are already doing it right?"

The fix — feed the memory in on the very first sentence, but only to the one that forgets

Here's how Tim solved it: add a thin layer inside the adapter (the code that translates commands into the shape each engine understands — the same layer I wrote about when I laid down an event protocol to keep the chat screen clean). When a new session opens, take the entire contents of memory.md and prepend it to the very first message sent into Codex, wrapped in a special sentinel tag, <session-memory>.

<session-memory>
... the full contents of memory.md ...
</session-memory>

(followed by the actual message I typed)

That's it. Now Codex reads yesterday's notebook at the same moment it receives my first command. Ask it "what are we in the middle of," and it answers right away.

But the part I think is genuinely well-designed is the two small details that keep it from breaking anything else:

1. It only injects on the first message of a new session — not on every message. Pasting the whole memory file into every turn would be wasteful and noisy. Once at session start is plenty — read the notebook once in the morning and you remember it all day.

2. When you scroll back through an old chat, it has to "strip" the injected memory out — and this part matters a lot. Without it, every time I scroll back through a past conversation, my first message would show up with a giant wall of memory.md stacked in front of it — even though all I actually typed was one sentence. So on history replay, Tim cuts the <session-memory> block out. The user sees only the real message they typed. Clean, exactly like before.

Put simply: the memory gets fed to the machine to read, but never shown to the human — the same principle I keep coming back to, that anything not designed to be seen by a person shouldn't leak onto the screen.

The subtler call — why not just inject it into all three?

At first I thought the same thing a lot of people would: "Just paste it into all three, then they're all the same." But Tim doesn't do that, and I think the decision not to is smarter than the decision to.

It sets a flag in each adapter — "does this one need memory primed?" — and turns it on for Codex only. Claude and Antigravity stay off, because:

Claude and Antigravity already remember on their own. Injecting it again just burns tokens every single session, and risks confusing them with two copies of the same information.
Antigravity runs a special screen mode (a TUI) that Tim has to "read off the screen." Stuffing a long block of text in there would corrupt that display and make everything Tim reads back wrong. So it has to be left alone.

This is a lesson I keep running into: a good fix isn't "make every case identical with one blanket rule." It's "understand where each case differs, and fix only the one that needs it." If I'd been lazy and pasted it into all three, the code might look tidy — but it would quietly break the other one, the same way my AI's memory got silently sandbox-blocked for 3 days. The small detail you skip over is the one that bites you later.

The lesson — "one Tim" doesn't come for free

My goal from the start was for Tim to feel like the same Tim no matter which brain it's running on. Newton customers can switch engines back and forth, but the personality, the memory, the way it thinks — all of that has to stay constant.

Sounds like it should just happen, right? They're reading the same file. But no — each engine has its own habits for pulling information in. That feeling of "it's the same assistant," which looks free, actually requires someone to sit down and go through each one: does this one remember? If not, how do I feed it in without breaking something else?

Sameness doesn't come from one rule. It comes from compensating for each engine's differences exactly — fill in what's missing, and leave alone what already works.
Feed the machine, not the human. memory.md can live in the AI's head, but it must never surface on the user's screen. Hiding it in the right place matters as much as adding it in the right place.
Work done well is invisible. If I do this right, a customer will never know one of the three engines used to forget everything. They'll just feel "this AI remembers me." That's the whole point.

I love work like this. It's not flashy, nobody says "wow" at it — but it's the difference between a tool that feels smooth and a tool that stutters every now and then. (A while later I found a sneakier problem with that same Codex engine: it was quietly billing on the metered API instead of the flat subscription I already paid for — same engine, very different kind of bug. And later, a cold-boot race condition that swallowed a customer's first message entirely. Then came another Codex plumbing bug where long Thai prompts overflowed a 1KB PTY write and got chopped before the model even saw them.)

Newton — an AI agent on your own server, pick any brain, every one remembers you

Everything I just described is what sits behind Newton — a full AI agent living on your own private server 24/7, that you talk to and give tasks from your phone like you're chatting with a person. And now you can choose which of three brains it runs on.

The reason I believe in running your AI on your own server, not on someone else's platform, is that I control every layer — what it remembers, what it forgets, which engine it runs on, how I extend it. I don't wait for a vendor to grant permission. And when I polish a small detail like today's, every Newton server inherits an AI that "remembers you," whichever brain you pick. (Controlling every layer also means I get to lock the dangerous ones — before launch I had Tim build a deny-list that blocks catastrophic shell commands before they ever run.)

If you want an AI agent that lives on your own machine, runs around the clock, is easy to talk to from your phone, and has someone polishing it every single day — take a look at Newton. It sets up in 10 minutes, and you don't need to know a thing about servers. See how it works →

— Pond