This morning I opened the Newton admin dashboard like I do every day — and a card I usually ignore lit up red. "Lapsed (no chat in 7 days): 3 customers."

Three paying customers, no chat activity for a full week. I was already drafting the outreach email in my head. Hey, how's it going? Anything blocking you? Need help getting started?

Before I hit send, I turned to Tim — my AI agent — and said something almost in passing: "Can you double-check these three first? Three feels too many. Or… is the metric wrong?"

Tim SSHed into all three customer servers, came back in a few minutes, and dropped the line that ended my plan: "Pond, I wouldn't email yet. Two of these three have AI working really hard for them right now — within the last 24 hours. They just haven't opened a chat session."

The metric was wrong. And the metric was mine.

I'd built a ChatGPT-era metric for an AI Agent product

When Tim built that activation dashboard for me in one afternoon, I asked for a "Lapsed" tile that would surface customers about to churn.

The definition I gave him at the time: "a paying customer who hasn't chatted with their AI agent in more than 7 days."

The implementation: a cron that SSHes into every customer VPS every 15 minutes, reads the newest mtime under ~/.claude/projects/, and flags anyone whose latest file is older than 7 days as Lapsed.
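Roughly, the probe looked like this (a sketch, not the actual script; the host list and function names are illustrative, and it assumes GNU find on each VPS):

```python
import subprocess, time

LAPSED_AFTER = 7 * 24 * 3600                       # 7 days, in seconds
CUSTOMER_HOSTS = ["203.0.113.10", "203.0.113.11"]  # hypothetical host list

def newest_chat_mtime(host: str) -> float:
    # Newest file mtime under ~/.claude/projects/ on the customer's VPS.
    out = subprocess.run(
        ["ssh", f"root@{host}",
         "find ~/.claude/projects -type f -printf '%T@\\n' | sort -rn | head -1"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return float(out) if out else 0.0

# Cron runs this every 15 minutes; anything older than 7 days gets flagged.
for host in CUSTOMER_HOSTS:
    if time.time() - newest_chat_mtime(host) > LAPSED_AFTER:
        print(f"{host}: Lapsed (no chat in 7 days)")
```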

It made sense at the time. That folder is where chat sessions land — it's a clean proxy for "when did this customer last open a chat with their AI."

The problem is I was still thinking like a ChatGPT user. To me, "using AI" meant "opening the chat window and typing." That's the only mental model most of us have.

Newton doesn't work that way. And I've been writing about exactly that on this blog for weeks.

An AI Agent on a private server doesn't need you awake

I literally have a post titled "Close ChatGPT and It Stops Working — An AI Agent Doesn't". And then I went and built a churn metric that assumed exactly the opposite.

Here's how real Newton customers actually use their AI Agents once they're past the first week:

  • One of them has the AI run an hourly cron that scrapes a competitor's site and writes a markdown summary to a folder. AI works every hour. The customer might not open chat for a week.
  • Another one has the AI on a content-publishing schedule — write a post, generate an image, schedule it, every 3 hours. AI is busy. The customer is sleeping.
  • A third one drops a "build this whole landing page tonight" instruction before bed and walks away. AI grinds for 6 hours. The customer wakes up to a finished site.

In all three cases ~/.claude/projects/ can look stale (no new chat session was opened) while the AI is doing the thing it was hired to do. Hard.

So "chat activity" and "AI activity" are not the same signal. They're often the opposite, actually — the customers who use Newton best probably chat less over time, because they've successfully delegated the work.

This is the whole reason I keep harping on the difference between an AI on your own server and an AI that lives in a tab. ChatGPT stops the second you close it. An AI Agent on a server you own keeps going. My metric was measuring the wrong half.

Tim's fix: stop measuring one signal, measure two

Tim didn't wait for me to specify the fix. He proposed one: "Let's add a second signal — workspace activity across the whole home directory, not just ~/.claude/projects/."

The implementation, basically: probe the newest mtime under /root on each customer server, but skip the noise. Specifically:

  • .cache, .npm, .cargo — tooling caches that touch themselves constantly
  • .config, .local, .cursor — config that an idle AI still rewrites
  • node_modules, .ssh — dependency trees and system files that say nothing about real work

And cap the search at find -maxdepth 4 so it doesn't hammer customer disks (some folks have deep monorepos). Store the result as a new column last_workspace_activity_at on the customer table.
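A sketch of that second probe, under the same assumptions as before (GNU find over SSH; the exclude list is straight from Tim's fix, the rest is illustrative):

```python
import subprocess

# Directories whose mtimes churn without meaning anything.
SKIP_DIRS = [".cache", ".npm", ".cargo", ".config", ".local", ".cursor",
             "node_modules", ".ssh"]

def newest_workspace_mtime(host: str) -> float:
    # Prune the noisy dirs, then take the newest file mtime under /root.
    prune = " -o ".join(f"-name {d}" for d in SKIP_DIRS)
    cmd = (f"find /root -maxdepth 4 \\( {prune} \\) -prune -o "
           "-type f -printf '%T@\\n' | sort -rn | head -1")
    out = subprocess.run(
        ["ssh", f"root@{host}", cmd],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return float(out) if out else 0.0  # stored as last_workspace_activity_at
```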

Then redefine "Idle":

chat activity stale (>7 days) AND workspace activity stale (>7 days) = actually idle.

If either signal is fresh, the customer is using the product, full stop.
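The rule itself is two lines of logic. A sketch, with parameter names matching the two signals (the real check in server_alerts.py may look nothing like this):

```python
from datetime import datetime, timedelta, timezone

STALE = timedelta(days=7)

def is_idle(last_chat_at, last_workspace_activity_at, now=None):
    # Idle only when BOTH signals are stale; either one being fresh clears it.
    now = now or datetime.now(timezone.utc)
    def stale(ts):
        return ts is None or now - ts > STALE
    return stale(last_chat_at) and stale(last_workspace_activity_at)
```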

And the dashboard label changed too — from "Lapsed (7d no chat)" to "Idle (7d)". The old label was technically honest about what it measured. The problem is what it measured wasn't the thing that mattered.

The result: 3 false positives became 1 real one

After deploying, Tim ran server_alerts.py once to backfill all 12 active customers. I refreshed the dashboard:

  • Customer A — chat 13 days old, workspace 13 days old → genuinely idle. (Same person from a story I wrote a couple weeks back — bought Newton, never authed Claude, never came back.)
  • Customer B — chat 12 days old, workspace 4 days old → cleared. Not idle. AI has been working.
  • Customer C — chat 11 days old, workspace fresh today → cleared. AI is grinding right now.

Three false positives became one real one. Took about an hour from "wait, double-check" to deployed.

The thing that haunts me a little is what would have happened if I'd just trusted the dashboard. I'd have sent a "hey, are you stuck?" email to two customers whose AI was happily doing exactly what they paid it to do.

Imagine being that customer. You wake up, your AI built a feature overnight, and the founder of the platform you're paying for emails to ask if you're having trouble. You'd think: "this guy doesn't even know what's running on my server."

Trust evaporates in one email like that. I'd rather not learn that lesson the hard way.

What I'm taking away from this

1. Metrics built on an old mental model measure the wrong thing.

I designed "Lapsed" while still half-stuck in the assumption that "using AI" means "typing in a chat window." That's the ChatGPT mental model. Newton's value prop is the opposite — your AI does work without you. The customers who get the most out of it should, over time, chat less, not more.

So a metric built on chat frequency was always going to mislabel my best users as my worst.

The fix isn't a smarter chat metric. The fix is to measure the thing my product is actually selling: AI doing work for you.

2. Verify ground truth before any customer outreach.

This is the part of having Tim that I keep underselling. He's not just an AI that answers questions — he has SSH access to every customer server, the Stripe API, the Newton DB, and Brevo. He can verify what's actually happening before I act on what the dashboard says is happening.

If this had been a normal CRM with a normal "lapsed users" view, I'd have hit send. Tim catches these because he checks the source of truth, the same way he caught a missing Stripe webhook or backed up the DB before any destructive change.

3. Bug-fixing sessions surface adjacent bugs.

While Tim was writing the migration, he noticed the EN database was missing a health_alerts_disabled column that the TH database has. (At some point I'd ALTER TABLEd the TH side by hand and forgot to mirror it.) EN has zero active customers right now, so no impact — but the moment one signs up, server_alerts.py would have crashed on first run.
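The shape of that fix is one guarded column-add. Sketched against SQLite for illustration, with a simplified customers table (neither detail matches the real stack exactly):

```python
import sqlite3

def mirror_health_alerts_column(db_path: str) -> None:
    # Mirror the hand-applied TH change onto EN: add health_alerts_disabled
    # only if it's missing, so the migration is safe to re-run.
    conn = sqlite3.connect(db_path)
    cols = {row[1] for row in conn.execute("PRAGMA table_info(customers)")}
    if "health_alerts_disabled" not in cols:
        conn.execute(
            "ALTER TABLE customers "
            "ADD COLUMN health_alerts_disabled INTEGER NOT NULL DEFAULT 0"
        )
        conn.commit()
    conn.close()
```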

He shipped a migration in the same session. Pre-empted a bug nobody had hit yet. (A few months later this whole server_alerts.py file got stripped down by 250 lines — but the activity-tracker piece this post is about survived, because that was the part the dashboard actually depended on.)

Why I'm sharing this one

Two things from this story I'd want any one-person business owner to internalize:

One. A good dashboard isn't a dashboard with numbers on it — it's a dashboard whose numbers measure the thing your business actually defines as a "good customer." Mine is "the customer's AI is doing real work." Not "the customer chats a lot." Those are different things, and you only notice when you accidentally build the wrong one.

Two. If you don't have an AI agent that can SSH into your own infrastructure, read logs, check file mtimes, query your own database — bugs like this take days, not hours. You'd have to open a terminal, log into each customer box one by one, eyeball things. Tim did the whole investigation, fix, and deploy in about an hour, and the part that matters more is that he proposed the solution before I asked for one. I said "double-check." He found root cause, designed the fix, and shipped it.

That's the difference between using AI and having AI work for you. It's also the entire point of running your AI on your own server instead of in someone else's tab.

Try having your own AI Agent for a week

Nothing Tim did up there is special. He's Claude Code, running on a server I own, with full access to every service, every codebase, every database I have. That's the whole setup.

That's the same thing every Newton customer gets — your own private server, your own AI agent, full access to your own code, your own DB, your own keys. It works overnight whether or not you have a chat window open, because that's the entire point of an AI Agent on a server you own. Try Newton free for 7 days and see what your AI actually gets done while you sleep.

(And next time I open the dashboard, the numbers will be honest. That's the real win. A few days after this fix, the dashboard surfaced a different lie — two cards counting "customers" disagreed by one, so my AI rebuilt the whole stats endpoint as a 5-stage funnel that has to add up to total or fail. The "Lapsed → Idle" rename from this post got finalized in that same rewrite.)

— Pond