Last weekend I opened the Newton admin dashboard and noticed something funny — the "Customers" tile showed 11, and right next to it the "Active Servers" tile showed 12. Wait. How many customers do I actually have?

It was only off by one. But the moment your own dashboard starts contradicting itself, you can't trust any number on the page anymore. So I told Tim — my AI agent — to rip out the whole stats endpoint and rebuild it. About an hour later I had a 5-stage funnel where the numbers have to add up to the total, or the API itself fails. That self-test alone is worth the rebuild.

I Wasn't Even Trying to Touch the Dashboard

Honestly, I wasn't planning on refactoring anything that day. I'd asked Tim to finish chasing down a Stripe webhook gap that had been throwing trial-vs-paid status into a weird state.

After he shipped the fix, I refreshed the admin page out of habit. That's when the 11 vs 12 jumped out at me. Customers = 11. Active servers = 12. Do some customers have multiple servers? Did one customer slip out of the count?

So I asked Tim, in plain English: "how can these two numbers disagree? I have fewer customers than servers?"

The Diagnosis: Two Queries, Two Different Definitions of "Customer"

Tim opened main.py, found the /api/stats endpoint, and came back with a clean answer:

The "customers" query filtered WHERE status IN ('active','cancelling') on the subscriptions table. That counts only customers who are currently paying. It does not count anyone in trial.

The "active_servers" query filtered WHERE status = 'active' on the servers table. That does include servers belonging to trial customers — because a trialing customer absolutely has a running server.

So the gap was a single trialing customer: their server counted toward "Active Servers," but they didn't count as a "customer." 11 vs 12. Mystery solved in one minute.
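For the record, the two offending queries would have looked something like this. This is my reconstruction from Tim's diagnosis, not the literal code in main.py (the WHERE clauses are his; the rest is assumed):

```python
# Reconstruction of the original, mismatched queries. The WHERE clauses
# come from Tim's diagnosis; everything else is assumed.

# "Customers" counted only paying subscriptions, so trials were excluded.
CUSTOMERS_SQL = """
    SELECT COUNT(*) FROM subscriptions
    WHERE status IN ('active', 'cancelling')
"""

# "Active servers" counted every running server, including servers that
# belong to trialing customers. Hence 11 vs 12.
ACTIVE_SERVERS_SQL = """
    SELECT COUNT(*) FROM servers
    WHERE status = 'active'
"""
```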

Then Tim flagged a second thing without me asking: "Pond, your customers table still has [email protected] in it from when I was testing the signup flow. Right now it's getting counted as a customer-without-a-subscription, which the dashboard labels 'abandoned.' But that's not a real customer."

The more I read his answer, the more I realized this wasn't a one-line fix. Every screen on my dashboard had its own quiet, slightly-different definition of customer. Sooner or later more of those definitions would drift apart, and I'd be making decisions on numbers that disagreed with each other.

The Two Iron Rules I Asked Tim to Enforce

So instead of the patch, I asked for a full rewrite of the endpoint, with two non-negotiable rules baked into the design:

Rule 1 — Every stage must be mutually exclusive. A given customer belongs to exactly one stage. No double-counting, no holes.

Rule 2 — The stages must sum to the total number of customers. If they don't reconcile, the API has a bug. Built-in self-test. No need to write a separate unit test for "did I split them correctly?": the numbers tell on themselves.

Tim came back with a 5-stage funnel. Each customer is bucketed by their latest subscription status — a subquery with ORDER BY id DESC LIMIT 1 — so even people who have signed up, churned, and come back land in exactly one place:

  • Total — every email in the customers table, with demos filtered out via email NOT LIKE '%@jarvis-test.com'.
  • Abandon — signed up but never created a subscription. The "almost a customer" pile.
  • Trial — latest subscription status is trialing.
  • Paid — latest subscription status is active or cancelling (still paying, even if they've requested to cancel at end of period).
  • Churned — latest subscription is cancelled, past_due, or unpaid.

Tim also pulled the demo-email filter into a single DEMO_EMAIL_FILTER constant that every query reuses, so I can never again apply it in one query and forget it in another.
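Put together, the shape of the rebuilt endpoint is roughly this. To be clear, it's a sketch reconstructed from the description above: the status values, the ORDER BY id DESC LIMIT 1 subquery, and the shared demo filter are Tim's design, while the column names (customer_id) and the Python plumbing around them are my assumptions:

```python
# Sketch of the rebuilt /api/stats funnel, reconstructed from the post.
# Status values, the latest-subscription subquery, and the shared demo
# filter are as described; column names and plumbing are assumed.

DEMO_EMAIL_FILTER = "c.email NOT LIKE '%@jarvis-test.com'"

# Scalar subquery: the latest subscription status for customer c,
# or NULL if they never created a subscription at all.
LATEST_STATUS = """(
    SELECT s.status FROM subscriptions s
    WHERE s.customer_id = c.id
    ORDER BY s.id DESC LIMIT 1
)"""

def funnel_counts(conn):
    def count(where: str) -> int:
        sql = f"SELECT COUNT(*) FROM customers c WHERE {DEMO_EMAIL_FILTER} AND {where}"
        return conn.execute(sql).fetchone()[0]

    stats = {
        "total":   count("1=1"),
        "abandon": count(f"{LATEST_STATUS} IS NULL"),
        "trial":   count(f"{LATEST_STATUS} = 'trialing'"),
        "paid":    count(f"{LATEST_STATUS} IN ('active', 'cancelling')"),
        "churned": count(f"{LATEST_STATUS} IN ('cancelled', 'past_due', 'unpaid')"),
    }

    # Rule 2: stages must sum to the total, or the endpoint itself fails.
    if stats["total"] != sum(stats[k] for k in ("abandon", "trial", "paid", "churned")):
        raise RuntimeError("funnel does not reconcile: /api/stats has a bug")
    return stats
```

One nice consequence of bucketing this way: if a status value I've never seen shows up in production, it lands in no bucket, the sum stops matching the total, and the endpoint fails loudly instead of silently undercounting.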

The First Refresh After Deploy

I deployed, refreshed the admin page, and saw this:

  • Total: 15
  • Abandon: 2
  • Trial: 1
  • Paid: 11
  • Churned: 1

2 + 1 + 11 + 1 = 15. ✅ Sum reconciles. No one missing, no one double-counted.

The English Newton landing page (much smaller, just launched) ran the same query: 1/1/0/0/0 — sum = 1. ✅

What's funny is that I'd been staring at "Total = 11" for two months and never questioned it. Turns out I had collected 15 emails — 2 of them filled out the signup form and never even started a trial. I had no idea those 2 people existed. They had been quietly invisible to me the entire time. The funnel exposed them on the very first deploy.

Five Bonus Improvements I Didn't Ask For

I asked for a fix on one inconsistent number. Tim shipped five extra things in the same hour because the context was already loaded — refactoring a stats endpoint touches everything stats-related, so why not clean the whole shelf:

1. Trial conversion rate (30d) embedded inside the Trial card. Instead of being a separate tile across the page, it lives right under the trial count. Open the dashboard, look at "Trial: 1 · 75% convert" in one glance. (A sketch of one way to compute this number follows the list.)

2. Cumulative + 30-day churn rate inside the Churned card. Same logic — the metric for a stage belongs in the card for that stage.

3. Activation denominator fixed. When Tim originally built the activation card, the denominator counted all servers — including ones in provisioning state (still being set up, can't chat yet). That's why my dashboard read "11/11 activated" while the Active Servers card read 12. Tim changed the denominator to status IN ('active','cancelling') only. Now activation reads 12/12, matching Active Servers exactly. No more contradiction.

4. Renamed "Lapsed (7d no chat)" → "Idle (7d)". This goes with the fix from a few days earlier where Tim discovered "lapsed" wasn't really measuring chat activity — customers were running long autonomous AI tasks and never typing. The renamed card matches the new measurement.

5. Auto-refresh on the admin page itself. Up to that point, /api/stats only ran once per page load. I'd leave the tab open for an hour and assume the numbers were live. Tim added a 30-second auto-refresh — but smart enough to skip when the tab is hidden (Page Visibility API) or when a modal is open (so it doesn't blow away a half-typed support reply). That last detail mattered. I would've been mad if it ate my drafts.
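Since item 1 is the most query-shaped of the five, here is one way the 30-day conversion number could be computed. This is my sketch of one plausible definition (subscriptions started in the last 30 days that are paying now), not Tim's actual SQL; it reuses the DEMO_EMAIL_FILTER from the funnel sketch, and the created_at column and Postgres-style date arithmetic are assumptions:

```python
# Hypothetical 30-day trial conversion query: of the subscriptions
# created in the last 30 days, what share is paying now? One plausible
# definition only; Tim's real query may differ. Assumes a created_at
# column and Postgres-style interval syntax.
TRIAL_CONVERSION_30D = f"""
    SELECT 100.0 * AVG(CASE WHEN s.status IN ('active', 'cancelling')
                            THEN 1 ELSE 0 END) AS convert_pct
    FROM subscriptions s
    JOIN customers c ON c.id = s.customer_id
    WHERE {DEMO_EMAIL_FILTER}
      AND s.created_at >= NOW() - INTERVAL '30 days'
"""
```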

What I'm Taking Away From This

A dashboard whose numbers don't reconcile is a dashboard that's lying to you. Not because anyone is lazy — because each metric was written at a different time, by a different version of you (or your AI), with a slightly different definition of "customer." The drift is invisible until you happen to look at two cards side by side and notice they disagree. That's exactly how it surfaced for me.

"Mutually exclusive AND sum-to-total" is the cheapest self-test you can bake into any analytics endpoint. You don't need to write tests for it. The numbers test themselves on every page load. If they ever stop adding up, you have a bug — and you'll see it the moment you glance at the dashboard.

The best bonus features come from intentional refactors. If I'd patched the one query, I would've left four other quietly-broken things in place. Because I asked for a rebuild instead of a patch, Tim took the rest of the cleanup as part of the same loaded context. Five wins for the price of one.

Why a Generic SaaS Could Not Have Done This

If I tried to fix this with Mixpanel, ChartMogul, or any other analytics SaaS, I would have had to:

  1. Instrument my app to fire events on every transition (signup, trial start, paid, cancel, etc).
  2. Configure a custom funnel inside the SaaS UI.
  3. Hope it lets me bucket customers by "their latest subscription status" — which is a Newton-specific business rule, not something a generic tool ships with.
  4. Pay forever, monthly, for the privilege.

Tim just read my actual database schema directly. He knew the customers and subscriptions tables, knew which status values existed in production, and wrote five SQL queries inside one endpoint. No instrumentation. No third party. No subscription.

Most importantly — Tim recognized [email protected] as my own test account because he carries my full memory and Newton's full context across every session. No external SaaS would ever know which email is a real customer and which one is me poking the signup form during a deploy. That's the kind of judgment that only an AI sitting inside your business can make.

This is the same pattern I keep seeing — an AI agent with real access to your infrastructure outperforms any generic SaaS, not because the model is smarter, but because it can see things the SaaS literally cannot reach. (A more recent example of the same pattern: my AI built me a custom launch tracker in a day that pulls live MRR straight out of the SaaS database into every build-in-public tweet draft — something no Notion template or PM SaaS could ever do.)

If you run a small SaaS, an online store, or any online business — open your own dashboard right now and ask: do the numbers on this screen actually reconcile with each other? If two cards purporting to count the same thing disagree, you don't have a metrics problem, you have a definition problem. The answer isn't another paid analytics tool. The answer is a private AI agent with SSH and database access that can read your real schema, recognize your demo accounts, and rebuild the dashboard the way your business actually works. That's exactly why I built Newton — your own VPS with an AI agent already configured, ready to do this kind of work for you in minutes, not weeks.

— Pond