A Newton Customer Spoke Spanish, My Voice Typing Forced English — My AI Shipped a 24-Language Picker the Same Afternoon

A few weeks ago I asked my AI agent to add voice typing to Tim Chat because typing on my phone in the car was killing me. It shipped the same hour, I posted about it, customers loved it. End of story — I thought.

Then EN customers started actually using it, and one assumption I'd quietly baked into the very first commit blew up in my face.

Some of my customers don't actually want to speak English.

One customer — a Spanish-speaking indie hacker who runs his whole business through Tim Chat — opened the mic, started talking to his agent in Spanish, and the Web Speech API forced everything through en-US. Out came a transcript of garbled English that meant nothing. He pinged me in support. I pinged my AI in the morning. By that afternoon there was a searchable dropdown of 24 BCP-47 languages in the Newton customer dashboard, with the value auto-syncing into each customer's own VPS via SSH. Done.

Why Voice Typing Was Locked to One Language

Look at the original commit and the bug is obvious in five lines. When my AI first wired voice typing into Tim Chat, the language picker was this:

"If navigator.language starts with 'th' → use th-TH. Otherwise → en-US."

That made sense in my head at the time, because my mental model was: Newton TH customers speak Thai. Newton EN customers speak English. One assumption, never questioned, baked into the only line that mattered.
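A minimal sketch of that original rule, for illustration only. The function name is hypothetical, and the real commit may have looked different; in the browser the argument would be navigator.language.

```typescript
// Hypothetical reconstruction of the original default rule.
// Thai browsers get Thai recognition; every other browser is forced to en-US.
function originalVoiceLang(browserLanguage: string): string {
  return /^th/i.test(browserLanguage) ? "th-TH" : "en-US";
}

// A Spanish-speaking customer's browser still ends up on the English model.
const forced = originalVoiceLang("es-ES");
```

Nothing in this rule ever asks the customer what they want to speak, which is the whole bug.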

Reality is messier. The Newton EN side has European founders, LATAM builders, Asian solopreneurs — people whose English is fine for typing but who would rather speak in their first language and have the agent respond back in English. Meaning the right behavior is: let each customer choose.

Worse, the Web Speech API does not auto-detect the spoken language at runtime. A webkitSpeechRecognition instance recognizes in whatever lang is set when you call .start(), and if you set nothing it falls back to the page's language or the browser default. So a wrong default doesn't just produce a worse transcript. It silently produces a nonsense transcript, because the recognizer is forcing every phoneme through the wrong language model.
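Here is a rough sketch of the ordering that matters: assign lang first, then call start(). The RecognizerLike interface and the startDictation helper are names made up for this example. They model only the two members used here, not the full Web Speech API surface.

```typescript
// Minimal shape of the recognizer this sketch needs. In a browser this would
// be a webkitSpeechRecognition (or SpeechRecognition) instance.
interface RecognizerLike {
  lang: string;
  start(): void;
}

// Set the customer's chosen BCP-47 tag before starting, because the
// recognizer will not detect the spoken language on its own.
function startDictation(recognizer: RecognizerLike, voiceLang: string): void {
  recognizer.lang = voiceLang;
  recognizer.start();
}

// Browser usage (Chrome exposes the constructor with a webkit prefix):
//   const recognizer = new (window as any).webkitSpeechRecognition();
//   startDictation(recognizer, "es-ES");
```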

One Sentence To My AI

I opened Tim Chat and typed exactly this:

"An EN customer speaks Spanish and wants to pick his own voice typing language. How do we do this?"

Tim came back with two clarifying questions before writing a single line of code:

Existing TH customers all use th-TH today — should that behavior stay identical for them?
Where should the picker live — inside the chat window, or in the customer profile page?

I answered: TH stays exactly as it is, picker lives in the profile page. Don't litter the chat with a language selector that 99% of users will never touch.

That was the whole brief. The next 4 hours were Tim's.

What Tim Built in One Afternoon

He worked from the database up.

1. A new voice_lang column on the customers table. Stores a BCP-47 code like th-TH, es-ES, ja-JP, vi-VN. NULL means "voice typing off, fall back to browser default."

2. A backfill migration that set every existing TH-deployment customer to th-TH. This is the part I would have forgotten if I'd written it myself. Without that backfill, the moment the new column went live, every Thai customer's voice typing would have flipped to "off" because NULL is the default. By backfilling all current TH customers explicitly, their experience didn't change one pixel — they didn't even notice the feature shipped. Brand-new TH signups default to th-TH; brand-new EN signups default to NULL (forcing them to opt in once, by design, since EN is where the language ambiguity lives).
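To make the backfill logic concrete, here is a small in-memory sketch. The CustomerRow shape and the deployment field are assumptions for illustration; the real migration presumably ran as SQL against the customers table.

```typescript
type Deployment = "TH" | "EN";

// Hypothetical row shape; only voiceLang mirrors the real voice_lang column.
interface CustomerRow {
  id: number;
  deployment: Deployment;
  voiceLang: string | null;
}

// Default for brand-new signups: TH gets th-TH, EN starts at NULL (opt in).
function defaultVoiceLang(deployment: Deployment): string | null {
  return deployment === "TH" ? "th-TH" : null;
}

// Backfill for existing customers: every TH row is set to th-TH explicitly,
// so the new column's NULL default never changes their behavior.
function backfillVoiceLang(rows: CustomerRow[]): CustomerRow[] {
  return rows.map((row) =>
    row.deployment === "TH"
      ? { id: row.id, deployment: row.deployment, voiceLang: "th-TH" }
      : row
  );
}
```

Skip the backfill and every existing TH row keeps the column default of NULL, which is exactly the regression described above.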

3. A searchable dropdown of 24 BCP-47 options in the customer dashboard. en-US, en-GB, es-ES, es-MX, pt-BR, fr-FR, de-DE, it-IT, ja-JP, ko-KR, zh-CN, zh-TW, vi-VN, id-ID, tl-PH, ar-SA, hi-IN, plus a long tail. Plus a "None" option at the top for customers who'd rather just turn voice typing off entirely.

4. Backend regex validation: [A-Za-z0-9-]{2,12}, matched against the entire value. This is the boring bit that's actually load-bearing, because the saved value gets injected into a shell command on a remote machine in step 5, and you do not want a customer to be able to POST en-US; rm -rf / through your API. The regex only allows letters, digits, and dashes, length 2–12. Every tag in the dropdown passes; every shell metacharacter dies at the door. Longer BCP-47 tags with extra subtags, like en-US-u-ca-gregory, would be rejected too, so the check is stricter than the standard, not looser. Same instinct as the prompt-tightening I had to do when my AI was answering support tickets in dev jargon: bake the constraint in at the only place it actually matters, not by hoping every caller is well-behaved.
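A minimal sketch of that check, assuming the pattern is anchored so it has to match the whole string. Without the ^ and $ anchors, a payload like en-US; rm -rf / would still contain a matching substring and slip through. The constant and function names are hypothetical.

```typescript
// Letters, digits, and dashes only, 2 to 12 characters, whole string.
const VOICE_LANG_PATTERN = /^[A-Za-z0-9-]{2,12}$/;

function isValidVoiceLang(value: string): boolean {
  return VOICE_LANG_PATTERN.test(value);
}

const okTag = isValidVoiceLang("es-ES");
const injection = isValidVoiceLang("en-US; rm -rf /");
```

Here okTag is true and injection is false, because the space, semicolon, and slash are outside the allowed character class.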

5. Auto-sync from the Newton control plane to the customer's own VPS over SSH on save. This is the part I love most. The Newton control plane and a customer's Tim Chat live on two completely different servers. So when a customer hits "Save" in the dashboard, the control plane:

Validates the new voice_lang against the regex.
Writes the value to its own database first.
SSHes into the customer's VPS.
Uses sed to set VOICE_LANG=es-ES in /opt/newton/.env (or appends the line if it is missing).
Runs systemctl restart newton.

The customer never SSHes into anything. Never opens an .env. Never restarts a service. They pick from a dropdown, hit save, and within ten seconds their Tim Chat is recognizing the new language. The whole flow is invisible.
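The remote half of that save flow could be sketched roughly like this. It only builds the command; actually running it would mean handing the arguments to ssh through a process-spawning call. The function names and the example host are hypothetical, sed -i assumes GNU sed on the VPS, and the "None" (NULL) case is left out.

```typescript
const VOICE_LANG_RE = /^[A-Za-z0-9-]{2,12}$/;
const ENV_FILE = "/opt/newton/.env";

// Build the shell script the control plane would run on the customer's VPS.
// Validation is repeated here so an unchecked value can never reach the shell.
function buildRemoteScript(voiceLang: string): string {
  if (!VOICE_LANG_RE.test(voiceLang)) {
    throw new Error("invalid voice_lang, refusing to build command");
  }
  const line = "VOICE_LANG=" + voiceLang;
  return [
    "if grep -q '^VOICE_LANG=' " + ENV_FILE + "; then",
    "  sed -i 's/^VOICE_LANG=.*/" + line + "/' " + ENV_FILE + ";",
    "else",
    "  echo '" + line + "' >> " + ENV_FILE + ";",
    "fi && systemctl restart newton",
  ].join("\n");
}

// Arguments for an ssh invocation: the target host, then the remote script.
function buildSshArgs(host: string, voiceLang: string): string[] {
  return [host, buildRemoteScript(voiceLang)];
}

// Example with a made-up host name.
const sshArgs = buildSshArgs("root@customer-vps.example", "es-ES");
```

Because the value has already been restricted to letters, digits, and dashes, it cannot break out of the single quotes or the sed expression it is interpolated into.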

Live the Same Evening

I shipped it to the original Spanish customer first as a private beta. He replied the next morning: "Pond, this is unreal — I can finally just speak Spanish to my agent."

I rolled it out to the rest of the EN side that evening. Inside the first week, about 30% of active EN customers changed their voice_lang away from English. That number genuinely surprised me. I had thought maybe 5%, an edge-case feature for a handful of multilingual users. Instead, almost a third of my active EN customers had been quietly tolerating voice typing that worked in the wrong language for them.

The TH side? Untouched. No support tickets. No "what's this new dropdown?" questions. The backfill did its job — Thai customers literally do not know this feature exists, because for them nothing changed.

The Edge Case a Normal SaaS Would Never Ship

Here's the part I keep thinking about. If I were running on a generic AI chat SaaS and I said "your voice typing has to let me pick the language per user" — what's the realistic outcome?

Best case: "Thanks for the feedback, we've added it to our roadmap." Worst case: silence. Either way, six months go by, then twelve, then it never ships, because my one Spanish-speaking customer is an edge case to a SaaS serving 50,000 accounts. Their PM is rationally optimizing for the median user.

Newton doesn't have that problem, because Newton is single-tenant. Each customer's Tim Chat lives on their own VPS, and my AI agent has SSH plus database access to their server. So when a customer pings me with an oddly specific request, I don't have to weigh it against 50,000 other accounts. I just have to weigh it against my own time. And when my AI agent does the actual work, my own time is measured in one-line prompts.

This is the same pattern that's let me ship a bunch of things in an afternoon that a generic SaaS would have queued forever — Tim rewriting the FB scheduler from "auto-post on a timer" into "drop drafts I press the button on" after I changed my mind about how I wanted to publish, or tracking down a FB Marketing API error in 30 minutes by diffing two versions of the docs. Every one of those features came out of one user (me) hitting one specific pain. Not a roadmap. Not a vote.

What Newton Actually Is

Newton is not the biggest, smartest, or cheapest AI chat product on the market. What it is, instead, is an AI agent that's yours — running on a server you own, looking at the database that runs your business, and able to ship features you want without asking anyone's permission.

If you're a solopreneur who's tired of submitting feature requests into a black hole at some SaaS that will get back to you sometime next quarter — and you'd rather have an AI on your own server that can ship the feature this afternoon — that's exactly who Newton is for. The auto-provision system spins up your VPS and your AI agent in about ten minutes. After that, you just tell it what you want next, the same way I told mine "let customers pick their voice typing language" — and it builds.

— Pond