I was sitting in my car the other day. One hand on the wheel, one hand on my phone, trying to type a message into Tim Chat for the fifth time after autocorrect butchered it again. So I just complained out loud to Tim — my AI agent — "typing on mobile is too slow, what do we do about this?"

Tim didn't hesitate: "I'll add voice typing for you."

Less than an hour later, there was a mic button sitting next to the input box in Tim Chat. One tap, speak in English, speak in Thai — it just typed it for me. This is the story of how that hour went, including a funny bug Tim found and fixed himself.

What Tim Chat Is, in One Paragraph

For anyone new — Tim Chat is my personal chat interface for talking to Tim, the AI agent that lives on my own server. I can open it on my phone or my laptop. When I type something, Tim actually does the work: writes code, deploys, replies to support tickets, runs ad campaigns, all the boring stuff. It's like ChatGPT, except it lives on my private server and has real access to my business — not stuck inside someone else's sandbox.

The one weak spot was always mobile. When I'm not at my laptop, I have to peck out instructions on a phone keyboard. Half the time I'd lose the thought before I finished typing it.

I Complained — Tim Shipped

I never told Tim what feature to build. I never said "use this library" or "make it look like this." I just complained about typing being slow. That was the entire spec.

Tim went off and made the calls itself:

  • Use the browser's built-in Web Speech API. It's free, it works in Chrome, no extra library to install, no API bill to pay.
  • Both languages out of the box. Pass lang: 'th-TH' for Thai or lang: 'en-US' for English. Same API, different parameter.
  • Mic button + a language toggle pill. Tap the mic, talk, the text shows up in the input box. Tap the pill to flip between EN and TH.
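Those three decisions fit in a few lines of browser code. Here's a minimal sketch of what that wiring could look like — the function names (`langTag`, `startDictation`) are mine, not Tim's actual code, and it's guarded so it's inert outside a Chrome-like browser:

```javascript
// Map the toggle pill's state ('EN' / 'TH') to a BCP-47 language tag.
function langTag(toggle) {
  return toggle === 'TH' ? 'th-TH' : 'en-US';
}

// Start dictation and stream recognized text into a callback.
function startDictation(toggle, onText) {
  // Chrome still exposes the API under the webkit prefix.
  const Ctor = typeof window !== 'undefined' &&
    (window.SpeechRecognition || window.webkitSpeechRecognition);
  if (!Ctor) return null; // unsupported environment, e.g. Node

  const rec = new Ctor();
  rec.lang = langTag(toggle);   // same API, different parameter
  rec.continuous = true;        // keep listening across pauses
  rec.interimResults = true;    // show words while you're still talking

  rec.onresult = (event) => {
    // Hand the latest chunk of recognized text to the UI.
    const last = event.results[event.results.length - 1];
    onText(last[0].transcript);
  };

  rec.start();
  return rec;
}
```

Tap the mic → `startDictation('EN', appendToTextarea)`; tap the pill → stop and restart with the other tag. No library, no API key, no bill.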

Inside one session, Tim wrote the code, committed it, pushed it, and deployed. Then it sent me a message: "Refresh the page and tap the mic."


I refreshed. The mic button was there next to the send button. A small pill above it said "EN." I tapped, said something in English, watched the words appear. Tapped the pill, switched to Thai, spoke Thai, watched Thai words appear. Done.

This is exactly the line between a chatbot and a real agent. ChatGPT could happily explain the Web Speech API in five paragraphs. Tim just opened the file and built the thing.

The Funny Bug Tim Debugged Himself

The first version had a bug that made me laugh out loud. I'd say one short sentence and the input box would fill with three or four copies of it stacked back-to-back. "Check the server status" would come out as "Check the server status check the server status check the server status."

Tim went digging and figured out what was happening. Chrome's webkitSpeechRecognition has a weird design quirk. When you set continuous: true — which you need for long-form dictation — every time it fires an onresult event, it sends back the entire array of results since the session started, not just the new one.

So if I said sentence A, then sentence B, the second event arrived with [A, B] in it, not just B.

Tim's first version naively iterated from index 0 every time and appended each result to the textarea. So A got re-appended on every tick of the engine, turning a clean sentence into a stutter loop.

The fix was two lines:

  1. Loop from event.resultIndex instead of 0, so only fresh results get read.
  2. Track finalTranscript separately from interimTranscript, then on every event recompute the textarea as base + final + interim instead of appending.
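The fixed handler boils down to one pure function. This is my reconstruction of the logic, not Tim's actual commit — `applyResults` and its argument shapes are illustrative, mirroring Chrome's `SpeechRecognitionEvent` (`resultIndex`, `results`, `isFinal`):

```javascript
// Recompute the textarea on every onresult event instead of appending.
// `base` is whatever was typed before dictation started; `finalSoFar`
// is the accumulated final transcript carried across events.
function applyResults(base, finalSoFar, event) {
  let finalTranscript = finalSoFar;
  let interimTranscript = '';

  // Start at event.resultIndex: Chrome resends EVERY result since the
  // session began, so indices below resultIndex were already consumed.
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) finalTranscript += result[0].transcript;
    else interimTranscript += result[0].transcript;
  }

  return {
    finalTranscript, // carry this into the next event
    text: base + finalTranscript + interimTranscript, // what the user sees
  };
}
```

Loop from 0 instead of `event.resultIndex` and sentence A gets re-added on every tick — exactly the "check the server status check the server status" stutter.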

One commit, redeploy, refresh — no more stutter. I didn't write a line of that. I just got a Telegram notification telling me what was found and what was changed.

The Small Details That Made It Actually Usable

This is the part that always separates a feature that gets used from a feature that gets ignored. Tim didn't stop at "mic button works." It quietly added the polish:

  • Default language picks itself. Reads navigator.language on first load — Thai phone defaults to th-TH, English laptop defaults to en-US — and persists the choice in localStorage so it remembers next time.
  • Red pulsing animation while recording. The mic button turns red and pulses so I know at a glance that it's actually listening, not silently dead.
  • Silent auto-restart on Chrome's ~30-second idle cutoff. Chrome will quietly kill a long speech session every 30 seconds or so. Tim wired up an automatic restart in the background. I can talk for five minutes straight and never notice the seam.
  • Clean stop on send. The moment I tap send, the mic shuts off. No leftover listening, no half-captured next sentence bleeding into nothing.
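Two of those details — the self-picking default and the silent auto-restart — are each only a few lines. A sketch under my own assumptions (the `dictationLang` storage key and function names are made up for illustration; `store` stands in for `localStorage`):

```javascript
// Pick a default dictation language: a saved choice wins, otherwise
// fall back to the browser locale (Thai phone -> th-TH, else en-US).
function defaultLang(navigatorLanguage, store) {
  const saved = store.getItem('dictationLang');
  if (saved) return saved;
  return /^th/i.test(navigatorLanguage || '') ? 'th-TH' : 'en-US';
}

// Chrome quietly ends long speech sessions; restart in the background
// unless the user actually tapped stop, so dictation feels seamless.
function attachAutoRestart(rec, isUserStopped) {
  rec.onend = () => {
    if (!isUserStopped()) rec.start();
  };
}
```

In the browser you'd call `defaultLang(navigator.language, localStorage)` on load and write the pill's choice back with `localStorage.setItem('dictationLang', …)`; the "clean stop on send" is just flipping the flag that `isUserStopped` reads before calling `rec.stop()`.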

Drop any one of these and the feature becomes annoying. Skip the auto-restart and I'd be re-tapping the mic every 30 seconds — I'd give up on the third try. Skip the visual feedback and I'd send half-finished messages thinking it was still recording.

Why This Tiny Feature Is Actually a Big Deal

Someone might read this and shrug — "it's just voice typing, ChatGPT's app already has it." Sure. But that's not the point.

The point is: I complained one sentence in the car. Within an hour, a new feature was live in my product. A weird API quirk got found and fixed without me ever opening a code editor. Production was updated. I went from problem to solved with less effort than it would take me to file a Jira ticket.

That feedback loop is what changes everything. There's no spec doc, no sprint, no PM, no waiting for the next standup. The complaint is the spec. The agent goes off and ships.

And this isn't the first feature that was born from a complaint. Tim once swapped my entire transcription library to make it 5× faster after I muttered "this is slow." It once pushed a fix to six customer servers in under an hour because one customer's chat replies were getting cut off. It once argued for deleting a broken password field instead of fixing it, because deletion was the right answer.

(Postscript: a few weeks after this shipped, a Spanish-speaking customer on the English default complained that voice typing kept forcing him into English. Same loop — one sentence to Tim, and that afternoon there was a per-customer 24-language picker in the Newton dashboard that auto-syncs into each customer's own VPS over SSH.)

This is what I mean when I say "having AI work for you" instead of "using AI." Opening ChatGPT to ask a question is not the same thing as having an agent that can read your codebase, write the code, deploy it, and tell you when it's done.

Now I Talk to My AI Everywhere

I use Tim Chat on my phone way more often now. Driving — "check today's revenue." Walking — "draft a blog post about the Whisper swap." First thing in the morning, half-awake — "summarize last night's support tickets." Tim talks back. I never touch the keyboard.

It really does feel like the Iron Man moment people picture when they think about AI. A voice you can talk to anywhere, that knows your business, that can actually do things — not a chatbot that lists ten suggestions and bows out.

If you want your own AI agent that lives on your own server and ships features when you complain at it — that's exactly what Newton is. You don't set up a server, you don't install an AI, you don't wire any of this up. You sign up, the auto-provision system spins up your private VPS in about two minutes, and you log in to your own Tim Chat — voice typing already included. Then you can start complaining at your AI just like I do. See how it works →

— Pond