This happened last week. I muttered one sentence to my AI agent — "transcribe is slow, there has to be a faster way" — and went to do something else. Next day, a 79-minute Thai-language live that used to take ~5 hours to transcribe finished in about 90 minutes. Same server. Same CPU. No GPU bought. I didn't write a single line.
The Bottleneck Nobody Talks About
I cut short-form clips from my own livestreams almost every week. The lives are 60-90 minutes. The clips that actually go on TikTok, Instagram Reels, YouTube Shorts, and Facebook Reels are 30-60 seconds. To find which 30 seconds out of 90 minutes are worth posting, the audio has to be transcribed first. The AI then reads the transcript and picks the moments most likely to stop the scroll.
Transcription is the boring middle step that makes the whole pipeline work. The full one-button clip system falls over if this step is slow.
The setup I started with was the obvious one: openai-whisper running the small model on a plain 8-core CPU. No GPU. The official OpenAI library, the model everyone benchmarks against, on commodity hardware.
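For context, the baseline boiled down to something like this (a rough sketch, not my actual script; the file name is illustrative):

```python
# Roughly the starting point: openai-whisper's small model on a plain CPU.
import whisper

model = whisper.load_model("small")
result = model.transcribe("live_ep41.m4a", language="th")  # illustrative file name
print(result["text"])  # full transcript as one string
```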
The problem: a 79-minute Thai live took roughly 5 hours to transcribe. By the time the clips were ready, the conversation had moved on — the audience was a week ahead.
I'd already tried the obvious workarounds. Smaller model (tiny) — barely faster, accuracy collapsed on Thai. Chunking the audio — too much overhead, transcripts didn't stitch well across boundaries. Renting a GPU — overkill for a job that runs once a week, and a recurring cost I didn't want.
One Sentence In, a Refactor Out
I didn't sit down to research alternatives. I didn't open Stack Overflow. I didn't ask the AI to "find a faster Whisper library and swap it in." I typed exactly this:
"transcribe is slow, there has to be a faster way."
That's it. That was the whole prompt. What happened next is the part that's worth describing carefully, because it's the difference between an "AI" and an AI agent:
- It went and looked. Searched for CPU-friendly alternatives to openai-whisper. Surfaced `faster-whisper`, an implementation built on the CTranslate2 inference engine that supports quantization on CPU.
- It benchmarked on my actual file. Not a synthetic test. It took the same 79-minute live I'd transcribed before and ran both libraries head-to-head, recording wall time and accuracy diffs.
- It tuned parameters. Tried multiple combinations and landed on `compute_type="int8"` + `beam_size=1` + `vad_filter=True` as the configuration that hit the speed target without losing Thai accuracy (a sketch of that configuration follows this list). VAD (voice activity detection) skipped silent regions entirely instead of running the model on dead air. (Later on, that same VAD setting broke demo clips by deleting the silent wait between command and result — my AI patched that too.)
- It rewrote the production code. Swapped the import, adjusted the API surface (the two libraries are close but not identical), and updated the rest of the clip-cutting pipeline to use the new interface.
- It committed and pushed. Clean diff, descriptive message, into the same Git repo my whole content stack lives in. Because everything I run is on my own server, the agent had write access to do this without me opening anything.
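For the curious, here's a minimal sketch of what that tuned configuration looks like against faster-whisper's API (again, the file name and the timing wrapper are illustrative, not the agent's actual diff):

```python
# Sketch of the tuned replacement: same small model, now running through
# CTranslate2 with int8 quantization (roughly half the RAM on CPU).
import time

from faster_whisper import WhisperModel

model = WhisperModel("small", device="cpu", compute_type="int8")

start = time.perf_counter()
segments, info = model.transcribe(
    "live_ep41.m4a",   # illustrative file name
    language="th",     # skip language detection; we know it's Thai
    beam_size=1,       # greedy decoding: the fastest search
    vad_filter=True,   # skip silent regions instead of decoding dead air
)
# segments is a generator: transcription actually happens as you iterate,
# so the timer has to wrap this loop as well.
text = " ".join(seg.text.strip() for seg in segments)
elapsed = time.perf_counter() - start

print(f"{info.duration / 60:.1f} min of audio in {elapsed / 60:.1f} min")
```

Greedy decoding (`beam_size=1`) is the aggressive knob here; the head-to-head benchmark on the real file is what confirmed Thai accuracy held up.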
I sat at the terminal and watched it work. I never touched the keyboard.
The Numbers That Came Out the Other Side
Same 79-minute audio file. Same 8-core CPU. The new config:
- Wall time: ~5 hours → ~90 minutes (3-5× faster across multiple runs)
- Accuracy on Thai: essentially unchanged — diffing the two transcripts showed only minor variations, the kind you'd expect between any two runs
- RAM: nearly halved, because int8 quantization shrinks the model in memory
- Hardware cost: $0. No GPU, no cloud upgrade, no new server
Translated into the workflow that actually matters: I now kick off transcription before I sleep, and the pipeline that produces every short clip — selection, cutting, captioning — is done by morning. The week-old clip problem is just gone.
Why ChatGPT Couldn't Have Done This
If you typed "openai-whisper is slow, what should I do?" into ChatGPT, you'd get a perfectly reasonable answer: "Try faster-whisper. `pip install faster-whisper`, then change your code like this..."
And then you'd have to do every actual step yourself.
You'd SSH into your own server. You'd pip install. You'd open the file, find the import, swap it. You'd notice the API differs slightly, adjust calls. You'd run a benchmark, hopefully on a real file. You'd tune compute_type and beam_size and decide whether VAD is worth it. You'd commit. You'd push. And if any step broke, you'd debug it.
An AI agent doesn't tell you the answer — it does the work. It already has SSH access to the server. It can install packages, edit code, run benchmarks, and ship commits without me being the bridge between "good idea" and "deployed code."
This is the cleanest example I have of the gap. A chatbot is advice. An agent is a teammate.
The Right Place to Optimize
Here's the part that quietly impressed me. I didn't tell the AI where to optimize. I just said it was slow.
It could have suggested any of these:
- Throw more hardware at it (rent a GPU)
- Reduce quality (use the `tiny` model)
- Parallelize (split audio into chunks, run them concurrently)
- Replace the algorithm (swap libraries)
It picked the right one on the first try. Why? Because it knew context I never re-typed: I run my own server, I prefer one-time fixes over recurring costs, my budget is small, and Thai accuracy is non-negotiable. It already had the memory of how my business runs, so the choice wasn't generic — it was tuned to my constraints.
That's the second thing a chatbot can't do. Even if you described all of that in a single prompt, you'd have to describe it again next time. A persistent agent on your own server keeps the context across sessions, across months, across every "fix this" that comes up.
Small Wins Compound
This wasn't a flagship feature. It was a library swap. But the second-order effect is bigger than the change itself: every step of my content pipeline downstream of transcription is now 3-5× faster, which means I can ship more clips, ship them sooner, and not flinch when I think about going live again.
And it's not a one-off. Some weeks the AI finds a bug that's been hiding for a month. Some weeks it ships a hotfix to six customer servers. Some weeks it answers a support ticket end-to-end. Some weeks I just complain in the car and a new feature shows up in my chat app the same hour. Each individual job is small. By month-end, that's 60-80 hours back.
The pattern: anything repetitive, mechanical, and rule-bound — hand it to an agent. Keep your time for the parts that need taste.
Want an Agent That Quietly Upgrades Your System When You Complain?
If you run a business on your own server, with workflows that repeat every week, and you'd love an AI agent that watches the system and starts optimizing the moment you say something is slow — without you having to be a developer — that's exactly what I built Newton for. Your own private VPS, your own AI agent, set up in about 10 minutes. Over time it learns your code, your budget, and your priorities, then quietly improves things in the background. The next "this is slow" you mutter is the next thing it goes and fixes.
— Pond
