OmniVoice is surprisingly good
cloned my voice in ~30 seconds on an M4 Pro
I tried OmniVoice this afternoon after seeing a post on r/LocalLLaMA and getting curious. It's a zero-shot voice cloning TTS model that claims to support 600+ languages. I recorded two short voice memos on my phone, one in Turkish and one in English, dropped them into the web UI, and 30 seconds later, voilà, I had cloned versions of my voice saying things I never said. Each sample took one or two tries to get right.
Setup
Clone the repo and sync dependencies with uv:
git clone https://github.com/k2-fsa/OmniVoice.git
cd OmniVoice
uv sync
Then launch the demo:
omnivoice-demo --ip 0.0.0.0 --port 8001
A Gradio web UI opens up. You upload a reference audio clip, type its transcript, type whatever you want the clone to say, and hit generate. No training, no configuration, no API keys.
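If the page doesn't load, it helps to confirm that something is actually listening on the port before blaming the browser. Here is a minimal sketch using only Python's standard library. The helper name `is_port_open` is my own and not part of OmniVoice; the host and port are just the values from the command above. Also worth knowing: binding to 0.0.0.0 makes the UI reachable from other machines on your network, not only from localhost.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open("127.0.0.1", 8001)` after launching the demo should return True once the server has finished starting.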
Voice cloning
I recorded two short samples of myself and used them as references.
[Audio sample: original voice]
[Audio sample: cloned voice]
The timbre is close. It's not perfect: you can hear a synthetic texture on some consonants. But if I played this for someone who doesn't know me well, they might not notice. For zero training and ~30 seconds of inference, that's kind of wild.
Tongue twisters
I also wanted to see how it handles genuinely hard speech. Turkish tekerleme (tongue twisters) are difficult, the kind of thing that ties your mouth in knots. If the model can get through these without slurring the sounds together, it's doing something real.
It nails tongue twisters too. When I tried to record them myself I failed a couple of times. The model didn't.
What I actually care about
Voice cloning has been around. Usually you need minutes of clean audio, a training run, and patience. This needs 3–10 seconds of reference audio and about 30 seconds on an M4 Pro. No cloud API, no queue, no configuration. I ran uv sync and clicked a button.
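Since the reference clip length matters, a quick check before uploading can save a retry. The sketch below is my own helper, not something OmniVoice ships. It assumes the voice memo has already been converted to an uncompressed PCM WAV file (phone recordings are often m4a, which the standard `wave` module cannot read), and it takes the 3 to 10 second window from the figure quoted above.

```python
import wave

MIN_SECONDS = 3.0   # lower bound of the reference length mentioned in the post
MAX_SECONDS = 10.0  # upper bound of the reference length mentioned in the post


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by frames per second.
        return wav.getnframes() / wav.getframerate()


def is_usable_reference(path: str) -> bool:
    """True if the clip length falls inside the 3-10 second window."""
    return MIN_SECONDS <= clip_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_usable_reference("memo.wav")` (a hypothetical filename) tells you whether to trim or re-record before dropping the clip into the web UI.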
The quality ceiling is probably higher with cleaner audio. These were casual phone recordings. But even with what I had, the results are good enough to be interesting.
Most voice cloning tools treat non-English as an afterthought. This one doesn't. I tried it in Turkish, mostly to clone my friends' voices and mess with them. They were all surprised.