The voice layer for modern apps and agents. Real-time, scalable voice APIs: TTS, STT, turn-taking & voice cloning. Devs: build → https://t.co/r5CdNClhI5
gradium.ai
Joined September 2025
Today we launch stt-translate and s2s-translate: real-time speech-to-text and speech-to-speech translation. They compete with gemini-3.5-live-translate and gpt-realtime-translate on latency and quality, while allowing you to speak in any voice from our catalog or one you clone. Try them for free today on gradium.ai/translate
Long flights always give me more ideas to think about what's missing around us.
Few prompts later, here's Scribble Story.
On-device fully local pipeline to convert scribblings into a short story you can listen to.
Using @GradiumAI Phonon and @Alibaba_Qwen
We upgraded Gradium TTS for the cases voice agents can't get wrong: phone numbers, codes, email addresses read back right the first time. Couple of examples: English: 97% on emails, top of the field. French: leads every competitor we benchmarked. Samples + methodology → gradium.ai/blog/gradium-t…
In this joint work with @kyutai_labs, we design a reward model for conversational dynamics to teach full-duplex models how a human behaves in conversation, using cues to know when to interrupt, backchannel or stay silent.
New paper: Multi-Faceted Interactivity Alignment in Full-Duplex Speech Models
We use RL to post-train speech models (Moshi and PersonaPlex) to talk more like a human: to know when to respond, when to wait, and when to nod along with “yeah”s and “okay”s when listening.
We'll be at @VivaTech next week showcasing our models. Come find us at Booth 7.2 | 2F13 with @awscloud all week, and on the @LaFrenchTech booth on Wednesday.
@neilzegh is giving two talks: Wed 17th, 5:20pm, @nvidia Stage 1 and on Fri, 10am, Théâtre AWS
Learn how to build an audiobook voice agent using Gradium and @pipecat_ai
Gradium's TTS handles the narration and Pipecat's built-in WebRTC transport delivers the audio to the browser.
Reasoning LLMs typically take 2-3 seconds to start emitting tokens. In a voice agent, that's 2-3 seconds of silence after the user finishes speaking.
The @MiniMax_AI team just shipped a community contribution to Gradbot with two models running in parallel. MiniMax-M2-her produces a short acknowledgement that starts streaming to TTS immediately, while MiniMax-M2.7 runs in the background reasoning and tool calls.
Thanks to @davidtaoweiji for this contribution. Check out our README for more details.
github.com/gradium-ai/gra…
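The two-model pattern described above can be sketched in a few lines of asyncio. This is only an illustration of the idea, not Gradbot's actual code: `fast_ack`, `slow_reasoner`, and `speak` are hypothetical stand-ins for the acknowledgement model, the reasoning model, and the TTS stream, and the sleeps are made-up latencies.

```python
import asyncio


async def fast_ack(user_text: str) -> str:
    # Hypothetical stand-in for a small, low-latency model.
    await asyncio.sleep(0.05)
    return "Sure, let me check that."


async def slow_reasoner(user_text: str) -> str:
    # Hypothetical stand-in for a reasoning model that may also call tools.
    await asyncio.sleep(0.3)
    return f"Here is the full answer to: {user_text}"


async def respond(user_text: str, speak) -> None:
    # Start the slow model right away so it runs in the background.
    reasoning = asyncio.create_task(slow_reasoner(user_text))
    # Speak the short acknowledgement as soon as it is ready.
    await speak(await fast_ack(user_text))
    # Then speak the full answer once the reasoning has finished.
    await speak(await reasoning)


async def demo() -> list[str]:
    spoken: list[str] = []

    async def speak(text: str) -> None:
        # Hypothetical TTS sink; here it only records what would be spoken.
        spoken.append(text)

    await respond("cancel my flight", speak)
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The user hears the acknowledgement after roughly the fast model's latency, while the total time to the full answer stays close to the slow model's latency because both run concurrently.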
A full house at the @joinhexa office in Paris yesterday.
Our CTO @olivierteboul joined the discussion by sharing why low latency matters for voice agents and how Gradium models support enterprise use cases for voice AI.
"I'd like to cancel my flight from Boston to..." You pause to check a date. The agent cuts in: "Got it, where to?" Now you're talking over it to finish your own sentence.
That's acoustic turn detection. Semantic VAD waits because it knows you're not done: gradium.ai/blog/semantic-…
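A toy sketch of the difference, under loud assumptions: the silence threshold, the `DANGLING` word list, and the `max_wait_ms` fallback are all made up for illustration, and a real semantic VAD would use a learned model rather than a word list. This is not how Gradium's semantic VAD is implemented.

```python
# Words that usually signal an unfinished sentence (illustrative only).
DANGLING = {"to", "from", "and", "or", "the", "a", "on", "at", "for", "with", "but"}


def acoustic_end_of_turn(silence_ms: int, threshold_ms: int = 500) -> bool:
    # Acoustic detection: any pause longer than the threshold ends the turn.
    return silence_ms >= threshold_ms


def looks_complete(transcript: str) -> bool:
    # Toy completeness check: the last word must not be a dangling connector.
    words = transcript.lower().rstrip(" .?!").split()
    return bool(words) and words[-1] not in DANGLING


def semantic_end_of_turn(
    transcript: str,
    silence_ms: int,
    threshold_ms: int = 500,
    max_wait_ms: int = 3000,
) -> bool:
    # Semantic detection: a pause alone is not enough to end the turn.
    if silence_ms < threshold_ms:
        return False
    if looks_complete(transcript):
        return True
    # Fall back to a longer timeout so the agent never waits forever.
    return silence_ms >= max_wait_ms


if __name__ == "__main__":
    partial = "I'd like to cancel my flight from Boston to..."
    print(acoustic_end_of_turn(600))           # pause alone triggers a reply
    print(semantic_end_of_turn(partial, 600))  # waits: sentence is unfinished
```

With the same 600 ms pause, the acoustic check ends the turn while the semantic check keeps listening, because the transcript ends on "to".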
Berlin, what's up? Tavily is now in town! We're here with @GradiumAI showing off our new voice integration and hosting a hackathon alongside @nebiusai and @cursor_ai. You won't want to miss this one.
luma.com/juded1wb
The 100-token input padding is gone.
Short replies like "yes, that works" used to need filler before generation.
Now they don't, so voice agents return first audio much faster on the short turns that fill real conversation.
Gradium TTS is particularly good at transferring all characteristics of a voice, including reverb, bandwidth (e.g. phone speech), and even typical podcast mic saturation on plosives (the "p" sound). All samples in the video were generated from 10 samples, without any processing.
I'm just as surprised nobody in AI voice tech has realized a voice needs background and environmental noise to sound realistic
Even @ElevenLabs, the leader in voice AI, cannot produce voice with background noise or environmental reverb
AI voices are always going to sound