AI For Humans - The Newsletter
OpenAI FM Shows Off New Voice Toys from OpenAI 🗣️

And why you may soon be talking with more of your apps

Today on AI For Humans The Newsletter!
OpenAI’s Voice API is very exciting & interesting
Zapier wants to connect all your apps to AI
And the top AI video of 2025?
Plus, our can’t-miss AI feature of the week!

Welcome back to the AI For Humans Newsletter!

Nearly a year ago, OpenAI’s Advanced Voice demo (remember this guy?) made me believe we'd soon be regularly talking to AIs.

But here we are, post Advanced Voice launch, and I’m not seeing the Cambrian explosion of use-cases I expected. I’ve recently started dictating to ChatGPT apps a lot more (which is faster than typing), but I'm still not regularly speaking with AI.

That’s about to change. Enter OpenAI’s new next-generation audio models.

Three new state-of-the-art audio models in the API:
🗣️ Two speech-to-text models—outperforming Whisper
💬 A new TTS model—you can instruct it *how* to speak
🤖 And the Agents SDK now supports audio, making it easy to build voice agents.
Try TTS now at .
— OpenAI Developers (@OpenAIDevs)
5:25 PM • Mar 20, 2025

OpenAI introduced two new speech-to-text models and one new text-to-speech model directly into their API, and they are good. These models sound more lifelike and even let you prompt emotions directly.

Most importantly, they’re available in the API now and are relatively cheap, which will allow developers (the people who make apps & websites) to integrate them directly into the sorts of products you use every day and potentially invent all sorts of new things.
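For the devs reading along, here's a rough sketch of what "instructing it how to speak" might look like in code. It only uses Python's standard library. The endpoint URL, the model name (`gpt-4o-mini-tts`), the voice name (`coral`), and the helper names `build_speech_request` and `synthesize` are our assumptions for illustration and aren't stated in this newsletter, so check OpenAI's docs before relying on any of them.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; neither is stated in this newsletter.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"
MODEL = "gpt-4o-mini-tts"


def build_speech_request(text, instructions, voice="coral"):
    """Return the JSON body for a text-to-speech call (no network access)."""
    return {
        "model": MODEL,
        "voice": voice,                # assumed voice name
        "input": text,                 # the words to speak
        "instructions": instructions,  # how to speak them
    }


def synthesize(text, instructions, out_path="speech.mp3"):
    """Send the request and save the audio; needs OPENAI_API_KEY to be set."""
    body = json.dumps(build_speech_request(text, instructions)).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
    return out_path


if __name__ == "__main__":
    # Only builds and prints the payload; call synthesize() to hit the API.
    payload = build_speech_request(
        "Welcome back to the show!",
        "Sound like an over-caffeinated morning radio host.",
    )
    print(json.dumps(payload, indent=2))
```

The interesting bit is that the words and the performance travel as two separate fields, so you can keep the script fixed and swap only the delivery.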

Maybe more fun for the creatives out there, OpenAI has built a front-end for their new text-to-speech model called OpenAI FM. This soundboard-looking thing is actually a very powerful (and so far, free) audio prompting tool that allows for greater control over voice quality and performance. It also lets you download the outputs to drop directly into your AI video productions.

OpenAI FM Looks Like A Soundboard

If you've tried Eleven Labs, you know how challenging it can be to prompt natural-sounding emotions or emphasis. And, like AI image or video generators, it can often be a sort of slot machine where you keep trying until you get what you want.

With this new set of voice tools, OpenAI lets you prompt emotion and emphasis directly, in a way we haven’t seen yet. And that, for those of us who are experimenting (and building) with new sorts of voice interactions, is very exciting.

just tried a few different prompts with @OpenAI new API via openai dot fm
super fun and I cannot wait to play more with this for the show and for <redacted>
— AI For Humans Show (@AIForHumansShow)
6:30 PM • Mar 20, 2025

It turns out they’ve actually updated the official Advanced Voice model as well (as of Tuesday 3/25/25) but, of course, Kevin has already found a way to break it.

Of course, OpenAI isn’t alone in pushing voice AI further forward. AI For Humans favorite Sesame AI has released an open-source version for devs, and we know Meta & Google are both very focused on voice interaction.

Go play around with OpenAI FM. It's free, surprisingly powerful, and honestly pretty fun. Give it a shot and let us know what weird (or useful) stuff you make.

3 Things To Know

Google Releases Gemini Co-Drawing
File this one under things I didn’t know I wanted: A novel drawing experience where you work in the same canvas as the AI. We’re rapidly approaching AI generations that are real enough. The next breakthroughs may come from innovations in creative control, rather than pure believability.

Zapier Wants You To Create Your Own MCP
An interesting (albeit geeky) release from Zapier lets you build your own AI assistant that can connect to any of your Zapier-connected apps. The idea is to ease the pain of every app having its own siloed chat assistant. MCPs have been the talk of the AI town lately; here’s an explainer of what they do.

AI Video of the Year?
The team behind AI Or Die released a trailer for Karen, the story of a Karen finally reaching her breaking point. The team used Google’s Veo 2 for the video clips, Kling and Hedra for lip syncs, and Eleven Labs for speech generations.

It’s arguably the most entertaining AI video we’ve seen yet. Superb AI directing combined with a real knowledge of how to push today’s tools to their limit.

We 💛 This - Pikaframes

The team at Pika have been shipping all sorts of party tricks, but this one is our current favorite. We shared a bit on it a few weeks ago, but we keep seeing juicy new use-cases.

In short, you provide it with a start + end frame, and it figures out what happens in between with the aid of your text prompt. Runway has the same feature, but it currently handles only very limited movements.

Two examples here: one that pans around to reveal a second character, and another that racks focus by blurring the background of the first image!

Are you a creative or brand looking to go deeper with AI?
Join our community of collaborative creators on the AI4H Discord
Get exclusive access to all things AI4H on our Patreon
If you’re an org, consider booking Kevin & Gavin for your next event!