Why AI Voice Interactions Are The Future... And The Messy Present
OpenAI updated their Realtime (Advanced Voice) model as companies like Taco Bell struggle with deploying voice AI in the real world.

Today on AI For Humans: The Newsletter!
You’ll use your voice for everything… but when?
GPT models are rewriting biology
Plus, our favorite Nano Banana prompts!!
Welcome back to the AI For Humans newsletter!
Did you happen to hear about the viral video where a Taco Bell customer ‘ordered 18,000 cups of water’ at the AI-powered drive-thru? Or the one where a frustrated driver speeds off after being asked ‘what do you want to drink’ multiple times?
Both of these 10+ million view videos have resurfaced (btw, they’re nearly a year old & troll-ish) due to this story from the Wall Street Journal about Taco Bell’s difficulty with integrating AI into their drive-thru experience.

This quote from the article got me thinking, not only about the roll-out of new AI models but about what we’re working on at AndThen.
“We’re learning a lot, I’m going to be honest with you,” said Taco Bell Chief Digital and Technology Officer Dane Mathews.
Even Mathews said he has had mixed experiences with it. “I think like everybody, sometimes it lets me down, but sometimes it really surprises me,” he said.
Meanwhile, OpenAI introduced an update to their Realtime audio model which includes better function calling (knowing when to take an action), more expressive voices and, ostensibly, better behavior for real-world use cases.
The Realtime API is officially out of beta and ready for your production voice agents!
We’re also introducing gpt-realtime—our most advanced speech-to-speech model yet—plus new voices and API capabilities:
🔌 Remote MCPs
🖼️ Image input
📞 SIP phone calling
♻️ Reusable prompts
— OpenAI Developers (@OpenAIDevs)
5:53 PM • Aug 28, 2025
In general, it’s a much improved system… more reliable, cheaper and clearly being pitched directly to businesses for the exact sort of thing Taco Bell is doing.
But… it’s not perfect and, as we talked about last week, for large scale businesses that can be a pretty big problem.
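For the curious, here's roughly what "better function calling" looks like in practice. This is a minimal sketch, not production code: the drive-thru tool name, its schema, and the handler logic are made-up placeholders, and the session fields follow OpenAI's published Realtime event shapes as I understand them.

```python
import json

# A hypothetical drive-thru tool the model can call when it hears a
# drink order. The name and schema are invented for illustration.
ADD_DRINK_TOOL = {
    "type": "function",
    "name": "add_drink_to_order",
    "description": "Add a drink to the customer's current order.",
    "parameters": {
        "type": "object",
        "properties": {
            "item": {"type": "string", "description": "e.g. 'large Baja Blast'"},
            # Cap quantity so nobody orders 18,000 cups of water.
            "quantity": {"type": "integer", "minimum": 1, "maximum": 10},
        },
        "required": ["item", "quantity"],
    },
}

# The session.update event you'd send over the Realtime WebSocket right
# after connecting, telling the model who it is and which tools it may call.
session_update = {
    "type": "session.update",
    "session": {
        "instructions": "You are a friendly drive-thru order taker.",
        "voice": "marin",  # one of the newer expressive voices
        "tools": [ADD_DRINK_TOOL],
        "tool_choice": "auto",
    },
}

payload = json.dumps(session_update)
```

In a real agent you'd open a WebSocket to the Realtime endpoint, send this as your first message, then listen for the function-call events the model emits and execute the tool yourself before handing the result back.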
The Messy Present: Being Surprised Vs Being Let Down
AI is inherently (at least for now) a messy technology.
Unlike normal computers, where you submit a query and, based on logical programming, get back a reliable answer, LLMs interpret the request and use natural language to respond as best they can.
And, in general, it’s awesome! It has enabled entirely different ways of interacting with computers.
It’s more lifelike but also… more lifelike.
Do you know how to respond in any given situation? I certainly don’t. And I’m a real-life human being. LLMs are trying, in part, to understand how to answer a request and sometimes they fail.
This makes the edge cases MUCH more difficult to deal with, and this edge is where large companies never want to be. Especially in the era of teenagers looking for a way to break through in social video.
HOWEVER… the ‘surprise’ part of the quote above should not be underrated.
Longtime listeners/watchers of AI For Humans remember the genuine joy that Kevin & I got out of talking to our AI co-hosts. The best thing about this tech is that it brings conversation to life in a way that no other technology before it could.
The Future: Lifelike Voices That Understand Our Intent & Just Work
Building and deploying AI right now means you have to be at least ‘ok’ with the messy present, but the great news is that by experimenting now, you won’t be caught flat-footed when it gets better.
And AI Voice will get better. Way, WAY better.
Give Sesame’s six month old demo a try to see what I’m talking about.
Some of the largest companies in the world are betting on voice interaction as the potential future interface, not only for AI but as the main way we’ll interact with all computer & information systems. If you believe AR glasses will be a thing, you believe in a world with a lot more AI voice input.
When will this all happen? It remains to be seen.
Text input is still really good but my theory is we’re going to see at least a 10x increase in voice interactions over the next five years. Voice feels natural and human (and we’re used to it) so as these AI systems get more human-like, it makes sense we’ll want to talk to them more.
One of the most fun things about building something creative in the voice AI space is that the present messiness can actually be a feature, not a bug. If the character stutters, or if the user gets it to do something we didn’t necessarily plan for, that’s a surprise discovery moment that can really delight.
More on AndThen soon & see you on Friday for the podcast!
- Gavin (and Kevin)
THIS WEEK: Nano Banana is all that. Tutorials & our first impressions. 👇
3 Things To Know About AI Today
A Specialized GPT Model Just Rebuilt Human Cells
In a weirdly underreported story, OpenAI and Retro Biosciences just used a slimmed-down GPT-4 model to re-engineer two key proteins involved in turning adult cells back into stem cells. 🤯
The AI-designed versions, called RetroSOX and RetroKLF, worked way better than the originals, showing a 50x boost in stem-cell activity and even improved DNA repair. Importantly, the cells stayed stable and the results held up across different donors and cell types.
The bigger picture: this is concrete evidence that AI can actually speed up breakthroughs in biology, not just write papers about them.
AIs Play The Social Reasoning Game ‘Werewolf’
AI benchmarks can often be boring, but this week a new one popped up that tested how well different models can play the game ‘Werewolf’.
🐺 Introducing the Werewolf Benchmark, an AI test for social reasoning under pressure.
Can models lead, bluff, and resist manipulation in live, adversarial play?
👉 We made 7 of the strongest LLMs, both open-source and closed-source, play 210 full games of Werewolf.
— Raphaël Dabadie🇫🇷 (@RaphaelDabadie)
5:00 PM • Aug 30, 2025
Games like this get to some of the messy interaction stuff I was mentioning above but also can give an early look at potential ways that super smart AI could manipulate us given enough brainpower.
A Slightly Technical 30 Minute Talk Everyone Should Watch
We’re big fans of AI educator and expert Bilawal Sidhu. You might have seen his work with TED, but he’s also very good at taking technical topics and making them accessible.
The talk below does a great job of walking through how new AI tools are enabling creators right now, and gives a sense of where content as a whole is heading.
We 💛 This: The Best Nano Banana Prompts & Uses
Google Gemini’s Nano Banana AI image model has taken over the internet and the fact that it’s not only easy to use but FREE makes it even better.
One of the most fun & simple prompts is to use it as a style transfer (for clothing!) like I did for last week’s AI For Humans intro…

These were originally just screen grabs from our recording.
My prompt for the above was just:
use this photo and make the man into a <insert thing>, wearing a <insert clothing>, but keep the position of the man and the background the same
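If you’re generating a bunch of these, it helps to turn the prompt into a template. Here’s a tiny, hypothetical Python helper (the function name is mine; the prompt wording is exactly the one above):

```python
def nano_banana_prompt(thing: str, clothing: str) -> str:
    """Fill the style-transfer prompt with a character and an outfit."""
    return (
        f"use this photo and make the man into a {thing}, "
        f"wearing a {clothing}, but keep the position of the man "
        "and the background the same"
    )

# Example: swap in whatever character/outfit combo you want to try.
prompt = nano_banana_prompt("medieval knight", "full suit of armor")
```

Paste the result into Gemini alongside your photo and iterate from there.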
Some other fun things to try:
Kevin used it to see alternative looks at his favorite classic video games.
Our friends at GLIF have a video workflow to get Nano Banana to give yourself new hairstyles
Try on full outfits by collecting clothing images & uploading a picture of yourself, or flip that around and segment out all the outfits someone else is wearing
Play with ‘pose control’ by uploading an image plus a stick figure posing how you’d like the character to be posed
Point at things with an arrow on Google Maps and see that exact perspective
And SO much more. It’s a good time to be a creative. Share with us what you make!
Are you a creative or brand looking to go deeper with AI?
Join our community of collaborative creators on the AI4H Discord
Get exclusive access to all things AI4H on our Patreon
If you’re an org, consider booking Kevin & Gavin for your next event!