Why AI Voice Interactions Are The Future... And The Messy Present

OpenAI updated their Realtime (Advanced Voice) model as companies like Taco Bell struggle with deploying voice AI in the real world.

Today on AI For Humans: The Newsletter!
You’ll use your voice for everything… but when?
GPT models are rewriting biology
Plus, our favorite Nano Banana prompts!!

Welcome back to the AI For Humans newsletter!

Did you happen to hear about the viral video where a Taco Bell customer ‘ordered 18,000 cups of water’ at the AI-powered drive-thru? Or the one where a frustrated driver speeds off after being asked ‘what do you want to drink’ multiple times?

Both of these 10+ million view videos have resurfaced (btw, they’re nearly a year old & troll-ish) due to this story from the Wall Street Journal about Taco Bell’s difficulty with integrating AI into their drive-thru experience.

This quote from the article got me thinking, not only about the roll-out of new AI models but about what we’re working on at AndThen.

“We’re learning a lot, I’m going to be honest with you,” said Taco Bell Chief Digital and Technology Officer Dane Mathews.

Even Mathews said he has had mixed experiences with it. “I think like everybody, sometimes it lets me down, but sometimes it really surprises me,” he said.

- Wall Street Journal / Isabelle Bousquette

Meanwhile, OpenAI introduced an update to their Realtime audio model, which includes better function calling (how the model knows when to actually do something, like add an item to an order), more expressive voices and, ostensibly, better behavior for real-world use cases.

In general, it’s a much improved system… more reliable, cheaper and clearly being pitched directly to businesses for the exact sort of thing Taco Bell is doing.

But… it’s not perfect and, as we talked about last week, for large scale businesses that can be a pretty big problem.

The Messy Present: Being Surprised Vs Being Let Down

AI is inherently (at least for now) a messy technology.

Unlike normal computers, where you submit a query and, based on logical programming, get back a reliable answer, LLMs interpret the request and use natural language to try to respond in the way they best know how.

And, in general, it’s awesome! It has enabled entirely different ways of interacting with computers.

It’s more lifelike but also… more lifelike.

Do you know how to respond in any given situation? I certainly don’t. And I’m a real-life human being. LLMs are trying, in part, to understand how to answer a request and sometimes they fail.

This makes the edge cases MUCH more difficult to deal with, and the edge is where large companies never want to be. Especially in the era of teenagers looking for a way to break through on social video.
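
For the builders reading: one common way to keep an ‘18,000 cups of water’ moment from reaching the kitchen is to check whatever the model asks for before acting on it. Here is a minimal sketch in Python. It assumes the model hands back its requested order as a JSON string with `item` and `quantity` fields. The `validate_order_call` function, the `MENU` set and the `MAX_QUANTITY` limit are all made up for illustration and are not how Taco Bell or OpenAI actually do it.

```python
import json

# Hypothetical limit and menu, invented for this example.
MAX_QUANTITY = 20
MENU = {"water", "taco", "burrito"}


def validate_order_call(arguments_json: str) -> tuple[bool, str]:
    """Check a model-proposed order before acting on it.

    Returns (ok, message). When ok is False, the message is something the
    voice agent could say back to the customer instead of placing the order.
    """
    try:
        args = json.loads(arguments_json)
    except json.JSONDecodeError:
        return False, "Sorry, I didn't catch that. Could you repeat your order?"

    if not isinstance(args, dict):
        return False, "Sorry, I didn't catch that. Could you repeat your order?"

    item = args.get("item")
    quantity = args.get("quantity")

    # Reject anything that is not a known menu item.
    if not isinstance(item, str) or item not in MENU:
        return False, "Sorry, I couldn't find that on the menu."

    # Quantity must be a positive whole number (bool is excluded on purpose).
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        return False, "How many would you like?"

    # Implausibly large orders go to a person instead of the kitchen.
    if quantity > MAX_QUANTITY:
        return False, "That's a big order! Let me get a team member to help."

    return True, f"Added {quantity} x {item}."


print(validate_order_call('{"item": "water", "quantity": 18000}'))
print(validate_order_call('{"item": "taco", "quantity": 2}'))
```

The model still gets to be conversational, but plain old deterministic code has the final say on what actually happens.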

HOWEVER… the ‘surprise’ part of the quote above should not be underrated.

Longtime listeners/watchers of AI For Humans remember the genuine joy that Kevin & I got out of talking to our AI co-hosts. The best thing about this tech is that it brings conversation to life in a way that no other technology before it could.

The Future: Lifelike Voices That Understand Our Intent & Just Work

Building and deploying AI right now means you have to be at least ‘ok’ with the messy present, but the great news is that by experimenting now, you’re not going to be caught flat-footed when it gets better.

And AI Voice will get better. Way, WAY better.

Give Sesame’s six-month-old demo a try to see what I’m talking about.

Some of the largest companies in the world are betting on voice as the future interface not only for AI but potentially for all computer & information systems. If you believe that AR glasses will be a thing, you believe in a world with a lot more AI voice input.

When will this all happen? It remains to be seen.

Text input is still really good but my theory is we’re going to see at least a 10x increase in voice interactions over the next five years. Voice feels natural and human (and we’re used to it) so as these AI systems get more human-like, it makes sense we’ll want to talk to them more.
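
For a sense of scale, a 10x increase over five years works out to voice interactions growing by a bit more than half every single year. A quick back-of-the-envelope check in Python, where the 10x and five-year figures are just the theory above and not measured data, and `annual_growth_rate` is a helper made up for this example:

```python
def annual_growth_rate(multiple: float, years: int) -> float:
    """Compound annual growth rate needed to reach `multiple` in `years`."""
    return multiple ** (1 / years) - 1


rate = annual_growth_rate(10, 5)
print(f"{rate:.1%} per year")  # prints "58.5% per year"
```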

One of the most fun things about building something creative in the voice AI space is that the present messiness can actually be a feature not a bug. If the character stutters or if the user gets it to do something we didn’t necessarily plan for, that’s a surprise discovery moment that can really delight.

More on AndThen soon & see you on Friday for the podcast!

- Gavin (and Kevin)

THIS WEEK: Nano Banana is all that. Tutorials & our first impressions. 👇

3 Things To Know About AI Today

A Specialized GPT Model Just Rebuilt Human Cells

The AI-designed versions of the Yamanaka factors (proteins used to turn ordinary cells into stem cells), called RetroSOX and RetroKLF, worked way better than the originals, showing a 50x boost in stem-cell activity and even improving DNA repair. Importantly, the cells stayed stable and the results held up across different donors and cell types.

The bigger picture: this is more evidence that AI can actually speed up breakthroughs in biology, not just write papers about it.

AIs Play The Social Reasoning Game ‘Werewolf’

AI benchmarks can often be boring, but this week a new one popped up that tests how well different models can play the game ‘Werewolf’.

Games like this get to some of the messy interaction stuff I was mentioning above but also can give an early look at potential ways that super smart AI could manipulate us given enough brainpower.

A Slightly Technical 30 Minute Talk Everyone Should Watch

We’re big fans of AI educator and expert Bilawal Sidhu. You might have seen his work with TED but he’s also very good at taking technical topics and making them accessible.

The talk below does a great job of walking through how new AI tools are enabling creators right now and gives a sense of where content as a whole is going in the future.

We 💛 This: The Best Nano Banana Prompts & Uses

Google Gemini’s Nano Banana AI image model has taken over the internet and the fact that it’s not only easy to use but FREE makes it even better.

One of the most fun & simple prompts is to use it for style transfer (for clothing!) like I did for last week’s AI For Humans intro…

These were originally just screen grabs from our recording.

My prompt for the above was just:

use this photo and make the man into a <insert thing>, wearing a <insert clothing>, but keep the position of the man and the background the same

Some other fun things to try:

And SO much more. It’s a good time to be a creative. Share with us what you make!

Are you a creative or brand looking to go deeper with AI?
Join our community of collaborative creators on the AI4H Discord
Get exclusive access to all things AI4H on our Patreon
If you’re an org, consider booking Kevin & Gavin for your next event!