Today on AI For Humans The Newsletter!
How does Grok 3 stack up?
ChatGPT just got a lot more spicy 🌶️
And Perplexity’s giving away Deep Research for free
Plus, our can’t-miss AI feature of the week!

Welcome back to the AI For Humans Newsletter!

Last night, Elon Musk (along with three xAI engineers) demoed xAI’s newest model, Grok 3, and the biggest takeaway? It seems like Elon’s giant cluster of NVIDIA chips has helped them close the gap very quickly.

If you don’t have the time to watch the full demo above, the basics are:

According to xAI, Grok 3 benchmarks well against current non-reasoning models, with significant gains over both GPT-4o & Claude Sonnet
Reasoning (like OpenAI’s o1 and o3 series) will be baked into the public release, and those benchmarks are even better (but not necessarily against OpenAI’s full o3 model)
It will cost you extra. In the livestream they mention SUPER GROK (which includes deep thinking and ‘big brain’ mode), which sounds like it’s $40/month or $300+/year.

Former OpenAI & Tesla engineer Andrej Karpathy got early access and wrote up a VERY good post on the pros and cons of this model including some tests that it aced and some it flopped pretty hard on.

Karpathy’s Grok3 Pelican Tests Vs Other Models

Even more impressive, if you really want to nerd out: check out how an early version of Grok 3 (codenamed ‘chocolate’) surpassed all other AI models in Chatbot Arena, a website designed to test models against one another.

It’s not perfect, as this early version does seem to have some issues with coding, but if you have a paid X account, the (non-reasoning) beta is there for you to try right now.

So…what does all this mean?

First and foremost, it means that xAI is playing on the same level as the other major model makers. OpenAI may still have a lead, but now they have another significant competitor in the space (and specifically one who loves spending money to win).

The fact that Elon started so far behind (xAI hasn’t been doing this for nearly as long as OpenAI, Google, or Anthropic) and they were still able to get something this powerful this fast is a big deal. If DeepSeek proved that small teams working within constraints can do something impressive, this proves that throwing a ton of money and compute at the problem can work too.

And, maybe most importantly, it means there is no slowing down when it comes to these frontier models. We’re not entirely convinced that there are enough customers to pay for all of these but let’s be honest… not our problem.

We’ll have a lot more on Grok 3 in this week’s show including some hands-on demos to see how it does in our AI For Humans benchmarks (if you know, you know).

See you Thursday!

-Gavin (and Kevin)

3 Things To Know

A Big Vibe Change for ChatGPT
A new update to the 4o model makes ChatGPT interactions more human and uninhibited than ever before. Not only does it unlock a new category of Hot Dog City (HDC) fan fiction, it does a much better job of mirroring the user’s tone and general vibe. It feels like the most human model we’ve seen yet, but we went from zero-to-witch-stuff as quickly as you’d imagine. Altman recently shared that GPT-4.5 and GPT-5 are on the way (see our bonus video on the subject), so things are just beginning to heat up.

The rocks can talk! … ok that’ll be quite enough rock

Perplexity Launches Their Own Deep Research
Hot on the heels of OpenAI’s Deep Research, Perplexity launched their own. It searches for 3ish minutes instead of OpenAI’s 10ish, and comes with 5 free queries per day, an absolute bargain compared to OpenAI’s $200/mo alternative. It scores an impressive 21.1% on the Humanity’s Last Exam benchmark, just a few percentage points behind OpenAI.

Google’s Excellent Veo 2 Video Model is Now Available to Shorts Creators
Here’s the rundown from Google DeepMind. Via the in-app editor you can generate clips and use them on their own or as a green screen background. While the model’s impressive, probably the most realistic video model available today, the in-app editing experience is not. At the moment you don’t have much control over motion in your generations, and fumbling through an edit just reminds you how great TikTok’s editor is.

We 💛 This - Topaz Project Starlight Beta

A big release from AI video upscaling tool Topaz promises to upscale bad footage to a level never before possible. The tests we’re seeing online are really impressive, and it could aid in everything from juicing old footage to correcting issues that come up in a shoot to remastering low-res meme GIFs.

Right now it’s slow (20 mins to process 10 seconds), it’s expensive (3 free videos a week), and it’s absolute magic. The sort of thing that ought to immediately earn a spot in the Dock of folks who regularly work with dodgy footage.

Are you a creative or brand looking to go deeper with AI?
Join our community of collaborative creators on the AI4H Discord
Get exclusive access to all things AI4H on our Patreon
If you’re an org, consider booking Kevin & Gavin for your next event!
