💪 The Hits Keep Coming! | Daily Drip

🖼️ A POWERFUL New AI-Art Tech got revealed...

Happy Wednesday My-Humans.

Going to keep this one quick because we’re navigating some very fun podcast-recording hurdles. Case in point:

This is a massive video-file error, and not how we look before post-processing. FYI.

Today’s Daily Drip is STABLE DIFFUSION 3.

Stable Diffusion is a state-of-the-art text-to-image generation model developed by Stability AI. Basically, it creates highly realistic and detailed images from textual descriptions.

It starts from an image of pure noise and refines it iteratively (going “step by step”) until the image matches the given prompt. That lets users generate a wide variety of images across different styles, genres, and subjects with impressive flexibility and quality.

Stability AI just released a white paper (super technical doc) discussing what SD3 will offer and how it all works. We’ll get to the techy stuff, but before that, BEHOLD THE WAFFLE-HIPPO!

And here are some other stunning samples from the doc:

Okay, so let’s quickly dive in. Stable Diffusion 3:

🌀 Uses Rectified Flows, which learn straight paths between data and noise distributions, allowing for faster sampling with fewer steps.

🕰️ Introduces new timestep samplers for training Rectified Flow models that focus more on perceptually relevant intermediate timesteps.

  • SD/SD2 sampled training timesteps uniformly, so every noise level got equal attention. The new samplers spend more of the training on the in-between noise levels, which are the ones that matter most for how the final image looks.

🎭 Employs a novel transformer architecture with separate weights for text and image tokens. Basically, the words in your prompt and the pieces of the image each get their own dedicated set of weights, and the two sides trade information through attention, instead of a messy “one size fits all” approach.

💪 SD3's largest model (8B params) outperforms current state-of-the-art models like DALL-E 3, SDXL, and PixArt-α in human evaluations of visual quality, prompt adherence, and text rendering.

  • In the AI world's Olympic Games, SD3 takes home the gold medal in every category!

The paper's caption for these: “High-resolution samples from our 8B rectified flow model, showcasing its capabilities in typography, precise prompt following and spatial reasoning, attention to fine details, and high image quality across a wide variety of styles.”
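For the code-curious, here is a minimal sketch of the two Rectified Flow ideas from the list above: the straight path between data and noise, and a timestep sampler that favors the middle of that path. It is a toy, not Stability AI's implementation. Every function name is made up for illustration, the "image" is a three-number list, and the "model" is an oracle that cheats by already knowing the answer. The sampler shown is a logit-normal one (a Gaussian squashed through a sigmoid), and its default parameters are assumptions chosen for illustration.

```python
import math
import random


def interpolate(x0, noise, t):
    """Point on the straight path: t=0 is the data, t=1 is pure noise."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]


def velocity(x0, noise):
    """Direction of the straight path. It is the same at every t,
    and it is what a trained model would have to predict."""
    return [b - a for a, b in zip(x0, noise)]


def euler_sample(noise, predict_velocity, steps):
    """Walk from t=1 (noise) back to t=0 (data) in equal Euler steps."""
    x = list(noise)
    dt = 1.0 / steps
    for _ in range(steps):
        v = predict_velocity(x)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def logit_normal_timestep(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by squashing a Gaussian sample through a sigmoid.
    mean/std defaults are illustrative assumptions."""
    z = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-z))


def middle_fraction(samples, lo=0.25, hi=0.75):
    """Share of timesteps that land in the middle of the path."""
    return sum(lo <= t <= hi for t in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.2, -1.0, 0.7]  # stand-in for a real image
    noise = [rng.gauss(0.0, 1.0) for _ in x0]

    # Oracle "model": returns the true velocity instead of a learned guess.
    oracle = lambda x: velocity(x0, noise)

    print("1 step  :", euler_sample(noise, oracle, steps=1))
    print("50 steps:", euler_sample(noise, oracle, steps=50))

    uniform = [rng.random() for _ in range(20000)]
    logit_normal = [logit_normal_timestep(rng) for _ in range(20000)]
    print("uniform middle share     :", middle_fraction(uniform))
    print("logit-normal middle share:", middle_fraction(logit_normal))
```

Because the path is a straight line, one big step and fifty small ones end up in the same place when the velocity is exact. A real model only approximates that velocity, so it still needs more than one step, but far fewer than a curvy path would. The last two lines show the sampler putting a larger share of its draws in the middle of the path than uniform sampling does.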

If you dig these daily drips, or there’s something you think we should feature/discuss, let us know, and please consider sharing this with your friends and frenemies:

That’s all for now. See you lovelies tomorrow, unless you unsubscribe. Which, we totally get. But like, don’t, perhaps?

.k+g