Deepak Ramachandran
thesilverbail.bsky.social
Many more cool examples that people are finding in this Reddit thread (including visual story generation, funky editing and style changes, recontextualization, and more!):

www.reddit.com/r/singularit...
From the singularity community on Reddit: Now Gemini can create visual stories with native image generation
March 12, 2025 at 5:38 PM
An example of reasoning in 'pixel space':
March 12, 2025 at 5:26 PM
Notably, we pass the 'room without an elephant' test (medium.com/@avanib28264...)
March 12, 2025 at 5:13 PM
Image quality is not quite as high as our SOTA Imagen 3 model (see previous post :) ), but the ability to reason in a combination of text and pixel space unlocks some amazing new capabilities, like interleaved generation of text and images and just jamming on crazy creative ideas with Gemini.
March 12, 2025 at 5:13 PM
This is such amazing work by @abeirami.bsky.social and collaborators. A deep investigation of a simple and practically important idea. Highly relevant to our own work and anywhere else RL is used for Gen AI.
February 3, 2025 at 10:45 PM
Lol... you'll always be my #1, Nathan. But your post did remind me that some collaborators of mine at Google did some research on content ecosystems: https://arxiv.org/abs/2309.06375
January 25, 2025 at 8:53 PM
For example, I've been really enjoying following this guy's math channel: youtube.com/@oceansofmat... Not as slick as 3b1b, but still quite informative.
OceansofMath
I am a senior undergraduate student in Mathematics and Physics at the University of Utah. I LOVE MATH. I have tutored Math for the past 4 years... and decided to extend my teaching platform to Youtube...
January 25, 2025 at 5:51 PM
We even show you can do this without a specialized heatmap model, as long as you have a good classifier for the badness you want to eliminate by fine-tuning. Simply use a pixel attribution technique like Grad-CAM to generate the heatmap!
January 19, 2025 at 3:48 PM
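To make the Grad-CAM suggestion concrete, here is a minimal pure-Python sketch of how such a heatmap is computed: average the gradient of the classifier score over each feature map to get channel weights, take the weighted sum of the feature maps, apply a ReLU, and normalise. The toy classifier (global average pooling followed by a linear layer), the helper `toy_gradients`, and all the numbers are assumptions made for illustration, not anything from the paper; in practice the gradients would come from an autodiff framework.

```python
def grad_cam(activations, grads):
    """Grad-CAM heatmap from K feature maps and their gradients.

    activations, grads: lists of K maps, each an H x W nested list.
    grads[k][i][j] is d(score) / d(activations[k][i][j]).
    """
    K = len(activations)
    H = len(activations[0])
    W = len(activations[0][0])
    # Channel weights: spatial average of the gradient of each feature map.
    alphas = [sum(sum(row) for row in grads[k]) / (H * W) for k in range(K)]
    # Weighted sum of the feature maps, followed by a ReLU.
    cam = [[max(0.0, sum(alphas[k] * activations[k][i][j] for k in range(K)))
            for j in range(W)] for i in range(H)]
    # Normalise to [0, 1] so the map can be used as a soft mask.
    peak = max(max(row) for row in cam)
    if peak > 0:
        cam = [[v / peak for v in row] for row in cam]
    return cam


def toy_gradients(weights, H, W):
    # Hypothetical classifier: score = sum_k weights[k] * mean(A[k]).
    # Its gradient w.r.t. every pixel of map k is weights[k] / (H * W).
    return [[[w / (H * W)] * W for _ in range(H)] for w in weights]


activations = [
    [[0.0, 0.0], [0.0, 4.0]],  # channel 0 fires in the bottom-right corner
    [[2.0, 0.0], [0.0, 0.0]],  # channel 1 fires in the top-left corner
]
weights = [1.0, -1.0]  # channel 0 raises the "bad" score, channel 1 lowers it
heatmap = grad_cam(activations, toy_gradients(weights, 2, 2))
```

In this toy case only the bottom-right pixel ends up highlighted, because it is the only location where a positively weighted channel is active.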
Surprisingly effective. The problematic parts are changed, but everything else remains the same in the fine-tuned model. This is different from an editing model, where two rounds of inference (generate, then edit) are needed to fix the problematic parts.
January 19, 2025 at 3:48 PM
Then you fine-tune using a combination of DRAFT (arxiv.org/html/2309.17...) and our custom region-aware fine-tuning objective.
January 19, 2025 at 3:48 PM
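The thread does not spell out the region-aware objective, so the following is only a guess at its general shape, not the authors' loss: penalise badness inside the heatmap region and penalise any change from the base image outside it. The function name `region_aware_loss`, the per-pixel `badness` input, and `preserve_weight` are all hypothetical.

```python
def region_aware_loss(base, tuned, heatmap, badness, preserve_weight=1.0):
    """Illustrative heatmap-weighted loss (an assumption, not the paper's objective).

    base, tuned: H x W images from the base and fine-tuned models.
    heatmap:     H x W values in [0, 1], high where the base image is problematic.
    badness:     H x W per-pixel badness of the tuned image (hypothetical scorer).
    """
    fix_term = 0.0   # badness remaining inside the flagged region
    keep_term = 0.0  # drift from the base image outside the flagged region
    n = 0
    for i, row in enumerate(heatmap):
        for j, m in enumerate(row):
            fix_term += m * badness[i][j]
            keep_term += (1.0 - m) * (tuned[i][j] - base[i][j]) ** 2
            n += 1
    return (fix_term + preserve_weight * keep_term) / n


base = [[0.2, 0.9]]
heatmap = [[0.0, 1.0]]  # only the second pixel is flagged as problematic

# Flagged pixel fixed, everything else untouched.
loss_fixed = region_aware_loss(base, [[0.2, 0.1]], heatmap, [[0.0, 0.0]])
# Flagged pixel fixed, but an unflagged pixel drifted.
loss_drift = region_aware_loss(base, [[0.7, 0.1]], heatmap, [[0.0, 0.0]])
# Nothing changed, so the flagged pixel is still bad.
loss_unfixed = region_aware_loss(base, base, heatmap, [[0.0, 1.0]])
```

Under these assumptions the loss is lowest when the flagged pixel is repaired and the rest of the image matches the base model's output, which is the behaviour the thread describes.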
you generate a heatmap highlighting the problematic region (e.g. using our previous work on Rich Human Feedback for T2I): arxiv.org/pdf/2312.10240
January 19, 2025 at 3:48 PM
The idea is simple. If the image from the base model has a region that's (say) NSFW:
January 19, 2025 at 3:48 PM