Ezi
eziozoani.bsky.social
Reposted by Ezi
Adding a nice way to visualize the PPO objective to the rlhf book.
The core of policy gradient is that the loss L is proportional to R*A (R = policy ratio, A = advantage).
PPO makes good actions more likely, up to a point.
PPO makes bad actions less likely, up to a point.
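The "up to a point" behavior can be sketched for a single action with the standard clipped surrogate, assuming the usual form min(R*A, clip(R, 1-eps, 1+eps)*A). The function name `ppo_clipped_objective` and the default `eps=0.2` are illustrative assumptions, not taken from the post or the book.

```python
def ppo_clipped_objective(ratio: float, advantage: float, eps: float = 0.2) -> float:
    """Per-action PPO clipped surrogate (to be maximized).

    ratio     -- policy ratio R = pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate A for the action
    eps       -- clip range; 0.2 is an assumed, commonly used value
    """
    # Restrict the ratio to the interval [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    # Taking the minimum keeps the pessimistic of the two estimates,
    # so the objective stops rewarding moves beyond the clip range.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Sweep the ratio for one good action (A > 0) and one bad action (A < 0).
    for r in (0.5, 0.8, 1.0, 1.2, 1.5):
        good = ppo_clipped_objective(r, 1.0)
        bad = ppo_clipped_objective(r, -1.0)
        print(f"R={r:.1f}  A=+1: {good:+.2f}  A=-1: {bad:+.2f}")
```

For a positive advantage the objective stops growing once R exceeds 1 + eps, and for a negative advantage it stops changing once R drops below 1 - eps, which is where the incentive to push the policy further disappears.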
July 19, 2025 at 11:10 PM
Reposted by Ezi
This isn't fake news. One of the craziest AI research papers I've been on in a while. Weird ablations on RLVR show that the Qwen 2.5 models can learn with literally random rewards, likely due to some funkiness in mid-training and the GRPO setup.
May 27, 2025 at 4:49 PM
Reposted by Ezi
These are charmingly called Dutchman's Breeches because they look like little pants

#NativePlants 🌿 #BloomScrolling #SignsOfSpring
April 18, 2025 at 4:11 AM
Reposted by Ezi
You can watch it NOW. Me and my friends worked really hard on this and we think it's very funny and hope you will too.
March 30, 2025 at 8:47 PM
Reposted by Ezi
When they tried to use AI to replace fashion models, they started with Black models first.

When they tried to use AI to replace musicians, they started with rappers. Not digital Taylor Swift, or virtual Miley Cyrus.

Even the NFT clowns tried to make an NFT rapper, whatever that means.

Not slick.
Buzzfeed Tells Investors It Will Use AI To Develop Black, Asian, Latine Content In "Authentic Voice"
Buzzfeed has told investors it will use AI to develop Black, Asian, Latino identity-based content to help corporate brands tap an "authentic voice" to sell products.
peopleofcolorintech.com
May 18, 2023 at 7:21 PM