A fascinating find during distillation: the profile picture used as initialization persisted in the trained weights! Details below. 🧵
Distilled Flux12B → Flux8.8B by replacing a modulation layer with a feedforward block.
Initialized with his PFP, and it persisted through training!
Here are some comparison images:
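For a concrete picture of the swap itself, here is a minimal PyTorch sketch. It is not the author's actual code: the `modulation` attribute name, the block structure, and the dimensions are all assumptions. The idea is just that a smaller feedforward block takes the place of the original modulation layer.

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    """Plain MLP block standing in for a removed modulation layer."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),
            nn.Linear(hidden_dim, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def replace_modulation(block: nn.Module, dim: int, hidden_dim: int) -> None:
    # "modulation" is an assumed attribute name for the per-block layer.
    # Choosing hidden_dim smaller than the original layer's width is what
    # sheds parameters (12B -> ~8.8B overall in the model described above).
    block.modulation = FeedForward(dim, hidden_dim)
```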
During training, the FF weights barely changed, even over long training runs.
This highlights how initialization artifacts can sometimes remain visible in the final model.
Left: init; right: trained.
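One simple way to quantify "barely changed", assuming you have the FF weight tensor saved at init and after training (the function and metric choices here are illustrative, not the author's):

```python
import torch
import torch.nn.functional as F

def weight_drift(w_init: torch.Tensor, w_trained: torch.Tensor) -> dict:
    """Compare a layer's weight tensor before and after training."""
    delta = (w_trained - w_init).abs()
    cos = F.cosine_similarity(w_init.flatten(), w_trained.flatten(), dim=0)
    return {
        "mean_abs_delta": delta.mean().item(),
        "max_abs_delta": delta.max().item(),
        "cosine_similarity": cos.item(),  # near 1.0 -> the init still dominates
    }
```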
And the kicker? He initialized the feedforward layer using his profile picture tensor.
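A sketch of how an image can be coerced into a weight init, assuming a grayscale resize onto a Linear layer's weight shape; the function name, normalization, and 0.02 scale are assumptions, not the author's method:

```python
import numpy as np
import torch
from PIL import Image

def pfp_to_weight(path: str, out_features: int, in_features: int) -> torch.Tensor:
    """Resize an image to a Linear weight's shape and use it as the init."""
    img = Image.open(path).convert("L").resize((in_features, out_features))
    w = torch.from_numpy(np.asarray(img, dtype=np.float32) / 255.0)
    w = (w - w.mean()) / (w.std() + 1e-8)  # zero mean, unit variance
    return w * 0.02                        # rescale toward a typical init magnitude

# Hypothetical usage:
# ff = torch.nn.Linear(in_features, out_features)
# with torch.no_grad():
#     ff.weight.copy_(pfp_to_weight("pfp.png", *ff.weight.shape))
```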