davidgrangier.bsky.social
@davidgrangier.bsky.social
#ICLR #TrainBetterLM I am at ICLR. Come to our posters on improved language model training!

Recycle gradients for faster neural net training with AdEMAMix: iclr.cc/virtual/2025... (Fri Apr 25, 10 am).

1/3
April 21, 2025 at 11:55 PM