Quanquan Gu
@quanquangu.bsky.social
Professor @UCLA, Research Scientist @ByteDance | Recent work: SPIN, SPPO, DPLM 1/2, GPM, MARS | Opinions are my own
Reposted by Quanquan Gu
To better interpret the plot, draw a horizontal line representing a specific target validation loss. Find the points where this line intersects the curves for AdamW and MARS, which will allow you to determine how much speedup, in terms of training tokens, MARS achieves compared to AdamW.
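The reading procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `tokens_to_reach` and `token_speedup` are hypothetical, the curves are made-up numbers rather than real AdamW or MARS results, and linear interpolation between logged points is an assumption about how to locate the intersection.

```python
def tokens_to_reach(tokens, losses, target):
    """Token count at which a loss curve first reaches `target`.

    Interpolates linearly between logged points (an assumption);
    returns None if the curve never reaches the target.
    """
    if losses[0] <= target:
        return float(tokens[0])
    for i in range(1, len(losses)):
        if losses[i] <= target:
            t0, t1 = tokens[i - 1], tokens[i]
            l0, l1 = losses[i - 1], losses[i]
            # Here l0 > target >= l1, so the denominator is positive.
            frac = (l0 - target) / (l0 - l1)
            return t0 + frac * (t1 - t0)
    return None


def token_speedup(tokens_a, losses_a, tokens_b, losses_b, target):
    """Ratio of tokens curve A needs to tokens curve B needs at `target`."""
    a = tokens_to_reach(tokens_a, losses_a, target)
    b = tokens_to_reach(tokens_b, losses_b, target)
    if a is None or b is None:
        return None
    return a / b


# Made-up illustrative curves (tokens in billions); not measured results.
tokens = [0, 10, 20, 30, 40]
baseline_loss = [4.0, 3.5, 3.2, 3.0, 2.9]    # stands in for the AdamW curve
candidate_loss = [4.0, 3.3, 3.0, 2.9, 2.85]  # stands in for the MARS curve

speedup = token_speedup(tokens, baseline_loss, tokens, candidate_loss, target=3.0)
print(f"token speedup at target loss 3.0: {speedup}")
```

A ratio above 1 means the second curve reaches the target validation loss with fewer training tokens than the first. The result depends on the target you pick, so it is worth checking a few horizontal lines rather than just one.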
December 5, 2024 at 2:54 AM
Just added you.
December 3, 2024 at 11:49 PM
Just added you! Welcome!
December 3, 2024 at 1:17 AM
Just added you.
December 2, 2024 at 9:53 PM
Just added you.
December 1, 2024 at 12:35 AM
Just added you!
November 30, 2024 at 4:38 AM
Just added you!
November 29, 2024 at 10:47 PM
Just added you.
November 29, 2024 at 10:32 PM
Just added you!
November 28, 2024 at 7:29 PM
Just added you!
November 28, 2024 at 7:16 PM
Just added you.
November 28, 2024 at 7:14 PM
Anyone using their real name and interested is welcome!
November 28, 2024 at 2:44 AM
Just added you. Welcome!
November 28, 2024 at 1:48 AM
MARS is a unified framework that can be integrated with various preconditioning techniques, so it can be applied to PSGD. I believe @hessianfree.bsky.social has implemented MARS-PSGD.
November 28, 2024 at 1:48 AM
Just added you!
November 28, 2024 at 1:44 AM
Just added you.
November 28, 2024 at 1:43 AM
Done!
November 28, 2024 at 1:42 AM
Just added you.
November 28, 2024 at 1:42 AM
Just added you!
November 28, 2024 at 1:42 AM
Just added you!
November 28, 2024 at 1:42 AM