Matteo Pagliardini
@matpagliardini.bsky.social
PhD student in ML at EPFL 🇨🇭 working with Martin Jaggi & François Fleuret. Previously Apple MLR (intern). https://mpagli.github.io/
Congrats! How important is scale for it to work? In your previous maze work it was clear a recurrent algo could solve the task. The recurrent state could be used as a scratchpad, each iteration decreasing the loss further. Language feels different, with many local minima along the recurrent path.
February 11, 2025 at 7:09 AM
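A minimal, purely illustrative sketch of the recurrent-scratchpad idea from the post above: a hidden state is refined over a fixed number of iterations and a loss is read out at every step. The GRU cell and all names here are assumptions for illustration, not the architecture from the maze work.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentRefiner(nn.Module):
    """Toy recurrent block: a hidden 'scratchpad' state is refined over
    several iterations, with a loss readable at every step."""

    def __init__(self, dim: int, n_classes: int):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)      # stand-in for the recurrent algo
        self.readout = nn.Linear(dim, n_classes)

    def forward(self, x: torch.Tensor, targets: torch.Tensor, n_iters: int = 8):
        # x: (batch, dim), targets: (batch,) class indices
        state = torch.zeros(x.size(0), self.cell.hidden_size, device=x.device)
        losses = []
        for _ in range(n_iters):
            state = self.cell(x, state)       # refine the scratchpad
            logits = self.readout(state)
            losses.append(F.cross_entropy(logits, targets))
        # On maze-like tasks one would hope this list is roughly decreasing;
        # for language, intermediate iterations may not improve monotonically.
        return losses
```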
Interesting loss curves. I’m not familiar enough with the task to know whether the spikes are expected, but would be curious to see the grad norm.
February 9, 2025 at 8:06 PM
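A small PyTorch helper for logging the global gradient norm alongside the loss, which is what the post above is asking to see. Function and logger names are hypothetical.

```python
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    """L2 norm over all parameter gradients; useful for diagnosing loss spikes."""
    norms = [p.grad.detach().norm(2) for p in model.parameters() if p.grad is not None]
    if not norms:
        return 0.0
    return torch.norm(torch.stack(norms), 2).item()

# Inside the training loop, after loss.backward() and before optimizer.step():
#   grad_norm = global_grad_norm(model)
#   logger.log({"grad_norm": grad_norm, "loss": loss.item()})
```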
Which task?
February 9, 2025 at 6:41 PM
Let’s also call on the silent crowd—me included—to start sharing more. Let’s be the change we want to see. You disagree with the political agenda of X? Protest by sharing your latest work/thoughts on Bsky.
February 8, 2025 at 10:53 AM
In my quick test on a small (120M) model trained on 14B tokens, the difference at the end of training is not that significant. Maybe the gap widens when training on less data, closer to Chinchilla optimal, or for larger models… I’m team ReLU…
December 3, 2024 at 8:17 AM
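The post does not say which activation ReLU was compared against; as an illustration only, this is the kind of one-line swap such a test involves in a transformer feed-forward block, with GELU/SiLU as assumed alternatives.

```python
import torch.nn as nn

def mlp_block(d_model: int, activation: str = "relu") -> nn.Sequential:
    """Transformer feed-forward block with a swappable activation."""
    act = {"relu": nn.ReLU(), "gelu": nn.GELU(), "silu": nn.SiLU()}[activation]
    return nn.Sequential(
        nn.Linear(d_model, 4 * d_model),
        act,
        nn.Linear(4 * d_model, d_model),
    )

# relu_ffn = mlp_block(768, "relu")   # d_model around 768 at ~120M scale
# gelu_ffn = mlp_block(768, "gelu")
```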
Let o1 write a review and ask the non-expert human reviewer to verify its claims/refine the review.
November 26, 2024 at 6:07 PM
A wise man once told me a paper should not have more than one table. Of course there can be exceptions, but minimizing the number of tables is something I always have in mind when writing. Isolate one or two key messages from the table and convey them with graphs.
November 24, 2024 at 4:17 PM
👋
November 24, 2024 at 11:04 AM