David Alvarez-Melis
@dmelis.bsky.social
Professoring at Harvard || Researching at MSR || Previously: MIT CSAIL, NYU, IBM Research, ITAM
🚨 New preprint! TL;DR: Backtracking is not the "holy grail" for smarter LLMs.

It’s praised for helping models “fix mistakes” and improve reasoning—but is it really the best use of test-time compute? 🤔
April 11, 2025 at 4:29 PM
Reposted by David Alvarez-Melis
🎉Microsoft Research New England is hiring a predoctoral research assistant to work with @nancybaym.bsky.social, Tarleton Gillespie, and @marylgray.bsky.social on issues related to the dynamics of technology and society. 🎉

socialmediacollective.org/2025/01/22/s...
Seeking a Sociotechnical Systems Research Assistant (aka “Pre-Doc”)
Apply here: (NOTE: Application Portal opens February 3, 2025) Deadline: March 3, 2025. (Late or incomplete applications will not be considered.) NOTE: Unfortunately, applicants must be eligib…
socialmediacollective.org
February 13, 2025 at 4:25 PM
Reposted by David Alvarez-Melis
Transformer LMs get pretty far by acting like n-gram models, so why do they learn syntax? A new paper by @sunnytqin.bsky.social, me, and @dmelis.bsky.social illuminates grammar learning in a whirlwind tour of generalization, grokking, training dynamics, memorization, and random variation. #mlsky #nlp
Sometimes I am a Tree: Data Drives Unstable Hierarchical Generalization
Language models (LMs), like other neural networks, often favor shortcut heuristics based on surface-level patterns. Although LMs behave like n-gram models early in training, they must eventually learn...
arxiv.org
December 20, 2024 at 5:56 PM