Yoav Artzi
@yoavartzi.com
LM/NLP/ML researcher ¯\_(ツ)_/¯

yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io
This maybe runs counter to the original intention of just indexing the chaos to make it accessible. I guess that ideal of search softened a long time ago
November 10, 2025 at 3:57 PM
That's definitely part of it, because this digestion has a deeper history. Search engine indexing also just seems easier, so companies opt for it, even pre AI-overview-everything
November 10, 2025 at 3:57 PM
Re peer-rev --> pre-print servers: arXiv is a simple, uniform place to store papers. Indexing engines love it, so if you want something to be searchable, nothing is better. To make things worse, at times it seems like journals/proceedings almost play a game of hide-and-seek with their PDFs
November 10, 2025 at 3:50 PM
Re position papers: I don't think anyone can deny how effective some of these papers have become for citation counts
November 10, 2025 at 3:50 PM
Is this all just a big practical joke for ChatGPT? I have been told god doesn't play dice with the world, but I guess AGI does :)
November 6, 2025 at 8:52 PM
It's a Thursday though ....
November 5, 2025 at 2:17 PM
All available here:
lm-class.org

ChangeLog here:
lm-class.org/CHANGELOG.md
LM-class
LM-class is an education resource for contemporary language modeling, broadly construed.
November 3, 2025 at 3:54 PM
This kind of ad-hoc adaptation is hard for LLMs in general, but you can post-train for it to some degree
arxiv.org/abs/2508.06482

I suspect contemporary ASR models have the same backbone, so it's maybe applicable there too

More broadly, there is a lot of interesting stuff to do in this space of adaptation
Post-training for Efficient Communication via Convention Formation
Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this ...
November 3, 2025 at 3:50 PM
Wild
October 28, 2025 at 1:51 PM
There's the legit gaming, which is just optimizing for the metrics and breaking them. Then there's the really fake stuff, like citation rings. You would think citations translate to bitcoins, given the level of creativity and effort that people put into it
October 27, 2025 at 6:41 PM
The top citer has >1k papers, with a PhD from 2007. That's one hell of a steady rate ¯\_(ツ)_/¯
October 27, 2025 at 6:39 PM
It's pretty crazy how the entire citation game has been manipulated. It's enough to take a quick look at Semantic Scholar for Bengio, whom GScholar just gave 1M citations. SScholar gives 0.5M, but it's not only the number, it's the top citers
October 27, 2025 at 6:36 PM
It definitely doesn't seem to hold for the process, which lacks any similar regulation or structure. The (sci-fi-ish?) argument is that one cannot disentangle deployment/impact from development (i.e., one cannot shut it down).
October 23, 2025 at 2:53 PM
The analogy sounds great, but are you sure it really holds?

Public buy-in aside: development vs. deployment is clearly distinguished for vaccines, both in being built into the process and in delaying impact until deployment. Does the same hold for the so-pronounced ASI development?
October 23, 2025 at 2:53 PM
Indeed a bizarre mix, but say more about why the (very very short) letter is bonkers.... pretty please
October 23, 2025 at 2:13 PM
Can you really post fast enough? 🏓
October 19, 2025 at 11:31 PM