Lightnews — Scholar-powered news

Yoav Artzi

@yoavartzi.com


LM/NLP/ML researcher ¯\_(ツ)_/¯

yoavartzi.com / associate professor @ Cornell CS + Cornell Tech campus @ NYC / nlp.cornell.edu / associate faculty director @ arXiv.org / researcher @ ASAPP / starting @colmweb.org / building RecNet.io


Yoav Artzi

@yoavartzi.com

This is maybe counter to the original intention of just indexing the chaos to make it accessible. I guess that ideal of search softened a long time ago

November 10, 2025 at 3:57 PM

Yoav Artzi

@yoavartzi.com

That's definitely part of it, because this digestion has a deeper history. Search engine indexing also seems just easier, so companies opt for it, even pre AI-overview-everything

November 10, 2025 at 3:57 PM

Yoav Artzi

@yoavartzi.com

Re peer-rev --> pre-print servers: arXiv is a simple uniform place to store. Indexing engines love it, so if you want something to be searchable, nothing is better. To make things worse, at times it seems like journals/proceedings almost play a game of hide-and-seek with PDFs

November 10, 2025 at 3:50 PM

Yoav Artzi

@yoavartzi.com

Re position papers: I don't think anyone can deny how effective some of these papers became for citation counts

November 10, 2025 at 3:50 PM

Yoav Artzi

@yoavartzi.com

Is this all just a big practical joke for ChatGPT? I have been told god doesn't play dice with the world, but I guess AGI does :)

November 6, 2025 at 8:52 PM

Yoav Artzi

@yoavartzi.com

It's a Thursday though ....

November 5, 2025 at 2:17 PM

Yoav Artzi

@yoavartzi.com

All available here:
lm-class.org

ChangeLog here:
lm-class.org/CHANGELOG.md

LM-class

LM-class is an education resource for contemporary language modeling, broadly construed.

lm-class.org

November 3, 2025 at 3:54 PM

Yoav Artzi

@yoavartzi.com

This kind of ad-hoc adaptation is hard in general for LLMs, but you can post-train for it to some degree
arxiv.org/abs/2508.06482

I suspect contemporary ASR models have the same backbone, so maybe applicable too

More broadly, there is a lot of interesting stuff to do in this space of adaptation

Post-training for Efficient Communication via Convention Formation

Humans communicate with increasing efficiency in multi-turn interactions, by adapting their language and forming ad-hoc conventions. In contrast, prior work shows that LLMs do not naturally show this ...

arxiv.org

November 3, 2025 at 3:50 PM

Yoav Artzi

@yoavartzi.com

Wild

October 28, 2025 at 1:51 PM

Yoav Artzi

@yoavartzi.com

There's the legit gaming, which is just optimizing for the metrics and breaking them. Then there's the really fake stuff, like citation rings. You would think citations translate to bitcoins, given the level of creativity and effort that people put into it

October 27, 2025 at 6:41 PM

Yoav Artzi

@yoavartzi.com

The top citer has >1k papers, with a PhD from 2007. That's one hell of a steady rate ¯\_(ツ)_/¯

October 27, 2025 at 6:39 PM

Yoav Artzi

@yoavartzi.com

It's pretty crazy how the entire citation game has been manipulated. It's enough to take a quick look at Semantic Scholar for Bengio, who GScholar just gave 1M citations. SScholar gave 0.5M, but it's not only the number, it's the top citers

October 27, 2025 at 6:36 PM

Yoav Artzi

@yoavartzi.com

It definitely doesn't seem to hold in process, which lacks any similar regulation or structure. The (sci-fi-ish?) argument is that one cannot disentangle deployment/impact from development (i.e., one cannot shut it down).

October 23, 2025 at 2:53 PM

Yoav Artzi

@yoavartzi.com

The analogy sounds great, but are you sure it really holds?

Public buy-in aside, development vs. deployment is clearly distinguished in vaccines, both in being built into the process and in delaying impact to deployment. Does the same hold for so-called ASI development?

October 23, 2025 at 2:53 PM