Lightnews — Scholar-powered news

Sireesh Gururaja

@siree.sh

October 21, 2025 at 4:22 PM

Sireesh Gururaja

@siree.sh

Coming soon (6pm!) to the #ACL poster session: how do experts work with collections of documents, and do LLMs do those things?

tl;dr: only sometimes! While we have good tools for things like information extraction, the way that experts read documents goes deeper - come to our poster to learn more!

Screenshot of paper title "Beyond Text: Characterizing Domain Expert Needs in Document Research"

July 28, 2025 at 3:26 PM

Sireesh Gururaja

@siree.sh

Yeah, I think I read this (and got burned) the same way :/

D&B also had additional spots that implied our original reading, like the dates page:

Screenshot of the full paper submission date for the datasets and benchmarks track. The date is described as "Datasets and Benchmarks - Full Paper Submission and Co-author Registration"

May 14, 2025 at 11:41 PM

Sireesh Gururaja

@siree.sh

Hey, if it's good enough for the guy that founded the town...

Screenshot from the Wikipedia article "Name of Pittsburgh", describing how John Forbes (a Scotsman) may have pronounced Pittsburgh similar to Edinburgh.

May 2, 2025 at 3:06 PM

Sireesh Gururaja

@siree.sh

Research going at the same pace, but!

Car salesman meme, with salesman advertising how many bolded numbers can fit into a table.

December 20, 2024 at 5:29 PM

Sireesh Gururaja

@siree.sh

When I started on the ARL project that funds my PhD, the thing we were supposed to build was a "MaterialsGPT".

What is a MaterialsGPT? Where does that idea come from? I got to spend a lot of time thinking about that second question with @davidthewid.bsky.social and Lucy Suchman (!) working on this:

The abstract of a paper titled "Basic Research, Lethal Effects: Military AI Research Funding as Enlistment".

In the context of unprecedented U.S. Department of Defense (DoD) budgets, this paper examines the recent history of DoD funding for academic research in algorithmically based warfighting. We draw from a corpus of DoD grant solicitations from 2007 to 2023, focusing on those addressed to researchers in the field of artificial intelligence (AI). Considering the implications of DoD funding for academic research, the paper proceeds through three analytic sections. In the first, we offer a critical examination of the distinction between basic and applied research, showing how funding calls framed as basic research nonetheless enlist researchers in a war fighting agenda. In the second, we offer a diachronic analysis of the corpus, showing how a 'one small problem' caveat, in which affirmation of progress in military technologies is qualified by acknowledgement of outstanding problems, becomes justification for additional investments in research. We close with an analysis of DoD aspirations based on a subset of Defense Advanced Research Projects Agency (DARPA) grant solicitations for the use of AI in battlefield applications. Taken together, we argue that grant solicitations work as a vehicle for the mutual enlistment of DoD funding agencies and the academic AI research community in setting research agendas. The trope of basic research in this context offers shelter from significant moral questions that military applications of one's research would raise, by obscuring the connections that implicate researchers in U.S. militarism.

December 17, 2024 at 2:33 PM

Sireesh Gururaja

@siree.sh

Fully dislocated my shoulder going down some icy steps, so there will be no winter fishing for me this year :/ now gazing longingly at pictures of the last time I was out

December 8, 2024 at 4:05 PM

Sireesh Gururaja

@siree.sh

These years have also raised existential concerns about the incentives that drive the community, peer review, research under limited compute budgets, and even the place of a *CL community.

Thought clouds depicting questions we've heard about the field: how big a deal is GPT-4, really? Were things always this fast-paced? Why is everything PyTorch/Huggingface?

October 12, 2023 at 2:06 PM

Sireesh Gururaja

@siree.sh

What about LLMs? The last few years have intensified these trends: the community has grown immensely. As models grow better and NLP becomes more public-facing, failures in benchmarking become evident. Centralization on individual models has grown.

October 12, 2023 at 2:05 PM

Sireesh Gururaja

@siree.sh

Neural NLP increased the sharing of toolkits or library code across labs and even across subfields, with libraries like PyTorch and TensorFlow. Pretraining extended this to the sharing of models, too, with Hugging Face being the biggest example.

Line graph showing mentions of software libraries in *CL papers. Libraries show cyclical use, with a "successor" library rising past the previous dominant library shortly after its peak. We see this pattern with Theano, TensorFlow, PyTorch, and Hugging Face particularly.

October 12, 2023 at 2:05 PM

Sireesh Gururaja

@siree.sh

The rise of statistical NLP in the early 2000s was another such cycle. But major methodological shifts come with major cultural shifts as well. Statistical NLP introduced a culture laser-focused on benchmarks and saw the end of a small, “high trust” research community.

A chart showing the number of unique researchers publishing in *CL venues. The number has increased from 715 in 1980, to 17,829 in 2022.

October 12, 2023 at 2:04 PM

Sireesh Gururaja

@siree.sh

Participants describe cycles of research: a breakthrough, a flurry of work exploiting the new method, then a slower wave of work exploring extensions or limitations. This pattern is not new: for instance, we heard about this with SVMs, RNNs, and BERT!

An image depicting typical attitudes during the two phases of research.

October 12, 2023 at 2:03 PM

Sireesh Gururaja

@siree.sh

We conducted long-form interviews with established NLP researchers, which reveal larger trends and forces that have been shaping the NLP research community since the 1980s.

A timeline of developments in natural language processing, below a chart showing citations of popular papers and mentions of common methods.

October 12, 2023 at 1:59 PM

Sireesh Gururaja

@siree.sh

We all know that “recently large language models have”, “large language models are”, and “large language models can.” But *why* LLMs? How did we get here? (where is “here”?) What forces are shaping NLP, and how recent are they, actually?

To appear at EMNLP 2023: arxiv.org/abs/2310.07715

Screenshot of paper title: "To Build Our Future, We Must Know Our Past: Contextualizing Paradigm Shifts in Natural Language Processing"

October 12, 2023 at 1:59 PM
