Sireesh Gururaja
@siree.sh
PhD student @ltiatcmu.bsky.social. Working on NLP that centers worker agency. Otherwise: coffee, fly fishing, and keeping peach pits around, for...some reason
https://siree.sh
Pinned
Sireesh Gururaja
@siree.sh
· Dec 17
When I started on the ARL project that funds my PhD, the thing we were supposed to build was a "MaterialsGPT".
What is a MaterialsGPT? Where does that idea come from? I got to spend a lot of time thinking about that second question with @davidthewid.bsky.social and Lucy Suchman (!) working on this:
Reposted by Sireesh Gururaja
Please respond to this survey if you have changed or have thought about changing your name in academic publishing! For any reason, whether it be transition, recognizability, marriage, privacy, immigration, cultural reasons, etc.
Please RT for reach :)
We're surveying researchers about name changes in academic publishing.
If you've changed your name and dealt with updating publications, we want to hear your experience. Any reason counts: transition, marriage, cultural reasons, etc.
forms.cloud.microsoft/e/E0XXBmZdEP
November 10, 2025 at 3:11 PM
love the construction of this dataset, it feels very much of a piece with other work like the reversal curse. LLMs clearly don't "learn" or "know" in the ways we do, whether relation reversal or simple compositionality, and I love how clearly this work demonstrates that.
Can LLMs accurately aggregate information over long, information-dense texts? Not yet…
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
November 7, 2025 at 8:16 PM
My name is Ozymandias, King of Kings!
Look upon my works, which achieve state-of-the-art performance across a diverse suite of benchmarks
My name is Ozymandias, King of Kings!
My voice is my passport. Verify me
My name is Ozymandias, king of kings!
Any similarity to any persons, living or dead, is purely coincidental.
November 6, 2025 at 4:18 AM
Reposted by Sireesh Gururaja
✨I’m on the academic job market ✨
I’m a PhD candidate at @hcii.cmu.edu studying tech, labor, and resistance 👩🏻💻💪🏽💥
I research how workers and communities contest harmful sociotechnical systems and shape alternative futures through everyday resistance and collective action
More info: cella.io
Cella M. Sum –
cella.io
October 9, 2025 at 2:39 PM
Reposted by Sireesh Gururaja
How to not do computational humanities:
(1) Lay out a question/hypothesis about a complex, cultural domain.
(2) Compute numbers that were inspired by (1) but without sufficiently formalizing (1) so as to meaningfully link to it.
(3) Interpret numbers to mean whatever your prior vibes were.
September 7, 2025 at 7:29 PM
So excited to be TAing this course! So much of the knowledge you have as a PhD student is expected to be gained by osmosis, and it leads to some odd holes and gaps. This course should fix most of that problem!
I'm excited cause I'm teaching/coordinating a new unique class, where we teach new PhD students all the "soft" skills of research, incl. ideation, reviewing, presenting, interviewing, advising, etc.
Each lecture is taught by a different LTI prof! It takes a village! maartensap.com/11705/Fall20...
August 25, 2025 at 7:31 PM
The military keynesianism of our day (though it's of course possible that military keynesianism is the military keynesianism of our day, and this is just _another_ one)
The AI infrastructure build-out is so gigantic that in the past 6 months, it contributed more to the growth of the U.S. economy than /all of consumer spending/
The 'magnificent 7' spent more than $100 billion on data centers and the like in the past three months *alone*
www.wsj.com/tech/ai/sili...
August 1, 2025 at 1:06 PM
Coming soon (6pm!) to the #ACL poster session: how do experts work with collections of documents, and do LLMs do those things?
tl;dr: only sometimes! While we have good tools for things like information extraction, the way that experts read documents goes deeper - come to our poster to learn more!
July 28, 2025 at 3:26 PM
Reposted by Sireesh Gururaja
Gently, I would like to say: When people tell you that they would appreciate a feature that does something automatically, it's not responsive to that concern to explain that by going through several steps for every individual instance, they can get the same result in each instance.
June 12, 2025 at 1:45 PM
Reposted by Sireesh Gururaja
OpenAI has effectively conned people into thinking that Chatbots & AI "Assistants" are The FEWTCHA of AI. Friends, they are most likely *not.* Neither are the big cloud-based Generative AI services.
Small, purpose-fit, on-device models that make your existing activities easier/better? There you go.
June 10, 2025 at 2:54 PM
Reposted by Sireesh Gururaja
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:
🧵1/9
June 9, 2025 at 1:47 PM
Reposted by Sireesh Gururaja
I for one am grateful for the opportunity to meditate on the meaning of “scientific artifact” at 2:15am
Do folks really find the ARR checklist valuable enough to justify that a paper submission takes this much effort?
May 20, 2025 at 1:08 PM
Great points in the replies here re:civility being an unequally applied, generally awful standard.
It's also a great example of how common formulations of "sentiment analysis" or "toxicity detection" in NLP that are non-contextual lead to systems that stop sounding good with even slight scrutiny.
May 3, 2025 at 8:29 PM
AI as another pivot to video is a thought I've also had, and this article is such a great articulation.
The way "AI" is framed, pitched, and deployed by big players reflects at best ignorance, and at worst a real contempt for the social nature and humanity of our jobs and online spaces
I've seen a lot of people compare AI products to the rise of the internet and warn that if we don't start using them we'll get left behind. But I don't think that's what's happening. I think AI is "pivot to video" 2.0. My latest for @spitfirenews.com
spitfirenews.com/p/ai-is-pivo...
AI is 'pivot to video' 2.0
Plus, join my first member Q+A!
spitfirenews.com
May 1, 2025 at 7:57 PM
This talk was such a joy to do! If you'd like to read the paper, it's here: arxiv.org/abs/2411.17840.
Thank you for having us, @patrickbriansmith.bsky.social!
May 1, 2025 at 5:12 PM
If you're at NAACL this week (or just want to keep track), I have a feed for you: bsky.app/profile/did:...
Currently pulling everyone that mentions NAACL, posts a link from the ACL Anthology, or has NAACL in their username. Happy conferencing!
April 29, 2025 at 6:07 PM
Reposted by Sireesh Gururaja
Ever trusted a metric that works great on average, only for it to fail in your specific use case?
In our #NAACL2025 paper (w/ @841io.bsky.social), we show why global evaluations are not enough and why context matters more than you think.
📄 aclanthology.org/2025.finding...
#NLP #Evaluation
(🧵1/9)
April 29, 2025 at 5:10 PM
Reposted by Sireesh Gururaja
🚀 Excited to share a new interp+agents paper: 🐭🐱 MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools appearing at #NAACL2025
This was work done @msftresearch.bsky.social last summer with Jason Eisner, Justin Svegliato, Ben Van Durme, Yu Su, and Sam Thomson
1/🧵
April 29, 2025 at 1:41 PM
Reposted by Sireesh Gururaja
This is absurdly great, but I haven't read a single news article about it. A fully open source, offline-first alternative to Notion that's a collab between the French and German governments because they want to host docs securely and on their own terms. THIS is what Europe should be doing.
Docs
Docs: Your new companion to collaborate on documents efficiently, intuitively, and securely.
docs.numerique.gouv.fr
March 16, 2025 at 11:03 PM
Reposted by Sireesh Gururaja
one thing in AI is not new -- people taking one small part of a job, mischaracterizing it, ignoring all the other stuff, and then assume the AI can do the whole job on its own
If you have zero education, but learn how to ask AI models the right questions, in many jobs you will be able to outperform someone with an advanced degree, but who is unwilling to use Large Language Models.
Just takes a smartphone, curiosity to experiment and a mindset to learn.
February 17, 2025 at 11:30 PM
Reposted by Sireesh Gururaja
Have these people met … society? Read a book? Listened to music? Regurgitating esoteric facts isn’t intelligence.
This is more like humanity’s last stand at jeopardy
www.nytimes.com/2025/01/23/t...
A Test So Hard No AI System Can Pass It — Yet
The creators of a new test called “Humanity’s Last Exam” argue we may soon lose the ability to create tests hard enough for A.I. models.
www.nytimes.com
January 25, 2025 at 6:15 PM
Reposted by Sireesh Gururaja
[[ Now that it's not 1st anymore and more people might pay attention, sharing this again. ]]
Hello *friends* doing ai + culture work, send across your burning questions about research, career, field, etc. It is anonymous :)
🎉students studying AI and culture🎉
if you had an audience with senior folks from computer science, social science, and the humanities studying AI and culture, what questions would you ask them about the field, careers, skill sets, etc?
students only!
forms.gle/tZY5tJW2gqVd...
January 6, 2025 at 6:02 AM
When I started on the ARL project that funds my PhD, the thing we were supposed to build was a "MaterialsGPT".
What is a MaterialsGPT? Where does that idea come from? I got to spend a lot of time thinking about that second question with @davidthewid.bsky.social and Lucy Suchman (!) working on this:
December 17, 2024 at 2:33 PM
As someone whose PhD would not have been possible without military funding, this paper was a privilege to write.
(Even outside of continuing to work with @davidthewid.bsky.social, and getting to work with Lucy Suchman (!))
📢 NEW Paper!
@siree.sh, Lucy Suchman, and I examine a corpus of 7,000 US Military grant solicitations to ask what the world’s largest military wants to do with AI, by looking at what it seeks to fund. #STS
📄: arxiv.org/pdf/2411.17840
We find…
December 11, 2024 at 3:04 PM
Fully dislocated my shoulder going down some icy steps, so there will be no winter fishing for me this year :/ now gazing longingly at pictures of the last time I was out
December 8, 2024 at 4:05 PM