Anirudh Khatry
@anirudhkhatry.bsky.social
CS PhD @utaustin.bsky.social
Pinned
Anirudh Khatry
@anirudhkhatry.bsky.social
· Apr 23
🚀Meet CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
Reposted by Anirudh Khatry
#COLM2025 was one of my favorite conferences -- a really high fraction of interesting papers and people, but small enough to see everything!
Thank you to the organizers for putting it together!
October 13, 2025 at 12:40 AM
Reposted by Anirudh Khatry
How good are LLMs at 🔭 scientific computing and visualization 🔭?
AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.
SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
June 2, 2025 at 3:42 PM
Reposted by Anirudh Khatry
We’ve started a podcast! @awsto.bsky.social and @samps.phd host “Current Continuation,” a little interview series with PL researchers. The first two episodes are with @ranjitjhala.bsky.social and @satnam6502.bsky.social. sigplan.org/cc/
Current Continuation
sigplan.org
June 2, 2025 at 3:19 PM
Reposted by Anirudh Khatry
News🗞️
I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘
Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)
June 2, 2025 at 1:18 PM
Reposted by Anirudh Khatry
Evaluating language model responses on open-ended tasks is hard! 🤔
We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.
EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
April 22, 2025 at 3:04 PM
🚀Meet CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
April 23, 2025 at 5:00 PM
Reposted by Anirudh Khatry
A bit of a mess around the conflict of COLM with the ARR (and to lesser degree ICML) reviews release. We feel this is creating a lot of pressure and uncertainty. So, we are pushing our deadlines:
Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)
Plz RT 🙏
March 20, 2025 at 6:20 PM
Reposted by Anirudh Khatry
Come work with me!
We are looking to bring on more top talent to our language modeling workstream at @ai2.bsky.social building the open ecosystem. We are hiring:
* Research scientists
* Senior research engineers
* Post docs (Young investigators)
* Pre docs
job-boards.greenhouse.io/thealleninst...
The Allen Institute for AI
job-boards.greenhouse.io
February 25, 2025 at 1:07 AM
Reposted by Anirudh Khatry
🌟Job ad🌟 We (@gregdnlp.bsky.social, @mattlease.bsky.social and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve myths of the universe, please apply!
Seeking candidates (within three years of the award of their PhD) for a postdoctoral position with the Explorable Universe research group to perform research on developing next-generation generative AI copilots & agents to aid astronomy research. Info here www.cosmicai.org/jobs/postdoc...
February 25, 2025 at 10:09 PM
Reposted by Anirudh Khatry
encourage postdocs to apply 👇
@soldaini.net, myself and others from @ai2.bsky.social have been helping in project & also learning a ton---continued pretraining, creating domain-specific training data & evals---to build foundation models that scientists can use. promising area for open source LMs!
🌟Job ad🌟 We (@gregdnlp.bsky.social, @mattlease.bsky.social and I) are hiring a postdoc fellow within the CosmicAI Institute, to do galactic work with LLMs and generative AI! If you would like to push the frontiers of foundation models to help solve myths of the universe, please apply!
Seeking candidates (within three years of the award of their PhD) for a postdoctoral position with the Explorable Universe research group to perform research on developing next-generation generative AI copilots & agents to aid astronomy research. Info here www.cosmicai.org/jobs/postdoc...
February 25, 2025 at 11:24 PM
Reposted by Anirudh Khatry
three things are certain in life: death, taxes, and Claude switching to concise mode during US business hours
February 10, 2025 at 5:17 PM
Kudos to Usneek Singh. It was a pleasure to collaborate on this paper with the amazing folks at PROSE!
Excited to share that our work on validating synthetic data for spreadsheet formula generation from natural language will be presented at NAACL Findings 2025 arxiv.org/abs/2407.10657 congratulations to the lead author, Usneek Singh, who i had the pleasure of working with at MSFT.
An Empirical Study of Validating Synthetic Data for Formula Generation
Large language models (LLMs) can be leveraged to help with writing formulas in spreadsheets, but resources on these formulas are scarce, impacting both the base performance of pre-trained models and l...
arxiv.org
January 30, 2025 at 5:09 AM
Reposted by Anirudh Khatry
@ayushkhaitan.bluesky.social, Amitayush Thakur, and I are organizing an #AI4Math panel at the Joint Mathematics Meeting this month. Please spread the word among your math friends! We will post a summary of the discussion after the event.
Looking forward to the #jmm2025 panel on the "Use of AI tools for Mathematics research" that we are co-organizing with @swarat.bsky.social and Amitayush Thakur. The panelists are Alex Kontorovich, Rishi Mehta, Emily Wenger and Kaiyu Yang. See you there!
January 4, 2025 at 3:08 AM
Reposted by Anirudh Khatry
Huge congrats to @prasannsinghal.bsky.social for being one of the 8 CRA Outstanding Undergraduate Researcher Award winners! It has been an absolute privilege to work with Prasann during his time at UT. (And he's applying for PhD programs this year...hint hint...)
Prasann's work 🧵
January 3, 2025 at 2:37 PM
Reposted by Anirudh Khatry
@andersmoeller.bsky.social and I are co-chairing OOPSLA'26 and soliciting PC nominations. If you'd like to serve on the OOPSLA PC next year or know anyone (e.g., recent graduate) who you think would do a good job, please nominate them here: forms.gle/NVnzjcmbshoL...
forms.gle
December 23, 2024 at 8:43 PM
Reposted by Anirudh Khatry
The legendary Putnam math competition had its 85th edition yesterday. Coincidentally, George Tsoukalas will present our paper on PutnamBench, a next-generation #AI4Math benchmark, at #NeurIPS2024 this week: arxiv.org/abs/2407.11214.
If you work on frontier AI for math/reasoning, talk to George!
December 8, 2024 at 8:04 PM
Reposted by Anirudh Khatry
I'll be at #NeurIPS2024 w/
- @fcyin.bsky.social's LoFiT: using interp to improve fine-tuning (Weds pm poster & MINT spotlight talk Sun)
- @thomlake.bsky.social's analysis of Overton pluralism (Pluralistic alignment Sat)
Please reach out to me to chat about interp, factuality, reasoning, &c!
December 8, 2024 at 8:38 PM
Reposted by Anirudh Khatry
Excited to visit Columbia next week!
November 22, 2024 at 2:30 AM
Reposted by Anirudh Khatry
I did a starter pack of ML/AI people at @utaustin.bsky.social Please distribute and feel free to self nominate!
go.bsky.app/QLQznZg
November 22, 2024 at 9:25 AM
Yay!!! I’m one of the cool people!
Here is my starter pack of PL folks -- please come and join the fun! go.bsky.app/6kzdn3x
November 16, 2024 at 5:47 PM
Reposted by Anirudh Khatry
We got an 🥂 Outstanding Paper Award!! Cannot be more grateful 🥹 This is super validating for our long pursuit of computational work on QUD.
Congrats to the amazing @yatingwu.bsky.social, Ritika Mangla, Alex Dimakis, @gregdnlp.bsky.social
Wednesday at #EMNLP: @yatingwu.bsky.social will present our work connecting curiosity and questions in discourse. We built strong models to predict salience, outperforming large LLMs.
👉[Oral] Discourse+Phonology+Syntax2 10:30-12:00 @ Flagler
also w/ Ritika Mangla @gregdnlp.bsky.social Alex Dimakis
November 15, 2024 at 1:12 PM
Reposted by Anirudh Khatry
I won't be at EMNLP, but come and see:
🔍 Detecting factual errors from LLMs (Liyan Tang)
🛠️ Detect, critique, & refine pipeline (Manya Wadhwa and Lucy Zhao)
🏭 Synthetic data generation (Abhishek Divekar)
📄 Fact-checking (Aniruddh Sriram) at FEVER
t.co/fQbl0G7m23
(1st real post in the bluer skies!)
November 13, 2024 at 3:46 AM
Reposted by Anirudh Khatry
Wednesday at #EMNLP: @yatingwu.bsky.social will present our work connecting curiosity and questions in discourse. We built strong models to predict salience, outperforming large LLMs.
👉[Oral] Discourse+Phonology+Syntax2 10:30-12:00 @ Flagler
also w/ Ritika Mangla @gregdnlp.bsky.social Alex Dimakis
November 13, 2024 at 5:45 AM