A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
Apply to Wisconsin CS to research
- Societal impact of AI
- NLP ←→ CSS and cultural analytics
- Computational sociolinguistics
- Human-AI interaction
- Culturally competent and inclusive NLP
with me!
lucy3.github.io/prospective-...
Thank you to the organizers for putting it together!
AstroVisBench tests how well LLMs implement scientific workflows in astronomy and visualize results.
SOTA models like Gemini 2.5 Pro & Claude 4 Opus only match ground truth scientific utility 16% of the time. 🧵
I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘
Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)
We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.
EvalAgent identifies 👩🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
Abstracts due March 22 AoE (+48hr)
Full papers due March 28 AoE (+24hr)
Plz RT 🙏
We are looking to bring on more top talent to our language modeling workstream at @ai2.bsky.social building the open ecosystem. We are hiring:
* Research scientists
* Senior research engineers
* Post docs (Young investigators)
* Pre docs
job-boards.greenhouse.io/thealleninst...
@soldaini.net, myself and others from @ai2.bsky.social have been helping with the project & also learning a ton---continued pretraining, creating domain-specific training data & evals---to build foundation models that scientists can use. promising area for open source LMs!
Prasann's work 🧵
If you work on frontier AI for math/reasoning, talk to George!
- @fcyin.bsky.social's LoFiT: using interp to improve fine-tuning (Weds pm poster & MINT spotlight talk Sun)
- @thomlake.bsky.social's analysis of Overton pluralism (Pluralistic alignment Sat)
Please reach out to me to chat about interp, factuality, reasoning, &c!
go.bsky.app/QLQznZg
Congrats to the amazing @yatingwu.bsky.social, Ritika Mangla, Alex Dimakis, @gregdnlp.bsky.social
👉[Oral] Discourse+Phonology+Syntax2 10:30-12:00 @ Flagler
also w/ Ritika Mangla @gregdnlp.bsky.social Alex Dimakis
🔍 Detecting factual errors from LLMs (Liyan Tang)
🛠️ Detect, critique, & refine pipeline (Manya Wadhwa and Lucy Zhao)
🏭 Synthetic data generation (Abhishek Divekar)
📄 Fact-checking (Aniruddh Sriram) at FEVER
t.co/fQbl0G7m23
(1st real post in the bluer skies!)