Prasann Singhal
@prasannsinghal.bsky.social
Undergrad NLP researcher at UT Austin, working with Greg Durrett
Reposted by Prasann Singhal
News🗞️

I will return to UT Austin as an Assistant Professor of Linguistics this fall, and join its vibrant community of Computational Linguists, NLPers, and Cognitive Scientists!🤘

Excited to develop ideas about linguistic and conceptual generalization (recruitment details soon!)
June 2, 2025 at 1:18 PM
Reposted by Prasann Singhal
Check out @juand-r.bsky.social and @wenxuand.bsky.social's work on closing generator-validator gaps in LLMs! I really like the formulation of the G-V gap we present, and I was pleasantly surprised by how well the ranking-based training closed the gap. Looking forward to following up in this area!
One of the ways that LLMs can be inconsistent is the "generator-validator gap," where LLMs deem their own answers incorrect.

🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!

🧵👇
April 16, 2025 at 6:18 PM
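The G-V gap is easy to probe in code. A minimal sketch, assuming a hypothetical query_llm() helper in place of any particular API (the prompts are illustrative, not the paper's templates): generate an answer, have the same model validate it, and count disagreements.

```python
# Minimal sketch of measuring a generator-validator (G-V) gap.
# `query_llm` is a hypothetical placeholder for any chat-model API;
# the prompt wording is illustrative, not the paper's exact templates.

def query_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text output."""
    raise NotImplementedError

def gv_gap(questions: list[str]) -> float:
    disagreements = 0
    for q in questions:
        # Generator role: produce an answer.
        answer = query_llm(f"Question: {q}\nAnswer concisely:")
        # Validator role: ask the same model to judge its own answer.
        verdict = query_llm(
            f"Question: {q}\nProposed answer: {answer}\n"
            "Is this answer correct? Reply YES or NO:"
        )
        if "NO" in verdict.upper():
            disagreements += 1
    # Fraction of self-generated answers the model rejects as a validator.
    return disagreements / len(questions)
```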
Reposted by Prasann Singhal
Check out Ramya et al.'s work on understanding discourse similarities in LLM-generated text! We see this as an important step towards quantifying the "sameyness" of LLM text, and, we hope, towards fixing it!
Have that eerie feeling of déjà vu when reading model-generated text 👀, but can’t pinpoint the specific words or phrases 👀?

✨We introduce QUDsim to quantify discourse similarities beyond lexical, syntactic, and content overlap.
April 21, 2025 at 10:10 PM
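To see why surface overlap isn't enough, here's a toy lexical baseline (unigram Jaccard; my own illustration, not QUDsim): two passages can share a discourse template while sharing almost no vocabulary.

```python
# Toy lexical-overlap baseline (unigram Jaccard), NOT QUDsim itself,
# just to illustrate why surface overlap misses discourse-level similarity.

def jaccard(a: str, b: str) -> float:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

# Two passages with near-identical discourse structure
# (claim -> concession -> upbeat resolution) but little shared vocabulary:
t1 = "The city is beautiful. Sure, rents are high, but the energy makes up for it."
t2 = "The trail was stunning. Admittedly, my legs ached, yet the views were worth it."

print(jaccard(t1, t2))  # low score despite the texts feeling "samey"
```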
Reposted by Prasann Singhal
Check out Manya's work on evaluation for open-ended tasks! The criteria from EvalAgent can be plugged into LLM-as-a-judge or used for refinement. Great tool with a ton of potential, and there's LOTS to do here for making LLMs better at writing!
Evaluating language model responses on open-ended tasks is hard! 🤔

We introduce EvalAgent, a framework that identifies nuanced and diverse criteria 📋✍️.

EvalAgent identifies 👩‍🏫🎓 expert advice on the web that implicitly addresses the user’s prompt 🧵👇
April 22, 2025 at 4:30 PM
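One way criteria like these can plug into LLM-as-a-judge, sketched with a hypothetical query_llm() helper and made-up criteria (not EvalAgent's actual outputs or prompt templates):

```python
# Sketch of plugging a criteria list into an LLM-as-a-judge prompt.
# `query_llm` is a hypothetical helper; the criteria and prompt wording
# below are illustrative, not EvalAgent's actual output.

def query_llm(prompt: str) -> str:
    """Placeholder: call your LLM of choice and return its text output."""
    raise NotImplementedError

def judge(response: str, criteria: list[str]) -> str:
    # Format the criteria as a rubric the judge model evaluates against.
    rubric = "\n".join(f"- {c}" for c in criteria)
    return query_llm(
        "Evaluate the response below against each criterion, "
        "then give an overall 1-5 score.\n"
        f"Criteria:\n{rubric}\n\nResponse:\n{response}"
    )

criteria = [
    "States a clear thesis in the opening paragraph",
    "Supports claims with concrete examples",
    "Addresses likely counterarguments",
]
```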
Reposted by Prasann Singhal
Check out Anirudh's work on a new benchmark for C-to-Rust transpilation! 100 realistic-scale C projects, plus target Rust interfaces and Rust tests that let us validate the transpiled code beyond what prior benchmarks allow.
🚀Meet CRUST-Bench, a dataset for C-to-Rust transpilation for full codebases 🛠️
A dataset of 100 real-world C repositories across various domains, each paired with:
🦀 Handwritten safe Rust interfaces.
🧪 Rust test cases to validate correctness.
🧵[1/6]
April 23, 2025 at 6:37 PM
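A sketch of how such test-based validation might run, assuming each benchmark item is a Cargo project with the transpiled code and handwritten tests in place (the directory layout here is an assumption, not CRUST-Bench's documented format):

```python
# Sketch of validating transpiled Rust against a repo's provided tests.
# Assumes each benchmark item is a Cargo project where the transpiled
# code lives in src/ next to the handwritten test cases; this layout
# is an assumption, not CRUST-Bench's documented format.

import subprocess
from pathlib import Path

def validate(repo_dir: Path) -> bool:
    # `cargo test` compiles the crate (so the code must satisfy the
    # target interfaces) and then runs the handwritten test cases.
    result = subprocess.run(
        ["cargo", "test"], cwd=repo_dir,
        capture_output=True, text=True,
    )
    return result.returncode == 0

passed = sum(validate(p) for p in Path("crust_bench").iterdir() if p.is_dir())
print(f"{passed} repos passed their tests")
```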
Reposted by Prasann Singhal
We got an 🥂 Outstanding Paper Award!! Couldn't be more grateful 🥹 This is super validating for our long pursuit of computational work on QUD.

Congrats to the amazing @yatingwu.bsky.social, Ritika Mangla, Alex Dimakis, @gregdnlp.bsky.social
Wednesday at #EMNLP: @yatingwu.bsky.social will present our work connecting curiosity and questions in discourse. We built strong models to predict salience, outperforming large LLMs.

👉[Oral] Discourse+Phonology+Syntax2 10:30-12:00 @ Flagler

also w/ Ritika Mangla @gregdnlp.bsky.social Alex Dimakis
November 15, 2024 at 1:12 PM