Lightnews — Scholar-powered news

Reposted by Mor Geva

Yoav Gur Arieh

@yoav.ml

🧠 To reason over text and track entities, we find that language models use three types of 'pointers'!

They were thought to rely only on a positional one—but when many entities appear, that system breaks down.

Our new paper shows what these pointers are and how they interact 👇

October 8, 2025 at 2:56 PM

Reposted by Mor Geva

Sohee Yang

@soheeyang.bsky.social

🚨 New Paper 🚨
How effectively do reasoning models reevaluate their thought? We find that:
- Models excel at identifying unhelpful thoughts but struggle to recover from them
- Smaller models can be more robust
- Self-reevaluation ability is far from true meta-cognitive awareness
1/N 🧵

June 13, 2025 at 4:15 PM

Reposted by Mor Geva

Yoav Gur Arieh

@yoav.ml

New Paper Alert! Can we precisely erase conceptual knowledge from LLM parameters?
Most methods are shallow, coarse, or overreach, adversely affecting related or general knowledge.

We introduce 🪝𝐏𝐈𝐒𝐂𝐄𝐒, a general framework for Precise In-parameter Concept EraSure. 🧵 1/

May 29, 2025 at 4:22 PM

Reposted by Mor Geva

Marius Mosbach

@mariusmosbach.bsky.social

Check out Benno's notes about our impact of interpretability paper 👇.

Also, we are organizing a workshop at #ICML2025 which is inspired by some of the questions discussed in the paper: actionable-interpretability.github.io

April 15, 2025 at 11:11 PM

Reposted by Mor Geva

Sarah Wiegreffe

@sarah-nlp.bsky.social

Have work on the actionable impact of interpretability findings? Consider submitting to our Actionable Interpretability workshop at ICML! See below for more info.

Website: actionable-interpretability.github.io
Deadline: May 9

Mor Geva @megamor2.bsky.social · Mar 31

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

April 3, 2025 at 5:58 PM

Mor Geva

@megamor2.bsky.social

🎉 Our Actionable Interpretability workshop has been accepted to #ICML2025! 🎉
> Follow @actinterp.bsky.social
> Website actionable-interpretability.github.io

@talhaklay.bsky.social @anja.re @mariusmosbach.bsky.social @sarah-nlp.bsky.social @iftenney.bsky.social

Paper submission deadline: May 9th!

March 31, 2025 at 4:59 PM

Mor Geva

@megamor2.bsky.social

📣 📣 Looking for ethics reviewers for COLM 2025!
Please sign up and share the form below 👇
forms.gle/3a52jbDNB9bd...

COLM 2025 Ethics Reviewer Sign Up

Ethics reviewing of papers for COLM 2025 starts in May. We will share more details later. In the meantime, please sign up.

forms.gle

February 24, 2025 at 2:02 PM

Mor Geva

@megamor2.bsky.social

Communication between LLM agents can be super noisy! One rogue agent can easily drag the whole system into failure 😱

We find that (1) it's possible to detect rogue agents early on
(2) interventions can boost system performance by up to 20%!

Thread with details and paper link below!

ohav.bsky.social @ohav.bsky.social · Feb 13

"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents!
Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️

February 13, 2025 at 2:30 PM

Mor Geva

@megamor2.bsky.social

How can we interpret LLM features at scale? 🤔

Current pipelines use activating inputs, which is costly and ignores how features causally affect model outputs!
We propose efficient output-centric methods that better predict the steering effect of a feature.

New preprint led by @yoav.ml 🧵1/

January 28, 2025 at 7:34 PM

Reposted by Mor Geva

Fazl Barez

@fbarez.bsky.social

🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨

Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇

Paper: arxiv.org/pdf/2501.04952
1/8

January 10, 2025 at 4:58 PM

Mor Geva

@megamor2.bsky.social

What's in an attention head? 🤯

We present an efficient framework – MAPS – for inferring the functionality of attention heads in LLMs ✨directly from their parameters✨

A new preprint with Amit Elhelo 🧵 (1/10)

December 18, 2024 at 5:55 PM

Reposted by Mor Geva

ACL

@aclmeeting.bsky.social

We invite nominations to join the ACL2025 PC as a reviewer or area chair (AC). The review process runs through the ARR Feb cycle. Tentative timeline: Review 1-20 Mar 2025, Rebuttal 26-31 Mar 2025. ACs must be available throughout the Feb cycle. Nominations by 20 Dec 2024:
shorturl.at/TaUh9 #NLProc #ACL2025NLP

Volunteer to join ACL 2025 Programme Committee

Use this form to express your interest in joining the ACL 2025 programme committee as a reviewer or area chair (AC). The review period is 1st to 20th of March 2025. ACs need to be available for variou...

forms.gle

December 16, 2024 at 12:28 AM

Mor Geva

@megamor2.bsky.social

📣📣 Wanna be an Area Chair or a Reviewer for @aclmeeting.bsky.social or know someone who would?

Nominations and self-nominations go here 👇

docs.google.com/forms/d/e/1F...

Volunteer to join ACL 2025 Programme Committee

Use this form to express your interest in joining the ACL 2025 programme committee as a reviewer or area chair (AC). The review period is 1st to 20th of March 2025. ACs need to be available for variou...

docs.google.com

December 6, 2024 at 6:01 AM

Reposted by Mor Geva

Yoav Artzi

@yoavartzi.com

I am seriously behind on uploading Learning Machines videos, but I did want to get @jonathanberant.bsky.social's out sooner rather than later. It's not only a great talk, it also gives a remarkably broad overview and contextualization, so it's an excellent way to ramp up on post-training
youtu.be/2AthqCX3h8U

Jonathan Berant (Tel Aviv University / Google) / Towards Robust Language Model Post-training

YouTube video by Yoav Artzi

youtu.be

December 2, 2024 at 3:45 AM

Reposted by Mor Geva

Max Bartolo

@maxbartolo.bsky.social

Sparks of multi-hop reasoning ✨

Sohee Yang @soheeyang.bsky.social · Nov 27

🚨 New Paper 🚨
Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts without having seen them together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80% for country, 6% for year)! 1/N

November 29, 2024 at 9:41 AM

Mor Geva

@megamor2.bsky.social

Post a photo of yourself from a different era

November 28, 2024 at 5:38 PM

Reposted by Mor Geva

Sohee Yang

@soheeyang.bsky.social

🚨 New Paper 🚨
Can LLMs perform latent multi-hop reasoning without exploiting shortcuts? We find the answer is yes – they can recall and compose facts without having seen them together in training or guessing the answer, but success greatly depends on the type of the bridge entity (80% for country, 6% for year)! 1/N

November 27, 2024 at 5:26 PM

Reposted by Mor Geva

Geoffrey Irving

@girving.bsky.social

It is promising to use natural language latent reasoning to interpret LLMs. But we need confidence that the latent reasoning is interpretable and faithful, such that human oversight of the reasoning trace counts as meaningful supervision.

Here are 2 reasons this may be hard. 🧵

November 24, 2024 at 7:00 PM

Reposted by Mor Geva

Lucy Li

@lucy3.bsky.social

mech interp: bsky.app/starter-pack...
women in nlp: bsky.app/starter-pack...
nlp #1: bsky.app/starter-pack...
nlp #2: bsky.app/starter-pack...
ml/data/tech: bsky.app/starter-pack...
robotics & ai: bsky.app/starter-pack...

November 19, 2024 at 7:23 PM

Mor Geva

@megamor2.bsky.social

First post here, let's see how it goes 🦋

Looking for an emergency reviewer 🚨🚨
It's for an ARR submission about tool usage in LLMs, and the review should be submitted within the next 30 hours.
If you have reviewed for ARR/*CL conferences before and are interested, please DM me 🙏 #NLProc #NLP

November 20, 2024 at 5:31 AM
