Graham Neubig
gneubig.bsky.social
Associate professor at CMU, studying natural language processing and machine learning. Co-founder of All Hands AI.
Where does one language model outperform the other?

We examine this from first principles, performing unsupervised discovery of "abilities" that one model has and the other does not.

Results show interesting differences across model classes, sizes, and pre- vs. post-training.
When it comes to text prediction, where does one LM outperform another? If you've ever worked on LM evals, you know this question is a lot more complex than it seems. In our new #acl2025 paper, we developed a method to find fine-grained differences between LMs:

🧵1/9
June 9, 2025 at 6:33 PM
Reposted by Graham Neubig
Nice contribution to the understanding of long CoT induction arxiv.org/abs/2502.03373 by Edward Yeo and colleagues (advised by @gneubig.bsky.social and @xiangyue96.bsky.social). It's hard not to see this as mostly a negative result on induction at the 8B scale. 👇
Demystifying Long Chain-of-Thought Reasoning in LLMs
Scaling inference compute enhances reasoning in large language models (LLMs), with long chains-of-thought (CoTs) enabling strategies like backtracking and error correction. Reinforcement learning (RL)...
arxiv.org
February 8, 2025 at 7:29 PM
Reposted by Graham Neubig
LLM agents can code—but can they ask clarifying questions? 🤖💬
Tired of coding agents wasting time and API credits, only to output broken code? What if they asked first instead of guessing? 🚀

(New work led by Sanidhya Vijay: www.linkedin.com/in/sanidhya-...)
February 19, 2025 at 7:46 PM
We are now done with all classes for CMU CS11-711 Advanced NLP!

Slides: phontron.com/class/anlp-f...
Videos: youtube.com/playlist?lis...

Hope this is useful to people 😀
November 27, 2024 at 10:26 PM
Reposted by Graham Neubig
1/ Introducing ᴏᴘᴇɴꜱᴄʜᴏʟᴀʀ: a retrieval-augmented LM to help scientists synthesize knowledge 📚
@uwnlp.bsky.social & Ai2
With open models & a 45M-paper datastore, it outperforms proprietary systems & matches human experts.
Try out our demo!
openscholar.allen.ai
November 19, 2024 at 4:30 PM
Reposted by Graham Neubig
💬 Have you or a loved one compared LM probabilities to human linguistic acceptability judgments? You may be overcompensating for the effect of frequency and length!
🌟 In our new paper, we rethink how we should be controlling for these factors 🧵:
November 20, 2024 at 6:08 PM