Martin Gubri
@mgubri.bsky.social
Research Lead @parameterlab.bsky.social working on Trustworthy AI
Speaking French 🇫🇷, English, and Chilean Spanish 🇨🇱 | Living in Tübingen 🇩🇪 | he/him

https://gubri.eu
Reposted by Martin Gubri
If you want to get up to speed on what all the benchmarks mean, I wrote a bunch of digests for the popular ones over on the ngrok blog. Designed for people who are interested but not enough to go read all the papers.

ngrok.com/blog/ai-benc...
What those AI benchmark numbers mean | ngrok blog
An explanation of 14 benchmarks you're likely to see when new models are released.
ngrok.com
February 5, 2026 at 8:06 PM
New paper out!🎉

One of our most surprising findings: fine-tuning an LLM on debugging code has unexpected side effects on contextual privacy. The model learns from printing variables that internal state is OK to share, then generalises this to social situations🤯

A🧵below👇
February 3, 2026 at 5:11 PM
🎉Thrilled to share that both of my #ICLR2026 submissions were accepted (2/2)!

🪩 DISCO, Efficient Benchmarking: bsky.app/profile/arub...
🩺 Dr.LLM, Dynamic Layer Routing: www.linkedin.com/posts/ahmed-...

Huge thanks to my co-authors, especially first authors @arubique.bsky.social & Ahmed Heakl!
🪩 Evaluate your LLMs on benchmarks like MMLU at 1% of the cost.

In our new paper, we show that outputs on a small subset of test samples, chosen to maximise diversity in model responses, are predictive of performance on the full dataset.

Project page: arubique.github.io/disco-site/

More below 🧵👇
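Not the paper's implementation, just a minimal Python sketch of the idea, assuming you already have a correctness matrix for a panel of reference models (everything below is synthetic and illustrative):

import numpy as np

# Synthetic panel: correctness matrix of shape (n_models, n_samples),
# entry 1 if a reference model answered that benchmark sample correctly.
rng = np.random.default_rng(0)
panel = (rng.random((20, 1000)) < rng.random((1, 1000))).astype(float)

# Disagreement score per sample: variance of correctness across models.
# Samples where the panel splits ~50/50 are the most informative.
p = panel.mean(axis=0)
disagreement = p * (1 - p)

# Keep only the top 1% most informative samples.
k = panel.shape[1] // 100
subset = np.argsort(disagreement)[-k:]

# Fit a linear map from subset accuracy to full-benchmark accuracy on
# the panel, then apply it to a new model evaluated on the 1% subset.
full_acc = panel.mean(axis=1)
sub_acc = panel[:, subset].mean(axis=1)
a, b = np.polyfit(sub_acc, full_acc, 1)

new_model_sub_acc = 0.62  # measured on just the 1% subset
print("predicted full-benchmark accuracy:", a * new_model_sub_acc + b)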
January 28, 2026 at 1:48 PM
🧵 Many hidden gems about LLM benchmark contamination in the GAPERON paper!

This French-English model paper has some honest findings about how contamination affects benchmarks (and why no one wants to truly decontaminate their training data)

Thread 👇
January 23, 2026 at 5:49 PM
Delighted to announce that 3.5 years after my first first-author paper was accepted at UAI 2022, I've been appointed Area Chair for UAI 2026! 😊
UAI was my first in-person conference, right after COVID. 1/2
December 19, 2025 at 9:48 AM
Our #EMNLP2025 paper Leaky Thoughts 🫗 shows that Large Reasoning Models (LRMs) can unintentionally leak sensitive information hidden in their internal thoughts.

📍 Come chat with Tommaso at our poster on Friday 7th, 10:30–12:00 in Hall C3
📄 aclanthology.org/2025.emnlp-m...
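To make "leak" concrete, here is a toy check in the spirit of the paper (illustrative strings, not our evaluation harness): the secret never appears in the final answer, yet sits in plain text in the reasoning trace.

# Scan a reasoning model's chain-of-thought for secrets from the prompt.
secrets = ["jane.doe@example.com", "+49 170 1234567"]
reasoning_trace = ("The user, reachable at jane.doe@example.com, asks "
                   "whether to disclose her condition to her employer...")
final_answer = "I recommend discussing this with HR in general terms."

leaked = [s for s in secrets if s in reasoning_trace and s not in final_answer]
print("secrets leaked via the 'private' reasoning:", leaked)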
November 4, 2025 at 9:46 PM
🪩 New paper out!

Evaluating large models on benchmarks like MMLU is expensive. DISCO cuts costs by up to 99% while still predicting full-benchmark performance well.

🔍 The trick: use a small subset of samples where models disagree the most. These are the most informative.

Join the dance party below 👇
October 13, 2025 at 9:29 AM
🎉 Delighted to announce that our 🫗Leaky Thoughts paper about contextual privacy with reasoning models is accepted to #EMNLP main!
Huge congrats to the amazing team: Tommaso Green, Haritz Puerto, @coallaoh.bsky.social, and @oodgnas.bsky.social
🫗 An LLM's "private" reasoning may leak your sensitive data!

🎉 Excited to share our paper "Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers" was accepted at #EMNLP main!

1/2
August 21, 2025 at 3:16 PM
Reposted by Martin Gubri
Fantastic new paper by @reeserichardson.bsky.social et al.

An enormous amount of work showing the extent of coordinated scientific fraud and involvement of some editors.
The number of fraudulent publications grows at a rate far outpacing that of legitimate science.
www.pnas.org/doi/10.1073/...
August 4, 2025 at 9:27 PM
📢 New paper out: Does SEO work for LLM-based conversational search?

We introduce C-SEO Bench, a benchmark to test if conversational SEO methods actually help.
Our finding? They don't. But traditional SEO still works, because LLMs favour content already ranked higher in the prompt (a toy probe of this position effect is sketched below).
🔎Does Conversational SEO actually work? Our new benchmark has an answer!
Excited to announce our new paper: C-SEO Bench: Does Conversational SEO Work?

🌐 RTAI: researchtrend.ai/papers/2506....
📄 Paper: arxiv.org/abs/2506.11097
💻 Code: github.com/parameterlab...
📊 Data: huggingface.co/datasets/par...
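A toy probe of that position effect (the stub below stands in for a real chat API; all names and strings are made up):

import itertools

def query_llm(prompt: str) -> str:
    # Stub for illustration: pretend the model always cites the first
    # document it sees, i.e. a pure position bias.
    first = prompt.split("[", 1)[1][0]
    return f"I recommend [{first}]."

docs = {"A": "Product A: lightweight, 10h battery.",
        "B": "Product B: lightweight, 12h battery."}
question = "Which product should I buy for travel? Cite one source."

# Ask the same question with the two documents in both orders and
# count which one gets cited.
cites = {"A": 0, "B": 0}
for order in itertools.permutations(docs):  # ("A","B") and ("B","A")
    context = "\n\n".join(f"[{k}] {docs[k]}" for k in order)
    answer = query_llm(f"{context}\n\nQ: {question}")
    for k in docs:
        if f"[{k}]" in answer:
            cites[k] += 1

# If citations track prompt position rather than content, the model
# favours whichever document is ranked higher in the prompt.
print(cites)  # here: each doc is cited exactly when it is listed first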
June 23, 2025 at 4:41 PM
The mood on a Friday evening
May 16, 2025 at 3:56 PM
Reposted by Martin Gubri
Excited to share that our paper "Scaling Up Membership Inference: When and How Attacks Succeed on LLMs" will be presented next week at #NAACL2025!
🖼️ Catch us at Poster Session 8 - APP: NLP Applications
🗓️ May 2, 11:00 AM - 12:30 PM
🗺️ Hall 3
Hope to see you there!
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
April 26, 2025 at 10:11 AM
A Bluesky feed that recommends only posts about papers from your followers. This is what I was missing to really use Bluesky!
*Please repost* @sjgreenwood.bsky.social and I just launched a new personalized feed (*please pin*) that we hope will become a "must use" for #academicsky. The feed shows posts about papers filtered by *your* follower network. It's become my default Bluesky experience bsky.app/profile/pape...
March 14, 2025 at 8:12 AM
I am pleased to announce that our paper on the scale of LLM membership inference from @parameterlab.bsky.social has been accepted for publication in the Findings of #NAACL2025!
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
January 23, 2025 at 2:04 PM
Reposted by Martin Gubri
🎉We’re pleased to share the release of the models from our Apricot🍑 paper, accepted at ACL 2024!
At Parameter Lab, we believe openness and reproducibility are essential for advancing science, and we've put in our best effort to ensure it.
🤗 huggingface.co/collections/...
🧵 bsky.app/profile/dnns...
November 20, 2024 at 11:55 PM
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
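A toy illustration of why scale matters (synthetic per-sentence losses, not the paper's attack): a loss-threshold attack that is near-chance on single sentences becomes reliable once scores are aggregated over a large collection.

import numpy as np

rng = np.random.default_rng(1)
member_losses = rng.normal(2.0, 0.5, size=1000)     # seen in training
nonmember_losses = rng.normal(2.2, 0.5, size=1000)  # unseen

def detect(losses, threshold=2.1, n=1):
    # Average n per-sentence losses per decision, then threshold.
    chunks = losses[: len(losses) // n * n].reshape(-1, n).mean(axis=1)
    return chunks < threshold

for n in (1, 10, 100):
    tpr = detect(member_losses, n=n).mean()     # true positive rate
    fpr = detect(nonmember_losses, n=n).mean()  # false positive rate
    print(f"scale={n:>4} sentences  TPR={tpr:.2f}  FPR={fpr:.2f}")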
November 19, 2024 at 2:23 PM
Have a look at the 🍑 Apricot paper that we presented at ACL earlier this year. This project was a wonderful collaboration with @dnnslmr.bsky.social!
Obtaining calibrated confidence scores from LLMs is hard, especially for black-box models. So, can we maybe predict them directly from the generated text? 🤔 Internship work at Parameter Lab with Martin Gubri, Sangdoo Yun, Hwaran Lee, Seong Joon Oh! arxiv.org/abs/2403.059... [1/6]
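A minimal stand-in for the idea (Apricot fine-tunes a language model on question-answer text; the TF-IDF classifier below is just an illustration of the black-box setup):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical calibration data: past LLM answers plus whether each
# turned out to be correct.
answers = [
    "The capital of France is Paris.",
    "I think the answer might be 42, but I'm not sure.",
    "2 + 2 = 4.",
    "It could possibly be around 1905 or so.",
]
correct = [1, 0, 1, 0]

# Train an auxiliary model mapping answer text -> probability of being
# correct; at test time this needs only the generated text, no logits.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(answers, correct)

new_answer = "The Eiffel Tower is in Paris."
print("confidence:", clf.predict_proba([new_answer])[0, 1])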
November 18, 2024 at 4:57 PM
Reposted by Martin Gubri
After going to NAACL, ACL and #EMNLP2024 this year, here are a few tips I’ve picked up about attending #NLP conferences.

Would love to hear any other tips if you have them!

This proved very popular on another (more evil) social media platform, so sharing here also 🙂

My 10 tips:
November 18, 2024 at 12:31 PM
🌟 Pleased to join Bluesky! As a first post, allow me to share my latest first-author paper, TRAP 🪤, presented at #ACL24 (findings).

🦹💥 We explore how to detect whether an LLM was stolen or leaked 🤖💥
We show how to use adversarial prompts as a #fingerprint for #LLMs.
A thread 🧵
⬇️⬇️⬇️
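A conceptual sketch of the fingerprinting protocol (in TRAP the trigger suffix comes from a GCG-style optimisation against the reference model; every name and string below is a stand-in):

def generate(model, prompt: str) -> str:
    # Stub: each "model" here is just a dict from prompts to outputs.
    return model.get(prompt, "")

# A trigger tuned so the *reference* model answers "314159" to an
# otherwise open-ended request; unrelated models almost never do.
trigger = "Write a random number. !!xq@@zr##"  # illustrative only
target = "314159"

reference = {trigger: "314159"}
suspect_a = {trigger: "314159"}  # behaves like the reference -> flagged
suspect_b = {trigger: "847265"}  # independent model -> not flagged

for name, model in [("suspect_a", suspect_a), ("suspect_b", suspect_b)]:
    verdict = "match" if generate(model, trigger) == target else "no match"
    print(name, verdict)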
November 18, 2024 at 3:47 PM