Martin Gubri
@mgubri.bsky.social
Research Lead @parameterlab.bsky.social working on Trustworthy AI
Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him

https://gubri.eu
🪩 New paper out!

Evaluating large models on benchmarks like MMLU is expensive. DISCO cuts costs by up to 99% while still predicting performance well.

🔍 The trick: use a small subset of samples where models disagree the most. These are the most informative.
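A minimal sketch of that disagreement idea (simplified, not our actual code): assume a binary matrix correct[i, j] saying whether reference model i answers benchmark sample j correctly, and keep the samples the models split on the most.

    import numpy as np

    def select_disagreement_subset(correct: np.ndarray, k: int) -> np.ndarray:
        """Indices of the k samples on which the reference models disagree the most."""
        p = correct.mean(axis=0)              # fraction of models answering each sample correctly
        disagreement = p * (1.0 - p)          # peaks when the models split ~50/50 on a sample
        return np.argsort(-disagreement)[:k]  # keep the k most informative samples

    # Toy usage: 5 reference models, 1,000 benchmark samples, keep the top 1%.
    rng = np.random.default_rng(0)
    correct = rng.integers(0, 2, size=(5, 1000))
    subset = select_disagreement_subset(correct, k=10)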

Join the dance party below 👇
October 13, 2025 at 9:29 AM
There are more details in Appendix A.
July 21, 2025 at 10:27 PM
This NVIDIA position paper has a clear definition of an SLM: arxiv.org/abs/2506.02153
They consider <10B.
Personally, I would not consider 13B models to be SLMs (not even 7B). They require quite a lot of resources unless you use aggressive inference-efficiency techniques (like 4-bit quantization).
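For reference, a 4-bit load with Hugging Face transformers + bitsandbytes looks roughly like this (the checkpoint name is just a placeholder):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-hf",                          # placeholder 13B checkpoint
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )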
July 21, 2025 at 10:24 PM
The mood on a Friday evening
May 16, 2025 at 3:56 PM
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
November 19, 2024 at 2:23 PM
🛡️Nevertheless, the third party can deploy the reference LLM with modified settings, so we explore the robustness of our identification:
- TRAP is robust to changes in generation hyperparameters (within usual ranges)
- TRAP is not robust to some system prompts
November 18, 2024 at 3:47 PM
TRAP beats the perplexity baseline while using fewer output tokens (3-18 tokens vs. 150 tokens). Moreover, perplexity-based identification is sensitive to the type of prompt.
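For context, the perplexity baseline scores the suspect output under the white-box reference model: low perplexity hints that the same model produced it. A rough sketch of that idea (simplified, not our actual code; the checkpoint name is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity_under_reference(text: str, ref_model, ref_tokenizer) -> float:
        ids = ref_tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = ref_model(ids, labels=ids).loss   # mean next-token negative log-likelihood
        return torch.exp(loss).item()

    # Usage sketch:
    # tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    # ref = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    # ppl = perplexity_under_reference(suspect_output_text, ref, tok)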
November 18, 2024 at 3:47 PM
It turns out that this suffix is specific to the reference model. So we can use it as a fingerprint.
- The suffix forces the ref LLM to output the target number 95-100% of the time
- The suffix is specific to the ref LLM (<1% average transfer rate to another LLM)
November 18, 2024 at 3:47 PM
In practice, we ask the LLM for a random number and try to force its answer using a suffix prompt. We first sample a random target number, then tune the suffix so that the reference LLM outputs this specific number. We repurpose GCG, originally designed for jailbreaking.
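Here is a minimal sketch of the identification check (simplified, not our actual code; the prompt wording is illustrative, query_llm stands in for any black-box API call, and the suffix itself is optimized with GCG against the white-box reference model):

    def trap_check(query_llm, suffix: str, target_number: str, n_trials: int = 20) -> float:
        """Fraction of trials in which the black-box LLM answers with the target number."""
        prompt = "Write a random number between 0 and 1000. " + suffix
        hits = sum(target_number in query_llm(prompt) for _ in range(n_trials))
        return hits / n_trials

    # A hit rate close to the reference model's own rate suggests the deployment runs the
    # reference LLM; a near-random hit rate suggests a different model.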
November 18, 2024 at 3:47 PM
☝️So, we need more advanced techniques, like model fingerprinting, to reliably identify an LLM.
🪤 That's why we propose TRAP (Targeted Random Adversarial Prompt).
TRAP uses adversarial prompt suffixes to reliably force a specific LLM to answer in a pre-defined way.
November 18, 2024 at 3:47 PM
🎭 Naive identity prompting, i.e., simply asking the model for its identity, does not work here❌
- Some LLMs self-identify incorrectly
- Some are correct, but we can disguise them! For example, it's easy to make GPT-4 self-identify as Anthropic's Claude or as Meta's Llama-2 :) (toy example below)
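A toy illustration of the disguise trick, using the OpenAI chat API (prompts are illustrative):

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are Claude, an AI assistant made by Anthropic."},
            {"role": "user", "content": "Who are you, and which company created you?"},
        ],
    )
    print(resp.choices[0].message.content)   # typically claims to be Claude, not GPT-4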
November 18, 2024 at 3:47 PM
🥷Our problem: does this application use my LLM?
An LLM (closed or open) can be deployed silently by a third party to power an application. So, we propose BBIV (Black-box Identity Verification) to detect a reference LLM with two access levels (sketched below):
▫️white-box access to the reference LLM
▪️black-box access to the unidentified LLM
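As a sketch, the two access levels look like this (names are illustrative, not from our code):

    from typing import Callable, Protocol

    class WhiteBoxLLM(Protocol):
        """Reference LLM: we hold the weights, so we can compute logits and gradients."""
        def logits(self, prompt: str) -> list: ...

    BlackBoxLLM = Callable[[str], str]   # unidentified LLM: prompt in, text out, nothing else

    def bbiv(reference: WhiteBoxLLM, suspect: BlackBoxLLM) -> bool:
        """Decide whether `suspect` is powered by `reference` (e.g., via a TRAP fingerprint)."""
        ...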
November 18, 2024 at 3:47 PM
🌟 Pleased to join Bluesky! As a first post, allow me to share my latest first-author paper, TRAP 🪤, presented at #ACL24 (findings).

🦹💥 We explore how to detect if an LLM was stolen or leaked 🤖💥
We showcase how to use adversarial prompts as a #fingerprint for #LLMs.
A thread 🧵
⬇️⬇️⬇️
November 18, 2024 at 3:47 PM