Martin Gubri
@mgubri.bsky.social
Research Lead @parameterlab.bsky.social working on Trustworthy AI
Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him

https://gubri.eu
🪩 New paper out!

Evaluating large models on benchmarks like MMLU is expensive. DISCO cuts costs by up to 99% while still predicting performance well.

🔍 The trick: use a small subset of samples where models disagree the most. These are the most informative.
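A minimal sketch of that disagreement idea (simplified, not our actual code): assume a binary matrix correct[i, j] saying whether reference model i answers benchmark sample j correctly, and keep the samples the models split on the most.

    import numpy as np

    def select_disagreement_subset(correct: np.ndarray, k: int) -> np.ndarray:
        """Indices of the k samples on which the reference models disagree the most."""
        p = correct.mean(axis=0)              # fraction of models answering each sample correctly
        disagreement = p * (1.0 - p)          # peaks when the models split ~50/50 on a sample
        return np.argsort(-disagreement)[:k]  # keep the k most informative samples

    # Toy usage: 5 reference models, 1,000 benchmark samples, keep the top 1%.
    rng = np.random.default_rng(0)
    correct = rng.integers(0, 2, size=(5, 1000))
    subset = select_disagreement_subset(correct, k=10)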

Join the dance party below 👇
October 13, 2025 at 9:29 AM
There are more details in Appendix A.
July 21, 2025 at 10:27 PM
This NVIDIA position paper has a clear definition of an SLM: arxiv.org/abs/2506.02153
They consider <10B.
Personally, I would not consider 13B models to be SLMs (not even 7B). They require quite a lot of resources unless you use aggressive inference-efficiency techniques (like 4-bit quantization).
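For reference, a 4-bit load with Hugging Face transformers + bitsandbytes looks roughly like this (the checkpoint name is just a placeholder):

    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    model = AutoModelForCausalLM.from_pretrained(
        "meta-llama/Llama-2-13b-hf",                          # placeholder 13B checkpoint
        quantization_config=BitsAndBytesConfig(load_in_4bit=True),
        device_map="auto",
    )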
July 21, 2025 at 10:24 PM
The mood on a Friday evening
May 16, 2025 at 3:56 PM
📄 Excited to share our latest paper on the scale required for successful membership inference in LLMs! We investigate a continuum from single sentences to large document collections. Huge thanks to an incredible team: Haritz Puerto, @coallaoh.bsky.social and @oodgnas.bsky.social!
November 19, 2024 at 2:23 PM
🛡️Nevertheless, the third party can deploy the reference LLM with modified settings, so we explore the robustness of our identification:
- TRAP is robust to changes in generation hyperparameters (within usual ranges)
- TRAP is not robust to some system prompts
November 18, 2024 at 3:47 PM
TRAP beats the perplexity baseline while using fewer output tokens (3-18 tokens vs. 150 tokens). Moreover, perplexity-based identification is sensitive to the type of prompt.
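For context, the perplexity baseline scores the suspect output under the white-box reference model: low perplexity hints that the same model produced it. A rough sketch of that idea (simplified, not our actual code; the checkpoint name is a placeholder):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def perplexity_under_reference(text: str, ref_model, ref_tokenizer) -> float:
        ids = ref_tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = ref_model(ids, labels=ids).loss   # mean next-token negative log-likelihood
        return torch.exp(loss).item()

    # Usage sketch:
    # tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    # ref = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
    # ppl = perplexity_under_reference(suspect_output_text, ref, tok)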
November 18, 2024 at 3:47 PM
It turns out that this suffix is specific to the reference model. So we can use it as a fingerprint.
- The suffix forces the ref LLM to output the target number 95-100% of the time
- The suffix is specific to the ref LLM (<1% average transfer rate to another LLM)
November 18, 2024 at 3:47 PM
In practice, we ask the LLM for a random number and try to force its answer using a suffix prompt. We first sample a random target number, then tune the suffix so that the reference LLM outputs this specific number. We repurpose GCG, originally designed for jailbreaking.
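Here is a minimal sketch of the identification check (simplified, not our actual code; the prompt wording is illustrative, query_llm stands in for any black-box API call, and the suffix itself is optimized with GCG against the white-box reference model):

    def trap_check(query_llm, suffix: str, target_number: str, n_trials: int = 20) -> float:
        """Fraction of trials in which the black-box LLM answers with the target number."""
        prompt = "Write a random number between 0 and 1000. " + suffix
        hits = sum(target_number in query_llm(prompt) for _ in range(n_trials))
        return hits / n_trials

    # A hit rate close to the reference model's own rate suggests the deployment runs the
    # reference LLM; a near-random hit rate suggests a different model.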
November 18, 2024 at 3:47 PM
☝️So, we need more advanced techniques, like model fingerprinting, to reliably identify an LLM.
🪤 That's why we propose TRAP (Targeted Random Adversarial Prompt).
TRAP uses adversarial prompt suffixes to reliably force a specific LLM to answer in a pre-defined way.
November 18, 2024 at 3:47 PM
🎭 Naive identity prompting, i.e., simply asking the model for its identity, does not work here❌
- Some LLMs self-identify incorrectly
- Some are correct, but we can disguise them! For example, it's easy to make GPT-4 self-identify as Anthropic's Claude or as Meta's Llama-2 :) (toy example below)
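A toy illustration of the disguise trick, using the OpenAI chat API (prompts are illustrative):

    from openai import OpenAI

    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are Claude, an AI assistant made by Anthropic."},
            {"role": "user", "content": "Who are you, and which company created you?"},
        ],
    )
    print(resp.choices[0].message.content)   # typically claims to be Claude, not GPT-4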
November 18, 2024 at 3:47 PM
🥷Our problem: does this application use my LLM?
An LLM (closed or open) can be deployed silently by a third party to power an application. So, we propose BBIV (Black-box Identity Verification) to detect a reference LLM with two access levels (sketched below):
▫️white-box access to the reference LLM
▪️black-box access to the unidentified LLM
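As a sketch, the two access levels look like this (names are illustrative, not from our code):

    from typing import Callable, Protocol

    class WhiteBoxLLM(Protocol):
        """Reference LLM: we hold the weights, so we can compute logits and gradients."""
        def logits(self, prompt: str) -> list: ...

    BlackBoxLLM = Callable[[str], str]   # unidentified LLM: prompt in, text out, nothing else

    def bbiv(reference: WhiteBoxLLM, suspect: BlackBoxLLM) -> bool:
        """Decide whether `suspect` is powered by `reference` (e.g., via a TRAP fingerprint)."""
        ...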
November 18, 2024 at 3:47 PM
🌟 Pleased to join Bluesky! As a first post, allow me to share my latest first-author paper, TRAP 🪤, presented at #ACL24 (findings).

🦹💥 We explore how to detect if an LLM was stolen or leaked 🤖💥
We showcase how to use adversarial prompts as a #fingerprint for #LLMs.
A thread 🧵
⬇️⬇️⬇️
November 18, 2024 at 3:47 PM