Speaking 🇫🇷, English and 🇨🇱 Spanish | Living in Tübingen 🇩🇪 | he/him
https://gubri.eu
Evaluating large models on benchmarks like MMLU is expensive. DISCO cuts costs by up to 99% while still predicting performance well.
🔍 The trick: use a small subset of samples where models disagree the most. These are the most informative.
Join the dance party below 👇
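🧪 A minimal sketch of the disagreement idea, assuming you already have predictions from a pool of reference models on the full benchmark. Illustrative only, not DISCO's actual pipeline; the model pool and predictions below are made up:

```python
import numpy as np

# preds[i, j] = answer choice of reference model i on benchmark sample j
# (made-up data; in practice these come from full evaluations of a model pool)
rng = np.random.default_rng(0)
preds = rng.integers(0, 4, size=(20, 10_000))  # 20 models x 10k MMLU-style items

def disagreement(answers: np.ndarray) -> float:
    """1 - majority-vote frequency: 0 when all models agree, high when disputed."""
    _, counts = np.unique(answers, return_counts=True)
    return 1.0 - counts.max() / answers.size

scores = np.apply_along_axis(disagreement, 0, preds)   # one score per sample
subset = np.argsort(scores)[-100:]                     # keep the ~1% most disputed
print(f"Evaluate new models on {subset.size} samples instead of {preds.shape[1]}")
```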
They consider <10B parameters.
Personally, I would not consider 13B models to be SLMs (not even 7B). They require quite a lot of resources unless you use aggressive efficient-inference techniques (like 4-bit quantization).
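For context, 4-bit loading with Hugging Face transformers + bitsandbytes looks like this (the checkpoint name is just an example):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-hf"  # example checkpoint; any ~7B model works
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # 4-bit weights, fp16 compute
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # needs a CUDA GPU + the bitsandbytes package installed
)
# Weights shrink from ~13 GB in fp16 to ~3.5 GB, fitting a consumer GPU.
```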
- TRAP is robust to generation hyperparameters (within usual ranges)
- TRAP is not robust to some system prompts
- The suffix forces the ref LLM to output the target number 95-100% of the time
- The suffix is specific to the ref LLM (<1% average transfer rate to another LLM)
🪤 That's why we propose TRAP (Targeted Random Adversarial Prompt).
TRAP uses adversarial prompt suffixes to reliably force a specific LLM to answer in a pre-defined way.
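Here's roughly what the black-box check looks like in practice. A hedged sketch: query_llm, the prompt wording, and the target are illustrative placeholders, not TRAP's exact values; the suffix itself is crafted with white-box access to the reference LLM (see the BBIV setup below):

```python
def is_reference_llm(query_llm, suffix: str, target: str = "42", n_trials: int = 20) -> bool:
    """query_llm: your black-box API call (prompt -> completion); a stand-in here.
    suffix: adversarial suffix pre-optimized on the reference LLM."""
    prompt = "Write a random number between 0 and 100. " + suffix
    hits = sum(target in query_llm(prompt) for _ in range(n_trials))
    # ~95-100% hits on the reference LLM vs <1% on other LLMs, so even a loose
    # majority threshold separates them.
    return hits / n_trials >= 0.5
```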
- Some LLMs self-identify incorrectly
- Some are correct, but we can disguise them! For example, it's easy to make GPT-4 self-identify as Anthropic's Claude or as Meta's Llama-2 :)
An LLM (closed or open) can be deployed silently by a third party to power an application. So, we propose BBIV to detect a reference LLM with:
▫️white-box access to the reference LLM
▪️black-box access to the unidentified LLM
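Putting the two access modes together, the pipeline might look like this. A hypothetical sketch, not the paper's code: craft_suffix stands in for a GCG-style white-box optimizer and is not a real API, and query_unidentified is whatever black-box endpoint you are testing:

```python
from typing import Callable

def bbiv(craft_suffix: Callable[[str, str], str],
         query_unidentified: Callable[[str], str],
         target: str = "42") -> bool:
    """Hypothetical sketch. craft_suffix: white-box, GCG-style optimizer run on
    the *reference* LLM, mapping (prompt, target) -> adversarial suffix.
    query_unidentified: a black-box call to the LLM under test."""
    prompt = "Write a random number between 0 and 100. "
    suffix = craft_suffix(prompt, target)         # gradients needed: white-box
    answer = query_unidentified(prompt + suffix)  # API call only: black-box
    return target in answer  # almost always true iff the reference LLM is underneath
```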
🦹💥 We explore how to detect if an LLM was stolen or leaked 🤖💥
We showcase how to use adversarial prompts as a #fingerprint for #LLMs.
A thread 🧵
⬇️⬇️⬇️