C Emde
@cemde.bsky.social
ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise.

Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification
Read more: cemde.github.io/Domain-Certi...

Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.

(7/7)
Shh, don't say that! Domain Certification in LLMs
Domain Certification - A novel framework providing provable adversarial defenses for LLM safety.
cemde.github.io
April 4, 2025 at 8:12 PM
To obtain such certificates, we present a simple, scalable, and powerful algorithm: VALID. Remarkably, for each unwanted response, it provides a **global bound in prompt space** 🚀
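For intuition, here is a minimal sketch of what a likelihood-ratio rejection check at test time could look like. This is an illustration under my own assumptions (the guide model, function names, and threshold are all placeholders), not the exact VALID algorithm from the paper:

```python
from typing import Callable

# Hypothetical stand-ins for model log-likelihoods. In a real system these
# would be forward passes of the deployed LLM `L` and of a small guide
# model `G` trained only on in-domain (e.g. medical) text.
def log_prob_deployed(response: str, prompt: str) -> float:
    """log L(response | prompt) under the broad, highly capable model."""
    return -12.0  # placeholder value, for illustration only


def log_prob_guide(response: str) -> float:
    """log G(response) under a narrow in-domain guide model (prompt-free)."""
    return -14.0  # placeholder value, for illustration only


def certified_respond(prompt: str,
                      sample: Callable[[str], str],
                      log_threshold: float = 5.0) -> str:
    """Illustrative test-time rejection check (not the paper's exact method).

    Accept a sampled response only if the deployed model does not assign it
    much more log-likelihood than the in-domain guide model. Because the
    guide never conditions on the prompt, the resulting bound on emitting a
    fixed out-of-domain response does not depend on the prompt.
    """
    response = sample(prompt)
    ratio = log_prob_deployed(response, prompt) - log_prob_guide(response)
    if ratio <= log_threshold:
        return response
    return "Sorry, I can only help with in-domain questions."  # abstain


# Toy usage with a dummy sampler:
print(certified_respond(
    "What are common migraine triggers?",
    lambda p: "Common triggers include stress and poor sleep.",
))
```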

(6/7)
April 4, 2025 at 8:12 PM
A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:
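Informally, and in notation of my own (see the paper for the precise statement), the kind of guarantee such a certificate expresses is:

```latex
% Informal sketch of a domain-certificate guarantee (notation assumed,
% not copied from the paper): for every unwanted, out-of-domain
% response y and for *every* prompt x, the certified system M satisfies
\[
  P_{M}(y \mid x) \;\le\; \varepsilon(y)
  \qquad \forall\, y \in \mathcal{F},\ \forall\, x ,
\]
% where \mathcal{F} is the set of out-of-domain responses and
% \varepsilon(y) is the certified bound for that response.
```

The key point is that the bound holds for **every** prompt, which is what makes it an adversarial guarantee.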

(5/7)
April 4, 2025 at 8:12 PM
We are tired of the cat 🐈 and mouse 🐁 game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.

(4/7)
April 4, 2025 at 8:12 PM
Example: Can't afford GitHub Copilot? 💡 Use the Amazon Shopping App.

(3/7)
April 4, 2025 at 8:12 PM
Consider an LLM deployed for a specific purpose, like a medical chatbot. Such a model should **only** respond to medical questions.

⚠️ Problem: LLMs are highly capable and can be coaxed into responding to **any** query: how to build a bomb, how to organize tax fraud, etc.

(2/7)
April 4, 2025 at 8:12 PM
My amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.

A @oxfordtvg.bsky.social production.

(6/6)

Link to paper:
openreview.net/forum?id=brD...
Shh, don't say that! Domain Certification in LLMs
Foundation language models, such as Llama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...
openreview.net
December 14, 2024 at 1:18 AM
Interested? Want to learn more?

Join us at the SoLaR workshop tomorrow.
- 🕚 When: Tomorrow, 14 Dec, from 11 am to 1 pm.
- 🗺️ Where: West meeting rooms 121 and 122 here in Vancouver.

(5/6)
December 14, 2024 at 1:18 AM
Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)
December 14, 2024 at 1:18 AM
We are tired of the 🐈 and 🐁 game of attacks and defenses. Hence, we propose:

- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.

(3/6)
December 14, 2024 at 1:18 AM
It is well known that fine-tuned foundation models are adversarially vulnerable: they can be coaxed into answering questions they should not.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.
December 14, 2024 at 1:18 AM
Great work! You might find our SoLaR paper interesting: we propose a framework for certifying that LLM systems stay on-topic and do not respond to such questions: openreview.net/pdf?id=brDLU...
openreview.net
December 6, 2024 at 7:23 PM