Lightnews — Scholar-powered news

C Emde

@cemde.bsky.social

ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise.

Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification

C Emde

@cemde.bsky.social

Read more: cemde.github.io/Domain-Certi...

Thanks to my amazing collaborators:
- @alasdair-p.bsky.social, Preetham Arvind, @maximek3.bsky.social, Tom Rainforth, @philiptorr.bsky.social, @adelbibi.bsky.social at @ox.ac.uk
- Bernard Ghanem at KAUST
- Thomas Lukasiewicz at @tuwien.at.

(7/7)

Shh, don't say that! Domain Certification in LLMs

Domain Certification - A novel framework providing provable adversarial defenses for LLM safety.

cemde.github.io

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space** 🚀

(6/7)

April 4, 2025 at 8:12 PM
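
The thread does not spell out how VALID works, so the following is only a toy sketch of how a test-time rule could produce a bound that holds for every prompt. It assumes a prompt-independent "guide" distribution and a likelihood-ratio acceptance test with a limited number of retries. All names (`output_distribution`, `certificate`, `GUIDE`, `C`, `MAX_TRIES`) and all numbers are hypothetical illustrations, not taken from the paper.

```python
from typing import Dict


def output_distribution(p_given_x: Dict[str, float],
                        guide: Dict[str, float],
                        c: float,
                        max_tries: int) -> Dict[str, float]:
    """Exact probability that the filtered system emits each response."""
    # Assumed rule: accept a sampled response only if the model's likelihood
    # is at most c times the prompt-independent guide likelihood.
    accepted = {y: p for y, p in p_given_x.items() if p <= c * guide[y]}
    reject_mass = max(0.0, 1.0 - sum(accepted.values()))
    # Attempt t is reached only if the previous t draws were all rejected.
    reach = sum(reject_mass ** t for t in range(max_tries))
    return {y: accepted.get(y, 0.0) * reach for y in p_given_x}


def certificate(guide: Dict[str, float], c: float, max_tries: int) -> Dict[str, float]:
    """Per-response bound that does not depend on the prompt."""
    return {y: min(1.0, max_tries * c * g) for y, g in guide.items()}


# Hypothetical toy numbers: the guide puts little mass on the off-topic answer.
GUIDE = {"med_a": 0.6, "med_b": 0.39, "off_topic": 0.01}
C, MAX_TRIES = 2.0, 3

benign = {"med_a": 0.7, "med_b": 0.29, "off_topic": 0.01}
adversarial = {"med_a": 0.05, "med_b": 0.05, "off_topic": 0.9}

bound = certificate(GUIDE, C, MAX_TRIES)
for name, prompt_dist in [("benign", benign), ("adversarial", adversarial)]:
    out = output_distribution(prompt_dist, GUIDE, C, MAX_TRIES)
    # The bound holds no matter which prompt produced prompt_dist.
    assert all(out[y] <= bound[y] + 1e-12 for y in out)
    print(name, out)
```

In this sketch each accepted response has probability at most `C * GUIDE[y]` per attempt, and there are at most `MAX_TRIES` attempts, so the bound `MAX_TRIES * C * GUIDE[y]` never refers to the prompt. Any probability mass left over after the final attempt corresponds to the system abstaining.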

C Emde

@cemde.bsky.social

A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:

(5/7)

April 4, 2025 at 8:12 PM
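
The formula that accompanied this post is not preserved here. As a hedged sketch, a certificate of this kind could be written as follows, where the symbols are assumptions introduced for illustration: $M$ is the deployed system, $\mathcal{X}$ the set of all prompts, $\mathcal{F}$ a set of out-of-domain responses, and $\epsilon_y$ the certified bound for response $y$.

```latex
\forall y \in \mathcal{F},\ \forall x \in \mathcal{X}: \quad
P_{M}(y \mid x) \;\le\; \epsilon_y
```

Read this way, the bound holds for every prompt, including adversarially chosen ones.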

C Emde

@cemde.bsky.social

We are tired of the cat 🐈 and mouse 🐁 game of attacks and defenses. Hence, we propose:
- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and effective test-time algorithm.

(4/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

Example: Can't afford Github Copilot? 💡 Use the Amazon Shopping App.

(3/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

Consider an LLM deployed for a specific purpose, like a medical chatbot. Such a model should **only** respond to medical questions.

⚠️ Problem: LLMs are very capable and can be manipulated into responding to **any** query: how to build a bomb, how to organize tax fraud, etc.

(2/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

The amazing collaborators: Preetham Arvind, @alasdair-p.bsky.social, Maxime Kayser, Tom Rainforth, Thomas Lukasiewicz, Philip Torr, Adel Bibi.

A @oxfordtvg.bsky.social production.

(6/6)

Link to paper:
openreview.net/forum?id=brD...

Shh, don't say that! Domain Certification in LLMs

Foundation language models, such as LLama, are often deployed in constrained environments. For instance, a customer support bot may utilize a large language model (LLM) as its backbone due to the...

openreview.net

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

Interested? Want to learn more?

Join us at the SoLaR workshop tomorrow.
- 🕚 When: Tomorrow, 14 Dec, from 11am to 1pm.
- 🗺️ Where: West meeting rooms 121 and 122 here in Vancouver.

(5/6)

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

We are tired of the 🐈 and 🐁 game of attacks and defenses. Hence, we propose:

- **Domain Certification:** a framework for adversarial certification of LLMs.
- **VALID:** a simple, scalable and efficient test-time algorithm.

(3/6)

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

It is known that fine-tuned foundation models can be adversarially manipulated into answering questions they should not answer.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

Great work! You might find our SoLaR paper interesting: We propose a certification framework for LLM systems to stay on-topic and not respond to such questions: openreview.net/pdf?id=brDLU...

openreview.net

December 6, 2024 at 7:23 PM
