Lightnews — Scholar-powered news

C Emde

@cemde.bsky.social

ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise.

Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification

Posts Replies Media Videos

C Emde

@cemde.bsky.social

To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space** 🚀

(6/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:

(5/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

Example: Can't afford Github Copilot? 💡 Use the Amazon Shopping App.

(3/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

Consider an LLM deployed for a specific purpose like a medical chatbot. Such model should **only** respond to medical questions.

⚠️ Problem: LLMs are very capable and vulnerable to respond to **any** queries: how to build a bomb, organize tax fraud etc.

(2/7)

April 4, 2025 at 8:12 PM

C Emde

@cemde.bsky.social

Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)

December 14, 2024 at 1:18 AM

C Emde

@cemde.bsky.social

It is known that fine-tuned foundation models are adversarially vulnerable to provide responses to questions they should not answer.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.

December 14, 2024 at 1:18 AM

Add to Home Screen

Light up
your news

Add to Home Screen

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news