C Emde
cemde.bsky.social
C Emde
@cemde.bsky.social
ML Research Scientist at Oxford. DPhil student @compscioxford.bsky.social and TVGOxford. Ex ML Researcher @ Wise.

Deep Learning | ML Robustness | AI Safety | Uncertainty Quantification
To obtain such certificates, we present a simple, scalable and powerful algorithm: VALID. Remarkably, for each unwanted response it provides a **global bound in prompt space** 🚀

(6/7)
April 4, 2025 at 8:12 PM
A Domain Certificate bounds the adversarial risk of the model producing out-of-domain responses:

(5/7)
April 4, 2025 at 8:12 PM
Example: Can't afford Github Copilot? 💡 Use the Amazon Shopping App.

(3/7)
April 4, 2025 at 8:12 PM
Consider an LLM deployed for a specific purpose like a medical chatbot. Such model should **only** respond to medical questions.

⚠️ Problem: LLMs are very capable and vulnerable to respond to **any** queries: how to build a bomb, organize tax fraud etc.

(2/7)
April 4, 2025 at 8:12 PM
Our method enables strong LLM performance while providing adversarial guarantees on out-of-domain behaviour.

(4/6)
December 14, 2024 at 1:18 AM
It is known that fine-tuned foundation models are adversarially vulnerable to provide responses to questions they should not answer.

(2/6)

For instance: Can't afford ChatGPT Plus? Use a shopping app instead.
December 14, 2024 at 1:18 AM