Gabriel Chua
@gabrielchua.bsky.social
Machine Learning at GovTech

gabrielchua.me
transfer learning went wrong
November 27, 2024 at 12:58 AM
tldr: we propose a flexible, data-free approach to build better guardrails for LLMs, especially in pre-production. It's easy to get started.

this work is part of a broader suite of work on responsible AI here at GovTech - would love to chat if you're in this space.
November 27, 2024 at 12:57 AM
We’ve open-sourced our 2 classifiers & the dataset (almost 50M tokens)

These classifiers are:
- fast ⚡
- accurate & give well-calibrated probabilities ⚖️ (so that we can have differentiated responses)
- zero-shot 🔎 (i.e., teams can use this out of the box)

huggingface.co/collections/...
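
To illustrate what "differentiated responses" could look like, here is a minimal sketch that maps a calibrated off-topic probability to a tiered action. The function name `choose_response`, the thresholds (0.5 and 0.9), and the action names are all hypothetical and not taken from the released classifiers.

```python
def choose_response(p_off_topic: float,
                    warn_at: float = 0.5,
                    block_at: float = 0.9) -> str:
    """Map an off-topic probability to an action.

    Thresholds are illustrative assumptions; they only make sense
    if the probability is reasonably well calibrated.
    """
    if not 0.0 <= p_off_topic <= 1.0:
        raise ValueError("p_off_topic must be a probability in [0, 1]")
    if p_off_topic >= block_at:
        return "block"   # confident the prompt is off-topic: refuse
    if p_off_topic >= warn_at:
        return "warn"    # uncertain: answer with a caveat or ask to rephrase
    return "allow"       # likely on-topic: pass through to the LLM


for p in (0.1, 0.6, 0.95):
    print(p, choose_response(p))
```

With an uncalibrated score, fixed thresholds like these would not carry a consistent meaning across applications, which is why calibration matters here.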
November 27, 2024 at 12:57 AM
This approach works surprisingly well, and we apply it to "off-topic" prompt detection.

The goal is to classify whether a user prompt is irrelevant with respect to the system prompt. 🎯
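
As a rough sketch of the task shape: each example is a (system prompt, user prompt) pair with a binary label, and a guardrail is any function that flags a pair as off-topic. The names `PromptPair` and `false_positive_rate` and the sample prompts are hypothetical, and `predict` stands in for whatever classifier is used; none of this is the released implementation.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class PromptPair:
    system_prompt: str
    user_prompt: str
    off_topic: bool  # True when the user prompt is irrelevant to the system prompt


def false_positive_rate(pairs: Iterable[PromptPair],
                        predict: Callable[[str, str], bool]) -> float:
    """Fraction of on-topic pairs that the guardrail wrongly flags as off-topic."""
    on_topic = [p for p in pairs if not p.off_topic]
    if not on_topic:
        raise ValueError("need at least one on-topic example")
    flagged = sum(1 for p in on_topic if predict(p.system_prompt, p.user_prompt))
    return flagged / len(on_topic)


# Hypothetical examples for a tax-filing assistant.
SYSTEM = "You answer questions about filing income tax."
pairs = [
    PromptPair(SYSTEM, "When is the filing deadline?", off_topic=False),
    PromptPair(SYSTEM, "Can I claim relief for course fees?", off_topic=False),
    PromptPair(SYSTEM, "Write me a poem about cats.", off_topic=True),
]

# A deliberately bad guardrail that flags everything, to show the metric.
print(false_positive_rate(pairs, lambda system, user: True))
```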
November 27, 2024 at 12:57 AM
Here, we explore a data-free guardrail development methodology leveraging LLMs to guard LLMs.
November 27, 2024 at 12:57 AM
Common approaches rely on curated examples or custom classifiers. The problem?
⚠️ High false-positive rates
⚠️ Poor adaptability to new misuse types
⚠️ Dependence on real-world data, which is often unavailable during pre-production
November 27, 2024 at 12:57 AM