Rishub Jain
@shubadubadub.bsky.social
Works at Google DeepMind on Safe+Ethical AI
This was all based on work by the Google DeepMind Rater Assist team, the absolute best team ever 🙂
December 24, 2024 at 12:01 AM
We achieved human-AI complementarity by combining Hybridization and Rater Assistance, but continued research is needed as the nature of rating evolves. Making progress in this space will require cross-disciplinary work. Let’s build these collaborations now! If you’re interested, please reach out.
December 24, 2024 at 12:01 AM
Importantly, the best type of rater assistance depends a lot on how much raters over-rely on the assistant. On our slice of data where humans > AI, showing directly quoted evidence on its own helps more than showing that evidence alongside the AI’s reasoning, judgments, and confidence.
December 24, 2024 at 12:01 AM
Hybridization can also enable impactful Rater Assistance. Prior HCI work has shown that achieving complementarity can be hard in settings where AI > Humans. Our hybridization identifies a slice of data where humans > AI. Here, rater assistance helps!
December 24, 2024 at 12:01 AM
Combining judgments from human raters and AI raters working in isolation, called Hybridization, can be a useful technique to achieve complementarity.

We’ve found that confidence-based hybridization (using the AI’s ratings when it is confident, and human ratings otherwise) achieves complementarity!
December 24, 2024 at 12:01 AM
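As a rough illustration, here is a minimal sketch of what confidence-based hybridization could look like. The threshold, field names, and the routing of low-confidence items to human raters are assumptions made for the example, not the exact setup from our work.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    ai_rating: float      # judgment from the AI rater, produced in isolation
    ai_confidence: float  # AI's confidence in its own rating, in [0, 1] (assumed scale)
    human_rating: float   # judgment from the human rater, produced in isolation

def hybridize(item: Rating, threshold: float = 0.8) -> float:
    """Confidence-based hybridization (illustrative sketch):
    use the AI's rating when it is confident, and the human's rating otherwise.
    The low-confidence slice is where humans tend to do better, and is also
    where rater assistance can be applied."""
    if item.ai_confidence >= threshold:
        return item.ai_rating
    return item.human_rating

# Example usage with made-up numbers:
items = [
    Rating(ai_rating=1.0, ai_confidence=0.95, human_rating=0.0),  # confident AI -> use AI
    Rating(ai_rating=1.0, ai_confidence=0.40, human_rating=0.0),  # unsure AI -> use human
]
print([hybridize(item) for item in items])  # [1.0, 0.0]
```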
Achieving complementarity can be quite hard! A key issue is over-reliance: how do we get humans to use the AI appropriately, rather than just defaulting to its outputs? And this problem gets worse in settings where AI > Humans. But there is hope!
December 24, 2024 at 12:01 AM
Rater Assistance is not so useful if the combined Human-AI team doesn’t outperform humans or AI alone. Restated, the goal is to achieve Human-AI Complementarity. Fundamentally, this is a Human-Computer Interaction (HCI) problem!
December 24, 2024 at 12:01 AM
This is the field of Amplified Oversight (a subfield of Scalable Oversight). Much of the past work in this field, such as critiques, debate, and iterative amplification, has focused on Rater Assistance: assisting and enabling human raters to properly evaluate AI outputs.
December 24, 2024 at 12:01 AM
As AI is able to perform increasingly challenging tasks, how do we make sure we’re able to properly evaluate its outputs so that we can accurately align the model to human values via e.g. RLHF? Relying on humans alone for this will be hard on tasks such as summarizing 1M pages.
December 24, 2024 at 12:01 AM
Read our blog for the full details: deepmindsafetyresearch.medium.com/human-ai-com...

Here’s a quick summary:
December 24, 2024 at 12:01 AM