Rishub Jain
@shubadubadub.bsky.social
Works at Google DeepMind on Safe+Ethical AI
This was all based on work by the Google DeepMind Rater Assist team, the absolute best team ever 🙂
December 24, 2024 at 12:01 AM
We achieved human-AI complementarity by combining Hybridization and Rater Assistance, but continued research is needed as the nature of rating evolves. Making progress in this space will require cross-disciplinary work. Let’s build these collaborations now! If you’re interested, please reach out.
December 24, 2024 at 12:01 AM
Importantly, the best type of rater assistance depends a lot on how much raters over-rely on the assistant. On our slice of data where humans > AI, showing directly quoted evidence on its own helps more than showing that evidence alongside the AI’s reasoning, judgments, and confidence.
December 24, 2024 at 12:01 AM
Hybridization can also enable impactful Rater Assistance. Prior HCI work has shown that achieving complementarity can be hard in settings where AI > Humans. Our hybridization identifies a slice of data where humans > AI. Here, rater assistance helps!
December 24, 2024 at 12:01 AM
Combining judgments from human raters and AI raters working in isolation, called Hybridization, can be a useful technique to achieve complementarity.

We’ve found that confidence-based hybridization (using the AI’s ratings when it is confident, and human ratings otherwise) achieves complementarity!
December 24, 2024 at 12:01 AM
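As a rough illustration, here is a minimal sketch of what confidence-based hybridization could look like. The threshold, field names, and the routing of low-confidence items to human raters are assumptions made for the example, not the exact setup from our work.

```python
from dataclasses import dataclass

@dataclass
class Rating:
    ai_rating: float      # judgment from the AI rater, produced in isolation
    ai_confidence: float  # AI's confidence in its own rating, in [0, 1] (assumed scale)
    human_rating: float   # judgment from the human rater, produced in isolation

def hybridize(item: Rating, threshold: float = 0.8) -> float:
    """Confidence-based hybridization (illustrative sketch):
    use the AI's rating when it is confident, and the human's rating otherwise.
    The low-confidence slice is where humans tend to do better, and is also
    where rater assistance can be applied."""
    if item.ai_confidence >= threshold:
        return item.ai_rating
    return item.human_rating

# Example usage with made-up numbers:
items = [
    Rating(ai_rating=1.0, ai_confidence=0.95, human_rating=0.0),  # confident AI -> use AI
    Rating(ai_rating=1.0, ai_confidence=0.40, human_rating=0.0),  # unsure AI -> use human
]
print([hybridize(item) for item in items])  # [1.0, 0.0]
```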
Achieving complementarity can be quite hard! A key issue is over-reliance: how do we get humans to use the AI appropriately, rather than just defaulting to its outputs? And this problem gets worse in settings where AI > Humans. But there is hope!
December 24, 2024 at 12:01 AM
Rater Assistance is not so useful if the combined Human-AI team doesn’t outperform humans or AI alone. Restated, the goal is to achieve Human-AI Complementarity. Fundamentally, this is a Human-Computer Interaction (HCI) problem!
December 24, 2024 at 12:01 AM
This is the field of Amplified Oversight (a subfield of Scalable Oversight). Much of the past work in this field, such as critiques, debate, and iterative amplification, has focused on Rater Assistance: assisting and enabling human raters to properly evaluate AI outputs.
December 24, 2024 at 12:01 AM
As AI is able to perform increasingly challenging tasks, how do we make sure we’re able to properly evaluate its outputs so that we can accurately align the model to human values via e.g. RLHF? Relying on humans alone for this will be hard on tasks such as summarizing 1M pages.
December 24, 2024 at 12:01 AM
Read our blog for the full details: deepmindsafetyresearch.medium.com/human-ai-com...

Here’s a quick summary:
December 24, 2024 at 12:01 AM