floriantramer.bsky.social
@floriantramer.bsky.social
Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning.
https://floriantramer.com
https://spylab.ai
Reposted
This was an unfortunate mistake, sorry about that.

But the conclusions of our paper don't change drastically: there is significant gradient masking (as shown by the transfer attack) and the CIFAR robustness is at most in the 15% range. Still cool though!
We'll see if we can fix the full attack
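As a rough illustration of the transfer check mentioned above, here is a minimal sketch (assuming PyTorch models `surrogate` and `defended` and a CIFAR-10 test loader; all names and hyperparameters are illustrative, not the paper's code): craft PGD examples on an undefended surrogate and measure how much they hurt the defended model. If they transfer better than white-box PGD succeeds, the defense's gradients are likely masked.

```python
# Minimal sketch of a transfer-based check for gradient masking.
# `surrogate` (undefended) and `defended` (the evaluated defense) are assumed
# to be standard PyTorch classifiers in eval mode; `loader` yields CIFAR-10
# batches in [0, 1]. None of this is the paper's actual code.
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Standard L-infinity PGD on the model's cross-entropy loss."""
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1).detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=1) == y).float().mean().item()

def transfer_check(defended, surrogate, loader, device="cuda"):
    """If examples crafted on the surrogate lower the defended model's accuracy
    more than white-box PGD on the defense itself, gradient masking is likely."""
    white_box, transfer = [], []
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        x_wb = pgd_attack(defended, x, y)   # white-box attack on the defense
        x_tr = pgd_attack(surrogate, x, y)  # attack crafted on the surrogate
        white_box.append(accuracy(defended, x_wb, y))
        transfer.append(accuracy(defended, x_tr, y))
    print(f"white-box acc: {sum(white_box)/len(white_box):.3f}, "
          f"transfer acc: {sum(transfer)/len(transfer):.3f}")
```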
December 12, 2024 at 4:38 PM
Reposted
Full paper: arxiv.org/abs/2410.13722
Amazing collaboration with Yiming Zhang during our internships at Meta.

Grateful to have worked with Ivan, Jianfeng, Eric, Nicholas, @floriantramer.bsky.social and Daphne.
Persistent Pre-Training Poisoning of LLMs
Large language models are pre-trained on uncurated text datasets consisting of trillions of tokens scraped from the Web. Prior work has shown that: (1) web-scraped pre-training datasets can be practic...
arxiv.org
November 25, 2024 at 12:27 PM
Yeah they mostly are
November 25, 2024 at 10:12 AM
probably -> provably...
November 23, 2024 at 8:42 AM
This was the motivation for our work on consistency checking (superhuman) models: arxiv.org/abs/2306.09983

We tested chess models, for instance, and could show many cases where the model is probably wrong in one of two instances (we just don't know which one)
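To make the idea concrete, here is a minimal sketch of one such consistency check, assuming a hypothetical scoring function `evaluate(board) -> float` that returns a score from the side to move's perspective (this function is an assumption for illustration, not the paper's code): a position and its color-mirrored twin are strategically identical, so any disagreement between the two scores proves the model is wrong on at least one of them, even without knowing the true evaluation.

```python
# Minimal sketch of a symmetry-based consistency check for a chess evaluator,
# using python-chess. `evaluate` is a hypothetical model interface.
import chess

def mirror_consistency_violation(evaluate, fen, tol=0.1):
    board = chess.Board(fen)
    mirrored = board.mirror()  # flip the board vertically and swap colors
    score = evaluate(board)
    score_mirrored = evaluate(mirrored)
    # The two positions are equivalent, so a gap beyond `tol` is a provable
    # inconsistency: at least one of the two scores must be wrong.
    return abs(score - score_mirrored) > tol
```

The same pattern extends to other invariances or logical constraints (e.g., related queries whose answers must agree), where a disagreement flags an error without any ground truth.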
Evaluating Superhuman Models with Consistency Checks
If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor...
arxiv.org
November 23, 2024 at 6:16 AM