Lightnews — Scholar-powered news

Hillary Sanders

@meatlearner.bsky.social

Machine-learner, meat-learner, research scientist, AI Safety thinker. Model trainer, skeptical adorer of statistics.

Co-author of: Malware Data Science

Hillary Sanders

@meatlearner.bsky.social

I was surprised at how clear-cut and blatant it was. I mean, two times in a row, closed fingers, correct angle.

Meanwhile, Musk has recently issued public support for the far-right AfD party, often described as antisemitic / extremist.

www.cnn.com/2024/12/20/m...

That + no apology...

Elon Musk endorses far-right German political party, wading deeper into global politics | CNN Business

Musk, the billionaire Trump ally who is playing a public role in the incoming administration, posted in support Friday of Alternative for Germany, or AfD, after the German government collapsed this we...

www.cnn.com

January 22, 2025 at 12:29 AM

Hillary Sanders

@meatlearner.bsky.social

Nice! Would love to be added (11 yrs in AI, co-author of Malware Data Science, love them NNs)

January 7, 2025 at 4:46 PM

Hillary Sanders

@meatlearner.bsky.social

Am I reading this right? Techniques to make the model safe again had almost no effect on non-small models :o.

December 3, 2024 at 9:46 PM

Hillary Sanders

@meatlearner.bsky.social

A response to X is (usually) going to be written by someone socially and politically near X's author, compared with some other random piece of content Y.

It's extremely hard to take sycophancy out of an LLM, trained the way we train them.

December 3, 2024 at 9:00 PM

Hillary Sanders

@meatlearner.bsky.social

Say a model learns strategy x to minimize training loss --> Later, min(test loss) involves strategy y, but the model regardless sticks with strat x (inner misalignment).

Assuming outer misalignment, x can be seen as safer than y.

That being said, the better the model, the less this will happen.

November 24, 2024 at 5:01 PM
