jimrandomh.bsky.social
@jimrandomh.bsky.social
The last time I checked in, the most promising technique we had was Sparse Autoencoders (www.lesswrong.com/tag/sparse-a...). This is very much on the "kinda-sorta working" side, not the actually-working side.
Sparse Autoencoders (SAEs) - LessWrong
Sparse Autoencoders (SAEs) are an unsupervised technique for decomposing the activations of a neural network into a sum of interpretable components (often referred to as features). Sparse Autoencoders...
www.lesswrong.com
January 21, 2025 at 1:55 AM
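For concreteness, a minimal sketch of the SAE idea from the linked tag page: activations are encoded into an overcomplete dictionary of sparse, non-negative features and then reconstructed as a sum of feature directions. This assumes PyTorch; the dimensions and the L1 coefficient are illustrative stand-ins, not taken from the post.

```python
# Minimal Sparse Autoencoder sketch over model activations (assumed PyTorch;
# d_model, d_dict, and the L1 coefficient are illustrative choices).
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_dict: int):
        super().__init__()
        # Encoder maps activations into an overcomplete feature dictionary.
        self.encoder = nn.Linear(d_model, d_dict)
        # Decoder reconstructs the activation as a sum of feature directions.
        self.decoder = nn.Linear(d_dict, d_model)

    def forward(self, x: torch.Tensor):
        features = torch.relu(self.encoder(x))  # sparse, non-negative codes
        recon = self.decoder(features)
        return recon, features

sae = SparseAutoencoder(d_model=512, d_dict=4096)
acts = torch.randn(64, 512)  # stand-in for real network activations
recon, feats = sae(acts)
# Reconstruction loss plus an L1 penalty that pushes most features to zero,
# which is what makes the learned components sparse (and hopefully interpretable).
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().sum(dim=-1).mean()
loss.backward()
```

The L1 term is the whole trick: without it, the overcomplete dictionary would just memorize, and the learned directions would not decompose into interpretable features.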
In theory, if we had neural-net interpretability that fully worked, as opposed to kinda-sorta working, this would resolve many of the hard parts of AI alignment, and it would then be safe to go ahead and build God.
January 21, 2025 at 1:55 AM
You can convert a neural network to a smaller neural network (or a program), but not losslessly. This is a pretty active area of research within Mechanistic Interpretability, because ideally the simplified network will be more amenable to reverse-engineering.
January 21, 2025 at 1:55 AM
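One standard instance of such a lossy conversion is knowledge distillation, where a smaller "student" network is trained to match the outputs of the original. The post doesn't name distillation specifically, so this is a sketch of one such technique, assuming PyTorch; the architectures and temperature are illustrative stand-ins.

```python
# Minimal knowledge-distillation sketch: compress a network into a smaller one
# by matching its output distribution (assumed PyTorch; sizes are illustrative).
import torch
import torch.nn as nn
import torch.nn.functional as F

teacher = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10))
student = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))

x = torch.randn(256, 128)  # stand-in for training inputs
with torch.no_grad():
    teacher_logits = teacher(x)  # the behavior we are trying to preserve

T = 2.0  # softening temperature
# KL divergence between softened output distributions. Whatever gap remains
# after training is exactly the "not losslessly" part of the conversion.
loss = F.kl_div(
    F.log_softmax(student(x) / T, dim=-1),
    F.softmax(teacher_logits / T, dim=-1),
    reduction="batchmean",
) * T * T
loss.backward()
```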
That's not an American you were talking to, that's a Belgian. Or possibly a Russian troll pretending to be a Belgian; it's hard to tell, but a keyword search of his posts for "Ukraine" turns up results that are not inconsistent with that hypothesis.
December 16, 2024 at 6:55 AM