Goodfire and Rakuten used sparse autoencoders (SAEs, a mech interp thing) to detect PII
an SAE is a secondary model trained on the primary LLM's activations; this 2nd model tells you which “features” were activated. They also used a random forest.
www.goodfire.ai/research/rak...
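A minimal sketch of what "tells you which features were activated" can look like, assuming the common SAE encoder form f = ReLU(W_enc · (x − b_dec) + b_enc). The weights, sizes, and the `sae_encode` / `active_features` helpers are made-up placeholders, not Goodfire's or Rakuten's code. A real SAE learns its weights and its sparsity during training; here they are random.

```python
import random

def sae_encode(x, w_enc, b_enc, b_dec):
    """Map one LLM activation vector to non-negative SAE feature activations."""
    centered = [xi - bi for xi, bi in zip(x, b_dec)]
    feats = []
    for row, bias in zip(w_enc, b_enc):
        pre = sum(w * c for w, c in zip(row, centered)) + bias
        feats.append(max(0.0, pre))  # ReLU: negative pre-activations become 0
    return feats

def active_features(feats, threshold=0.0):
    """Indices of the features that fired on this input."""
    return [i for i, f in enumerate(feats) if f > threshold]

# Hypothetical sizes and random weights, for illustration only.
rng = random.Random(0)
d_model, n_features = 8, 32
w_enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
b_enc = [-1.0] * n_features  # negative bias makes the features sparser
b_dec = [0.0] * d_model

x = [rng.gauss(0, 1) for _ in range(d_model)]  # stand-in for an LLM activation
feats = sae_encode(x, w_enc, b_enc, b_dec)
print(active_features(feats))
```

In a setup like the one described, a feature vector such as `feats` would be the input to a downstream classifier (for example a random forest) that decides whether the text contains PII.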
🐸🐸🐸
a tiny 3B-A0.5B MoE OCR model that runs fast on a single A100 40GB with very high precision and excellent compression
why it’s cool: they use images as a way to compress text and get around the O(n^2) cost of attention over long context
huggingface.co/deepseek-ai/...
1. (text) context gets longer as you add words
2. attention cost is quadratic in context length
3. you can fit lots of words in an image
4. if you use an encoder-decoder architecture, each token encodes a ton of information
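Rough arithmetic for the four points above, as a sketch. The 10,000-token document and the 10:1 text-to-vision token ratio are hypothetical numbers picked for illustration, not figures from the model card.

```python
def attention_pairs(n_tokens):
    """Number of query-key pairs that full self-attention scores: n^2."""
    return n_tokens * n_tokens

text_tokens = 10_000   # hypothetical document length in text tokens
compression = 10       # hypothetical text-token : vision-token ratio
vision_tokens = text_tokens // compression

# Ratio of attention work: text context vs. the same content as image tokens.
saving = attention_pairs(text_tokens) / attention_pairs(vision_tokens)
print(vision_tokens, saving)
```

Because the cost is quadratic, cutting the token count by a factor of k cuts the attention pairs by a factor of k^2, so 10x fewer tokens means 100x fewer pairs.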
If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.
3/9
Even if the images are not stored inside the model itself, their data was directly used to generate the model.
It is, at the very least, derivative.
notblackmagic.com/snippets/hd-...
Feedback is welcome!
When I have time, I want to test the encoders and compare their performance and quality.
late interaction models do embedding vector index queries and reranking at the same time, leading to far higher accuracy
huggingface.co/NeuML/colber...
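A minimal sketch of ColBERT-style late interaction scoring (MaxSim), assuming per-token embeddings for both the query and the documents. The toy 2-d vectors and the `maxsim` helper are made up for illustration and are not taken from the linked model.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_vecs, doc_vecs):
    """For each query token take its best-matching doc token, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy token embeddings: one vector per token, not one vector per text.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]]  # covers both query tokens
doc_b = [[0.9, 0.1], [0.8, 0.2]]              # covers only the first one

docs = {"doc_a": doc_a, "doc_b": doc_b}
ranked = sorted(docs, key=lambda name: maxsim(query, docs[name]), reverse=True)
print(ranked)
```

Scoring at the token level is what lets a single pass act like retrieval plus reranking: a document only scores well if it has a good match for every query token, which a single pooled vector can blur.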
A legitimate way for people in the US to watch Naruto.