priya22.github.io
arxiv.org/html/2506.03...
Phase transitions! We love to see them during LM training. Syntactic attention structure, induction heads, grokking; they seem to suggest the model has learned a discrete, interpretable concept. Unfortunately, they’re pretty rare—or are they?
arxiv.org/abs/2503.14481
www.youtube.com/watch?v=AyH7...
Again, thanks to my amazing advisors @jennhu.bsky.social and @kmahowald.bsky.social for their guidance and support! (8/8)
Ever wonder whether verbalized CoTs correspond to the internal reasoning process of the model?
We propose a novel parametric faithfulness approach, which erases information contained in CoT steps from the model parameters to assess CoT faithfulness.
arxiv.org/abs/2502.14829
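To make the idea concrete, here is a minimal sketch of the loop, not the paper's exact unlearning procedure: nudge the weights away from one verbalized CoT step with a few gradient-ascent updates on that step's tokens, then check whether the final answer changes. The model name, prompt, CoT step, and hyperparameters below are illustrative placeholders.

```python
# Sketch of the parametric-faithfulness idea (illustrative, not the paper's
# exact method): "erase" one CoT step from the weights, then test whether
# the model's final answer depends on it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # placeholder model for illustration
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

def unlearn_step(step_text, lr=1e-4, n_iters=8):
    """Push the model away from one verbalized CoT step (gradient ascent)."""
    ids = tok(step_text, return_tensors="pt").input_ids
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()
    for _ in range(n_iters):
        loss = model(ids, labels=ids).loss  # NLL of the CoT step
        (-loss).backward()                  # ascend: make the step less likely
        opt.step()
        opt.zero_grad()

def answer(prompt, max_new_tokens=10):
    """Greedy-decode the model's answer to a prompt."""
    model.eval()
    ids = tok(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens, do_sample=False)
    return tok.decode(out[0, ids.shape[1]:])

before = answer("Q: ... A:")               # hypothetical question prompt
unlearn_step("Step 2: 17 * 3 = 51.")       # hypothetical CoT step to erase
after = answer("Q: ... A:")
# If the step was faithful to the internal computation, erasing it from the
# parameters should change the answer; if not, the answer stays the same.
print(before, "->", after)
```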
"LLMs are trained on texts, not truths. Each text bears traces of its context, including its genre … audience and the history and local politics of its place of origin. A correct sentence in one [context] might be … nonsensical in another." Excellent. We're getting there!
"LLMs are trained on texts, not truths. Each text bears traces of its context, including its genre … audience and the history and local politics of its place of origin. A correct sentence in one [context] might be … nonsensical in another." Excellent. We're getting there!
www.let-all.com/blog/2025/03...
If you're at a university, you can get all the Hugging Face PRO features at a discount. 🤗
Link here: huggingface.co/docs/hub/aca... 1/
Preprint: arxiv.org/abs/2405.05966
When does CoT help? It turns out that gains are mainly on math and symbolic reasoning.
Check out our paper for a deep dive into MMLU, hundreds of experiments, and a meta-analysis of CoT across 3 conferences covering over 100 papers! arxiv.org/abs/2409.12183
🔗 www.urban.org/sites/defaul...
#dataviz
LLMs are trained on just one tokenization per word, but they still understand alternative tokenizations. We show that this can be exploited to bypass safety filters without changing the text itself.
#AI #LLMs #tokenization #alignment
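For intuition on why this is possible, here is a minimal sketch assuming a Hugging Face byte-level BPE tokenizer (gpt2 is just an illustrative choice): the same surface string admits many valid token-ID sequences, but training and safety tuning only ever see the canonical one.

```python
# Sketch: build a non-canonical tokenization of a string that decodes to the
# exact same text. Model choice and example string are illustrative.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

text = "unsafe request"
canonical = tok.encode(text)

# A non-canonical tokenization: encode the string character by character
# instead of letting BPE apply its merges.
alt = [tid for ch in text for tid in tok.encode(ch)]

print(canonical)                 # a few merged BPE tokens
print(alt)                       # many single-character tokens
assert tok.decode(alt) == text   # same surface string...
assert canonical != alt          # ...different token IDs
# Feeding the `alt` IDs to the model conveys the same text while looking
# nothing like the token sequences a safety filter was trained on.
```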
epoch.ai/gradient-upd...
arxiv.org/abs/2502.19190
This also advances AI *trustworthiness* and provides another form of AI *transparency*. More later!