Our new paper, 📎“Who Evaluates AI’s Social Impacts? Mapping Coverage and Gaps in First and Third Party Evaluations,” analyzes hundreds of evaluation reports and reveals major blind spots ‼️🧵 (1/7)
openai.com/index/buildi...
more than two-thirds of it is from the far-right.
fortune.com/2025/08/14/w...
@eleutherai.bsky.social and the UK AISI joined forces to see what would happen, pretraining three 6.9B models for 500B tokens and producing 15 total models to study
Philadelphians are trying to preserve or archive these sites before it's too late.
so I made alerts for all my advisees and now I get an email when they have a paper out
maybe folks already do this and I'm late to the game but honestly those alerts feel great, esp when it's a long-gone advisee
This is also making me wonder about the list of models to hold the title "most powerful open source LLM in the world." GPT-2 > GPT-Neo > GPT-J > FairSeq Dense > GPT-NeoX-20B > MPT-7B > Falcon-40B > ??? > DeepSeek-R1
Everyone loves causal interp. It’s coherently defined! It makes testable predictions about mechanistic interventions! But what if we had a different objective: predicting model behavior not under mechanistic interventions, but on unseen input data?
Our first talk is by @catherinearnett.bsky.social on tokenizers, their limitations, and how to improve them.
>>
Scientists scramble to save threatened federal research databases pubs.aip.org/physicstoday...
I'm really surprised I can't find any papers that dig into this; it's usually a side comment. Do you know any?