Interested in language, reasoning, semantics and cognitive science. One day we'll have more efficient, interpretable and robust models!
Other interests: math, philosophy, cinema
https://www.juandiego-rodriguez.com/
🎯 We demonstrate that ranking-based discriminator training can significantly reduce this gap, and improvements on one task often generalize to others!
🧵👇
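For intuition, here is a minimal sketch of what ranking-based discriminator training could look like: a pairwise margin objective that pushes the score of a grammatical sentence above the score of its ungrammatical counterpart. This is an illustrative PyTorch sketch under assumptions, not the paper's code; `score_model` is a hypothetical module mapping a batch of sentences to one scalar acceptability score each.

```python
import torch
import torch.nn as nn

def ranking_step(score_model, good_batch, bad_batch, optimizer, margin=1.0):
    """One step of pairwise margin-ranking training: require
    score(grammatical) > score(ungrammatical) by at least `margin`."""
    loss_fn = nn.MarginRankingLoss(margin=margin)
    good_scores = score_model(good_batch)   # shape: (batch,)
    bad_scores = score_model(bad_batch)     # shape: (batch,)
    target = torch.ones_like(good_scores)   # +1 means "first input should rank higher"
    loss = loss_fn(good_scores, bad_scores, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The appeal of a ranking loss over plain classification is that it directly optimizes the ordering the evaluation cares about (grammatical above ungrammatical) rather than absolute probabilities.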
Language models (LMs) are remarkably good at generating novel well-formed sentences, leading to claims that they have mastered grammar.
Yet they often assign higher probability to ungrammatical strings than to grammatical strings.
How can both things be true? 🧵👇
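To see the mismatch concretely, here is a minimal sketch (assuming the HuggingFace transformers API, with GPT-2 as a stand-in model; the minimal pair is illustrative, not taken from the paper) comparing total log-probabilities of a grammatical/ungrammatical pair:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def sentence_logprob(sentence: str) -> float:
    """Total log P(sentence) = sum of log P(token_i | tokens_<i)."""
    ids = tok(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy
        # over the scored positions; multiply back to get the total log-prob.
        out = model(ids, labels=ids)
    n_scored = ids.size(1) - 1  # the first token has no left context
    return -out.loss.item() * n_scored

grammatical = "The keys to the cabinet are on the table."
ungrammatical = "The keys to the cabinet is on the table."
print(sentence_logprob(grammatical), sentence_logprob(ungrammatical))
# If the LM had fully "mastered grammar", the first number should always be
# higher; on many minimal pairs the ordering flips.
```

Comparing raw summed log-probabilities is only fair when the two strings are of (nearly) equal length, as in this agreement pair; length-normalized or token-matched scoring is the safer general recipe.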
🎥🔗 Livestream Link: aiscienceconference.caltech.edu
At 10:30am PST / 12:30pm CT, we’ll be awarding the Margot and Tom Pritzker Prize for AI in Science Research Excellence
www.reuters.com/investigatio...
We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy on Oolong at a 128K-token context length!
Also delighted the ACL community continues to recognize unabashedly linguistic topics like filler-gaps... and the huge potential for LMs to inform such topics!
www.liberalcurrents.com/deflating-hy...
We trained three models (1.5B, 8B, and 24B parameters) from scratch on 2-4T tokens of custom data
(TLDR: we cheat and get good scores)
@wissamantoun.bsky.social @rachelbawden.bsky.social @bensagot.bsky.social @zehavoc.bsky.social
It was all made up. The file was fine. There was no problem. WTF Claude
I know white tech bros disagree as they continue to collapse our worlds.
news.sky.com/story/the-x-...
In new work yesterday, @arnabsensharma.bsky.social et al. identify a data type for *predicates*.
bsky.app/profile/arn...