We introduce Oolong, a dataset of simple-to-verify information aggregation questions over long inputs. No model achieves >50% accuracy at 128K on Oolong!
🧵
One is: why is it that certain very small clusters of words are *clearly written by an LLM*? What is the quality of that writing?
www.wired.com/story/i-hate...
This is how AI is killing translation work:
www.thebookseller.com/news/ai-like...
We look at claims of "emergent capabilities" & "emergent intelligence" in LLMs from the perspective of what emergence means in complexity science.
arxiv.org/pdf/2506.11135
iterative process of thinking, outlining, writing, and refining to ensure coherence and quality” — arxiv.org/abs/2506.04180
don’t know where to begin with this