William Jurayj
williamjurayj.bsky.social
PhD student at Johns Hopkins CLSP (@jhuclsp.bsky.social).
Researching natural and formal language processing.

williamjurayj.com
and here I was thinking you were out at the Opera 🤯
March 1, 2025 at 11:38 PM
It's been a joy working with @jeff-cheng.bsky.social & Ben Van Durme on this project. And huge thanks to @alexmartin314.bsky.social, @miriamsw.bsky.social, @marcmarone.com, @orionweller.bsky.social, and everyone else who gave very helpful feedback over the past weeks.
February 20, 2025 at 3:14 PM
To our knowledge, this is the first work to raise this point in the new area of LLM test-time scaling, but the community has been aware of the issue for a long time. E.g., the Watson effort on Jeopardy, and a push by Jordan Boyd-Graber to reward systems that hold back dubious answers.
February 20, 2025 at 3:14 PM
We propose the standard evaluation format of “Jeopardy odds”: win a point when you’re right, lose a point when you’re wrong. Here we see compute scaling distinctions that were hidden when evaluating under a zero-risk setting. Selection functions matter!
February 20, 2025 at 3:14 PM
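A minimal sketch of how scoring under "Jeopardy odds" might look, assuming a confidence-threshold selection function and assuming that a declined question scores zero (the post only states the win/lose rule). The `Prediction` type, the `jeopardy_score` helper, and the toy data are hypothetical illustrations, not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    confidence: float  # model's confidence in its answer, in [0, 1]
    correct: bool      # whether the answer matched the reference


def jeopardy_score(predictions, threshold):
    """Sum +1 per correct answer and -1 per wrong answer.

    Assumption: predictions below `threshold` are withheld and score 0.
    """
    total = 0
    for p in predictions:
        if p.confidence < threshold:
            continue  # abstain: no point won or lost (assumed)
        total += 1 if p.correct else -1
    return total


# Made-up toy data for illustration only.
toy = [
    Prediction(0.95, True),
    Prediction(0.80, True),
    Prediction(0.60, False),
    Prediction(0.55, True),
    Prediction(0.30, False),
]

print(jeopardy_score(toy, 0.0))  # answer everything: 3 right, 2 wrong -> 1
print(jeopardy_score(toy, 0.7))  # answer only when confident: 2 right -> 2
```

With a threshold of 0.0 this reduces to answering everything, where the two wrong answers cancel out two of the correct ones. Raising the threshold trades coverage for fewer penalties, which is the kind of difference a zero-risk evaluation would not show.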
We test DeepSeek-R1 and find that scaling test-time compute can substantially increase a model’s confidence in correct answers, drawing a wider gap between correct and incorrect answers.
February 20, 2025 at 3:14 PM
You might look into behavior cloning agents, which is a pretty robust space (e.g. arxiv.org/abs/2209.05451)

I could be misunderstanding what you're looking for though, since this feels very different from the CogAI/SOAR items you point to.
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Transformers have revolutionized vision and natural language processing with their ability to scale with large datasets. But in robotic manipulation, data is both limited and expensive. Can manipulati...
arxiv.org
December 31, 2024 at 12:54 AM
In many ways, the Vision Pro hits on both categories.
November 27, 2024 at 6:00 PM
At this point, I would probably buy a cellular phone that they made
November 27, 2024 at 5:54 PM
I think the 17th-century English were more likely to be enjoying tea than coffee
November 27, 2024 at 5:52 PM
👋
November 25, 2024 at 4:24 PM
Did you recently visit an Apple store?
November 25, 2024 at 4:21 AM
I saw this happen live, it was tragic
November 25, 2024 at 4:10 AM