GPT-4o and ChatGPT this morning, finally gave me the selfie with a bear I've always wanted simonwillison.net/2025/Mar/25/...
No loving parent would ever write such a disgusting thing as this, and it speaks volumes.
LLMs use fixed strategies for all questions.
This is inefficient for complex reasoning. This paper introduces self-taught lookahead (STL). It improves value estimation in language models by learning ...
The result is a big drop in accuracy for most models, though Reasoners (o3 & DeepSeek) hold up much better arxiv.org/pdf/2502.12896
Have we really thought about what this will do to libraries and databases? What about those in analytical and research professions?
www.theverge.com/news/604902/...
Nice way to test when #AI can replace human evaluators & judges: it checks whether #llms align better with group consensus than individual human evaluators do.
#GPT-4 & #Gemini pass the test on 8/10 tasks but struggle with deep contextual understanding […]
[Original post on mstdn.social]
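The post only outlines the comparison, so here is a minimal sketch of the general idea, not the paper's actual procedure. It assumes categorical labels, a plurality-vote consensus, and exact-match agreement. The names `plurality` and `leave_one_out_win_rate` are hypothetical. Each human evaluator is held out in turn. The consensus of the remaining evaluators serves as the reference, and the sketch checks whether the model matches that reference at least as often as the held-out human does.

```python
# Hypothetical sketch: does a model agree with group consensus at least as
# often as an individual human evaluator does? Assumes categorical labels and
# plurality-vote consensus; this is NOT the cited paper's exact test.
from collections import Counter


def plurality(labels):
    """Most common label; ties go to the label encountered first."""
    return Counter(labels).most_common(1)[0][0]


def leave_one_out_win_rate(human_labels, model_labels):
    """
    human_labels[h][i]: evaluator h's label for item i (assumed layout).
    model_labels[i]:    the model's label for item i.
    Returns the fraction of held-out evaluators whose agreement with the
    consensus of the *other* evaluators is no higher than the model's.
    """
    n_humans = len(human_labels)
    if n_humans < 2:
        raise ValueError("need at least two human evaluators")
    wins = 0
    for h in range(n_humans):
        human_hits = model_hits = 0
        for i, model_label in enumerate(model_labels):
            # Consensus of everyone except the held-out evaluator h.
            others = [human_labels[o][i] for o in range(n_humans) if o != h]
            consensus = plurality(others)
            human_hits += human_labels[h][i] == consensus
            model_hits += model_label == consensus
        if model_hits >= human_hits:
            wins += 1
    return wins / n_humans


if __name__ == "__main__":
    humans = [["a", "b", "a", "a"],
              ["a", "b", "b", "a"],
              ["a", "a", "a", "a"]]
    model = ["b", "b", "a", "a"]
    print(leave_one_out_win_rate(humans, model))  # → 0.6666666666666666
```

Holding out one evaluator at a time keeps the comparison fair. The human is never scored against a consensus that includes their own vote, and the model is scored against exactly the same reference labels.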