My concern is that we don't have answers to many of the open questions about, or limitations of, even vanilla LLMs, so throwing a bunch of them together in more complex ways seems... dumb?
What made o3 so much better than previous models on this benchmark?
anokas.substack.com/p/llms-strug...
firstmonday.org/ojs/index.ph...
"With the exception of ChatGPT 4o, almost all large language models subjected to the MoCA test showed signs of mild cognitive impairment."
www.bmj.com/content/387/...
'Moreover, as in humans, age is a key determinant of cognitive decline: “older” chatbots, like older patients, tend to perform worse on the MoCA test.'
knightcolumbia.org/blog/we-look...
(Cross-posted to AI Snake Oil aisnakeoil.com/p/we-looked-...)
Claude gave us Za'atar Roasted Cauliflower with Whipped Feta and ChatGPT gave us Stuffed Acorn Squash with Quinoa, Kale, and Goat Cheese. Which was better? 🧵
The video is up now...
www.youtube.com/watch?v=YKMZ...