Shiwali Mohan
shiwali.bsky.social
Founder | AI Scientist | Intelligent Agents & Multi-Agent Systems | Agent Frameworks & Architectures | Human-Agent Collaboration | Cognitive Science
#AI #ML evals measure accuracy on benchmarks, telling us how algorithms compare with each other. But they tell us little about how an #IntelligentSystem should be built. How do we make evals more informative? (1/8)

📖 Paper: arxiv.org/abs/2402.00234
🎥Talk: drive.google.com/file/d/1m79W...
March 24, 2025 at 7:28 PM
At #AAAI2025? Looking for #AI #ML research beyond #GenAI hype & doom? Excited about #AI running on your laptop? Listen to my colleague Wiktor Piotrowski talk about #OpenWorldLearning #OWL at 9:30 am on Feb 28th (Journal Track).

arxiv.org/abs/2306.06272
#AIPlanning #MBR #CognitiveSystems #KRR
February 26, 2025 at 8:14 PM
Reposted by Shiwali Mohan
Wait, are the AnthropicAI people seriously claiming to “unlock a rich theoretical landscape” for AI evaluation by proposing the use of… error bars? And this secret trove of deep statistical insight starts with “use the Central Limit Theorem”?

Befuddling
November 27, 2024 at 10:09 PM
When your experiments show that your #AI is more human than humans, it is not that you have built #AGI or #SuperIntelligence, it is that you don't know how to evaluate, experiment, and measure.
November 27, 2024 at 9:28 PM