Lightnews — Scholar-powered news

Shiwali Mohan

@shiwali.bsky.social

Shiwali Mohan

@shiwali.bsky.social

#AI #ML evals measure accuracy on benchmarks, telling us how algorithms compare with each other but not much about how an #IntelligentSystem should be built. How do we make evals more informative? (1/8)

📖 Paper: arxiv.org/abs/2402.00234
🎥 Talk: drive.google.com/file/d/1m79W...

March 24, 2025 at 7:28 PM

Shiwali Mohan

@shiwali.bsky.social

At #AAAI2025? Looking for #AI #ML research beyond #GenAI hype & doom? Excited about #AI running on your laptop? Listen to my colleague Wiktor Piotrowski talk about #OpenWorldLearning #OWL at 9:30 am on Feb 28th (Journal Track).

arxiv.org/abs/2306.06272
#AIPlanning #MBR #CognitiveSystems #KRR

An architecture showing how a planning agent can be extended with metareasoning.

February 26, 2025 at 8:14 PM

Reposted by Shiwali Mohan

Jessica Hullman

@jessicahullman.bsky.social

Wait, are the AnthropicAI people seriously claiming to “unlock a rich theoretical landscape” for AI evaluation by proposing the use of… error bars? And this secret trove of deep statistical insight starts with “use the Central Limit Theorem”?

Befuddling

November 27, 2024 at 10:09 PM

Shiwali Mohan

@shiwali.bsky.social

When your experiments show that your #AI is more human than humans, it is not that you have built #AGI or #SuperIntelligence; it is that you don't know how to evaluate, experiment, and measure.

November 27, 2024 at 9:28 PM
