lukaszkucinski.bsky.social
Reposted
How would you evaluate a new causal discovery method? A new paper by Brouillard et al. challenges the common approaches and suggests a rethink. Here’s what they found 🧵👇
arxiv.org/abs/2412.01953

#CausalSky
The Landscape of Causal Discovery Data: Grounding Causal Discovery in Real-World Applications
Causal discovery aims to automatically uncover causal relationships from data, a capability with significant potential across many scientific disciplines. However, its real-world applications remain l...
December 5, 2024 at 3:26 PM
Reposted
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?

🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!

BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.

1/🧵
November 21, 2024 at 4:24 PM