Maciej Wołczyk
maciejwolczyk.bsky.social
Research Scientist @ Google, alumnus of Jagiellonian University and IDEAS NCBR.
Reposted by Maciej Wołczyk
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?

🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!

BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.

1/🧵
November 21, 2024 at 4:24 PM
Reposted by Maciej Wołczyk
I have created a starter pack for the Polish ML community. Let's grow it together! Let me know if there is someone who should be included.
go.bsky.app/TSkgFjK
November 21, 2024 at 8:13 AM