Eduardo Pignatelli
epignatelli.com
Assistant professor (UK Lecturer) at @UCL. PhD at @UCL. Past architect. Previously ML Lead at @burohappold. RL, credit assignment, reward-genesis.
Great to see BALROG on @bsky.app as well!
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?

🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!

BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.

1/🧵
November 25, 2024 at 3:00 PM
Reposted by Eduardo Pignatelli
Tired of saturated benchmarks? Want scope for a significant leap in capabilities?

🔥 Introducing BALROG: a Benchmark for Agentic LLM and VLM Reasoning On Games!

BALROG is a challenging benchmark for LLM agentic capabilities, designed to stay relevant for years to come.

1/🧵
November 21, 2024 at 4:24 PM