+ buttonpusher, switchpuller, hugdeserver
+ bad vocabulary
+ has a website aerophonic.neocities.org
a benchmark that poses impossible tasks to see if LLMs cheat
github.com/safety-resea...