Ofir Press
ofirpress.bsky.social
I develop tough benchmarks for LMs and then build agents to try to beat those benchmarks. Postdoc @ Princeton University.

https://ofir.io/about
I hope this trend continues into 2025.
Healthy competition & knowledge sharing through papers will drive even faster progress.

I can't wait for open source 40B models that get 40% on SWE-bench Lite and 6% on SciCode. Hardware won't be as much of a limiting factor as we thought.
December 18, 2024 at 8:10 PM
x.com
December 4, 2024 at 3:58 AM