Ori Press
oripress.bsky.social
I yearn to deep learn
Graduate student at @bethgelab.bsky.social
oripress.com
Reposted by Ori Press
Do language models have algorithmic creativity?

To find out, we built AlgoTune, a benchmark challenging agents to optimize 100+ algorithms like gzip compression, AES encryption and PCA. Frontier models struggle, finding only surface-level wins. Lots of headroom here!
algotune.io
July 2, 2025 at 2:36 PM
Reposted by Ori Press
We are presenting CiteME today at the 11 AM poster session (East Exhibit Hall A-C, #3309).

CiteME is a challenging benchmark for LM-based agents to find paper citations, moving beyond simple multiple-choice Q&A to real-world use cases.

Come by and say hi :)

citeme.ai
CiteME
CiteME is a benchmark designed to test the abilities of language models in finding papers that are cited in scientific texts.
December 13, 2024 at 4:18 PM
Reposted by Ori Press
I'm on the academic job market!
I develop autonomous systems for programming, research-level question answering, finding security vulnerabilities, and other useful and challenging tasks.
I do this by building frontier-pushing benchmarks and agents that do well on them.
See you at NeurIPS!
December 4, 2024 at 4:52 PM