Alon Jacoby
alon-j.bsky.social
PhD student @ Penn
alonj.github.io
The Phi 4 Reasoning technical report is a good reminder that current models still suffer massive performance degradation when reasoning tasks get longer - even at just 3K tokens!
They use FlenQA (w/ @moshlevy.bsky.social) to show that their model improves substantially here.
arxiv.org/abs/2504.21318
May 7, 2025 at 2:07 PM
Sometimes I want to track small changes in code without too much hassle, so I made pydift: replace "python script.py" with "pydift script.py", and diffs from previous runs will be saved automatically.
February 2, 2025 at 11:19 PM
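
The idea in the pydift post above can be sketched roughly as follows. This is not pydift's actual implementation, just a minimal illustration assuming the tool snapshots the script on each run and diffs it against the previous snapshot; the function names, the `.last` snapshot suffix, and the store directory are all hypothetical.

```python
import difflib
import subprocess
import sys
from pathlib import Path


def diff_against_last_run(script: Path, store: Path) -> str:
    """Return a unified diff of `script` against the snapshot saved on the
    previous call, then overwrite the snapshot with the current contents."""
    store.mkdir(parents=True, exist_ok=True)
    snapshot = store / (script.name + ".last")  # hypothetical naming scheme

    new_lines = script.read_text().splitlines(keepends=True)
    old_lines = (
        snapshot.read_text().splitlines(keepends=True) if snapshot.exists() else []
    )

    diff = "".join(
        difflib.unified_diff(
            old_lines, new_lines, fromfile="previous run", tofile="this run"
        )
    )
    snapshot.write_text("".join(new_lines))
    return diff


def run_with_diff(script: Path, store: Path, args: list[str]) -> int:
    """Record the diff since the last run, then run the script as usual."""
    diff = diff_against_last_run(script, store)
    if diff:
        # Append so that diffs from earlier runs are kept as a simple history.
        with open(store / (script.name + ".diffs"), "a") as log:
            log.write(diff + "\n")
    return subprocess.run([sys.executable, str(script), *args]).returncode
```

Calling `run_with_diff(Path("script.py"), Path(".diffs"), [])` in place of `python script.py` would then leave a running log of changes between runs in the (assumed) `.diffs` directory.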