Alon Jacoby
alon-j.bsky.social
PhD student @ Penn
alonj.github.io
It's also a good reminder that even really impressive models can be surprisingly susceptible to very simple surface-level perturbations.
The original FlenQA paper here -
arxiv.org/abs/2402.14848
Same Task, More Tokens: the Impact of Input Length on the Reasoning Performance of Large Language Models
May 7, 2025 at 2:07 PM
Obviously, be sensible. If you're not willing to send your code to third parties (OpenAI, Google, etc.), don't use `-s` (or `--summary`). Everything else is done locally.
February 2, 2025 at 11:20 PM
If you specify `-s` when running the script, an LLM will summarize the diff (3 models are implemented, but you can easily add more). If that sounds useful - because, like me, you need a worse version of git - check out github.com/alonj/pydift
or via
`pip install pydift`
February 2, 2025 at 11:20 PM
This is one of a few neat ideas in
@yulislavutsky.bsky.social 's work to learn robust representations in @neuripsconf.bsky.social '24. Definitely worth reading if you're also interested in robustness: neurips.cc/virtual/2024...
NeurIPS Poster: Class Distribution Shifts in Zero-Shot Learning: Learning Robust Representations (NeurIPS 2024)
December 11, 2024 at 12:22 AM
Say we collected a multi-hop reasoning QA dataset. Inevitably, the samples will have some attributes that we didn't/can't control for (domain, length of text, difficulty, etc).
In small enough sub-samples, just as inevitably, the minority attributes will sometimes become the majority.
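A toy simulation makes this concrete (the 80/20 attribute split and sub-sample size here are made-up numbers for illustration, not from any real dataset):

```python
import random

random.seed(0)

# Toy QA dataset: 80% of samples come from a "science" domain,
# 20% from "history" (the uncontrolled minority attribute).
dataset = ["science"] * 80 + ["history"] * 20

def minority_flip_rate(data, subsample_size=5, trials=10_000):
    """Fraction of small sub-samples in which the minority
    attribute ("history") ends up as the sub-sample majority."""
    flips = 0
    for _ in range(trials):
        sub = random.sample(data, subsample_size)
        if sub.count("history") > subsample_size / 2:
            flips += 1
    return flips / trials

rate = minority_flip_rate(dataset)
print(f"Minority becomes majority in {rate:.1%} of sub-samples of 5")
```

Even with a 20% minority, a non-trivial fraction of 5-item sub-samples are majority-minority, which is exactly why per-sub-sample evaluations can drift from the full dataset's attribute mix.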
December 11, 2024 at 12:22 AM
Yes, hi, hello.
December 4, 2024 at 7:26 PM