Shaily
@shaily99.bsky.social
PhDing at LTI, CMU
Prev: Ai2, Google Research, MSR
Evaluating language technologies, regularly ranting, and probably procrastinating.
https://sites.google.com/view/shailybhatt/
Reposted by Shaily
We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀

Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently...
arxiv.org
November 7, 2025 at 5:07 PM
It cannot be defined, only experienced!
October 24, 2025 at 3:55 AM
I now have a Notion page that I made when I did this ages ago, and I blindly follow 2023 me and am grateful to her.
October 8, 2025 at 1:31 PM
You guys are writing at 9 AM !!!!!!!!!!!!!!!!!
September 29, 2025 at 9:57 PM
Reposted by Shaily
I've written really terrible paragraphs that have made me want to stop at 9 AM.
September 26, 2025 at 2:08 PM
@danishpruthi.bsky.social
Kalika Bali at MSR (don't think she's on here)
September 21, 2025 at 12:39 AM