Amanda Bertsch
@abertsch.bsky.social
PhD student @ CMU LTI. working on text generation + long context

https://www.cs.cmu.edu/~abertsch/
ooh, interesting! would the best xLSTM model to try be the xLSTM Large 7B?
November 10, 2025 at 3:57 PM
Thank you so much!
November 8, 2025 at 12:10 PM
We’re excited about Oolong as a challenging benchmark for information aggregation! Let us know which models we should benchmark next 👀

Paper: arxiv.org/abs/2511.02817
Dataset: huggingface.co/oolongbench
Code: github.com/abertsch72/o...
Leaderboard: oolongbench.github.io
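If you want to poke at the data yourself, here's a minimal sketch of loading it with the `datasets` library; the exact repo id and split name below are assumptions, so check the Hugging Face org page for the released names.

```python
# Illustrative sketch only: the repo id "oolongbench/oolong-synth" and the split
# name are assumptions -- see huggingface.co/oolongbench for the released paths.
from datasets import load_dataset

ds = load_dataset("oolongbench/oolong-synth", split="test")

# Peek at the first few examples to see the field layout
for example in ds.select(range(3)):
    print(example.keys())
```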
Oolong: Evaluating Long Context Reasoning and Aggregation Capabilities
As model context lengths continue to grow, concerns about whether models effectively use the full context length have persisted. While several carefully designed long-context evaluations have recently...
arxiv.org
November 7, 2025 at 5:07 PM
While long-context models can do many retrieval tasks impressively well, they have a long way to go to solve realistic information synthesis problems!

Oolong is joint work with Adithya Pratapa, Teruko Mitamura, @gneubig.bsky.social, and Matt Gormley.
November 7, 2025 at 5:07 PM
Models show varying error patterns. Claude and some GPT-family models underperform on tasks that require outputting dates; Gemini and DeepSeek-R1 frequently over-reason and fail to return an answer at all on Oolong-synth, although Gemini is the best model on Oolong-real.
November 7, 2025 at 5:07 PM
Why is this so hard? Models must identify relevant sections of input, label or categorize these sections, and then accumulate information to make distributional-level decisions. Adding labels in-context or specifying more reasoning effort has limited benefit.
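As a toy illustration of that identify → label → aggregate loop (not Oolong's actual evaluation code), here is the shape of a counting-style question, with a placeholder classify_chunk standing in for the model call:

```python
from collections import Counter

def classify_chunk(chunk: str) -> str:
    """Placeholder for a model call that assigns a label to one section of the input."""
    return "positive" if "great" in chunk.lower() else "negative"

def answer_distributional_question(long_input: str) -> str:
    # 1. Identify relevant sections of the input (here: a naive paragraph split)
    chunks = [c for c in long_input.split("\n\n") if c.strip()]
    # 2. Label or categorize each section
    labels = [classify_chunk(c) for c in chunks]
    # 3. Accumulate the labels to make a distribution-level decision
    counts = Counter(labels)
    label, _ = counts.most_common(1)[0]
    return f"most common label: {label} ({dict(counts)})"

print(answer_distributional_question("This was great.\n\nTerrible service.\n\nGreat value."))
```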
November 7, 2025 at 5:07 PM
Oolong has a synthetic setting that poses distributional questions over sets of classification examples and their metadata, and a realistic setting using conversational data from game transcripts. Both splits require counting, temporal reasoning, and multi-step entity resolution.
November 7, 2025 at 5:07 PM
We'll be posting course content for anyone who would like to follow along!

The first four lecture videos are available now: youtube.com/playlist?lis...
September 12, 2025 at 5:14 PM
we also have a follow-up work, and @emilyxiao.bsky.social will be around the conference to discuss! bsky.app/profile/emil...
Many-shot ICL (thousands of examples or more) can match fine-tuning on many tasks, but its high inference cost makes deployment impractical.

We introduce DBSA, a training-free framework that achieves the best efficiency even under high request volumes, while maintaining strong accuracy 🧵
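For reference, "many-shot ICL" here just means packing a very large set of labeled demonstrations into the prompt for every request; a minimal sketch of that baseline is below (the demonstration format and sizes are made up, and this is not DBSA itself):

```python
def build_many_shot_prompt(demos: list[tuple[str, str]], query: str) -> str:
    """Concatenate many (input, label) demonstrations ahead of the test query."""
    shots = "\n\n".join(f"Input: {x}\nLabel: {y}" for x, y in demos)
    return f"{shots}\n\nInput: {query}\nLabel:"

# With long-context models, the demonstration set can reach the thousands, so every
# request re-pays the cost of encoding it -- the inference-cost problem DBSA targets.
prompt = build_many_shot_prompt(
    [("the movie was fun", "positive"), ("waste of time", "negative")] * 1000,
    "surprisingly good",
)
print(len(prompt))
```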
April 30, 2025 at 12:06 AM
our paper (arxiv.org/abs/2405.00200) studies properties + tradeoffs of using long-context models for ICL, and we're very excited that it won the Language Modeling SAC award this year!
In-Context Learning with Long-Context Models: An In-Depth Exploration
As model context lengths continue to increase, the number of demonstrations that can be provided in-context approaches the size of entire training datasets. We study the behavior of in-context learnin...
arxiv.org
April 30, 2025 at 12:05 AM
I think @siree.sh was also looking at this! No marker of arxiv category in the url, unfortunately :/
November 25, 2024 at 2:18 PM
and just realized this post is a full two weeks old but! bsky showed it to me now 🥲
November 25, 2024 at 7:17 AM