Emily Xiao
@emilyxiao.bsky.social
Student @ CMU
Check out the paper and code for details! We hope DBSA pushes many-shot ICL closer to real-world viability. Happy to discuss!

arxiv.org/abs/2503.08640
github.com/millix19/dbsa

Thank you to my collaborators! Chin-Jou Li, Yilin Zhang,
@abertsch.bsky.social @gneubig.bsky.social
Efficient Many-Shot In-Context Learning with Dynamic Block-Sparse Attention
March 18, 2025 at 3:49 PM
Some insights we found:
- Preceding context + attention sink are both critical for making block-sparse attention work without additional training (see the sketch below).
- Grouping examples for encoding & retrieval also boosts performance vs. purely individual retrieval.
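
To make the first point concrete, here's a minimal sketch of a block-level mask that keeps an attention sink plus a window of preceding blocks (block size, sink size, and window width are illustrative assumptions, not the paper's exact configuration):

```python
import torch

def block_sparse_mask(num_blocks: int, block_size: int,
                      num_sink_blocks: int = 1, window: int = 2) -> torch.Tensor:
    """Token-level boolean mask (True = attend): sink + preceding blocks + self."""
    seq = num_blocks * block_size
    allow = torch.zeros(num_blocks, num_blocks, dtype=torch.bool)
    for q in range(num_blocks):
        allow[q, :num_sink_blocks] = True            # attention sink blocks
        allow[q, max(0, q - window):q + 1] = True    # preceding context + current block
    # Expand the block pattern to tokens and keep it causal.
    mask = allow.repeat_interleave(block_size, 0).repeat_interleave(block_size, 1)
    return mask & torch.tril(torch.ones(seq, seq, dtype=torch.bool))
```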

[5/n]
March 18, 2025 at 3:48 PM
Storage Cost?
Yes, the KV cache for thousands of examples can be large. However, it's also easy to re-compute if needed, unlike fine-tuned parameters, which also require substantial storage when serving many tasks and are often stored indefinitely.
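
As a rough, back-of-the-envelope illustration (assuming a Llama-3-8B-style config and fp16; these numbers are not from the paper):

```python
# KV cache size estimate for a ~90k-token demonstration pool (illustrative).
layers, kv_heads, head_dim, bytes_per = 32, 8, 128, 2   # Llama-3-8B-style, fp16
tokens = 90_000
kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per * tokens  # keys + values
print(f"{kv_bytes / 2**30:.1f} GiB")  # ~11 GiB
```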

[4/n]
March 18, 2025 at 3:46 PM
Results:
We evaluate DBSA with Llama models at context lengths up to 90k tokens. DBSA achieves per-request latency comparable to fine-tuning while maintaining, on average, >95% of the best accuracy.

[3/n]
March 18, 2025 at 3:45 PM
Method:
- DBSA pre-encodes the many-shot examples with streaming block-sparse attention, allowing constant encoding time for new demos.
- During inference, it dynamically selects relevant KV chunks for each test query, given any retrieval method (see the sketch below).
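
A minimal sketch of the inference-time selection step (placeholder names, not the repo's API; see the code link above for the real implementation):

```python
from typing import Callable, List, Tuple

def select_chunks(query: str,
                  chunk_texts: List[str],
                  chunk_kv: List[Tuple],                 # pre-encoded KV per demo chunk
                  score: Callable[[str, str], float],    # any retrieval method
                  top_k: int = 8) -> List[Tuple]:
    """Score cached demo chunks against the query and keep the top-k."""
    scores = [score(query, text) for text in chunk_texts]
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    keep.sort()   # preserve original demo order for consistent positions
    return [chunk_kv[i] for i in keep]

# Decoding then attends over [sink KV] + selected chunk KVs + the test query.
```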

[2/n]
March 18, 2025 at 3:44 PM