Alexander Rubinstein
arubique.bsky.social
PhD student at the University of Tübingen and IMPRS-IS
Dive into the full paper here: arxiv.org/abs/2510.07959
Play with code: github.com/arubique/dis...

Big thanks to my collaborators: Benjamin Raible, Martin Gubri (@mgubri.bsky.social), and Seong Joon Oh (@coallaoh.bsky.social)!

🧵6/6
DISCO: Diversifying Sample Condensation for Efficient Model Evaluation
Evaluating modern machine learning models has become prohibitively expensive. Benchmarks such as LMMs-Eval and HELM demand thousands of GPU hours per model. Costly evaluation reduces inclusivity, slow...
arxiv.org
October 10, 2025 at 9:55 AM
Compared to baselines, DISCO achieves a superior efficiency–precision trade-off across a range of compression rates.

🧵5/6
October 10, 2025 at 9:53 AM
Beyond language, we also apply DISCO in the vision domain, where it likewise enables efficient model evaluation.

🧵4/6
October 10, 2025 at 9:53 AM
The model signature is an effective basis for performance estimation. Condensing the dataset to the top-k diversifying samples (e.g., ranked by the Predictive Diversity Score, PDS) lets DISCO achieve state-of-the-art test-set compression on popular language benchmarks.

🧵3/6
October 10, 2025 at 9:52 AM
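The top-k selection above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: here I assume a stand-in diversity score that ranks a sample by how often a pool of known models disagree on it (the actual PDS definition is in the paper).

```python
import numpy as np

def predictive_diversity_score(preds: np.ndarray) -> np.ndarray:
    """Toy stand-in for a per-sample diversity score.

    preds: (n_models, n_samples) array of predicted class labels from
    a pool of known models. Returns one score per sample: the fraction
    of model pairs that disagree on it. Samples that split the model
    pool the most are treated as the most informative.
    """
    n_models, n_samples = preds.shape
    pairs = n_models * (n_models - 1) / 2  # total model pairs
    scores = np.empty(n_samples)
    for j in range(n_samples):
        # count agreeing pairs via label multiplicities, then invert
        _, counts = np.unique(preds[:, j], return_counts=True)
        agree = sum(c * (c - 1) / 2 for c in counts)
        scores[j] = (pairs - agree) / pairs
    return scores

def top_k_diverse(preds: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k samples with the highest diversity score."""
    return np.argsort(-predictive_diversity_score(preds))[:k]
```

A sample every model answers identically scores 0 and is dropped first; a sample that maximally splits the pool scores highest and survives condensation.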
First, we select a subset of the evaluation dataset containing its most informative samples. Second, we predict the performance of unseen models from their signatures, i.e., their outputs on the selected samples.

🧵2/6
October 10, 2025 at 9:52 AM
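The second step above can be sketched as a regression from signatures to benchmark scores. This is a hedged toy sketch, not the paper's predictor: I assume binary correctness signatures on the k selected samples and a plain least-squares linear map fit on models with known full-benchmark accuracy.

```python
import numpy as np

def fit_performance_predictor(signatures: np.ndarray,
                              accuracies: np.ndarray) -> np.ndarray:
    """Fit a linear map from model signatures to benchmark accuracy.

    signatures: (n_models, k) matrix of each known model's 0/1
    correctness on the k selected samples.
    accuracies: (n_models,) full-benchmark accuracy of those models.
    Returns a weight vector (last entry is the bias) fit by
    ordinary least squares.
    """
    X = np.hstack([signatures, np.ones((signatures.shape[0], 1))])
    w, *_ = np.linalg.lstsq(X, accuracies, rcond=None)
    return w

def predict_performance(w: np.ndarray, signature: np.ndarray) -> float:
    """Estimate an unseen model's full-benchmark accuracy
    from its signature on the selected samples alone."""
    return float(np.append(signature, 1.0) @ w)
```

Once fitted, evaluating a new model means running it only on the k condensed samples and reading its estimated full-benchmark accuracy off the regressor, rather than scoring the entire benchmark.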
We aim to select an evaluation dataset much smaller than the original while keeping the estimated model performances as close as possible to the full-dataset results.

🧵1/6
October 10, 2025 at 9:51 AM