Adhiraj Ghosh@ACL2025
@adhirajghosh.bsky.social
ELLIS PhD, University of Tübingen | Data-centric Vision and Language @bethgelab.bsky.social

Website: adhirajghosh.github.io
Twitter: https://x.com/adhiraj_ghosh98
Added you!
January 27, 2025 at 11:38 AM
I feel like my “following” and “popular with friends” feeds are well tuned, since I have complete control over them. It's just that people are still posting less on bsky and are more active on Twitter. Once that changes (and I think it will), we’ll have the same experience as on Twitter right now.
January 12, 2025 at 11:34 PM
Added you!
December 11, 2024 at 11:55 PM
Sure!
December 10, 2024 at 9:27 PM
Welcome, stranger
December 10, 2024 at 9:25 PM
This extremely ambitious project would not have been possible without @dziadzio.bsky.social @bayesiankitten.bsky.social @vishaalurao.bsky.social @samuelalbanie.bsky.social and Matthias Bethge!
Special thanks to everyone at @bethgelab.bsky.social, Bo Li, Yujie Lu and Palzer Lama for all your help!
December 10, 2024 at 5:52 PM
In summary, we release ONEBench as a valuable tool for comprehensively evaluating foundation models and generating customised benchmarks, in the hope of sparking a restructuring of how benchmarking is done. We plan on publishing the code, benchmark and metadata for capability probing very soon.
December 10, 2024 at 5:51 PM
Finally, as a proof of concept, we probe open-ended capabilities by defining a query pool to test and generating personalised model rankings. Expanding ONEBench can only improve the reliability and scale of these queries, and we’re excited to extend this framework.
More insights like these in the paper!
December 10, 2024 at 5:50 PM
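(For intuition only: a minimal sketch of query-based capability probing, not the ONEBench pipeline. The keyword-based sample retrieval and the mean-score aggregation below are my own illustrative assumptions; the paper aggregates with Plackett-Luce and may select samples differently.)

```python
# Illustrative sketch: filter a sample pool with a free-text capability query,
# then rank models on that slice only. Keyword matching and mean aggregation
# are assumptions made for this example, not ONEBench's actual method.
import numpy as np

def probe_capability(query, sample_tags, results):
    """query: free-text capability query, e.g. "chart reasoning"
    sample_tags: list of strings describing each evaluation sample
    results: (n_models, n_samples) array of scores, NaN where a model was not measured
    Returns model indices ranked best-first on the matching slice, plus the slice."""
    terms = query.lower().split()
    keep = [i for i, tag in enumerate(sample_tags)
            if any(t in tag.lower() for t in terms)]   # crude keyword retrieval
    sliced = results[:, keep]
    per_model = np.nanmean(sliced, axis=1)             # ignore missing measurements
    return np.argsort(-per_model), keep

# toy usage with 3 models and 4 samples
tags = ["vqa charts", "ocr receipts", "charts trends", "code python"]
scores = np.array([[0.9, np.nan, 0.7,    0.2],
                   [0.6, 0.8,    0.9,    0.4],
                   [0.3, 0.5,    np.nan, 0.9]])
ranking, used = probe_capability("chart understanding", tags, scores)
print(ranking, used)
```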
Let's look under the hood! ONEBench comprises ONEBench-LLM and ONEBench-LMM: the largest pool of evaluation samples for foundation models (~50K for LLMs and ~600K for LMMs), spanning various domains and tasks. ONEBench will be continually expanded to accommodate more models and datasets.
December 10, 2024 at 5:49 PM
We compare our Plackett-Luce implementation to ELO and ELO-distribution-based ranking methods, showing not only superior correlation with the aggregated mean model scores for each test set but also extremely stable correlations under missing data points and missing measurements, even at up to 95% sparsity!
December 10, 2024 at 5:49 PM
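(For intuition only: a minimal sketch of Plackett-Luce aggregation over sparse, per-sample model rankings using Hunter's (2004) MM updates. This is my own illustrative implementation, not the ONEBench code; the small regularisation constant is an assumption added to keep worths positive.)

```python
# Sketch: fit Plackett-Luce "worths" from partial rankings via MM updates (Hunter, 2004).
# Each ranking may cover only the models measured on that sample, which is what
# tolerates missing data points and missing measurements.
import numpy as np

def fit_plackett_luce(rankings, n_models, n_iters=500, tol=1e-9, eps=1e-9):
    """rankings: list of partial rankings, each a list of model indices ordered best-first.
    Returns a worth vector w; sorting it descending gives the aggregate leaderboard."""
    w = np.ones(n_models)
    for _ in range(n_iters):
        wins = np.zeros(n_models)    # times a model is chosen ahead of a non-empty remainder
        denom = np.zeros(n_models)   # accumulated 1 / (total worth of the remaining set)
        for r in rankings:
            for i in range(len(r) - 1):          # the last position carries no information
                remaining = np.asarray(r[i:])
                inv = 1.0 / w[remaining].sum()
                wins[r[i]] += 1.0
                denom[remaining] += inv
        w_new = (wins + eps) / (denom + eps)     # MM update, lightly regularised
        w_new /= w_new.sum()                     # worths are identifiable only up to scale
        if np.abs(w_new - w).max() < tol:
            return w_new
        w = w_new
    return w

# toy usage: 4 models, three samples with partial coverage
rankings = [[0, 1, 2], [0, 2, 3], [1, 0, 3]]
w = fit_plackett_luce(rankings, n_models=4)
print(np.argsort(-w))   # aggregate ranking, best model first
```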