Our latest work questions the generalizability of hallucination detection metrics across tasks, datasets, model sizes, training methods, and decoding strategies 💥
arxiv.org/abs/2504.18114
(1/n)
⚠️Many existing metrics show poor alignment with human judgments
⚠️The inter-metric correlation is also weak
⚠️They show limited generalization across datasets, tasks, and models
⚠️They do not show consistent improvement with larger models
(4/n)
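For readers unfamiliar with how such alignment is quantified, here is a minimal sketch, not the paper's actual evaluation code: given per-example hallucination scores from several metrics plus human judgments, compute metric–human and inter-metric Spearman correlations. The metric names, scores, and labels below are hypothetical placeholders.

```python
# Sketch of measuring metric-human alignment and inter-metric
# correlation. Illustrative data only; not the paper's evaluation code.
from itertools import combinations

from scipy.stats import spearmanr

# Per-example hallucination scores (higher = more hallucinated).
# Metric names and values are hypothetical placeholders.
metric_scores = {
    "metric_a": [0.9, 0.2, 0.7, 0.1, 0.8],
    "metric_b": [0.6, 0.3, 0.5, 0.4, 0.9],
    "metric_c": [0.1, 0.8, 0.3, 0.7, 0.2],
}
human_judgments = [1, 0, 1, 0, 1]  # binary human hallucination labels

# Metric-human alignment: rank correlation of each metric with humans.
for name, scores in metric_scores.items():
    rho, _ = spearmanr(scores, human_judgments)
    print(f"{name} vs. human: rho={rho:.2f}")

# Inter-metric correlation: pairwise rank correlation between metrics.
for (name_1, s1), (name_2, s2) in combinations(metric_scores.items(), 2):
    rho, _ = spearmanr(s1, s2)
    print(f"{name_1} vs. {name_2}: rho={rho:.2f}")
```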
Unsurprisingly, GPT-4-based evaluators show the highest agreement with human judgments across settings 🌟
Using ensembles of multiple metrics is a promising avenue ⭐️
Instruction tuning & mode-seeking decoding help reduce hallucinations 📈
(5/n)
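On the ensembling point, a natural baseline is to min-max normalize each metric's scores onto a common scale and average them. This is a hypothetical sketch of the general idea, not the specific ensemble studied in the paper; all names and numbers are placeholders.

```python
# Hypothetical sketch of a simple metric ensemble: min-max normalize
# each metric's scores, then average them into one hallucination score
# per example. Not the specific ensemble from the paper.
import numpy as np

def ensemble_scores(metric_scores: dict[str, list[float]]) -> np.ndarray:
    """Average min-max-normalized scores across metrics."""
    normalized = []
    for scores in metric_scores.values():
        s = np.asarray(scores, dtype=float)
        span = s.max() - s.min()
        normalized.append((s - s.min()) / span if span > 0 else np.zeros_like(s))
    return np.mean(normalized, axis=0)

# Illustrative scores from three placeholder metrics.
combined = ensemble_scores({
    "metric_a": [0.9, 0.2, 0.7],
    "metric_b": [12.0, 3.5, 8.0],  # metrics may live on different scales
    "metric_c": [0.1, 0.8, 0.4],
})
print(combined)  # combined hallucination score per example
```

Normalizing first matters because raw metric outputs often live on different scales; without it, the metric with the largest numeric range would dominate the average.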
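And on decoding: "mode-seeking" strategies (greedy or beam search) concentrate on high-likelihood outputs, unlike temperature/nucleus sampling. A minimal Hugging Face transformers sketch of the contrast, assuming a generic causal LM (the model name and prompt are placeholders, not from the paper):

```python
# Sketch contrasting mode-seeking decoding (greedy / beam search) with
# sampling, using Hugging Face transformers. Placeholder model/prompt.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The capital of France is", return_tensors="pt")

# Mode-seeking: greedy decoding follows the single most likely token.
greedy = model.generate(**inputs, do_sample=False, max_new_tokens=20)

# Mode-seeking: beam search tracks several high-likelihood candidates.
beam = model.generate(**inputs, do_sample=False, num_beams=4,
                      max_new_tokens=20)

# Sampling spreads probability mass and can drift away from the mode.
sampled = model.generate(**inputs, do_sample=True, top_p=0.9,
                         temperature=0.8, max_new_tokens=20)

for out in (greedy, beam, sampled):
    print(tokenizer.decode(out[0], skip_special_tokens=True))
```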