Kshitish Ghate

@kghate.bsky.social

110 followers 190 following 21 posts

PhD student @ UWCSE; MLT @ CMU-LTI; Responsible AI
https://kshitishghate.github.io/

Work done with amazing collaborators 🙏
@andyliu.bsky.social @devanshrjain.bsky.social @taylor-sorensen.bsky.social @atoosakz.bsky.social @aylincaliskan.bsky.social @monadiab77.bsky.social @maartensap.bsky.social

October 14, 2025 at 3:59 PM

Finding 3: All RMs exhibit style-over-substance bias. In value-style conflict scenarios:
• Models choose style-aligned responses 57-73% of the time
• Persists even with explicit instructions to prioritize values
• Consistent across all model sizes and types

October 14, 2025 at 3:59 PM
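
The style-aligned rate in Finding 3 can be sketched as a simple pairwise count. This is a minimal illustration, not the EVALUESTEER code. It assumes each value-style conflict scenario yields one reward score for the value-aligned response and one for the style-aligned response. The `ConflictPair` record and the `style_aligned_rate` helper are hypothetical names. Ties are counted as not choosing the style-aligned response.

```python
from dataclasses import dataclass


@dataclass
class ConflictPair:
    # Hypothetical record: the RM's scores for two candidate responses to one prompt,
    # one matching the user's stated values, the other matching their stated style.
    value_aligned_score: float
    style_aligned_score: float


def style_aligned_rate(pairs: list[ConflictPair]) -> float:
    """Fraction of conflict pairs where the RM scores the style-aligned response strictly higher."""
    if not pairs:
        raise ValueError("need at least one conflict pair")
    wins = sum(1 for p in pairs if p.style_aligned_score > p.value_aligned_score)
    return wins / len(pairs)


# Made-up scores for illustration only, not results from the paper.
pairs = [
    ConflictPair(0.2, 0.9),
    ConflictPair(0.7, 0.4),
    ConflictPair(0.1, 0.3),
    ConflictPair(0.5, 0.6),
]
print(style_aligned_rate(pairs))  # 0.75
```

On this metric, a rate near 0.5 would mean no systematic preference. The reported 57-73% indicates a lean toward style.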

Finding 2: The RMs we tested generally show intrinsic value and style biases, preferring:
• Secular over traditional values
• Self-expression over survival values
• Verbose, confident, and formal/cold language

October 14, 2025 at 3:59 PM

Finding 1: Even the best RMs struggle to identify which profile aspects matter for a given prompt. GPT-4.1-Mini and Gemini-2.5-Flash reach only ~75% accuracy when given the full user profile as context, versus >99% in the Oracle setting (only the relevant profile information provided).

October 14, 2025 at 3:59 PM

🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find that even the best RMs we tested exhibit their own value/style biases and fail to align with a user's preferences more than 25% of the time. 🧵

October 14, 2025 at 3:59 PM

📊 Bias and downstream performance are linked: We find that intrinsic biases are consistently correlated with downstream task performance on the VTAB+ benchmark (r ≈ 0.3–0.8). Improved performance in CLIP models comes at the cost of skewing stereotypes in particular directions.

April 29, 2025 at 7:11 PM
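
A correlation like the r ≈ 0.3–0.8 above can be sketched as follows. This is a minimal illustration that assumes Pearson's r computed across models, pairing each model's intrinsic-bias score with its downstream score. The post does not state the estimator or the scoring details. The `pearson_r` helper and all numbers are hypothetical placeholders, not values from the paper.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between two equal-length sequences (e.g. per-model bias vs. accuracy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two standard-deviation terms (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


# Hypothetical per-model values: an intrinsic bias score and a zero-shot accuracy.
bias_scores = [0.12, 0.25, 0.31, 0.40, 0.52]
accuracies = [0.48, 0.55, 0.53, 0.61, 0.66]
print(round(pearson_r(bias_scores, accuracies), 3))
```

On Python 3.10+, `statistics.correlation` computes the same Pearson coefficient. The explicit version above just makes the formula visible.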

📌 Data is key: We find that the choice of pre-training dataset is the strongest predictor of associations, over and above architectural variations, dataset size & number of model parameters.

April 29, 2025 at 7:11 PM

Excited to announce our #NAACL2025 Oral paper! 🎉✨

We carried out the largest systematic study so far to map the links between upstream choices, intrinsic bias, and downstream zero-shot performance across 131 CLIP vision-language encoders, 26 datasets, and 55 architectures!

April 29, 2025 at 7:11 PM