Kshitish Ghate
kghate.bsky.social
PhD student @ UWCSE; MLT @ CMU-LTI; Responsible AI
https://kshitishghate.github.io/
Finding 3: All RMs exhibit style-over-substance bias. In value-style conflict scenarios:
• Models choose style-aligned responses 57-73% of the time
• Persists even with explicit instructions to prioritize values
• Consistent across all model sizes and types
October 14, 2025 at 3:59 PM
Finding 2: The RMs we tested generally show intrinsic value and style biases, preferring:
• Secular over traditional values
• Self-expression over survival values
• Verbose, confident, and formal/cold language
October 14, 2025 at 3:59 PM
Finding 1: Even the best RMs struggle to identify which profile aspects matter for a given query. GPT-4.1-Mini and Gemini-2.5-Flash reach ~75% accuracy with the full user profile as context, versus >99% in the Oracle setting (only relevant info provided).
October 14, 2025 at 3:59 PM
🚨New paper: Reward Models (RMs) are used to align LLMs, but can they be steered toward user-specific value/style preferences?
With EVALUESTEER, we find even the best RMs we tested exhibit their own value/style biases, and are unable to align with a user >25% of the time. 🧵
October 14, 2025 at 3:59 PM
📊 Bias and downstream performance are linked: We find that intrinsic biases are consistently correlated with downstream task performance on the VTAB+ benchmark (r ≈ 0.3–0.8). Improved performance in CLIP models comes at the cost of skewing stereotypes in particular directions.
April 29, 2025 at 7:11 PM
📌 Data is key: We find that the choice of pre-training dataset is the strongest predictor of associations, over and above architectural variations, dataset size & number of model parameters.
April 29, 2025 at 7:11 PM
Excited to announce our #NAACL2025 Oral paper! 🎉✨

We carried out the largest systematic study so far to map the links between upstream choices, intrinsic bias, and downstream zero-shot performance across 131 CLIP vision-language encoders, 26 datasets, and 55 architectures!
April 29, 2025 at 7:11 PM