Dimos
antypasd.bsky.social
Reposted by Dimos
NEW PAPER 📜

Shifting Perspectives: Steering Vector Ensembles for Robust Bias Mitigation in LLMs

ArXiv: arxiv.org/abs/2503.05371
GitHub: github.com/groovychoons...
Extremely Unofficial Blog Post: zarasiddique.com/blog/shiftin...
We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematic...
March 13, 2025 at 11:44 AM