Dialz: A Python Toolkit for Steering Vectors
ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...
A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.
Paper:
Tell Me What You Know About Sexism: Expert-LLM Interaction Strategies and Co-Created Definitions for Zero-Shot Sexism Detection
w: @indiiigo.bsky.social, @matteo-mls.bsky.social y.social & @gabriellalapesa.bsky.social
@ Findings #NAACL2025 !🤩
#MSCA #HorizonEurope #DEMINE