Zara Siddique
zarasiddique.bsky.social
Working on ethics and bias in NLP @CardiffNLP #NLP #NLProc
Loved giving my second tutorial on steering vectors at #CardiffNLPWorkshop. Lots of enthusiastic participants! @cardiffnlp.bsky.social
July 26, 2025 at 7:35 AM
#CardiffNLPWorkshop off to a flying start with talks from Jennifer Foster and Marianna Apidianaki @cardiffnlp.bsky.social
July 14, 2025 at 10:37 AM
As part of our validation, we see if we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.
May 14, 2025 at 10:29 AM
For those that are new to the topic, steering vectors are constructed using a set of paired sentences, where one elicits a 'positive' activation of neurons and the other elicits a 'negative' activation of neurons - by taking the difference, we isolate activations responsible for a certain 'concept'.
May 14, 2025 at 10:29 AM
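To make the paired-sentence idea concrete, here is a minimal sketch in plain Python. It assumes the simplest possible construction, the mean of the per-pair activation differences, which may not be what Dialz does internally. The function name `steering_vector` and the toy activation values are hypothetical; real activations would come from a model's hidden states.

```python
from statistics import fmean


def steering_vector(positive_acts, negative_acts):
    """Average the per-pair differences between 'positive' and 'negative' activations."""
    if len(positive_acts) != len(negative_acts):
        raise ValueError("expected one negative activation per positive one")
    dim = len(positive_acts[0])
    return [
        fmean(p[i] - n[i] for p, n in zip(positive_acts, negative_acts))
        for i in range(dim)
    ]


# Toy hidden states (hypothetical values): 2 sentence pairs, 3 dimensions each.
positive = [[1.0, 2.0, 0.5], [3.0, 2.0, 0.5]]
negative = [[0.0, 2.0, 0.5], [1.0, 2.0, 0.5]]

vector = steering_vector(positive, negative)
print(vector)  # [1.5, 0.0, 0.0]
```

Only the first dimension differs between the paired examples, so it is the only one that survives in the resulting vector.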
🚨 NEW PAPER ALERT 🚨

Dialz: A Python Toolkit for Steering Vectors

ArXiv: arxiv.org/abs/2505.06262
Docs: cardiffnlp.github.io/dialz/
Repo: github.com/cardiffnlp/d...

A Python package to help you create, apply and visualise steering vectors for anything you want - from sycophancy to bias.
May 14, 2025 at 10:29 AM
Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.
March 13, 2025 at 11:44 AM
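The averaging step behind an ensemble can be sketched as follows. This is an illustration only: it assumes a plain element-wise mean with equal weights, and the function name `ensemble` and the per-axis vectors are made-up toy values, not output from the paper.

```python
def ensemble(vectors):
    """Element-wise mean of several steering vectors of equal length."""
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all steering vectors must have the same dimension")
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical steering vectors, one per bias axis.
per_axis = {
    "age": [1.0, 0.0, 2.0],
    "race": [0.0, 2.0, 2.0],
    "gender": [2.0, 1.0, -1.0],
}

combined = ensemble(list(per_axis.values()))
print(combined)  # [1.0, 1.0, 1.0]
```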
We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.
March 13, 2025 at 11:44 AM
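As a rough illustration of what "applying a steering vector to modify activations" can mean, the sketch below adds a scaled vector to every token's hidden state at one layer. The additive form, the `strength` parameter, and the function name `apply_steering` are assumptions for illustration; the Bayesian optimization over contrastive pair datasets is not shown.

```python
def apply_steering(hidden_states, vector, strength):
    """Add strength * vector to each token's hidden state (one row per token)."""
    return [
        [h + strength * v for h, v in zip(token_state, vector)]
        for token_state in hidden_states
    ]


# Toy layer output (hypothetical values): 2 tokens, 2 dimensions.
hidden = [[1.0, 0.0], [0.0, 1.0]]
vector = [0.5, -0.5]

steered = apply_steering(hidden, vector, strength=2.0)
print(steered)  # [[2.0, -1.0], [1.0, 0.0]]
```

A negative `strength` would push activations away from the concept instead of towards it.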
What is “Latin”?
December 30, 2024 at 12:16 PM
Had fun presenting my favourite #EMNLP2024 papers today at our secret reading group + bonus raccoon pics from Miami 🦝

Will follow up with favourite papers in blog post form soon!
December 3, 2024 at 5:12 PM
Had an amazing time at #EMNLP2024 and excited to connect with other researchers on here :)
November 20, 2024 at 1:30 PM