Zara Siddique
@zarasiddique.bsky.social
Working on ethics and bias in NLP @CardiffNLP #NLP #NLProc
Shoutout to my supervisors Liam Turner and Luis Espinosa-Anke and @cardiffnlp.bsky.social. I'm also interested in future collaborations on the topic, so please message me if you are interested :)
May 14, 2025 at 10:29 AM
I highly encourage people to play around; you can get started in just a few lines. Here's a Colab notebook:
tinyurl.com/yysmb45c
Note that the results from this Colab won't be the best, because it uses a smaller model to reduce loading times. I would recommend using at least a 7B model.
Dialz Tutorial - Zara Siddique - KnitTogether 2025.ipynb
May 14, 2025 at 10:29 AM
As part of our validation, we see if we can reduce stereotypicality in outputs from Mistral 7B, using GPT-4o as a judge. There is a notable reduction compared to baselines and prompting, which is cool.
May 14, 2025 at 10:29 AM
For those who are new to the topic: steering vectors are constructed from a set of paired sentences, where one sentence elicits a 'positive' activation pattern in the model's neurons and the other a 'negative' one. By taking the difference between the two, we isolate the activations responsible for a particular 'concept'.
May 14, 2025 at 10:29 AM
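The mean-difference construction described in the post can be sketched in a few lines of NumPy. This is a toy illustration, not the Dialz implementation; the function name, shapes, and toy data are assumptions for the example:

```python
import numpy as np

def steering_vector(pos_acts, neg_acts):
    """Mean-difference steering vector from contrastive pairs.

    pos_acts, neg_acts: arrays of shape (n_pairs, hidden_dim) holding
    the hidden-layer activations for the 'positive' and 'negative'
    sentence of each pair. The per-pair differences are averaged so
    that pair-specific noise cancels and the shared 'concept'
    direction remains.
    """
    return (np.asarray(pos_acts) - np.asarray(neg_acts)).mean(axis=0)

# Toy example: 3 pairs of 4-dimensional activations, where the
# 'concept' mainly shifts the first dimension.
rng = np.random.default_rng(0)
neg = rng.normal(size=(3, 4))
pos = neg + np.array([1.0, 0.0, 0.0, 0.0]) + 0.1 * rng.normal(size=(3, 4))
v = steering_vector(pos, neg)  # points mostly along dimension 0
```

In practice the activations come from a chosen transformer layer, and the resulting vector is added to (or subtracted from) that layer's activations at inference time to push outputs toward or away from the concept.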
Super interesting!
April 3, 2025 at 1:56 PM
Do it! When interviewers ask me about them it’s usually a good sign that it’s a nice workplace.
March 25, 2025 at 6:14 PM
The work presents the first systematic investigation of steering vectors for bias mitigation, and we demonstrate that SVE is a powerful and computationally efficient strategy for reducing bias in LLMs, with broader implications for enhancing AI safety.
March 13, 2025 at 11:44 AM
Building on these promising results, we introduce Steering Vector Ensembles (SVE), a method that averages multiple individually optimized steering vectors, each targeting a specific bias axis such as age, race, or gender.
March 13, 2025 at 11:44 AM
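The ensembling step itself is simple: an element-wise mean over the per-axis vectors. A minimal sketch with NumPy, using illustrative axis names and toy vectors rather than the paper's optimized ones:

```python
import numpy as np

# Hypothetical per-axis steering vectors, each optimized independently
# on its own contrastive-pair dataset (values here are made up).
axis_vectors = {
    "age":    np.array([0.8, -0.1,  0.3]),
    "race":   np.array([0.5,  0.4, -0.2]),
    "gender": np.array([0.2,  0.6,  0.1]),
}

# The Steering Vector Ensemble is the element-wise mean of the
# individual vectors; it is then applied exactly like a single
# steering vector.
sve = np.mean(list(axis_vectors.values()), axis=0)
```

Averaging keeps the cost of applying the ensemble identical to applying one vector, since the combination happens once, ahead of inference.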
When optimized on the BBQ dataset, our individually tuned steering vectors achieve average improvements of 12.2%, 4.7%, and 3.2% over the baseline for Mistral, Llama, and Qwen, respectively.
March 13, 2025 at 11:44 AM
We present a novel approach to bias mitigation in large language models (LLMs) by applying steering vectors to modify model activations in forward passes. We employ Bayesian optimization to systematically identify effective contrastive pair datasets across nine bias axes.
March 13, 2025 at 11:44 AM
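Modifying activations in the forward pass can be sketched with a PyTorch forward hook. This is a minimal illustration under assumed shapes (a linear layer standing in for a transformer block), not the paper's or Dialz's implementation; real code must also handle tuple outputs and token positions:

```python
import torch

def add_steering_hook(layer, vector, strength=1.0):
    """Register a forward hook that adds strength * vector to the
    layer's output activations on every forward pass."""
    v = torch.as_tensor(vector, dtype=torch.float32)

    def hook(module, inputs, output):
        # Returning a value from a forward hook replaces the output.
        return output + strength * v

    return layer.register_forward_hook(hook)

# Toy demo: with zero input and no bias, the unsteered output is all
# zeros, so the steered output is exactly strength * vector.
layer = torch.nn.Linear(4, 4, bias=False)
handle = add_steering_hook(layer, torch.ones(4), strength=0.5)
steered = layer(torch.zeros(1, 4))
handle.remove()  # detach the hook when done steering
```

Because the intervention is a single vector addition per layer, it adds essentially no inference cost, which is where the computational efficiency of the approach comes from.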
Super interesting work!
March 3, 2025 at 9:56 PM