Sekh (Sk) Mainul Islam
@sekh-copenlu.bsky.social
PhD Fellow at the CopeNLU Group, University of Copenhagen; working on explainable automatic fact-checking. Prev: NYU Abu Dhabi, IIT Kharagpur.
https://mainuliitkgp.github.io/
💡How is the CoT mechanism aligned with the knowledge interaction subspace?
📊 CoT maintains CK alignment similar to standard prompting across all datasets, while also reducing PK alignment.
November 6, 2025 at 3:02 AM
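A minimal sketch (not the paper's implementation) of how alignment with a knowledge-interaction subspace could be quantified and compared between CoT and standard prompting. The orthonormal `basis` and the hidden-state matrices are assumed inputs:

```python
import numpy as np

def subspace_alignment(hidden_states: np.ndarray, basis: np.ndarray) -> float:
    """Mean fraction of each hidden state's squared norm that lies in the
    subspace spanned by the orthonormal columns of `basis` (shape (d, k))."""
    coords = hidden_states @ basis               # (n, k) subspace coordinates
    in_subspace = (coords ** 2).sum(axis=1)      # squared norm inside the subspace
    total = (hidden_states ** 2).sum(axis=1)     # full squared norm
    return float((in_subspace / total).mean())

# Hypothetical usage: compare the two prompting styles on the same inputs.
# cot_alignment = subspace_alignment(cot_hidden_states, ck_basis)
# std_alignment = subspace_alignment(std_hidden_states, ck_basis)
```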
💡 Can we find reasons for hallucinations based on PK-CK interactions?
📊 The gap between PK and CK is much larger for examples with hallucinated spans than for examples without them, across the sequence steps.
November 6, 2025 at 3:02 AM
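A hedged sketch of how a per-step PK-CK gap could be turned into a hallucination flag. The per-step scores `pk` and `ck` and the threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def flag_large_gap_steps(pk, ck, threshold: float = 0.5):
    """Return generation-step indices where the PK-CK gap is large,
    a possible signal for hallucinated spans under this analysis."""
    gap = np.asarray(pk, dtype=float) - np.asarray(ck, dtype=float)
    return np.where(np.abs(gap) > threshold)[0]

# Illustrative per-step contribution scores for a 6-step NLE:
pk = [0.6, 0.7, 1.4, 1.5, 0.6, 0.5]
ck = [0.5, 0.6, 0.3, 0.2, 0.5, 0.6]
print(flag_large_gap_steps(pk, ck))  # -> [2 3]: steps with a large gap
```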
💡 How do individual PK and CK contributions change over the NLE generation steps for different knowledge interactions?
📊 During most of the NLE generation steps, the model slightly prioritizes PK.
November 6, 2025 at 3:02 AM
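A small sketch of the kind of multi-step bookkeeping this finding implies: track PK and CK coordinates at every generation step and check how often PK leads. The projection basis and per-step hidden states are assumed inputs, not the paper's code:

```python
import numpy as np

def per_step_contributions(step_hidden_states: np.ndarray, basis: np.ndarray):
    """Project each step's hidden state (rows of a (T, d) matrix) onto an
    orthonormal rank-2 PK/CK basis; returns per-step (pk, ck) scores."""
    coords = step_hidden_states @ basis   # (T, 2): column 0 = PK, column 1 = CK
    return coords[:, 0], coords[:, 1]

def pk_priority_rate(pk: np.ndarray, ck: np.ndarray) -> float:
    """Fraction of generation steps where the model leans on PK over CK."""
    return float(np.mean(pk > ck))
```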
🪛 We propose a novel rank-2 projection subspace that disentangles PK and CK contributions more accurately and use it for the first multi-step analysis of knowledge interactions across longer NLE sequences.
November 6, 2025 at 3:02 AM
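A minimal sketch of one way a rank-2 projection subspace could be built from two knowledge directions (e.g., PK and CK directions estimated from contrastive activations). The construction here is an assumption for illustration, not necessarily the paper's method:

```python
import numpy as np

def rank2_basis(pk_dir: np.ndarray, ck_dir: np.ndarray) -> np.ndarray:
    """Orthonormalize PK and CK direction vectors into a (d, 2) basis.
    Note: QR orthogonalizes the second column against the first, so the
    CK axis is the CK direction with its PK component removed."""
    basis, _ = np.linalg.qr(np.stack([pk_dir, ck_dir], axis=1))
    return basis

def disentangle(hidden_state: np.ndarray, basis: np.ndarray):
    """Read off (PK, CK) coordinates of a hidden state in the subspace."""
    pk, ck = basis.T @ hidden_state
    return float(pk), float(ck)

# Toy usage with random stand-ins for real direction vectors:
rng = np.random.default_rng(0)
d = 4096
basis = rank2_basis(rng.standard_normal(d), rng.standard_normal(d))
print(disentangle(rng.standard_normal(d), basis))
```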
💡 Is a rank-1 projection subspace enough for disentangling PK and CK contributions in all types of knowledge interaction scenarios?
📊 Different knowledge interactions are poorly captured by a rank-1 projection subspace in the LLM's parameter space.
November 6, 2025 at 3:02 AM
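A toy sketch of the failure mode behind this finding: collapsing hidden states onto a single direction can assign near-identical scores to a purely parametric and a purely contextual state, so the two contributions cannot be told apart. All vectors here are random stand-ins:

```python
import numpy as np

def rank1_score(hidden_state: np.ndarray, w: np.ndarray) -> float:
    """Project a hidden state onto a single (rank-1) direction."""
    w = w / np.linalg.norm(w)
    return float(w @ hidden_state)

rng = np.random.default_rng(1)
d = 4096
pk_dir = rng.standard_normal(d); pk_dir /= np.linalg.norm(pk_dir)
ck_dir = rng.standard_normal(d); ck_dir /= np.linalg.norm(ck_dir)

w = pk_dir + ck_dir           # one direction that mixes both knowledge types
h_parametric = pk_dir         # state driven purely by parametric knowledge
h_contextual = ck_dir         # state driven purely by contextual knowledge

# Both score ~0.71 despite opposite knowledge sources:
# a rank-1 projection cannot separate them.
print(rank1_score(h_parametric, w), rank1_score(h_contextual, w))
```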
📊 Key Takeaways:
3️⃣ Real & Fictional Bias Mitigation: Reduces both real-world stereotypes (e.g., “Italians are reckless drivers”) and fictional associations (e.g., “citizens of a fictional country have blue skin”), making it useful for both safety and interpretability research.
August 15, 2025 at 10:07 AM
📊 Key Takeaways:
2️⃣ Strong Generalization: Works on biases unseen during token-based fine-tuning.
August 15, 2025 at 10:07 AM
📊 Key Takeaways:
1️⃣ Consistent Bias Elicitation: BiasGym reliably surfaces biases for mechanistic analysis, enabling targeted debiasing without hurting downstream performance.
August 15, 2025 at 10:07 AM
🚀 Excited to share our new preprint: BiasGym: Fantastic LLM Biases and How to Find (and Remove) Them

📄 Read the paper: arxiv.org/abs/2508.08855
August 15, 2025 at 10:07 AM