Joel Mire
joelmire.bsky.social
Master’s student @ltiatcmu.bsky.social. he/him
This looks incredible! Thanks for sharing the syllabus!
June 4, 2025 at 5:23 PM
This was joint work with my co-author Zubin Aysola; collaborators @dchechel.bsky.social, Nick Deas, and @chryssazrv.bsky.social; and advisor @maartensap.bsky.social, at @ltiatcmu.bsky.social, @scsatcmu.bsky.social, @columbiauniversity.bsky.social, and @istecnico.bsky.social! (10/10)
March 6, 2025 at 7:49 PM
Our work builds on sociolinguistic and NLP research on AAL and recent translation methods. Check out the paper for details! We hope others extend this work, e.g., to investigate or mitigate reward model biases against more dialects. (9/10)
March 6, 2025 at 7:49 PM
These results point to representational and quality-of-service harms for AAL speakers. ⚠️ They also highlight complex ethical questions about the desired behavior of LLMs concerning AAL. (8/10)
March 6, 2025 at 7:49 PM
Finally, we show that the reward models strongly incentivize steering conversations toward WME, even when prompted with AAL. 🗣️🔄 (7/10)
March 6, 2025 at 7:49 PM
Also, for most models, rewards are negatively correlated with the predicted AAL-ness of a text (based on a pre-existing dialect detection tool). (6/10)
March 6, 2025 at 7:49 PM
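A rough sketch of what such a correlation check could look like. The thread does not say which correlation coefficient was used, so Pearson's r here is an assumption, and the function name and the scores are made-up toy values for illustration only, not results from the paper.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values: a dialect detector's AAL-ness score per text,
# and the reward a single reward model assigned to the same text.
aal_ness = [0.1, 0.3, 0.6, 0.9]
rewards = [1.8, 1.5, 0.9, 0.2]

r = pearson_r(aal_ness, rewards)
print(f"correlation between AAL-ness and reward: {r:.3f}")
```

A negative value means texts the detector scores as more AAL-like tend to receive lower rewards from that model.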
Next, we show that most reward models predict lower rewards for AAL texts ⬇️ (5/10)
March 6, 2025 at 7:49 PM
First, we see a significant drop in performance (-4% accuracy on average) in assigning higher rewards to human-preferred completions when processing AAL texts vs. WME texts. 📉 (4/10)
March 6, 2025 at 7:49 PM
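To make the accuracy metric concrete, here is a minimal sketch, assuming each example is a pair of reward scores (one for the human-preferred completion, one for the rejected one). The function name and all scores are hypothetical toy values for illustration, not numbers from the paper.

```python
from typing import Sequence


def preference_accuracy(chosen: Sequence[float], rejected: Sequence[float]) -> float:
    """Fraction of pairs where the human-preferred completion
    receives a strictly higher reward than the rejected one."""
    if len(chosen) != len(rejected) or len(chosen) == 0:
        raise ValueError("need two non-empty, equal-length score lists")
    hits = sum(c > r for c, r in zip(chosen, rejected))
    return hits / len(chosen)


# Hypothetical reward scores for the same four preference pairs,
# scored once on the original WME texts and once on AAL translations.
wme_chosen = [2.1, 0.4, 1.3, 0.9]
wme_rejected = [1.0, 0.1, 1.5, 0.2]
aal_chosen = [1.2, 0.0, 0.8, 0.1]
aal_rejected = [0.9, 0.3, 1.1, 0.4]

wme_acc = preference_accuracy(wme_chosen, wme_rejected)
aal_acc = preference_accuracy(aal_chosen, aal_rejected)
gap = aal_acc - wme_acc  # negative means worse accuracy on AAL texts

print(f"WME accuracy: {wme_acc:.2f}, AAL accuracy: {aal_acc:.2f}, gap: {gap:+.2f}")
```

The gap between the two accuracies is the kind of quantity the post reports as an average over models.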
We introduce morphosyntactic & phonological features of AAL into WME texts from the RewardBench dataset using validated automatic translation methods. Then, we test 17 reward models for implicit anti-AAL dialect biases. 📊 (3/10)
March 6, 2025 at 7:49 PM
We develop a framework for evaluating dialect biases in reward models and conduct a case study on biases against African American Language (AAL) relative to White Mainstream English (WME). 🔍 (2/10)
March 6, 2025 at 7:49 PM