Dylan Hadfield-Menell
dhadfieldmenell.bsky.social
Assistant Prof of AI & Decision-Making @MIT EECS
I run the Algorithmic Alignment Group (https://algorithmicalignment.csail.mit.edu/) in CSAIL.

I work on value (mis)alignment in AI systems.

https://people.csail.mit.edu/dhm/
📢 Seeking PhD students for AI alignment research. Our lab investigates technical mechanisms for value learning, pre-training alignment, and regulatory frameworks. Come work with us if you want to bridge technical ML and legal/policy domains. Details in thread 🧵
December 2, 2024 at 2:39 PM
Genuine question for people who use Bluesky more frequently than I do. What are tips for getting things to work well without algorithmic recs? I spent a lot of time curating my recs on the other place and found it useful (mostly...). Any tools that let me do it here?
November 12, 2024 at 1:24 PM
I usually focus my platforms on my work. However, I did some writing to process some of my thoughts about the election and wanted to share them. I'm curious to hear anyone's thoughts and reactions.

tinyurl.com/dems-2024-ma...

🧵 The Democratic Party's Maginot Line (1/13)
[Shared] The Democratic Party's Maginot Line
The Democratic Party's Maginot Line. Dylan Hadfield-Menell, November 8, 2024. In 1940, France faced Hitler's army with supreme confidence in the Maginot Line – a network of concrete fortifications, ...
tinyurl.com
November 8, 2024 at 2:34 PM
This is a really welcome development. This is the kind of action that we argued for in a policy brief on LLMs: the first goal of AI regulation has to be establishing a default where existing laws cannot be dodged through automation.

www.ftc.gov/news-events/...

computing.mit.edu/ai-policy-br...
FTC Announces Crackdown on Deceptive AI Claims and Schemes
www.ftc.gov
September 26, 2024 at 2:13 PM
I’m doing some lecture prep for a course on AI & Society to cover interpretability, explanations, benchmarks, and evaluations.

What are your favorite papers in the space? Any suggestions for an advanced undergrad cohort?
September 21, 2024 at 6:25 PM
Reposted by Dylan Hadfield-Menell
My department (MIT Brain & Cognitive Sciences) is hiring a tenure-track faculty member! We're especially interested in researchers who span multiple levels of analysis. Candidates from underrepresented backgrounds strongly encouraged to apply. Apply by November 1! academicjobsonline.org/ajo/jobs/25916
Massachusetts Institute of Technology, Department of Brain & Cognitive Sciences
Full service online faculty recruitment and application management system for academic institutions worldwide. We offer unique solutions tailored for academic communities.
academicjobsonline.org
October 20, 2023 at 12:30 AM
Reposted by Dylan Hadfield-Menell
Now published in Patterns, my paper on how to do metric design better. This is important everywhere - academics use simple metrics for tenure, governments often perform poorly using metrics for rules, and employees have targets that hurt their company.
Building less-flawed metrics: Understanding and creating better measurement and incentive systems
Design methods and consideration of desiderata for metrics have been proven useful when used, which is, at present, sporadically and inconsistently across a variety of fields. This perspective present...
www.cell.com
October 18, 2023 at 1:58 PM
Reposted by Dylan Hadfield-Menell
I especially enjoyed the part of this game where the CEO threatened to fire me because I banned someone and then I had to testify in front of congress. 10/10, fun experience, would recommend.
Good morning folks... today we're launching our new (free, browser-based) game Trust & Safety Tycoon. Please go check it out. Everyone thinks they know how trust & safety should work, but very few have actually done the job. Now's your chance! trustandsafety.fun
Trust & Safety Tycoon
Manage your team, set policies, make investments, and tackle the challenging world of Trust & Safety
trustandsafety.fun
October 17, 2023 at 2:11 PM
This looks like a great way to learn about the complexity involved in managing moderation
Good morning folks... today we're launching our new (free, browser-based) game Trust & Safety Tycoon. Please go check it out. Everyone thinks they know how trust & safety should work, but very few have actually done the job. Now's your chance! trustandsafety.fun
Trust & Safety Tycoon
Manage your team, set policies, make investments, and tackle the challenging world of Trust & Safety
trustandsafety.fun
October 17, 2023 at 3:48 PM
Reposted by Dylan Hadfield-Menell
Our lab has three paper talks at CSCW! But I want to highlight this one because @cqz.bsky.social is on the job market this year!! He works in crowdsourcing and human-AI systems. Make sure to check out his presentation on Wednesday. arxiv.org/abs/2305.01615
October 15, 2023 at 9:45 PM
Reposted by Dylan Hadfield-Menell
Ukrainian drone maker says their drones are autonomously making kill decisions. If this turns out to be true, it will be a turning point in war forever.

(Unfortunately this is behind a paywall so I cannot see the contents of the article)
www.newscientist.com/article/2397...
October 13, 2023 at 5:18 PM
Reposted by Dylan Hadfield-Menell
One of the reasons (and there are several) we see platforms keep making avoidable mistakes is that vanishingly little of the tech needed to do T&S work exists outside of big companies. We keep reinventing the same wheels.

Basically every platform has a bad usernames list. Why not open-source them?
July 13, 2023 at 3:23 PM
Reposted by Dylan Hadfield-Menell
In our paper studying creators' use of word filters against harassing comments, we find that a lot of creators wanted to build off of existing bad-word lists they trusted. Unfortunately, many popular lists like LDNOOBW have issues of bias. 1/n
https://arxiv.org/pdf/2202.08818.pdf
arxiv.org
July 13, 2023 at 3:57 PM
Reposted by Dylan Hadfield-Menell
Interesting tidbit from Meta staff at TrustCon just now: >90% of the CSAM Meta report to NCMEC is visually similar to content they’ve reported before.

The argument goes: The same bad content circulates again and again, so effective moderation requires you to get very good at similarity detection.
July 11, 2023 at 6:42 PM
Reposted by Dylan Hadfield-Menell
Bluesky is a public benefit corp with the mission “to develop and drive large-scale adoption of technologies for open and decentralized public conversation.”

The PBC status allows us to pursue our mission above profit, but we still need to make this open ecosystem sustainable.
July 5, 2023 at 9:11 PM
Reposted by Dylan Hadfield-Menell
We believe that a public commons is important for social media. These proposals for moderation and safety tooling have been in the works for a while, and we’re excited to share them for community discussion and feedback with you now.

https://blueskyweb.xyz/blog/6-23-2023-moderation-proposals
Moderation in a Public Commons
In this post, we share why we believe a public commons is important for social media, as well as some proposals for moderation and safety tooling.
blueskyweb.xyz
June 23, 2023 at 9:52 PM