Lightnews — Scholar-powered news

@dhadfieldmenell.bsky.social

📢 Seeking PhD students for AI alignment research. Our lab investigates technical mechanisms for value learning, pre-training alignment, and regulatory frameworks. Come work with us if you want to bridge technical ML and legal/policy domains. Details in thread 🧵

December 2, 2024 at 2:39 PM

Dylan Hadfield-Menell

@dhadfieldmenell.bsky.social

Genuine question for people who use Bluesky more frequently than I do. What are tips for getting things to work well without algorithmic recs? I spent a lot of time curating my recs on the other place and found it useful (mostly...). Any tools that let me do it here?

November 12, 2024 at 1:24 PM

Dylan Hadfield-Menell

@dhadfieldmenell.bsky.social

I usually focus my platforms on my work. However, I did some writing to process some of my thoughts about the election and wanted to share them. I'm curious to hear anyone's thoughts and reactions.

tinyurl.com/dems-2024-ma...

🧵 The Democratic Party's Maginot Line (1/13)

[Shared] The Democratic Party's Maginot Line

Dylan Hadfield-Menell, November 8, 2024. In 1940, France faced Hitler's army with supreme confidence in the Maginot Line – a network of concrete fortifications, ...

tinyurl.com

November 8, 2024 at 2:34 PM

Dylan Hadfield-Menell

@dhadfieldmenell.bsky.social

This is a really welcome development. This is the kind of action that we argued for in a policy brief on LLMs: the first goal of AI regulation has to be establishing a default where existing laws cannot be dodged through automation.

www.ftc.gov/news-events/...

computing.mit.edu/ai-policy-br...

FTC Announces Crackdown on Deceptive AI Claims and Schemes

www.ftc.gov

September 26, 2024 at 2:13 PM

Dylan Hadfield-Menell

@dhadfieldmenell.bsky.social

I’m doing some lecture prep for a course on AI & Society to cover interpretability, explanations, benchmarks, and evaluations.

What are your favorite papers in the space? Any suggestions for an advanced undergrad cohort?

September 21, 2024 at 6:25 PM

Reposted by Dylan Hadfield-Menell

Roger Levy

@rplevy.bsky.social

My department (MIT Brain & Cognitive Sciences) is hiring a tenure-track faculty member! We're especially interested in researchers who span multiple levels of analysis. Candidates from underrepresented backgrounds strongly encouraged to apply. Apply by November 1! academicjobsonline.org/ajo/jobs/25916

Massachusetts Institute of Technology, Department of Brain & Cognitive Sciences

Full service online faculty recruitment and application management system for academic institutions worldwide. We offer unique solutions tailored for academic communities.

academicjobsonline.org

October 20, 2023 at 12:30 AM

Reposted by Dylan Hadfield-Menell

David Manheim

@davidmanheim.alter.org.il

Now published in Patterns, my paper on how to do metric design better. This is important everywhere - academics use simple metrics for tenure, governments often perform poorly using metrics for rules, and employees have targets that hurt their company.

Building less-flawed metrics: Understanding and creating better measurement and incentive systems

Design methods and consideration of desiderata for metrics have been proven useful when used, which is, at present, sporadically and inconsistently across a variety of fields. This perspective present...

www.cell.com

October 18, 2023 at 1:58 PM

Reposted by Dylan Hadfield-Menell

Yoel Roth

@yoyoel.com

I especially enjoyed the part of this game where the CEO threatened to fire me because I banned someone and then I had to testify in front of congress. 10/10, fun experience, would recommend.

Mike Masnick @mmasnick.bsky.social · Oct 17

Good morning folks... today we're launching our new (free, browser-based) game Trust & Safety Tycoon. Please go check it out. Everyone thinks they know how trust & safety should work, but very few have actually done the job. Now's your chance! trustandsafety.fun

Trust & Safety Tycoon

Manage your team, set policies, make investments, and tackle the challenging world of Trust & Safety

trustandsafety.fun

October 17, 2023 at 2:11 PM

Dylan Hadfield-Menell

@dhadfieldmenell.bsky.social

This looks like a great way to learn about the complexity involved in managing moderation

Mike Masnick @mmasnick.bsky.social · Oct 17

Good morning folks... today we're launching our new (free, browser-based) game Trust & Safety Tycoon. Please go check it out. Everyone thinks they know how trust & safety should work, but very few have actually done the job. Now's your chance! trustandsafety.fun

Trust & Safety Tycoon

Manage your team, set policies, make investments, and tackle the challenging world of Trust & Safety

trustandsafety.fun

October 17, 2023 at 3:48 PM

Reposted by Dylan Hadfield-Menell

Amy Zhang

@axz.bsky.social

Our lab has three paper talks at CSCW! But I want to highlight this one because @cqz.bsky.social is on the job market this year!! He works in crowdsourcing and human-AI systems. Make sure to check out his presentation on Wednesday. arxiv.org/abs/2305.01615

October 15, 2023 at 9:45 PM

Reposted by Dylan Hadfield-Menell

Mark Riedl

@markriedl.bsky.social

Ukrainian drone maker says their drones are autonomously making kill decisions. If this turns out to be true, it will be a turning point in war forever.

(Unfortunately this is behind a paywall so I cannot see the contents of the article)
www.newscientist.com/article/2397...

October 13, 2023 at 5:18 PM

Reposted by Dylan Hadfield-Menell

Yoel Roth

@yoyoel.com

One of the reasons (and there are several) we see platforms keep making avoidable mistakes is that vanishingly little of the tech needed to do T&S work exists outside of big companies. We keep reinventing the same wheels.

Basically every platform has a bad usernames list. Why not open-source them?

July 13, 2023 at 3:23 PM

Reposted by Dylan Hadfield-Menell

Amy Zhang

@axz.bsky.social

In our paper studying creators' use of word filters against harassing comments, we find that a lot of creators wanted to build off of existing bad-word lists they trusted. Unfortunately, many popular lists like LDNOOBW have issues of bias. 1/n
https://arxiv.org/pdf/2202.08818.pdf

arxiv.org

July 13, 2023 at 3:57 PM

Reposted by Dylan Hadfield-Menell

Yoel Roth

@yoyoel.com

Interesting tidbit from Meta staff at TrustCon just now: >90% of the CSAM Meta report to NCMEC is visually similar to content they’ve reported before.

The argument goes: The same bad content circulates again and again, so effective moderation requires you to get very good at similarity detection.

July 11, 2023 at 6:42 PM

Reposted by Dylan Hadfield-Menell

Bluesky

@bsky.app

Bluesky is a public benefit corp with the mission “to develop and drive large-scale adoption of technologies for open and decentralized public conversation.”

The PBC status allows us to pursue our mission above profit, but we still need to make this open ecosystem sustainable.

July 5, 2023 at 9:11 PM

Reposted by Dylan Hadfield-Menell

Bluesky

@bsky.app

We believe that a public commons is important for social media. These proposals for moderation and safety tooling have been in the works for a while, and we’re excited to share them for community discussion and feedback with you now.

https://blueskyweb.xyz/blog/6-23-2023-moderation-proposals

Moderation in a Public Commons

In this post, we share why we believe a public commons is important for social media, as well as some proposals for moderation and safety tooling.

blueskyweb.xyz

June 23, 2023 at 9:52 PM
