Samidh
@samidh.bsky.social
Co-Founder at Zentropi (Trustworthy AI). Formerly Meta Civic Integrity Founder, Google X and Google Civic Innovation Lead, and Groq CPO.
This was a fun launch! It turns Zentropi into a GitHub for Content Labelers. You can share content policies with others and build off each other's work. It's the easiest way of deploying a fully customizable classifier. Check out the policies @dwillner.bsky.social created at zentropi.ai/u/dave
November 10, 2025 at 11:58 PM
Reposted by Samidh
Content policies are usually private, one-off efforts. You build yours, I build mine, we don't share much about what works or why. This makes sense given products can (and should) set different policies based on their communities, but it leaves us reinventing the wheel. 🧵 1/5
November 10, 2025 at 8:10 PM
Reposted by Samidh
We got really positive feedback on the TrustCon workshop we ran on writing good content policies for LLMs...so we're doing it again! If you're interested, go sign up here so we can start to figure out timing: forms.gle/tj7vf7ng8n7R...
Zentropi LLM Policy Writing Workshop Signup
By popular demand, we will be hosting a virtual version of our sold-out TrustCon workshop on how to write high-quality content policies with and for LLMs.
In this session, you will learn best practic...
forms.gle
August 27, 2025 at 6:11 PM
This response to the Raine tragedy from OpenAI does something remarkable: it has the humility to acknowledge that a *product failure* led to real-world harm. Despite horrific circumstances, it has a rare degree of honesty that I wish tech companies would show more often. openai.com/index/helpin...
Helping people when they need it most
How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.
openai.com
August 27, 2025 at 7:19 AM
We are opening up Zentropi.ai to everyone today so that anyone can build their own content labeler. What started as a crazy academic idea 2 years ago is now a real thing that companies are using in production to safeguard their AI-powered systems. Give it a shot! blog.zentropi.ai/zentropi-bui...
Zentropi: Build Your Own Content Labeler in Minutes, Not Months
We are officially opening up our build-your-own-content-labeler platform to everyone. Check it out at zentropi.ai.
blog.zentropi.ai
August 19, 2025 at 4:01 PM
Don't take our word for it! Go kick the tires at zentropi.ai and build your own content labeler (no subscription required!)
July 31, 2025 at 9:43 PM
@mmasnick.bsky.social I have a Bluesky demo for you that you might want to see :)
Got to see a demo of this last week and definitely one of the most interesting new tools for the trust & safety field.
I have a dream that someday people would be able to use a tool like this to write their own labeler/policies for their individual accounts on something like Bluesky.
For 17 years working in trust and safety, I've watched talented people burn out on impossible tasks. The problem isn't the people, it's the systems. Traditional moderation requires months of retraining for every policy change. Only big companies can afford it, and even then it works poorly. 🧵 1/9
July 31, 2025 at 9:41 PM
Reposted by Samidh
Just tested this on a few that I know Reddit’s existing Hatred & Harassment automation has blind spots for; it built a focused, accurate labeler in under 10 minutes / a dozen examples, & the human-readable criteria it built could be dropped into a training manual / erratum / used to build a regex
For 17 years working in trust and safety, I've watched talented people burn out on impossible tasks. The problem isn't the people, it's the systems. Traditional moderation requires months of retraining for every policy change. Only big companies can afford it, and even then it works poorly. 🧵 1/9
July 31, 2025 at 7:48 PM
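For illustration, here is a minimal sketch of how a human-readable criterion like the ones described above might seed a regex pre-filter. The criterion, pattern, and function name below are invented placeholders, not output from the actual labeler:

```python
import re

# Hypothetical criterion: "the term 'badword', including common character
# substitutions, as a standalone word". The pattern is a placeholder.
CRITERION = re.compile(r"\b(badword|b4dword|b@dword)\b", re.IGNORECASE)

def needs_review(post: str) -> bool:
    """Cheap pre-filter: flag posts matching the criterion for the
    labeler or a human reviewer."""
    return CRITERION.search(post) is not None

print(needs_review("this post contains a B4DWORD"))  # True
print(needs_review("this post is fine"))             # False
```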
So excited for #TrustCon this week! We will be publicly unveiling Zentropi, a platform that helps people instantly build their own content labelers. We'll be opening it up for early access and open sourcing the underlying language model we trained for the task so that it is accessible to everyone.
July 21, 2025 at 1:13 AM
I expect @dwillner.bsky.social to run around like a maniac again at #TrustCon this year as he shows off Zentropi -- our platform that makes it simple to build your own CoPE-powered content labeler.
A year ago at #TrustCon, I ran around like a maniac showing people something on my laptop. We'd just gotten CoPE - our policy interpretation model - working. It felt like a huge achievement, validating our ideas about LLM-powered labeling. 🧵 1/7
July 18, 2025 at 7:30 PM
Reposted by Samidh
A year ago at #TrustCon, I ran around like a maniac showing people something on my laptop. We'd just gotten CoPE - our policy interpretation model - working. It felt like a huge achievement, validating our ideas about LLM-powered labeling. 🧵 1/7
July 18, 2025 at 7:21 PM
Reposted by Samidh
Take back your attention.
January 21, 2025 at 5:12 PM
The splinternet accelerates. If this stands, look for more countries in 2025 to ban Facebook, Instagram, YouTube, etc. out of fears of American surveillance. www.bloomberg.com/news/article...
Supreme Court Upholds Law That Threatens US TikTok Ban
The Supreme Court unanimously upheld a law that threatens to shut down the wildly popular TikTok social media platform in the US as soon as Sunday, ruling that free speech rights must yield to concern...
www.bloomberg.com
January 17, 2025 at 8:39 PM
If someone on my team at Meta had ever said these kinds of words, I'm pretty sure I'd have had an obligation to notify HR. But maybe times have changed and "masculine energy" is now part of the performance review rubric. www.bloomberg.com/news/article...
Zuckerberg Says Most Companies Need More ‘Masculine Energy’
Mark Zuckerberg lamented the rise of “culturally neutered” companies that have sought to distance themselves from “masculine energy,” adding that it’s good if a culture “celebrates the aggression a bi...
www.bloomberg.com
January 12, 2025 at 8:01 PM
Will Meta's retreat from proactive/automated enforcement of its community standards lead to Instagram and Facebook becoming deeper cesspools? Well, it's complicated and the devil will be in the details. Let's look at the data and see what questions we should be asking... 🧵 1/12
January 10, 2025 at 6:37 PM
Reposted by Samidh
Folks are mostly missing the forest for the trees re: the recent Meta announcement. Specifically, the focus on the move from fact checking to community notes. There are two different changes that are a much bigger deal but have received less focus. 🧵 1/11
January 9, 2025 at 8:29 PM
Here's my "fact check" on Meta's announcement that it is terminating its fact checking program. The TL;DR: Pay more attention to product changes than to political pandering. THREAD...
January 7, 2025 at 5:43 PM
Here's my "fact check" on Meta's announcement that it is terminating its fact checking program. The TL;DR: Pay more attention to product changes than to political pandering. THREAD...
Reposted by Samidh
@samidh.bsky.social and I have made a special-purpose language model for content classification that matches GPT-4's performance, but is much smaller/faster.
We've got a demo up on HuggingFace and weights are available to partners today - check out the link for more details, including how to help!
Introducing CoPE (the COntent Policy Evaluator)
Last year, Samidh Chakrabarti and I wrote a paper about how LLMs would, in the near future, fundamentally change content moderation - and about how existing frontier models were simply too expensive, ...
www.linkedin.com
December 19, 2024 at 9:17 PM
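For readers curious what querying a small policy-interpretation model like this looks like in practice, here is a minimal sketch using Hugging Face transformers. The repo id and prompt format are invented placeholders; CoPE's actual interface may differ:

```python
# Hypothetical sketch of querying a small open-weights policy-interpretation
# model. "example-org/cope-demo" is a placeholder repo id, and the prompt
# format is invented for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "example-org/cope-demo"  # placeholder, not the real weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

policy = (
    "Label as VIOLATING any post that attacks a person based on a "
    "protected characteristic; otherwise label it ALLOWED."
)
content = "Example post text to classify."

# The policy travels with every request, so updating the policy updates
# the classifier with no retraining.
prompt = f"POLICY:\n{policy}\n\nCONTENT:\n{content}\n\nLABEL:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```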
To complement the article @dwillner.bsky.social and I wrote on using LLMs for content moderation, we created a Content Policy Compiler GPT to transform your content policy doc into one that's more accurately interpretable by LLMs. Very rough, but give it a shot!
ChatGPT - Content Policy Compiler
Transforms any content policy for accurate LLM interpretation
chat.openai.com
February 13, 2024 at 9:51 PM
Reposted by Samidh
Really excited to get this out the door - I think there's a huge amount of potential in this approach, but it's going to be a commensurately large amount of work to figure out. If you're doing similar experiments, please share what you've learned so we can all move faster together.
As part of our work at the Stanford Cyber Policy Center, @dwillner.bsky.social and I wrote up a practical guide on how to effectively use LLMs for content moderation. It shows both the promise and limitations of the current generation of LLMs for at-scale trust and safety work. Hope it is helpful!
Using LLMs for Policy-Driven Content Classification | TechPolicy.Press
Dave Willner and Samidh Chakrabarti
www.techpolicy.press
January 29, 2024 at 5:38 PM
As part of our work at the Stanford Cyber Policy Center, @dwillner.bsky.social and I wrote up a practical guide on how to effectively use LLMs for content moderation. It shows both the promise and limitations of the current generation of LLMs for at-scale trust and safety work. Hope it is helpful!
Using LLMs for Policy-Driven Content Classification | TechPolicy.Press
Dave Willner and Samidh Chakrabarti
www.techpolicy.press
January 29, 2024 at 3:41 PM
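For illustration, here is a minimal sketch of the policy-as-prompt pattern the guide describes: the full policy text is sent alongside each piece of content, and the model returns a label. The policy wording, label scheme, and model choice below are assumptions for illustration, not the article's own example:

```python
# Minimal sketch of policy-driven content classification with a frontier
# chat model. Policy text and labels are invented; assumes the openai
# Python client (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

POLICY = (
    "Label content VIOLATING if it contains a direct threat of violence "
    "against an identifiable person. Otherwise label it ALLOWED."
)

def classify(content: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        temperature=0,        # keep labels as consistent as possible
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a content policy evaluator.\n\n"
                    f"POLICY:\n{POLICY}\n\n"
                    "Respond with exactly one word: VIOLATING or ALLOWED."
                ),
            },
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("Sample post text goes here."))
```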
In anticipation of the Trust & Safety Research conference this week, here is a provocative paper from Stanford on Embedding Societal Values into Social Media Algorithms. It provides a framework for more systematically shaping these systems for the better. tsjournal.org/index.php/jo...
Embedding Societal Values into Social Media Algorithms
Journal of Online Trust and Safety
tsjournal.org
September 25, 2023 at 10:50 PM