Samidh
@samidh.bsky.social
Co-Founder at Zentropi (Trustworthy AI). Formerly Meta Civic Integrity Founder, Google X and Google Civic Innovation Lead, and Groq CPO.
This was a fun launch! It turns Zentropi into a GitHub for Content Labelers. You can share content policies with others and build off each other's work. It's the easiest way of deploying a fully customizable classifier. Check out the policies @dwillner.bsky.social created at zentropi.ai/u/dave
November 10, 2025 at 11:58 PM
Reposted by Samidh
Content policies are usually private, one-off efforts. You build yours, I build mine, we don't share much about what works or why. This makes sense given products can (and should) set different policies based on their communities, but it leaves us reinventing the wheel. 🧵 1/5
November 10, 2025 at 8:10 PM
Reposted by Samidh
We got really positive feedback on the TrustCon workshop we ran on writing good content policies for LLMs...so we're doing it again! If you're interested, go sign up here so we can start to figure out timing: forms.gle/tj7vf7ng8n7R...
Zentropi LLM Policy Writing Workshop Signup
By popular demand, we will be hosting a virtual version of our sold-out TrustCon workshop on how to write high-quality content policies with and for LLMs.
In this session, you will learn best practic...
forms.gle
August 27, 2025 at 6:11 PM
This response to the Raine tragedy from OpenAI does something remarkable: it has the humility to acknowledge that a *product failure* led to real-world harm. Despite horrific circumstances, it has a rare degree of honesty that I wish tech companies would show more often. openai.com/index/helpin...
Helping people when they need it most
How we think about safety for users experiencing mental or emotional distress, the limits of today’s systems, and the work underway to refine them.
openai.com
August 27, 2025 at 7:19 AM
We are opening up Zentropi.ai to everyone today so that anyone can build their own content labeler. What started as a crazy academic idea 2 years ago is now a real thing that companies are using in production to safeguard their AI-powered systems. Give it a shot! blog.zentropi.ai/zentropi-bui...
Zentropi: Build Your Own Content Labeler in Minutes, Not Months
We are officially opening up our build-your-own-content-labeler platform to everyone. Check it out at zentropi.ai.
blog.zentropi.ai
August 19, 2025 at 4:01 PM
Don't take our word for it! Go kick the tires at zentropi.ai and build your own content labeler (no subscription required!)
July 31, 2025 at 9:43 PM
@mmasnick.bsky.social I have a Bluesky demo for you that you might want to see :)
Got to see a demo of this last week and definitely one of the most interesting new tools for the trust & safety field.
I have a dream that someday people would be able to use a tool like this to write their own labeler/policies for their individual accounts on something like Bluesky.
For 17 years working in trust and safety, I've watched talented people burn out on impossible tasks. The problem isn't the people, it's the systems. Traditional moderation requires months of retraining for every policy change. Only big companies can afford it, and even then it works poorly. 🧵 1/9
July 31, 2025 at 9:41 PM
Reposted by Samidh
Just tested this on a few that I know Reddit’s existing Hatred & Harassment automation has blind spots for; it built a focused, accurate labeler in under 10 minutes / a dozen examples, & the human-readable criteria it built could be dropped into a training manual / erratum / used to build a regex
For 17 years working in trust and safety, I've watched talented people burn out on impossible tasks. The problem isn't the people, it's the systems. Traditional moderation requires months of retraining for every policy change. Only big companies can afford it, and even then it works poorly. 🧵 1/9
July 31, 2025 at 7:48 PM
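For illustration, here is a minimal sketch of how a human-readable criterion like the ones described above might seed a regex pre-filter. The criterion, pattern, and function name below are invented placeholders, not output from the actual labeler:

```python
import re

# Hypothetical criterion: "the term 'badword', including common character
# substitutions, as a standalone word". The pattern is a placeholder.
CRITERION = re.compile(r"\b(badword|b4dword|b@dword)\b", re.IGNORECASE)

def needs_review(post: str) -> bool:
    """Cheap pre-filter: flag posts matching the criterion for the
    labeler or a human reviewer."""
    return CRITERION.search(post) is not None

print(needs_review("this post contains a B4DWORD"))  # True
print(needs_review("this post is fine"))             # False
```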
So excited for #TrustCon this week! We will be publicly unveiling Zentropi, a platform that helps people instantly build their own content labelers. We'll be opening it up for early access and open sourcing the underlying language model we trained for the task so that it is accessible to everyone.
July 21, 2025 at 1:13 AM
I expect @dwillner.bsky.social to run around like a maniac again at #TrustCon this year as he shows off Zentropi -- our platform that makes it simple to build your own CoPE-powered content labeler.
A year ago at #TrustCon, I ran around like a maniac showing people something on my laptop. We'd just gotten CoPE - our policy interpretation model - working. It felt like a huge achievement, validating our ideas about LLM-powered labeling. 🧵 1/7
July 18, 2025 at 7:30 PM
Reposted by Samidh
A year ago at #TrustCon, I ran around like a maniac showing people something on my laptop. We'd just gotten CoPE - our policy interpretation model - working. It felt like a huge achievement, validating our ideas about LLM-powered labeling. 🧵 1/7
July 18, 2025 at 7:21 PM
Reposted by Samidh
Take back your attention.
January 21, 2025 at 5:12 PM
The splinternet accelerates. If this stands, look for more countries in 2025 to ban Facebook, Instagram, YouTube, etc. out of fears of American surveillance. www.bloomberg.com/news/article...
Supreme Court Upholds Law That Threatens US TikTok Ban
The Supreme Court unanimously upheld a law that threatens to shut down the wildly popular TikTok social media platform in the US as soon as Sunday, ruling that free speech rights must yield to concern...
www.bloomberg.com
January 17, 2025 at 8:39 PM
If someone on my team at Meta had ever said these kinds of words, I'm pretty sure I'd have had an obligation to notify HR. But maybe times have changed and "masculine energy" is now part of the performance review rubric. www.bloomberg.com/news/article...
Zuckerberg Says Most Companies Need More ‘Masculine Energy’
Mark Zuckerberg lamented the rise of “culturally neutered” companies that have sought to distance themselves from “masculine energy,” adding that it’s good if a culture “celebrates the aggression a bi...
www.bloomberg.com
January 12, 2025 at 8:01 PM
Will Meta's retreat from proactive/automated enforcement of its community standards lead to Instagram and Facebook becoming deeper cesspools? Well, it's complicated and the devil will be in the details. Let's look at the data and see what questions we should be asking... 🧵 1/12
January 10, 2025 at 6:37 PM
Reposted by Samidh
Folks are mostly missing the forest for the trees re: the recent Meta announcement. Specifically, the focus on the move from fact checking to community notes. There are two different changes that are a much bigger deal but have received less focus. 🧵 1/11
January 9, 2025 at 8:29 PM
Here's my "fact check" on Meta's announcement that it is terminating its fact checking program. The TL;DR: Pay more attention to product changes than to political pandering. THREAD...
January 7, 2025 at 5:43 PM
Here's my "fact check" on Meta's announcement that it is terminating its fact checking program. The TL;DR: Pay more attention to product changes than to political pandering. THREAD...
Reposted by Samidh
@samidh.bsky.social and I have made a special-purpose language model for content classification that matches GPT-4's performance, but is much smaller/faster.
We've got a demo up on HuggingFace and weights are available to partners today - check out the link for more details, including how to help!
Introducing CoPE (the COntent Policy Evaluator)
Last year, Samidh Chakrabarti and I wrote a paper about how LLMs would, in the near future, fundamentally change content moderation - and about how existing frontier models were simply too expensive, ...
www.linkedin.com
December 19, 2024 at 9:17 PM
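For readers curious what querying a small policy-interpretation model like this looks like in practice, here is a minimal sketch using Hugging Face transformers. The repo id and prompt format are invented placeholders; CoPE's actual interface may differ:

```python
# Hypothetical sketch of querying a small open-weights policy-interpretation
# model. "example-org/cope-demo" is a placeholder repo id, and the prompt
# format is invented for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "example-org/cope-demo"  # placeholder, not the real weights

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

policy = (
    "Label as VIOLATING any post that attacks a person based on a "
    "protected characteristic; otherwise label it ALLOWED."
)
content = "Example post text to classify."

# The policy travels with every request, so updating the policy updates
# the classifier with no retraining.
prompt = f"POLICY:\n{policy}\n\nCONTENT:\n{content}\n\nLABEL:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=4)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```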
To complement the article @dwillner.bsky.social and I wrote on using LLMs for content moderation, we created a Content Policy Compiler GPT to transform your content policy doc into one that's more accurately interpretable by LLMs. Very rough, but give it a shot!
ChatGPT - Content Policy Compiler
Transforms any content policy for accurate LLM interpretation
chat.openai.com
February 13, 2024 at 9:51 PM
Reposted by Samidh
Really excited to get this out the door - I think there's a huge amount of potential in this approach, but it's going to be a commensurately large amount of work to figure out. If you're doing similar experiments, please share what you've learned so we can all move faster together.
As part of our work at the Stanford Cyber Policy Center, @dwillner.bsky.social and I wrote up a practical guide on how to effectively use LLMs for content moderation. It shows both the promise and limitations of the current generation of LLMs for at-scale trust and safety work. Hope it is helpful!
Using LLMs for Policy-Driven Content Classification | TechPolicy.Press
Dave Willner and Samidh Chakrabarti
www.techpolicy.press
January 29, 2024 at 5:38 PM
As part of our work at the Stanford Cyber Policy Center, @dwillner.bsky.social and I wrote up a practical guide on how to effectively use LLMs for content moderation. It shows both the promise and limitations of the current generation of LLMs for at-scale trust and safety work. Hope it is helpful!
Using LLMs for Policy-Driven Content Classification | TechPolicy.Press
Dave Willner and Samidh Chakrabarti
www.techpolicy.press
January 29, 2024 at 3:41 PM
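For illustration, here is a minimal sketch of the policy-as-prompt pattern the guide describes: the full policy text is sent alongside each piece of content, and the model returns a label. The policy wording, label scheme, and model choice below are assumptions for illustration, not the article's own example:

```python
# Minimal sketch of policy-driven content classification with a frontier
# chat model. Policy text and labels are invented; assumes the openai
# Python client (>=1.0) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

POLICY = (
    "Label content VIOLATING if it contains a direct threat of violence "
    "against an identifiable person. Otherwise label it ALLOWED."
)

def classify(content: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable chat model works here
        temperature=0,        # keep labels as consistent as possible
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a content policy evaluator.\n\n"
                    f"POLICY:\n{POLICY}\n\n"
                    "Respond with exactly one word: VIOLATING or ALLOWED."
                ),
            },
            {"role": "user", "content": content},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("Sample post text goes here."))
```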
In anticipation of the Trust & Safety Research conference this week, here is a provocative paper from Stanford on Embedding Societal Values into Social Media Algorithms. It provides a framework for more systematically shaping these systems for the better. tsjournal.org/index.php/jo...
Embedding Societal Values into Social Media Algorithms
Journal of Online Trust and Safety
tsjournal.org
September 25, 2023 at 10:50 PM