Willie Agnew
willie-agnew.bsky.social
Willie Agnew
@willie-agnew.bsky.social
Queer in AI 🏳️‍🌈 | postdoc at cmu HCII | ostem |william-agnew.com | views my own | he/they
Thank you Adrienne!!
December 19, 2025 at 5:51 AM
Finally, we foundat that existing watermark detectors often fail, leading to the inclusion of many watermarked images.

This paper was led by an outstanding undergraduate, Chung Peng Lee! 4/4
December 16, 2025 at 12:02 PM
We looked at Datacomp, a popular text-image dataset, and found that hundreds of millions of images have associated copyright information, and 60% of the 50 top domains have terms of service disallowing scraping. 3/
December 16, 2025 at 12:02 PM
AI depends on massive web scraped datasets, and a common refrain is that people consented to their data being used in AI by putting it on public websites. In this paper we show this story is more complicated and that many web creators have indicated if and how they want their data to be used! 2/
December 16, 2025 at 12:02 PM
6. Chatbots should not be syncophantic, and they should not form social relationships with their users. These features of chatbots currently let AI developers prey on vulnerable users, causing addiction and delusions. 7/7
December 13, 2025 at 6:26 PM
5. AI products being used for mental health must be designated as such, and AI products that aren't designated as such should never provide anyting remotely resembling mental health services. This would work well in tandem with recent regulations passed in Nevada and Illinois. 6/
December 13, 2025 at 6:26 PM
4. Evaluations of AI products for mental health must have an independant, third party evaluator. Funding and rules must be made such that this evaluator is actually independant, and not dependant on AI developer contracts or staffed by people revolving dooring between regulators and AI developers 5/
December 13, 2025 at 6:26 PM
3. Deployers of LLMs must be required to report if and how people are using them for mental health needs, and information about harms and malfunctions resulting from such use. We are currently forced to just trust developers when they say they've fixed the problem 4/
December 13, 2025 at 6:26 PM
2. Deployers of LLMs that are being used for therapy must provide API endpoints to enable external auditing. Too many of these chatbots are buried in apps that make them challenging to benchmark. 3/
December 13, 2025 at 6:26 PM
1. We argue that there is an urgent need for benchmarks that incorporate human clinical expertise. AI benchmarks will never be comprehensive or a substitute for clinical trials, but they can be used to quickly flag problematic deployed models or as a safety check before clinical trials. 2/
December 13, 2025 at 6:26 PM