evanshieh.bsky.social
@evanshieh.bsky.social
(4/n) Nearly three years ago, OpenAI described ChatGPT as a "low-key research preview". Until proven otherwise, continue to treat generative AI models accordingly.

(Thank you to organizers and reviewers at the AAAI Summer Symposium Series 2025 for supporting our work!)
August 5, 2025 at 12:13 AM
(3/n) This narrow focus overlooks the source of many AI harms reported to date: the models themselves. As generative AI expands in intimate consumer domains like therapy and education, we must demand access for independent researchers and public regulators to perform better real-world evaluations
August 5, 2025 at 12:12 AM
(2/n) So what do companies actually measure in technical reports? As the title of our paper suggests, self-audits focus more on nefarious actors than everyday consumers ("seeing red"), and they prefer consulting language models in situations where contextual human input is vital ("teaching parrots")
August 5, 2025 at 12:10 AM