Ben Edelman
benedelman.bsky.social
Thinking about how/why AI works/doesn't, and how to make it go well for us.

Currently: AI Agent Security @ US AI Safety Institute

benjaminedelman.com
Update: We are extending the MOSS workshop deadline to May 26th 4:59pm PDT (11:59pm UTC)
May 20, 2025 at 3:19 PM
What if there were a workshop dedicated to *small-scale*, *reproducible* experiments? What if this were at ICML 2025? What if your submission (due May 22nd) could literally be a Jupyter notebook?? Pretty excited this is happening. Spread the word! sites.google.com/view/moss202...
May 8, 2025 at 1:51 PM
1/ Excited to share a new blog post from the U.S. AI Safety Institute!

AI agents are becoming more capable, but they are vulnerable to prompt injections in external content – an agent may be given task A, but then be “hijacked” and perform malicious task B instead.

www.nist.gov/news-events/...
Technical Blog: Strengthening AI Agent Hijacking Evaluations
Large AI models are increasingly used to power agentic systems, or “agents,” which can automate complex tasks on behalf of users.
January 17, 2025 at 9:41 PM
For years, this mysterious undulating loop has lived at the top of my personal homepage.
December 8, 2024 at 11:04 PM
I defended my PhD dissertation back in May. I didn't have time to share it widely then (newborn baby), but I think some of you might enjoy it, especially the opening chapters: benjaminedelman.com/assets/disse...
December 2, 2024 at 12:21 AM
0/ I'd like to kick off my presence here with a question: why does learning work in practice? Why is the world such that we can learn to predict things from other things in a computationally efficient way; why is "simplicity bias" empirically useful? Some explanations:
November 29, 2024 at 3:19 PM